Understanding Modeling

Overview


Modeling is the process of building a simplified representation of a dataset, either to understand the data (that is, to formulate a coherent story that explains it) or to forecast other data points.

Except in some edge cases, a model is a simplification of the available data. That is, modeling takes a large dataset and constructs a small description of it by explaining the relationships between the variables. In this sense, it is a form of compression, or information extraction (see Information theory).

Regression Example


An ordinary least squares regression is a fairly simple example of a model.


The regression line represents a simplification of the data. It extracts a relationship between the {% x %} and {% y %} values of the data points. This relationship is not exact; that is, there is some noise that the regression line suppresses.
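As a rough illustration of the idea, the sketch below fits a least squares line to a handful of points using only the standard library. The data values and the function name `fit_ols` are made up for this example; the point is that five observations are summarized by two numbers (a slope and an intercept), with the leftover noise showing up as residuals.

```python
def fit_ols(xs, ys):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: roughly y = 2x + 1 with some noise added by hand
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

slope, intercept = fit_ols(xs, ys)

# Residuals are the part of each y value the line does not explain
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
```

The residuals of a least squares line with an intercept sum to zero (up to rounding), which is one way to see that the line passes through the middle of the noise rather than following each point.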

Uses


Once the regression line has been drawn it can be used in the following ways:

  • Explanation - the regression line can be used to explain how the variables are related and to provide a story around that relationship.
  • Forecasting - the regression line can be used to forecast the {% y %} value of a data point for which only the {% x %} value is known.

These two uses of modeling are similar to the ex-post/ex-ante distinction. Ex-post analysis examines historical data; that is, it is backward looking. Ex-ante analysis refers to models and forecasts that are used to predict future results.
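The two uses can be sketched with a single fitted line. The slope, intercept, and data points below are assumed values chosen for illustration, not results from any real dataset.

```python
# Hypothetical fitted line; both coefficients are assumed for this example
slope, intercept = 2.0, 1.0


def predict(x):
    """Return the y value the regression line assigns to x."""
    return slope * x + intercept


# Explanation (ex-post): compare the line with observed historical data
history = [(0.0, 1.2), (1.0, 2.9), (2.0, 5.1)]
residuals = [y - predict(x) for x, y in history]

# Forecasting (ex-ante): estimate y for a new x whose y is not yet known
forecast = predict(5.0)
```

In the explanatory use, the slope itself is the story (here, each unit of x is associated with two more units of y) and the residuals show how well that story fits the past. In the forecasting use, only the predicted value matters.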

Modeling Types


  • Local Averaging - a model that uses the average of data points similar to the point in question in order to formulate a forecast.
  • Structural - a form of model that assumes a certain underlying structure to the data at hand.
  • Reduced Form - a model that simply tries to capture the probabilities of outcomes.
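To make the first of these types concrete, here is a minimal sketch of local averaging, assuming that "similar" means "closest in x" and that the forecast is the plain mean of the k nearest points. The function name `local_average_forecast`, the choice of k, and the data are all hypothetical.

```python
def local_average_forecast(xs, ys, x_new, k=3):
    """Forecast y at x_new as the mean y of the k points nearest in x."""
    # Sort the points by distance to x_new and keep the k closest
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x_new))[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical data with two separate clusters of points
xs = [1.0, 2.0, 3.0, 10.0, 11.0]
ys = [1.0, 2.0, 3.0, 20.0, 22.0]

forecast_low = local_average_forecast(xs, ys, x_new=2.2, k=3)
forecast_high = local_average_forecast(xs, ys, x_new=10.4, k=2)
```

Unlike a structural model, this approach assumes no particular functional form. Each forecast depends only on the nearby points, so the two clusters above are forecast independently of each other.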

Topics


  • Broken Models - inaccurate or incorrectly specified models