Understanding Modeling
Overview
Modeling is the process of building a simplified representation of a dataset with the intent
to either form an understanding of the data (that is, formulate a coherent story explaining the data)
or to forecast other data points.
Except in some possible edge cases, a model is a simplification of the available data. That is, modeling
takes a large dataset and constructs a small description of it by describing the relationships between
the variables. In this sense, modeling is a form of compression, or information extraction.
(see Information theory)
Regression Example
An ordinary least squares regression is a fairly simple example of a model.
The regression line represents a simplification of the data. It extracts a relationship between the x and y values of the data points.
This relationship is not exact: the data points scatter around the line, and the regression line suppresses that noise.
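As a minimal sketch of this idea, the snippet below fits a line to noisy data using the standard closed-form least squares formulas. The dataset is synthetic and purely illustrative (a hypothetical true line y = 2x + 1 plus Gaussian noise), and the helper name fit_ols is an assumption made for this example.

```python
import random


def fit_ols(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations of x and y, and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: a true line y = 2x + 1 plus Gaussian noise (assumption).
rng = random.Random(0)
xs = [i / 10 for i in range(200)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]

slope, intercept = fit_ols(xs, ys)

# The residuals are the part of the data that the line does not describe.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

print(f"{len(xs)} points summarized by slope={slope:.3f}, intercept={intercept:.3f}")
print(f"mean residual: {sum(residuals) / len(residuals):.2e}")
```

Here 200 data points are reduced to two numbers, the slope and the intercept. The residuals are the noise that the line leaves out, and for a least squares fit with an intercept they average to zero up to rounding error.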
Uses
Once the regression line has been fitted, it can be used for:
- Explanation
  - The regression line may be used to explain how the variables are related and to provide a story around that relationship.
- Forecasting
  - The regression line can be used to forecast the y value of a data point for which only the x value is known.