Stationarity
Overview
Stationarity is a central concept in time series analysis. The observations of a time series
are not independently generated, which creates a problem when applying statistical methods built on the
concept of random sampling. Many of the tools of
statistical inference
depend on the sample having been drawn i.i.d., that is, independent and identically distributed.
Stationarity addresses many of these problems by guaranteeing that the usual statistics remain
valid as the number of sample points grows large.
Stationarity
A time series is stationary if the joint distribution function of any set of consecutive points in the series
does not depend on when the series starts. That is, for all i, j, and n,
{% F(x_i, x_{i+1}, \ldots, x_{i+n}) = F(x_j, x_{j+1}, \ldots, x_{j+n}) %}
X is weakly stationary if {% \mathbb{E}(x_t) %} is independent of t and {% Cov(x_{t+h}, x_t) %} depends only on the lag h, not on t.
A process that is stationary (and has finite second moments) is by definition weakly stationary.
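The weak-stationarity conditions can be checked empirically on a simulated example. The sketch below (using NumPy; the path counts and the choice of Gaussian white noise are illustrative assumptions, not from the text) estimates the mean and the lag-1 autocovariance of a white noise process at two different times t; for a weakly stationary series both estimates should agree, up to sampling error.

```python
# Sketch: empirically checking weak stationarity for Gaussian white noise.
# For a weakly stationary process, the mean and the lag-h autocovariance
# should not depend on the time index t.
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps = 20_000, 50
x = rng.normal(size=(n_paths, n_steps))  # each row is one realization

# Sample mean at two different times t -- both should be near 0
print(x[:, 10].mean(), x[:, 40].mean())

# Sample lag-1 autocovariance at two different times -- both should be near 0
cov_a = np.mean(x[:, 10] * x[:, 11])
cov_b = np.mean(x[:, 40] * x[:, 41])
print(cov_a, cov_b)
```

A non-stationary series, such as a random walk, would fail this check: its variance grows with t.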
Stationarity and Statistical Inference
The importance of stationarity in a time series is its relationship to
statistical inference.
The workhorse of statistical inference is the
independent and identically distributed sampling assumption. That is, when presented with a dataset,
(say {% {x_1,x_2, ... , x_n} %}), the analyst assumes that the data was drawn from a population
in such a manner that each draw is independent of the other draws, and that each draw is taken from the
same distribution. (Often, this means random sampling from a population with replacement).
An estimate of the average or expected value of a draw can then be calculated as
{% \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i %}
If the draws are independent, then the variance of the average is
{% Var(\bar{x}) = \frac{1}{n^2} \sum Var(x_i) = \frac{\sigma^2}{n} %}
that is, the measurement gets more accurate (in probability) as the number of samples increases.
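The {% \sigma^2 / n %} shrinkage can be verified by simulation. This sketch (the value of sigma and the repetition count are illustrative assumptions) draws many i.i.d. samples of size n and compares the observed variance of the sample averages to the theoretical value:

```python
# Sketch: the variance of the sample average of i.i.d. draws falls as sigma^2 / n.
import numpy as np

rng = np.random.default_rng(1)
sigma = 2.0

def var_of_average(n, reps=5_000):
    # Draw `reps` independent samples of size n and measure the spread
    # of their averages.
    means = rng.normal(0.0, sigma, size=(reps, n)).mean(axis=1)
    return means.var()

for n in (10, 100, 1000):
    # Empirical variance of the average vs. the theoretical sigma^2 / n
    print(n, var_of_average(n), sigma**2 / n)
```

The two columns track each other closely, and both shrink by a factor of 10 each time n grows by a factor of 10.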
When a time series is not stationary, it poses several problems for statistical inference. First,
if the mean is not independent of t, then the estimate above cannot be used. Second, points
in a time series are rarely independent.
When using
ordinary least squares regression
in a time series,
the strict exogeneity assumption is often violated.
(see
regression in a time series)
Intuition
The typical solution to the statistical inference problem is to transform the time series in question into
one that is stationary. The stationary time series has
moments
that are independent of time so that an estimate can be formed. In addition, a stationary process is one where
the relevance of the value of one data point to another decreases with the time distance between them.
Consider a time series such as the following:
{% x_{t+1} = \delta x_t + \epsilon_t %}
if {% |\delta| < 1 %}, then the impact of {% x_t %} on {% x_{t+T} %} decreases as T grows large. That is,
{% \mathbb{E}(x_{t+T} \mid x_t) = \delta^T x_t \rightarrow 0 %}
so as the separation T grows, points in a stationary series begin to look more independent. In particular,
the tools of
ordinary least squares regression
can be used due to the
large sample properties.
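The geometric decay {% \delta^T %} of the dependence can be seen directly in a simulated AR(1) series. In the sketch below (the value of delta, the seed, and the series length are illustrative assumptions), the sample autocorrelation at lag T is compared to {% \delta^T %}:

```python
# Sketch: in a stationary AR(1) process with |delta| < 1, the lag-T
# autocorrelation decays geometrically as delta**T, so distant points
# behave almost like independent draws.
import numpy as np

rng = np.random.default_rng(2)
delta, n = 0.8, 200_000
eps = rng.normal(size=n)
x = np.empty(n)
x[0] = eps[0]
for t in range(1, n):
    x[t] = delta * x[t - 1] + eps[t]  # x_{t+1} = delta * x_t + eps_t

for lag in (1, 5, 20):
    # Sample autocorrelation at this lag vs. the theoretical delta**lag
    corr = np.corrcoef(x[:-lag], x[lag:])[0, 1]
    print(lag, corr, delta**lag)
```

By lag 20 the correlation is close to zero, which is why large-sample arguments for OLS can go through.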
Stochastic Trends
The presence of non-stationarity (stochastic trends) can invalidate regression hypothesis-testing statistics.
The following shows two randomly generated series, each either stationary or not, and the results of
a regression of one on the other.
When both x and y are non-stationary, the p-values almost always show a significant relationship,
even though each series is randomly generated. When only one of the series is stationary, no relationship is typically
detected. (However, it should be noted that the derivation of the p-values assumes stationarity of all variables,
so even though the sample regressions seem to indicate that the statistics can still be used, this is not justified by a
full mathematical proof and should be used with care.)
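The spurious-regression experiment described above can be sketched with plain NumPy (the sample size, repetition count, and use of random walks as the non-stationary series are illustrative assumptions). It regresses one independently generated series on another and counts how often the slope's t-statistic exceeds the usual 5% critical value of 1.96:

```python
# Sketch of the spurious-regression experiment: regress one independently
# generated series on another and count how often the slope looks
# "significant" (|t| > 1.96). Under i.i.d. theory this should be about 5%.
import numpy as np

rng = np.random.default_rng(3)

def slope_t_stat(y, x):
    # OLS slope t-statistic for y = a + b*x + e
    n = len(y)
    xc, yc = x - x.mean(), y - y.mean()
    b = (xc @ yc) / (xc @ xc)
    resid = yc - b * xc
    se = np.sqrt((resid @ resid) / (n - 2) / (xc @ xc))
    return b / se

def rejection_rate(random_walk, reps=500, n=200):
    hits = 0
    for _ in range(reps):
        ex, ey = rng.normal(size=n), rng.normal(size=n)
        x = np.cumsum(ex) if random_walk else ex  # random walk = non-stationary
        y = np.cumsum(ey) if random_walk else ey
        hits += abs(slope_t_stat(y, x)) > 1.96
    return hits / reps

print("stationary series:", rejection_rate(False))  # near the nominal 5%
print("random walks:    ", rejection_rate(True))    # far above 5%
```

With two independent random walks, the t-test rejects far more often than its nominal level, even though there is no relationship between the series.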
Test for Stationarity
Topics