Univariate Tail Detection

Overview


Univariate tail detection assigns each datapoint a numerical measurement, either a raw value on the datapoint or a value computed from it, and then estimates the probability that the measurement of a randomly drawn point is less than (or, for the upper tail, greater than) the observed measurement. Points whose measurements carry a very small tail probability are flagged as outliers.
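
When no distributional assumptions are made, this tail probability can be estimated empirically from the sample itself. A minimal sketch (the function name and data are illustrative, not from the original):

```python
import numpy as np

def empirical_tail_probability(values, observed, lower_tail=True):
    """Estimate P(X < observed) (or P(X > observed)) from a sample."""
    values = np.asarray(values)
    if lower_tail:
        return float(np.mean(values < observed))
    return float(np.mean(values > observed))

# A measurement far below the bulk of the data has a tiny lower-tail probability.
rng = np.random.default_rng(0)
sample = rng.normal(loc=10.0, scale=2.0, size=10_000)
print(empirical_tail_probability(sample, 3.0))  # roughly 2e-4 for N(10, 2^2)
```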

Methods


  • Inequality Based Detection - Methods to bound the probability that a random measurement is less/greater than the observed value often use the Markov Inequality or the Chebyshev Inequality, which require no distributional assumptions (a sketch follows this list).
  • Distribution Based Detection - If the distribution of the data generating process is known, one can simply evaluate the cumulative distribution function at the measured value to obtain the tail probability directly (see the CDF sketch below).
  • Regression Based Detection - OLS regression can supply the numeric measure used to detect outliers. In particular, the OLS regression equation forecasts the dependent variable {% y %} as
    {% y = \alpha + \beta_1 x_1 + \dots + \beta_n x_n + \epsilon %}
    where {% \epsilon %} is the error. Given a dataset, the measured value {% y %} on each datapoint can be fit using the regression equation, and outliers are detected when the residual from the fit is large. If the error can be assumed to be normal, the tail probability of each point can be computed (see the regression sketch below).
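
For the inequality based method, Chebyshev's inequality bounds the probability that a measurement deviates from the mean by more than {% k %} standard deviations by {% 1/k^2 %}, and Markov's inequality gives {% P(X \geq a) \leq E[X]/a %} for non-negative {% X %}. A minimal sketch using sample moments (the function names are illustrative assumptions, not from the original):

```python
import numpy as np

def chebyshev_tail_bound(values, observed):
    """Upper-bound P(|X - mu| >= |observed - mu|) via Chebyshev's inequality.

    Distribution-free: only the sample mean and standard deviation are used.
    """
    mu, sigma = np.mean(values), np.std(values)
    k = abs(observed - mu) / sigma   # deviation in standard-deviation units
    if k <= 1.0:
        return 1.0                   # the bound is vacuous within one sigma
    return 1.0 / k**2

def markov_tail_bound(values, observed):
    """Upper-bound P(X >= observed) via Markov's inequality.

    Valid only for non-negative measurements and observed > 0.
    """
    return min(1.0, np.mean(values) / observed)
```

These are upper bounds rather than exact probabilities, so they are conservative: a small bound is strong evidence of a tail outlier, while a large bound is inconclusive.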
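
For the distribution based method, the tail probability is a direct CDF evaluation once the distribution is known. A sketch assuming a Gaussian with illustrative (invented) parameters:

```python
from scipy.stats import norm

# Assume the measurements are known to follow N(mu, sigma^2);
# these parameter values are illustrative only.
mu, sigma = 10.0, 2.0
observed = 3.0

lower_tail = norm.cdf(observed, loc=mu, scale=sigma)  # P(X < observed)
upper_tail = norm.sf(observed, loc=mu, scale=sigma)   # P(X > observed)
print(lower_tail)  # ~0.00023: a strong lower-tail outlier
```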
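
For the regression based method, a sketch that fits OLS with scikit-learn and scores each point by the Gaussian tail probability of its standardized residual, following the normal-error assumption above (the synthetic data and variable names are illustrative):

```python
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Synthetic data: y depends linearly on two features; one point is corrupted.
X = rng.normal(size=(200, 2))
y = 1.0 + 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)
y[0] += 8.0                           # inject a single outlier

model = LinearRegression().fit(X, y)  # OLS fit of y = alpha + beta . x + eps
residuals = y - model.predict(X)

# Under the normal-error assumption, score each point by the two-sided
# Gaussian tail probability of its standardized residual.
z = residuals / np.std(residuals)
tail_prob = 2.0 * norm.sf(np.abs(z))
print(np.argmin(tail_prob))           # index 0: the injected outlier
```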

Contents