Outliers
Overview
Outliers are data points in a dataset that either were drawn from the tail of the distribution that generated the dataset, or
deviate so much from that distribution as to raise suspicion that they were generated by a different data process.
Reasons for Outlier Detection
- Some quality control programs are intended to detect outliers in order to find defective items.
- Some methods of inference are sensitive to outliers; that is, outliers have an outsized impact on the parameters being fit.
In such cases, it may make sense to process the outliers in some fashion in order to reduce their overall impact.
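To illustrate how a single outlier can have an outsized impact on a fitted parameter, here is a minimal Python sketch using only the standard library. The measurement values and the clipping bounds (9.0 and 11.0) are made up for illustration, and `clip` is a hypothetical helper. Clipping to fixed bounds is only one simple way of processing outliers, similar in spirit to winsorizing; it is not the only option.

```python
from statistics import mean, median


def clip(values, low, high):
    """Clamp every value into [low, high]; points beyond the bounds are pulled in."""
    return [min(max(v, low), high) for v in values]


# Hypothetical measurements: five readings near 10 plus one extreme point.
clean = [9.8, 10.1, 10.0, 9.9, 10.2]
with_outlier = clean + [100.0]

# The mean (a parameter "being fit") is pulled far off by the single outlier,
# while the median barely moves.
print("mean:  ", mean(clean), "->", mean(with_outlier))
print("median:", median(clean), "->", median(with_outlier))

# One crude way to "process" the outlier: clip to bounds chosen by hand
# (the bounds here are an illustrative assumption).
clipped = clip(with_outlier, 9.0, 11.0)
print("mean after clipping:", mean(clipped))
```

The mean jumps from about 10 to about 25 when the outlier is added, whereas after clipping it stays close to 10. The catch is that the bounds must be chosen somehow, which is itself an outlier-detection decision.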
Topics
- Outlier detection algorithms usually assign a label or score to each point, indicating whether the algorithm considers the
point to be an outlier. A simple labeling algorithm assigns each point one of two labels: outlier or not an outlier.
More sophisticated algorithms assign a numeric score that indicates how likely the point is to be
an outlier, according to the algorithm. Any numeric score can be converted into a labeling by
choosing a threshold: points whose score exceeds it are labeled outliers.
- Outlier Processing - discusses methods for processing outliers, if needed
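The score-then-threshold approach described under the first topic can be sketched as follows. This is a minimal example under stated assumptions, not a method prescribed by this document: it uses the modified z-score (distance from the median in units of the median absolute deviation, MAD) as the scoring function and a cutoff of 3.5, a commonly used default. `robust_scores` and `label` are hypothetical helper names.

```python
from statistics import median


def robust_scores(values):
    """Score each point with the modified z-score: 0.6745 * |x - median| / MAD.

    Larger scores suggest the point is more likely to be an outlier.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # With MAD = 0 (e.g. most values identical) the score is undefined.
        raise ValueError("MAD is zero; the modified z-score is undefined")
    return [0.6745 * abs(v - med) / mad for v in values]


def label(scores, threshold=3.5):
    """Convert numeric scores into binary labels: True means 'outlier'.

    The 3.5 default is an assumed, commonly used cutoff, not a universal rule.
    """
    return [s > threshold for s in scores]


data = [9.8, 10.1, 10.0, 9.9, 10.2, 100.0]  # hypothetical data
scores = robust_scores(data)
print([round(s, 2) for s in scores])
print(label(scores))  # only the last point is flagged
```

Using the median and MAD rather than the mean and standard deviation is deliberate: a single extreme point also inflates the standard deviation, which can keep its own ordinary z-score small (an effect known as masking). In a sample of six points, for example, no ordinary z-score can even reach 3, so a threshold of 3 on ordinary z-scores would never flag anything.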