outlier

An outlier is an observation that diverges markedly from the other observations in a data set. The designation is relative to the distribution, model, and purpose of analysis; an apparent outlier in one context may be a valid observation in another. Outliers can indicate variability in the population, measurement error, data entry mistakes, or rare events.

In statistics, outliers can distort summary statistics and model estimates, especially the mean and linear regression coefficients. They are common in many domains, including finance, biology, sensor data, and social science. Distinguishing between data errors and legitimate extreme values is a key data-quality task.

Detection methods include graphical approaches such as boxplots and scatterplots, as well as quantitative criteria. The IQR rule defines outliers as observations below Q1 minus 1.5 times IQR or above Q3 plus 1.5 times IQR. Z-scores or modified z-scores using the median absolute deviation can identify extreme values. Statistical tests such as Grubbs' test or Dixon's Q test are applied in small samples, while robust methods and machine learning algorithms perform anomaly detection in larger data sets.

Handling outliers depends on the context. They may be corrected if they result from errors, transformed (for example by log or square root transformation), or excluded from analysis. Alternatively, robust statistical methods and models that are less sensitive to extreme values, such as median-based summaries or robust regression, can be used. Documentation of decisions regarding outliers is essential for reproducibility.
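
Two of the quantitative detection criteria described above, the IQR rule and the modified z-score based on the median absolute deviation, can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch, not a definitive implementation. The function names (`iqr_outliers`, `modified_z_scores`, `mad_outliers`) and the sample data are hypothetical. The quartile convention (`method="inclusive"`), the scaling constant 0.6745, and the cutoff of 3.5 are assumptions that follow common practice and are not stated in the text; other quartile definitions or cutoffs may flag different points.

```python
from statistics import median, quantiles


def iqr_outliers(values, k=1.5):
    """Return the values below Q1 - k*IQR or above Q3 + k*IQR."""
    # Assumption: quartiles use the "inclusive" interpolation method.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


def modified_z_scores(values):
    """Return modified z-scores based on the median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # More than half the values equal the median; the score is undefined.
        raise ValueError("MAD is zero; modified z-scores are undefined")
    # Assumption: 0.6745 rescales the MAD to be comparable to a standard
    # deviation under normality (a common convention).
    return [0.6745 * (v - med) / mad for v in values]


def mad_outliers(values, threshold=3.5):
    """Return the values whose absolute modified z-score exceeds the threshold."""
    scores = modified_z_scores(values)
    return [v for v, z in zip(values, scores) if abs(z) > threshold]


# Hypothetical sample with one extreme value.
sample = [10, 12, 11, 13, 12, 11, 14, 95]
print(iqr_outliers(sample))  # [95]
print(mad_outliers(sample))  # [95]
```

Both criteria flag the value 95 in this sample. Whether a flagged value is an error or a legitimate extreme observation still has to be judged in context.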