Scatterplot

A scatterplot is a two-dimensional data visualization used to examine the relationship between two quantitative variables. Each observation is represented by a single point with coordinates corresponding to the values of the two variables. The pattern formed by the points can indicate the strength, direction, and form of any association between the variables.

Construction and interpretation

Axes are labeled with the respective variable names and units, and scales are chosen to reflect the data range. Points may be transparent or slightly jittered to reduce overplotting when many observations share similar values. A regression line or smoothing curve can be added to summarize the overall trend, but such additions do not imply causation. The scatterplot is commonly used in exploratory data analysis to identify linear or non-linear relationships, clusters, outliers, and potential data quality issues. The strength of association can be assessed quantitatively with correlation metrics or locally with nonparametric fits.

Variants and enhancements

A scatterplot matrix (or pairs plot) extends the idea to multiple variables by displaying all pairwise scatterplots in a grid. Three-dimensional scatter plots visualize a third variable using depth, while hexbin plots or transparency-based encodings help with large datasets. Color, shape, or size can encode additional categories or measurements to convey more information within the same plot.

Limitations and best practices

Scatterplots show association but not causation and can mislead if scales are distorted or if subsamples bias the view. Careful labeling, consideration of outliers, and awareness of confounders are important for valid interpretation.
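
The construction steps described under "Construction and interpretation" (jittering, transparency, a fitted line, and a correlation metric) can be illustrated with a minimal Python sketch. The function names, the data, and the jitter amount are hypothetical choices made for illustration. The Pearson coefficient and an ordinary least-squares line are used here as one possible correlation metric and trend summary among several. The plotting helper assumes that the third-party matplotlib library is installed; the rest uses only the standard library.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def jitter(values, amount, seed=0):
    """Add uniform noise in [-amount, amount] to each value (for display only)."""
    rng = random.Random(seed)
    return [v + rng.uniform(-amount, amount) for v in values]


def plot_scatter(xs, ys, path="scatter.png"):
    """Save a jittered, semi-transparent scatterplot with a fitted line.

    Assumption: matplotlib is installed (it is not in the standard library).
    """
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    slope, intercept = least_squares_line(xs, ys)
    fig, ax = plt.subplots()
    # Jitter and alpha reduce overplotting where observations coincide.
    ax.scatter(jitter(xs, 0.1), jitter(ys, 0.1, seed=1), alpha=0.4)
    lo, hi = min(xs), max(xs)
    ax.plot([lo, hi], [slope * lo + intercept, slope * hi + intercept])
    ax.set_xlabel("x (units)")  # placeholder axis labels
    ax.set_ylabel("y (units)")
    ax.set_title(f"r = {pearson_r(xs, ys):.2f}")
    fig.savefig(path)
    plt.close(fig)


# Hypothetical data in which several observations share the same values.
xs = [1, 1, 2, 2, 3, 3, 4, 4]
ys = [2, 2, 4, 4, 6, 6, 8, 8]
r = pearson_r(xs, ys)
slope, intercept = least_squares_line(xs, ys)
```

The statistics are computed from the original values, and jitter is applied only to the plotted coordinates, so the displayed correlation and fitted line are not affected by the added noise.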
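
The idea behind the binned encodings mentioned under "Variants and enhancements" can be sketched in a similar way. The example below counts observations per square bin, which is a simpler stand-in for the hexagonal bins of a hexbin plot and not an implementation of hexagonal binning itself. The function name, bin width, and data are hypothetical.

```python
from collections import Counter


def bin_counts(xs, ys, bin_width):
    """Count observations per square bin of side `bin_width`.

    Returns a Counter keyed by (column, row) bin indices.
    """
    counts = Counter()
    for x, y in zip(xs, ys):
        counts[(int(x // bin_width), int(y // bin_width))] += 1
    return counts


# Hypothetical data; a large dataset would be handled the same way.
xs = [0.1, 0.2, 0.3, 1.4, 1.6, 2.9]
ys = [0.1, 0.4, 0.2, 1.1, 1.9, 2.5]
counts = bin_counts(xs, ys, bin_width=1.0)
```

Each bin count can then be mapped to a color or size, so that dense regions remain readable where individual points would overlap.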