K-Nearest Neighbors (KNN)

K-nearest neighbors (KNN) is a simple, instance-based learning algorithm used for classification and regression. For a new observation, it identifies the k training samples closest to the observation in feature space and predicts the output from those neighbors. As a lazy learner, KNN does not build a general model during training; instead, it stores the training data and makes predictions by comparing new instances to them.

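A minimal from-scratch sketch of this idea (the function name and the tiny dataset below are purely illustrative, not taken from any particular library): "training" just stores the data, and each query ranks the stored points by distance.

```python
from collections import Counter
import math

def knn_classify(X_train, y_train, x_query, k=3):
    """Majority-vote classification from the k nearest stored points."""
    # "Training" is just storing (X_train, y_train); all work happens at query time.
    neighbors = sorted(
        zip(X_train, y_train),
        key=lambda pair: math.dist(pair[0], x_query),  # Euclidean distance to the query
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Tiny 2-D example with two classes
X = [[1.0, 1.0], [1.2, 0.8], [6.0, 6.2], [5.8, 6.1]]
y = ["a", "a", "b", "b"]
print(knn_classify(X, y, [5.9, 6.0], k=3))  # -> "b"
```
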
Prediction is based on distance measures such as Euclidean, Manhattan, or, more generally, Minkowski distance. The choice of k influences bias and variance: a small k can be noisy, while a large k yields smoother outputs. For classification, predictions are typically made by majority vote among the k neighbors; for regression, the mean (or weighted mean) of the neighbors' targets is used. Weighted KNN weights neighbors by inverse distance to emphasize closer points.

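To make these choices concrete, a small sketch of Minkowski distance and inverse-distance-weighted regression could look as follows (again illustrative names and toy data, not a reference implementation):

```python
import numpy as np

def minkowski(u, v, p=2.0):
    """Minkowski distance: p=1 is Manhattan, p=2 is Euclidean."""
    return float(np.sum(np.abs(np.asarray(u) - np.asarray(v)) ** p) ** (1.0 / p))

def weighted_knn_regress(X_train, y_train, x_query, k=3, p=2.0):
    """Inverse-distance-weighted mean of the k nearest neighbors' targets."""
    d = np.array([minkowski(x, x_query, p) for x in X_train])
    idx = np.argsort(d)[:k]            # indices of the k closest training points
    w = 1.0 / (d[idx] + 1e-12)         # closer neighbors get larger weights
    return float(np.sum(w * np.asarray(y_train)[idx]) / np.sum(w))

X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 1.0, 4.0, 9.0]
print(weighted_knn_regress(X, y, [1.5], k=2))  # 2.5: the two equidistant neighbors average out
```
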
KNN has a low training cost but a high inference cost, since every prediction requires a search through the training set. It scales poorly to large datasets and high-dimensional spaces. Practical implementations use data structures such as KD-trees or ball trees and perform feature scaling to ensure fair distance computations. The method is sensitive to irrelevant features and to the scale of measurements.

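As one possible illustration, assuming scikit-learn is available (an assumption, since no library is named above), feature scaling and a KD-tree-backed index can be combined in a single pipeline:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Toy data: two features on very different scales, the case where unscaled
# distances would be dominated by the second column.
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(0.0, 1.0, 200), rng.normal(0.0, 1000.0, 200)])
y = (X[:, 0] > 0).astype(int)

# Standardize features, then let the estimator build a KD-tree index so each
# query searches the tree rather than scanning every training point.
model = make_pipeline(
    StandardScaler(),
    KNeighborsClassifier(n_neighbors=5, algorithm="kd_tree"),
)
model.fit(X, y)
print(model.predict([[0.5, -300.0]]))
```
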
Common variants include weighted voting, choosing k through cross-validation, and approximate nearest neighbor techniques to speed up queries. KNN is widely used in recommender systems, pattern recognition, and intrusion detection, particularly when training data are plentiful and interpretable decisions based on local neighbors are desirable.

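For instance, k and the voting scheme might be chosen by cross-validation; the sketch below assumes scikit-learn and uses its bundled iris dataset purely as a stand-in:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation over k and the voting scheme (uniform vs. inverse-distance).
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7, 9, 15], "weights": ["uniform", "distance"]},
    cv=5,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```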