kNearestNeighbors

K-Nearest Neighbors (kNN) is a simple, instance-based learning method used for classification and regression. It makes predictions for new data points by examining the labels of the k most similar examples in the training data, where similarity is defined by a distance metric in feature space. kNN is non-parametric and lazy: there is no explicit training phase that builds a model; instead, all training instances are stored and the prediction is computed at query time.

To predict a label for a new instance, the distances to all training instances are computed, the k closest ones are selected, and a prediction is made by majority vote for classification or by averaging the responses for regression. In weighted variants, nearer neighbors contribute more to the prediction, often using inverse-distance weights.

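A minimal brute-force sketch of this procedure is given below, assuming NumPy and small in-memory arrays; the function name `knn_predict` and its parameters are illustrative rather than taken from any particular library.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=5, weighted=False, regression=False):
    """Illustrative brute-force kNN prediction for a single query point."""
    # Euclidean distances from the query to every training instance.
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest training instances.
    nn = np.argsort(dists)[:k]
    if regression:
        if weighted:
            # Inverse-distance weights; the small epsilon avoids division by zero.
            w = 1.0 / (dists[nn] + 1e-12)
            return np.average(y_train[nn], weights=w)
        return y_train[nn].mean()
    if weighted:
        # Accumulate inverse-distance weight per class label.
        votes = {}
        for i in nn:
            votes[y_train[i]] = votes.get(y_train[i], 0.0) + 1.0 / (dists[i] + 1e-12)
        return max(votes, key=votes.get)
    # Unweighted majority vote over the k nearest labels.
    return Counter(y_train[nn]).most_common(1)[0][0]

# Toy usage: two 2-D classes.
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.2, 0.1]), k=3))  # expected: 0
```
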
Common choices of distance include Euclidean, Manhattan, and other Minkowski metrics; feature scaling is important because kNN relies on distances in the original feature space, so features with large numeric ranges can dominate the metric. Categorical features may require encoding, and high-dimensional data can degrade performance (the curse of dimensionality). The choice of k balances bias and variance: small k is sensitive to noise, while large k can smooth over local structure.

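The effect of scaling can be seen in a small sketch (again assuming NumPy; the `minkowski` helper is illustrative): two features measured on very different scales give distances dominated by the larger-range feature until both are standardized.

```python
import numpy as np

def minkowski(a, b, p=2):
    """Minkowski distance: p=2 is Euclidean, p=1 is Manhattan."""
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

# Two features on very different scales: income (currency units) and age (years).
X = np.array([[50_000.0, 25.0],
              [52_000.0, 60.0],
              [90_000.0, 26.0]])

# Without scaling, the income axis dominates every distance.
print(minkowski(X[0], X[1]), minkowski(X[0], X[2]))

# Standardize each feature to zero mean and unit variance before measuring distance.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
print(minkowski(Xs[0], Xs[1]), minkowski(Xs[0], Xs[2]))
```
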
Computationally, naïve kNN requires O(nd) work per prediction, where n is the number of training examples and d is the feature dimension, since the query must be compared against every stored instance. Spatial indexing structures such as k-d trees and ball trees, as well as approximate nearest-neighbor methods, can reduce this cost.

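As a sketch of the difference, the snippet below compares brute-force search with a k-d tree query using SciPy's `cKDTree` (SciPy is an assumed dependency, not something the text prescribes); both return the same neighbors, but the tree avoids scanning every point on each query.

```python
import numpy as np
from scipy.spatial import cKDTree  # assumed dependency, used only for illustration

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 3))   # n = 10,000 training points, d = 3
query = rng.normal(size=3)

# Brute force: n distance computations of O(d) each, i.e. O(nd) per query.
dists = np.linalg.norm(X - query, axis=1)
brute_idx = np.argsort(dists)[:5]

# k-d tree: build once, then answer each query by visiting only part of the data
# (roughly logarithmic in n for low-dimensional data).
tree = cKDTree(X)
tree_dist, tree_idx = tree.query(query, k=5)

# Both approaches find the same 5 nearest neighbors.
print(sorted(map(int, brute_idx)), sorted(map(int, tree_idx)))
```
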
Applications include image and text classification, recommender systems, anomaly detection, and imputation. Limitations include high memory requirements, sensitivity to irrelevant features, and poorly defined decision boundaries for some problems.

See also: nearest centroid, distance metrics, and kernel methods.
