KNN

K-Nearest Neighbors (KNN) is a simple, instance-based learning method used for classification and regression. It does not build an explicit model during training; instead, it stores the training data and makes predictions by comparing new instances to stored examples. At prediction time, the algorithm identifies the k training samples closest to the query instance according to a chosen distance metric and aggregates their labels or values.

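Because the prediction step is easy to state, it fits in a few lines of code. The sketch below is a minimal from-scratch illustration using NumPy (the helper name knn_predict and the toy data are illustrative, not from any particular library); it shows the idea rather than a production implementation.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Predict the label of a single query point by majority vote
    among its k nearest training points (Euclidean distance)."""
    # Distances from the query to every stored training example
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Majority vote over their labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy usage
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [8.0, 9.0], [7.5, 8.5]])
y_train = np.array(["a", "a", "b", "b"])
print(knn_predict(X_train, y_train, np.array([7.0, 8.0]), k=3))  # -> "b"
```
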
Common distance metrics include Euclidean, Manhattan, and Minkowski distances, and, for categorical features, Hamming distance. The choice of distance and the scale of features have a large impact on performance; therefore, data normalization or standardization is often essential so that features contribute comparably.

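To make the point concrete, the snippet below computes the metrics mentioned above between two points and standardizes a feature matrix so that each column has zero mean and unit variance (a NumPy sketch; the standardize helper and the example features are assumptions for illustration).

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 0.0, 4.0])

euclidean = np.linalg.norm(a - b)                 # Minkowski with p = 2
manhattan = np.sum(np.abs(a - b))                 # Minkowski with p = 1
minkowski3 = np.sum(np.abs(a - b) ** 3) ** (1/3)  # general Minkowski, p = 3

# Hamming distance for categorical features: fraction of mismatches
cat_a = np.array(["red", "small", "round"])
cat_b = np.array(["red", "large", "round"])
hamming = np.mean(cat_a != cat_b)

def standardize(X):
    """Scale each feature to zero mean and unit variance so that
    no single feature dominates the distance computation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

X = np.array([[170.0, 65.0], [180.0, 85.0], [160.0, 55.0]])  # e.g. height (cm), weight (kg)
X_scaled = standardize(X)
```
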
For classification, the prediction is typically the most frequent class among the k nearest neighbors (majority vote). Weighted variants assign each neighbor a weight inversely related to distance, so closer neighbors have more influence. For regression, the predicted value is usually the average (or sometimes a weighted average) of the neighbors’ target values.

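A sketch of the distance-weighted variant is shown below, assuming inverse-distance weights 1/(distance + eps), which is one common choice among several; the function name and its interface are illustrative.

```python
import numpy as np

def weighted_knn(X_train, y_train, x_query, k=3, classify=True, eps=1e-9):
    """Distance-weighted KNN: closer neighbors get larger weights.
    With classify=True, labels are combined by weighted vote;
    otherwise the prediction is a weighted average of target values."""
    distances = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(distances)[:k]
    weights = 1.0 / (distances[nearest] + eps)   # inverse-distance weights

    if classify:
        # Sum the weights of the neighbors voting for each class
        scores = {}
        for label, w in zip(y_train[nearest], weights):
            scores[label] = scores.get(label, 0.0) + w
        return max(scores, key=scores.get)
    # Regression: weighted average of the neighbors' target values
    return np.average(y_train[nearest], weights=weights)
```
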
Advantages include simplicity, no training phase, and the ability to model complex decision boundaries. Limitations include high prediction-time cost on large datasets, sensitivity to irrelevant or correlated features, and the curse of dimensionality. The choice of k is important: a small k makes the model sensitive to noise, while a large k smooths the decision boundary.

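One way to see the k trade-off in practice is to compare cross-validated accuracy across several values of k, as in the sketch below on synthetic data (assuming scikit-learn is installed; the dataset and the specific k values are only for illustration): very small k tends to fit noise, while very large k can oversmooth.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data just to illustrate how accuracy varies with k
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

for k in (1, 5, 25, 75):
    model = KNeighborsClassifier(n_neighbors=k)
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"k={k:>2}  mean CV accuracy={score:.3f}")
```
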
To improve efficiency on larger datasets, spatial indexing structures such as KD-trees or Ball Trees can be used. Approximate nearest neighbor methods and hashing-based approaches offer faster but less exact searches. KNN is best suited for small to medium-sized problems or where a simple baseline is desired, and it is common to normalize features and validate k via cross-validation.

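As one possible illustration of these points (an assumption that scikit-learn is available, not something prescribed by the text), the sketch below builds a KD-tree-backed neighbor index via the algorithm parameter and, separately, selects k by cross-validation with GridSearchCV.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors, KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# KD-tree index: queries avoid scanning every training point
index = NearestNeighbors(n_neighbors=5, algorithm="kd_tree").fit(X)
distances, indices = index.kneighbors(X[:3])   # neighbors of the first 3 points

# Choosing k by cross-validation
search = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [1, 3, 5, 11, 21]}, cv=5)
search.fit(X, y)
print(search.best_params_)
```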