KNN
K-Nearest Neighbors (KNN) is a simple, instance-based learning method used for classification and regression. It does not build an explicit model during training; instead, it stores the training data and makes predictions by comparing new instances to stored examples. At prediction time, the algorithm identifies the k training samples closest to the query instance according to a chosen distance metric and aggregates their labels or values.
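As a rough sketch of the prediction step, the following NumPy snippet finds the indices of the k stored instances closest to a query under Euclidean distance. The function name `k_nearest` and its signature are illustrative, not a standard API:

```python
import numpy as np

def k_nearest(X_train, x_query, k=3):
    """Return indices of the k stored instances closest to x_query.

    X_train: (n_samples, n_features) array of stored training instances.
    x_query: (n_features,) array, the new instance.
    """
    # Euclidean distance from the query to every stored instance.
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # argsort ranks instances from nearest to farthest; keep the first k.
    return np.argsort(distances)[:k]
```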
Common distance metrics include Euclidean, Manhattan, and Minkowski distances, and, for categorical features, Hamming distance. The choice of metric directly determines which instances count as "nearest," so it should reflect the scale and type of the features.
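For illustration, SciPy's `cdist` exposes all four of these metrics directly; the arrays below are toy values chosen only to show the calls:

```python
import numpy as np
from scipy.spatial.distance import cdist

X = np.array([[0.0, 0.0], [3.0, 4.0]])  # two stored instances
q = np.array([[1.0, 1.0]])              # one query point

cdist(q, X, metric="euclidean")          # straight-line (L2) distance
cdist(q, X, metric="cityblock")          # Manhattan (L1) distance
cdist(q, X, metric="minkowski", p=3)     # Minkowski distance of order p
cdist(q, X, metric="hamming")            # fraction of mismatched coordinates
```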
For classification, the prediction is typically the most frequent class among the k nearest neighbors (majority vote); for regression, it is typically the mean, or a distance-weighted mean, of the neighbors' target values.
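Building on the hypothetical `k_nearest` helper above, the two aggregation rules might look like this; both functions are sketches under the assumption that `y_train` is a NumPy array aligned with the training instances:

```python
import numpy as np
from collections import Counter

def classify(y_train, neighbor_idx):
    """Majority vote over the labels of the selected neighbors."""
    return Counter(y_train[neighbor_idx]).most_common(1)[0][0]

def regress(y_train, neighbor_idx):
    """Unweighted mean of the neighbors' target values."""
    return float(np.mean(y_train[neighbor_idx]))
```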
Advantages include simplicity, no training phase, and the ability to model complex decision boundaries. Limitations include high memory use and slow predictions on large datasets (every query must be compared against the stored data), sensitivity to feature scaling and irrelevant features, degraded accuracy in high-dimensional spaces (the curse of dimensionality), and the need to choose k and the distance metric appropriately.
To improve efficiency on larger datasets, spatial indexing structures such as KD-trees or Ball Trees can be used to accelerate the neighbor search, avoiding a brute-force scan of every stored instance at query time.
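As one concrete option, scikit-learn provides both `KDTree` and `BallTree` in `sklearn.neighbors`. A minimal sketch, with randomly generated toy data standing in for a real training set:

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.default_rng(0)
X_train = rng.random((10_000, 3))   # toy training set: 10,000 points in 3-D

tree = KDTree(X_train)              # build the index once, up front
query = rng.random((1, 3))          # a single query point
dist, idx = tree.query(query, k=5)  # distances and indices of 5 nearest
```

Building the index costs time and memory once, after which each query is answered without comparing against every stored point; Ball Trees tend to hold up better than KD-trees as the number of features grows.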