coresets
A coreset is a small, weighted subset of a dataset that serves as a proxy for the entire set with respect to a chosen objective. Formally, for a dataset X and a family of queries Q with cost functions f_q, a coreset S with weights w satisfies f_q(X) ≈ f_q(S) for all q in Q within a prescribed error ε. In clustering, a common focus is the k-means or k-median objective: a coreset preserves the value of the objective for any choice of k centers within a factor of (1±ε).
Coreset constructions are designed to produce small summaries that still yield provable guarantees. Methods include sampling
Applications of coresets span large-scale data analysis and machine learning. They are used to accelerate approximate
Example: for k-means, a coreset is a weighted subset such that the sum of squared distances to