Home

LocalitySensitive

Locality-sensitive refers to a property of a function, hashing scheme, or data-structure in which close or similar inputs are mapped to close or similar outputs with high probability. The concept is central to locality-sensitive hashing (LSH), a framework introduced to enable efficient approximate nearest neighbor search in high-dimensional spaces. Locality-sensitive methods aim to reduce the search space by grouping similar items into the same or nearby buckets.

A locality-sensitive family of hash functions is defined with respect to a distance measure. For a given

Common instantiations include p-stable LSH for Lp distances (for example, L2 using Gaussian random projections and

distance
and
a
pair
of
distance
thresholds,
a
family
is
designed
so
that
inputs
with
small
distance
collide
(map
to
the
same
output)
with
higher
probability
than
inputs
that
are
far
apart.
In
practice,
multiple
hash
functions
and
several
independent
hash
tables
are
used
to
boost
the
chance
of
retrieving
a
close
neighbor
while
keeping
the
overall
computational
cost
manageable.
The
method
trades
exactness
for
speed,
offering
probabilistic
guarantees
rather
than
exact
results.
L1
using
Cauchy
projections),
minwise
hashing
for
Jaccard
similarity,
and
sign
random
projections
for
cosine
similarity.
Applications
span
fast
approximate
nearest
neighbor
search
in
high-dimensional
data,
multimedia
and
document
retrieval,
near-duplicate
detection,
data
deduplication,
and
scalable
clustering.
Limitations
include
the
need
for
careful
parameter
selection,
potential
memory
overhead,
and
the
lack
of
guaranteed
exact
matches,
particularly
as
dimensionality
grows.