DimRp

DimRp, short for dimensionality reduction by random projections, refers to techniques that map high-dimensional data into a lower-dimensional space by multiplying it by a random projection matrix. The goal is to approximately preserve pairwise distances between data points while enabling faster computation and lower memory usage, drawing on ideas from the Johnson-Lindenstrauss lemma.

The typical workflow starts with a dataset X in R^{n×d}. A random matrix R in R^{d×k} is generated, with k < d, and the projected data X' = XR lies in R^{n×k}. Elements of R are drawn from distributions such as Gaussian N(0,1) or sparse sign distributions to reduce multiplications. In many cases, centering is unnecessary, and the projection can be applied in streaming or memory-limited environments.
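This workflow can be sketched in a few lines of NumPy; the dataset and the sizes n, d, and k below are illustrative assumptions, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, k = 500, 1000, 50          # n samples, original dimension d, target k < d
X = rng.standard_normal((n, d))  # stand-in dataset

# Random projection matrix; scaling by 1/sqrt(k) makes squared distances
# preserved in expectation.
R = rng.standard_normal((d, k)) / np.sqrt(k)

X_proj = X @ R                   # projected data, now in R^{n×k}
print(X_proj.shape)              # (500, 50)
```

Note that no statistics of X are used to build R, which is what makes the map data-independent and streamable.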

Variants include dense random projections and sparse schemes (for example, with entries drawn from {−1, 0, +1}) that reduce computation and storage. More recent implementations use structured or fast transforms to accelerate multiplication.

Key properties include probabilistic guarantees on distance preservation: with an appropriate choice of k, pairwise distances are preserved within a factor of (1 ± ε) with high probability for all pairs. The method is data-independent and fast, but it may not capture covariance structure as PCA does, and distortions can occur for small sample sizes or in sensitive tasks.
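The distance-preservation guarantee is easy to check empirically; this is a sketch with invented sizes, where a larger k tightens the observed distortion:

```python
import numpy as np

rng = np.random.default_rng(2)

n, d, k = 100, 2000, 400
X = rng.standard_normal((n, d))
R = rng.standard_normal((d, k)) / np.sqrt(k)
Y = X @ R

def sq_dists(A):
    # All pairwise squared Euclidean distances via the Gram matrix.
    G = A @ A.T
    s = np.diag(G)
    return s[:, None] + s[None, :] - 2 * G

# Ratio of projected to original squared distance for every pair;
# the values concentrate around 1 as k grows.
i, j = np.triu_indices(n, k=1)
ratio = sq_dists(Y)[i, j] / sq_dists(X)[i, j]
print(round(ratio.min(), 2), round(ratio.max(), 2))
```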

Applications span large-scale machine learning pipelines, text mining, clustering, approximate nearest-neighbor search, and information retrieval, where speed and scalability are important.
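As a toy illustration of the nearest-neighbor use case (the corpus sizes and the planted near-duplicate are invented for the example), searching in the projected space finds the same neighbor at a fraction of the cost:

```python
import numpy as np

rng = np.random.default_rng(3)

n, d, k = 1000, 512, 64
corpus = rng.standard_normal((n, d))
query = rng.standard_normal(d)
corpus[42] = query + 0.1 * rng.standard_normal(d)  # plant a true near neighbor

# Project corpus and query with the same random matrix.
R = rng.standard_normal((d, k)) / np.sqrt(k)
corpus_p, query_p = corpus @ R, query @ R

# Search in the cheap k-dimensional space; in practice the candidate
# would then be verified or re-ranked in the original space.
candidate = int(np.argmin(np.linalg.norm(corpus_p - query_p, axis=1)))
exact = int(np.argmin(np.linalg.norm(corpus - query, axis=1)))
print(candidate, exact)          # both recover the planted neighbor, index 42
```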

See also: Johnson-Lindenstrauss lemma; random projection; dimensionality reduction; PCA.

References: Johnson and Lindenstrauss (1984); Achlioptas (2003); Bingham and Mannila (2001).
