t-closeness

T-closeness is a data anonymization privacy model designed to prevent attribute disclosure in published data. It builds on k-anonymity and l-diversity by requiring that the distribution of a sensitive attribute within every equivalence class of quasi-identifiers be "close" to the distribution of that attribute in the overall dataset. Closeness is measured with a distance function and bounded by a threshold t, chosen by the data publisher.

Formally, let S be a sensitive attribute with a finite set of possible values, and let P be the global distribution of S in the dataset. For each equivalence class G created by generalizing or suppressing quasi-identifiers, let P_G be the distribution of S within G. The dataset satisfies t-closeness if the distance between P_G and P does not exceed t for every class G. Common distance metrics include Earth Mover's Distance (Wasserstein distance) and other distributional distances such as L1 or chi-squared measures.

Purpose and rationale: t-closeness aims to reduce the risk that an attacker who knows an individual's quasi-identifiers can infer sensitive information beyond what is implied by the overall data distribution. It addresses disclosure risks that k-anonymity and l-diversity do not fully mitigate, especially when the sensitive attribute has a skewed or narrow distribution.

Calculation and use: To achieve t-closeness, data custodians apply anonymization techniques that generalize or suppress quasi-identifiers until all equivalence classes meet the t-closeness criterion. If a class cannot be made to satisfy the threshold without excessive data distortion, the publisher may adjust t or the data release strategy.

Limitations: t-closeness does not provide formal guarantees like differential privacy and can still allow re-identification risks under adversaries with auxiliary information. It can also degrade data utility if the threshold is too strict or the data domain is highly imbalanced.
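
The per-class check in the formal definition can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a library API: the function names are hypothetical, and it uses the total variation (L1-based) distance for simplicity, whereas Earth Mover's Distance is the metric commonly used for ordered sensitive attributes.

```python
from collections import Counter

def distribution(values):
    # Empirical distribution of a sensitive attribute as {value: probability}.
    counts = Counter(values)
    total = len(values)
    return {v: c / total for v, c in counts.items()}

def variational_distance(p, q):
    # Total variation distance between two discrete distributions:
    # half the L1 distance over the union of their supports.
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)

def satisfies_t_closeness(classes, t):
    # `classes` is a list of lists: the sensitive-attribute values falling
    # in each equivalence class after generalization/suppression.
    # The dataset satisfies t-closeness if every class's distribution P_G
    # is within distance t of the global distribution P.
    global_dist = distribution([v for cls in classes for v in cls])
    return all(
        variational_distance(distribution(cls), global_dist) <= t
        for cls in classes
    )
```

For example, with `classes = [["a", "a"], ["b", "b"]]` the global distribution is uniform over {a, b}, each class concentrates on a single value, and the total variation distance for each class is 0.5, so this partition satisfies t-closeness only for t ≥ 0.5.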