kanonymity

Kanonymity, commonly written as k-anonymity, is a property of a data release that aims to protect privacy by ensuring that each individual is indistinguishable from at least k−1 others with respect to a defined set of quasi-identifiers. Quasi-identifiers are attributes that may not identify a person on their own but could do so when combined with external information (for example, ZIP code, birth year, and gender).

To achieve k-anonymity, data publishers generalize or suppress quasi-identifier values until every combination of quasi-identifiers appears in at least k records. For example, a table containing ZIP, birth year, and another attribute can be transformed so that ZIP is generalized to a broader area and birth year to a decade, producing equivalence classes of size at least k. This makes reidentification of a specific individual by linking to external data more difficult.

Originating in the data privacy literature, the concept was introduced by Latanya Sweeney in 2002 as a formal approach to protecting individual privacy in microdata releases. Since then, k-anonymity has influenced practices in health data, census data, and other domains where sharing detailed person-level data is common.

Limitations of k-anonymity include vulnerability to attribute disclosure if all records within an equivalence class share the same sensitive attribute, and susceptibility to homogeneity and background knowledge attacks. Even with k-anonymity, an attacker may infer sensitive information or membership in a dataset, especially when external data sources are abundant. Extensions such as l-diversity and t-closeness, as well as broader approaches like differential privacy, address some of these weaknesses by strengthening how sensitive attributes are distributed within classes or by adding noise. Choosing an appropriate k and a suitable set of quasi-identifiers remains a key design decision.
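The generalization, suppression, and homogeneity weakness described above can be checked mechanically before a release. The following is an added example, not code from the original page: the record layout, the sample rows, and all function names are constructed for illustration, and it assumes ZIP, birth year, and gender are the chosen quasi-identifiers.

```python
from collections import defaultdict

# Constructed sample records (not real data): ZIP, birth year, gender, diagnosis.
RECORDS = [
    ("02138", 1961, "F", "asthma"),
    ("02139", 1964, "F", "diabetes"),
    ("02141", 1968, "F", "asthma"),
    ("02142", 1972, "M", "flu"),
    ("02145", 1975, "M", "flu"),
    ("02149", 1979, "M", "flu"),
    ("90210", 1983, "F", "asthma"),
]

def generalize(record, zip_digits=3):
    """Map a raw record to (generalized quasi-identifiers, sensitive value)."""
    zip_code, birth_year, gender, sensitive = record
    area = zip_code[:zip_digits] + "*" * (len(zip_code) - zip_digits)  # broader area
    decade = f"{birth_year // 10 * 10}s"                               # year -> decade
    return (area, decade, gender), sensitive

def equivalence_classes(records, zip_digits=3):
    """Group sensitive values by their generalized quasi-identifier tuple."""
    classes = defaultdict(list)
    for record in records:
        qi, sensitive = generalize(record, zip_digits)
        classes[qi].append(sensitive)
    return dict(classes)

def is_k_anonymous(classes, k):
    """True if every equivalence class holds at least k records."""
    return all(len(values) >= k for values in classes.values())

def anonymize(records, k, zip_digits=3):
    """Generalize, then suppress classes smaller than k.

    Returns (kept classes, number of suppressed records)."""
    classes = equivalence_classes(records, zip_digits)
    kept = {qi: v for qi, v in classes.items() if len(v) >= k}
    suppressed = sum(len(v) for v in classes.values() if len(v) < k)
    return kept, suppressed

def homogeneous_classes(classes):
    """Classes whose records all share one sensitive value: the attribute is
    disclosed for every member even though the class is k-anonymous."""
    return [qi for qi, values in classes.items() if len(set(values)) == 1]

if __name__ == "__main__":
    kept, suppressed = anonymize(RECORDS, k=3)
    print("3-anonymous:", is_k_anonymous(kept, 3), "| suppressed:", suppressed)
    print("homogeneous classes:", homogeneous_classes(kept))
```

On this sample the release is 3-anonymous after one record is suppressed, yet the class of men born in the 1970s in area 021** still reveals the diagnosis of every member, which is the gap l-diversity targets.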
See also l-diversity, t-closeness, and differential privacy.