reidentification

Reidentification, or re-identification, is the process of determining the identity of individuals from data that has been de-identified or anonymized. It often occurs when anonymized data are linked with other data sources containing identifying information, enabling the inference of who is represented in the data.
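
As a rough illustration of the linkage described above, the following Python sketch joins a de-identified table to a named public list on attributes the two share. All records, names, and field names here are invented for the example, and the helper `link_records` is hypothetical; it is a toy under those assumptions, not a description of any particular real-world attack.

```python
# Toy data: every name and value below is invented for illustration only.
deidentified_records = [
    {"birth_year": 1945, "zip": "02138", "gender": "M", "diagnosis": "condition A"},
    {"birth_year": 1980, "zip": "02139", "gender": "F", "diagnosis": "condition B"},
    {"birth_year": 1980, "zip": "02139", "gender": "F", "diagnosis": "condition C"},
]

public_register = [
    {"name": "Person One", "birth_year": 1945, "zip": "02138", "gender": "M"},
    {"name": "Person Two", "birth_year": 1980, "zip": "02139", "gender": "F"},
    {"name": "Person Three", "birth_year": 1972, "zip": "02140", "gender": "F"},
]

# Attributes present in both datasets (assumed for this example).
SHARED_FIELDS = ("birth_year", "zip", "gender")


def link_records(deidentified, identified, fields=SHARED_FIELDS):
    """Return (name, record) pairs where the shared fields match exactly one
    de-identified record and exactly one named person."""
    def key(row):
        return tuple(row[f] for f in fields)

    records_by_key = {}
    for record in deidentified:
        records_by_key.setdefault(key(record), []).append(record)

    people_by_key = {}
    for person in identified:
        people_by_key.setdefault(key(person), []).append(person)

    matches = []
    for k, records in records_by_key.items():
        people = people_by_key.get(k, [])
        # Only an unambiguous one-to-one match counts as a reidentification.
        if len(records) == 1 and len(people) == 1:
            matches.append((people[0]["name"], records[0]))
    return matches


matches = link_records(deidentified_records, public_register)
for name, record in matches:
    print(name, "->", record["diagnosis"])
```

In this toy data only one combination of attributes is unique in both tables, so only that record is linked to a name; the two records that share the same attribute values stay ambiguous.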

Risk factors include the presence of quasi-identifiers, data fields that are not unique on their own but can become identifying when combined (for example, birth year, ZIP code, and gender). The increasing availability of public and commercial datasets raises the likelihood of successful reidentification through data linkage and background information.

Historical cases have highlighted the risk. In 1997, Latanya Sweeney showed that de-identified hospital records for Massachusetts state employees could be linked with public voter registration data on attributes such as ZIP code, birth date, and sex, which allowed her to identify the medical records of the then governor, William Weld. The Netflix Prize dataset, released in 2006, contained anonymized movie ratings that researchers linked to public IMDb profiles to identify some users, illustrating the vulnerability of ostensibly anonymized datasets. The 2006 AOL query log release similarly exposed user identities through search queries and timestamps.

Mitigation approaches include reducing identifiability (data minimization), applying k-anonymity or l-diversity, and employing differential privacy. More robust strategies rely on synthetic data, secure data enclaves, and strict access controls. Privacy-by-design and regular risk assessment are recommended to balance data utility with protection.

Reidentification remains a central concern in data governance and privacy law. Regulations such as the GDPR and HIPAA establish standards for de-identification and govern how data can be shared, while institutions implement technical and organizational safeguards to limit reidentification risk.
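
The k-anonymity approach named among the mitigation approaches can be sketched in a few lines of Python. A table is k-anonymous with respect to a set of quasi-identifiers when every combination of their values is shared by at least k rows. The rows, the field names, and the `generalize` rule below (birth year coarsened to a decade, ZIP code truncated to a three-digit prefix) are assumptions made for illustration, not a recommended or standard configuration.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group of rows that share the same
    quasi-identifier values, i.e. the largest k for which the table is
    k-anonymous. Returns 0 for an empty table."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


def generalize(row):
    """Hypothetical generalization: birth year -> decade, ZIP -> 3-digit prefix."""
    out = dict(row)
    out["birth_year"] = (row["birth_year"] // 10) * 10
    out["zip"] = row["zip"][:3] + "**"
    return out


# Invented example rows.
rows = [
    {"birth_year": 1981, "zip": "02138", "gender": "F"},
    {"birth_year": 1984, "zip": "02139", "gender": "F"},
    {"birth_year": 1975, "zip": "02140", "gender": "M"},
    {"birth_year": 1979, "zip": "02141", "gender": "M"},
]
QI = ("birth_year", "zip", "gender")

k_raw = k_anonymity(rows, QI)                                   # every row is unique
k_generalized = k_anonymity([generalize(r) for r in rows], QI)  # rows now pair up
print(k_raw, k_generalized)
```

On this toy table the raw rows are each unique (k = 1), while the generalized rows fall into two groups of two (k = 2). The sketch checks k-anonymity only; it does not address l-diversity or differential privacy, which guard against different kinds of disclosure.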