
UniParc

UniParc, short for UniProt Archive, is a comprehensive, non-redundant repository of protein sequences maintained by the UniProt Consortium. It aggregates protein sequences from multiple public data sources and stores each unique protein sequence once, providing a complete historical archive of deposited sequences and their provenance. Its primary goals are to enable cross-database integration, to make changes to sequences traceable, and to supply non-redundant data for large-scale analyses.
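The "store each unique sequence once" idea can be illustrated with a toy in-memory archive. This is a minimal sketch under stated assumptions, not UniParc's actual implementation. The class names, the normalization rule (strip whitespace, ignore case), the sequential `UPI`-style identifier scheme, and the accessions `acc1`..`acc3` are all hypothetical placeholders. Identical sequences deposited from different sources collapse into one entry, and that entry accumulates a cross-reference for each source record.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ArchiveEntry:
    """One record per unique sequence (hypothetical, simplified structure)."""
    upi: str                       # illustrative UPI-style identifier
    sequence: str
    # (source database, accession, date seen) for every record carrying this sequence
    xrefs: list = field(default_factory=list)


class SequenceArchive:
    """Toy non-redundant archive: identical sequences collapse into one entry."""

    def __init__(self):
        self._by_sequence = {}     # normalized sequence -> ArchiveEntry
        self._counter = 0          # drives the hypothetical sequential ID scheme

    def deposit(self, sequence, source_db, accession, seen_on):
        """Record one source entry; returns the (possibly pre-existing) archive entry."""
        seq = "".join(sequence.split()).upper()  # drop whitespace, ignore case
        entry = self._by_sequence.get(seq)
        if entry is None:
            self._counter += 1
            # Hypothetical: "UPI" + 10 hex digits, handed out in deposit order.
            entry = ArchiveEntry(upi=f"UPI{self._counter:010X}", sequence=seq)
            self._by_sequence[seq] = entry
        xref = (source_db, accession, seen_on)
        if xref not in entry.xrefs:
            entry.xrefs.append(xref)
        return entry

    def __len__(self):
        return len(self._by_sequence)


archive = SequenceArchive()
a = archive.deposit("MKTAYIAKQR", "UniProtKB", "acc1", date(2020, 1, 15))
b = archive.deposit("mktayiak qr", "RefSeq", "acc2", date(2021, 6, 3))
c = archive.deposit("MSEQNNTEKL", "Ensembl", "acc3", date(2021, 6, 3))
print(len(archive), a is b, a.upi)  # 2 True UPI0000000001
```

The index is keyed on the full normalized sequence, so matching is exact. A production archive would more likely key on a sequence checksum for compactness, but that choice is not shown here.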

In UniParc, each unique amino acid sequence has a single record. Identical sequences reported by different sources are merged into one UniParc entry, which includes the deposition date, source information, and cross-references to the originating records. The database preserves historical versions, allowing researchers to track how a sequence has appeared across submissions and over time.

Access and interoperability: UniParc is accessible through the UniProt website and related data services, including FTP. It provides cross-references to entries in major resources such as UniProtKB, RefSeq, and Ensembl, enabling researchers to map sequences across databases and to construct non-redundant datasets suitable for comparative and evolutionary studies.

Relationship to UniProt: UniParc is part of the UniProt ecosystem and underpins other UniProt resources by supplying a stable, non-redundant backbone of protein sequences. It does not provide functional or structural annotations itself, focusing instead on sequence data, provenance, and historical records.

History and maintenance: UniParc was developed to consolidate public protein sequences and is regularly updated to reflect new submissions and revisions. It complements curated databases such as UniProtKB/Swiss-Prot by providing a non-redundant, provenance-rich sequence archive.
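Because identical sequences share one UniParc entry, mapping a record from one database to another, as described under access and interoperability, amounts to looking up the shared entry and reading off its other cross-references. The sketch below assumes a simplified, already-downloaded data layout: a dictionary from hypothetical UPI-style identifiers to `(database, accession)` pairs, with placeholder accessions. It does not call any UniProt service. `build_index`, `map_accession`, and `nonredundant` are illustrative helpers, not part of any UniProt API.

```python
# Hypothetical, simplified entries: UPI-style ID -> list of (database, accession).
entries = {
    "UPI0000000001": [("UniProtKB", "acc1"), ("RefSeq", "acc2"), ("Ensembl", "acc4")],
    "UPI0000000002": [("Ensembl", "acc3")],
}


def build_index(entries):
    """Invert entries into a (database, accession) -> UPI lookup table."""
    index = {}
    for upi, xrefs in entries.items():
        for db, acc in xrefs:
            index[(db, acc)] = upi
    return index


def map_accession(entries, index, source_db, accession, target_db):
    """Return target-database accessions that share a sequence with the given record."""
    upi = index.get((source_db, accession))
    if upi is None:
        return []
    return sorted(acc for db, acc in entries[upi] if db == target_db)


def nonredundant(index, records):
    """Collapse source records to the distinct archive entries (one per sequence)."""
    return sorted({index[r] for r in records if r in index})


index = build_index(entries)
print(map_accession(entries, index, "RefSeq", "acc2", "UniProtKB"))   # ['acc1']
print(map_accession(entries, index, "Ensembl", "acc3", "UniProtKB"))  # []
print(nonredundant(index, [("UniProtKB", "acc1"), ("RefSeq", "acc2"), ("Ensembl", "acc3")]))
# ['UPI0000000001', 'UPI0000000002']
```

Note that this mapping is by sequence identity only. Two records that share a sequence are linked even if their source databases describe them differently.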