seqname

Seqname is a label used to identify a sequence in biological data sets and related file formats. It typically refers to the name of the sequence’s source, such as a chromosome, a scaffold, a mitochondrial genome, a plasmid, or a transcript, and serves as the primary reference key for the sequence within a dataset.
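Because the seqname acts as the key for its sequence, a dictionary keyed by seqname is a natural in-memory representation. The Python sketch below builds one from FASTA text. It assumes, as is conventional, that the seqname is the first whitespace-delimited token after the > on each header line, and it treats a repeated seqname as an error to reflect the uniqueness expectation. The function name index_fasta is hypothetical; in practice, established parsers and FASTA index files usually fill this role.

```python
from typing import Dict, Iterable, List, Optional


def index_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return a dict mapping each seqname to its full sequence.

    Assumes the seqname is the first whitespace-delimited token after '>'
    and that seqnames must be unique within the input.
    """
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []

    def store() -> None:
        # Save the record collected so far, rejecting duplicate seqnames.
        if name is None:
            return
        if name in records:
            raise ValueError(f"duplicate seqname: {name}")
        records[name] = "".join(chunks)

    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            store()
            # Header: '>' + seqname + optional whitespace-separated description.
            tokens = line[1:].split(maxsplit=1)
            if not tokens:
                raise ValueError("header line without a seqname")
            name, chunks = tokens[0], []
        else:
            chunks.append(line)  # sequence may be wrapped over several lines
    store()
    return records


if __name__ == "__main__":
    fasta = ">chr1 example chromosome\nACGT\nACGT\n>chrM mitochondrial genome\nTTGA\n"
    print(index_fasta(fasta.splitlines()))  # {'chr1': 'ACGTACGT', 'chrM': 'TTGA'}
```

Note that the description after the seqname is discarded, so only the seqname serves as the lookup key.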

In common data formats, the seqname appears in different positions. In FASTA, the header line begins with a greater-than symbol (>) followed by the seqname and an optional description, separated from the seqname by whitespace; the seqname is the token used to refer to the sequence in subsequent analyses. FASTQ records follow the same pattern, except that the header line begins with an at sign (@). In GFF and GFF3 files, the first column is the seqname (called seqid in GFF3), indicating the sequence on which a feature is located. In many analysis pipelines, the seqname corresponds to the name of the reference sequence in the alignment or assembly. Other formats may use similar terms, such as chromosome, contig, or reference ID, to denote the same concept.

Naming conventions for seqnames emphasize stability and uniqueness within a dataset. Seqnames should avoid spaces and unusual characters, and they often use prefixes such as chr or accession-like identifiers to reflect the source. When combining data from multiple sources, it is common practice to canonicalize seqnames to a consistent naming scheme to prevent mismatches; for example, one source may call a chromosome 1 while another calls the same chromosome chr1.

Practical use of seqnames includes filtering, aggregating, and indexing data by sequence name. Analyses often group features by seqname to produce per-sequence summaries, extract per-sequence subsets, or join annotation with sequence data.

See also: sequence identifier, accession number, locus, chromosome naming, reference sequence.
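The grouping and canonicalization practices described in this entry can be sketched in Python. This is a minimal, hypothetical example rather than a standard tool: the function names, the ALIASES table, and the rule that only bare numbers and X/Y receive a chr prefix are assumptions chosen for illustration. The seqname is taken from the first tab-separated column of GFF-style lines, and each name is canonicalized before counting, so that records labelled 1 and chr1 fall into the same group.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical alias table for names that a simple prefix rule cannot fix.
ALIASES: Dict[str, str] = {"MT": "chrM", "M": "chrM"}


def canonical_seqname(name: str) -> str:
    """Map a seqname onto one 'chr'-prefixed scheme (illustrative rule only)."""
    if name in ALIASES:
        return ALIASES[name]
    # Assumed rule: bare numbers and X/Y get a 'chr' prefix; everything else
    # (already-prefixed names, scaffolds, plasmids) is kept unchanged.
    if name.isdigit() or name in ("X", "Y"):
        return "chr" + name
    return name


def features_per_seqname(gff_lines: Iterable[str]) -> Counter:
    """Count GFF features per canonical seqname (first tab-separated column)."""
    counts: Counter = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # GFF3: everything after this directive is sequence data
        if not line.strip() or line.startswith("#"):
            continue  # skip blank, comment, and directive lines
        seqname = line.split("\t", 1)[0]
        counts[canonical_seqname(seqname)] += 1
    return counts


if __name__ == "__main__":
    gff = [
        "##gff-version 3",
        "1\tsrc\tgene\t100\t500\t.\t+\t.\tID=g1",
        "chr1\tsrc\tgene\t900\t1200\t.\t-\t.\tID=g2",
        "MT\tsrc\tgene\t10\t80\t.\t+\t.\tID=g3",
    ]
    print(dict(features_per_seqname(gff)))  # {'chr1': 2, 'chrM': 1}
```

Canonicalizing before grouping is the key design choice here: counting raw names would report 1 and chr1 as two different sequences.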