FASTAFormat

FASTAFormat refers to the FASTA format, a widely used text-based standard for representing nucleotide or protein sequences in bioinformatics. In FASTA files, each sequence entry begins with a header line that starts with the greater-than symbol (>), followed by an identifier and optional description. The lines that follow contain the sequence data, typically wrapped for readability, using standard one-letter codes for nucleotides (A, C, G, T, U, N and other IUPAC ambiguity codes) or amino acids.
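These rules are enough for a small reader. The Python sketch below is illustrative, not a reference implementation. `FastaRecord` and `read_fasta` are hypothetical names. Splitting the header at the first whitespace into identifier and description is an assumed (though widespread) convention rather than a formal rule. Blank lines are simply skipped.

```python
import io
from typing import Iterator, List, NamedTuple, Optional, TextIO


class FastaRecord(NamedTuple):
    identifier: str   # header text after '>' up to the first whitespace
    description: str  # remainder of the header line ("" if absent)
    sequence: str     # all sequence lines of the entry, concatenated


def _make_record(header: str, chunks: List[str]) -> FastaRecord:
    # Assumed convention: the identifier ends at the first whitespace.
    parts = header.split(maxsplit=1)
    identifier = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return FastaRecord(identifier, description, "".join(chunks))


def read_fasta(handle: TextIO) -> Iterator[FastaRecord]:
    """Yield one FastaRecord per '>' header line in `handle`."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # skip blank lines rather than treating them as errors
        if line.startswith(">"):
            if header is not None:
                yield _make_record(header, chunks)  # finish the previous entry
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # wrapped sequence lines are joined later
    if header is not None:
        yield _make_record(header, chunks)  # finish the last entry


if __name__ == "__main__":
    example = (
        ">seq1 Homo sapiens example\n"
        "ATGCGTACGTTAGC\n"
        "GCTACGATCGATCG\n"
        ">seq2\n"
        "ACGTACGT\n"
    )
    for rec in read_fasta(io.StringIO(example)):
        print(rec.identifier, repr(rec.description), len(rec.sequence))
    # seq1 'Homo sapiens example' 28
    # seq2 '' 8
```

For real data, an established parser such as Biopython's `Bio.SeqIO.parse(handle, "fasta")` covers more edge cases than this sketch.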

Structure and conventions

A FASTA entry consists of a single header line followed by one or more lines of sequence data. The header provides an identifier, conventionally the text immediately after the > symbol up to the first whitespace. It may also include descriptive metadata such as the organism, gene name, or accession number. The sequence itself is a string of letters without spaces. When it is wrapped over several lines, those lines are concatenated to recover the full sequence. Letters are usually uppercase, though many tools accept lowercase as well. Multiple entries can appear in one file, each starting with its own header line. Beyond these conventions there is no formal schema for the header, so fields within it are often separated by spaces, pipes (|), or other delimiters.

Common usage

FASTA is a de facto standard for sequence storage and exchange. Many sequence databases accept it for submissions, and it is the usual input for sequence search tools such as BLAST and for multiple sequence alignment programs. Its simple structure makes it easy to parse programmatically, but it carries little metadata beyond what the header contains.

History and variants

The format emerged with early sequence analysis tools in the 1980s and has remained widely adopted because of its simplicity and readability. It takes its name from the FASTA sequence comparison software, and the terms FASTA and FASTA format are used interchangeably. Because the format defines no formal metadata fields, databases and users often encode additional information in the header text, each following its own conventions.

Example

>seq1 Homo sapiens example
ATGCGTACGTTAGC
GCTACGATCGATCG

Here seq1 is the identifier, "Homo sapiens example" is the description, and the two sequence lines together form the 28-base sequence ATGCGTACGTTAGCGCTACGATCGATCG.
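Writing an entry reverses the process: the header is emitted first, then the sequence is cut into fixed-width lines. In the hedged sketch below, `format_fasta` is a hypothetical helper, not part of any library. The default width of 60 is an assumption: the format itself does not fix a line length, though widths of 60 or 70 characters are common. Called with `width=14`, it reproduces the example entry.

```python
def format_fasta(identifier: str, sequence: str, description: str = "", width: int = 60) -> str:
    """Return one FASTA entry as text, wrapping the sequence every `width` characters."""
    if width <= 0:
        raise ValueError("width must be positive")
    if not identifier or any(ch.isspace() for ch in identifier):
        # Readers usually take the identifier up to the first whitespace,
        # so an identifier containing whitespace would be split on reading.
        raise ValueError("identifier must be non-empty and contain no whitespace")
    header = f">{identifier} {description}" if description else f">{identifier}"
    lines = [header]
    # Slice the sequence into consecutive chunks of at most `width` letters.
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(format_fasta("seq1", "ATGCGTACGTTAGCGCTACGATCGATCG",
                       "Homo sapiens example", width=14), end="")
    # >seq1 Homo sapiens example
    # ATGCGTACGTTAGC
    # GCTACGATCGATCG
```

Rejecting identifiers that contain whitespace keeps the output readable by tools that split the header at the first space.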