FASTAFormat

FASTAFormat refers to the FASTA format, a widely used text-based standard for representing nucleotide or protein sequences in bioinformatics. In FASTA files, each sequence entry begins with a header line that starts with the greater-than symbol (>), followed by an identifier and an optional description. The lines that follow contain the sequence data, typically wrapped to a fixed line width for readability. Sequences use standard one-letter codes: for nucleotides, A, C, G, T (DNA) or U (RNA), plus N and the other IUPAC ambiguity codes; for proteins, the one-letter amino acid codes.
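The layout described above can be illustrated with a minimal Python sketch that writes one entry. The function name `format_fasta` and the default width of 60 characters are assumptions made for this illustration; the format itself does not mandate a particular line width.

```python
def format_fasta(identifier, sequence, description="", width=60):
    """Return one FASTA entry as text (illustrative helper, not a library API).

    The width of 60 is an assumed, commonly seen wrapping width.
    """
    if width <= 0:
        raise ValueError("width must be a positive integer")
    # Header line: '>' followed by the identifier and an optional description.
    header = f">{identifier} {description}".rstrip()
    # Sequence data, wrapped into chunks of at most `width` characters.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header, *lines]) + "\n"


if __name__ == "__main__":
    print(format_fasta("seq1", "ACGTACGTAC", "Homo sapiens example", width=4), end="")
```

Wrapping only changes the presentation of an entry; joining the sequence lines back together recovers the original sequence.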

Structure and conventions

A FASTA entry consists of a single header line and one or more lines of sequence data.
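As a rough sketch of how this structure can be read back, the following Python generator splits a FASTA text stream into entries. The name `parse_fasta` and the choice to skip blank lines and ignore any sequence lines that appear before the first header are assumptions of this example, not requirements of the format.

```python
import io


def parse_fasta(handle):
    """Yield (identifier, description, sequence) tuples from a text stream.

    Illustrative parser only; established libraries handle more edge cases.
    """
    header = None
    chunks = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no information
        if line.startswith(">"):
            if header is not None:
                yield _split_header(header) + ("".join(chunks),)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)  # one of possibly many wrapped sequence lines
    if header is not None:
        yield _split_header(header) + ("".join(chunks),)


def _split_header(header):
    """Split header text into (identifier, description) at the first whitespace."""
    parts = header.split(None, 1)
    identifier = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return (identifier, description)


if __name__ == "__main__":
    text = ">seq1 Homo sapiens example\nACGT\nNNAC\n>seq2\nMKV\n"
    for identifier, description, sequence in parse_fasta(io.StringIO(text)):
        print(identifier, description, sequence)
```

The parser concatenates all sequence lines of an entry, so the wrapping width used by the writer of the file does not affect the result.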

FASTA is a de facto standard for sequence storage and exchange. It is compatible with many database

History and variants

The format emerged with early sequence analysis tools in the 1980s and has remained widely adopted due

>seq1 Homo sapiens example
