GFF3

GFF3, or General Feature Format Version 3, is a text-based file format used to describe genomic features and their coordinates on reference sequences. It is widely used in genome annotation pipelines because it provides a simple, machine-readable representation and supports hierarchical relationships among features such as genes, transcripts, and exons.

A GFF3 file is tab-delimited and comprises lines that describe individual features. Each data line contains nine columns: seqid, source, type, start, end, score, strand, phase, and attributes. Lines starting with a hash (#) are comments or directives; a line containing ##FASTA indicates that following lines contain sequence data in FASTA format. Coordinates are 1-based and inclusive; the type field generally uses terms from the Sequence Ontology.

The final column, attributes, is a semicolon-separated list of key=value pairs. Common keys include ID, Name, and Parent, with Parent used to express hierarchical relationships (for example, a gene has an mRNA child, which in turn has exon children). The ID value must uniquely identify the feature, while the Name provides a human-readable label.

GFF3 is designed to be flexible and interoperable. The type field relies on a controlled vocabulary, and the hierarchy created by Parent relationships enables complex models of gene structure. It is compatible with genome browsers and annotation pipelines, and is often contrasted with the GTF format.

Usage notes: many tools can parse GFF3, including libraries in Biopython, BioPerl, and others; ensure consistent sequence identifiers across files and include a proper FASTA section if provided. A simple GFF3 example would describe a gene feature with an ID, a transcript with a Parent pointing to the gene, and exons with Parent set to the transcript.
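
The gene, transcript, and exon example described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the API of Biopython, BioPerl, or any other package. The sample records (chr1, gene1, mrna1, exon1, exon2) and the helper names (parse_attributes, parse_gff3, children_by_parent, feature_length) are hypothetical and chosen only for this sketch. It assumes well-formed input with exactly nine tab-separated columns per data line.

```python
from collections import defaultdict
from urllib.parse import unquote

# Hypothetical sample data: identifiers and coordinates are invented for illustration.
SAMPLE_GFF3 = "\n".join([
    "##gff-version 3",
    "chr1\texample\tgene\t1000\t5000\t.\t+\t.\tID=gene1;Name=demoGene",
    "chr1\texample\tmRNA\t1000\t5000\t.\t+\t.\tID=mrna1;Parent=gene1",
    "chr1\texample\texon\t1000\t1500\t.\t+\t.\tID=exon1;Parent=mrna1",
    "chr1\texample\texon\t3000\t5000\t.\t+\t.\tID=exon2;Parent=mrna1",
    "##FASTA",
    ">chr1",
    "ACGT",
])

COLUMNS = ("seqid", "source", "type", "start", "end",
           "score", "strand", "phase", "attributes")


def parse_attributes(text):
    """Split the ninth column into a dict mapping each key to a list of values."""
    attrs = {}
    for pair in text.split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        # A key may carry several comma-separated values (e.g. multiple parents).
        attrs[unquote(key)] = [unquote(v) for v in value.split(",")]
    return attrs


def parse_gff3(text):
    """Return a list of feature dicts, skipping comments, directives, and FASTA data."""
    features = []
    for line in text.splitlines():
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence data
        if not line.strip() or line.startswith("#"):
            continue  # blank line, comment, or directive
        fields = line.split("\t")
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected 9 columns, got {len(fields)}: {line!r}")
        record = dict(zip(COLUMNS, fields))
        record["start"] = int(record["start"])
        record["end"] = int(record["end"])
        record["attributes"] = parse_attributes(record["attributes"])
        features.append(record)
    return features


def children_by_parent(features):
    """Map each Parent ID to the IDs of the features that reference it."""
    children = defaultdict(list)
    for feature in features:
        feature_id = feature["attributes"].get("ID", [None])[0]
        for parent in feature["attributes"].get("Parent", []):
            children[parent].append(feature_id)
    return dict(children)


def feature_length(feature):
    """Length in bases; coordinates are 1-based and inclusive, hence the +1."""
    return feature["end"] - feature["start"] + 1


features = parse_gff3(SAMPLE_GFF3)
tree = children_by_parent(features)
print(tree)
print(feature_length(features[2]))
```

With the sample data, the first print shows gene1 mapped to mrna1 and mrna1 mapped to exon1 and exon2, and the second prints 501, the length of the exon spanning positions 1000 to 1500. A production parser would also need to handle details this sketch ignores, such as features that span multiple lines and validation of the score, strand, and phase columns.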