GeneMark

GeneMark is a family of computational tools used for ab initio gene prediction in DNA sequences. Developed to identify protein-coding genes, the software has been applied to a range of genomes, from bacteria and archaea to more complex eukaryotes and metagenomic data. The programs rely on probabilistic models, primarily Markov chains and hidden Markov models, to distinguish coding regions from non-coding sequence and to model gene structure such as start and stop signals and, in eukaryotes, exon–intron boundaries.
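The idea of using Markov chains to separate coding from non-coding sequence can be illustrated with a toy log-odds score. The sketch below is not GeneMark's implementation: it assumes a simple first-order chain, and the function names (`train_markov`, `log_odds`) and the tiny training sequences are invented for illustration. The actual programs use considerably richer models, such as higher-order chains that account for codon position.

```python
import math

ALPHABET = "ACGT"

def train_markov(seqs, pseudocount=1.0):
    """Estimate first-order transition probabilities P(next | prev).

    A pseudocount is added to every transition so that no probability is zero.
    """
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for s in seqs:
        for prev, nxt in zip(s, s[1:]):
            counts[prev][nxt] += 1
    probs = {}
    for a in ALPHABET:
        total = sum(counts[a].values())
        probs[a] = {b: counts[a][b] / total for b in ALPHABET}
    return probs

def log_odds(seq, coding, noncoding):
    """Sum of log-ratios of coding vs. non-coding transition probabilities.

    A positive score means the sequence looks more like the coding model.
    """
    score = 0.0
    for prev, nxt in zip(seq, seq[1:]):
        score += math.log(coding[prev][nxt] / noncoding[prev][nxt])
    return score

# Hypothetical training data: GC-rich "coding" and AT-rich "non-coding" examples.
coding_examples = ["ATGGCCGCGGCCGGCGCC", "ATGGCGGCCGCCGGGGCG"]
noncoding_examples = ["TTATAATATTTAAATATA", "AATTTATATAATATTTAA"]

coding_model = train_markov(coding_examples)
noncoding_model = train_markov(noncoding_examples)

gc_score = log_odds("GCCGCGGCC", coding_model, noncoding_model)
at_score = log_odds("TATATTAAT", coding_model, noncoding_model)
```

With these toy models, `gc_score` is positive and `at_score` is negative, which is the basic signal a Markov-chain gene finder builds on.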

Over time, GeneMark has expanded into several variants designed for different data types and genome complexities.

GeneMark.hmm uses hidden Markov models to capture gene structure and sequence signals in prokaryotic genomes and some eukaryotes. GeneMarkS introduced self-training capabilities, enabling gene prediction without a manually curated training set, which improved applicability to newly sequenced organisms. GeneMarkS-2 further refines self-training and improves accuracy in prokaryotic genomes. GeneMark-ET combines ab initio prediction with external hints from RNA sequencing data to improve accuracy in eukaryotic genomes. These variants are often used within annotation pipelines to provide initial gene models that can be refined by other tools.

GeneMark has had a broad impact on genome annotation workflows and is frequently cited in genome projects and comparative studies. It is distributed as standalone software and available through various public and institutional resources, sometimes alongside web-based interfaces. The name reflects its core use of Markov models to delineate coding regions and gene structure within genomic sequences.
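
How a hidden Markov model can delineate coding regions may be easier to see in a small example. The following is a minimal sketch, not the model used by GeneMark.hmm: it assumes only two states, single-nucleotide emissions, and hand-picked probabilities (`START`, `TRANS`, `EMIT` are all hypothetical), and it uses the standard Viterbi algorithm to find the most likely state path.

```python
import math

STATES = ("noncoding", "coding")

# Hypothetical parameters chosen for illustration only.
START = {"noncoding": 0.5, "coding": 0.5}
TRANS = {
    "noncoding": {"noncoding": 0.9, "coding": 0.1},
    "coding": {"noncoding": 0.1, "coding": 0.9},
}
EMIT = {
    "noncoding": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    "coding": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}

def viterbi(seq):
    """Return the most probable state label for each base in seq."""
    if not seq:
        return []
    # Log-probability of the best path ending in each state at the first base.
    best = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_best, pointers = {}, {}
        for s in STATES:
            # Pick the predecessor state that maximizes the path score into s.
            prev = max(STATES, key=lambda p: best[p] + math.log(TRANS[p][s]))
            new_best[s] = (
                best[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
            )
            pointers[s] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final state to recover the full path.
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

labels = viterbi("ATATATGCGCGCGCATATAT")
```

For this input, the GC-rich middle segment is labelled `coding` and the AT-rich flanks `noncoding`. Real gene finders extend the same decoding idea with states for start and stop signals and, in eukaryotes, exons and introns.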