GeneMark
GeneMark is a family of computational tools used for ab initio gene prediction in DNA sequences. Developed to identify protein-coding genes, the software has been applied to a range of genomes, from bacteria and archaea to more complex eukaryotes and metagenomic data. The programs rely on probabilistic models, primarily Markov chains and hidden Markov models, to distinguish coding regions from non-coding sequence and to model gene structure such as start and stop signals and, in eukaryotes, exon–intron boundaries.
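The idea of separating coding from non-coding sequence with Markov chains can be illustrated with a small sketch. The code below is not GeneMark's implementation: it assumes a plain first-order, homogeneous Markov chain, a handful of made-up training sequences, and hypothetical helper names (`train_markov`, `log_odds`). It also ignores codon-position periodicity, start and stop signals, and the other refinements a real gene finder would model. It only shows how a log-likelihood ratio between two trained models can favor one class of sequence over the other.

```python
import math
from itertools import product

ALPHABET = "ACGT"

def train_markov(seqs, order=1, pseudocount=1.0):
    """Estimate log transition probabilities log P(base | previous `order` bases)."""
    contexts = ["".join(p) for p in product(ALPHABET, repeat=order)]
    # Start every count at the pseudocount so unseen transitions keep nonzero probability.
    counts = {c: {b: pseudocount for b in ALPHABET} for c in contexts}
    for s in seqs:
        for i in range(order, len(s)):
            ctx, base = s[i - order:i], s[i]
            if ctx in counts and base in counts[ctx]:
                counts[ctx][base] += 1
    model = {}
    for ctx, row in counts.items():
        total = sum(row.values())
        model[ctx] = {b: math.log(n / total) for b, n in row.items()}
    return model

def log_odds(seq, coding, noncoding, order=1):
    """Sum of log P_coding / P_noncoding over all transitions in `seq`."""
    score = 0.0
    for i in range(order, len(seq)):
        ctx, base = seq[i - order:i], seq[i]
        if ctx in coding and base in coding[ctx]:
            score += coding[ctx][base] - noncoding[ctx][base]
    return score

# Toy training data (invented for illustration, not real genes).
coding_examples = ["ATGGCGGCCGCGGGC", "ATGGCCGGCGCGGCG"]
noncoding_examples = ["TATAAATATTTAATA", "AATATTATAAATTTA"]

coding_model = train_markov(coding_examples)
noncoding_model = train_markov(noncoding_examples)

gc_rich_score = log_odds("GCGGCGGCG", coding_model, noncoding_model)
at_rich_score = log_odds("ATATATAT", coding_model, noncoding_model)
print(f"GC-rich window: {gc_rich_score:+.2f}")
print(f"AT-rich window: {at_rich_score:+.2f}")
```

A positive score means the window looks more like the coding training set, and a negative score means it looks more like the non-coding one. With these toy sets the distinction is just base composition, which is far cruder than the statistics a production gene finder relies on.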
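Over time, the GeneMark family has grown to include variants for different data types and genome complexities, among them GeneMark.hmm for prokaryotic genomes, the self-training GeneMarkS for prokaryotes and GeneMark-ES for eukaryotes, and MetaGeneMark for metagenomic sequences.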
GeneMark has been widely adopted in genome annotation workflows and is frequently cited in genome sequencing projects.