OrthoMCL
OrthoMCL is a software framework for identifying groups of orthologous genes across multiple genomes. It combines sequence similarity data with graph clustering to group proteins into orthologs and co-orthologs, while also accommodating in-paralogs, which are duplicates that arise within a lineage after speciation. The method relies on all-vs-all protein comparisons, usually generated by BLASTP. Candidate ortholog pairs are identified as reciprocal best hits between genomes, and similarity weights are normalized to reduce biases such as those from gene length and uneven species representation. The resulting graph of protein relationships is then partitioned with the Markov Clustering (MCL) algorithm, with the aim of separating true orthologs from paralogs.
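The reciprocal-best-hit and normalization steps described above can be illustrated with a minimal Python sketch. It is not OrthoMCL's own code: the toy hit list, the "species|protein" ID convention, the use of raw similarity scores as weights, and the function names are all assumptions made for illustration. The normalized weights it produces are the kind of edge weights that would then be passed to MCL.

```python
from collections import defaultdict

# Hypothetical toy input: (query, subject, score) rows from an all-vs-all search.
# IDs follow an assumed "species|protein" convention; scores are made up.
HITS = [
    ("A|a1", "B|b1", 200.0), ("B|b1", "A|a1", 190.0),
    ("A|a1", "B|b2", 80.0),  ("B|b2", "A|a1", 85.0),
    ("A|a2", "B|b2", 120.0), ("B|b2", "A|a2", 110.0),
    ("A|a1", "C|c1", 50.0),  ("C|c1", "A|a1", 60.0),
]


def species(protein_id):
    """Return the species prefix of a 'species|protein' identifier."""
    return protein_id.split("|", 1)[0]


def best_hits(hits):
    """Map (query, subject species) to the best-scoring (subject, score)."""
    best = {}
    for query, subject, score in hits:
        if species(query) == species(subject):
            continue  # within-species hits are ignored in this sketch
        key = (query, species(subject))
        if key not in best or score > best[key][1]:
            best[key] = (subject, score)
    return best


def reciprocal_best_hits(hits):
    """Keep pairs that are each other's best hit; weight = mean of both scores."""
    best = best_hits(hits)
    pairs = {}
    for (query, _), (subject, score) in best.items():
        back = best.get((subject, species(query)))
        if back is not None and back[0] == query:
            pair = tuple(sorted((query, subject)))
            pairs[pair] = (score + back[1]) / 2.0
    return pairs


def normalize_by_species_pair(pairs):
    """Divide each weight by the mean weight of its species pair."""
    by_species = defaultdict(list)
    for (p, q), weight in pairs.items():
        by_species[tuple(sorted((species(p), species(q))))].append(weight)
    means = {key: sum(ws) / len(ws) for key, ws in by_species.items()}
    return {
        (p, q): weight / means[tuple(sorted((species(p), species(q))))]
        for (p, q), weight in pairs.items()
    }


if __name__ == "__main__":
    rbh = reciprocal_best_hits(HITS)
    for pair, weight in sorted(normalize_by_species_pair(rbh).items()):
        print(pair, round(weight, 3))
```

In this toy data, a1 and b2 hit each other but are not reciprocal best hits, because b2's best match in species A is a2, so that pair is dropped. Dividing by the per-species-pair mean keeps a closely related pair of genomes, whose scores are uniformly high, from dominating the graph.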
Workflow and methodology: input consists of protein sequences from several genomes. An all-vs-all similarity search produces
Output and applications: the primary product is a collection of ortholog groups (OGs), each containing one or
Implementation notes: OrthoMCL is distributed as a pipeline of scripts that operate with a relational database