Canu

Canu is a free, open-source genome assembler designed for long-read sequencing data, such as that produced by Pacific Biosciences (PacBio) SMRT and Oxford Nanopore Technologies platforms. It is a fork of the Celera Assembler and is widely used for de novo genome assembly, including large and complex genomes. Canu is built to handle the high error rates characteristic of long reads and aims to produce accurate assemblies from a single data set or from a mix of read types.

The Canu workflow consists of three main stages: read error correction, read trimming, and assembly. The software employs an overlap-layout-consensus approach and emphasizes adaptive k-mer weighting to distinguish repeats and improve assembly continuity. This design helps the assembler scale to large genomes and to manage heterozygosity and repeats that complicate assembly. Polishing steps can be used to improve consensus accuracy, often by integrating short-read data or other polishing tools after the initial assembly.

Canu runs on Unix-like operating systems and is typically deployed on high-performance computing infrastructure due to substantial memory and CPU requirements, especially for large eukaryotic genomes. Outputs commonly include assembled contigs or unitigs, along with summary statistics and optional assembly graphs. The pipeline accepts various long-read chemistries and can be tuned to genome size, read length, and expected error profiles.

Impact and usage: since its introduction, Canu has become a standard tool in de novo genome assembly, applied to bacteria, fungi, plants, and animals. It is frequently used in combination with polishing steps and supplementary data to produce high-quality genome assemblies and has contributed to numerous reference-grade genomes in genomics research.
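
The three workflow stages described above (correction, trimming, assembly) can be run as separate Canu invocations. The following Python sketch only builds the command lines and does not execute anything. The helper name `canu_stage_commands`, its parameters, and the example file names are hypothetical. The flags and the intermediate file names (`<prefix>.correctedReads.fasta.gz`, `<prefix>.trimmedReads.fasta.gz`) are modeled on the Canu 2.x quick-start examples and are an assumption here, so they should be checked against the usage output of the installed version.

```python
import shlex
from typing import List


def canu_stage_commands(
    prefix: str,
    work_dir: str,
    asm_dir: str,
    genome_size: str,
    raw_reads: str,
    read_flag: str = "-nanopore",
) -> List[List[str]]:
    """Build argv lists for running correction, trimming, and assembly separately.

    Nothing is executed here; the caller decides how to run the commands.
    """
    size = f"genomeSize={genome_size}"
    # Assumed layout: intermediate reads are written into the -d directory
    # and named after the -p prefix.
    corrected = f"{work_dir}/{prefix}.correctedReads.fasta.gz"
    trimmed = f"{work_dir}/{prefix}.trimmedReads.fasta.gz"
    return [
        # Stage 1: read error correction on the raw long reads.
        ["canu", "-correct", "-p", prefix, "-d", work_dir, size, read_flag, raw_reads],
        # Stage 2: trimming of the corrected reads.
        ["canu", "-trim", "-p", prefix, "-d", work_dir, size,
         "-corrected", read_flag, corrected],
        # Stage 3: assembly of reads that are already corrected and trimmed.
        ["canu", "-p", prefix, "-d", asm_dir, size,
         "-trimmed", "-corrected", read_flag, trimmed],
    ]


if __name__ == "__main__":
    # Hypothetical example: an E. coli-sized genome with Nanopore reads.
    for argv in canu_stage_commands("ecoli", "ecoli", "ecoli-asm", "4.8m", "reads.fasta"):
        print(shlex.join(argv))
```

Returning argument lists rather than shell strings leaves it to the caller to print the commands, submit them to a cluster scheduler, or pass them to `subprocess.run`. The assembly stage uses its own output directory so that it can be repeated with different settings without redoing correction and trimming.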