CTC-based

CTC-based refers to models that use Connectionist Temporal Classification (CTC) loss to train neural networks for sequence labeling without requiring frame-level alignments. The method, introduced by Graves and colleagues, is designed for tasks where the input and output sequences have different lengths and the alignment between them is unknown, such as speech, handwriting, or sign-language recognition. CTC-based models typically output at each time step a probability distribution over a set of labels plus a special blank token, which allows the model to emit repeated labels and to skip inputs.
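
To make the per-timestep output concrete, here is a minimal PyTorch-style sketch; the sizes and the convention that index 0 is the blank are illustrative assumptions, not part of the definition.

    import torch
    import torch.nn.functional as F

    # Illustrative sizes only: 50 frames, batch of 4, 80-dim features, 28 real labels.
    T, batch, feat_dim, num_labels = 50, 4, 80, 28

    # Stand-in for encoder features (in practice the output of an acoustic or visual encoder).
    encoder_out = torch.randn(T, batch, feat_dim)

    # Project every time step onto num_labels + 1 classes; index 0 plays the role of the blank.
    projection = torch.nn.Linear(feat_dim, num_labels + 1)
    log_probs = F.log_softmax(projection(encoder_out), dim=-1)   # shape (T, batch, num_labels + 1)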

During training, the CTC loss sums over all possible alignments between the input sequence and the target label sequence, effectively marginalizing over the alignments. The blank symbol enables the model to represent non-emitting time steps and to compress repetitions.
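
A minimal training sketch, assuming PyTorch's torch.nn.CTCLoss with the blank at index 0 and illustrative sizes; the single loss call performs the sum over alignments internally.

    import torch

    T, batch, num_labels = 50, 4, 28                       # illustrative sizes; class 0 is the blank
    log_probs = torch.randn(T, batch, num_labels + 1, requires_grad=True).log_softmax(dim=-1)

    ctc_loss = torch.nn.CTCLoss(blank=0)

    # Hypothetical targets: label indices 1..num_labels (0 is reserved for the blank).
    targets = torch.randint(1, num_labels + 1, (batch, 12))
    input_lengths = torch.full((batch,), T, dtype=torch.long)     # frames per utterance
    target_lengths = torch.full((batch,), 12, dtype=torch.long)   # labels per utterance

    # The loss marginalizes over every alignment that collapses to the target sequence.
    loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
    loss.backward()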

CTC-based approaches are common in end-to-end speech recognition, handwriting recognition, OCR, lip-reading, and other sequence tasks.

Advantages of CTC-based methods include training with unsegmented data, end-to-end optimization, and relatively simple alignment handling.

In decoding, practitioners often use greedy decoding or beam search, sometimes combined with an external language model to improve fluency and accuracy.
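
A sketch of the greedy (best-path) variant, assuming the blank is class 0: take the most likely class at every frame, merge consecutive repeats, and drop blanks. Beam search and language-model fusion replace the per-frame argmax with a search over competing prefixes.

    import torch

    def ctc_greedy_decode(log_probs, blank=0):
        """Best-path decoding: argmax per frame, merge repeated labels, drop blanks.

        log_probs: tensor of shape (T, batch, num_classes).
        Returns one list of label indices per batch element.
        """
        best_path = log_probs.argmax(dim=-1)              # (T, batch)
        decoded = []
        for b in range(best_path.shape[1]):
            labels, previous = [], blank
            for idx in best_path[:, b].tolist():
                if idx != blank and idx != previous:      # skip blanks and consecutive repeats
                    labels.append(idx)
                previous = idx
            decoded.append(labels)
        return decoded

    # Toy usage with random scores; a real model would supply per-frame log-probabilities.
    hypotheses = ctc_greedy_decode(torch.randn(50, 4, 29).log_softmax(dim=-1))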

CTC-based models are typically built with recurrent neural networks such as LSTMs or GRUs, but they can also be combined with convolutional networks or modern transformer architectures. They also enable real-time streaming when implemented with suitable architectures.
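
As one illustrative possibility (sizes and layer choices are assumptions, not taken from the text), a small bidirectional LSTM encoder with a per-frame CTC output head might look like this:

    import torch
    import torch.nn.functional as F

    class BiLSTMCTC(torch.nn.Module):
        """Illustrative BiLSTM encoder followed by a per-frame CTC output layer."""

        def __init__(self, feat_dim=80, hidden=256, num_labels=28):
            super().__init__()
            self.encoder = torch.nn.LSTM(feat_dim, hidden, num_layers=2, bidirectional=True)
            self.head = torch.nn.Linear(2 * hidden, num_labels + 1)   # +1 for the blank

        def forward(self, frames):               # frames: (T, batch, feat_dim)
            encoded, _ = self.encoder(frames)    # (T, batch, 2 * hidden)
            return F.log_softmax(self.head(encoded), dim=-1)

A streaming variant would instead use a unidirectional or otherwise causal encoder so that outputs can be emitted as frames arrive.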

Limitations include an implicit conditional-independence assumption between output labels given the input and the resulting difficulty of modeling strong dependencies within a label sequence; as a consequence, decoding may require an external language model for best results.

Alternatives include the RNN-Transducer and attention-based models, which capture longer-range dependencies differently.