CTC-based
CTC-based refers to models that use the Connectionist Temporal Classification (CTC) loss to train neural networks for sequence labeling without requiring frame-level alignments. The method, introduced by Graves and colleagues, is designed for tasks where the input and output sequences have different lengths and the alignment between them is unknown, such as speech, handwriting, or sign-language recognition. CTC-based models output, at each time step, a probability distribution over the label set plus a special blank token; the blank lets the model emit no label at a given time step and separates genuine repetitions of the same label from repeated emissions of a single label.
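To make the collapsing rule concrete, here is a minimal Python sketch (illustrative, not from the original text) of greedy CTC decoding: merge consecutive repeats, then drop blanks. The blank index of 0 and the function name are assumptions for the example.

    # Minimal sketch of CTC greedy decoding (illustrative assumptions:
    # labels are integers, index 0 is the blank token).
    BLANK = 0

    def ctc_greedy_decode(frame_labels):
        """Collapse a per-frame label sequence: merge repeats, drop blanks."""
        decoded = []
        prev = None
        for label in frame_labels:
            # Skip a label if it repeats the previous frame or is blank.
            # A blank resets `prev`, so true repetitions such as the two
            # 1s in [1, 0, 1] survive the collapse.
            if label != prev and label != BLANK:
                decoded.append(label)
            prev = label
        return decoded

    # Example: frames [1, 1, 0, 1, 2, 0] collapse to [1, 1, 2]
    print(ctc_greedy_decode([1, 1, 0, 1, 2, 0]))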
During training, the CTC loss sums over all possible alignments between the input sequence and the target sequence, marginalizing the alignment out; this sum is computed efficiently with a dynamic-programming forward-backward algorithm, and minimizing the loss maximizes the total probability of the target under all valid alignments.
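As a sketch of how this loss is used in practice, the example below computes it with PyTorch's torch.nn.CTCLoss. The tensor shapes, vocabulary size, and random inputs are assumptions standing in for real network outputs and labels.

    import torch
    import torch.nn as nn

    # Assumed dimensions: input length T, batch size N, C classes
    # (including blank at index 0), max target length S.
    T, N, C, S = 50, 4, 20, 12

    ctc_loss = nn.CTCLoss(blank=0)

    # Network outputs: per-time-step log-probabilities, shape (T, N, C).
    # Random values stand in for a real model's log_softmax output.
    log_probs = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()

    # Padded integer targets (no blanks) and the true sequence lengths.
    targets = torch.randint(1, C, (N, S), dtype=torch.long)
    input_lengths = torch.full((N,), T, dtype=torch.long)
    target_lengths = torch.randint(5, S + 1, (N,), dtype=torch.long)

    # The loss marginalizes over all valid alignments via forward-backward.
    loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
    loss.backward()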
CTC-based approaches are common in end-to-end speech recognition, handwriting recognition, OCR, lip-reading, and other sequence tasks.
Advantages of CTC-based methods include training on unsegmented data, end-to-end optimization, and freedom from explicit alignment modeling, since the alignment is marginalized out during training.