OCRopus
OCRopus is an open-source optical character recognition (OCR) system and document analysis framework. It is designed as a modular toolkit for building end-to-end OCR pipelines and as a research platform for experimenting with layout analysis, text-line recognition, and post-processing.
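The idea of a modular pipeline can be illustrated with a minimal Python sketch. It is not OCRopus code. Every name in it (`OcrPipeline`, `threshold_binarize`, `projection_segment`, `ink_recognizer`) is hypothetical, and the stages are deliberately simple stand-ins: a fixed-threshold binarizer, a horizontal projection-profile line splitter, and a placeholder "recognizer" that counts ink pixels instead of reading text. The point is only that each stage is an interchangeable callable with a fixed input and output type.

```python
from dataclasses import dataclass
from typing import Callable, List

# A page is modeled as rows of grayscale values (0 = black ink, 255 = white paper).
Image = List[List[int]]


def threshold_binarize(page: Image, threshold: int = 128) -> Image:
    # Hypothetical binarization stage: 1 marks ink, 0 marks background.
    return [[1 if px < threshold else 0 for px in row] for row in page]


def projection_segment(binary: Image) -> List[Image]:
    # Hypothetical layout stage: a row with no ink separates two text lines
    # (a simple horizontal projection profile, not OCRopus's own algorithm).
    lines: List[Image] = []
    current: Image = []
    for row in binary:
        if any(row):
            current.append(row)
        elif current:
            lines.append(current)
            current = []
    if current:
        lines.append(current)
    return lines


def ink_recognizer(line: Image) -> str:
    # Placeholder recognizer: a real one would return the transcribed text.
    return f"line with {sum(map(sum, line))} ink pixels"


@dataclass
class OcrPipeline:
    binarize: Callable[[Image], Image]
    segment: Callable[[Image], List[Image]]
    recognize: Callable[[Image], str]

    def run(self, page: Image) -> List[str]:
        # Stages run in order; each one only sees the previous stage's output.
        binary = self.binarize(page)
        return [self.recognize(line) for line in self.segment(binary)]


if __name__ == "__main__":
    page = [
        [255, 0, 0, 255],
        [255, 255, 255, 255],  # blank row: separates the two "text lines"
        [0, 255, 255, 0],
        [0, 0, 255, 255],
    ]
    pipeline = OcrPipeline(threshold_binarize, projection_segment, ink_recognizer)
    print(pipeline.run(page))  # ['line with 2 ink pixels', 'line with 4 ink pixels']
```

Swapping a component means passing a different callable, for example a trained line recognizer in place of `ink_recognizer`, while the binarization and segmentation stages stay untouched.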
The system structures processing as a sequence of interchangeable components. A page layout analysis module identifies
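Recognition in OCRopus relies on machine learning to map image features to character sequences. Later versions use recurrent neural networks (bidirectional LSTM) trained with connectionist temporal classification (CTC), which transcribe an entire text-line image without first segmenting it into individual characters.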
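Line recognizers trained with connectionist temporal classification (CTC) output, for every horizontal position (frame) of a text-line image, a score for each character class plus a special "blank" class. The sketch below shows standard greedy (best-path) CTC decoding, which turns such a score matrix into a string. It is a generic illustration, not OCRopus's decoder. The assumptions that class 0 is the blank, that class k > 0 maps to `alphabet[k - 1]`, and the toy scores themselves are all made up for the example.

```python
from typing import List, Sequence

BLANK = 0  # index of the CTC blank class (an assumption of this sketch)


def greedy_ctc_decode(frame_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding: take the argmax class per frame,
    merge consecutive repeats, then drop blanks."""
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars: List[str] = []
    previous = BLANK
    for label in best_path:
        # Emit a character only when the label changes and is not blank.
        if label != previous and label != BLANK:
            chars.append(alphabet[label - 1])
        previous = label
    return "".join(chars)


if __name__ == "__main__":
    # Toy scores over classes [blank, 'l', 'o'] for six frames.
    frames = [
        [0.1, 0.8, 0.1],    # 'l'
        [0.2, 0.7, 0.1],    # 'l' again: merged with the previous frame
        [0.9, 0.05, 0.05],  # blank
        [0.1, 0.1, 0.8],    # 'o'
        [0.6, 0.1, 0.3],    # blank: keeps the two 'o's separate
        [0.2, 0.1, 0.7],    # 'o'
    ]
    print(greedy_ctc_decode(frames, "lo"))  # loo
```

The blank class is what lets the network emit a doubled letter such as "oo": without a blank frame between them, two consecutive 'o' frames would collapse into a single character.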
History and status
OCRopus originated as an open-source project developed and released by researchers associated