Typographical OCR
Typographical OCR is a branch of optical character recognition focused on preserving typographic information in digitized documents. Unlike conventional OCR, which primarily aims to produce plain text, typographical OCR seeks to identify and encode font styles, sizes, weights, spacing, and layout features in addition to the textual content. The goal is to enable faithful rendering, typographic search, and scholarly analysis of typography and publishing history.
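One way to picture the difference from plain-text OCR is to look at what the output might hold. The following is a minimal sketch, not a standard format: the `StyledRun` record, its field names, and the helper functions are all hypothetical, chosen only to illustrate text stored together with font family, size, weight, and slant.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class StyledRun:
    """A span of recognized text plus the typographic attributes it was set in.

    Hypothetical record for illustration; real systems may store more
    (spacing, position on the page, confidence scores).
    """
    text: str
    font_family: str          # label assumed to come from a font classifier
    size_pt: float
    weight: str = "regular"   # e.g. "regular" or "bold"
    italic: bool = False

    def style(self) -> Tuple[str, float, str, bool]:
        """Return the attributes that define the run's appearance."""
        return (self.font_family, self.size_pt, self.weight, self.italic)


def merge_runs(runs: List[StyledRun]) -> List[StyledRun]:
    """Join adjacent runs that share the same style into a single run."""
    merged: List[StyledRun] = []
    for run in runs:
        if merged and merged[-1].style() == run.style():
            merged[-1] = replace(merged[-1], text=merged[-1].text + run.text)
        else:
            merged.append(run)
    return merged


def plain_text(runs: List[StyledRun]) -> str:
    """Discard typography and return what conventional OCR would output."""
    return "".join(run.text for run in runs)


def find_italic(runs: List[StyledRun]) -> List[str]:
    """A simple typographic search: return the text of every italic run."""
    return [run.text for run in runs if run.italic]


# Made-up sample line, as a recognizer might emit it word by word.
runs = [
    StyledRun("The ", "Caslon", 11.0),
    StyledRun("Origin ", "Caslon", 11.0, italic=True),
    StyledRun("of Species", "Caslon", 11.0, italic=True),
    StyledRun(" (1859)", "Caslon", 11.0),
]
merged = merge_runs(runs)
```

Here `plain_text(merged)` recovers the conventional OCR result, while `find_italic(merged)` answers a query that plain text alone cannot, which is the kind of typographic search described above.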
The approach combines document image analysis with character recognition and font-aware modeling. Typical steps include preprocessing
Applications include digitization of books and journals, archival preservation of historical typography, bibliographic research, and improving