OCRnoise
OCRnoise refers to degradation and distortion in document images or recognition processes that reduce the accuracy of optical character recognition (OCR) systems. It encompasses image-level noise, capture artifacts, and processing artifacts that hinder reliable text extraction.
Common sources include scanner and camera noise, motion blur, uneven illumination, paper texture, ink bleed, and stains.
OCRnoise increases character, word, and line recognition errors, leading to higher character error rate (CER) and word error rate (WER).
Metrics used to assess impact include CER and WER. Image quality metrics such as PSNR or SSIM can also quantify degradation of the image itself when a clean reference image is available.
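As an illustration of how CER and WER are commonly computed, the sketch below divides the Levenshtein (edit) distance between the OCR output and a ground-truth transcription by the length of the ground truth. The function names `edit_distance`, `cer`, and `wer` are hypothetical, and whitespace tokenization for WER is an assumption; evaluation toolkits may normalize text differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference items to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character-level edits per reference character."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word-level edits per reference word (whitespace split, an assumption)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

For example, `cer("noise", "n0ise")` counts one substitution over five reference characters, and `wer("the quick brown fox", "the qu1ck brown fox")` counts one wrong word out of four. Both rates can exceed 1.0 when the OCR output contains many insertions.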
Mitigation includes preprocessing and denoising (median, Wiener, non-local means), adaptive thresholding, contrast enhancement, deblurring, and deskewing.
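Two of the steps listed above, median denoising and adaptive thresholding, can be sketched in plain Python on a grayscale image stored as a list of rows. This is a minimal illustration under stated assumptions, not a production pipeline: the function names, the clamped-border handling, the window size `k`, and the `offset` value are all hypothetical choices, and real projects would typically use an image-processing library instead.

```python
from statistics import median


def _window(img, y, x, r):
    """Pixels in the (2r+1) x (2r+1) neighbourhood of (y, x); borders are clamped (an assumption)."""
    h, w = len(img), len(img[0])
    return [
        img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
        for dy in range(-r, r + 1)
        for dx in range(-r, r + 1)
    ]


def median_filter(img, k=3):
    """Replace each pixel with the median of its k x k neighbourhood (k odd).

    This suppresses isolated salt-and-pepper specks while keeping edges fairly sharp.
    """
    r = k // 2
    return [
        [median(_window(img, y, x, r)) for x in range(len(img[0]))]
        for y in range(len(img))
    ]


def adaptive_threshold(img, k=3, offset=5):
    """Binarize against the local mean so uneven illumination matters less.

    A pixel becomes ink (1) when it is darker than its neighbourhood mean
    minus `offset`; otherwise it is background (0).
    """
    r = k // 2
    out = []
    for y in range(len(img)):
        row = []
        for x in range(len(img[0])):
            win = _window(img, y, x, r)
            local_mean = sum(win) / len(win)
            row.append(1 if img[y][x] < local_mean - offset else 0)
        out.append(row)
    return out
```

The order of the two steps matters: a median filter this small also erases genuine one-pixel marks, so it suits specks smaller than the thinnest stroke. Thresholding is normally applied after denoising so that specks are not binarized as ink.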
Understanding OCRnoise aids digitization projects, archival preservation, and tool benchmarking, helping engineers select preprocessing pipelines.