You scan a contract with your phone. The PDF looks fine — but try to copy any text and you get nothing. That's because it's an image of text, not text itself. OCR fixes that. Here's how.
Step one: finding the words
OCR (Optical Character Recognition) starts by analysing the image for patterns that look like text: rows of similarly sized marks, regular spacing between letters and lines, and the bands of whitespace that separate one line from the next. This step, often called layout analysis, separates text regions from pictures, tables, and empty space. Modern engines use machine-learning models trained on large collections of document images, which is why they handle unusual layouts (columns, footnotes, captions) far better than older rule-based software did.
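Separating text from whitespace can be illustrated with a classic technique that is much simpler than what modern engines do: a horizontal projection profile. Count the dark pixels in each row, and treat every run of rows containing ink as one line of text. This is a toy sketch, not how Flint or any particular engine works. The function name `find_text_lines`, the 0/1 page format, and the `min_ink` parameter are all hypothetical.

```python
from typing import List, Tuple


def find_text_lines(image: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (start_row, end_row) bands (end exclusive) of rows that contain ink.

    `image` is a binarised page (hypothetical format): 1 = dark pixel, 0 = background.
    A row belongs to a text line when it holds at least `min_ink` dark pixels.
    """
    bands = []
    start = None
    for y, row in enumerate(image):
        has_ink = sum(row) >= min_ink  # horizontal projection: ink count for this row
        if has_ink and start is None:
            start = y  # entering a text band
        elif not has_ink and start is not None:
            bands.append((start, y))  # leaving a text band
            start = None
    if start is not None:  # a band that runs to the bottom edge
        bands.append((start, len(image)))
    return bands


page = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 1, 1, 0],
]
print(find_text_lines(page))  # → [(1, 3), (5, 6)]
```

Real pages are noisy and often skewed, which is one reason this simple approach gave way to learned models. It still shows why a straight, clean scan is easier to segment.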
Step two: identifying each character
For each detected character shape (glyph), the engine compares what it sees against letter shapes it has learned; many modern engines actually read a whole line at once rather than one glyph in isolation. Context matters: 'rn' may be corrected to 'm' when only that reading makes sense, and an 'l' inside a number is read as '1'. Each guess comes with a confidence score, and words with low confidence are often flagged for review.

The result is real, selectable text layered invisibly underneath the original image, so the page looks unchanged but is now searchable and copyable. Open a scanned PDF in Flint to see this in action.
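The context rules and confidence flagging from step two can be sketched as a small post-processing pass. This is a toy illustration under stated assumptions: the two rules, the `(text, confidence)` input format, the tiny vocabulary, and the 0.80 review threshold are all hypothetical, not Flint's actual logic. Real engines fold context into the recognition model itself rather than applying hand-written rules afterwards.

```python
from typing import Iterable, List, Set, Tuple

# Letters often confused with digits; only swapped inside numeric-looking tokens.
DIGIT_LOOKALIKES = str.maketrans({"l": "1", "I": "1", "O": "0", "o": "0"})


def fix_in_context(word: str, vocabulary: Set[str]) -> str:
    """Apply two hypothetical context rules to one recognised word."""
    digits = sum(ch.isdigit() for ch in word)
    # Rule 1: in a mostly numeric token, letters that look like digits are digits.
    if digits and digits >= len(word) / 2:
        return word.translate(DIGIT_LOOKALIKES)
    # Rule 2: prefer 'm' over 'rn' when only the 'm' reading is a known word.
    if "rn" in word and word.lower() not in vocabulary:
        candidate = word.replace("rn", "m")
        if candidate.lower() in vocabulary:
            return candidate
    return word


def postprocess(
    words: Iterable[Tuple[str, float]],
    vocabulary: Set[str],
    threshold: float = 0.80,
) -> List[Tuple[str, bool]]:
    """Return (corrected_word, needs_review) pairs; low-confidence words are flagged."""
    return [(fix_in_context(text, vocabulary), conf < threshold) for text, conf in words]


vocab = {"total", "form", "modern"}
recognised = [("Total", 0.97), ("1l50", 0.91), ("forrn", 0.62)]
print(postprocess(recognised, vocab))
# → [('Total', False), ('1150', False), ('form', True)]
```

Note that "modern" is left alone because it is already a known word, even though it contains 'rn'. Flagging is kept separate from correction, so a word can be both fixed and still marked for a human to check.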
What affects accuracy
Clean scans of typed text in common fonts can reach 99% or better character accuracy. Accuracy drops sharply with faded photocopies, skewed pages, handwriting, and languages the engine supports poorly. Quick fixes: scan at 300 DPI in good light, keep pages flat, and skip colour mode for black-and-white originals. If the results are messy, re-scan rather than fight the OCR output.
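Accuracy figures like "99%" are often measured per character: the number of insertions, deletions, and substitutions (the Levenshtein edit distance) needed to turn the OCR output into a correct reference text, divided by the reference length. The sketch below assumes that definition; individual tools may measure accuracy differently, and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """1 - (edit distance / reference length), floored at 0."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = levenshtein(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))


reference = "Invoice total: 1150.00"
ocr_text = "lnvoice totaI: 1150.00"  # 'I'→'l' and 'l'→'I' misreads
print(levenshtein(reference, ocr_text))  # → 2
print(round(character_accuracy(reference, ocr_text), 3))  # → 0.909
```

The arithmetic also shows why proofreading still matters: on a page of about 2,000 characters, 99% accuracy still leaves roughly 2,000 × 0.01 = 20 wrong characters.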
FAQ
Does OCR change the file's appearance?
No. It adds an invisible text layer beneath the existing image, so the PDF looks identical; you simply gain text you can select, search, and copy.
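PDF has a text rendering mode that neither fills nor strokes glyphs (mode 3, set with the `Tr` operator), which is one common way OCR tools make text invisible yet selectable. The sketch below builds a raw content-stream fragment for recognised words, converting pixel coordinates from a 300 DPI scan into PDF points (1/72 inch, origin at the bottom-left). The word-box format and function names are hypothetical, we don't know how Flint writes its text layer, and a complete PDF would also need a font resource (`/F1` here is assumed) plus the surrounding file structure.

```python
from typing import List, Tuple

# Hypothetical word box: (text, left_px, top_px, right_px, bottom_px),
# in image pixels with the origin at the top-left corner.
WordBox = Tuple[str, int, int, int, int]


def escape_pdf_string(text: str) -> str:
    """Escape characters that are special inside a PDF literal string (...)."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def invisible_text_ops(words: List[WordBox], page_height_px: int, dpi: int = 300) -> str:
    """Build content-stream operators that draw each word invisibly over its box."""

    def to_pt(px: float) -> float:
        return px * 72 / dpi  # pixels -> PDF points

    ops = ["BT", "3 Tr"]  # begin text object; render mode 3 = neither fill nor stroke
    for text, left, top, right, bottom in words:
        size = to_pt(bottom - top)          # rough font size: box height in points
        x = to_pt(left)
        y = to_pt(page_height_px - bottom)  # flip y: PDF origin is bottom-left
        ops.append(
            f"/F1 {size:.2f} Tf 1 0 0 1 {x:.2f} {y:.2f} Tm ({escape_pdf_string(text)}) Tj"
        )
    ops.append("ET")  # end text object
    return "\n".join(ops)


# A US Letter page scanned at 300 DPI is 8.5 × 300 = 2550 px wide and 11 × 300 = 3300 px tall.
print(invisible_text_ops([("Contract", 300, 300, 600, 350)], page_height_px=3300))
```

This only positions each word's baseline and size approximately; real tools also stretch the text horizontally to match the word's width, so that selections line up with what you see on the scanned image.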
Can OCR read multiple languages?
Yes, but you usually need to tell it which language or languages to expect. Mixing languages on one page (for example, English and Chinese) can confuse an engine unless it supports multilingual recognition and you specify every language present.
Is the OCR text the same as retyping?
Functionally, yes, but it may contain character errors. Always proofread OCR'd text before relying on it for legal or financial purposes.
OCR turns picture-PDFs into searchable, copyable documents. Open your scan in Flint and let the words come back to life.