Someone hands you a scanned PDF — a contract, a textbook chapter, an old report — and asks for an editable version. You open it, try to select text, and the cursor turns into a useless selection rectangle. It's not a PDF in any useful sense. It's a stack of photos.
That's where OCR comes in.
What OCR actually does
Optical Character Recognition looks at the pixels in a scanned page and tries to recognise letters. The output is real, selectable, editable text. Modern OCR — what Flint's PDF to Word converter uses — handles clean scans at 95–99% accuracy on standard fonts.
It's not magic. Smudges, skewed scans, exotic fonts, handwriting and faxes all drag accuracy down. Set expectations accordingly.
Get the cleanest scan you can
If you're scanning the document yourself, set the scanner to 300 dpi, greyscale (or colour if the document has colour-coded sections), and keep the pages flat. Resist the temptation to scan at 600 dpi: the files are much bigger and the accuracy gain is marginal. If you're scanning with a phone, use a dedicated document scanner app that flattens perspective. Crooked scans confuse OCR engines.
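To see why 600 dpi costs so much, it helps to count pixels. The sketch below is plain arithmetic, not anything Flint-specific: it assumes a US Letter page (8.5 x 11 inches) and 8-bit greyscale, and `raw_scan_bytes` is a hypothetical helper name. Real scans are compressed, so actual files are smaller, but doubling the resolution still quadruples the pixel count.

```python
def raw_scan_bytes(width_in: float, height_in: float, dpi: int,
                   bytes_per_pixel: int = 1) -> int:
    """Uncompressed size of one scanned page, in bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


# Assumed page: US Letter, 8-bit greyscale (1 byte per pixel).
at_300 = raw_scan_bytes(8.5, 11, 300)
at_600 = raw_scan_bytes(8.5, 11, 600)

print(f"300 dpi: {at_300:,} bytes per page")
print(f"600 dpi: {at_600:,} bytes per page")
print(f"ratio:   {at_600 / at_300:.0f}x")
```

Under these assumptions a page goes from roughly 8.4 MB to roughly 33.7 MB of raw pixel data before compression.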
Convert and proofread
Drop the scanned PDF into PDF to Word. OCR runs automatically, with no toggle to find. Open the docx and read it carefully. Common errors: rn read as m, cl read as d, the number 1 read as the letter l. Find-and-replace handles most recurring misreads in bulk, as long as you replace whole misread words rather than the letter pairs themselves. Budget proofreading time in proportion to how critical the document is.
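A blind find-and-replace on these confusions is risky, because swapping every m for rn would wreck correctly read words. One safer approach is to flag only the words that become valid after undoing a confusion. The sketch below illustrates that idea; it is not how Flint works. `suggest_fixes`, the `CONFUSIONS` table and the tiny vocabulary are all hypothetical, and in practice you would load a real word list.

```python
import re

# (what OCR produced, what was probably on the page)
CONFUSIONS = [("m", "rn"), ("d", "cl")]


def suggest_fixes(text, vocabulary):
    """Return (word, suggestion) pairs for words that look like OCR misreads."""
    suggestions = []
    for word in re.findall(r"[A-Za-z0-9]+", text):
        lower = word.lower()
        if lower in vocabulary:
            continue  # already a known word, leave it alone
        # A lowercase "l" touching a digit is probably the number 1.
        if re.search(r"\dl|l\d", word):
            fixed = re.sub(r"(?<=\d)l|l(?=\d)", "1", word)
            suggestions.append((word, fixed))
            continue
        for seen, intended in CONFUSIONS:
            # Undo the confusion at one position at a time.
            for match in re.finditer(seen, lower):
                candidate = lower[:match.start()] + intended + lower[match.end():]
                if candidate in vocabulary:
                    suggestions.append((word, candidate))
    return suggestions


# Hypothetical vocabulary, just large enough for the example sentence.
vocabulary = {"the", "government", "office", "keeps", "a",
              "classic", "archive", "of", "volumes"}
text = "The govemment office keeps a dassic archive of 2l volumes."

for word, fix in suggest_fixes(text, vocabulary):
    print(f"{word} -> {fix}")
```

The output is a review list, not an automatic correction: a misread that happens to produce another real word will slip past any dictionary check, so a human read-through is still needed.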
Languages, layouts, and the awkward cases
Multi-column layouts (newspapers, academic journals) sometimes get read in the wrong order: straight across the page, so lines from the left and right columns end up interleaved, instead of the left column all the way down and then the right. Re-flow manually in Word. Non-Latin scripts (Arabic, Chinese, Cyrillic) need a converter with multi-language OCR; clean output isn't guaranteed. Handwriting recognition is improving but still unreliable for anything beyond block capitals.
FAQ
How accurate is OCR really?
On clean printed pages, 95–99%. On faxes or skewed scans, 80–90%. On handwriting, anywhere from 50% to unusable. Always proofread important documents.
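Percentages can hide how much proofreading is involved, so here is the arithmetic as a small sketch. It assumes the figures above are character-level accuracy and that a dense page holds about 2,000 characters; both are illustrative assumptions, and `expected_errors` is a hypothetical helper.

```python
def expected_errors(char_count: int, accuracy: float) -> int:
    """Rough number of wrong characters for a given accuracy (0.0 to 1.0)."""
    return round(char_count * (1 - accuracy))


CHARS_PER_PAGE = 2000  # assumed size of a dense page of text

for label, accuracy in [("clean print, best case", 0.99),
                        ("clean print, worst case", 0.95),
                        ("fax or skewed, best case", 0.90),
                        ("fax or skewed, worst case", 0.80)]:
    errors = expected_errors(CHARS_PER_PAGE, accuracy)
    print(f"{label}: about {errors} errors per page")
```

On those assumptions, even a clean scan leaves somewhere between 20 and 100 characters per page to fix, and a poor fax can leave several hundred.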
Do I need to flag it as scanned?
No — Flint detects whether the PDF has selectable text and runs OCR automatically when it's image-only.
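Flint's detection logic isn't published here, but the general idea can be sketched: extract whatever text each page already contains and treat the file as image-only if most pages yield almost nothing. In the sketch below, `needs_ocr` and the `min_chars_per_page` threshold are hypothetical, and `page_texts` is assumed to come from whatever PDF library you use to pull text out of each page.

```python
def needs_ocr(page_texts, min_chars_per_page=20):
    """Guess whether a PDF is image-only from its per-page extracted text.

    page_texts: one string per page, empty if the page has no text layer.
    Returns True when more than half the pages are (nearly) empty.
    """
    sparse_pages = sum(
        1 for page_text in page_texts
        if len(page_text.strip()) < min_chars_per_page
    )
    return sparse_pages > len(page_texts) / 2


scanned = ["", "   ", ""]  # no text layer on any page
digital = ["This page has plenty of selectable text on it."] * 3

print(needs_ocr(scanned))  # True
print(needs_ocr(digital))  # False
```

Using a majority rule rather than "any empty page" keeps a normal PDF with one blank or picture-only page from being sent through OCR unnecessarily.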
What about scans with handwritten notes?
The printed text will OCR fine. Handwritten annotations usually come across as images rather than editable text. Re-type any annotations you need editable.
Will tables work?
Yes for clearly ruled tables. Borderless tables in scans are tricky — the converter relies on spacing alone. Expect to fix a few.
Scanned doesn't mean stuck. Convert your scanned PDF to Word and get editable text back.