An old contractor sends a scanned PDF of last quarter's expenses. You need it in Excel. You can't select the text — it's a stack of images masquerading as a document.
This is OCR territory.
What OCR brings to the party
Optical Character Recognition (OCR) reads pixels and produces text. Modern engines, the kind Flint's PDF to Excel uses, handle clean printed tables at 95–99% accuracy. Numbers, dates, and currency symbols all come through.
Faxes, photographs of paper, and crumpled scans reduce accuracy. So does any scan that is rotated or skewed.
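To get a feel for what 95–99% means in a spreadsheet, here is a rough back-of-the-envelope sketch. It assumes the figure is per character and that misreads are independent; the text above states neither, so treat the numbers as illustrative only. The function names and the 200-cell sheet are hypothetical.

```python
def chance_cell_is_correct(char_accuracy: float, chars_in_cell: int) -> float:
    """Probability that every character in one cell is read correctly.

    Assumes each character is misread independently, which is a simplification.
    """
    return char_accuracy ** chars_in_cell


def expected_bad_cells(char_accuracy: float, chars_in_cell: int, cell_count: int) -> float:
    """Expected number of cells containing at least one misread character."""
    return cell_count * (1 - chance_cell_is_correct(char_accuracy, chars_in_cell))


if __name__ == "__main__":
    # "1,284.50" is 8 characters; 200 such cells is a hypothetical sheet size.
    for accuracy in (0.95, 0.99):
        clean = chance_cell_is_correct(accuracy, 8)
        bad = expected_bad_cells(accuracy, 8, 200)
        print(f"{accuracy:.0%} per character: {clean:.1%} of cells clean, "
              f"about {bad:.0f} of 200 cells need a fix")
```

Under these assumptions, a small per-character error rate still adds up across a sheet full of multi-digit amounts, which is why the checks in the next section are worth the minute they take.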
Numbers deserve extra scrutiny
Of all the things OCR reads, numbers are where small mistakes hurt most. 1 read as 7. 0 read as 8 or O. A missed or misplaced decimal point changes the value by a factor of ten or more. After conversion, run SUM on each numeric column and compare the result to the total printed on the original. If they match, you're done. If not, scan the column for outliers and correct them against the original.
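If you would rather script the totals check than do it in Excel, the same idea can be sketched in Python. This is a minimal example, not part of Flint: `parse_amount` and `check_column` are hypothetical helpers, the cell values are made up, and it assumes amounts use a comma as the thousands separator and a dollar sign for currency.

```python
from decimal import Decimal, InvalidOperation


def parse_amount(cell: str) -> Decimal:
    """Turn a cell such as '$1,284.50' or '(42.00)' into a Decimal."""
    text = cell.strip().replace(",", "").replace("$", "")
    # Accounting style: parentheses mean a negative amount.
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1]
    value = Decimal(text)  # raises InvalidOperation on e.g. '12O.00'
    return -value if negative else value


def check_column(cells, printed_total):
    """Return (matches, computed_sum, unparseable_cells) for one column."""
    total = Decimal("0")
    unparseable = []
    for row, cell in enumerate(cells, start=1):
        try:
            total += parse_amount(cell)
        except InvalidOperation:
            # Likely a letter read in place of a digit; flag it for review.
            unparseable.append((row, cell))
    matches = not unparseable and total == parse_amount(printed_total)
    return matches, total, unparseable


if __name__ == "__main__":
    print(check_column(["120.00", "45.50", "7.25"], "172.75"))
    print(check_column(["12O.00", "45.50", "7.25"], "172.75"))
```

Decimal is used instead of float so that currency values compare exactly. A cell containing a letter O in place of a zero fails to parse and is reported with its row number, while a 1 read as 7 still parses and only shows up as a mismatched total.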
Get the cleanest scan you can
If the document is being re-scanned: 300 dpi, greyscale, page flat on the scanner glass, no shadows. If it's already scanned and you can't redo it, accept what you've got and budget extra proofreading time. Photos of documents are the worst case. If you have the option, use a proper document scanner app rather than the default Camera.
Tables specifically
OCR on tables also has to detect the table structure, not just the characters. Ruled tables (with visible borders) convert well. Borderless tables in scans are the hardest case — column boundaries are inferred from spacing alone. Expect to fix a few merged or split columns by hand. For the cleanest result, request the source document when you can.
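To see why borderless tables are fragile, here is a simplified sketch of inferring columns from spacing alone. It is not how Flint or any particular engine works; it assumes a hypothetical OCR step has already produced the left and right pixel position of every word on the page, and `infer_columns` and `min_gap` are illustrative names.

```python
def infer_columns(word_boxes, min_gap=10):
    """Group word positions into columns using horizontal whitespace.

    word_boxes: iterable of (x_left, x_right) pairs, one per word on the page.
    min_gap: smallest horizontal gap, in pixels, treated as a column boundary.
    Returns a list of (x_left, x_right) spans, one per inferred column.
    """
    columns = []
    for left, right in sorted(word_boxes):
        if columns and left - columns[-1][1] < min_gap:
            # Overlaps or nearly touches the current column: extend it.
            columns[-1][1] = max(columns[-1][1], right)
        else:
            # A wide enough gap: start a new column.
            columns.append([left, right])
    return [tuple(span) for span in columns]


if __name__ == "__main__":
    tidy = [(10, 60), (12, 80), (120, 160), (125, 150), (200, 240)]
    print(infer_columns(tidy))

    # One long description that runs into the gap merges two columns.
    spilled = tidy + [(70, 118)]
    print(infer_columns(spilled))
```

With the tidy input the sketch finds three columns. Adding a single long word that reaches into the gap collapses the first two into one, which is the kind of merged column you end up fixing by hand.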
FAQ
How accurate is OCR on numbers?
95–99% on clean printed numbers, lower on photos or faxes. Always verify that column totals match the original.
Does Flint detect when OCR is needed?
Yes — if the PDF has no selectable text, OCR runs automatically. No flag to set.
Can OCR read handwritten figures?
Block-printed digits, sometimes. Cursive numbers, rarely with usable accuracy. Retype handwritten cells.
What about red/coloured ink?
Modern OCR handles colour, but high contrast (black on white) is most reliable. If you have a choice, scan as greyscale.
Old scans, new spreadsheet. Convert your scanned PDF to Excel and verify the totals.