How to redact a PDF with OCR

Scanned PDFs are images. Redaction needs OCR first to find the text to remove.

You scanned an old contract from paper. The text isn't selectable — it's just pixels. You can't redact 'all instances of John Smith' because the PDF doesn't know there's text on the page. OCR fixes that.

OCR and what it does

Optical Character Recognition analyses image regions and produces a text layer that's searchable, selectable and redactable. After OCR, your scanned PDF has invisible text behind the page images that, when recognition is accurate, matches what a human would read.

With OCR applied, you can search for 'John Smith' and find every instance that was recognised correctly. Without OCR, you'd have to find each occurrence by eye, page by page, and mark it manually.

When to OCR before redacting

Any scanned document — receipts, contracts, court filings, medical records — likely lacks a text layer. OCR before attempting text-based redaction.

For born-digital PDFs (Word exports, software outputs), OCR isn't needed because the text is already accessible. Check by trying to select text on a page; if the words highlight, the PDF already has a text layer.
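The same check can be automated for a batch. The sketch below is a minimal illustration, not part of Flint: it assumes you have already extracted the text of each page with whatever PDF library you use, and the function name `pages_needing_ocr` and the 20-character threshold are hypothetical choices.

```python
def pages_needing_ocr(page_texts, min_chars=20):
    """Return 1-based page numbers whose extracted text is too short
    to be a real text layer (assumed threshold: min_chars)."""
    flagged = []
    for number, text in enumerate(page_texts, start=1):
        visible = "".join(text.split())  # drop all whitespace
        if len(visible) < min_chars:
            flagged.append(number)
    return flagged


# Hypothetical extraction result: pages 2 and 3 are scans with no text layer.
pages = [
    "This Agreement is made between John Smith and Acme Ltd.",
    "",
    "  \n ",
]
print(pages_needing_ocr(pages))  # prints [2, 3]
```

Pages that come back flagged are the ones to OCR before any text-based redaction. A page with only a short caption may be flagged too, so treat the result as a prompt to look, not a verdict.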

OCR workflow with Flint

Run the scanned PDF through OCR (Flint's edit tool and many other tools include this). Verify the OCR output by searching for known text. Then open Flint's redaction tool and mark the sensitive content for removal.

For large batches, accuracy matters: OCR can mis-recognise characters, so a search will miss those instances of the text you want to redact. After OCR, spot-check pages where you know key terms appear.
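One way to support that spot-check is an approximate search over the OCR output, so that near-misses such as 'J0hn Smith' are surfaced for a human to review. This is a sketch using Python's standard `difflib`; the function name `near_matches` and the 0.8 similarity threshold are assumptions, and the sample text is invented.

```python
import re
from difflib import SequenceMatcher


def near_matches(text, term, threshold=0.8):
    """Return (candidate, score) pairs for word windows that resemble term."""
    words = re.findall(r"\S+", text)
    size = len(term.split())       # compare windows with as many words as the term
    target = term.casefold()
    results = []
    for i in range(len(words) - size + 1):
        candidate = " ".join(words[i:i + size])
        score = SequenceMatcher(None, candidate.casefold(), target).ratio()
        if score >= threshold:
            results.append((candidate, round(score, 2)))
    return results


# Invented OCR output containing two misreads of the same name.
ocr_text = ("Signed by J0hn Smith and witnessed by John Srnith "
            "on behalf of Jane Smythe.")
for candidate, score in near_matches(ocr_text, "John Smith"):
    print(candidate, score)
```

A lower threshold catches more misreads but also more false positives. Either way, the output is a list of places to look, not a list of redactions to apply blindly.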

OCR accuracy and edge cases

OCR struggles with: low-resolution scans, handwriting, unusual fonts, rotated pages, multiple columns. Each can result in missed text and incomplete redaction.

For litigation-grade redaction of scanned material, a second pass by a human reviewer is essential. Automated redaction after OCR is a first pass, not a final step.

FAQ

Can Flint OCR my PDF?

Yes — Flint's edit tool includes OCR for scanned documents. After OCR, use the redaction tool for sensitive content removal.

What if OCR misreads the text?

Searches and automated redaction will miss the misread instances. Always verify by manual review for high-stakes documents.

Should I redact the image or the text layer?

Both. Proper redaction tools remove the image content at the redaction region *and* the underlying text layer. If only one is removed, the other can leak.
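The text-layer half of that check can be scripted. The sketch below assumes you re-extract the text of each page from the redacted file with your own PDF library and pass it in as strings; `find_leaks` and `normalise` are hypothetical helper names. It cannot inspect the image, so the visual check still has to be done by eye.

```python
import re


def normalise(text):
    """Casefold and collapse whitespace so line breaks don't hide a match."""
    return re.sub(r"\s+", " ", text).casefold().strip()


def find_leaks(page_texts, terms):
    """Return (page_number, term) pairs where a redacted term still appears."""
    wanted = [(term, normalise(term)) for term in terms]
    leaks = []
    for number, text in enumerate(page_texts, start=1):
        haystack = normalise(text)
        for term, needle in wanted:
            if needle in haystack:
                leaks.append((number, term))
    return leaks


# Invented text extracted from a redacted file: page 2 still carries the name.
redacted_pages = [
    "Signed by [redacted] on 3 May.",
    "Witness: john\nsmith",
]
print(find_leaks(redacted_pages, ["John Smith"]))  # prints [(2, 'John Smith')]
```

An empty result only shows that the exact terms are gone from the text layer. Misread variants and the image itself need a separate review.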

Does OCR change the document?

Yes — it adds a text layer. The visual appearance is preserved; the file size increases slightly. The original image is not altered.

OCR makes scanned PDFs redactable. Run it, then redact, and verify that both the image and the text layer are clean.

Try it now

Drop a PDF in and you'll be done in seconds — no install, files private to your account.
