How to Get All the Text Out of a PDF in Plain Format

Get plain text out of a PDF for translation, processing, or reuse — works on scanned PDFs too via OCR.


You need the raw words from a PDF: to send to a translator, to feed a content-analysis script, or to paste into another tool. No formatting, no images, just text.

Live-text PDFs cough up text easily. Scanned PDFs need OCR first. The right approach depends on which kind you have.

For live-text PDFs

Open the PDF in any reader. Press Cmd/Ctrl+A to select all, copy, and paste into a text editor. Done.

For more controlled extraction, use Flint's PDF to Word converter, then save the Word document as plain text. This is useful when you want to keep paragraph breaks intact.
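Whichever route you take, copied text often breaks at the end of every printed line rather than at the end of each paragraph. The sketch below is a minimal Python illustration of rejoining those lines, assuming paragraphs are separated by blank lines; `reflow` is a hypothetical helper, not part of Flint or any library.

```python
import re


def reflow(raw: str) -> str:
    """Rejoin hard-wrapped lines into paragraphs.

    Assumption: paragraphs are separated by one or more blank lines, and a
    line ending in "-" followed by a line starting with a lowercase letter
    is one word split across two lines.
    """
    paragraphs = re.split(r"\n\s*\n", raw.strip())
    out = []
    for para in paragraphs:
        lines = [line.strip() for line in para.splitlines() if line.strip()]
        text = ""
        for line in lines:
            if text.endswith("-") and line[:1].islower():
                # Drop the trailing hyphen and glue the word halves together.
                text = text[:-1] + line
            elif text:
                # Ordinary wrap: join with a single space.
                text += " " + line
            else:
                text = line
        out.append(text)
    # Keep one blank line between paragraphs.
    return "\n\n".join(out)
```

The hyphen rule is a guess: it would also merge a genuine compound such as "well-known" if the line break fell right after the hyphen, so check the result.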

For scanned PDFs

Select-all returns nothing because there's no text layer. Run OCR first. After OCR, the text becomes selectable.
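If you're scripting extraction, you can spot pages with no text layer by checking how much text each page yields. A rough Python sketch, assuming you already have one string per page from some extractor; `pages_needing_ocr` and its 20-character threshold are illustrative choices, not a standard.

```python
def pages_needing_ocr(page_texts: list[str], min_chars: int = 20) -> list[int]:
    """Return 1-based page numbers whose extracted text looks too thin
    to be a real text layer.

    Assumption: page_texts holds one string per page, in page order.
    min_chars is an arbitrary cut-off; tune it for your documents.
    """
    flagged = []
    for number, text in enumerate(page_texts, start=1):
        # Count non-whitespace characters; a scanned page usually yields none.
        visible = sum(1 for ch in text if not ch.isspace())
        if visible < min_chars:
            flagged.append(number)
    return flagged
```

With the pypdf library, for example, the per-page strings could come from `[page.extract_text() for page in PdfReader(path).pages]`. On a mixed PDF, typically only the scanned pages come back flagged.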

OCR quality varies with the scan. Clean scans typically come out 90–95% accurate; messy ones (skewed, faded, or low-resolution) around 70–85%. Always spot-check the output.
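One way to put a number on a spot-check is character accuracy: type a short passage out by hand, compare it with the OCR output, and compute one minus the edit distance divided by the length of your hand-typed reference. A minimal sketch of that calculation, assuming plain Levenshtein distance; `char_accuracy` is a hypothetical helper, and this is one metric among several (word-level accuracy is another).

```python
def char_accuracy(reference: str, ocr_output: str) -> float:
    """Character accuracy = 1 - (Levenshtein distance / len(reference)).

    Insertions, deletions and substitutions each cost 1. The result is
    clamped at 0.0 because very noisy output can exceed the reference length.
    """
    if not reference:
        return 1.0 if not ocr_output else 0.0
    # prev[j] = distance between the reference prefix so far and ocr_output[:j]
    prev = list(range(len(ocr_output) + 1))
    for i, r in enumerate(reference, start=1):
        curr = [i]
        for j, o in enumerate(ocr_output, start=1):
            cost = 0 if r == o else 1
            curr.append(min(
                prev[j] + 1,         # reference char missing from OCR output
                curr[j - 1] + 1,     # extra char in OCR output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    distance = prev[-1]
    return max(0.0, 1.0 - distance / len(reference))
```

For example, `char_accuracy("invoice", "lnvoice")` gives about 0.857: one substitution in seven characters.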

For PDFs with mixed content

Some PDFs mix live text and scanned images. The live text extracts cleanly; the scanned bits need OCR. Run OCR on the whole document so every page ends up with a text layer.

Result: one extraction pass gets everything.

FAQ

Will formatting come along?

Plain text is just words. Formatting is lost. Use PDF to Word for formatted output.

What about column layouts?

Layout-aware extractors read each column top to bottom before moving on to the next column. Simpler ones read straight across the page, mixing lines from side-by-side columns, so multi-column PDFs sometimes need cleanup.
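To see why reading order matters, here's a toy Python sketch that orders two-column text the way a layout-aware extractor would: the whole left column top to bottom, then the right column. It assumes you already know each line's position and an x coordinate that divides the columns; the `Line` type and `column_order` function are hypothetical, and y here is measured downward from the top of the page (native PDF coordinates run upward from the bottom).

```python
from dataclasses import dataclass


@dataclass
class Line:
    x: float   # left edge of the line, in points from the page's left side
    y: float   # top edge, in points from the page's top (assumed convention)
    text: str


def column_order(lines: list[Line], column_split: float) -> str:
    """Return text for a two-column page: left column first, then right.

    Assumption: exactly two columns, separated at x == column_split.
    """
    left = sorted((ln for ln in lines if ln.x < column_split), key=lambda ln: ln.y)
    right = sorted((ln for ln in lines if ln.x >= column_split), key=lambda ln: ln.y)
    return "\n".join(ln.text for ln in left + right)
```

Sorting the same lines by y and then x would interleave them as A1, B1, A2, B2, which is the jumbled result a naive extractor produces.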

Can I extract text from a password-protected PDF?

Unlock it first, then extract. You'll need the password to unlock it.

Are tables extracted as text or structure?

Plain text extraction flattens tables to text. For structured output, use PDF to Excel.
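If you only have plain text, a flattened table can sometimes be partly recovered. A rough Python heuristic, assuming the extractor left at least two spaces between columns and no cell contains a double space; `flattened_table_to_csv` is hypothetical and no substitute for PDF to Excel.

```python
import csv
import io
import re


def flattened_table_to_csv(text: str) -> str:
    """Split each non-empty line on runs of 2+ whitespace characters and emit CSV.

    Assumption: columns are separated by at least two spaces and single
    spaces only occur inside cells.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for line in text.splitlines():
        if line.strip():
            # csv.writer quotes cells that contain commas or quotes.
            writer.writerow(re.split(r"\s{2,}", line.strip()))
    return buffer.getvalue()
```

Merged, wrapped, and empty cells all break this heuristic, since it can't tell an empty column from a wide gap.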

Text extraction is the start of so many other workflows. Use Flint's PDF to Word converter for clean output, and run OCR first if it's a scan.

Try it now

Drop a PDF in and you'll be done in seconds — no install, files private to your account.
