Why and How to Make Your PDFs Searchable

An annotated PDF in the iAnnotate app on an iPad

As I noted last week, PDF is my preferred file format for document sharing, for a number of reasons. Not all PDFs are created equal, though. I’ve found that it’s really important for files to be run through OCR (Optical Character Recognition).

Why? There are two main reasons, in my experience:

  1. Searchability. Kathleen wrote about this several years ago, in “OCR Those PDFs.” Increasingly, I find myself working with journal articles and other documents in digital format, and I need to be able to search those files. The ability to search also, as Kathleen notes, makes it far easier to annotate PDFs on a tablet or computer.

  2. Accessibility. If I’m sharing PDFs with colleagues or students, I want them to be able to use them, and not everyone will be able to use a PDF that’s just an image file. Screen reading software won’t be able to read text from it. If I want everyone I’m working with to be able to use those PDFs, I need to be sure they’ve been OCRed.

Fortunately, most of the PDF files I encounter already have been. But what if I encounter one that hasn’t, or I’m creating a document to share as a PDF?
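One rough way to tell whether a PDF has already been OCRed is to try extracting text from it and see whether anything comes back. The sketch below assumes the `pdftotext` command-line tool (part of Poppler) is installed; that tool, the function names, and the 20-character threshold are all choices made for this illustration rather than anything from Kathleen's post, and the check is only a heuristic.

```python
import shutil
import subprocess


def looks_searchable(extracted_text: str, min_chars: int = 20) -> bool:
    """Heuristic: True if the text has at least min_chars non-whitespace characters."""
    # split() with no arguments drops all whitespace, including the
    # form feeds that pdftotext emits between pages.
    visible = "".join(extracted_text.split())
    return len(visible) >= min_chars


def pdf_has_text_layer(pdf_path: str) -> bool:
    """Extract text with pdftotext (assumed to be installed) and apply the heuristic."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on PATH")
    # Passing "-" as the output file sends the extracted text to stdout.
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return looks_searchable(result.stdout)
```

Calling `pdf_has_text_layer("article.pdf")` (a placeholder filename) returning `False` would suggest the file is image-only and worth running through OCR. A scan with a poor-quality text layer could still pass, so it's worth spot-checking by searching the file for a word you can see on the page.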

In the post linked above, Kathleen offers some good suggestions. Mac users might also consider using an app such as PDFScanner. Though it’s primarily a scanning app, it also allows users to import an existing PDF and run it through OCR. It’s very affordable ($15, the last time I checked), and works well. (If readers are aware of anything similar for Windows or Linux users, please add suggestions in the comments below!)

When creating a PDF document, it’s easy to make sure the file contains real, searchable text rather than just an image of the page. Those who like to write in Markdown will find that the PDFs Pandoc creates from their Markdown files are searchable. Those who prefer more traditional office applications can create a searchable PDF by choosing that file format under File > Save As in Microsoft Office; in LibreOffice the option is under File > Export as PDF. I expect other office suites have similar functions.
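For those who want to script the Markdown route, here is a minimal sketch of calling Pandoc from Python. It assumes Pandoc is installed and on your PATH; Pandoc's default PDF output also relies on a LaTeX engine being installed. The function names and the filenames are placeholders made up for this example.

```python
import shutil
import subprocess
from typing import List


def pandoc_pdf_command(markdown_path: str, pdf_path: str) -> List[str]:
    """Build the Pandoc command; Pandoc infers PDF output from the .pdf extension."""
    return ["pandoc", markdown_path, "-o", pdf_path]


def markdown_to_pdf(markdown_path: str, pdf_path: str) -> None:
    """Run Pandoc to convert a Markdown file into a PDF with real text."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    # check=True raises CalledProcessError if Pandoc exits with an error,
    # for example when no LaTeX engine is available.
    subprocess.run(pandoc_pdf_command(markdown_path, pdf_path), check=True)
```

Running `markdown_to_pdf("notes.md", "notes.pdf")` is equivalent to typing `pandoc notes.md -o notes.pdf` in a terminal. Because the PDF is typeset from the text itself, no OCR step should be needed afterward.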

CC-licensed photo by Flickr user Morten Oddvik
