Have you ever received a PDF file that did not contain searchable text? You may know that you can use Acrobat’s OCR (Optical Character Recognition) to add an invisible layer of searchable text on top of the file. This allows you to select, copy and search text on a paper document. Great!
What do you do when you have hundreds of TIFFs and image-only PDF files that you need to search for a big case? Working with these documents one at a time is not efficient.
If you have Acrobat Professional, you can batch OCR and let your computer do the work for you.
NOTE: Acrobat 9 and up make this process much easier. Simply select Document>OCR Text Recognition>OCR Multiple Files. If you have Acrobat 9 and you just want to OCR a bunch of files, this is probably all you need! Acrobat X can do OCR as part of an Action, so you can combine OCR with other operations as part of a document processing workflow.
Read on to learn how…
Batch Processing to the Rescue
There are two steps to follow:
- Set up a Batch Sequence
- Run a Batch Sequence
Set up a Batch Sequence
Scan your documents locally or send them to a PC where Acrobat Pro is installed.
If you have the capability, scan directly to PDF or to an MTIFF (multi-page TIFF). These formats allow all of the pages of a document to be maintained as a single file.
- In Acrobat Professional 7, choose Advanced > Batch Processing
– or –
In Acrobat Professional 8, choose Advanced > Document Processing > Batch Processing
- Click the New Sequence button.

- Give the sequence a name.
- Click Select Commands

- Choose Recognize Text Using OCR and click the Add button.

- Double-click the Recognize Text using OCR text (right side of the window) to set OCR Options.
- Set Downsample Images to 300 dpi, then click OK.

- Click OK again to get back to the main window.
- Click Output Options
Note:
Output Options allows you to specify where the OCR’d files should be written. I suggest writing them to a local drive and copying them later to a network store.
- Enable PDF Optimizer and Do not overwrite existing files.
- Click the Settings Button.
Adjust the settings to make the smallest possible files, especially for Black and White (monochrome) files: JBIG2 Lossless is very efficient and preserves the exact appearance of the text.
Consider trying JBIG2 Lossy, which causes some visual degradation but can be up to 70% smaller than JBIG2 Lossless.
- Click OK.
- Give the revised settings a name such as “B&W Lossy”.
Run a Batch Sequence
Now, all you need to do is to run the batch sequence.
- Place all the files you wish to process in a single folder on your hard drive.
- In Acrobat Professional 7, choose Advanced > Batch Processing
– or –
In Acrobat Professional 8, choose Advanced > Document Processing > Batch Processing
- Select the sequence to run
- Click OK
- Select the folder to process
- Click the Select button.
- Select the Output Folder
That’s it!
Sit back and enjoy a cup of coffee as Acrobat does the work for you.
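If your scans are scattered across several subfolders, the first step above (placing everything in a single folder) can be scripted outside Acrobat. Here is a minimal Python sketch of that idea. It is not an Acrobat feature: the function name `gather_scans` and the folder arguments are made up for illustration, and it copies files rather than moving them so the originals stay put.

```python
import shutil
from pathlib import Path

# File types mentioned in this article: PDFs and (multi-page) TIFFs.
SCAN_EXTENSIONS = {".pdf", ".tif", ".tiff"}


def gather_scans(source_root, batch_folder):
    """Copy every PDF/TIFF found under source_root into one flat batch_folder.

    Returns the list of copied files. Name clashes get a numeric suffix
    (scan.pdf, scan_1.pdf, ...) instead of overwriting anything.
    """
    source_root = Path(source_root).resolve()
    batch_folder = Path(batch_folder).resolve()
    batch_folder.mkdir(parents=True, exist_ok=True)

    copied = []
    # sorted() takes a snapshot first, so files we create are not revisited.
    for path in sorted(source_root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in SCAN_EXTENSIONS:
            continue
        if batch_folder in path.parents:
            continue  # already inside the batch folder
        target = batch_folder / path.name
        counter = 1
        while target.exists():
            target = batch_folder / f"{path.stem}_{counter}{path.suffix}"
            counter += 1
        shutil.copy2(path, target)
        copied.append(target)
    return copied
```

Point the function at the top-level scan folder and an empty batch folder, then select that batch folder when you run the sequence.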

I have a question for you regarding batch processing of OCR. I am trying to convert a large group of .pdf files to searchable .pdf. However, when I follow the batch processing steps that you outline here [http://blogs.adobe.com/acrolaw/?p=118], the software still makes me hit OK after each document is processed. Is there any way to have it automatically convert all 20,000 files w/o pressing OK 20,000 times?
————
Rick’s Reply
You don’t want to use the instructions from this article since it only applies to PDF Portfolios. Instead, use the instructions here: http://blogs.adobe.com/acrolaw/?p=7
This is great stuff. Do you know of a way to take the same process as described in your article, except make it so that any PDF which makes its way into a “monitored folder” automatically gets an OCR layer?
Example might make this clearer. Suppose you had a high speed network scanner dumping files to a shared directory called “\\LAW-SERVER1\SCANS\NEEDS-OCR”.
I am looking for a way to have the software automatically detect there is a new .pdf file in “\\LAW-SERVER1\SCANS\NEEDS-OCR” and then have it OCR’d, and as soon as the OCR process is complete, have it forwarded to “\\LAW-SERVER1\SCANS\ALREADY-BEEN-OCR” (obviously the names are just for illustration).
In any case, if this can be done with acrobat, or acrobat in combination with some other process it would be amazingly useful for us and I am sure many other attorneys who don’t want (or don’t have the resources) to spend thousands on enterprise type scanning software.
—– Rick’s Reply —-
Acrobat doesn’t support hot folders, but you can easily set up a Batch Sequence that will take everything in one folder, OCR it, optionally rename it, optimize it and then place the files in another folder.
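Since Acrobat itself has no hot folders, one workaround is a small script that watches the scanner's drop folder and moves finished scans into the folder your Batch Sequence processes. The Python sketch below does only that staging step. It does not run OCR or control Acrobat, and the names `stage_new_scans`, `watch`, and the 30-second "file is finished" threshold are assumptions for illustration.

```python
import shutil
import time
from pathlib import Path


def stage_new_scans(incoming, staging, min_age_seconds=30.0, now=None):
    """Move PDFs that have stopped changing from incoming to staging.

    A file counts as finished when its modification time is at least
    min_age_seconds old (an assumed heuristic for "the scanner is done").
    Returns the list of files moved.
    """
    incoming, staging = Path(incoming), Path(staging)
    staging.mkdir(parents=True, exist_ok=True)
    now = time.time() if now is None else now

    moved = []
    for pdf in sorted(incoming.iterdir()):
        if not pdf.is_file() or pdf.suffix.lower() != ".pdf":
            continue
        if now - pdf.stat().st_mtime < min_age_seconds:
            continue  # possibly still being written by the scanner
        target = staging / pdf.name
        if target.exists():
            continue  # never overwrite; leave it for a manual check
        shutil.move(str(pdf), str(target))
        moved.append(target)
    return moved


def watch(incoming, staging, poll_seconds=60.0):
    """Poll forever, staging finished scans once per poll interval."""
    while True:
        stage_new_scans(incoming, staging)
        time.sleep(poll_seconds)
```

With the folder from the question, you might call `watch(r"\\LAW-SERVER1\SCANS\NEEDS-OCR", r"C:\OCR-Staging")`, where the staging path is hypothetical. Someone still has to run the Batch Sequence on the staging folder from inside Acrobat.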
What happens when a PDF already has OCR made?
Can I avoid rework?
—————- Rick’s Reply ——————————
That depends on what you mean. If the PDF is a “PDF Normal” file, such as one converted directly from Word, Acrobat will not OCR it. Acrobat can OCR a scanned file more than once. In fact, you might want to do that when upgrading to a newer version of Acrobat that offers more accurate OCR, like Acrobat X. The exception is ClearScan OCR. Once you use this “flavor” of PDF, you cannot re-OCR the file.
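If you would rather not feed already-searchable files into the batch at all, you could pre-sort the folder before running the sequence. The Python sketch below uses a deliberately crude heuristic that is my assumption, not anything Acrobat does: it treats a PDF as having text if the raw bytes mention a `/Font` resource, since image-only scans usually carry no fonts. It can misjudge files whose internal structure is compressed, so treat the result as a hint only. The function names are made up for illustration.

```python
from pathlib import Path


def looks_searchable(pdf_path):
    """Crude guess: True if the raw PDF bytes mention a /Font resource.

    Image-only scans typically have no fonts. PDFs that store their
    objects in compressed streams may hide the marker, so a False
    result does not prove the file lacks text.
    """
    return b"/Font" in Path(pdf_path).read_bytes()


def split_by_text_layer(folder):
    """Return (needs_ocr, has_text) lists of the PDF files in folder."""
    needs_ocr, has_text = [], []
    for pdf in sorted(Path(folder).iterdir()):
        if pdf.is_file() and pdf.suffix.lower() == ".pdf":
            (has_text if looks_searchable(pdf) else needs_ocr).append(pdf)
    return needs_ocr, has_text
```

You could then move only the `needs_ocr` files into the folder the Batch Sequence processes.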
How can I straighten my documents?
—- Rick’s Reply —-
Use Optimize Scanned PDF.
The routine is great, but when I run the sequence I have to keep hitting Enter at every file. Can I run this routine without having the system keep asking me to hit Enter all the time? I would appreciate your comments.
No, sorry. Just hit ENTER to speed it up.
If I have my Sequence created, is there a way to call it from a command-line? I would like to schedule this to run on a scheduled basis on a server rather than for a person to have to start the process.
In addition, if it is possible to run via command-line, can I supply a folder name to search as well as a folder to place completed OCR’d files? That would come in very handy. Thanks for any info!
Acrobat does not support scheduled batching or the command line, sorry.
What affects the speed of searching with OCR’d files? I download large, old 600-page books: not tabular, nothing special, mostly textbooks. In some, searches for a name take seconds; in others, minutes. Is it possible to embed some kind of preprocessed index? In my case, the additional size isn’t particularly an issue.
Yes. See: http://blogs.adobe.com/acrolaw/2007/06/full_text_search_of_pdf_using_ad/
This is AWESOME! Thanks for the help!!!
Is it possible to automatically OCR a document at the time of the creation/conversion when I print to PDF Printer?
When you use an electronic source like Word, Excel, PowerPoint, etc., the files created are always searchable. IOW, you don’t need to OCR files created from electronic (non-paper) sources.
What is the difference between “optimized scanned PDF” and “PDF Optimizer?” Should I add “optimize scanned PDF” to my batch OCR routine (in addition to PDF Optimizer) and if so, before or after OCR?
Thanks!!
Good question and different use cases. Optimize Scanned PDF applies adaptive compression, creating B&W and color areas and OCRing the document. It is best for color documents. The PDF Optimizer is a general and powerful tool for changing resolution, compression and removing file items.
Great blog/topic, and you/Rick came SO close to addressing my question in the last post. In Adobe 8 Pro, is there a way to run a batch so that the docs are processed via “Optimize Scanned PDF”, i.e. both OCR’d and optimized/reduced in file size? It’s strange to me that this is not a default setting option in the batch processing dialog box, as I would think this would (and/or should) be one of the first things users (and attorneys in particular) would want to do… Thanks a lot in advance.
Yes, you can OCR and then use the PDF Optimizer via a Batch Process in Acrobat 8.
Quick update: I deleted the previous job and re-created it, and the process works fine now, with no confirmation boxes for every document. One difference was a triple “…” in the OCR options drop-down box, although I cannot find any documentation of what the triple full stop refers to. Hope this helps someone!
This works in version 9, but what do I do with Acrobat X, as the sequence functionality was removed? I used to OCR, optimize and add a watermark in 1 sequence. Now, with Acrobat X, I cannot do this.
You can do that in Acrobat X using the Action Wizard. It’s tons easier, too!
Is there a way to set up a scheduled task for doing this? As an example, I want to scan a set folder every night and have those files OCRed and moved to another. Any ideas for an easy way to do this?
AutoBatch from http://www.evermap.com
Hello, is there a method to schedule batch OCR to start at a specified time i.e. overnight when networks are quiet? We have Pro X. Thanks in advance.
No, however evermap.com has a product called AutoBatch that can.