Paper, or its digitally scanned equivalent, is still a large component of regulatory filings for many pharmaceutical firms.
Note that the FDA encourages submission of PDF documents created from computer applications instead of scanned PDFs.
The FDA offers this guidance on the CDER site in a PDF document called Portable Document Format Specifications:
Avoid image based PDF files whenever possible. PDF documents created directly from an electronic source such as a word processing file provides many advantages over PDF documents created by scanning paper documents. Scanned documents are more difficult to read and do not allow the reviewer to search or copy and paste text for editing in other documents.
Unfortunately, paper is an unavoidable part of submissions for many firms.
In order to assist agency reviewers, it is a best practice to use Optical Character Recognition (OCR) to create a searchable PDF document from scanned originals.
Background: Making Paper Searchable
Most devices that scan to the PDF format produce an image-only PDF. An image-only PDF contains a picture of each scanned page in a PDF wrapper; it does not contain searchable text.
Acrobat's OCR (Optical Character Recognition) feature adds an invisible layer of searchable text to assist document reviewers.

Acrobat Standard can perform OCR on only one document at a time.
Acrobat Professional, however, can create Batch Sequences which offer OCR automation.
In this article, I offer step-by-step instructions to create a batch sequence that allows for the efficient processing of hundreds or thousands of documents.
Note that this article does not cover every aspect of the FDA’s guidance for PDF creation, but I plan to address additional topics in the future.
Batch Processing to the Rescue
Setting up and using Batch Processing in Acrobat Professional takes only a few clicks. You will need to:
- Set up the Batch Sequence
- Run the Batch Sequence
Once you create the Batch Sequence, it may be reused for additional projects.
Setting up for Best Possible Throughput
OCR speed is dependent on three factors:
- Location of the source and destination files
Reading files from and writing files to a local PC hard drive is much faster than using a network folder.
- Processing speed of the host PC
Faster, more modern computers complete the OCR process sooner. For best results, your host PC should have at least 1 GB of RAM.
- Complexity of the source document
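If your scans arrive on a network share, one way to act on the first factor is to copy them to a local working folder before running OCR. The Python sketch below is only an illustration and is not part of Acrobat: the function name stage_scans, the list of extensions, and the example paths are my own assumptions. It copies every PDF and TIFF found under a source folder into a single flat local folder and renames files whose names collide.

```python
import shutil
from pathlib import Path

# Assumption: these are the only scan formats you want to stage.
SCAN_EXTENSIONS = {".pdf", ".tif", ".tiff"}


def stage_scans(source_dir, local_dir):
    """Copy scans from source_dir (for example a network share) into one
    flat local folder and return the list of copied destination paths.

    Assumption: local_dir is NOT located inside source_dir.
    """
    source = Path(source_dir)
    dest = Path(local_dir)
    dest.mkdir(parents=True, exist_ok=True)

    copied = []
    for path in sorted(source.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in SCAN_EXTENSIONS:
            continue
        # Avoid overwriting a file of the same name from another subfolder.
        target = dest / path.name
        counter = 1
        while target.exists():
            target = dest / f"{path.stem}_{counter}{path.suffix}"
            counter += 1
        shutil.copy2(path, target)
        copied.append(target)
    return copied


# Example call (hypothetical paths, adjust to your environment):
# stage_scans(r"\\fileserver\scans\study-001", r"C:\ocr_work\input")
```

Because the result is one flat folder, it can be used directly as the folder you select when running the batch sequence.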
Document Types for Input
Scan your documents locally, or send them to a PC where Acrobat Professional is installed.
For easiest processing, scan directly to PDF or to an MTIFF (multi-page TIFF). These formats keep all of the pages of a document in a single file.
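Before queuing a large batch, it can be worth confirming that a scanner really produced a multi-page TIFF rather than one file per page. The sketch below is an illustration under stated assumptions, not an Acrobat feature: count_tiff_pages is a hypothetical helper that walks the image file directories (IFDs) of a classic TIFF using only the Python standard library. It does not handle BigTIFF, and it counts directories rather than decoding any image data.

```python
import struct


def count_tiff_pages(path):
    """Count the image file directories (IFDs) in a classic TIFF file.

    Each page of a multi-page TIFF normally has one IFD in the main chain.
    A truncated file will raise struct.error.
    """
    with open(path, "rb") as f:
        header = f.read(8)
        if len(header) < 8:
            raise ValueError("file too short to be a TIFF")
        if header[:2] == b"II":
            endian = "<"  # little-endian ("Intel") byte order
        elif header[:2] == b"MM":
            endian = ">"  # big-endian ("Motorola") byte order
        else:
            raise ValueError("not a TIFF file")

        magic, offset = struct.unpack(endian + "HI", header[2:8])
        if magic != 42:
            raise ValueError("not a classic TIFF (BigTIFF is not handled)")

        pages = 0
        seen = set()  # guards against a corrupt file with a looping chain
        while offset != 0 and offset not in seen:
            seen.add(offset)
            f.seek(offset)
            (entry_count,) = struct.unpack(endian + "H", f.read(2))
            f.seek(entry_count * 12, 1)  # skip the 12-byte directory entries
            (offset,) = struct.unpack(endian + "I", f.read(4))
            pages += 1
        return pages
```

A result of 1 for a document you know has several pages suggests the scanner is writing single-page files, which is worth fixing before you run the batch.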
Creating a Batch Sequence
- Open the Batch Sequence window:
In Acrobat Professional 7, choose Advanced > Batch Processing
In Acrobat Professional 8, choose Advanced > Document Processing > Batch Processing
Superior OCR Features in Acrobat 8
Acrobat 8 offers significantly more accurate OCR than previous versions.
Acrobat 8 offers a total of 35 different languages including double-byte languages such as Japanese, Korean, and Chinese.
Acrobat always uses English as a secondary OCR language. For example, English words will be properly recognized when mixed in a Japanese language document.
Acrobat 8 allows OCR to work on documents to which PDF headers and footers were added. These would previously cause a "Renderable Text" error.
- Click the New Sequence button.
- Give the sequence a name.
- Click Select Commands

- Next, choose the commands to run on your file set:
+ Choose Recognize Text Using OCR and click the Add button.
+ Choose Make Accessible and click the Add button.
Make Accessible adds structure to the PDF, which allows for easier content re-use. This feature also offers visually impaired individuals the structure necessary to make best use of their screen reading software. Creation of accessible documents should be considered a best practice.
- Select Recognize Text Using OCR on the right side of the window.
Click the Edit button.

- Choose the appropriate language from the Primary OCR Language menu.
You may downsample the following document types to 300 dpi to reduce file size:
- Handwritten notes (black ink)
- Plotter output graphics
- High pressure liquid chromatography
Click OK again to get back to the main window.
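To get a feel for why downsampling helps, note that the number of pixels on a page grows with the square of the resolution. The short Python sketch below works through that arithmetic. The function names and the US Letter page size are my own choices for illustration, and actual file size also depends on compression, so treat the ratio as a rough guide rather than a prediction.

```python
def pixel_count(width_in, height_in, dpi):
    """Number of pixels in a page of the given size (inches) at a given dpi."""
    return round(width_in * dpi) * round(height_in * dpi)


def downsample_ratio(source_dpi, target_dpi=300):
    """Fraction of the original pixels that remain after downsampling.

    Returns 1.0 when the source is already at or below the target.
    """
    if source_dpi <= 0 or target_dpi <= 0:
        raise ValueError("dpi values must be positive")
    if target_dpi >= source_dpi:
        return 1.0
    return (target_dpi / source_dpi) ** 2


# A US Letter page (8.5 x 11 inches), chosen only as an example:
print(pixel_count(8.5, 11, 600))  # 33660000 pixels at 600 dpi
print(pixel_count(8.5, 11, 300))  # 8415000 pixels at 300 dpi
print(downsample_ratio(600))      # 0.25, i.e. one quarter of the pixels
```

Halving the resolution leaves one quarter of the pixels, which is why a 600 dpi scan shrinks so noticeably when downsampled to 300 dpi.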
Other Available Options
Other options in the Edit Batch Sequence window allow you to:
The Output Options button allows you to:
Need to log operations? Choose Edit > Preferences and enable "Save Warnings and Errors in Log File".
Run a Batch Sequence
To run the batch sequence:
- Place all the files you wish to process in a single folder on your hard drive.
- In Acrobat Professional 7, choose Advanced > Batch Processing
– or –
In Acrobat Professional 8, choose Advanced > Document Processing > Batch Processing
- Select the sequence to run
- Click OK
- Select the folder to process
- Click the Select button.
- Select the Output Folder
That's it! Sit back and enjoy a cup of coffee as Acrobat does the work for you.
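After a long run, you may want to confirm that every input file actually produced an output file. The Python sketch below is one way to check, under two assumptions that you should verify against your own Output Options: the output folder is separate from the input folder, and each output PDF keeps the base name of its source file. The helper name find_unprocessed is hypothetical.

```python
from pathlib import Path

# Assumption: these are the input formats used for the batch.
INPUT_EXTENSIONS = {".pdf", ".tif", ".tiff"}


def find_unprocessed(input_dir, output_dir):
    """Return the names of input files that have no PDF with the same
    base name (compared case-insensitively) in output_dir."""
    produced = {
        p.stem.lower()
        for p in Path(output_dir).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    }
    missing = []
    for p in sorted(Path(input_dir).iterdir()):
        if p.is_file() and p.suffix.lower() in INPUT_EXTENSIONS:
            if p.stem.lower() not in produced:
                missing.append(p.name)
    return missing


# Example call (hypothetical paths, adjust to your environment):
# for name in find_unprocessed(r"C:\ocr_work\input", r"C:\ocr_work\output"):
#     print("No output found for", name)
```

Any names it reports are candidates for a second pass or for manual OCR.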

Is it possible to remove OCR from a file once it has been saved?
Hold on. You cannot even get up to get a cup of coffee. 'Warnings and errors' keep popping up on almost every other file, so batch processing pauses every time. You can de-select 'show warnings and errors' in Preferences, but it does not work. Batch processing is as slow as manually doing it one by one.
And no, disabling 'show warnings and errors' in Preferences does not work!