Posts in Category "OCR"

Batch OCR Multiple Files Easily

Most regulatory agencies require that scanned PDFs submitted to them be searchable. Optical Character Recognition (OCR) is what makes a scanned PDF text-searchable.

In a previous article on Batch OCR, I discussed using Acrobat Pro to create a Batch Sequence to process multiple files.

Acrobat 9 Standard and Pro now offer an easy way to OCR multiple files with a new feature, Recognize Text in Multiple Files using OCR.

Read on to learn how to use this new feature.


Batch OCR using Acrobat Professional

Paper, or its digitally scanned equivalent, is still a large component of regulatory filings for many pharmaceutical firms.

Note that the FDA encourages submission of PDF documents created from computer applications instead of scanned PDFs.

The FDA offers this guidance on the CDER site in a PDF document called Portable Document Format Specifications:

"Avoid image based PDF files whenever possible. PDF documents created directly from an electronic source such as a word processing file provides many advantages over PDF documents created by scanning paper documents. Scanned documents are more difficult to read and do not allow the reviewer to search or copy and paste text for editing in other documents."

Unfortunately, paper is an unavoidable part of submissions for many firms.

To assist agency reviewers, it is a best practice to use Optical Character Recognition (OCR) to create a searchable PDF document from scanned originals.

Background: Making Paper Searchable

Most devices that scan to the PDF format produce an image-only PDF. An image-only PDF contains a picture of a page (the scan) in a PDF wrapper; it does not contain searchable text.
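As a rough illustration of the difference, here is a minimal Python sketch that guesses whether a file is image-only. It rests on one assumption: text in a PDF, including an invisible OCR layer, needs a font resource, so a file with no /Font entry anywhere probably holds only page images. The function name looks_image_only is my own, and this is a heuristic, not a validator: it will misjudge encrypted files and streams that use a filter other than Flate.

```python
import re
import zlib

# Matches the body of each PDF stream object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def looks_image_only(pdf_bytes: bytes) -> bool:
    """Heuristic: return True when no /Font reference can be found."""
    # Fonts referenced in the uncompressed part of the file.
    if b"/Font" in pdf_bytes:
        return False
    # Newer PDFs may keep resource dictionaries inside Flate-compressed
    # object streams, so inflate whatever can be inflated and look again.
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data, e.g. a JPEG page image
        if b"/Font" in inflated:
            return False
    return True
```

To try it, read a file with open(path, "rb").read() and pass the bytes in. A result of True suggests the file still needs OCR.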

Acrobat’s OCR (Optical Character Recognition) feature, available from a menu selection, adds an invisible layer of searchable text to the scanned page to assist document reviewers.

Acrobat Standard can perform OCR on only one document at a time.

Acrobat Professional, however, can create Batch Sequences, which automate OCR across many files.

In this article, I offer step-by-step instructions to create a batch sequence that allows for the efficient processing of hundreds or thousands of documents.
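For readers who think in code, the idea behind a batch run can be sketched in a few lines of Python. This is not Acrobat's Batch Sequence feature or any Acrobat API. It only shows the general pattern: walk a folder, apply one OCR step to every PDF, and record failures instead of stopping. The names batch_ocr and ocr_one are hypothetical, and ocr_one stands in for whatever OCR tool you actually call.

```python
from pathlib import Path
from typing import Callable, List, Tuple


def batch_ocr(
    folder: Path, ocr_one: Callable[[Path], None]
) -> Tuple[List[Path], List[Tuple[Path, str]]]:
    """Apply ocr_one to every PDF under folder; return (done, failed)."""
    done: List[Path] = []
    failed: List[Tuple[Path, str]] = []
    # rglob("*") descends into subfolders; sorting keeps runs repeatable.
    for pdf in sorted(folder.rglob("*")):
        if not (pdf.is_file() and pdf.suffix.lower() == ".pdf"):
            continue
        try:
            ocr_one(pdf)  # hypothetical per-file OCR step supplied by the caller
            done.append(pdf)
        except Exception as exc:
            # One bad scan should not halt a run of thousands of files.
            failed.append((pdf, str(exc)))
    return done, failed
```

Collecting failures in a list rather than raising means a long run can finish unattended, and the problem files can be reviewed afterward.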

Note that this article does not cover every aspect of the FDA’s guidance for PDF creation, but I plan to address additional topics in the future.
