Legal
The views expressed in this blog are my own and do not necessarily reflect the views of Adobe Systems Incorporated.
May 17, 2008
Generating TIFF and Text files from PDF for Concordance and Summation
Adobe is the custodian for both PDF and TIFF (Tagged Image File Format) formats.
While PDF is superior in many ways, TIFF remains a popular format for use in large case litigation support systems such as Concordance and Summation.
If you have a lot of PDFs in your production, working with these systems can be a challenge: they do not robustly support PDF, so conversion is necessary. For each document, these systems want to ingest a:
- TIFF file to represent each individual document page
- TEXT file of the text of each page
Processing several hundred documents to individual TEXT and TIFF files is a candidate for some serious automation!
Fortunately, repetitive tasks like this can be easily accomplished using Acrobat Professional. Since Acrobat can be automated using JavaScript, it is possible to string together several steps and save a lot of time.
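To make the idea concrete, here is a minimal sketch of what such a per-page loop might look like in Acrobat JavaScript. It is not the downloadable script from this article, and it has not been tested against any particular Acrobat version. The helper names (`padPage`, `pageOutputNames`, `exportPages`), the `_0001` naming convention, and the output folder are all assumptions made for illustration. The `extractPages`, `saveAs` and `closeDoc` calls and the two conversion IDs are from the Acrobat JavaScript API as I understand it; `saveAs` needs a privileged context such as a batch sequence, and Acrobat may adjust the final file names it writes for image exports.

```javascript
// Zero-pad a page number so the files sort correctly (7 -> "0007").
function padPage(n, width) {
  var s = String(n);
  while (s.length < width) {
    s = "0" + s;
  }
  return s;
}

// Build the TIFF and TEXT file names for one page (pageIndex is 0-based).
// The "<name>_0001" convention is an assumption, not a Concordance or
// Summation requirement.
function pageOutputNames(baseName, pageIndex) {
  var stem = baseName + "_" + padPage(pageIndex + 1, 4);
  return { tiff: stem + ".tif", text: stem + ".txt" };
}

// Walk every page of an open document, split it out as a one-page
// document, and save that page once as TIFF and once as plain text.
// outDir is a hypothetical device-independent folder path ending in "/".
function exportPages(doc, outDir) {
  var baseName = doc.documentFileName.replace(/\.pdf$/i, "");
  var written = [];
  for (var p = 0; p < doc.numPages; p++) {
    var names = pageOutputNames(baseName, p);
    // With no cPath, extractPages returns the new one-page document.
    var single = doc.extractPages({ nStart: p, nEnd: p });
    single.saveAs({ cPath: outDir + names.tiff, cConvID: "com.adobe.acrobat.tiff" });
    single.saveAs({ cPath: outDir + names.text, cConvID: "com.adobe.acrobat.plain-text" });
    single.closeDoc(true); // close the temporary page without prompting
    written.push(names);
  }
  return written;
}
```

In a batch sequence you might call `exportPages(this, "/c/output/")`, where the folder is whatever location you choose. Returning the list of names makes it easy to log what was produced for each document.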
In this article, I've included a downloadable Tiff-Text Processing Batch Script that handles all of this conversion automatically. Here are the results:
October 27, 2005
Batch OCR using Acrobat Professional
Have you ever received a PDF file that did not contain searchable text? You may know that you can use Acrobat’s OCR (Optical Character Recognition) to add an invisible layer of searchable text on top of the file. This allows you to select, copy and search the text of a scanned paper document. Great!
What do you do when you have hundreds of TIFFs and image-only PDF files that you need to search for a big case? Working with these documents one at a time is not efficient.
If you have Acrobat Professional, you can batch OCR and let your computer do the work for you.
Read on to learn how…