Adobe is the custodian for both PDF and TIFF (Tagged Image File Format) formats.
While PDF is superior in many ways, TIFF remains a popular format for use in large case litigation support systems such as Concordance and Summation.
If you have a lot of PDFs in your production, working with these systems can be a challenge: they do not robustly support PDF, so conversion is necessary. These systems want to ingest a . . .
- TIFF file to represent each individual document page
- TEXT file of the text of each page
Processing several hundred documents to individual TEXT and TIFF files is a candidate for some serious automation!
Fortunately, repetitive tasks like this can be easily accomplished using Acrobat Professional. Since Acrobat can be automated using JavaScript, it is possible to string together several steps and save a lot of time.
In this article, I’ve included a downloadable Tiff-Text Processing Batch Script that handles all of this conversion automatically.
What does the script do exactly?
The TIFF-TEXT Processing script performs the following steps:
- Tags the file for accessibility and text reflow.
This should make the text files easier to review in litigation support products.
- Splits the PDF into individual PDFs by page
- Exports the individual PDFs as TIFF
- Exports the individual PDFs as Text files
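For the curious, the split-and-export steps can be sketched roughly like this. This is not the code inside the downloadable sequence, just a minimal illustration that assumes the Acrobat JavaScript Doc methods extractPages, saveAs and closeDoc. The function name processPages and the file naming scheme are made up for the example, and the tagging step is left out.

```javascript
// Hypothetical sketch, not the code from the downloadable sequence.
// doc: an Acrobat Doc object (inside a batch sequence this is `this`).
// destPath: device-independent folder path ending in "/", e.g. "/c/dest/".
function processPages(doc, destPath) {
  // Base name of the source file without its .pdf extension.
  var base = doc.documentFileName.replace(/\.pdf$/i, "");
  var requested = [];
  for (var i = 0; i < doc.numPages; i++) {
    // Zero-padded page number so the files sort in page order.
    var page = ("0000" + (i + 1)).slice(-4);
    var stem = destPath + base + "_" + page;

    // Pull page i (zero-based) out into a new one-page document.
    var single = doc.extractPages({ nStart: i, nEnd: i });

    // Save the one-page PDF, then export it as TIFF and as plain text.
    single.saveAs({ cPath: stem + ".pdf" });
    single.saveAs({ cPath: stem + ".tif", cConvID: "com.adobe.acrobat.tiff" });
    single.saveAs({ cPath: stem + ".txt", cConvID: "com.adobe.acrobat.plain-text" });

    // Close the temporary document without saving again.
    single.closeDoc(true);

    requested.push(stem + ".pdf", stem + ".tif", stem + ".txt");
  }
  return requested;
}
```

Inside a batch sequence's Execute JavaScript command the call would look like processPages(this, "/c/dest/"). Acrobat may name exported image files differently from the path you pass, so treat the returned list as the requested paths rather than a guarantee of what lands on disk.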
Have you OCRd your files first? Acrobat can’t export text if the file hasn’t been OCRd first. Check out this article on Batch OCR.
What’s Covered . . .
- Installing the script
- Setting related Acrobat Preferences
- Tweaking the Script
- Running the Script
- Troubleshooting
Download the Sequence File
Below is a PDF file containing the sequence file. Select the file in the Attachments panel of the PDF and click the Save button to extract it.
Installation_instructions (52K)
Installing the Sequence File for Acrobat 8
The instructions below have been tested with Acrobat 8.
Caution
The above obligatory warning aside, it seems to work.
- Quit Acrobat if it is open.
- Extract the sequence file contained in the Installation_instructions PDF to your desktop or other location you can find easily.
The sequence file is called Tiff-Text Processor.sequ
- Select the Tiff-Text Processor.sequ file, right-click and choose Copy to place the file on the clipboard
- Place the file in the following location:
WINDOWS
C:\Documents and Settings\<username>\Application Data\Adobe\Acrobat\8.0\Sequences
MAC OSX PPC
/Users/<username>/Library/Acrobat User Data/8.0_ppc/Sequences
MAC OSX INTEL
/Users/<username>/Library/Acrobat User Data/8.0_x86/Sequences
- Restart Acrobat
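Note
Can’t see Files on Windows?
The Application Data folder is hidden by default. To show it, turn on "Show hidden files and folders" on the View tab of Folder Options.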
Set TIFF Conversion Preferences
The majority of case documents are represented well as B&W TIFFs at 300 dpi resolution. Acrobat’s default preference, however, is to make an intelligent conversion of the document, which could result in grayscale or color TIFFs . . . these can be really large!
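To put rough numbers on "really large", here is a back-of-the-envelope calculation of the uncompressed image data for one page. It assumes a US Letter page, and the helper name rawPageBytes is made up for this illustration. Real TIFFs are compressed (monochrome ones typically very heavily), so actual files are smaller, but the ratio between the color modes is the point.

```javascript
// Hypothetical helper: uncompressed size in bytes of one scanned page.
// Each pixel row is padded to a whole number of bytes.
function rawPageBytes(widthInches, heightInches, dpi, bitsPerPixel) {
  var widthPx = Math.round(widthInches * dpi);
  var heightPx = Math.round(heightInches * dpi);
  var bytesPerRow = Math.ceil(widthPx * bitsPerPixel / 8);
  return bytesPerRow * heightPx;
}

// US Letter (8.5 x 11 inches) at 300 dpi is 2550 x 3300 pixels.
var mono = rawPageBytes(8.5, 11, 300, 1);   // 1-bit B&W:       1,052,700 bytes
var gray = rawPageBytes(8.5, 11, 300, 8);   // 8-bit grayscale: 8,415,000 bytes
var color = rawPageBytes(8.5, 11, 300, 24); // 24-bit color:   25,245,000 bytes
```

Before compression, a color page carries about 24 times the data of a monochrome one. Multiply that by a few thousand pages and the case for Monochrome is clear.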
Let’s make some changes:
- Choose Edit->Preferences . . .
(Acrobat->Preferences . . . on the Mac)
- In the Preferences window
A) Choose Convert from PDF
B) Choose TIFF
C) Click the Edit Settings button
- Make the following changes in the Settings Window:
A) Change Colorspace to Monochrome
B) Change Resolution to 300 pixels/inch
Click OK
- Set Batch Conversion Preferences.
A) Click on the Batch Processing category (far left)
B) Enable "Save warnings and errors in log file"
Click OK
Sometimes a file may not convert properly. You can view a log file created by Acrobat to help with troubleshooting.
Destination Locations
One thing you should know about the script: the destination folder is hard-wired.
If you run the script sample as-is, it will prompt you to find the PDFs to process and then write individual PDFs, TIFFs and Text files into a folder at C:\dest.
Windows:
At the very least, you will need to create the "dest" folder at the root level of your C drive to use the script.
Mac:
On my Mac, I was surprised that Acrobat actually created a folder at /C/dest.
Still, you will probably want more control over where the files go. See below.
Changing the Destination Location
To change the place where files will be written:
- Choose Advanced->Document Processing->Batch Processing . . .
A) Scroll down to find the Tiff-Text Processor sequence
B) Click Edit Sequence . . .
- Click the Select Commands . . . button in the Edit Batch Sequence window
- In the Edit Sequence window:
A) Choose Execute JavaScript from the list on the right and
B) Click the Edit button.
- In the JavaScript editor window, scroll down to find the line:
var destPath = "/c/dest/"
What’s in a path?
/c/dest/ represents the drive letter and path. If you want to put the transformed files into a folder on your desktop, you might change that portion to:
Windows
/c/Documents and Settings/USERNAME/Desktop/FOLDERNAME
Macintosh
/Users/USERNAME/Desktop/FOLDERNAME
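If you would rather not convert paths by hand, the rewriting can be sketched as a small helper. This is only an illustration: toDevicePath is a made-up name, not part of Acrobat or of the downloadable script. It turns a Windows-style path into the slash-separated form shown above and adds the trailing slash used in the original destPath value. It does not handle network (UNC) paths.

```javascript
// Hypothetical helper: convert a native path to the slash-separated
// form used for destPath, e.g. C:\dest becomes /c/dest/.
// Network (UNC) paths such as \\server\share are not handled.
function toDevicePath(nativePath) {
  var path;
  // Windows drive path such as C:\dest or C:/dest.
  var m = /^([A-Za-z]):[\\\/]?(.*)$/.exec(nativePath);
  if (m) {
    path = "/" + m[1].toLowerCase() + "/" + m[2].replace(/\\/g, "/");
  } else {
    // Already slash-separated (Mac), or a relative Windows path.
    path = nativePath.replace(/\\/g, "/");
  }
  // Collapse accidental double slashes and ensure a trailing slash.
  path = path.replace(/\/{2,}/g, "/");
  if (path.charAt(path.length - 1) !== "/") {
    path += "/";
  }
  return path;
}
```

For example, toDevicePath("C:\\Documents and Settings\\USERNAME\\Desktop\\FOLDERNAME") gives /c/Documents and Settings/USERNAME/Desktop/FOLDERNAME/, which can be pasted into the destPath line.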
Source File Location
It isn’t always convenient to have to select the file(s) for conversion, especially if your document production spans several nested folders.
To process a folder and all subfolders within:
- Choose Advanced->Document Processing->Batch Processing . . .
A) Scroll down to find the Tiff-Text Processor sequence
B) Click Edit Sequence . . .
- In the Edit Batch Sequence window, change the following:
A) Change Run commands on to "Selected Folder"
B) Click the Browse button and locate your source folder.
Click OK.
Warning!
Do not change the Output location via the window above or the script will not work. Leave this as "Ask When Sequence is Run".
Running the Sequence
This part is easy!
- Choose Advanced->Document Processing->Batch Processing . . .
A) Scroll down to find the Tiff-Text Processor sequence
B) Click the Run Sequence button
- Acrobat will prompt you to select files if you did not change the Source File location.
- Acrobat will display the Run Confirmation window.
You can turn this off in Preferences (Batch Processing category).
- Acrobat will process the files. This could take a while!
- Open your destination folder to view the results.
Troubleshooting
The script is not perfect. Here are the issues I have run into:
- If no text is created from a file, ensure that it has been OCRd.
- You might receive messages that files could not be tagged or were already tagged. Generally, you can ignore these.
- Not all PDF forms can be saved as TIFF. You might need to flatten them first using the PDF Optimizer.
- Corrupted PDFs may cause a crash
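With hundreds of pages it is easy to miss a page that produced no TIFF or no text. The check below is a minimal sketch for spotting them from a plain list of file names in the destination folder. It is meant to run outside Acrobat, and the function name findIncompletePages is made up. It assumes each page's PDF, TIFF and text file share the same base name, which may not match what Acrobat actually writes, so adjust the pattern to your output.

```javascript
// Hypothetical check: report page base names that lack a TIFF or a text file.
// Assumes the PDF, TIFF and text file for a page share one base name.
function findIncompletePages(fileNames) {
  var has = Object.prototype.hasOwnProperty;
  var byStem = {};
  for (var i = 0; i < fileNames.length; i++) {
    var m = /^(.*)\.(pdf|tiff?|txt)$/i.exec(fileNames[i]);
    if (!m) continue; // ignore unrelated files such as logs
    var ext = m[2].toLowerCase();
    var kind = ext === "pdf" ? "pdf" : (ext === "txt" ? "txt" : "tiff");
    if (!has.call(byStem, m[1])) byStem[m[1]] = {};
    byStem[m[1]][kind] = true;
  }
  var incomplete = [];
  for (var stem in byStem) {
    if (!has.call(byStem, stem)) continue;
    if (!(byStem[stem].tiff && byStem[stem].txt)) incomplete.push(stem);
  }
  return incomplete.sort();
}
```

In Node.js, for instance, the list could come from require("fs").readdirSync(folder). Any name it reports is a good candidate for re-running OCR or flattening the form before trying again.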
I need to convert Word, Excel, etc. to PDF
You can use Batch Processing to convert any type of file supported by Acrobat to PDF. You would need to conduct this operation as a separate batch sequence before running the TIFF-Text Processor script. Regrettably, Acrobat doesn’t allow you to chain together PDF Creation and secondary processing.
How do I learn more about scripting?
AcrobatUsers.Com Javascript Corner
Thank you, Leonard!
Thanks to Leonard Rosenthol, Adobe’s Technical Standards Evangelist, for his help in developing this script. I don’t know anybody who knows more about the technical intricacies of PDF than Leonard!