Troubleshooting Acrobat OCR

Searchable PDFs are critical in litigation and matter management. Using Acrobat’s OCR function, you can turn mountains of paper into searchable PDFs that look just like the original.

Occasionally, you may run into some issues.

Read on to learn about some workarounds and key considerations.

Acrobat OCR Troubleshooting

Acrobat OCR generally works well, but occasionally you might run into the following problems:

Slow Processing

Solutions:

Read and Write Locally
Make sure your source files are read from, and the OCR output is written to, local volumes. Reading from or writing to the network, a CD, or a DVD is much slower. If you are short on space, try using an external USB 2.0 drive.
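If your files live on a network share, one way to follow this advice is to copy them to a local working folder before you OCR them. The sketch below is a Python illustration only, not an Acrobat feature. The function name stage_locally and the "ocr_work_" folder prefix are made up for this example, and it assumes the system temp folder sits on a local drive.

```python
import shutil
import tempfile
from pathlib import Path


def stage_locally(source_dir, local_root=None):
    """Copy every PDF in source_dir to a fresh folder on a local drive.

    Returns the path of the local folder. OCR the copies there, then
    move the results back to the network share when you are done.
    """
    source = Path(source_dir)
    # mkdtemp creates a new, empty folder. With local_root=None it is
    # placed in the system temp directory (assumed to be a local volume).
    local = Path(tempfile.mkdtemp(prefix="ocr_work_", dir=local_root))
    for pdf in sorted(source.glob("*.pdf")):
        shutil.copy2(pdf, local / pdf.name)  # copy2 also keeps timestamps
    return local
```

Call stage_locally with the path of the network folder, point Acrobat at the folder it returns, and copy the finished PDFs back afterward.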

Input Resolution
Have you scanned above 300 dpi? 600 dpi? There are diminishing returns on OCR accuracy above 600 dpi.

Output Resolution
I generally recommend that you downsample after OCR. For example, scanning at 600 dpi yields slightly better accuracy than scanning at 300 dpi, but downsampling back to 300 dpi to make a smaller PDF can add 20% or more to your conversion times.
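To get a feel for why resolution matters so much for processing time, it helps to count pixels. The short Python sketch below assumes a letter-sized page (8.5 by 11 inches). The function name is just for illustration.

```python
def pixels_per_page(dpi, width_in=8.5, height_in=11.0):
    """Number of pixels in one scanned page at the given resolution."""
    return round(width_in * dpi) * round(height_in * dpi)


for dpi in (300, 600):
    print(dpi, "dpi:", pixels_per_page(dpi), "pixels")

# Doubling the resolution quadruples the number of pixels to process.
ratio = pixels_per_page(600) / pixels_per_page(300)
print("600 dpi has", ratio, "times the pixels of 300 dpi")
```

A 600 dpi page carries four times the pixels of a 300 dpi page, so every step that touches the image has four times as much data to work through.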

Did you scan in color or greyscale?
Scanning B&W documents in color mode results in dramatically bigger files. Acrobat cannot convert color documents to black and white. (Adobe Photoshop can, and in batch if you need it.) An image-only, black and white, letter-sized document should almost never be more than a 50 KB PDF if properly compressed. If your PDFs are a lot bigger than this, check your scanning settings.
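The difference between scan modes is easy to see in the raw numbers. The Python sketch below assumes the usual bit depths (1 bit per pixel for black and white, 8 for greyscale, 24 for color) and a letter-sized page at 300 dpi. The looks_oversized helper is hypothetical, and it reads the 50 KB figure above as a per-page average, which is my assumption.

```python
BITS_PER_PIXEL = {"bw": 1, "grey": 8, "color": 24}  # common scanner modes


def raw_bytes(mode, dpi=300, width_in=8.5, height_in=11.0):
    """Uncompressed size of one scanned page, in bytes."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * BITS_PER_PIXEL[mode] // 8


def looks_oversized(file_bytes, page_count, limit_kb=50):
    """True if the average page is bigger than limit_kb kilobytes."""
    return file_bytes / page_count > limit_kb * 1024


for mode in BITS_PER_PIXEL:
    print(mode, raw_bytes(mode), "bytes before compression")

# A made-up example: a 10-page scan that came out at 4 MB
# averages about 410 KB per page, far above the 50 KB guideline.
print(looks_oversized(4 * 1024 * 1024, page_count=10))
```

Before compression, a color page holds 24 times the data of a black and white page, which is why the scan mode has such a large effect on the final PDF.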

Large PDFs

Solutions:

Scan in Black and White
To limit the size of your PDFs, make sure you do not scan in color.

Use the PDF Optimizer in Acrobat Professional
Take advantage of JBIG2 Lossy compression to create smaller PDFs. Most incoming image-only PDF files use CCITT Group 4 fax compression. This compression flavor was designed for fax machines with limited processing power. It was great technology . . . in 1980. Choose Advanced > PDF Optimizer.

Use Optimize Scanned PDF in Acrobat Standard
This new feature of Acrobat 8 makes it easy to reduce the file size of scanned images. It can also deskew scanned pages and remove dirt and other specks. Choose Document > Optimize Scanned PDF.

Scan to Size
If your scanner supports it, choose automatic page size if you regularly scan documents smaller than 8.5 by 11 inches. Remember that PDF documents can support multiple page sizes. Scanning a business card at letter size makes a larger file than it needs to be.

Slow Scanning

Solutions:

Buy a Faster Scanner
If you are using a scanner that is more than three years old, it may be time to upgrade. Newer units are dramatically faster. Consider buying a dedicated document scanner. I like the Fujitsu ScanSnap (about $400 street price), which includes a full version of Acrobat Standard. The Fujitsu can scan 15 double-sided pages per minute directly to PDF Image-only format! The input bin can hold 50 pages. The downside of the ScanSnap is that it is not compliant with TWAIN or ISIS (two standard methods that applications use to communicate with scanners), so it cannot be used directly from Create PDF from Scanner in Acrobat, or with Photoshop, etc.

I also wrote an article about the Canon DR-2580C. This scanner may be used directly from Acrobat and works particularly well with Acrobat 8. The DR-2580C scans at 25 double-sided pages per minute.
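To put those speeds in perspective, here is a rough Python estimate of how long a large job would take on each scanner. The 2,500-sheet box is a hypothetical figure chosen for the example, and the calculation ignores time spent reloading the feeder, clearing jams, and removing staples.

```python
import math


def scan_minutes(sheets, sheets_per_minute):
    """Minutes of pure scanning time, ignoring reloads and jams."""
    return sheets / sheets_per_minute


def feeder_loads(sheets, bin_capacity=50):
    """How many times the input bin must be filled."""
    return math.ceil(sheets / bin_capacity)


BOX_SHEETS = 2500  # hypothetical size of one box of documents

# Rated double-sided speeds quoted above, in sheets per minute.
for name, rate in (("ScanSnap", 15), ("DR-2580C", 25)):
    print(f"{name}: {scan_minutes(BOX_SHEETS, rate):.0f} minutes")

# The ScanSnap's input bin holds 50 sheets at a time.
print("ScanSnap feeder loads:", feeder_loads(BOX_SHEETS))
```

Real-world times will be longer, but the comparison shows how quickly a faster feeder pays off on big jobs.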

Use a Scanning Service Bureau
Send out those banker’s boxes of documents to a local scanning provider. They can return Image-only PDFs for you to OCR. If your service bureau offers OCR’d PDFs, make sure you test them first. In many cases, we’ve found that selecting OCR’d text on the PDF is iffy. Ask them what kind of image compression they use. Test the documents to see if they are tagged. Most times, you’ll get better results OCRing in Acrobat.

Acrobat Won’t OCR Your File Because It Contains Renderable Text

You’ll see the Renderable Text Error when the PDF you are trying to OCR has vector elements on it like stamps, annotations or Bates Numbers. It’s a particular problem with federal court files that are image-only PDFs with stamped Bates numbers.

Solutions:

Install Acrobat 8.1 or Higher
Acrobat 8.1 allows OCRing of documents which contain vector elements within margins defined as 20% of the width/height of the page. See my Acrobat 8.1 Article for more details. This fix accommodates almost all Bates numbered PDFs received from the courts.
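The 20% rule can be pictured with a little geometry. The Python sketch below is only my reading of the rule, not Acrobat’s actual test: it treats an element as "within the margins" if its bounding box sits entirely inside a band along one edge of the page that is 20% of the page width or height deep. The function name and the example coordinates are made up. Coordinates are in points with the origin at the bottom left, so a letter page is 612 by 792.

```python
def in_margin_zone(box, page_w, page_h, fraction=0.20):
    """True if box lies fully inside one of the four edge bands.

    box is (x0, y0, x1, y1) with x0 < x1 and y0 < y1.
    """
    x0, y0, x1, y1 = box
    band_x = page_w * fraction  # depth of the left and right bands
    band_y = page_h * fraction  # depth of the top and bottom bands
    in_left = x1 <= band_x
    in_right = x0 >= page_w - band_x
    in_bottom = y1 <= band_y
    in_top = y0 >= page_h - band_y
    return in_left or in_right or in_bottom or in_top


# A Bates number stamped in the bottom right corner of a letter page.
print(in_margin_zone((450, 20, 590, 40), 612, 792))

# A stamp in the middle of the page.
print(in_margin_zone((200, 380, 400, 420), 612, 792))
```

Under this reading, the corner Bates number falls inside the margin zone and the mid-page stamp does not, which matches the observation that almost all Bates numbered court PDFs now go through.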

Remove Headers and Footers or Bates Numbers
Go to Document > Add Headers and Footers and remove all entries. This solution only works if the headers and footers or Bates numbers were stamped using Acrobat.

Remove the Header/Footers Manually
You can select and delete the vector elements by choosing Tools > Advanced Editing > Touchup Object Tool in Acrobat Professional.

Another option is to use the Redaction Tools in Acrobat 8 Professional to remove them.

Error When Using Batch OCR

You may encounter this error during Batch OCR (Acrobat Professional only) when using the PDF Optimizer:

Warning Message from PDF Optimizer

“Settings which allow retaining the document’s PDF Version, cannot be processed.”

Solution:

Set the PDF Optimizer to Version 5 or higher
Note that Image+Text PDFs are supported only in Acrobat 5 (PDF 1.4) and higher. Do not use the Retain Existing option.
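If you want to know ahead of time which files in a batch are older than PDF 1.4, you can read the version from the first line of each file, which starts with "%PDF-" followed by the version number. The Python sketch below is a rough check only. The function names are made up, and it looks only at the file header, so it can miss a newer version recorded elsewhere in the file.

```python
import re


def pdf_header_version(path):
    """Return the (major, minor) version from a PDF file header, or None."""
    with open(path, "rb") as f:
        head = f.read(1024)  # the header sits at the very start of the file
    match = re.search(rb"%PDF-(\d)\.(\d)", head)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_pdf_1_4_or_higher(path):
    """True if the header reports PDF 1.4 (Acrobat 5) or later."""
    version = pdf_header_version(path)
    return version is not None and version >= (1, 4)
```

Files that fail this check are the ones most likely to trigger the warning if the Retain Existing option is left on.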

7 Responses to Troubleshooting Acrobat OCR

  1. IMRAN says:

    I have found that the Adobe Acrobat 7.0 OCR function does not work as well as Abbyy FineReader 7. I would like to completely disable the OCR’ing ability in Adobe Acrobat. Is there a way?

    If it is not possible to completely disable the OCR function can either one of these be removed? 1) Create PDF from Scanner 2) Document – Recognize Text.
    Rick’s Reply:
    Acrobat 7 is six plus years old, so the OCR is not state of the art. Upgrade!

  2. rick says:

    When i open acrobat 8 prof and open pdf you cannot optimizer it. what would the problem be the optimizer is not light up so you can select it.
    Rick’s Reply:
    You must open a file first.

  3. When I try to use the option Document > Optimize Scanned Pdf I get an error: Running Scanned Image Optimization is not possible on this document. I am trying to link to this document and the link doesn’t work after I copy to a CD. Any help would be appreciated.
    Rick’s Reply:
    My guess is . . . 1) The file is locked or on a non-writable volume or is a PDF Normal Document. 2) You need to ensure that links follow the exact same directory structure within a subfolder or they will not work when moved to CD.

  4. Glenn Slayden says:

    You can get around the error “Scanned Image Optimization is not possible…” by re-printing the document to the Adobe printer, and then optimizing the resulting PDF file.

  5. Glenn Slayden says:

    To be clear, I reprinted the original PDF file from within Acrobat, to the PDF printer…

  6. Maggie says:

    When I run the optimizer on a pdf with several pages, some of the pages will straighten out but others won’t. I have tried running the optimizer on individual pages and it doesn’t help. I don’t get any error messages, it all looks like it ran properly, it just doesn’t correct the rotation of the page.

    • Rick Borstein says:

      Not sure what could be going wrong on this. There aren’t any options for deskewing. It is either on or off.