Legal
The views expressed in this blog are my own and do not necessarily reflect the views of Adobe Systems Incorporated.
August 3, 2009
Reducing the File Size of Scanned PDFs
It seems like a lot of folks are struggling with the size of scanned PDFs. Below are excerpts from two emails I received recently:
My [Fujitsu] ScanSnap makes PDFs that are too big . . . like around 60K per page! What can I do to make these smaller in Acrobat?
I have to eFile [with the Federal Court] and am having to split the filings into many segments to go through the [Court] gateway. The issue seems to be with documents that are scanned on our network scanner. PDFs produced directly from Word are a lot smaller. Is there some trick to reduce the size of scanned files?
Before covering how to reduce the size of scanned documents in detail, let's discuss four factors that affect the size of scanned images:
- Scanning Resolution
A scan at 600 dpi results in a much larger file than at 300 dpi.
- Color Space
Color and grayscale files result in much larger files than black and white files.
- Physical dimensions of the scanned page
A legal-size scan will be larger than a letter-size scan, with all other factors being equal.
- Compression
Raw scan data can be compressed to make it smaller.
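To make the first three factors concrete, here is a rough back-of-the-envelope sketch in Python. It estimates the uncompressed size of a single scanned page from resolution, color space, and paper size. The function name and lookup tables are my own illustration, and the arithmetic ignores details such as row padding, so treat the numbers as approximate.

```python
# Rough, uncompressed size of one scanned page, before any compression.
# Paper sizes are in inches. Bits per pixel: 1 = black and white,
# 8 = grayscale, 24 = color. Names here are illustrative only.

PAPER_SIZES_IN = {"letter": (8.5, 11.0), "legal": (8.5, 14.0)}
BITS_PER_PIXEL = {"bw": 1, "gray": 8, "color": 24}


def raw_scan_bytes(dpi: int, paper: str = "letter", mode: str = "bw") -> int:
    """Return the approximate uncompressed size in bytes of one page."""
    width_in, height_in = PAPER_SIZES_IN[paper]
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * BITS_PER_PIXEL[mode] // 8


if __name__ == "__main__":
    examples = [
        (300, "letter", "bw"),
        (600, "letter", "bw"),
        (300, "letter", "gray"),
        (300, "letter", "color"),
        (300, "legal", "bw"),
    ]
    for dpi, paper, mode in examples:
        size_kb = raw_scan_bytes(dpi, paper, mode) / 1024
        print(f"{dpi} dpi, {paper}, {mode}: about {size_kb:,.0f} KB raw")
```

Doubling the resolution quadruples the pixel count, and moving from black and white to grayscale multiplies the raw data by eight. Compression, the fourth factor, is what brings these raw figures down to the per-page sizes you actually see in a PDF.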
Compression Types
Lossless compression retains the exact appearance of the original. Two common types of lossless compression are ZIP and CCITT Group 4.
Lossy compression makes some (hopefully) non-noticeable visual trade-offs to further reduce file size. JPEG is a common lossy compression method.
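As a small illustration of what "lossless" means, the sketch below uses Python's standard zlib module, which implements the Deflate algorithm used by ZIP. The "page" is synthetic data I made up (mostly white with a few dark bands), not a real scan, so the compression ratio is only indicative. The point is that the decompressed bytes match the original exactly.

```python
import zlib


def make_fake_page(width_bytes: int = 319, rows: int = 3300) -> bytes:
    """Build a synthetic 1-bit page: blank rows with a dark band every 50 rows."""
    white_row = bytes(width_bytes)            # all zero bits
    dark_row = bytes([0xFF]) * width_bytes    # all one bits
    return b"".join(dark_row if r % 50 == 0 else white_row for r in range(rows))


raw = make_fake_page()
packed = zlib.compress(raw, level=9)   # lossless Deflate compression
restored = zlib.decompress(packed)

print(len(raw), "bytes raw")
print(len(packed), "bytes compressed")
print("identical after round trip:", restored == raw)
```

A lossy method such as JPEG would not pass that final equality check, which is the trade-off it makes for smaller files.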
Ideally, you would control all of the above factors yourself by scanning at 300 dpi in black and white and using an efficient compression algorithm.
Unfortunately, you may not have that option. Many desktop and network scanners offer limited or confusing options, or the scanned PDFs may have arrived from outside your firm.
Legal Scanning Recommendations
For the purpose of this article we will make a couple of assumptions:
- You have a black and white scanned document of unknown dpi and compression
- You have already OCR'd the document, or don't need OCR
Read on to learn how to reduce the file size of scanned documents using Acrobat.
July 12, 2009
Using the Fujitsu S510 ScanSnap with Adobe Acrobat
My sister Sue is seven years older than me and— as she occasionally will point out— seven years wiser.
Sue is a family therapist and works with a number of clients. She's been in practice for well over twenty years and consequently has a large number of paper files. The state where she works mandates that she must keep these files for seven years.
Thus, it wasn't surprising when I received an e-mail from her asking if I could suggest ways for her to go paperless.
It immediately occurred to me that Sue's needs might not be unlike those of the typical solo attorney or small firm.
My suggestion was to use an inexpensive Fujitsu ScanSnap scanner to scan in her client files. Fortunately, I just happened to have a ScanSnap S510 sitting in my office. This would be the perfect test environment to develop a workflow and best practices for scanning in client records.
At about $400, the S510 comes with a full version of Acrobat Standard (that's worth $299 right there) and has a rated scanning speed of 20 double-sided pages per minute.

Perhaps this is not the most elegant way to describe this device, but it is sort of a beginner's scanner. Unlike more expensive devices, you cannot control the S510 directly from Acrobat or other applications because it lacks a TWAIN or ISIS driver.
The lack of TWAIN doesn't mean that this isn't a useful device. The ScanSnap S510 is a great scanner, but you do need to understand how to use it to best advantage.
The ScanSnap S510 has since been replaced by the S1500. The Fujitsu ScanSnap S1500 Deluxe Bundle includes Acrobat 9 Standard and updated versions of the applications mentioned in this article.
Read on to learn how to set up and use the scanner. I've even included a downloadable PDF version of this article.
May 5, 2009
Better PDF OCR. ClearScan is smaller, looks better
Optical Character Recognition (OCR) converts scanned paper documents into searchable PDF documents. This technology has been available in Acrobat for about ten years.
While OCR accuracy and language support have improved over the years, the default OCR "flavor"— Searchable Image— was the only useful choice.
Searchable Image retains the underlying scanned image and adds an invisible layer of text on top which may be selected:
Searchable Image OCR has some shortcomings:
- File Size
For 300 dpi black and white scans, a typical file size is 15-40K per page. Scanning at higher resolutions (600 dpi vs. 300 dpi) increases file size about three to four times.
- Print Speed
Because of the image-heavy content, searchable image PDFs can take a long time to print.
- Visual Quality
At 300 dpi, scanned documents are easily distinguishable in quality from computer-generated files.
In Acrobat 9, Adobe engineers added a new flavor of OCR called ClearScan. ClearScan offers improved text quality with a decrease in file size:
I've recently completed some benchmarking which shows dramatic file size decreases and quality gains. Read on to learn about size comparisons, how to use ClearScan OCR and a bit more about how it all works.
April 15, 2008
Creating a Non-Searchable PDF from Office Documents
Every once in a while, I receive an email that has me scratching my head a bit, such as this one:
When you PDF a document that you generate in MS Word, is there a way to produce an "image-only" PDF, with non-searchable text? The only way I know how is to print out and scan the document back into Acrobat.
Why would someone want to take a perfectly good, fully-searchable document and turn it into an image-only PDF which is just a picture of the page in a PDF wrapper?
The answer is that in the course of vigorously defending a client, some firms desire to make using documents as difficult as possible for the other side.
Compared with the other PDF flavors, an image-only PDF . . .
- Is 3 to 5 times larger in file size
- Looks worse on screen
- Prints slower
- Is not searchable
"Dumbing down" a PDF to an image probably doesn't cripple the other side very much. Using OCR, the other side can quickly make the document searchable.
It is not without some trepidation that I share this tip. After all, compact, searchable PDF should be what we all aspire to create.
However, since I suspect that many firms are printing out documents and rescanning them, I want to offer a greener alternative.
It's not for me to comment on whether this is fair game or not as you work with the other side, but following is a workaround that will create an image-only, non-searchable PDF from an existing PDF document.
December 10, 2007
Cleaning up Scanned Images
I recently received this message from a legal technology consultant:
I have had several clients (and have wondered myself) why there’s no way to delete something from a PDF. For example, if I scan a document and want to delete the black marks made by the staple holes in the top left corner, I can’t do that without cropping the entire image. What is the reasoning for not including a feature that would allow me to draw a box around those staple holes and delete them from the image?
Actually, Adobe did include a feature to clean up scanned images!
You can easily clean up scanned images using the Redaction tool:
Normally, redactions appear as a black box which obscures the underlying document. Did you know that Acrobat can redact to "No Color" as well?
In this article, I'll offer step-by-step instructions for cleaning up scanned PDFs using the Redaction tool in Acrobat 8 Professional.
Using this workflow, you can easily delete staple marks, hole punches, shadows, dirt and more from PDFs.
July 28, 2007
Rick's Scanning Article on LLRX.com
LLRX.com is a great destination for legal professionals.
This independent web journal for legal, library, and marketing professionals is, amazingly, a one-woman operation!
Sabrina Pacifici is the brains behind this site which receives more than 130,000 unique readers each month.
When Sabrina asked me to contribute an article that would appeal to a large number of legal pros, I immediately thought about scanning and OCR.
Almost everyone has to find a way to get paper documents into their computer.
Read on for a link to the article and a brief summary.
June 13, 2007
Acrobat 8.1 Update: Fix for Renderable Text Issue
Normally, a dot release to one of Adobe’s major product offerings isn’t that exciting.
Besides offering support for Microsoft Windows Vista and Microsoft Office 2007, the latest dot release to Acrobat 8 (v8.1) offers an OCR enhancement that will be very welcome indeed!
Acrobat 8.1 offers a fix to a most vexing OCR problem— the dreaded renderable text error:

Renderable Text is vector (computer generated) text that is placed on top of an image layer.
You may encounter this error when you try to OCR an image-only PDF containing a Bates stamp. In some federal court districts, stamped image-only PDFs are commonly distributed.
The Acrobat 8.1 Update offers a fix that works for just about every file that has Bates stamps.
For a complete list of fixes in the 8.1 Update, check out this Adobe Knowledge Base Article.
Read on to learn how to get the Acrobat 8.1 Update and some limitations of the fix.
June 12, 2007
Troubleshooting Acrobat OCR
Searchable PDFs are critical in litigation and matter management. Using Acrobat's OCR function, you can turn mountains of paper into searchable PDFs that look just like the original.
Occasionally, you may run into some issues.
Read on to learn about some workarounds and key considerations.
February 19, 2007
Is that PDF Searchable?
Most law firms and even solos have a scanner that can create PDF from paper documents. Overwhelmingly, these devices create image-only, non-searchable PDFs.
Using Optical Character Recognition (OCR), Acrobat can add an invisible layer of searchable text while maintaining the original appearance.
The resulting searchable file is referred to as an image+text PDF.
An image+text PDF looks no different than a PDF which is not searchable. That creates a problem.
How can you tell if a PDF is searchable or not?
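Outside of Acrobat, one way to approximate this check in a script is to try extracting text from each page and see whether anything comes back. The sketch below is only an illustration of that idea, not the method described in the article. It assumes the third-party pypdf library is installed, and the function names and the `min_chars` threshold are hypothetical choices of mine.

```python
def looks_searchable(page_texts, min_chars: int = 20) -> bool:
    """Return True if any page yielded at least min_chars of non-whitespace text."""
    return any(len("".join(text.split())) >= min_chars for text in page_texts)


def pdf_page_texts(path: str) -> list[str]:
    """Extract text page by page using pypdf (third-party: pip install pypdf)."""
    from pypdf import PdfReader  # imported here so looks_searchable works without it

    reader = PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]


if __name__ == "__main__":
    import sys

    # Usage: python check_searchable.py file1.pdf file2.pdf ...
    for path in sys.argv[1:]:
        if looks_searchable(pdf_page_texts(path)):
            print(f"{path}: appears searchable")
        else:
            print(f"{path}: probably image-only")
```

The threshold is there because an image-only PDF can still carry a little real text, such as a stamp added on top of the scan, so a handful of characters should not count as "searchable." Adjust it to suit your own files.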
October 27, 2005
Batch OCR using Acrobat Professional
Have you ever received a PDF file that did not contain searchable text? You may know that you can use Acrobat’s OCR (Optical Character Recognition) to add an invisible layer of searchable text on top of the file. This allows you to select, copy and search text on a paper document. Great!
What do you do when you have hundreds of TIFF and image-only PDF files that you need to search for a big case? Working with these documents one at a time is not efficient.
If you have Acrobat Professional, you can batch OCR and let your computer do the work for you.
Read on to learn how…


