“What exactly are the OCR improvements in Acrobat X?”
That question came in recently from one of our field people. As a marketing guy, I just take engineering’s word for everything [not really]. If they tell me that the OCR is improved, I say “ok” and move on [again, not really - this is just a rhetorical conceit]. So I asked, and they told me.
To begin with, there’s a new OCR engine in there, and it’s about 1% more accurate than the previous one. Now, I know that one percent doesn’t sound like much, but when you’re already close to 100%, that extra one percent is a big deal. There are also some process and usability improvements.
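To put rough numbers on why one percent matters, here is a small back-of-the-envelope sketch. The figures are assumptions for illustration only, not measurements from engineering: it treats the "1%" as one percentage point, picks a hypothetical starting accuracy of 98%, and assumes a page of 2,000 characters.

```python
def expected_errors(chars: int, accuracy_pct: float) -> float:
    """Expected number of misread characters at a given accuracy (in percent)."""
    return chars * (100 - accuracy_pct) / 100


# Hypothetical values chosen for illustration, not Acrobat's real figures.
CHARS_PER_PAGE = 2000
OLD_ACCURACY = 98   # assumed accuracy of the previous engine
NEW_ACCURACY = 99   # one percentage point better

old_errors = expected_errors(CHARS_PER_PAGE, OLD_ACCURACY)
new_errors = expected_errors(CHARS_PER_PAGE, NEW_ACCURACY)

print(f"Old engine: about {old_errors:.0f} errors per page")
print(f"New engine: about {new_errors:.0f} errors per page")
```

Under those assumptions, one extra point of accuracy cuts the number of characters someone has to correct by hand in half. The closer the starting point is to 100%, the larger that relative drop becomes.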
In Acrobat 9, there was a two-step process: import the image and then run the OCR. Because we’d automatically optimize the image for file size on import, there may have been a small amount of data loss before the OCR ever ran. Visually, this isn’t a big deal, but for OCR it can mean the difference between an 8 and a B.
In Acrobat X, we import the image, run the OCR against the lossless version, and then optimize the image for storage in the PDF file. You still get the smaller file size, but the OCR runs on the highest-quality image. This is true whether you connect directly to a scanner or drag and drop an image onto Acrobat.
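For the programmers in the audience, the change in ordering can be sketched as a toy model. None of this is Acrobat code: `Scan`, `optimize`, and `ocr` are hypothetical stand-ins, and the "misread 8 as B on a lossy image" rule is a deliberately crude assumption that only exists to show why the order of the steps matters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scan:
    text: str    # what is actually printed on the page
    lossy: bool  # True once the image has been optimized for file size


def optimize(scan: Scan) -> Scan:
    """Toy stand-in for file-size optimization: marks the image as lossy."""
    return Scan(scan.text, lossy=True)


def ocr(scan: Scan) -> str:
    """Toy stand-in for OCR: misreads '8' as 'B' when the image is lossy."""
    return scan.text.replace("8", "B") if scan.lossy else scan.text


def optimize_then_ocr(scan: Scan):
    """Order described for Acrobat 9: OCR sees the optimized image."""
    stored = optimize(scan)
    return stored, ocr(stored)


def ocr_then_optimize(scan: Scan):
    """Order described for Acrobat X: OCR sees the lossless image."""
    recognized = ocr(scan)
    return optimize(scan), recognized


page = Scan("Invoice 2048", lossy=False)
old_stored, old_text = optimize_then_ocr(page)
new_stored, new_text = ocr_then_optimize(page)

print(old_text)  # Invoice 204B
print(new_text)  # Invoice 2048
```

In both cases the stored image ends up optimized, so the file size is the same. The only difference is which version of the image the OCR step gets to look at.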
I hope that gives you some insight into the OCR improvements.