Full Text Search of PDF using Adobe Acrobat

Lately, everyone’s been asking me to help them find themselves…

After a talk at the Missouri Solo and Small Firm conference, I chatted with a solo real estate attorney who asked for my advice on developing a searchable article archive from the materials he had collected over the years. “How do I find the articles I need?” he asked.

I also talked to a lawyer who took on a pro bono criminal defense case. “How can I find where my client is mentioned in all the police records I was sent?” she asked.

And, at the LegalTech West show, a workers’ compensation investigator asked how to search medical records. “How can I apply notes to these handwritten medical records and find them later?” he asked.

In this article, I’ll discuss how to use Acrobat Professional to create a full-text index so you can find what you need… fast!


Searching Beyond the Text of the Document

Acrobat can find text in the following parts of a PDF:

  • Text of the document (regular or OCR)
  • Title, Subject, Author, Keywords (metadata)
  • Notes and Annotations
  • Bookmarks
  • PDF Attachments
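To picture what a full-text index does, here is a conceptual Python sketch built on a toy in-memory model: it maps each word to the documents, and the parts of each document (body text, metadata, bookmarks, comments), where that word appears. It is not Acrobat's .pdx format or search engine, and the file names and contents are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical sample documents: each maps a "part" of the PDF
# (body text, metadata, bookmarks, comments) to the text it contains.
DOCUMENTS = {
    "police_report_01.pdf": {
        "text": "Officer interviewed Donald Smith at the scene.",
        "bookmarks": "Witness statements",
        "comments": "Check alibi timeline",
    },
    "medical_record_07.pdf": {
        "text": "Patient reports lower back pain.",
        "metadata": "Author: Dr. Jones; Keywords: lumbar",
        "comments": "Donald mentioned prior injury",
    },
}

# Simple tokenizer: runs of lowercase letters, digits, and apostrophes.
WORD = re.compile(r"[a-z0-9']+")


def build_index(documents):
    """Map each lowercase word to the set of (document, part) pairs containing it."""
    index = defaultdict(set)
    for doc_name, parts in documents.items():
        for part, text in parts.items():
            for word in WORD.findall(text.lower()):
                index[word].add((doc_name, part))
    return index


def search(index, word):
    """Return hits grouped by document, e.g. {'a.pdf': ['text', 'comments']}."""
    hits = defaultdict(list)
    for doc_name, part in sorted(index.get(word.lower(), ())):
        hits[doc_name].append(part)
    return dict(hits)


if __name__ == "__main__":
    idx = build_index(DOCUMENTS)
    print(search(idx, "Donald"))
```

Searching for "Donald" finds one hit in the body text of one document and one hit in a comment on another, which is why notes you add during review become findable later.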

So, what does this mean for legal professionals?

  1. You can quickly find words or phrases across multiple documents, helping you locate key facts, names, places, and other details in the text.
  2. You can capture your thinking about a document in the PDF itself, using bookmarks and comment tools as you review it.
  3. You can later find documents by the notes and knowledge you’ve applied to them.

That’s powerful.

Setting up for Search

Step 1: Make sure your documents are text-searchable by Acrobat

  • Use Acrobat Optical Character Recognition (OCR) if you have paper documents or image-only PDFs in your document collection.
  • Convert electronic files, such as word processing documents and spreadsheets, to PDF.

Step 2: Locate and Segregate Documents

Depending on the type of project you have, you may wish to move similar documents to individual directories.

For example, let’s say you have accumulated several years of legal research on trusts. You may wish to segregate the documents by state or issue.

If you are indexing client files, you may wish to index by client or perhaps even by matter.

[Figure: Granularity illustration]

There’s no right or wrong way to organize your documents, but you do need to strike a balance between how much time you spend organizing your files and how easy it is to find what you need.
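If your files already follow a consistent naming scheme, a short script can do some of the segregating for you. The Python sketch below assumes a hypothetical Client_Matter_Description.pdf naming convention and moves each PDF into a Client/Matter subdirectory; files that don't match are left where they are. It defaults to a dry run so you can review the planned moves first.

```python
import shutil
from pathlib import Path


def segregate_by_prefix(source_dir, dry_run=True):
    """Move PDFs named like 'Client_Matter_Description.pdf' (a hypothetical
    convention) into source_dir/Client/Matter/.

    Returns the list of (old_path, new_path) moves. With dry_run=True,
    nothing is moved; you only get the plan.
    """
    source = Path(source_dir)
    # Only top-level files; match .pdf case-insensitively (.pdf, .PDF, ...).
    pdfs = sorted(
        p for p in source.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"
    )
    moves = []
    for pdf in pdfs:
        parts = pdf.stem.split("_")
        if len(parts) < 3:
            continue  # name doesn't follow the convention; leave it alone
        client, matter = parts[0], parts[1]
        target = source / client / matter / pdf.name
        moves.append((pdf, target))
        if not dry_run:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(pdf), str(target))
    return moves
```

For example, call segregate_by_prefix("C:/Research/Trusts") and review the returned list, then run it again with dry_run=False. Keeping the grouping rule in one function makes it easy to switch between, say, client-level and matter-level folders as you find the right balance.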

Create an Index

Follow these steps to create a full-text search index using Acrobat Professional. The menu path varies by version:

  1. In Acrobat X, open the Tools pane, open the Document Processing section, and choose Full Text Index with Catalog.
    In Acrobat 9, choose Advanced > Document Processing > Full Text Index with Catalog.
    In the Catalog window, click the New Index button.
  2. The Build Index window will appear. In it:
    1) Give the index a name.
    2) Enter a description of the index.
    3) Choose the directory to be indexed. All subdirectories will be indexed, too.
    4) Click the Build button.
  3. Acrobat will create a .pdx (index) file at the top level of the directory you specified. Click the Save button to save it.
  4. The Index Progress window will appear while Acrobat builds the index. Note that Acrobat will skip any documents secured with an Open password.
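Because Acrobat indexes every subdirectory and skips password-protected files, it can help to preview what a build will cover. The Python sketch below is not part of Acrobat; it lists every PDF under a folder and flags files whose bytes contain an /Encrypt entry. That check is a rough heuristic: /Encrypt marks any PDF security, including permissions-only security with no Open password, so it may flag files that do not actually need a password to open.

```python
from pathlib import Path


def preview_index_scope(root_dir):
    """List PDFs under root_dir, including all subdirectories, as
    (relative_path, looks_encrypted) pairs.

    looks_encrypted is a heuristic: True if the raw bytes contain b"/Encrypt",
    which appears in PDFs that use any kind of security. It may over-report
    files that Acrobat would skip for lacking an Open password.
    """
    root = Path(root_dir)
    report = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() == ".pdf":
            # Reads the whole file into memory; fine for a quick preview.
            looks_encrypted = b"/Encrypt" in path.read_bytes()
            report.append((path.relative_to(root).as_posix(), looks_encrypted))
    return report
```

Running it against the folder you plan to index shows which files will be included and which ones to check before you click Build.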

Attaching to the Index and Searching

Follow these steps to attach to the index you created:

  1. Choose Edit > Search, or use the keyboard shortcut:

    Windows: Control-Shift-F
    Macintosh: Command-Shift-F

    Acrobat will split your screen between the Search window and the Document window.

  2. In the Search window on the left, click Advanced Search at the bottom.
  3. In the Advanced Search panel, click the Look In menu and choose Select Index.
  4. The Index Selection window will appear. Click the Add button, then locate the index file (.pdx) that you created earlier. Normally, Acrobat will find it for you automatically.
  5. Click OK.

Searching the Index

Once you select an index, Acrobat will keep it selected so you can search against it.

In the example below, we are searching for Donald, the first name of one of the parties to the case:
[Figure: Using the Advanced Search options]

Click the Search button.

Acrobat will present a list of documents that match the search criteria:

[Figure: Search results list]

Interpreting and Using Results

Remember that Acrobat can search not only the text of documents, but comments and bookmarks, too.

Click the + sign to view the hits on each document.

Acrobat gives you useful visual indicators in the search window, and what it does when you click a result depends on where the hit was found:

  • Text of the document: the word is highlighted in the document.
  • Bookmark: the Bookmarks panel opens and the bookmark is highlighted.
  • Comment: the comment is selected and opened; for example, a text note opens.

Search Tip

You can save a step when searching by setting Acrobat to always use the Advanced Search options. Go to Edit > Preferences > Search and check Always use Advanced Search Options.

Final Thoughts

In this article, you learned how to create an index to search across multiple PDF documents.

Acrobat 8 added a new indexing capability: you can embed a full-text index in:

  • A single PDF document
  • A PDF package

This gives you the ability to have an index that travels with a PDF wherever you send it.

For example, you could create a trial notebook package complete with index and give it to your colleague to take to court or mediation.


3 Responses to Full Text Search of PDF using Adobe Acrobat

  1. Sue Labriola says:

    Received a CD of closing documents with an index created using this method. We are attempting to save it to FileSite. However, after we do so, the bookmarks do not work.

    Have attempted to create a PDF portfolio with the individual files only and it still doesn’t work properly. Somehow they are still linked to the Index file.

    Looking for instructions as to how to “unbuild” the index so that we can create a PDF portfolio that isn’t linked to the PDX.

    Rick’s reply:
    Links are relative, so all the files must stay in the same relative paths. That is why the links stop working when you move the files to FileSite. PDF Portfolios don’t operate like a file system, so unfortunately you cannot drag a bunch of linked files into a Portfolio and retain the paths. There are Acrobat plug-ins that can validate and relink PDFs. You might try AutoBookmark from http://www.evermap.com or ISI Toolbox from http://www.isitoolbox.com.

  2. Hey there, just wanted to give you a brief heads-up and
    let you know a few of the pictures aren’t loading correctly. I’m not sure why, but I think it’s a linking issue.
    I’ve tried it in two different browsers and both show the same results.

    • Rick Borstein says:

      Some of my older posts from many years ago are missing pix. I correct them as I find them.