Posts in Category "Search and Index"

Speed up PDF Search with an Embedded Index

I recently received this query from a customer:

I have assembled over 4000 pages of case data into a single PDF. When I choose Search (CTRL-F) and search for a keyword, it can take a while to find a word. Is there any way to speed up the search?

Heck yeah! Acrobat Pro allows you to embed a full-text index in a document, which greatly accelerates search. The index travels with the document (it’s embedded, duh!). An embedded index speeds up search ten to twenty times.

In this article, I’ll show you how to embed an index in a PDF. You can literally do this in a minute or two!

Note: Acrobat Pro can also create a cross-document index. I’ve written about this before.

Embedding an Index in a PDF

  1. Open the PDF in which you want to embed the index. If the PDF is a scanned document, you should OCR it first.
  2. Open the Tools Pane and click on the Document Processing section.

    NOTE: If you don’t see a Document Processing section, click the flyout menu to make the section visible.

  3. In the Document Processing section, choose Manage Embedded Index.

  4. The Build window opens. Click the Embed Index button.

  5. Depending on the size of your document, building the index may take a few seconds to a minute or two. Generally, Acrobat indexes very fast.

With that simple change, even the largest PDFs can be searched super-fast.

If you add to your PDF over time, simply update the Embedded Index following the steps above.
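If you are curious why an index makes such a difference, here is a rough Python sketch of the general idea behind any full-text index. This is not Acrobat's actual index format; the function names and sample pages are made up for illustration. Without an index, every search has to read every page. With one, the work is done once up front and each search is a single lookup.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9']+")

def build_index(pages):
    """Map each lowercased word to the sorted 1-based page numbers where it occurs."""
    index = defaultdict(set)
    for page_number, text in enumerate(pages, start=1):
        for word in WORD.findall(text.lower()):
            index[word].add(page_number)
    return {word: sorted(numbers) for word, numbers in index.items()}

def linear_search(pages, word):
    """The slow path: scan the text of every page on every search."""
    word = word.lower()
    return [number for number, text in enumerate(pages, start=1)
            if word in WORD.findall(text.lower())]

# Made-up sample pages standing in for the text of a PDF.
pages = [
    "The plaintiff filed a complaint.",
    "Preflight checks were run on the exhibit.",
    "The complaint was amended after preflight.",
]

index = build_index(pages)            # built once, like embedding the index
print(index.get("preflight", []))     # one dictionary lookup: [2, 3]
print(linear_search(pages, "preflight"))  # same answer, but reads every page
```

The sketch also hints at why the index needs rebuilding after you add pages: the lookup table only knows about the text that existed when it was built.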

Two Kinds of Search

Acrobat offers two variants of search.

FIND allows you to find the next or previous instance of a search term. You can get to Find by typing CTRL/CMD-F.

ADVANCED SEARCH returns a search results list that includes a snippet of the text in context. This is one of the best ways to quickly spot a search term. Advanced Search also includes a number of advanced search features, such as Boolean operators (AND, NOT, OR).

How to use Advanced Search in Acrobat

The best way to get the benefit of faster search with an embedded index is to use Acrobat’s Advanced Search option.

To get to Advanced Search, choose Edit > Advanced Search or type CTRL-ALT-F on Windows or CMD-OPT-F on the Mac.

In the Advanced Search window, simply type in the word or phrase you are looking for and hit the Search button.

Acrobat will return a contextual hit list of words. Below, I searched for the term “preflight” and found 254 instances in the document.


New in Acrobat X: Saved Search Results

One frequent request I’ve received over the six years I’ve had this blog is:

Can I save a report of Search Results with Acrobat?

In the past, I’ve always had to sheepishly say "no", but not any longer!

With Acrobat X, you can save search results to either an interactive PDF or to a spreadsheet file.

Creating a Search Report for a Single Document

  1. Open the document you wish to search.
  2. Choose Edit > Search

    or type:

    Windows: Control-Shift-F
    Macintosh: Command-Shift-F

    Acrobat will split your screen between the Search window and the Document window.

  3. Type your search term into the window. Here, I typed in "california".
  4. Click the Search button.
  5. The Search Results window will display.
    Click the disk icon to save your search as either a PDF or CSV file.

Creating a Search Report for Multiple Documents

  1. Place the files you wish to search in a single directory.
  2. Choose Edit > Search

    or type:

    Windows: Control-Shift-F
    Macintosh: Command-Shift-F

    Acrobat will split your screen between the Search window and the Document window.

  3. In the Search window . . .
    A) Choose "All PDF Documents in"
    B) Choose the Browse for Location option and navigate to the folder you wish to search
  4. Type in the word you wish to search for.
  5. Click the Search button.
  6. The Search Results window will display.
    Click the disk icon to save your search as either a PDF or CSV file.

Looking at the PDF Report

If you choose the PDF format to save the report, Acrobat will create a hyperlinked report document you can use to analyze the search results.

If you searched across multiple documents, Acrobat will provide a bookmark to each document report.

The Search Report document contains an extract for each instance found of your search term and a link to the source document.

Looking at the CSV File

CSV (Comma Separated Value) files may be opened in a spreadsheet program such as Microsoft Excel.

If you have installed Microsoft Office (Mac or Windows), you should be able to double-click on a CSV file to open it in Excel.

Once in Excel, you can manipulate the columns or cut and paste cells any way you want.
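If you would rather tally the results with a script than in Excel, a few lines of Python can do it. This is only a sketch: I have not documented the exact columns Acrobat writes, so the column names below ("Document", "Page", "Context") and the sample rows are hypothetical. Open your own CSV file first, check the header row, and pass in whichever column holds the file name.

```python
import csv
import io
from collections import Counter

def hits_per_document(csv_text, document_column):
    """Count how many search hits each document has in a CSV search report.

    document_column is the header of the column holding the file name.
    It is a parameter because the real header names are an assumption here.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[document_column] for row in reader)

# Made-up sample report, for illustration only.
sample_report = (
    "Document,Page,Context\n"
    'depo_smith.pdf,12,"moved to California in 2009"\n'
    'depo_smith.pdf,40,"the California office"\n'
    'lease.pdf,3,"governed by California law"\n'
)

counts = hits_per_document(sample_report, "Document")
for name, count in counts.most_common():
    print(name, count)
```

To run it against a real report, read the file with open(path, newline="", encoding="utf-8").read() and pass the text in. The encoding is also an assumption; adjust it if the file does not open cleanly.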

Continue reading…

How can I detect if a PDF needs to be OCRd?

You just received 1000 PDFs from the other side which are a mix of PDFs created from Office applications and scans. Some of the documents might have been OCRd and some not.

How can you quickly detect which files need to be OCRd?

Further, how can you pull out and separate searchable and non-searchable PDFs?

I have written on this subject previously in my article “Is that PDF Searchable?” That post included information on how to test if individual documents are searchable and offered a basic way to detect searchability across files.

Why is detecting searchability hard?
When would you call a PDF searchable? When one word is searchable? When 100 words are searchable? When a page is searchable? When all the pages are searchable? What about pictures or text inside of pictures?

I’ve been doing some research and in this article I offer up another way to check for searchable text.

To accomplish this, we will use the Preflight feature of Acrobat Pro. Acrobat’s Preflight feature offers hundreds of different tests including the ability to check for characters on the page. Preflight can be used on a single document or it can be automated using a batch sequence.

The following workflow isn’t perfect, but I offer it here to legal professionals who want to experiment with it.

In this article, you’ll learn how to create a Batch Sequence to run across folders of files which will:

  1. Separate searchable PDFs from non-searchable PDFs and place them in named folders
  2. Ignore non-PDF documents
  3. Create a Summary Report of searchability
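To make the "when is a PDF searchable?" question concrete, here is a small Python sketch of one possible rule. It is not what Preflight does internally. It assumes you already have the extracted text of each page as a list of strings (from whatever PDF tool you prefer), and the two thresholds are arbitrary starting points I picked for illustration.

```python
def classify_searchability(page_texts, min_chars_per_page=20, min_page_fraction=0.9):
    """Classify a document as 'searchable', 'partial', or 'not searchable'.

    page_texts: one string of extracted text per page (empty for image-only pages).
    min_chars_per_page: a page counts as searchable with at least this much text.
    min_page_fraction: share of pages that must pass for the whole file to pass.
    Both thresholds are assumptions; tune them for your own documents.
    """
    if not page_texts:
        return "not searchable"
    pages_with_text = sum(
        1 for text in page_texts if len(text.strip()) >= min_chars_per_page
    )
    if pages_with_text / len(page_texts) >= min_page_fraction:
        return "searchable"
    if pages_with_text > 0:
        return "partial"
    return "not searchable"

# Made-up examples: an Office-generated PDF, a mixed file, and a raw scan.
office_pdf = ["This agreement is made between the parties named below."] * 10
mixed_pdf = ["This agreement is made between the parties named below."] + [""] * 9
raw_scan = [""] * 5

for pages in (office_pdf, mixed_pdf, raw_scan):
    print(classify_searchability(pages))
```

The "partial" bucket is the interesting one. Those are the files where someone combined OCRd and non-OCRd pages, and they are usually worth running through OCR again.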

Continue reading…

Searching and Marking Multiple Words in a PDF

Legal professionals often need to search across a large number of documents. Finding a key fact, name or term is an important part of how you will apply your knowledge to a case.

For example, recently a paralegal sent me this email:

An attorney I work with just gave me a list of about 50 words and phrases as part of a case. I need to mark these terms each time I find them in my case documents. Help! Is there a way I can list all of the search words in a PDF?

While many folks have discovered the Search functionality in Acrobat, Acrobat 9 and below do not offer the ability to save searches or report the results.

Oddly, the only tool in Acrobat that allows you to search for terms and mark them in a PDF is part of the Search and Redact feature. This will add a mark to the page around the search term.

I wrote about using this technique in my previous article Highlighting Multiple Words in a PDF Document.

In Acrobat 9 Pro, it is possible to highlight multiple search terms using this same technique and you can do so “jiffy quick”.

But, Acrobat redactions permanently remove information!
That’s true, once you apply them. However, in this use case, we are only going to mark the words using the redaction tool. We will not apply the redactions, which is the step that actually removes the information.

So . . . no worries!

I’ve also included a link to Joel Geraci’s Redact to Highlight and Back, a free script for Acrobat that can convert redaction markups to standard Acrobat annotations.

In this article I’ll show you how to:

  1. Input a series of search terms and have Acrobat automatically mark them
  2. Create a new PDF which summarizes where each of the words was found
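Outside of Acrobat, the same idea (search for a whole list of terms in one pass and record where each one turns up) can be sketched in a few lines of Python. This is an illustration only, not how Search and Redact works internally. It assumes you have the text of each page as a list of strings, and the function name and sample data are made up.

```python
import re

def find_terms(page_texts, terms):
    """Return (page_number, matched_text, offset) for every case-insensitive hit.

    Matching is on plain substrings, so "Smith" will also hit "Smithson".
    """
    if not terms:
        return []
    # Try longer terms first so a phrase wins over a shorter term inside it.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(term) for term in ordered), re.IGNORECASE)
    hits = []
    for page_number, text in enumerate(page_texts, start=1):
        for match in pattern.finditer(text):
            hits.append((page_number, match.group(0), match.start()))
    return hits

# Made-up sample pages and search list.
pages = [
    "Dr. Smith signed the lease.",
    "The lease named Smith and Jones.",
]
terms = ["Dr. Smith", "lease", "Smith"]

for page_number, matched, offset in find_terms(pages, terms):
    print(f"page {page_number}: {matched!r} at character {offset}")
```

The list of hits is the raw material for a summary report: sort or group it by term and you can see at a glance which pages mention each word.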

Continue reading…

Managing, Annotating and Searching PDF Packages

In my last article, Search and Combine using PDF Packages, I discussed how to search a large number of documents and combine the resulting documents into a PDF package.

The result was a PDF package containing a target list of documents for further investigation.

With this “hot” set of documents in hand, it is time to carefully review them. You want to find out:

  • Who is mentioned in the documents
  • The issue(s) associated with the documents
  • When actions took place

Once you have all of this information, what do you think about what you found? How will you make your case?

In this article, you’ll learn how to:

  1. Add Notes or Annotations to a document in the package
  2. Add or delete documents in the package
  3. Search within a package, including your annotations

Read on to see how Acrobat can be used as a case analysis tool in this second article of the series.

Continue reading…

Search and Combine using PDF Packages

Attorneys take large amounts of information and winnow it down to get to the documents that matter.

What’s the best way to do that with Acrobat?

I received this email today from someone who stopped by the Adobe booth at LegalTech West:

I [use Acrobat to] OCR legal docs and then do a search of them to come up with a smaller target of documents, i.e., search Dr. Smith and all docs with his name in it come up in the search. I would then like to (A) print just those docs and (B) create a new PDF of just those docs, but I cannot figure out how to do it. Is it possible?

I had to think about this one… Acrobat can’t do it automatically.

Read on to learn about a workaround that might work for you.

Continue reading…

Full Text Search of PDF using Adobe Acrobat

Lately, everyone’s been asking me to help them find themselves…

After a talk at the Missouri Solo and Small Firm conference, I chatted with a solo real estate attorney who asked for my advice on developing a searchable article archive from the materials he had collected over the years. “How do I find the articles I need?” he asked.

I also talked to a lawyer who took on a pro bono criminal defense case. “How can I find where my client is mentioned in all the police records I was sent?” she asked.

And, at the LegalTech West show, a workman’s compensation investigator asked how to search medical records. “How can I apply notes to these handwritten medical records and find them later?” he asked.

In this article, I’ll discuss how to use Acrobat Professional to create a full-text index so you can find what you need… fast!

Read on to learn more…

Continue reading…

Is that PDF Searchable?

Most law firms and even solos have a scanner that can create PDFs from paper documents. Overwhelmingly, these devices create image-only, non-searchable PDFs.

Using Optical Character Recognition (OCR), Acrobat can add an invisible layer of searchable text while maintaining the original appearance.

The resulting searchable file is referred to as an image+text PDF.

An image+text PDF looks no different than a PDF which is not searchable. That creates a problem.

How can you tell if a PDF is searchable or not?

Continue reading…