InDesign Conference, Miami Beach: Recosoft

I’m attending the combined InDesign, Acrobat and Vector conferences this week in Miami Beach. Today I spoke with Paramjit Chadha, Managing Director of Recosoft, the company that developed the very cool PDF2ID product.

What It Is

PDF2ID is an InDesign plug-in that enables users to extract content from a PDF file and put it in an InDesign document that matches the PDF as closely as possible. It’s a valuable utility to have in your tool chest because it makes educated guesses about the original source document’s layout, which is a non-trivial task: PDF files don’t contain all the layout information of an InDesign document.

PDF is a compact file format that doesn’t contain data constructs such as threaded text frames. PDF2ID analyzes the content of a PDF and then, based on the nature of the objects and their relationships, constructs a layout from those objects in a new InDesign file. Their website describes it this way:

PDF2ID extracts text to form lines, then groups the lines to make paragraphs; applies text fonts and styles such as bold and underline where possible; regroups independent graphic elements; and creates tables. Furthermore, it performs contextual analytics so that related data are correlated and remain together.
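To give a feel for why the line-and-paragraph step in that description is guesswork, here is a minimal Python sketch. It is not Recosoft’s algorithm; the `Fragment` type, the function names and the tolerance values are all hypothetical, and it assumes the text fragments and their page positions have already been pulled out of the PDF by some other tool.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """A run of text with its position on the page (hypothetical type)."""
    text: str
    x: float  # left edge, in points
    y: float  # baseline, in points, measured downward from the top of the page


def group_into_lines(fragments, y_tolerance=2.0):
    """Group fragments whose baselines nearly match into lines (assumed tolerance)."""
    lines = []
    for frag in sorted(fragments, key=lambda f: (f.y, f.x)):
        # Compare against the first fragment of the most recent line.
        if lines and abs(lines[-1][0].y - frag.y) <= y_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])
    # Put each line into left-to-right reading order.
    for line in lines:
        line.sort(key=lambda f: f.x)
    return lines


def group_into_paragraphs(lines, max_gap=18.0):
    """Start a new paragraph whenever the vertical gap between lines exceeds max_gap."""
    paragraphs = []
    previous_y = None
    for line in lines:
        text = " ".join(frag.text for frag in line)
        line_y = line[0].y
        if previous_y is not None and line_y - previous_y <= max_gap:
            paragraphs[-1] += " " + text
        else:
            paragraphs.append(text)
        previous_y = line_y
    return paragraphs


fragments = [
    Fragment("world", 50, 100.5),
    Fragment("Hello", 10, 100),
    Fragment("again", 10, 112),
    Fragment("New", 10, 140),
    Fragment("para", 40, 140),
]
paragraphs = group_into_paragraphs(group_into_lines(fragments))
```

Even this toy version has to pick thresholds for “same line” and “same paragraph”, which hints at why real-world PDFs with columns, tables and mixed type sizes make the problem hard.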

Paramjit likes comparing a PDF file to a fully cooked beef stew. PDF2ID’s purpose is to try to extract all the stew’s ingredients in their original, uncooked state…so that the cook can start over again and repurpose them.

What It Is Not

PDF2ID is not a round-tripping tool: don’t expect it to recreate the layout of a PDF file exactly just so you can make minor edits and then export it back to PDF. Neither is it a PDF file editor. Their website puts it this way:

Rather, the primary scope and objective of PDF2ID is to provide a seamless and transparent mechanism for PDF data recovery and reuse within InDesign. To achieve this, PDF2ID does its best to preserve the layout while reconstructing data along with the respective property and elements wherever possible.

PDF2ID is also not a hacker’s tool. If a PDF file is password-protected to prevent the unauthorized extraction of data, PDF2ID will require the user to enter the password before it performs the extraction.

What It’s Good For

PDF2ID is an invaluable tool to have at your disposal when you’ve got to redesign or repurpose PDF content for which you don’t have the original source files. Or suppose you’ve got some old FreeHand files that you’d like to migrate to InDesign…you can convert them to PDF and then use PDF2ID to extract and format the content as closely as possible to the original source in a new, fully editable InDesign file.

Typical PDF2ID customers are designers, freelancers, IT departments and service providers who need to do this kind of extraction for their customers or colleagues.

I’ve used PDF2ID many times, and I’ve been impressed with how well it does what it’s intended to do. If your expectations are realistic, you’ll be pleased as well.

I asked Paramjit about the task of doing data extraction from a PDF. Not all PDFs are created equal. There are so many different PDF-generating applications available today that being able to parse and interpret their output correctly takes a lot of study and experimentation.

PDF2ID can also be limited by the PDF version and by whether the PDF is flattened. If you’re creating your PDFs as PDF 1.3, or if you’re distilling PostScript, then you’ve got a flattened PDF. This can be an issue if you’re trying to extract content from a document that contains transparency. Because neither PDF 1.3 nor PostScript supports transparent objects, a design with transparency will be flattened to conform to the limitations of those formats. This can mean that text or vector objects are turned into raster (image) data, or that text is converted to outlines. When PDF2ID finds these types of flattened objects, it can only extract the raster data in the PDF and drop it into your InDesign file.

For this reason, unflattened PDFs (1.4 and higher) exported directly from InDesign and Illustrator are likely to give you far better extraction results than flattened PDFs if those files contain transparency effects. Be aware that if you’re creating PDFs via PostScript, you’re creating flattened PDFs.
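If you’re not sure which kind of PDF you’ve been handed, a quick look at the version in the file header is a reasonable first check. The Python sketch below is only a rough aid with hypothetical function names. A PDF file starts with a `%PDF-x.y` marker, but the header can be overridden by a version entry inside the file, and a PDF 1.4 or later file may still have been flattened before it was saved, so treat the answer as a hint rather than a verdict.

```python
import re


def pdf_header_version(data: bytes):
    """Return (major, minor) from the %PDF-x.y header, or None if it is missing."""
    match = re.search(rb"%PDF-(\d+)\.(\d+)", data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def may_contain_live_transparency(data: bytes) -> bool:
    """PDF 1.3 and earlier cannot hold transparent objects; 1.4 and later can."""
    version = pdf_header_version(data)
    return version is not None and version >= (1, 4)


# Made-up sample headers; in practice, pass in the first bytes of a real file
# opened in binary mode.
old_style = b"%PDF-1.3\n%\xe2\xe3\xcf\xd3\n"
newer = b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n"
```

A result of `False` for a design that you know used transparency is a good sign that you’ll be getting rasterized or outlined objects out of the extraction.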

If this kind of content extraction sounds like it might be useful to you, check out Recosoft’s website and give PDF2ID a look. You can download trial versions of their software here.