Scrubbing Metadata – Practice and Policy
(Or, I never metadata I didn’t like)
Recently, I had a pleasant and informative exchange with one of the legal business’s technology gurus and bloggers, Sharon Nelson, president of Sensei Enterprises. Sharon has been sharing her legal and tech know-how with the industry for a while via her excellent blog, Ride the Lightning.
Our dialog centered on scrubbing metadata from PDF documents. Sharon cited encouraging recent survey results showing that metadata scrubbing software is becoming more widely available among the firms surveyed (59% this year, up from 46% in 2009). This is good news. With more scrubbing software in hand, more attorneys can adopt metadata scrubbing as standard procedure. Many of you are using Adobe Acrobat to create and to scrub PDFs, but not all of you may know the best ways to do that. So I thought I would reiterate some of the points I made in Sharon’s blog.
Creating a PDF using Adobe Acrobat does not necessarily scrub that document of all metadata. That’s because the PDF is meant to be a faithful representation of the original document. So if your original document contained text, graphics and layout, you probably want the PDF to contain that same text, graphics and layout. And if your original document contained information like “Document Title” or “Document Keywords,” you probably want the PDF to contain that information (that metadata), too.
But in cases where you do not want the PDF to contain the original document information, I suggest you look at these two features of Adobe Acrobat:
If you are creating a PDF with Acrobat, its creation tools offer various ways to leave out the document information or metadata. For example, if you are converting from Word to PDF on Windows using the Acrobat PDFMaker functionality, there is a checkbox called “Convert Document Information.” When it is unchecked, information like Title or Author is not carried over when the Word document is converted to PDF.
Furthermore, regardless of how your PDF was created, Acrobat 8 and higher have a powerful tool called Examine Document. This will scan through the PDF, identify any potential hidden information – metadata, comments, bookmarks, file attachments and even “hidden text” (text hidden by another object or white text on white background) – and allow you to remove it with a single click. This tool lets you check, with confidence, what information is in a document before you publish it.
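To get a feel for what a scan like this is looking for, here is a minimal Python sketch that searches a PDF’s raw bytes for document information entries such as Title or Author, and for the marker of an XMP metadata packet. This is not how Examine Document works, and it is no substitute for it: the function names and the sample bytes are hypothetical, and the check is deliberately naive. It only sees entries stored uncompressed as simple literal strings, so finding nothing does not mean a file is clean.

```python
import re

# Some of the standard keys of a PDF document information dictionary.
INFO_KEYS = ("Title", "Author", "Subject", "Keywords", "Creator", "Producer")


def find_info_fields(pdf_bytes: bytes) -> dict:
    """Return document information entries found as plain literal strings.

    Naive by design (an illustration, not a scrubber): it only matches
    uncompressed entries written as /Key (value) and skips hex strings,
    escaped characters and nested parentheses.
    """
    found = {}
    for key in INFO_KEYS:
        pattern = rb"/" + key.encode("ascii") + rb"\s*\(([^()\\]*)\)"
        match = re.search(pattern, pdf_bytes)
        if match:
            found[key] = match.group(1).decode("latin-1")
    return found


def has_xmp_packet(pdf_bytes: bytes) -> bool:
    """Report whether an uncompressed XMP metadata packet marker is present."""
    return b"<x:xmpmeta" in pdf_bytes


# Hypothetical fragment standing in for the bytes of a real file.
sample = (
    b"1 0 obj\n"
    b"<< /Title (Draft settlement terms) /Author (J. Smith) /Producer (Example) >>\n"
    b"endobj\n"
)

print(find_info_fields(sample))
print(has_xmp_packet(sample))
```

On a real file you would read the bytes with `open(path, "rb").read()` and pass them to the same functions. Anything this turns up is a sign the document still carries information from the original; the actual removal is best left to a tool built for the job.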
It’s also easy to set up a reminder to scrub a document’s metadata when closing Acrobat or when sending a file using Acrobat. Simply go to Edit > Preferences > Document and turn on the reminder options there. Acrobat will then show you a “pop up” message prompting you to remove the metadata from your PDF.
What’s more, larger firms can make Acrobat metadata scrubbing automatic. Have your firm’s IT person configure Acrobat prior to installing it on your computer so that “Convert Document Information” is off by default when you create a PDF.
Information about Examine Document can be found in the Acrobat Help File here or you can read this good overview article here, which also includes a link to an excellent eSeminar on “Redaction and Metadata Removal.”
All that said, another of Sharon’s readers, Judith Maier, made this point: “In this era of e-discovery, if the organization’s intent is to always remove all metadata before ‘using’ the document, for example, sharing it with a client or another organization, that policy should be expressed in the organization’s written data retention policies and procedures… Perhaps it takes a moment or two longer to use the process that Adobe outlined, but in the end, I think it affords a better degree of security for an organization.”
To which I say: Exactly.
Dave Stromfeld, Senior Product Manager, Acrobat