Archive for November, 2011

Security misconceptions – Watermarks, Usage Rights and Rights Management

There is some confusion about what Acrobat, and PDFs in general, offer by way of securing documents. I would like to give a very cursory overview of the items that I have so far seen users consider “security.”

To be clear, by “security” I mean the ability or inability to access the contents of the PDF, thus safeguarding information from entering the wrong hands.

1) Not Security-Oriented

a) Watermarks

Unlike on your Dollar, Euro or Pound notes (etc), the watermark is NOT a guarantee of integrity, veracity or anything at all.

In the PDF world, a visible watermark only exists as a notification mechanism. If a watermark says “Confidential,” it is only warning the viewer that the content is confidential, but will not otherwise try to make itself indelible.

It is meant to be a very visible mark on the page, with the added property of not completely obscuring the items underneath (allowing readability to be maintained).

b) Certification

A Certified PDF carries a digital signature certifying that certain things can and cannot be done with it. Namely:

-A PDF certified to run privileged scripts can run scripts requiring special privileges, such as writing to the hard drive.
-A PDF certified to be unmodified means that so long as the PDF has only been modified within given parameters (fields filled in, for example), the certification will hold. If a visual aspect of the PDF changes, though, the certification will be broken, and Acrobat will report an error.

Certification covers a number of other use cases as well, but I hope the above illustrates sufficiently why this is not a security-related item, but rather a usability concern.

c) Reader Extensions Usage Rights

Acrobat and LiveCycle can extend the usability of PDFs to Adobe Reader, the free PDF viewing application. By extending usability features, you can allow Reader users to fill in forms and save that content, add comment annotations, and use other functionality.

However, if the same extended form is opened in Acrobat, the user can do to the PDF pretty much anything that Acrobat has at its disposal.

Reader Extensions Usage Rights (REUR) add functionality to Reader. Any functionality they do not add remains a restriction that Reader already had.

2) Security-Oriented

a) Password Protection

Using password protection, you can encrypt the PDF so it can only be opened by a person who has the password. You can also prevent the PDF from being used in certain ways, such as modifying the pages.

You cannot, however, track who has opened the PDF, when, and from what IP address. That is the domain of Rights Management.

b) LiveCycle Rights Management (aka Policy Server)

LiveCycle 7 introduced Policy Server, later renamed to LiveCycle Rights Management. Adobe LiveCycle/ADEP Rights Management protects your documents from being accessed by parties you have not authorized to do so.

This allows the document publisher to:
-protect with a user ID/password combination
-force the identification to go to a remote server
-restrict usage rights depending on the user’s group
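
To make the last point concrete, here is a minimal sketch of group-based usage rights. It is not the LiveCycle Rights Management API: the `POLICY` table, the group names and the action names are all hypothetical, chosen only to show the idea that what a user may do depends on the groups they belong to.

```python
# Hypothetical policy table: group name -> actions that group may perform.
# The group and action names are illustrative assumptions only.
POLICY = {
    "reviewers": {"open", "comment"},
    "editors": {"open", "comment", "modify", "print"},
}


def allowed_actions(user_groups):
    """Return the union of actions granted by every group the user is in."""
    actions = set()
    for group in user_groups:
        actions |= POLICY.get(group, set())
    return actions


def may(user_groups, action):
    """True if any of the user's groups grants the requested action."""
    return action in allowed_actions(user_groups)
```

In this sketch, `may(["reviewers"], "print")` is false while `may(["editors"], "print")` is true, and a user in an unknown group is granted nothing.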

With this in mind, you must be aware that ONLY persons that are trusted should be granted a login to the document. If, on a document that you want to protect, you have granted access to a person you do not Trust Entirely, you have opened the door to having your information stolen – be it via screen grab, or simply photographing the screen with a camera.

It’s like having the best vault to protect your secrets and giving the secretary the passcode for safekeeping. If the secretary is honest, they will leave your items well alone. But if you did not trust them in the first place, the vault, for all its technology and mechanisms, cannot protect your secrets – because you’ve willingly given the key to the intruder.

3) A note on Rights Management and SSL

To use Adobe LiveCycle Rights Management, you need to set up the server to be able to serve SSL connections, and configure the callback URL appropriately in the LiveCycle/ADEP Rights Management service configuration.

Note that if the server’s SSL certificate specifies external CRLs, you must be able to grant the client application free network access to the CRL’s URL – otherwise the connection will fail.

I hope that this article has allowed you to understand the subtle difference between the perceived security tools and actual security features – and most importantly, the fact that if you suspect a user may try to do Bad Things with your information, you should not give them the keys to the vault.

My own Rule Number One of security is: “don’t trust anyone, not even those you trust.” Then add exceptions, based on well-founded assumptions.

— Tai

Flat PDF vs. XFA Form PDFs

A frequent mistake is to assume that, since XFA Forms can be saved as PDFs, they will behave like any other PDF. The truth is that XFA PDFs and flat PDFs are entirely different beasts.

1) About PDF

“PDF” stands for Portable Document Format. Initially, PDFs were meant to be a digital counterpart to printable documents. You can open a PDF, and see the layout exactly as the page designer intended, with pictures and page breaks in the right places, ergonomic page margins, and most noticeably, with the original fancy fonts preserved. This is the original flat PDF.

Flat PDFs contain the page render data – a binary encoding of how the document should visually be drawn on screen or on paper (minus interactive items such as videos and flash animations).

PDF has come a long way since. It really has embraced the idea of “portable document,” the idea of the distribution of a published, polished document, and seeks to be all that printed documents could never be.

You can embed videos, flash animations and 3D spaces, protect them with encryption, limit their usage with DRM solutions (Adobe’s own is called LiveCycle Rights Management), annotate through comments and highlighting, measure elements, digitally sign them, make form fields to be filled in – and make forms that change according to the data inside them. Wow.

2) About forms – AcroForms

The first iteration of interactive form filling came as AcroForms.

At its most basic, an AcroForm is a flat PDF form that has some additional elements – the interactive fields – layered above the flat render. These allow users to enter information, and allow developers to extract data from the form.

You can create these using Adobe Acrobat, or any third-party PDF creation application that allows creation of PDF forms.

Flat PDFs can be annotated (comments, highlighting, and various other scribbles as desired), as these annotations can be mapped to an {x,y} location on the page.

Flat PDFs can have their pages extracted, as each page is already defined in the render.

Flat PDFs can be linearized (optimized for fast web viewing), which ensures that the data for the first page all occurs before the data for the second page, which in turn comes before the data for the third page, and so on.

Such features are not available to XFA PDFs.

3) About forms – Dynamic XFA Forms

Dynamic forms are based on an XML specification known as XFA, the “XML Forms Architecture”. The information about the form (as far as PDF is concerned) is very vague – it specifies that fields exist, with properties, and JavaScript events, but does not specify any rendering. This allows them to redraw as much as necessary, with subforms repeating on the page, sections appearing and disappearing as appropriate, the ability to pull in form fragments stored in different files, and objects re-arranging as you (the developer) dictate.

This also means that some features of AcroForms and flat PDFs are lost.

XFA Forms cannot be annotated. Reader (or Acrobat) cannot know whether your custom code may change the layout. As such, without any render data, and with a chance that the layout may be drastically altered on the fly, local annotations cannot be implemented. An annotation only makes sense at an {x,y} location, but if the item you are annotating changes location, your annotation becomes meaningless, if not misleading.
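
The annotation problem can be shown with a toy model. This is only a sketch under simplifying assumptions: the `layout` function stacks items vertically at a fixed row height, which is nothing like a real XFA layout engine, and all the names are made up for the illustration.

```python
def layout(items, row_height=20):
    """Return {item_name: y} for items stacked top to bottom."""
    return {name: index * row_height for index, name in enumerate(items)}


def item_at(positions, y):
    """Return the name of the item drawn at vertical position y, if any."""
    for name, item_y in positions.items():
        if item_y == y:
            return name
    return None


# Initial render: the reviewer pins a comment on the "total" field.
before = layout(["header", "row_1", "total"])
annotation_y = before["total"]

# The form data changes and a repeating subform adds another row.
after = layout(["header", "row_1", "row_2", "total"])
```

Before the change, `item_at(before, annotation_y)` finds the "total" field. After the extra row is added, the same coordinate lands on "row_2", so the comment now appears to refer to the wrong item.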

XFA Forms cannot have their pages extracted. There is no render data to determine pages. Change some data in the form, and the layout of the pages may change drastically. You must “flatten” the PDF before extracting pages, thereby losing interactive properties.

XFA Forms cannot be optimized for fast web viewing. There is no render information. Data at the end of the document may affect displays on the first page of the document.

4) Acrobat is not a word processor

A final common misconception is that you can edit pages in Acrobat as you could in Word. This is not the case. Acrobat focuses on the integrity of the layout – as such, if you have a page with 2 paragraphs, each 5 lines long, that is what Acrobat’s PDF engine will commit to showing.

The editing tools available are provided for minor cosmetic changes – nothing more. It is essential to keep the original documents – Word, PowerPoint, OpenOffice, TIFF or otherwise – for your editing process. Once converted to flat PDF, the intent is that the overall layout should never change.

For the same reason, XFA Forms are in conflict with Acrobat’s editing tools – the latter operate on the render layer, which does not exist in the saved XFA PDF. Such editing tools lose their meaning when faced with XFA Forms.

I hope this helps clear up some of the confusion around what XFA PDFs can and cannot do in Acrobat and other PDF manipulation tools.

Understanding the LiveCycle GDS – and freeing up disk space

LiveCycle, as a piece of enterprise software, tends to assume that you may want to keep a quantity of data around for posterity. Long-lived processes can cause a lot of disk space bloat, and whilst this is fine for those who wish to archive lots, this may not be ideal when running a lower-spec server.

In this article, I will point out the main areas where data and disk space use can happen, and how to clean up.

1) About Short-Lived and Long-Lived processes

Processes (also known as “workflows”, or “orchestrations”) are created in LiveCycle Workbench. This tool allows you to create processes, organized into Applications; each process can be either long-lived (“asynchronous”) or short-lived (“synchronous”).

When a short-lived process is invoked, the response is only returned once the whole process has run. For this reason, no short-lived process can have a step which requires human interactions – namely, a Workspace task.

When a long-lived process is invoked, the request returns immediately. The process will run, but you will need to get the result through a different request or action. Long-lived processes do not need to have a human-centric activity in them: you could use a long-lived process to send a document to the server for processing, without needing to know what status it ended up in.

Note that for any process that stalls, the associated documents will also be kept, ready for recovery, analysis and debugging.
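
The difference between the two invocation styles can be sketched in a few lines. This is not how LiveCycle is invoked; `process`, `invoke_short_lived`, `invoke_long_lived` and `get_result` are hypothetical names, and a thread pool stands in for the server.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

_executor = ThreadPoolExecutor(max_workers=2)
_jobs = {}  # job id -> Future holding the eventual result


def process(document):
    """Stand-in for the real work a process would do."""
    return document.upper()


def invoke_short_lived(document):
    """Synchronous: the caller blocks until the whole process has run."""
    return process(document)


def invoke_long_lived(document):
    """Asynchronous: return a job id at once; the work runs in the background."""
    job_id = str(uuid.uuid4())
    _jobs[job_id] = _executor.submit(process, document)
    return job_id


def get_result(job_id, timeout=None):
    """Separate request that collects the outcome of a long-lived job."""
    return _jobs[job_id].result(timeout=timeout)
```

The short-lived call hands back the result itself, whereas the long-lived call hands back only an identifier, and the caller may or may not come back later with `get_result`.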

2) About the Global Document Store

The Global Document Store, also known as “the GDS”, is a space on the hard drive or in the database (depending on your configuration in the Core System Configuration) where LiveCycle stores documents during the running of processes, and once long-lived processes are complete.

Note that whilst the GDS stores the files themselves, the references to them that processes need are stored in the database. For this reason, the GDS and the database must NEVER be out of sync. Should that happen, any processes that are running would fail, making data recovery difficult or even impossible.

In short-lived processes, when documents are larger than a certain size, they will be written to the GDS instead of being held in memory. This size is set in the Admin UI as the Document Max Inline Size. When a result document is produced, no matter what its size, it will be written to the GDS. Short-lived processes can return the document itself, or a URL to the document. Accessing this URL will cause LiveCycle to look up the document in the GDS to write it back to the client.
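
The inline-size rule is a simple threshold, which the sketch below illustrates. The threshold value, the `store_document` function and the file suffix are assumptions for the example, not LiveCycle's actual values or storage layout.

```python
import os
import tempfile

MAX_INLINE_SIZE = 64 * 1024  # assumed threshold in bytes, for illustration


def store_document(data, store_dir, max_inline_size=MAX_INLINE_SIZE):
    """Keep small documents in memory; write larger ones to store_dir.

    Returns ("inline", data) or ("file", path).
    """
    if len(data) <= max_inline_size:
        return ("inline", data)
    # Too large to hold in memory: spill to a uniquely named file on disk.
    fd, path = tempfile.mkstemp(dir=store_dir, suffix=".doc")
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return ("file", path)
```

Raising the threshold trades disk writes for memory use, which is the same trade-off the Document Max Inline Size setting controls.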

Documents from short-lived processes are removed once their time has passed. The Sweep setting (set in the Admin UI in the Core System Configuration) determines how frequently the GDS is scanned for documents to delete, and its associated Document Disposal Timeout determines how long the document should be kept for. If during a sweep of the GDS any new document is found from a short-lived process, it is marked for expiry by placing a similarly named document in the GDS, with a timestamp indicating the clock time after which the document should be deleted – this clock time is determined by the disposal timeout. Every sweep checks the timestamp, and if the current clock time is later than the one recorded, the document is deleted. The URL returned from a short-lived process needs these documents to remain available between the time the URL is returned to the user and the time the user clicks the URL. It is good to set the Document Disposal Timeout to a value between 30s and 120s, depending on the load expected on the server.
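
The mark-then-delete behaviour described above can be sketched as follows. This is a simplified model and not LiveCycle's implementation: the `.expiry` marker suffix, the `sweep` function and the idea of passing the clock in as `now` are all assumptions made so the logic is easy to follow and to test.

```python
import os

MARKER_SUFFIX = ".expiry"  # hypothetical naming, not LiveCycle's actual scheme


def sweep(store_dir, disposal_timeout, now):
    """Run one sweep; return the names of the documents deleted."""
    deleted = []
    for name in sorted(os.listdir(store_dir)):
        if name.endswith(MARKER_SUFFIX):
            continue  # markers are handled together with their document
        doc_path = os.path.join(store_dir, name)
        marker_path = doc_path + MARKER_SUFFIX
        if not os.path.exists(marker_path):
            # First time this document is seen: record its expiry time.
            with open(marker_path, "w") as marker:
                marker.write(str(now + disposal_timeout))
            continue
        with open(marker_path) as marker:
            expires_at = float(marker.read())
        if now > expires_at:
            os.remove(doc_path)
            os.remove(marker_path)
            deleted.append(name)
    return deleted
```

In this model a document is only marked on the first sweep that sees it and only deleted on a later sweep, so it lives for at least the disposal timeout and possibly up to one sweep interval longer.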

Long-lived processes will write required documents to the GDS before assigning them to a human-centric task so that they can be obtained later when the user actually logs on to process them. At the end of the process, the final collaterals are kept in the GDS for posterity and later review if required.

Thus, for long-lived processes, the files are never disposed of. The default behaviour for the GDS then is to constantly grow, if long-lived processes are used. If you do not want this to happen, you must perform regular purges.
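
If the GDS is stored on the file system, a quick way to see whether it is growing is to total the size of the files under its root directory from time to time. The helper below is a generic sketch; the path you pass in is whatever your own GDS location is, and the function name is my own.

```python
import os


def directory_size(root):
    """Return the total size in bytes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            # Skip symbolic links so linked files are not counted.
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```

Comparing the figure before and after a purge gives a rough idea of how much space the purge actually recovered.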

3) Purging Jobs

In LiveCycle ES, a command-line purge tool is provided to purge jobs that either completed or were terminated. This still exists in ES2, should you ever need it.

In LiveCycle ES2, the Health Monitor was introduced to offer a graphical UI for performing purges.

In ES2 SP2, a purge scheduler was introduced to automate, at intervals, the purge of jobs.

a) If you are on ES2 SP2

Connect to Admin UI and go to Health Monitor > Job Purge Scheduler

Schedule a One Time Purge for records older than 1 day

b) If you are using ES2 pre-SP2

Connect to Admin UI and go to Health Monitor > Work Manager. Search with the following criteria:

-Category = Job Manager
-Status = Terminated
-Create Time 2 Weeks
-(iterate over time periods)

Delete any terminated processes that are found.

c) If you are on LiveCycle ES

The purge tool requires some knowledge of the contents of the LiveCycle database; for this reason I will not cover this in this article.

You can find most of the required information in the link below; however, you would be best advised to operate under the guidance of the Enterprise Support service, if you can.

http://www.adobe.com/content/dam/Adobe/en/devnet/livecycle/pdfs/purging_processes_jobs.pdf

4) A note on process recordings

I would like to add a special note here concerning process recordings. These can be activated via Workbench by right-clicking on a process or on a process canvas, and selecting Process Recordings > Start Recording.

This will record the activity each time the process is launched, including the contents of LiveCycle variables, branches followed, etc, at EVERY step of the process, for later review in Workbench.

Even processes not started in Workbench will be recorded.

For this reason, process recordings must be activated ONLY for debugging purposes.

Process recordings are heavy, and are not suitable for a production server, both in terms of performance and space used. They can easily be deleted via Workbench through the playback dialog.