This is Part 2 of the previous blog.

I think PDF forms represent a very powerful and significant tool. Increasingly we want both humans and computers to read documents. However, the requirements for easy reading by each are considerably different.

For a long time the primary use of computers has been "data processing". Business data processing has worked with structured data consisting primarily of numbers and text strings whose meaning and properties are well defined and known a priori. Much of the data semantics is built into the data processing software.  In the last decade or two, distribution and sharing of information among humans has moved in to share the primary spot.

Humans, when given numbers and strings, also need a context in which to understand their meaning and significance.

When creating a PDF form, a designer makes a very explicit decision of which information is needed by humans and which is needed by our data processing software. People do document processing and for the most part computers do data processing.

Those places where fill-able blanks occur in a form define the data that is being collected or displayed.  The background or "artwork" of the form turns the raw data into a document that provides the context in which the human can understand the data.

Here is a diagram showing how an artwork presentation layer and a data layer come together to make a document from which both the human and the computer can obtain exactly what they need.


PDF forms maintain this separation of layers, and the data layer can be imported into or exported out of the form artwork layer. The humans see the composed document and the computer can process the data only, with the traditional data processing software. Either the computer or the human can supply the contents of the data layer, for presenting or gathering the data.
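As a rough illustration of this layering, here is a minimal Python sketch. It is only an analogy: the template text, the field names, and the use of JSON for the exported data are assumptions made for the example, not a description of how a PDF form is actually stored or exchanged.

```python
import json
from string import Template

# Hypothetical artwork layer: fixed text that gives the blanks their context.
ARTWORK = Template(
    "INVOICE\n"
    "Customer: $customer\n"
    "Amount due: $amount USD\n"
)

def compose(data):
    """What the human sees: the artwork with the data layer filled in."""
    return ARTWORK.substitute(data)

def export_data(data):
    """What the computer sees: the data layer alone, as text."""
    return json.dumps(data, sort_keys=True)

def import_data(text):
    """Load a data layer supplied by a person or by a program."""
    return json.loads(text)

data_layer = {"customer": "A. Reader", "amount": "125.00"}
document = compose(data_layer)
round_tripped = import_data(export_data(data_layer))
print(document)
```

The composed text is for the person; the exported data never contains the artwork, so ordinary data processing software can consume it directly.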

So, I think forms offer a very clever way for computers and humans to both see that part of a document most suitable and necessary for them to process.

Jim King (mailto:jking@adobe.com)

 

Sound bites (or is it bytes)

I think most technical people share a problem that I have: we have extreme difficulty in expressing ourselves in one simple sentence.  I have this problem when responding to questions/issues about PDF. For example, I have a hard time responding to this inaccurate statement in a short sound bite:

"PDF is great because it is not editable and freezes the content."

Technically that statement is totally inaccurate but there are related statements that are true.  For example:

"PDF is great because it not only captures my content but allows me to choose and lock down the look and feel for my content."   or

"PDF is great because I can apply a document signature to the file after I create it and then people can detect if it has been tampered with between me and them."

And here is one that I encountered from my financial advisor: "I had always sent my customers paper spreadsheets in the mail because I didn't want them to have my spreadsheet electronic files that have my intellectual content as far as the calculations and macros. Once I could make PDF files from my spreadsheets I can send them electronically and not worry."  Am I to conclude that his primary value to me was in his spreadsheets?

Editability and Reuse

But to get at the issue of re-use and editability versus frozen content, I have to use quite a few sentences, in fact, the remainder of this blog and the following blog.

The first issue we have to get straight is whether something is a function of the PDF file format or of the software that processes it. If people are concerned about the PDF file format then they need to join the ISO committee that is now managing PDF as the ISO 32000 standard. Many of my previous blogs record the process of moving the ownership of PDF from Adobe to ISO which was completed in January 2008.

But most of the reuse issues are a property of the software not the PDF file format. So if someone doesn't like the behavior of their current software they might consider looking for other software and/or convincing someone to provide software with the needed function. 

But just for example, there are degrees of reuse that have been incorporated into Adobe's Acrobat viewer including the following:

•  Copy/Paste.  If the author permits it, I can copy content from a PDF and paste it into other files. Adobe has spent a great deal of time and effort to make this work as well as it does, especially given the complexity of dealing with text.  Please see my previous blog entry about text in PDF.

• Export. Acrobat supports exporting PDF content into various formats including .rtf, .doc, .html, .eps, .png, .jpeg, .xml, .jpeg2000, .tiff, .xls, .ps, .txt.  I was almost alarmed when I opened Acrobat to obtain an accurate list and found so many formats supported.  And there are choices and settings for many of these. I assure you that this represents a great investment by Adobe to provide this support for reuse of PDF content. Many of these export functions are imperfect but do provide a strong basic ability to reuse content.

•  Hybrid Files. One can make a "hybrid" PDF document that includes the author's original source file as an attachment. This is supported as an automated feature by Open Office tools as well as the Acrobat tools that create PDF files from Microsoft Office products. This provides a final form PDF document with the editability of the original source that the author used to create the PDF in the first place.

•  Forms. A more sophisticated kind of hybrid file is supported by PDF fill-in forms. This is so cool that I am going to make the discussion about it a separate Part 2 to this entry. (I wonder if the reason I think this is so cool is because I defined the properties for the Acrobat forms prototype in 1993. Na!  It is just cool!)

If an author wants to inhibit the reuse of the content in their documents they can set properties within the PDF file to prohibit it. For some authors the content of their documents represent their intellectual property and they want to protect it.

So, if things don't work to our liking it may be an author's decision or the software designer's decision, but seldom should we hold this as a PDF deficiency. PDF is a cool tool.

Jim King (mailto:jking@adobe.com)

 

 


I have written a paper attempting to describe how Adobe managed the evolution of the PDF file format for over 15 years before turning its management over to ISO. I have offered this to the ISO committee so that they may benefit from Adobe's experience, but we are still trying to figure out the appropriate ISO publishing mechanism to make it available from them. In the meantime it will be available here.

Adobe tried very hard to allow new PDF files with new features to be viewable in older viewers.  Likewise, we tried to design our viewers so that when they were the "old viewer" they would do the best that they could to display a file even if it had new features.  And perhaps most important of all, we almost never removed any feature from PDF so that any file could be viewed by today's viewers even if the file was made 16 years ago.

This paper was derived from an internal Adobe technical note written by me and a task force of employees who studied the whole issue of versions and compatibility in 2006. I have acknowledged those other employees at the end of the paper. Thanks to them.

That paper is, of course, a PDF file and is attached here: Compatibility_090819 (56K)

Enjoy!  (Oh, and I almost feel like I should apologize for not making more regular posts to this blog. I have just been very distracted with several things. Hopefully I can find more time in the future.)

Jim King  <mailto:jking@adobe.com>

According to the European Union definitions, PDF (ISO 32000-1) supports Advanced Electronic Signatures (AES) and Qualified Electronic Signatures (QES). There has been some confusion on this point and I will explain this more in this blog.

Background

The European Union (EU) is going through a very interesting and exciting time of trying to bring together many hitherto independent countries into, well, a European Union. There is particular interest in government-to-constituent electronic communications and in conducting business transactions across the EU electronically. So the EU is pushing the electronic envelope in many ways that, by the nature of the activity, are very "standards" oriented. This is in contrast to, say, the US, where various government agencies and various businesses are approaching these things less from a standards view and more from a view of what works and what is available. At least these are my observations, for what they are worth. This is not to say that the US is not interested in standards but just that the EU seems to be nearly consumed by a standards consciousness.

Electronic signatures (e-signatures) play an important role in assuring trustworthy and legally sound communication between governments and those governed and for businesses conducting electronically assisted transactions with other businesses and with customers also on a sound and legal basis.

The Directive 1999/93/EC of the European Parliament and of the Council, dated December 13, 1999, about a Community Framework for Electronic Signatures, is commonly referenced as the test(s) that electronic signature technology has to pass to be used in various legally binding manners within the EU. In fact, all EU countries have agreed to accept "Qualified Electronic Signatures" (QES) on a par with plain old ink-on-paper signatures and other electronic signatures including "Advanced Electronic Signatures" (AES) "cannot be denied legal effectiveness or admissibility as evidence".

PDF digital signatures can be AES and QES

One reason I bring all this up is that PDF digital signatures can be QES according to the EU definitions, provided that the certificates used are "Qualified Certificates" (QC). PDF digital signatures can also be AES.

A lot of this has to do with establishing a hierarchy of trusted "Certification Service Providers" (CSPs). In practice these are Internet servers that deliver certificates that establish an association between people and their public keys.  I want to make the most important point that since PDF digital signatures are based upon the same PKI (Public Key Infrastructure) standards that these CSPs use then PDF can provide AES and QES. PDF is very suitable for conducting business in the EU and now with PDF an international standard (ISO 32000-1) the EU owns PDF just as much as anyone else. Certainly, Adobe no longer owns PDF.

Meeting requirements

Here is a key quote from the 1999/93/EC Directive (Article 2.2):

"advanced electronic signature" means an electronic signature
which meets the following requirements:

(a) it is uniquely linked to the signatory;
(b) it is capable of identifying the signatory;
(c) it is created using means that the signatory can maintain
under his sole control; and
(d) it is linked to the data to which it relates in such a
manner that any subsequent change of the data is
detectable;

All of these properties can be satisfied by a PDF digital signature. The standardized PKI certificates can satisfy (a) through (c), and (d) is satisfied by typical PKI signing technology using message digests. More background about PKI and PDF signatures is provided in my previous blogs. QES adds to these requirements additional ones about the quality of the certificates used and of the CSPs; these are spelled out in Annexes I and II of the Directive. PDF can use these qualified certificates and hence can support QES also.

Watch the wording

PDF digital signatures have many optional choices and exactly which ones are used for any given signature depends upon the software used and in some cases on the signer's choices. For example, which signer certificate is used and who issued it are the signer's choice. I have tried to word my claims carefully by saying "PDF digital signatures can be" QES and AES since it is possible to choose options that will not satisfy the EU requirements.

ETSI/ESI and Electronic Signatures

The European Telecommunications Standards Institute (ETSI) is recognized by the European Commission as a European Standards Organization. Its Electronic Signatures and Infrastructure Technical Committee (ESI) has established standards as its title suggests, in particular CAdES (CMS Advanced Electronic Signatures: TS 101 733) and XAdES (XML Advanced Electronic Signatures: TS 101 903). These standards were carefully crafted to follow the European Commission Directives and have become relatively synonymous with those directives.

PDF digital signatures and CAdES share the same infrastructure. They both use the Cryptographic Message Syntax (CMS) including particularly PKCS#7. PDF also allows the use of PKCS#1 and other schemes so this is a point where we must say that PDF supports PKCS#7 but not exclusively. We note that the European Directive is not so specific that it spells out the use of these technologies but CAdES and PDF have made these implementation choices.

There are some very particular differences in the exact way that PDF uses PKCS#7 and the way that CAdES uses it. Since some people treat CAdES as synonymous with the European Directive, they conclude that these differences make PDF not comply with the directives. The fact is that CAdES and the European Directive are not the same thing, and although CAdES is an outstanding standard that follows the directives it is not, nor will it be, the only technology that follows the directives. The differences between PDF digital signatures and those of CAdES are very minor and the European Directive does not give enough technical detail to distinguish between them.

ETSI/ESI and ISO 32000-1

ETSI has recently established a Task Force (TF) within ESI to establish standards common between CAdES, XAdES and PDF digital signatures as specified in ISO 32000-1. This TF is in the process of making sure that these technologies come together to everyone's satisfaction and they will make special efforts to make sure there is no doubt that they follow the European Directives as they evolve. In particular, they will spell out and standardize which choices for PDF (ISO 32000-1) assure that the signatures are AES or QES. They also plan to work with the ISO working group on any changes for the future digital signature technology in PDF (anticipated as ISO 32000-2).

This is good news for both ETSI/ESI and the ISO PDF working group.  It is especially good for users who want to use standard digital signatures.

I think I have one more blog article in me about digital signatures, so stay tuned for a few more details about PDF digital signatures and how they work.

Jim King (contact: jking@adobe.com)

 

Happy New Year

Welcome to 2009.  I hope you took some time off during the "holiday" period and refreshed your relationships with friends and family. It is great to have one time of year where this is the objective. I had a very relaxing time away from work and have come back refreshed and surprisingly quite a few pounds lighter as my wife put me on a diet in early November and it is working. I did spend good times with my children and now my two grandchildren (15 month old twins, one boy and one girl).

If you have topics related to PDF that you would like me to publish blog articles about, just send me a note.

Again, happy new year and the best to you in this upcoming, probably financially difficult, year.

Jim King  (contact: jking@adobe.com)

Digital Signatures: PDF

In the previous article we gave a quick introduction to PKI technology (Public Key Infrastructure) because that is what is used for PDF and most other digital signatures. Now we are going to talk more specifically about how that technology is used for digital signatures in PDF.

If I receive a digitally signed PDF file what should I expect?  First I would like to know that the file is unchanged since it was signed. A central notion is that there are encryption techniques, essentially all of them, where if the encrypted file is changed it cannot be successfully decrypted. The decryption produces some garbled garbage instead of meaningful results. So the person who is digitally signing the document encrypts it with their private key. Then if I can successfully decrypt it, I know it has been unchanged since they encrypted it.

So we need to know that the file is unchanged, but also since when and who says so. At least, in theory, I know exactly who that signer is because the certificate that has her public key was created by an authority (CA) that I can trust or one that is trusted by one that I can trust or ... .  Her signature is also time stamped so I know when the file was signed. So the "since when" and "who says so" is handled by the use of PKI certificates and the encryption that takes place when the PDF file is signed. At the time of the signing the document that is to remain unchanged is established.

An interesting side note: there is no way to stop someone from changing a file, but we do have the means to detect if they did.

Message digests (cryptographic hash functions)

Encrypting or decrypting a large file is computationally intensive and can take a noticeable amount of time even on today's powerful computers. So in cases where we would use encryption as a technique, not to hide the contents, but to assure that a file has not changed, another method has been developed based upon what are called "message digests" (MDs) or "hashes".

What if we could just number all documents or files that were ever made? Then I could just tell you the document number and you would know what document I meant. Very crudely, this is the idea of a hash or MD. If we truly numbered all documents the size of the numbers would be huge, so nothing would be gained even if we could do this.

The idea of a hash or MD is that the number of real documents is extremely tiny compared to the number of all possible documents so people have developed algorithms that will derive a small sized number for any given document and the algorithms are mathematically justified to 1) rarely, if ever, produce the same number for two different documents 2) not be invertible, (nearly impossible to generate the document from the number) and 3) make it nearly impossible to make a second document, different from the first that has the same number. These phrases have rarely's and nearly's because this work is based upon statistically rare events not happening or extremely complicated and time consuming computations that are, today, impractical.

OK, so what is the deal here? It is faster to compute an MD over the document than to encrypt the document, and the resulting digest is limited to, say, 32 bytes of data, so compared to the actual document it is very small. I send you both the document and an MD I computed over its bytes. Then you compute the MD again on your copy of the file, using the exact same algorithm that I used, and you compare your MD to the one I provided. If they match, you are nearly assured that your copy is the same as mine. Again, computing the MD is faster than encryption/decryption, and besides that, the document can remain unencrypted and hence readable without any special computation. That is useful if I am not so concerned about the chance that it might have been altered but just want to view it.
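The exchange just described can be sketched in a few lines of Python. SHA-256 is an assumption chosen because it happens to produce a 32-byte digest; the text above does not name a particular algorithm, and the sample document bytes are made up.

```python
import hashlib

def message_digest(data):
    """Return the SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).digest()

# Sender side: the document and the MD computed over its bytes.
original = b"Pay the bearer 100 dollars."
sent_digest = message_digest(original)

# Receiver side: recompute the MD over the received copy and compare.
intact_copy = bytes(original)
altered_copy = original.replace(b"100", b"900")

print(len(sent_digest))
print(message_digest(intact_copy) == sent_digest)
print(message_digest(altered_copy) == sent_digest)
```

The intact copy reproduces the sender's digest and the altered copy does not, while the document itself stays readable throughout.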

But there is one glitch in what I have said so far. What if someone intercepted the document and its MD in transit, changed the document, recomputed the correct MD to match the changed document and forwarded it to you? It would check out OK.  So in addition to what was said above, we encrypt the MD with the signer's private key so that it cannot be altered without detection. We use the same document signing technology discussed earlier but we only apply it to the computed MD and not to the document. We let the MD serve as a document surrogate or "digest" for these purposes.  As long as we trust and believe the inventors of these hashing algorithms, the chances of making a mistake are so rare they will not happen to us in our lifetime (or some strong statement like that).

So to summarize what happens when a PDF document is signed: 1) the signing software computes an MD using the bytes of the document file and encrypts it with the signer's private key and 2) the signer's public key is made available in a signed certificate (package), the certificate having been issued and signed by a CA which may be authenticated by a chain of CAs via their certificates. 

When the signature for a document is checked: 1) the MD of this copy of the document is computed, 2) the signer's public key is obtained from the certificate and if needed the certificate is decrypted and examined to make sure the identity of the signer is as expected, 3) the encrypted MD sent with the file is decrypted using the signer's public key from step 2 and compared with the MD computed in step 1.  If all this checks out then the document is an identical copy to the one signed and the signer is who they say they are.
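The signing and checking steps above can be sketched with a toy key pair. Everything here is an assumption for illustration: the key is built from two well-known Mersenne primes and is far too small and too structured to be secure, the digest is simply reduced modulo the key size instead of being padded, and the certificate step is left out. Real signing software relies on a vetted cryptographic library.

```python
import hashlib

# Toy RSA key pair (illustrative only, not secure).
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (needs Python 3.8+)

def digest_as_int(document):
    """Compute the MD over the document bytes and map it into the key's range."""
    md = hashlib.sha256(document).digest()
    return int.from_bytes(md, "big") % N

def sign(document):
    """Signer: 'encrypt' the MD with the private key."""
    return pow(digest_as_int(document), D, N)

def verify(document, signature):
    """Checker: decrypt the signed MD with the public key, compare with a fresh MD."""
    return pow(signature, E, N) == digest_as_int(document)

document = b"The agreed contract text."
signature = sign(document)

# Someone who alters the document can recompute the MD,
# but cannot re-create the signature without the private key.
tampered = document + b" (edited in transit)"

print(verify(document, signature))
print(verify(tampered, signature))
```

Only the short digest goes through the expensive private-key operation, which is the point of using an MD in the first place.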

One last glitch for the basic story. A PDF document is a single file and we want to keep that property.  But we have the MD which is a hash over the whole PDF file which would itself then contain the MD. That is a circular problem that is impossible to solve directly: computing something that is based on the results of the computation. So the complete PDF file with all the signature information is saved onto the disk but with a reserved "hole" where the package containing the MD will be placed. Then the MD is computed over all the bytes on the disk, except for the hole. Then the MD package is written into the hole. Of course, when the signature is checked, the hole is again ignored when the MD is computed.  A little funky but it works, and we can have a signed PDF document that maintains its property of being a single self-contained file.
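The "hole" trick can be sketched as follows. This is a simplification under stated assumptions: the hole here holds a bare hex SHA-256 digest, whereas a real signed PDF stores an encrypted signature package there, and ISO 32000-1 records the covered byte ranges in the signature dictionary (its ByteRange and Contents entries), which this sketch does not model. The function names and sample bytes are hypothetical.

```python
import hashlib

HOLE_SIZE = 64  # room for a SHA-256 digest written as hex characters

def make_file_with_hole(before, after):
    """Lay out the file bytes with a zero-filled placeholder in the middle."""
    hole_start = len(before)
    data = bytearray(before + b"0" * HOLE_SIZE + after)
    return data, hole_start

def digest_outside_hole(data, hole_start):
    """Hash every byte of the file except the reserved hole."""
    h = hashlib.sha256()
    h.update(bytes(data[:hole_start]))
    h.update(bytes(data[hole_start + HOLE_SIZE:]))
    return h.hexdigest().encode("ascii")

def fill_hole(data, hole_start):
    """Write the digest into the hole without changing the file length."""
    data[hole_start:hole_start + HOLE_SIZE] = digest_outside_hole(data, hole_start)

def check(data, hole_start):
    """Recompute the digest, again skipping the hole, and compare."""
    stored = bytes(data[hole_start:hole_start + HOLE_SIZE])
    return stored == digest_outside_hole(data, hole_start)

data, hole_start = make_file_with_hole(b"document bytes before ", b" document bytes after")
fill_hole(data, hole_start)
print(check(data, hole_start))
```

Because the hole is excluded both when signing and when checking, writing the digest into it does not invalidate the digest, and the result is still a single self-contained file.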

And, of course, all of this is documented in the public ISO 32000-1 standard.

Next time I will go over some of the things specific to the European Union with respect to electronic and digital signatures.

Jim King (contact: jking@adobe.com)


Digital Signatures: basics

This article is on the basics of digital signature technology. To start with we need to get up to speed on what is called PKI technology (Public Key Infrastructure) because that is what is used for PDF and most other digital signatures.

Some simple basics

The PKI is based, in large part, on the clever asymmetric public key cryptography. Keys are relatively small strings of data that are needed to encrypt and decrypt much larger strings of data. For us, those larger strings of data are PDF document files. The clever invention is to provide a user with two keys that are unique to that individual, one being a "private" key that only that person has and the other a "public" key that is openly given to others. The term "asymmetric" derives from the notion that if one key of a pair is used to encrypt something then the other key of that pair is what must be used to decrypt it. If the same key is used to both encrypt and decrypt then it is a "symmetric" key.

So for example, if I use the private key, of my asymmetric pair, to encrypt a PDF file then I can send that file and my public key to anyone and they will be able to decrypt it.  Why would I send a file and the means to decrypt it together? The answer is, if the file can be successfully decrypted with my public key then, with statistically high confidence, you know that I was the one, and the only one, that could have encrypted it because I am the only one in possession of my private key. Furthermore it cannot have been changed since I did the encryption, or the decryption would have failed. Keeping private keys private is, of course, fundamental to this whole business.

In the reverse direction, if you want to send me something that only I can open, you can encrypt it with my public key and given that my paired private key is only known to me, I am the only one that can decrypt it. Neat, huh!
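Both directions can be tried out with textbook-sized numbers. This is a toy sketch only: the primes 61 and 53 and the exponent 17 are the usual classroom values, trivially breakable, and the "message" is a single small number instead of a file.

```python
# Classroom-sized RSA numbers, for illustration only.
p, q = 61, 53
n = p * q                            # 3233, shared by both keys
e = 17                               # public key exponent
d = pow(e, -1, (p - 1) * (q - 1))    # private key exponent (needs Python 3.8+)

def apply_key(value, exponent):
    """Encrypting and decrypting are the same operation with the other key."""
    return pow(value, exponent, n)

message = 65  # any number smaller than n

# Signing direction: locked with my private key, opened by anyone with my public key.
locked_by_me = apply_key(message, d)
opened_by_you = apply_key(locked_by_me, e)

# Confidential direction: locked with my public key, opened only with my private key.
sealed_for_me = apply_key(message, e)
opened_by_me = apply_key(sealed_for_me, d)

print(opened_by_you == message, opened_by_me == message)
```

Whichever key of the pair does the locking, only the other key undoes it, which is the asymmetry the whole scheme rests on.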

So, if it were a perfect world, we would all have our pairs of keys, with our private key only held and known by us and our public key available and known to everyone. But two things complicate this picture. How do I associate you, the person, with your public key, a string of bytes on my computer?  I could get them from you directly but I still have to worry about them being tampered with while in my possession (within my computer). And if you are a stranger to me, how can I associate your public key with you?   So now enter the "Infrastructure" part of Public Key Infrastructure.

What the industry has proposed and implemented is a system of automated notaries that run on (Internet) servers, called "Certificate Authorities" (CAs), or in the EU "Certification Service Providers" (CSPs).  A means to protect the public keys has also been invented using cryptographic documents called "certificates". This is the "infrastructure" part of PKI. The CAs "sign" the certificate documents containing my public key with their private keys just as I sign documents using my private key. The process is, of necessity, hierarchical since I can then ask how I know for sure that I am using the proper public key for that particular CA to check its signing. The answer is by using a certificate created by a higher authority CA that securely supplies me the lower level CA's public key. This stops at some root where I check the proper public key by some other, perhaps manual, means.
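The hierarchy can be sketched as a walk up a chain of certificates. This is a toy model under loud assumptions: the certificate is a plain dictionary with hypothetical field names, the keys are tiny RSA keys built from known Mersenne primes (not secure), and real certificates carry much more, such as validity dates and revocation information.

```python
import hashlib

E = 65537  # public exponent shared by every toy key

def make_keypair(p, q):
    """Return (public, private) toy RSA keys from two primes."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))  # needs Python 3.8+
    return (n, E), (n, d)

def _digest(data, n):
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def sign(data, private):
    n, d = private
    return pow(_digest(data, n), d, n)

def verify(data, signature, public):
    n, e = public
    return pow(signature, e, n) == _digest(data, n)

def cert_body(cert):
    """The bytes the issuer signs: whose key this is, the key, and the issuer."""
    n, e = cert["public"]
    return f'{cert["subject"]}|{n}|{e}|{cert["issuer"]}'.encode()

def issue_cert(subject, subject_public, issuer, issuer_private):
    cert = {"subject": subject, "public": subject_public, "issuer": issuer}
    cert["signature"] = sign(cert_body(cert), issuer_private)
    return cert

def chain_is_trusted(chain, trusted_roots):
    """chain[0] is the signer's certificate; each later one certifies the issuer before it."""
    for cert, issuer_cert in zip(chain, chain[1:]):
        if cert["issuer"] != issuer_cert["subject"]:
            return False
        if not verify(cert_body(cert), cert["signature"], issuer_cert["public"]):
            return False
    top = chain[-1]
    # The walk stops at a root key obtained by some other, trusted means.
    root_public = trusted_roots.get(top["issuer"])
    return root_public is not None and verify(cert_body(top), top["signature"], root_public)

root_pub, root_priv = make_keypair(2**107 - 1, 2**127 - 1)
ca_pub, ca_priv = make_keypair(2**61 - 1, 2**89 - 1)
jim_pub, _jim_priv = make_keypair(2**19 - 1, 2**31 - 1)

ca_cert = issue_cert("Example CA", ca_pub, "Example Root", root_priv)
jim_cert = issue_cert("Jim", jim_pub, "Example CA", ca_priv)
trusted_roots = {"Example Root": root_pub}

print(chain_is_trusted([jim_cert, ca_cert], trusted_roots))
```

The only key that has to be obtained out of band is the root's; every other public key is vouched for by the certificate one level up.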

The second complication is that I might want to have more than one identity for different roles that I may need to play so I can have multiple key pairs and hence multiple certificates. There is also a spectrum of how strongly the CAs check out who you are before giving you a certificate so I may have more than one for that reason.

This stuff is all in place and operational but user education seems to be a major obstacle. Each person has to obtain one or more certificates from an authority at a suitable level and establish trusted root authorities for checking others' certificates. I also think that there are way too many options and choices which all fall to the users, who are mostly naive in this technology.

In a subsequent blog I will cover how all this works with PDF files.

Jim King (contact: jking@adobe.com)

 
