Archive for July, 2009

Blog Topic List

This week went by too quickly.  Just as I was getting caught up from last week’s email, it’s time to go again. Yep, I’m on vacation again next week.  Hoping for just a little less rain this time.

In the meantime, I thought it would be helpful to share some of the blog topics I have on the back burner.

  1. A closer look at script objects: Explain how script objects work internally.
  2. Coding style guide: I am interested in improving the readability of JavaScript code.  What can we do in terms of object/variable/function naming and general form organization to make our forms easy for others to grok?
  3. Multiple records: Did you know you can have multiple data records in your PDF?
  4. DataDescription Deep Dive: When you base your form on an XML schema or sample data we generate a data description.  The data description is used at runtime to format instance data correctly.
  5. Enhancements to debugger, lintcheck: Since I’ve put these samples out I’ve been collecting a list of enhancements and bug fixes.  Just need time to implement them.
  6. Form/Fragment Version Control: When I distribute a sample, I need a way to be able to let you know when an update is available.  I’d like to have a mechanism where the form can "phone home" to discover updates.  I bet some of you have similar issues with forms you distribute to your users.
  7. Working with rich text: rawValue, saveXML(), non-standard <span> attributes, editing rich text with E4X, controlling tab stops and page breaks dynamically
  8. Debugging Performance Issues: Got a form that draws too slowly?  It’s not always easy to discover why.
  9. Locale support: XFA has terrific support for multiple locales.  Explore how you can tap into this capability.
  10. Table Joins in your data: How do you cope when your XML data has id references to other elements and you want to "join" between those "rows"?
  11. My acrobat.com experience

This isn’t the full set of topics, but it will do for now.  What looks interesting to you?  Are there topics you’d like to see covered that are not on this list?  Let me know.

Script Validation vs. Null Validation

There is one question I get asked frequently enough that it warrants its own blog entry.

Someone tries a validation script that looks like this:

this.rawValue !== null;

But the validation never fails.  In spite of the fact that the field is null, this field always reports back that it is valid.

The reason is this rule: "empty/null fields never fail a script validation".  The rule exists because we want to allow designers to define validations without worrying about null as a special case.  Take an example: the user needs to enter a value in an ‘Age’ field, and the value must be greater than 18.  The way to express this is with this validation on the Age field:

this.rawValue > 18;

In an empty field this evaluates as "null > 18;", which is false (JavaScript converts null to 0 for the comparison).  Until the user enters a value, this validation script will always return false.

But allowing a script validation to fail on an empty field would be a lousy user experience.  As soon as they open a blank form they would get a pile of invalid fields — even before beginning to enter data.  What should a form designer do?  They could change their script to explicitly allow null:

(this.rawValue === null) || (this.rawValue > 18);

But now there’s the problem that we really do want to make sure they enter a value.  The experience we want is that the field starts out in a valid state: don’t complain about script validations until the user has actually entered a value in the field, but also don’t let them submit their data until the field has been populated with a value!

Of course, the answer is to make the field mandatory (the null test validation).  By making the Age field mandatory we know that the user will not be hassled as soon as they open the form; but they also won’t be allowed to submit until they’ve provided a value.  And the form author doesn’t have to handle a null value as a special case.
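
The interplay between the two checks can be modelled in a few lines of plain JavaScript.  This is only a sketch to illustrate the rule, not how the XFA processor is actually implemented; checkField, makeAgeField, mandatory and script are hypothetical names invented for the example.

```javascript
// Hypothetical model of the null test and the script test on one field.
// checkField, mandatory and script are illustration names, not XFA APIs.
function checkField(field) {
  var isEmpty = field.rawValue === null || field.rawValue === "";

  // Script test: skipped for empty fields, so it can never fail on null.
  var scriptTestPasses = isEmpty ? true : Boolean(field.script());

  // Null test: the only check that can fail while the field is empty.
  var nullTestPasses = !(field.mandatory && isEmpty);

  return { scriptTestPasses: scriptTestPasses, nullTestPasses: nullTestPasses };
}

// Build a stand-in for the mandatory Age field with the validation from the text.
function makeAgeField(rawValue) {
  return {
    rawValue: rawValue,
    mandatory: true,
    script: function () { return this.rawValue > 18; }
  };
}

var emptyAge = checkField(makeAgeField(null)); // script test passes, null test fails
var tooYoung = checkField(makeAgeField(17));   // script test fails, null test passes
var adult = checkField(makeAgeField(21));      // both tests pass
```

In this model the empty field never trips the script test, so nothing is flagged when the form opens, while the failing null test is what would hold back a submit.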


Gone Again

I see Merriam-Webster has added "Staycation" to their dictionary.  Sounds good to me.  I will be away for another week at the family cottage.  I’ll be back to respond to comments on the 27th or 28th.

Protecting Form Internals

Recently I was involved in a discussion around protecting the contents of a form.  The request was to be able to protect the form logic and parts of the form data from tools that could extract this information from the PDF.  There are various reasons behind this request, including:

  1. Some of the form data is "office use only".  The data is not bound to any form fields, but it is used in some of the form logic/calculations.  This data could be sensitive.
  2. Script embedded in the form represents intellectual property that the form author does not want to share.
  3. Having access to the form definition makes it easy to create a spoofed version of the form for a phishing attack.

I have sympathy for all these reasons.   But the bottom line is that there are many different methods and tools you can use to extract the contents of a PDF.  I’ve distributed samples on this blog to do just that.  The latest example is the lint checking tool that extracts all the script from your form and highlights coding issues.

There are lots of great workflows that are enabled because the contents of the PDF are shareable.  It’s one of the strengths of the solution.  But on the other hand, we don’t always want to share everything.  And we want to offer protection to our clients.  There are some strategies to help you along:

Document Encryption

You can encrypt the contents of your form.  Users then need a digital id or password to open the file.  The contents of the PDF are hidden from anyone who does not have the necessary credentials.  Everything is hidden behind the encryption — including the form definition and the data.  However, encryption doesn’t satisfy all needs:

  • Those who have credentials have full access to the form definition and data.  It’s not possible to hide selected parts of the form contents.
  • It is not viable to require credentials to access forms distributed to the general public.

Edit Password

You can specify a password on your PDF to restrict editing of the form.  This prevents users from bringing the form into Designer and making modifications to the form.  However, it does not prevent users from using various tools to extract the contents of the form and make an editable copy.  The edit password sends a message to your friendly users to ‘keep their hands off’, but it is not a deterrent to a hacker.

Document Certification

Certification is the best defence against a phishing attack.  Your form definition can always be forged.  Even if we prevent access to the form definition, the attacker can always imitate the appearance and behaviour of your form from scratch.  But while someone might be able to make a copy of your form, the attacker cannot spoof certification.  Train your users to download their forms from your website and train them to expect valid certification on these forms.  With certification they can be certain that the form they’re using has been authored by you.

Secure Submit

We worry about the safety of user data during submission.  To protect user data, make sure the connection you provide for submission is secure.  Use https for submissions and the data will be encrypted during transmission.
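
One way to avoid accidentally shipping an insecure submit target is a small design-time check on the URL.  The helper below is a hypothetical sketch; isSecureSubmitUrl and the example.com addresses are made up for illustration and are not part of any Adobe API.

```javascript
// Hypothetical design-time check: does a submit target use https?
// isSecureSubmitUrl and the URLs below are illustration-only assumptions.
function isSecureSubmitUrl(url) {
  // Ignore leading whitespace, then require the https scheme (case-insensitive).
  return /^https:\/\//i.test(String(url).replace(/^\s+/, ""));
}

var secureTarget = isSecureSubmitUrl("https://forms.example.com/submit");
var plainTarget = isSecureSubmitUrl("http://forms.example.com/submit");
var mailTarget = isSecureSubmitUrl("mailto:forms@example.com");
```

Only the first target passes; the plain http and mailto targets would be flagged for review.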

XML Signatures

Any discussion about secure data is not complete without also mentioning XML Signatures.  XML Signatures allow the end-user to sign the data that they submit.  Signing data has two primary benefits:

  1. Establishes the identity of the person who signed the data
  2. Confirms that the data has not been tampered with en route

Server-side Processing

Sensitive form processing can be delegated to the server.  When the logic runs on the server, the execution details can be hidden.  There are several strategies to accomplish this:

  • Mark calculations to run at the server.  When using LiveCycle Forms, this will round-trip the form data to the host to allow a calculation to run in the context of the server.
  • Use a web service.  Send a SOAP request to the server and get data back.
  • Communicate with a server-side process directly, using an HTTP submit or the FormCalc Put/Post/Get functions.

Of course, the disadvantages of relying on the server are:

  • It means that parts of your form logic will work only when the user is online
  • Frequent server interaction may limit the scalability of the solution
  • Server-based solutions are expensive

Obfuscation

Script Logic

Some of us have a natural ability to obfuscate our JavaScript code by writing cryptic, comment-free code :-).  But there are more methodical things you can do.  For starters, you can use a JavaScript obfuscator to make your code very difficult to read. 
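
To give a feel for what an obfuscator does, here is a hand-made before/after pair.  Both functions are hypothetical examples (the discount logic and all names are invented), and the second is only similar in spirit to real obfuscator output, which is usually far more thorough.

```javascript
// Readable version: the intent is obvious from the names.
function calculateDiscount(orderTotal, isPreferredCustomer) {
  var percent = isPreferredCustomer ? 15 : 5;
  return orderTotal * percent / 100;
}

// Same logic with meaningless names and the branch hidden in an array lookup.
// +!!b turns any truthy value into 1 and any falsy value into 0.
var _0 = function (a, b) { return a * [5, 15][+!!b] / 100; };

var readableResult = calculateDiscount(200, true);
var obscuredResult = _0(200, true);
```

The two versions return the same values for the same inputs; only the effort needed to read them differs.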

Script logic can be stored in byte code format or, more specifically, in an embedded SWF file.  Today this is an option only for static XFA forms that run on the client.  It’s not possible for dynamic forms or for forms that need their logic to run on the server.

Data

It’s pretty hard to hide your data, especially given that Acrobat users can extract it using the menu command: Forms/Manage Form Data/Export Data…  But there are some things you can do to prevent easy access:

  • Put the sensitive parts of your data somewhere other than <xfa:datasets/>.  e.g. You can put it in a generic packet under the <xdp/> element.  To process it at runtime, load it into an E4X object.  Storing in a different location will hide it from the average user who knows how to export data.
  • Sensitive data can be disguised.  Look at how you tag your data.  If your element is called <ManagerSalary/> then the meaning is pretty clear.  However, if it’s tagged as <value37/> then more context is needed to figure out what the data holds. 
  • You can apply basic encoding on the contents of the data.  e.g. You could take a fragment of your data and store it as a base64-encoded blob.  Your form would then need to include logic to decode and process this data.  This will not fool the determined hacker.  They will figure out that the script to decode the data lives inside the form.  But it will hide the data from the casual observer.
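
The base64 idea from the last bullet could look something like the sketch below.  It is written in plain JavaScript with no host dependencies, on the assumption that the scripting environment offers no built-in base64 helper; encodeBase64, decodeBase64 and the sample fragment are hypothetical, and the code only handles single-byte characters.

```javascript
var B64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode a string of single-byte characters (char codes 0-255) as base64.
function encodeBase64(text) {
  var out = "";
  for (var i = 0; i < text.length; i += 3) {
    var has1 = i + 1 < text.length;
    var has2 = i + 2 < text.length;
    var b0 = text.charCodeAt(i);
    var b1 = has1 ? text.charCodeAt(i + 1) : 0;
    var b2 = has2 ? text.charCodeAt(i + 2) : 0;
    out += B64_ALPHABET.charAt(b0 >> 2);
    out += B64_ALPHABET.charAt(((b0 & 3) << 4) | (b1 >> 4));
    out += has1 ? B64_ALPHABET.charAt(((b1 & 15) << 2) | (b2 >> 6)) : "=";
    out += has2 ? B64_ALPHABET.charAt(b2 & 63) : "=";
  }
  return out;
}

// Decode base64 back to a string; padding and whitespace are ignored.
function decodeBase64(encoded) {
  var clean = encoded.replace(/[^A-Za-z0-9+\/]/g, "");
  var out = "";
  var bits = 0;      // accumulator holding the bits not yet emitted
  var bitCount = 0;  // number of valid bits in the accumulator
  for (var i = 0; i < clean.length; i++) {
    bits = ((bits << 6) | B64_ALPHABET.indexOf(clean.charAt(i))) & 0xFFFF;
    bitCount += 6;
    if (bitCount >= 8) {
      bitCount -= 8;
      out += String.fromCharCode((bits >> bitCount) & 255);
    }
  }
  return out;
}

// Hypothetical sensitive fragment, stored as a blob and decoded at runtime.
var fragment = "<ManagerSalary>90000</ManagerSalary>";
var blob = encodeBase64(fragment);
var restored = decodeBase64(blob);
```

As the bullet says, this only hides the data from a casual observer: the decoder ships inside the form, so anyone who reads the script can reverse it.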

Ultimately, obfuscation does not provide complete protection; it is merely a deterrent.  We should always assume that obfuscated contents can be reverse-engineered into something meaningful.