Null Data Handling

We often run into scenarios where we want to control how null values are represented in our XML instance data.  There are three ways to represent null data in XML:

  1. Exclude the element.  If the value has no data, do not write it out to the XML file.
  2. Use XML Schema’s nil attribute:
    <spouseName xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>
  3. Write out an empty element, e.g. <spouseName/>
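
To compare the three representations side by side, here is a small standalone sketch. It is illustrative only: serializeNull and its strategy names ("exclude", "xsi", "empty") are hypothetical helpers for this example and are not part of any XFA API.

```javascript
// Hypothetical helper: render a null-valued element under each strategy.
var XSI_NS = "http://www.w3.org/2001/XMLSchema-instance";

function serializeNull(name, strategy) {
    switch (strategy) {
        case "exclude":
            // 1. The element is not written out at all.
            return "";
        case "xsi":
            // 2. The element is written with xsi:nil="true".
            return "<" + name + ' xmlns:xsi="' + XSI_NS + '" xsi:nil="true"/>';
        case "empty":
            // 3. The element is written as an empty element.
            return "<" + name + "/>";
        default:
            throw new Error("Unknown strategy: " + strategy);
    }
}

["exclude", "xsi", "empty"].forEach(function (strategy) {
    console.log(strategy + ": " + serializeNull("spouseName", strategy));
});
```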

Null Handling in XML Schema

When determining which strategy to use, we look to the form’s XML Schema for hints:

  1. If a leaf element (no child elements, no attributes) is marked as optional (minOccurs="0") then it will be excluded when null.
  2. If an element in the schema is marked as nillable="true", then null data will be marked with the xsi:nil attribute.
  3. In all other cases we write out null values as empty elements.
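
The three rules above can be sketched as a small decision function. This is a simplified model, not the XFA processor's actual code: chooseNullType and the shape of its argument are hypothetical. The order of the checks when an element is both optional and nillable is also an assumption, since the rules above do not say which one wins.

```javascript
// Hypothetical model of a schema element:
//   { isLeaf: boolean, minOccurs: number, nillable: boolean }
function chooseNullType(el) {
    // Rule 1: an optional leaf element is excluded when null.
    if (el.isLeaf && el.minOccurs === 0) {
        return "exclude";
    }
    // Rule 2: a nillable element is written with xsi:nil.
    // (Assumption: checked after rule 1; the precedence is not documented here.)
    if (el.nillable) {
        return "xsi";
    }
    // Rule 3: everything else is written as an empty element.
    return "empty";
}

// An optional string leaf, a nillable string leaf, and an optional group.
console.log(chooseNullType({ isLeaf: true,  minOccurs: 0, nillable: false }));
console.log(chooseNullType({ isLeaf: true,  minOccurs: 1, nillable: true  }));
console.log(chooseNullType({ isLeaf: false, minOccurs: 0, nillable: false }));
```

Note that an optional group (the third call) falls through to "empty", because rule 1 only applies to leaf elements.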

For the rest of this blog entry I’ll describe exactly how the XFA processor deals with null data, and offer some tips on how you can further customize the behaviour.  In particular, I’d like to show how to control null handling at the data group level.  

Let’s look at the problem using a specific example.  Here’s a sample purchase order schema:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="purchaseOrder" type="poType"/>
<xsd:complexType name="poType">
<xsd:sequence>

  <xsd:element name="emptyItem" minOccurs="0">
   <xsd:complexType>
    <xsd:sequence>
     <xsd:element name="partNum" type="xsd:string"/>
     <xsd:element name="description" type="xsd:string"/>
     <xsd:element name="quantity" type="xsd:positiveInteger"/>
     <xsd:element name="unitPrice" type="xsd:float"/>
    </xsd:sequence>
   </xsd:complexType>
  </xsd:element>

  <xsd:element name="excludeItem" minOccurs="0">
   <xsd:complexType>
    <xsd:sequence>
     <xsd:element name="partNum" type="xsd:string"/>
     <xsd:element name="description" type="xsd:string"/>
     <xsd:element name="quantity" type="xsd:positiveInteger"/>
     <xsd:element name="unitPrice" type="xsd:float"/>
    </xsd:sequence>
   </xsd:complexType>
  </xsd:element>

  <xsd:element name="xsiItem" minOccurs="0">
   <xsd:complexType>
    <xsd:sequence>
     <xsd:element name="partNum" type="xsd:string"/>
     <xsd:element name="description" type="xsd:string"/>
     <xsd:element name="quantity" type="xsd:positiveInteger"/>
     <xsd:element name="unitPrice" type="xsd:float"/>
    </xsd:sequence>
   </xsd:complexType>
  </xsd:element>

  <xsd:element name="comment1" type="xsd:string" minOccurs="0"/>
  <xsd:element name="comment2" type="xsd:string" nillable="true"/>

</xsd:sequence>
</xsd:complexType>
</xsd:schema>

 

Note the two comment elements at the bottom.  When comment1 is saved to data, it will be excluded when null.  When comment2 is saved to data it will be annotated with the xsi:nil attribute.

But what some form authors want is for empty groups (purchase order items in our example) to be excluded when their contents are null.  I can show you how — but we are now getting into the deep end.

Data Description

When we save a PDF/XDP file that is based on a schema or sample XML or WSDL connection, we generate a data description.  This data description is really a distilled version of the schema.  It takes the form of a sample XML annotated with special namespaced attributes.  The data description for the sample above looks like:

<xfa:datasets
      xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/">
<dd:dataDescription
      xmlns:dd="http://ns.adobe.com/data-description/"
      dd:name="purchaseOrder">
  <purchaseOrder>

   <emptyItem dd:minOccur="0">
    <partNum/>
    <description/>
    <quantity/>
    <unitPrice/>
   </emptyItem>

   <excludeItem dd:minOccur="0">
    <partNum/>
    <description/>
    <quantity/>
    <unitPrice/>
   </excludeItem>

   <xsiItem dd:minOccur="0">
    <partNum/>
    <description/>
    <quantity/>
    <unitPrice/>
   </xsiItem>

   <comment1 dd:minOccur="0" dd:nullType="exclude"/>
   <comment2 dd:nullType="xsi"/>

  </purchaseOrder>
</dd:dataDescription>
</xfa:datasets>

 

Note how the comment elements are annotated.  We use dd:nullType to specify the null handling behaviour:

  1. dd:nullType="exclude": don’t write out elements where the value is null.  Note that this option may be used only if the element is marked in the schema as optional (minOccurs="0").
  2. dd:nullType="xsi": use the XML Schema nil attribute.  As described in the W3C definition:

    "XML Schema: Structures introduces a mechanism for signaling that an element should be accepted as valid when it has no content despite a content type which does not require or even necessarily allow empty content. An element may be valid without content if it has the attribute xsi:nil with the value true. An element so labeled must be empty, but can carry attributes if permitted by the corresponding complex type."

  3. dd:nullType="empty" (the default): save null values as empty elements.

The dd:nullType attribute can also be placed on grouping elements, in which case the setting is applied to all children of the group.  While there is nothing in XML Schema that will generate this for us, we can do it by hand-editing the item elements in XML source view:

<emptyItem dd:minOccur="0" dd:nullType="empty">

<excludeItem dd:minOccur="0" dd:nullType="exclude">

<xsiItem dd:minOccur="0" dd:nullType="xsi">
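
One way to picture the group-level setting is as a lookup that walks up from an element to the nearest ancestor carrying dd:nullType. The sketch below uses a hypothetical in-memory model (makeNode and effectiveNullType are made up for illustration). It also assumes that a setting on the element itself would win over the group's setting, which is not something stated above.

```javascript
// Hypothetical model of a data description node: a name, an optional
// nullType setting, and a link to the parent group.
function makeNode(name, nullType, parent) {
    return { name: name, nullType: nullType || null, parent: parent || null };
}

// Walk up from the node to the nearest nullType setting.
// Fall back to "empty", the default, when nothing is set.
function effectiveNullType(node) {
    for (var n = node; n !== null; n = n.parent) {
        if (n.nullType) {
            return n.nullType;
        }
    }
    return "empty";
}

var po = makeNode("purchaseOrder");
var excludeItem = makeNode("excludeItem", "exclude", po);
var xsiItem = makeNode("xsiItem", "xsi", po);

console.log(effectiveNullType(makeNode("partNum", null, excludeItem)));
console.log(effectiveNullType(makeNode("partNum", null, xsiItem)));
console.log(effectiveNullType(makeNode("comment", null, po)));
```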

And here is the obligatory sample form.  And the sample schema.  The sample form is bound to the sample schema, and has a field that shows the current state of the XML instance data.  Try typing values into the three different partNum fields and the comment fields, and watch how the data reacts according to the instructions in the data description.

The Really Deep End

The disadvantage to hand-editing your data description is that when your schema changes and the data description gets refreshed, your edits will be lost.

There is a way that you can write script to update your data description automatically.  The general technique is described at this blog post.  The script below updates all of your data descriptions: for every element that is marked dd:minOccur="0", it adds dd:nullType="exclude".  Note that the changes will be applied when saving as PDF or when generating a PDF using LiveCycle.

purchaseOrder::initialize - (JavaScript, server)
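if (xfa.host.name === "XFAPresentationAgent") {
    // Only run when the form is rendered on the server.
    var vDataDescriptionList = xfa.datasets.dataDescription.all;
    for (var i = 0; i < vDataDescriptionList.length; i++) {
        var vDD = vDataDescriptionList.item(i);
        // Serialize the data description so it can be edited as text.
        var sDD = vDD.saveXML();
        // Add dd:nullType="exclude" after every dd:minOccur="0" that does
        // not already have it, including when dd:minOccur is the last
        // attribute before the closing bracket.
        sDD = sDD.replace(
             /dd:minOccur="0"(?! dd:nullType="exclude")/g,
             "dd:minOccur=\"0\" dd:nullType=\"exclude\"");
        // Swap the original data description for the edited copy.
        xfa.datasets.nodes.remove(vDD);
        xfa.datasets.loadXML(sDD, false, false);
    }
}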
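
The string rewrite at the heart of this script can be tried outside of a form.  The sketch below runs in any JavaScript engine and uses a pattern with the space placed inside the negative lookahead, so that a dd:minOccur="0" sitting directly before a closing bracket is matched as well.  The function name addExcludeToOptional and the sample string are made up for illustration.

```javascript
// Add dd:nullType="exclude" after every dd:minOccur="0" that is not
// already followed by it.
function addExcludeToOptional(ddXml) {
    return ddXml.replace(
        /dd:minOccur="0"(?! dd:nullType="exclude")/g,
        'dd:minOccur="0" dd:nullType="exclude"');
}

// Hypothetical fragment of a serialized data description.
var sample =
    '<emptyItem dd:minOccur="0">' +
    '<comment1 dd:minOccur="0" dd:nullType="exclude"/>' +
    '<comment3 dd:minOccur="0"/>';

var updated = addExcludeToOptional(sample);
console.log(updated);

// Running the rewrite a second time leaves the text unchanged.
console.log(addExcludeToOptional(updated) === updated);
```

One limitation to keep in mind: if an element already carries a different dd:nullType value (for example "xsi") after dd:minOccur="0", this rewrite would add a second dd:nullType attribute, which is not well-formed XML.  A more careful version would need to check for that case.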

9 Responses to Null Data Handling

  1. Bruce says:

    Hi John, Is there any control available over the way empty XML attribute nodes are handled? If I bind a field to an attribute, that attribute seems to always be in the resulting XML, even if it has a null value. I’ve noticed there is a dd:reqAttrs attribute in the data description, but even though my optional attribute is not in this list it is still submitted in the resulting XML with an empty string, even if that violates the allowed values defined in my schema. Thanks Bruce

  2. Kristof says:

    Bruce,

    Is there a way to manually (by script) create an XML element that has the xsi:nil attribute set to true?? I’ve also posted this question on the forum (http://forums.adobe.com/thread/776717?tstart=0), but haven’t got any answers yet. Thanks in advance.

    Kristof

  3. Kristof says:

    I meant John instead of Bruce :)

  4. John Brinkman says:

    Kristof:
    Offhand, I can’t point to an easy way to directly manipulate a data node to add the xsi:nil attribute. I suppose you could do it with loadXML(). But that’s fairly heavy handed for this case.
    More importantly, I’m wondering why you want to do this. After all, if the data description is correctly annotated with dd:nullType="xsi", then simply assigning the field value to null will cause the data value to have the xsi attribute set.

    John

  5. Claude Warren says:

    We have a case where data was entered into a field, form saved, form reloaded, data removed from the field, form saved. Once this occurs it seems that the field will be in the exported XML even if the dd:minOccur="0" and dd:nullType="exclude" attributes are set. Is there any way to declare that zero length values are the same as null so that those elements will be removed?

    We have the following:

    <ns1:other_device3 dd:minOccur="0" dd:nullType="exclude">
    <ns1:device_name dd:minOccur="0" dd:nullType="exclude"/>
    <ns1:description dd:minOccur="0" dd:nullType="exclude">
    <ns1:project_only dd:minOccur="0" dd:nullType="exclude"/>
    </ns1:description>
    <ns1:number_of_devices dd:minOccur="0" dd:nullType="exclude">
    <ns1:project_only dd:minOccur="0" dd:nullType="exclude"/>
    </ns1:number_of_devices>
    </ns1:other_device3>

    and the XML being generated is

    <ns1:other_device3>
    <ns1:number_of_devices>
    <ns1:project_only/>
    </ns1:number_of_devices>
    </ns1:other_device3>

    Any help would be appreciated

  6. John Brinkman says:

    Claude:
    Your schema definition looks ok. Can’t say why it’s not working for you. I’ll ping you offline.

    John

  7. Juraj says:

    Hi John,

    I found a little mistake in the regular expression of the replace() function in the provided script.
    The provided script ignores the dd:minOccur="0" string if it’s directly followed by the XML closing bracket.
    example:

    I corrected this behavior by simply changing:
    /dd:minOccur= "0" (?!dd:nullType="exclude")/g,
    with:
    /dd:minOccur="0"(?! dd:nullType="exclude")/g,

    Best,
    Juraj