Paragraph Breaks in Plain Text Fields

The vast majority of text fields we create on our forms hold plain text. Downstream systems that receive data from our forms handle plain text much more easily than they deal with rich text expressed in XHTML.

Obviously this is a bit of a compromise, since plain text is much less expressive than rich text. However, one area where we can express some richness in our plain text is by handling paragraph breaks — specifically by differentiating them from line breaks. This means that paragraph properties on your field such as space before/after and indents can be applied to paragraphs within your plain text. The main difficulty is how to differentiate a paragraph break from a line break in plain text and what keystrokes are used to enter the two kinds of breaks.

Keystrokes

Most authoring systems have keystrokes to differentiate a line break from a paragraph break. The prevalent convention is that pressing “return” adds a paragraph break, and pressing “shift return” adds a line break. However, that convention seems to be enforced only when the text storage format is a rich format. For example, it works this way in Microsoft Word, but not in Notepad. The same is true in our forms. When entering boilerplate text in Designer or when entering data in a rich text field, we follow this keystroke convention: entering “return” generates a <p/> element, and entering “shift return” generates a <br/> element. However, when entering data in a plain text field there is no difference between return and shift-return. Both keystrokes generate a linefeed, which is interpreted as a line break.

Characters

You might assume that in plain text we could simply use the linefeed (U+000A) and the carriage return (U+000D) to differentiate between a line break and a paragraph break. However, it is not so easy. We store our data in XML, and XML does not preserve the difference between these characters. XML end-of-line handling dictates that conforming parsers must convert each U+000D, U+000A sequence to a single U+000A, and also convert any U+000D not followed by U+000A to U+000A.
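To see why the two characters cannot carry different meanings, here is a small sketch that imitates the end-of-line normalization an XML parser applies. The function name normalizeXmlLineEnds is made up for this illustration; it is not part of any XML or XFA API.

```javascript
// Hypothetical helper: imitates XML end-of-line handling.
// A CR LF pair, or a CR on its own, becomes a single LF.
function normalizeXmlLineEnds(text) {
    return text.replace(/\r\n?/g, "\n");
}

// Suppose we tried to use LF for a line break and CR for a paragraph break.
var withLineBreak = "first\nsecond";
var withParagraphBreak = "first\rsecond";

// After normalization the two strings are identical,
// so the distinction is lost once the data passes through a parser.
var sameAfterParse =
    normalizeXmlLineEnds(withLineBreak) === normalizeXmlLineEnds(withParagraphBreak);
```

Once the data has been through a conforming parser, every break looks like a linefeed, no matter which character was written.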

As of Reader 9, we have a solution: the Unicode characters U+2028 (line separator) and U+2029 (paragraph separator). When these characters are found in our data, they will correctly generate the desired line/paragraph breaking behaviours.
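For a system that consumes this data, the two characters give an unambiguous structure. The sketch below shows one way a consumer might split a plain text value into paragraphs and lines. The function splitParagraphs is a hypothetical name, and this is only an illustration of the idea, not a description of how Reader lays out the text. It also treats a plain linefeed as a line break, matching the behaviour described above.

```javascript
// Hypothetical helper: turns a plain text value into an array of paragraphs,
// where each paragraph is an array of lines.
function splitParagraphs(text) {
    var paragraphs = text.split("\u2029");   // U+2029 ends a paragraph
    var result = [];
    for (var i = 0; i < paragraphs.length; i++) {
        // U+2028 (or a plain linefeed) ends a line within a paragraph
        result.push(paragraphs[i].split(/[\u2028\n]/));
    }
    return result;
}

// Two paragraphs; the first contains two lines.
var parsed = splitParagraphs("Dear Sir,\u2028Thank you.\u2029Regards");
```

Here parsed holds two paragraphs, so paragraph properties such as space before/after can be applied to each one separately.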

The problem now is one of generating these characters from keystrokes. We can’t just change the default behaviour of Reader to start inserting a U+2029 character from a carriage return. Legacy systems would be surprised to find Unicode characters outside the 8-bit range in their plain text.

However, the form author can explicitly add this behaviour. The trick is to add a simple change event script to your multi-line plain text field:

testDataEntry.#subform[0].plainTextField[0]::change – (JavaScript)

// Modify carriage returns so that they insert Unicode characters
if (xfa.event.change == '\u000A')
{
    if (xfa.event.shift)
        xfa.event.change = '\u2028';  // line break
    else
        xfa.event.change = '\u2029';  // paragraph break
}

As you can see in the sample form, entering text into these fields will now generate the desired paragraph breaks in your plain text.
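The script above only reacts when the change is exactly one linefeed. If you also want to cover text that arrives as a longer string, such as pasted content, a variation is to replace every linefeed inside the change string. The sketch below assumes that pasted text is delivered to the change event as a single multi-character xfa.event.change value, which you should verify on your own form. The helper name convertBreaks is made up for this example.

```javascript
// Hypothetical helper: replaces every linefeed in the incoming change text.
// shiftHeld selects a line break (U+2028); otherwise a paragraph break (U+2029).
function convertBreaks(change, shiftHeld) {
    var replacement = shiftHeld ? "\u2028" : "\u2029";
    return change.replace(/\n/g, replacement);
}

// Inside the field's change event this would be used as:
//     xfa.event.change = convertBreaks(xfa.event.change, xfa.event.shift);
```

Keeping the logic in a plain function also means it can be tried out in any JavaScript environment before it goes into the form.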