Text Mining: removing “Nuggets” of ambiguity from FrameMaker content

Thursday, May 10 2012 @ 11:33 PM, By Maxwell Hoffmann

On May 10th, we had an extremely informative webinar with guest John Smart, “Text Mining from Adobe FrameMaker: How to find lost Terminology in seconds.”  (Note: free Adobe account registration required to view this or any other on-demand webinar.)

This webinar focused on how one of Smart Software’s tools (Text Miner) can be used to “mine” terminology from FrameMaker documents, which can make both language translation and reading comprehension much more effective.

Goal: finding the right 1,500 words

English has over 1,000,000 words and grows daily. Unfortunately, most of us use approximately 1,500 words in speech or writing on a daily basis. Frequently, many of these 1,500 words are not good for global communications, as many words in our common vocabularies have no equivalent in other languages.

John Smart’s webinar included the “cloud” image of words below. Believe it or not, most of these words (in various contexts) do not translate well (or at all) into most languages:

One word that stands out in the cloud is “using,” a gerund. English is one of the few languages that uses “-ing” words, and these are to be avoided.

What text mining can find

The following are major text components that text mining can identify in your existing FrameMaker content:

  • Abbreviations in their short and long forms
  • Special industry terminology and jargon
  • Frequency of terminology for training
  • Lists of terms for translation with content

Using this tool and technology can make it possible for you to identify your terminology in days rather than years. (The on-demand recorded webinar makes the reasons for this much more evident.)
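To make two of the items in the list above more concrete (frequency of terminology, and abbreviations in their short and long forms), here is a minimal Python sketch. It is not how Text Miner works internally; the function names, the sample sentence, and the simple "Long Form (ABBR)" matching rule are all assumptions made for illustration.

```python
import re
from collections import Counter


def term_frequencies(text, min_length=3):
    """Count how often each word appears, ignoring case and very short words."""
    words = re.findall(r"[A-Za-z][A-Za-z\-]*", text.lower())
    return Counter(w for w in words if len(w) >= min_length)


def find_abbreviations(text):
    """Find 'Long Form (ABBR)' pairs where ABBR matches the initials of the long form."""
    pairs = {}
    for match in re.finditer(r"\(([A-Z]{2,6})\)", text):
        abbr = match.group(1)
        # Take as many preceding words as the abbreviation has letters.
        preceding = re.findall(r"[A-Za-z]+", text[:match.start()])
        candidate = preceding[-len(abbr):]
        initials = "".join(w[0].upper() for w in candidate)
        if len(candidate) == len(abbr) and initials == abbr:
            pairs[abbr] = " ".join(candidate)
    return pairs


# Hypothetical sample text, not taken from the webinar.
sample = ("Check the Auxiliary Power Unit (APU) before each flight. "
          "Do not apply excessive force to the APU access panel. "
          "Close the panel after you check the APU.")

print(term_frequencies(sample).most_common(5))
print(find_abbreviations(sample))
```

A real terminology tool would also need to handle multi-word terms, stop words, and abbreviations that do not follow the initials pattern, which is part of why a human reviewer remains essential.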

Sample results of text mining from FrameMaker content

As the webinar made clear, existing terminology is extracted as key words and displayed with “left text context” and “right text context.” The screen shot below shows a typical example:

One example of ambiguous terminology that could use a replacement is “excessive FORCE.” This raises the question, “what amount of pressure is excessive?” Although the meaning of the original text may seem obvious to an engineer or a seasoned group of native English speakers who have read several versions of previous documentation, this type of terminology would be extremely difficult to translate with accuracy.

In addition, the true meaning in English is wide open to interpretation, which could have legal ramifications for a failure with aircraft or similar hardware.
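The “left text context” and “right text context” display described in this section is commonly called a key-word-in-context (KWIC) listing. The sketch below shows the general idea in Python; the function name, the 30-character window, and the sample sentences are assumptions for illustration, not a description of Text Miner's actual implementation.

```python
import re


def keyword_in_context(text, keyword, width=30):
    """Return (left, match, right) for every whole-word occurrence of keyword."""
    rows = []
    pattern = r"\b" + re.escape(keyword) + r"\b"
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Upper-case the key word so it stands out between its two contexts.
        rows.append((left, m.group(0).upper(), right))
    return rows


# Hypothetical sample text, not taken from the webinar.
sample = ("Do not use excessive force on the latch. "
          "Apply force to the lever until it locks.")

for left, key, right in keyword_in_context(sample, "force"):
    print(f"{left:>30} | {key} | {right}")
```

Lining up every occurrence of a term this way makes phrases such as “excessive FORCE” easy to spot, because the vague modifier always lands directly beside the key word.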

The essential human factor in text mining

As John Smart made clear, although Text Miner does an admirable job of identifying most of your terminology “automatically,” carefully qualified staff members are required to ensure that desired results are achieved.

You will need to designate a “Head Text Miner” who has the following qualifications:

  • Subject Matter Expert
  • Good command of English (strong English as a second language is acceptable)
  • Experienced in a role as an agent of change
  • Understands common nuances in English
  • Ideally, the staff member is multilingual
  • A “global thinker” would be a perfect match to this role

The goal: one word, one meaning

A huge number of historic events and trends led to English becoming one of the most expressive languages on the planet. By its nature, English often offers a dozen ways (or words) to “say the same thing.” Of course, different words or phrases have different nuances and connotations.

The goal in effective technical communications for a global audience is to have, whenever possible, one primary word with one meaning. The illustration shows “Simplified Spanish” that resulted from a clean terminology base after data mining. Because there was one English word for “electrical,” simplified choices in Spanish were the result.
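Once a terminology base exists, the “one word, one meaning” rule can be checked mechanically. The following Python sketch flags words that should be replaced by a preferred term. The term list, the function name, and the sample sentence are hypothetical; a real list would come from your own review team.

```python
import re

# Hypothetical terminology base: preferred term -> variants to avoid.
PREFERRED_TERMS = {
    "electrical": ["electric"],
    "start": ["begin", "commence", "initiate"],
}


def flag_variants(text, preferred_terms=PREFERRED_TERMS):
    """Return (variant_found, preferred_term) pairs in the order they occur in text."""
    variant_to_preferred = {
        variant: preferred
        for preferred, variants in preferred_terms.items()
        for variant in variants
    }
    findings = []
    for word in re.findall(r"[A-Za-z]+", text):
        preferred = variant_to_preferred.get(word.lower())
        if preferred:
            findings.append((word, preferred))
    return findings


# Hypothetical sample sentence.
sample = "Commence the test after you connect the electric cable."
print(flag_variants(sample))
```

This only matches single words exactly. It does not understand context, so a reviewer still has to decide whether each flagged word is really being used in the sense the preferred term covers.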

Review and Refine terminology

The webinar gave sensible guidelines for creating a review team to ensure that terms are correctly saved or rejected. Recommended steps include:

  • Consult your subject matter experts
  • Form a small committee
  • Ask for legal advice (e.g. consult your corporate legal team to determine what terms or words have led to litigation in the past)
  • Visit your help desk staff to determine which words cause confusion
  • Check your source texts
  • Use the “hit” counter
  • Look at instrument marks and labels
  • Write a test text

Using less text with the right words

Several of our past blog posts have touched on Simplified English and other tools that can be used to “reshape content for the small screen” to achieve better content for mobile devices. Text mining to refine your terminology can be equally essential to ensure that your message is crystal clear, in all languages, and to avoid legal issues due to missing cautions or warnings.

 
