Text Mining: removing “Nuggets” of ambiguity from FrameMaker content
Thursday, May 10 2012 @ 11:33 PM, By Maxwell Hoffmann
On May 10th, we hosted an extremely informative webinar with guest John Smart, “Text Mining from Adobe FrameMaker: How to find lost Terminology in seconds.” (Note: free Adobe account registration is required to view this or any other on-demand webinar.)
This webinar focused on how Text Miner, one of Smart Software’s tools, can be used to “mine” terminology from FrameMaker documents, which can make both language translation and reading comprehension much more effective.
Goal: finding the right 1,500 words
English has over 1,000,000 words and grows daily, yet most of us use only about 1,500 words in daily speech or writing. Unfortunately, many of these 1,500 words are poorly suited to global communication, because many words in our common vocabularies have no equivalent in other languages.
John Smart’s webinar included the “cloud” image of words below. Believe it or not, most of these words (in various contexts) do not translate well (or at all) into most languages:
One word that stands out in the cloud is “using,” a gerund. English is one of the few languages that use “-ing” words this way, and such words are best avoided.
What text mining can find
The following are major text components that text mining can identify in your existing FrameMaker content:
- Abbreviations in their short and long forms
- Special industry terminology and jargon
- Frequency of terminology for training
- Lists of terms for translation with content
This tool and technology can make it possible for you to identify your terminology in days rather than years. (The recorded on-demand webinar makes the reasons for this much more evident.)
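To make the first item in the list concrete, here is a minimal Python sketch of abbreviation detection. It is not how Text Miner works internally. It assumes the FrameMaker content has already been exported to plain text, and it uses a simple, hypothetical heuristic (the `find_abbreviations` helper is illustrative only): a parenthesized all-caps token such as “(APU)” is paired with the words just before it, provided their initials spell the token.

```python
import re


def find_abbreviations(text):
    """Return {short_form: long_form} for patterns like 'Auxiliary Power Unit (APU)'.

    Hypothetical heuristic for illustration: an all-caps token of N letters in
    parentheses is paired with the N words immediately before it, but only if
    the first letters of those words spell the token.
    """
    pairs = {}
    for match in re.finditer(r"\(([A-Z]{2,})\)", text):
        short = match.group(1)
        # All words that appear before the opening parenthesis.
        preceding = re.findall(r"[A-Za-z]+", text[:match.start()])
        candidate = preceding[-len(short):]
        # Accept the pair only when the initials match letter for letter.
        if len(candidate) == len(short) and all(
            word[0].upper() == letter for word, letter in zip(candidate, short)
        ):
            pairs[short] = " ".join(candidate)
    return pairs


sample = ("Check the Auxiliary Power Unit (APU) before flight. "
          "Reset the Flight Management System (FMS) if the APU fails.")
print(find_abbreviations(sample))
# → {'APU': 'Auxiliary Power Unit', 'FMS': 'Flight Management System'}
```

A real terminology project would also need to catch abbreviations that are never spelled out in the source, which this heuristic misses.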
Sample results of text mining from FrameMaker content
As the webinar made clear, existing terminology is extracted as keywords and displayed with its “left text context” and “right text context,” that is, the words immediately before and after each occurrence. The screenshot below shows a typical example:
One example of ambiguous terminology that needs a replacement is “excessive FORCE.” This raises the question: how much pressure is excessive? Although the meaning of the original text may seem obvious to an engineer, or to a seasoned group of native English speakers who have read several versions of earlier documentation, this type of terminology would be extremely difficult to translate accurately.
In addition, the intended meaning is widely open to interpretation even in English, which could have legal ramifications if aircraft or similar hardware fails.
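The left and right context display described above is commonly called a keyword-in-context (KWIC) concordance. The following Python sketch assumes plain-text input and a hypothetical `kwic` helper (not part of Text Miner). It lists every occurrence of a keyword with a fixed number of tokens on each side, which is enough to surface phrases such as “excessive force.”

```python
import re


def kwic(text, keyword, width=4):
    """Yield (left_context, keyword, right_context) for each occurrence of keyword.

    Matching is case-insensitive and works on single tokens; punctuation
    marks count as tokens so sentence boundaries stay visible.
    """
    tokens = re.findall(r"\w+|[^\w\s]", text)
    target = keyword.lower()
    for i, token in enumerate(tokens):
        if token.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            yield left, token, right


text = ("Do not apply excessive force to the latch. "
        "Excessive force can bend the pin.")
for left, kw, right in kwic(text, "force", width=3):
    # Right-align the left context so the keywords line up in one column.
    print(f"{left:>20} [{kw}] {right}")
```

Lining up the keyword in a single column is what makes repeated modifiers like “excessive” easy to spot when scanning hundreds of hits.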
The essential human factor in text mining
As John Smart made clear, although Text Miner does an admirable job of identifying most of your terminology “automatically,” qualified staff members are still required to ensure that the desired results are achieved.
You will need to designate a “Head Text Miner” who has the following qualifications:
- Subject Matter Expert
- Good command of English (strong English as a second language is acceptable)
- Experienced in a role as an agent of change
- Understands common nuances in English
- Ideally, multilingual
- A “global thinker” would be a perfect match for this role
The goal: one word, one meaning
A huge number of historical events and trends led to English becoming one of the most expressive languages on the planet. By its nature, English often offers a dozen ways (or words) to “say the same thing.” Of course, different words or phrases carry different nuances and connotations.
The goal in effective technical communication for a global audience is, whenever possible, to have one primary word with one meaning. The illustration shows the “Simplified Spanish” that resulted from a clean terminology base after text mining. Because only one English word was used for “electrical,” the choices in Spanish were simplified as well.
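Once a clean terminology base exists, the “one word, one meaning” goal can be checked mechanically. This minimal Python sketch assumes a hypothetical mapping from non-preferred variants to a single approved term (the entries and the `flag_variants` helper are illustrative, not from the webinar) and flags every variant found in a text.

```python
import re

# Hypothetical terminology base: each non-preferred variant maps to the single
# approved term. These entries are illustrative only.
PREFERRED_TERMS = {
    "begin": "start",
    "commence": "start",
    "initiate": "start",
}


def flag_variants(text, preferred=PREFERRED_TERMS):
    """Return (word_found, approved_term) for every non-preferred variant in text."""
    hits = []
    for match in re.finditer(r"[A-Za-z]+", text):
        word = match.group(0)
        approved = preferred.get(word.lower())
        if approved is not None:
            hits.append((word, approved))
    return hits


print(flag_variants("Commence the test, then initiate the pump."))
# → [('Commence', 'start'), ('initiate', 'start')]
```

The check matches exact word forms only, so inflected variants such as “began” would need their own entries in the mapping.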
Review and refine terminology
The webinar gave sensible guidelines for creating a review team to ensure that the right terms are kept and the wrong ones are rejected. Recommended steps include:
- Consult your subject matter experts
- Form a small committee
- Ask for legal advice (e.g., consult your corporate legal team to determine which terms or words have led to litigation in the past)
- Visit your help desk staff to determine which words cause confusion
- Check your source texts
- Use the “hit” counter
- Look at instrument marks and labels
- Write a test text
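The “hit” counter mentioned in the list reports how often each candidate term occurs in your source texts. A minimal Python sketch, assuming plain-text source files and a hypothetical `hit_counts` helper, might look like this:

```python
import re
from collections import Counter


def hit_counts(documents, terms):
    """Count case-insensitive, whole-word hits for each candidate term."""
    # Start every term at zero so unused terms still appear in the result.
    counts = Counter({term: 0 for term in terms})
    for doc in documents:
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            counts[term] += len(re.findall(pattern, doc, flags=re.IGNORECASE))
    return counts


docs = [
    "Apply torque to the bolt. Check the torque again.",
    "Excessive force can damage the bolt.",
]
for term, hits in hit_counts(docs, ["torque", "excessive force", "bolt", "shim"]).most_common():
    print(f"{term}: {hits}")
```

Terms with a count of zero remain in the result, which makes it easy to spot approved terms that your source texts never use.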
Using less text with the right words
Several of our past blog posts have touched on Simplified English and other tools that can be used to “reshape content for the small screen” and deliver better content on mobile devices. Text mining to refine your terminology can be equally essential to ensure that your message is crystal clear in all languages, and to avoid legal issues caused by missing cautions or warnings.