
Earlier this year, we published a newsletter article scratching the surface of the Text Analytics Premium add-on for IBM SPSS Modeler. It introduced the fundamentals of text analytics and the idea of extracting information from unstructured text data. In this follow-up article, we dive into some of the more advanced topics so that you can make full use of the add-on’s functionality.

You have probably noticed that the language resource templates against which you extract information from texts are essential in determining the performance of a text analysis. For example, you wouldn’t be able to identify any terms of interest for market research using a library of healthcare concepts. Therefore, we need to know how to revise the resources available so that SPSS can identify concepts, determine categories, and link patterns not only more accurately but also more relevantly.

We can find the resource editor needed to revise or create a new resource template by accessing an interactive workbench, as illustrated in our last article. An alternative way to do this is by clicking Tools > Text Analytics Template Editor on the main menu. Both approaches ask the user to open an existing resource template as a starting point. For the purpose of illustration, the examples in this article will use Basic Resources (English).

Revise Concept and Category Definitions

The most important function of any kind of text analysis is to identify key words or phrases that are of interest. SPSS does this by comparing the analyzed texts against the libraries in the language resource template or text analytics package you’ve selected. Each library contains type definitions, under which different terms reside. In the resource editor, make sure that the Library Resources tab is in view and select a library from the library list in the upper left panel of the editor window. As soon as you select a library, the resources panel in the center is populated with that library’s type and term definitions. To add more terms, right-click an existing term and select Add New Terms. You can also change a term’s definition by using one of the other options in the right-click menu.

Add new terms

If the same term name appears more than once in the Term column under different types, you may want to force the term to one type by selecting Force Term Here on the occurrence you want to keep. That way, when SPSS sees this term, it ignores the term’s occurrences in other type definitions.
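To make the effect of forcing a term concrete, here is a minimal Python sketch. It is not how SPSS stores or resolves its resources; the `LIBRARY` and `FORCED` dictionaries, the type names, and the `types_for_term` helper are all hypothetical and exist only to illustrate the idea that a forced term resolves to exactly one type.

```python
from typing import Dict, List, Set

# Hypothetical library contents: type name -> terms listed under that type.
# These are illustrative values, not entries from a real SPSS resource template.
LIBRARY: Dict[str, Set[str]] = {
    "Location": {"washington", "boston"},
    "Person": {"washington", "lincoln"},
}

# Hypothetical record of forced terms: term -> the one type it is forced to.
FORCED: Dict[str, str] = {"washington": "Location"}


def types_for_term(term: str) -> List[str]:
    """Return the type(s) a term resolves to, honoring a forced type if one exists."""
    key = term.lower()
    if key in FORCED:
        # A forced term ignores its occurrences under every other type.
        return [FORCED[key]]
    # Otherwise, report every type whose term list contains the term.
    return sorted(t for t, terms in LIBRARY.items() if key in terms)
```

In this sketch, "washington" is listed under both types, but because it is forced it resolves only to `Location`, while "lincoln" resolves normally to `Person`.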

Change the Category Definitions

As introduced in our last article, SPSS Modeler Text Analytics can automatically build categories from extracted concepts, using the product’s set of automated techniques to group concepts that share the same key idea into higher-level categories. However, this automatic categorization often will not fit your needs exactly, so you may want to change or augment the category definitions it builds. Concept patterns can be used to create or update categories, as well as to create category rules for new or existing categories.

To create a new empty category, go to the category pane and right-click the level at which you want to place it. For example, right-click an existing category to create a subcategory under it, or right-click All Documents to put the new category at the root level. In the right-click menu, select Create Empty Category. You can then create a new category rule by right-clicking an existing category. You can also revise an existing rule by expanding its category and double-clicking the rule under it. Once you’ve chosen to either create a new rule or revise an existing one, a new panel appears at the top of the workbench. That’s where you can type in the rule.

Interactive work bench

Suppose you want to analyze the cases reported to a police department and collect the reports that mention missing people in a Missing People category. From all the case reports you want to analyze, you would first extract locations, names, and the words or phrases that express the concept of missing, either by using a template that already contains these definitions or by creating them in a template. You can then create a category rule that captures reports of missing people by typing the following in the rule editor:

<Name> & <Location> & missing
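The logic of the rule above can be sketched in Python: a document falls into the category only when every operand joined by & is present. This is an illustration of the AND logic only, not SPSS’s rule engine or its full rule syntax. The `Extraction` class and the `matches_and_rule` helper are hypothetical names, and the sketch assumes that operands in angle brackets refer to types while bare words refer to concepts.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Extraction:
    """Hypothetical container for what was extracted from one document."""
    types: Set[str] = field(default_factory=set)     # e.g. {"Name", "Location"}
    concepts: Set[str] = field(default_factory=set)  # e.g. {"missing"}


def matches_and_rule(rule: str, doc: Extraction) -> bool:
    """Evaluate a rule made only of '&'-joined operands against one document."""
    for operand in (part.strip() for part in rule.split("&")):
        if operand.startswith("<") and operand.endswith(">"):
            # Assumed convention: <X> requires that type X was extracted.
            if operand[1:-1] not in doc.types:
                return False
        elif operand.lower() not in doc.concepts:
            # A bare word requires that concept to have been extracted.
            return False
    return True


# Made-up example documents, represented by their extraction results.
report = Extraction(types={"Name", "Location"}, concepts={"missing", "boston"})
burglary = Extraction(types={"Location"}, concepts={"burglary", "boston"})
```

Here `matches_and_rule("<Name> & <Location> & missing", report)` is true, while the same rule is false for `burglary`, because that report has no name and no mention of "missing".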

Work with the Text Link Definitions

Text link definitions are used to detect patterns in which particular concepts and auxiliary words occur together. They can be very useful in sentiment analysis, when you want to detect trends or patterns in how customers feel about your products and services or those of your competitors.

By switching to the Text Link Rules tab in the resource editor, you get access to the text link rule editor and its navigation pane.

Text link rules

In the left navigation pane of the Text Link Rules tab, you can define two sets of rules: macros and text links.

Macros are groupings of types, other macros, and words joined with an OR operator (|). A macro can serve as a component of a text link rule alongside types, concepts, and words. Macros significantly reduce the complexity of maintaining text link rules, because you can reuse the same macro in multiple rules, and if you need to change a macro, you only need to do it once.
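The reuse benefit of macros can be illustrated with a small Python sketch that expands a macro into the flat set of alternatives it stands for. It is a conceptual model only, not the add-on’s implementation. The macro names, their contents, and the `expand_macro` helper are hypothetical, and the `$` prefix used here to mark a reference to another macro is an assumption made for this sketch.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical macro definitions: name -> alternatives joined by the OR operator (|).
# An alternative may be a word, a <Type>, or (by this sketch's convention) a $macro.
MACROS: Dict[str, str] = {
    "mPositive": "good|great|<Positive>",
    "mNegative": "bad|poor|<Negative>",
    "mOpinion": "$mPositive|$mNegative",
}


def expand_macro(
    name: str,
    macros: Dict[str, str],
    _seen: FrozenSet[str] = frozenset(),
) -> Set[str]:
    """Recursively expand a macro into the set of words and types it matches."""
    if name in _seen:
        raise ValueError(f"circular macro reference: {name}")
    alternatives: Set[str] = set()
    for item in macros[name].split("|"):
        item = item.strip()
        if item.startswith("$"):
            # Nested macro: expand it and merge its alternatives.
            alternatives |= expand_macro(item[1:], macros, _seen | {name})
        else:
            alternatives.add(item)
    return alternatives
```

Because `mOpinion` refers to the other two macros rather than repeating their contents, adding a word to `mPositive` changes what `mOpinion` expands to without touching `mOpinion` itself, which mirrors the "change it once" benefit described above.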

After defining the rule sets, right-click an item under Rules and choose Create Rule. A new rule is added at the same level as the item you right-clicked. If you highlight the rule, the rule editor window is populated with its current definition.

A new rule

In the Element list, you can specify the link pattern’s components and the quantity of each component. A rule’s matching elements can be macros, types, words, or even concepts. Note that both category rules and text link rules must follow their own required syntax.

Conclusion: Next Steps with Text Analytics

Hopefully this article has given you enough additional technical instruction to make the most of the Text Analytics add-on. If you have questions or need further assistance using the SPSS Modeler text analytics tool to boost your text analysis capabilities, check out the Ironside Data Science & Advanced Analytics team page.



