
IBM estimates that most businesses utilize about 20% of the data they collect, with the other 80% of a company’s stored information remaining under-utilized or inaccessible.

Today, businesses are storing more data than ever before. Data is vital for record keeping and for gaining competitive advantage through reporting and analysis tools. So why the low utilization?

Traditional BI and analysis tools have been developed to work with structured data, which is information kept in a specific format or model that conforms to certain rules of access and use. Unstructured data, the bulk of the information a company keeps, does not conform to these rules and often exists as free-form text. IBM Watson Content Analytics is a software tool that allows businesses to gain insight from both their structured and unstructured data.

BI Integration

Watson Content Analytics can be integrated with Cognos BI through the use of direct exports to a relational database system. Once the administrator finishes configuring an export, Content Analytics will provide an automatically generated star schema model for reporting. Reporting can be done through pre-configured Cognos BI reports in Watson Content Analytics or in reports and dashboards created from within Cognos.
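
To make the star schema idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (dim_keyword, fact_document_keyword) are hypothetical and chosen only for illustration; the schema that Content Analytics actually generates may look quite different.

```python
import sqlite3

# Hypothetical star schema: one fact table linking documents to keywords
# and one dimension table describing facet keywords. All names here are
# illustrative assumptions, not the schema the product generates.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_keyword (
        keyword_id INTEGER PRIMARY KEY,
        facet      TEXT NOT NULL,
        keyword    TEXT NOT NULL
    );
    CREATE TABLE fact_document_keyword (
        document_id INTEGER NOT NULL,
        keyword_id  INTEGER NOT NULL REFERENCES dim_keyword(keyword_id)
    );
""")
conn.executemany(
    "INSERT INTO dim_keyword VALUES (?, ?, ?)",
    [(1, "product", "router"), (2, "issue", "overheating")],
)
conn.executemany(
    "INSERT INTO fact_document_keyword VALUES (?, ?)",
    [(100, 1), (100, 2), (101, 1)],
)


def documents_per_keyword(connection):
    """Count distinct documents per facet keyword, as a BI report might."""
    rows = connection.execute("""
        SELECT d.facet, d.keyword, COUNT(DISTINCT f.document_id)
        FROM fact_document_keyword AS f
        JOIN dim_keyword AS d ON d.keyword_id = f.keyword_id
        GROUP BY d.facet, d.keyword
        ORDER BY d.facet, d.keyword
    """)
    return rows.fetchall()


report = documents_per_keyword(conn)
```

A reporting tool would issue the same kind of join between the fact table and its dimensions to build charts and dashboards.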

Watson Content Analytics Architecture

Watson Content Analytics consists of six major components (Fig. 1). The Content Analytics Collection is an intermediary data store in the process and not considered a component.

Fig. 1 – Component Architecture of Watson Content Analytics
Image courtesy of IBM Watson Content Analytics Redbook

Crawlers and the Document Processor

The Crawler is the component that actually goes out and collects the information to be analyzed. IBM Watson Content Analytics comes preconfigured with crawlers for many different types of content and data sources. Crawlers return documents that can be exported or sent to the document processor for the next stage in content analysis. The Document Processor does the work of adding structure by building an overlay of annotations into the documents. Annotations are descriptive tags that define aspects of the content. The annotations are created by Annotators, which are pre-built into Watson Content Analytics. Custom annotators can be built and used as long as they adhere to the Unstructured Information Management Architecture (UIMA) open source standard.
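
The idea of annotations as an overlay on a document can be sketched in a few lines of plain Python. This does not use the UIMA API; the Annotation and Document classes and the dictionary_annotator function are hypothetical names invented for this illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    begin: int  # character offset where the annotated span starts
    end: int    # character offset just past the end of the span
    type: str   # descriptive tag, for example "product"


@dataclass
class Document:
    text: str
    annotations: list = field(default_factory=list)


def dictionary_annotator(doc, tag, terms):
    """Tag each whole-word, case-insensitive occurrence of a term.

    The document text is left untouched; annotations only point into it.
    """
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, doc.text, re.IGNORECASE):
            doc.annotations.append(Annotation(match.start(), match.end(), tag))
    return doc


doc = Document("The router keeps overheating after the update.")
dictionary_annotator(doc, "product", ["router"])
dictionary_annotator(doc, "issue", ["overheating"])
tagged = [(a.type, doc.text[a.begin:a.end]) for a in doc.annotations]
```

Because each annotation stores only offsets and a tag, several annotators can run over the same document without interfering with one another.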

Indexer and Search Runtime

The Indexer takes analyzed documents from the document processor and builds a highly optimized index of them. After indexing, the data is stored as a Content Analytics Collection that can be exported for use in a relational database system or accessed by the search runtime component. The Search Runtime component serves user search requests directed to a content analytics collection.
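
As a rough illustration of what an index and a search runtime do, the sketch below builds a simple inverted index and answers an all-terms query against it. It is a generic technique shown under simplifying assumptions (whitespace tokenization, hypothetical function names), not a description of how the product's index is implemented.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.lower().split():
            index[token.strip(".,!?")].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "Router overheating after update.",
    2: "Router works fine.",
    3: "Phone overheating.",
}
index = build_index(docs)
hits = search(index, "router overheating")
```

Looking up terms in the index avoids rescanning every document for each request, which is why search runtimes are built on top of an index.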

Content Analytics Miner

The Content Analytics Miner is the browser-based interface where users can issue requests to the search runtime component and perform analysis. It provides a number of different views (Fig. 2) that allow a user to see correlation and deviation statistics for various aspects of their content. Analysis in the content analytics miner is typically performed through an iterative process of searching documents (Fig. 3), then narrowing down and analyzing the result set based on a certain common facet.
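
The frequency and correlation figures shown for a facet can be approximated with a simple ratio: how dense a keyword is in the current result set compared with how dense it is in the whole collection. The sketch below assumes that simplified definition; the statistic the product actually computes may differ, and the function and variable names are hypothetical.

```python
def facet_stats(collection, result_ids, facet):
    """For each keyword of a facet, return (frequency, correlation).

    frequency: number of documents in the result set with the keyword.
    correlation: keyword density in the result set divided by its density
    in the whole collection (a simplified ratio, assumed for illustration).
    """
    total = len(collection)
    subset = len(result_ids)
    overall, in_subset = {}, {}
    for doc_id, facets in collection.items():
        # set() ensures a keyword is counted once per document
        for keyword in set(facets.get(facet, ())):
            overall[keyword] = overall.get(keyword, 0) + 1
            if doc_id in result_ids:
                in_subset[keyword] = in_subset.get(keyword, 0) + 1
    stats = {}
    for keyword, freq in in_subset.items():
        stats[keyword] = (freq, (freq / subset) / (overall[keyword] / total))
    return stats


collection = {
    1: {"issue": ["overheating"]},
    2: {"issue": ["overheating"]},
    3: {"issue": ["battery"]},
    4: {"issue": ["battery"]},
}
# Suppose a search returned documents 1 and 2.
stats = facet_stats(collection, {1, 2}, "issue")
```

A ratio above 1 suggests the keyword is over-represented in the result set, which is the kind of signal an analyst would use to decide which facet to narrow down on next.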

Fig. 2 – The Facet view (right) provides frequency and correlation data for keywords of specific facets, selected from the facet tree (left).
Image courtesy of IBM Watson Content Analytics Redbook

Fig. 3 – Searches within the Content Analytics Miner are created through expressions typed into the search box, shown above.
Image courtesy of IBM Watson Content Analytics Redbook

Administration Console

The Administration Console is the browser-based interface used for configuration and administration of the various components and processes within Watson Content Analytics. Tasks leading up to the creation of a content analytics collection, including crawling, document processing, and indexing, are directly controlled by an administrator through the administration console. General administration tasks, such as viewing system activity logs or configuring security, are also controlled through the administration console.

Use Cases

How are some companies making use of content analytics? Let’s review a couple of possible use cases.

Customer Sentiment

Associations between companies or products and the language used by customers in social posts, updates, comments, etc. can be analyzed and reported on. New marketing campaigns, headlines in the news or media, and product releases may affect social sentiment toward a business in ways that the organization needs to be aware of. Monitoring sentiment this way is far more effective than reviewing a periodic sales report, recognizing an anomaly, back-tracking, and then guessing at why it occurred.

Education

At the school administration level, there is potential for analysis of admission essays and applications to gain insight into the incoming class. Staff members can then take action with the correct resources, tools, and courses in an effort to provide a more customized education.

At the classroom level, analysis of essays and exams could potentially illuminate group opinion, understanding, and inconsistencies in a way that is currently not available to teachers. This would allow for a more tailored approach by instructors to each individual class and student.

Content Analytics is still a relatively unexplored field for many businesses and as such provides many opportunities for competitive advantage. Content Analytics is catching on. Be ahead of the game. If you would like to learn more about Content Analytics or other emerging predictive analytics technologies, please contact us. The experts on our Data Science and Advanced Analytics team would be happy to help.



