speed data sets

What is the most common, most vital need of any business? Speed.

Speed to adapt, to respond, to evolve. It is important, not just at the big picture level, but on a daily basis as well. If you’re constantly waiting for information, then you’re spending less time analyzing data and making decisions that help the organization. Speed has been vitally important in the acceptance of Mode 2 style content such as Dashboards and Stories. Response time is key when viewing daily high-level metrics or when creating that “single use” asset to analyze a potential issue.

But what about those legacy Mode 1 reports and dashboards, or reporting “applications” that were crafted in Report Studio or the new Report Authoring environment? These assets often have complex prompting, allowing end-users the flexibility to tailor the results to their needs. How can we shorten up the wait time for the business users accessing that content? How do you increase the speed?

As part of a recent Cognos Health Check, we had the opportunity to leverage Cognos Data Sets to solve this very common challenge.

A Bit of History

With the release of Cognos Analytics 11.0.4 in September 2016, IBM introduced Data Sets. A Data Set is a cached collection of data items that can be used to increase performance, reduce workload on the database, and/or keep a “point in time” view of the data. They can be created from Data Modules as well as Framework Manager Packages. They are stored in a columnar format (specifically Apache Parquet) and by default, are stored in the content store. If the size of the content store is a concern, these files can be stored in an external object store instead. Data Sets can be scheduled so they can be refreshed, keeping the underlying data up to date.

Solution

IBM Cognos Analytics Data Sets gives a report author the ability to create a cached version of prompt data, improving the response time for end users of the report.

The process is simple:

  • Expand the options for a Framework Manager Model package or a Data Module.
  • Select Create Data Set.
  • Select the data items that you would like to use in the Data Set.

Note that a Data Set can contain purely dimensional values, such as office locations, employees, or products. They can also be more complex and contain values from multiple dimensions and/or fact data.

For example, there is a requirement for a prompt that displays Agents that have sold a minimum amount of a product line to small market customers. That query would most likely, at a minimum be across two different dimension tables and a fact – not something you would want to wait for each time a prompt object is rendered. A Data Set allows the author to run that query on a scheduled basis and cache the result set. The Data Set query can be filtered to limit the data and sorted, or that can be handled at run time from within the report. Data Sets can be defined to return detail-level data, or can be summarized, which will suppress duplicates.

Results

In our recent engagement, sample reports were created against the GO Sales (query) sample data and package – a relatively small data set – and the Include Performance Details option was turned on during the test runs to view the Query and Render times. We tested two (2) different scenarios, the first using basic value prompts, populated by simple use value/display value queries. The sample report contained three (3) prompts, a Radio Button Group based on Products (144 rows), a Check Box Group based on Retailers (562 rows), and a List Box based on Sales Staff (765 rows).

data sets

Our testing approach was designed to isolate any impact that database and/or application level caching might have on the results. This was done to mimic real world use, where caching may not have a large beneficial impact when reports are run over time or by multiple users. Over eight (8) test runs of the report, the Data Set queries were over 150% faster than the Database query prompts!

The second scenario used Search and Select prompts and the test was to search both for each letter of the alphabet. The prompts were populated by Sales Staff that had Sales Revenue for any Retailer Type except Direct Marketing. The resulting query only returned 179 rows of data and yet, the Data Set version of the Search and Select prompt was 86% faster than the Database version. The only time the Database version returned more quickly than the Data Set version was when there was no data returned, and in those three (3) cases, the Database query was 7.5% faster. Remove those from the overall population and the Data Set versions were 98% faster than the Database.

Of course, your mileage may vary. Response time can be affected by a number of factors, such as the database environment, network, Cognos infrastructure, and report complexity.

Result Sets – First Scenario

In the tables below, note that the order of the prompt objects on the page were switched after the first four (4) test runs. Once the Database connection is established, the Database times are more in line with those of the Data Sets, but the initial prompt object response time for the Data Set is vastly superior to that of the Database.

data sets
data sets

Result Sets – Second Scenario

In the table below, the Search and Select queries pulled data from the Sales Staff query subject, and filtered on the Sales fact query subject (Revenue > 0) and the Retailer Type query subject (Retailer type <> ‘Direct Marketing’).

data sets

Takeaways

There have been a number of techniques available in Cognos to speed up the load and response time of prompt pages. Some of them are built into the product, such as the Use Local Cache and Use for Parameter Info query properties. Some are techniques combining external content with built-in features, such as using the External Data capabilities to populate prompt queries or modeling flat files or XML data sources. None of these provides the flexibility, stability, and performance of Data Sets. Once they are defined, data sets can be refreshed based on a schedule and are upgradeable, as they are stored in the Cognos content store by default. Data Sets can increase the speed of your prompts without increasing the burden on your developers or report consumers.

health check cta

About Ironside

Ironside was founded in 1999 as an enterprise data and analytics solution provider and system integrator. Our clients hire us to acquire, enrich and measure their data so they can make smarter, better decisions about their business. No matter your industry or specific business challenges, Ironside has the experience, perspective and agility to help transform your analytic environment.