Posts

In today’s “Big Data” era, data of enormous volume and variety is continuously generated across channels within the enterprise and in the Cloud. To drive exploratory analysis and make accurate predictions, we need to connect, collate, and consume all of this data so that clean, consistent data is quickly and easily available to analysts and data scientists.


According to ‘The Economist’, data is the new oil: it is now the world’s most valuable resource. The volume of data available for organizations to capture, store, and analyze has changed how they approach innovation, and analytics has become a true competitive differentiator.

Unfortunately, business analysts, data scientists, and other line-of-business users performing self-service analytics spend most of their time preparing data for analysis rather than actually garnering and sharing the insights to be found in it (1), even with the help of self-service data-prep tools like Alteryx, Trifacta, and Tableau’s Maestro (coming soon).

The immense amount of data being collected today, in any industry, expands what is possible in advanced analytics and data science. In concept, it creates an explosion of opportunities. In reality, we are often limited in scope by data processing systems that cannot handle the complexity and quantity of data available to us. Spark addresses this problem with a cluster computing platform that outperforms Hadoop MapReduce for many workloads. Its Resilient Distributed Dataset (RDD) abstraction enables parallel processing over a distributed collection of objects, substantially speeding up data processing. For these reasons, Spark has attracted a great deal of interest in the world of big data and advanced analytics, and deservedly so.