Tag Archive for: Spark

Ironside experts Dan Gouveia and Chi Shu recently got the chance to share knowledge with our local analytics community at the Scalable R Analytics Meetup in Cambridge, MA. Presenting to a packed room, Chi and Dan dove into several ways that R’s powerful data science capabilities could be scaled to apply to much larger, enterprise-level data sets. They demonstrated how to achieve this scalability both with dashDB, a cloud data warehouse, and Spark, a big data-oriented parallel processing framework. Read more

Data discovery is a “new” technique that takes a less formal and more agile approach to analyzing data. Okay, well, it’s not really new — people have been doing this with spreadsheets for decades — but the products that support it have improved greatly and have forced a more formal consideration of these techniques. The data discovery approach produces insights very quickly, but it also encounters challenges when dealing with data transformation. Most data discovery tools are limited in their ability to manipulate data. Additionally, understanding relationships between different data entities can require expertise that some users may not possess. In order to enable agile data discovery, organizations need agile data warehousing. Read more

The immense amount of data being collected today, in any industry, broadens what advanced analytics and data science can achieve. In concept, it creates an explosion of opportunities. In reality, we are often limited in scope by our data processing systems, which may not be able to handle the complexity and quantity of data available to us. Spark offers a solution to this issue: a cluster computing platform that can outperform Hadoop MapReduce, particularly on workloads that reuse data in memory. Its Resilient Distributed Dataset (RDD) abstraction allows for parallel processing on a distributed collection of objects, which speeds up data processing. For this reason, Spark has received a lot of interest and promotion in the world of big data and advanced analytics. Read more
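
To make the idea of parallel processing on a partitioned collection a little more concrete, here is a minimal sketch in plain Python. It is not Spark and does not use Spark's API: the function names (`split_into_partitions`, `sum_of_squares`, `parallel_sum_of_squares`) are hypothetical and exist only for illustration. It simply mimics the pattern of splitting a collection into partitions, processing each partition independently, and combining the partial results.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_partitions(items, num_partitions):
    """Split a list into contiguous, roughly equal partitions (illustrative helper)."""
    size, extra = divmod(len(items), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        # The first `extra` partitions each take one additional element.
        end = start + size + (1 if i < extra else 0)
        partitions.append(items[start:end])
        start = end
    return partitions


def sum_of_squares(partition):
    """Work done independently on one partition (the 'map' side)."""
    return sum(x * x for x in partition)


def parallel_sum_of_squares(items, num_partitions=4):
    """Process each partition in a worker, then combine the partial results."""
    partitions = split_into_partitions(items, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(sum_of_squares, partitions))
    # The 'reduce' side: merge per-partition results into one answer.
    return reduce(lambda a, b: a + b, partials, 0)


total = parallel_sum_of_squares(list(range(1, 11)))
print(total)
```

Threads on a single machine are only a stand-in here. A cluster framework distributes the partitions across many machines and handles failures, which this sketch does not attempt.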

Don’t think you have big data? Chances are you do. The fact is, if you have a website, you have big data. Web servers capture and store events related to user traffic. The web logs they generate essentially tell the story of what users did when they visited your site. This information can provide your organization with considerable business value.

If you think you have big data to analyze from your website, you may want to look into Apache Spark. It’s easy to get started with and makes short work of analyzing your web logs. It’s actually pretty fun to work with, too. If you enjoyed playing with LEGO bricks as a kid, you may have a childhood flashback with Spark. Read more
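
As a rough illustration of the kind of story a web log can tell, here is a small plain-Python sketch that parses a few log lines and counts the most requested pages. It assumes the logs follow the Common Log Format, which your server may or may not use. The sample lines, the `LOG_PATTERN` regular expression, and the helper names `parse_line` and `top_paths` are all made up for this example. It does not use Spark; the same parse-then-count logic is what you would hand to a cluster framework once the logs outgrow a single machine.

```python
import re
from collections import Counter

# Assumed layout: Common Log Format, e.g.
# host ident user [timestamp] "METHOD /path PROTOCOL" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def top_paths(lines, n=3):
    """Count requests per path, skipping lines that fail to parse."""
    records = (parse_line(line) for line in lines)
    counts = Counter(r["path"] for r in records if r is not None)
    return counts.most_common(n)


# Made-up sample data for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2016:13:55:36 -0400] "GET /index.html HTTP/1.1" 200 2326',
    '203.0.113.9 - - [10/Oct/2016:13:56:01 -0400] "GET /blog/spark HTTP/1.1" 200 5120',
    '203.0.113.5 - - [10/Oct/2016:13:57:12 -0400] "GET /blog/spark HTTP/1.1" 200 5120',
    'not a log line',
]

for path, hits in top_paths(SAMPLE_LOG):
    print(path, hits)
```

Malformed lines are skipped rather than raising an error, since real web logs often contain a few entries that do not fit the expected pattern.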