Posts

When it comes to AI and automated machine learning, more data is good — location data is even better.

At Data Con LA 2019, I had the pleasure of co-presenting a tutorial session with Pitney Bowes Technical Director Dan Kernaghan. We told an audience of data analysts and budding data scientists about the evolution of location data for big data and how location intelligence can add significant and new value to a wide range of data science and machine learning business use cases.

Speeding model runs by using pre-processed data

What Pitney Bowes has done is take care of the heavy lifting of processing GIS-based data so that it comes ready to be used with machine learning algorithms. Through a process called reverse geocoding, locations expressed as latitude/longitude are converted to addresses, dramatically reducing the time it takes to prepare the data for analysis.

With this approach, each address is then associated with a unique and persistent identifier, the pbKey™, and put into a plain text file along with 9,100 attributes associated with that address. Depending on your use case, then, you can enrich your analysis with subsets of this information, such as crime data, fire or flood risk, building details, mortgage information, and demographics like median household income, age or purchasing power.  
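As a rough illustration of that enrichment step, the sketch below joins a handful of records to a flat attribute file on the pbKey and keeps only a chosen subset of columns. It is a minimal sketch, not the Pitney Bowes file format: the pipe delimiter, the column names, the key values and the helper functions `load_attributes` and `enrich` are all assumptions made up for the example.

```python
import csv
import io

# Hypothetical flat file of location attributes keyed by pbKey.
# The delimiter, column names and values are invented for illustration.
ATTRIBUTES_TXT = """pbKey|median_household_income|flood_risk
P0000AAA|72000|low
P0000BBB|54000|high
"""

# Hypothetical first-party records that already carry a pbKey.
LISTINGS = [
    {"listing_id": 1, "pbKey": "P0000AAA"},
    {"listing_id": 2, "pbKey": "P0000BBB"},
    {"listing_id": 3, "pbKey": "P0000ZZZ"},  # no match in the attribute file
]


def load_attributes(text, wanted):
    """Index the flat file by pbKey, keeping only the wanted columns."""
    reader = csv.DictReader(io.StringIO(text), delimiter="|")
    return {row["pbKey"]: {col: row[col] for col in wanted} for row in reader}


def enrich(listings, attributes, wanted):
    """Left-join listings to attributes; unmatched rows get None values."""
    empty = {col: None for col in wanted}
    return [{**rec, **attributes.get(rec["pbKey"], empty)} for rec in listings]


wanted_columns = ["median_household_income", "flood_risk"]
attrs = load_attributes(ATTRIBUTES_TXT, wanted_columns)
enriched = enrich(LISTINGS, attrs, wanted_columns)
```

Keeping unmatched records with empty values, rather than dropping them, makes gaps in coverage visible before any model is trained.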

Surfacing predictors of summer rental demand: location-based attributes

For Data Con LA, we designed a use case that we could enrich with location data: a machine learning model to predict summer revenue for a fictional rental property in Boston. We started with “first-party” data on 1,070 rental listings in greater Boston that we sourced from an online property booking service. That data included attributes about the properties themselves (type, number of bathrooms/bedrooms, text description, etc.), the hosts, and summer booking history.

Then we layered in location data from Pitney Bowes for each rental property, based on its address: distance to nearest public transit, geodemographics (CAMEO), financial stress of city block, population of city block, and the like.
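A distance feature such as "distance to nearest public transit" can be approximated with the haversine great-circle formula. The sketch below is one way to compute it and is not how the Pitney Bowes attributes were derived. The coordinates are hypothetical points in the Boston area, and `haversine_km` and `nearest_distance_km` are names chosen for the example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_distance_km(point, places):
    """Distance from point to the closest (lat, lon) entry in places."""
    lat, lon = point
    return min(haversine_km(lat, lon, p_lat, p_lon) for p_lat, p_lon in places)


# Hypothetical coordinates, for illustration only.
stations = [(42.3523, -71.0552), (42.3656, -71.0616)]
listing = (42.3601, -71.0589)

# One numeric feature per listing, ready to add as a model input column.
distance_to_transit_km = nearest_distance_km(listing, stations)
```

The same function can be reused for airports, Amtrak stations or highway exits by swapping the list of places.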

Not surprisingly, the previous year’s summer booking and scores based on the description ranked as the most important features of a property. However, it was unexpected that distance to the nearest airport ranked third in importance. Other location-based features that surfaced as important predictors of summer demand included distance to Amtrak stations, highway exits and MBTA stations; block population and density measures; and block socio-economic measures.

By adding location data to our model, we increased the accuracy of our prediction of how frequently “our” property would be rented. Predicting that future is an important outcome, but more important is determining what we can do to change future results. In this scenario, we can change the price, for example, and rerun the model until we find the combination of price and number of days rented that we need to meet our revenue objective.
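The rerun-until-it-fits idea can be sketched as a simple price sweep. This is a hedged illustration only: `predict_days_rented` is a made-up linear stand-in for the trained model, and the 92-day summer, the candidate prices and the revenue target are all assumptions.

```python
SUMMER_DAYS = 92  # assumed season length (June through August)


def predict_days_rented(price):
    """Hypothetical stand-in for the trained model's demand prediction."""
    days = 110 - 0.4 * price
    return max(0.0, min(float(SUMMER_DAYS), days))


def sweep_prices(prices, revenue_target):
    """Return (price, days, revenue) for each scenario that meets the target."""
    scenarios = []
    for price in prices:
        days = predict_days_rented(price)
        revenue = price * days
        if revenue >= revenue_target:
            scenarios.append((price, days, revenue))
    return scenarios


candidate_prices = range(100, 260, 10)
hits = sweep_prices(candidate_prices, revenue_target=7450)
```

With a real model in place of the stand-in, the qualifying scenarios show which price points are worth considering and what occupancy each one implies.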

Building effective use cases for data science

As a Business Partner since 2015, Ironside Group often incorporates Pitney Bowes data — both pbKey flat-file data and traditional GIS-based datasets like geofences — into customized data science solutions built to help companies grow revenue, maximize efficiency, or understand and minimize risk. Here are some examples of use cases that incorporate some element of location-based data into the model design.

Retail loss prevention. A retailer wanting to analyze shortages, cash loss and safety risks expected that store location would be a strong predictor of losses or credit card fraud. However, models using historical store data and third-party crime risk data found that crime in the area was not a predictor of losses. Instead, the degree of manager training in loss prevention was the most significant predictor — a finding that influenced both store location decisions and investments in employee training programs.

Predictive policing. A city police department wanted a data-driven, data science-based approach to complement its fledgling “hot spot” policing system. The solution leverages historical crime incident data combined with weather data to produce an accurate crime forecast for each patrol shift. Patrol officers are deployed in real time to “hot spots” via a map-based mobile app. Over a 20-week study, the department saw a 43% reduction in targeted crime types.

Maximize efficiencies for utilities demand forecasting. A large natural gas and electricity utilities provider needed a better way to anticipate demand in different areas of their network to avoid supply problems and service gaps. The predictive analytics platform developed for the utility uses cleaned and transformed first-party data from over 40 different geographic points of delivery, enriched with geographic and weather data to improve the model’s predictions of demand. The result is a forecasting platform that triggers alerts automatically and allows proactive energy supply adjustments based on predictive trends.

About Ironside Group and Pitney Bowes

Ironside Group was founded in 1999 as an enterprise data and analytics solution provider and system integrator. Our data science practice is built on helping clients to organize, enrich, report and predict outcomes with data. Our partnership and collaboration with Pitney Bowes lead to client successes as we combine our use case-based approach to data science with Pitney Bowes data sets and tools.

The day-to-day work of an underwriter ranges from research, to data entry, to pricing a risk, to ultimately negotiating that premium value with an agent. At the core, they need to accurately gauge risk on a case-by-case basis. But their job doesn’t stop there. Even if we were to codify all the significant risk factors (as actuarial tables do), this doesn’t translate directly to how much the insurance firm ultimately charges for a given premium. Underwriters need to create an offer that they can justify to their customers, and keep an eye on the prevailing market dynamics.

Read more

For players in the biopharmaceutical space, it is becoming increasingly clear that advanced analytics can be of enormous assistance in solving many of the unique challenges the industry faces. To understand the extent of the impact that advanced analytics can make, it’s first necessary to examine how healthcare in the US has undergone a major transformation over the past decade.

First, there’s the presence of managed care. It puts pressure on pharmaceutical companies to provide stronger evidence of efficacy and safety, reduce costs of drug development and healthcare in general, and provide personalized care by targeting patient groups that are most likely to benefit from treatments and least likely to suffer adverse events. Read more

According to Dave Chaffey’s 2016 global social media research summary, over 2.3 billion people actively use websites like Facebook, Snapchat, Twitter, and LinkedIn to view content or engage with other users. This massive audience represents a huge opportunity for organizations to understand what trends they can connect with, who their ideal customers are, and what the sentiment is around their brand in the marketplace. These insights all become possible through social media analytics. Read more

Ironside experts Dan Gouveia and Chi Shu recently got the chance to share knowledge with our local analytics community at the Scalable R Analytics Meetup in Cambridge, MA. Presenting to a packed room, Chi and Dan dove into several ways that R’s powerful data science capabilities could be scaled to apply to much larger, enterprise-level data sets. They demonstrated how to achieve this scalability both with dashDB, a cloud data warehouse, and Spark, a big data-oriented parallel processing framework. Read more

Many of you have heard buzzwords such as “data science,” “big data,” or the “Internet of Things” before. You’re able to piece together that these fields relate to each other and deal with analyzing data in some way, but maybe you’re not so sure what these terms really mean. That’s what I’m here to help with.  As a newer member of the data science field, I developed this short data science guide based on my experiences and perspectives in an effort to help those who are just starting out. Read more

Ironside CEO Tim Kreytak was featured as part of Japanese news outlet BCN’s June 13, 2016 article recapping the IBM Watson Summit that took place in Tokyo on May 24-26. Tim (shown bottom right in the following photo) was a guest speaker at the conference, where he talked about Ironside’s successes as an IBM partner, specifically discussing our predictive policing work with the Manchester, NH Police Department.

 

Tim Kreytak in BCN News

“Tim Kreytak, CEO of Ironside, introduced their success with Manchester Police Department in reducing the number of crimes by 26% in one year,” the article stated (translated from Japanese). It then went on to talk about Ironside’s partnership with IBM and the measures by which we judge our clients’ success, as well as some additional case examples around Watson Analytics.

 

IBM Watson Summit in Tokyo

Solution Shared at IBM Watson Summit in Tokyo

Ironside’s IronShield predictive policing platform gives police departments the tools and knowledge they need to anticipate criminal behavior and proactively prevent crime, which is one of the largest factors in crime reduction. It utilizes powerful predictive models tailored for each community’s unique geography, weather, and history right from the start through its core module for hot spot policing, producing intuitive visual map displays of where crime is most likely to occur.

If you’d like to check out the solution featured at the conference, take a look at our IronShield page.  Or you can read about it in local news coverage at WMUR9.

You can see the article about the IBM Watson Summit in Tokyo on BCN Bizline.

 

The immense amount of data being collected today, in any industry, expands what advanced analytics and data science can accomplish. In concept, it creates an explosion of opportunities. In reality, we are often limited in scope by our data processing systems, which may not be able to handle the complexity and quantity of data available to us. Spark offers a solution to this issue: a cluster computing platform that can outperform Hadoop MapReduce, particularly on in-memory and iterative workloads. Its Resilient Distributed Dataset (RDD) allows for parallel processing on a distributed collection of objects, enhancing the speed of data processing. For this reason, Spark has received a lot of interest and promotion in the world of big data and advanced analytics. Read more

Policing in the United States and around the world is rapidly changing. Just as there have been paradigm shifts in law enforcement procedures in the past, we are now on the brink of another transformation of how communities are policed. Current national narratives and recent events are motivating these changes, and like it or not, a new era of law enforcement is upon us. One of the main solutions that helps law enforcement adapt to this change is adopting a sound data-driven policing strategy. Read more

For the second year in a row, Ironside has been named one of IBM’s Beacon Award finalists, this time in the category of Outstanding IBM Analytics Line-of-Business Solution. This recognition comes in honor of the compelling results that our IronShield predictive policing platform has generated.

About IronShield

IronShield provides turnkey predictive hot spots policing and analytics for law enforcement. It enables data-driven, evidence-based policing that stops crime before it happens and is customizable to the environment in which it’s implemented, going beyond its initial hot spots module to target each community’s needs. Our CEO Tim Kreytak recently highlighted the impact IronShield has had in Manchester, NH helping the city’s police department combat the heroin crisis. Read more
