When it comes to AI and automated machine learning, more data is good — location data is even better.

At Data Con LA 2019, I had the pleasure of co-presenting a tutorial session with Pitney Bowes Technical Director Dan Kernaghan. We told an audience of data analysts and budding data scientists about the evolution of location data for big data and how location intelligence can add significant and new value to a wide range of data science and machine learning business use cases.

Speeding model runs by using pre-processed data

What Pitney Bowes has done is take care of the heavy lifting of processing GIS-based data so that it comes ready to be used with machine learning algorithms. Through a process called reverse geocoding, locations expressed as latitude/longitude are converted to addresses, dramatically reducing the time it takes to prepare the data for analysis.
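
To make the idea concrete, here is a minimal reverse-geocoding sketch using the open-source geopy library and its Nominatim geocoder; it stands in for, and is not, the Pitney Bowes pipeline, and the coordinates are arbitrary examples.

```python
# Minimal reverse-geocoding sketch using the open-source geopy library.
# Illustration only; Pitney Bowes uses its own processing pipeline.
from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="location-intel-demo")

# A latitude/longitude pair in downtown Boston (arbitrary example).
lat, lon = 42.3601, -71.0589

# reverse() converts the coordinates to the nearest street address.
location = geolocator.reverse((lat, lon))
print(location.address)
```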

With this approach, each address is then assigned a unique and persistent identifier, the pbKey™, and placed into a plain-text file along with 9,100 attributes for that address. Depending on your use case, you can then enrich your analysis with subsets of this information, such as crime data, fire or flood risk, building details, mortgage information, and demographics like median household income, age or purchasing power.
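
As an illustration of how that enrichment step might look in practice (the file and column names here are hypothetical, not the actual Pitney Bowes schema), a pandas join on the pbKey identifier can pull in just the attributes a given use case needs:

```python
# Hypothetical sketch: enriching first-party records with a subset of
# flat-file attributes, joined on the pbKey identifier. File names and
# column names are invented for illustration.
import pandas as pd

listings = pd.read_csv("listings_geocoded.csv")          # has a 'pbkey' column
attributes = pd.read_csv("pb_attributes.txt", sep="\t")  # thousands of columns

# Keep only the attributes relevant to this use case.
subset = attributes[["pbkey", "crime_index", "flood_risk",
                     "median_household_income"]]

enriched = listings.merge(subset, on="pbkey", how="left")
print(enriched.head())
```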

Surfacing predictors of summer rental demand: location-based attributes

For Data Con LA, we designed a use case that we could enrich with location data: a machine learning model to predict summer revenue for a fictional rental property in Boston. We started with first-party data on 1,070 rental listings in greater Boston, sourced from an online property booking service. That data included attributes about the properties themselves (type, number of bathrooms/bedrooms, text description, etc.), the hosts, and summer booking history.

Then we layered in location data from Pitney Bowes for each rental property, based on its address: distance to nearest public transit, geodemographics (CAMEO), financial stress of city block, population of city block, and the like.
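
One way such distance features can be derived, sketched here under the assumption that property and station coordinates are available, is a straight-line haversine calculation; a production pipeline might well use network or drive-time distances instead. All coordinates below are invented.

```python
# Sketch of one derived location feature: straight-line (haversine)
# distance from a property to its nearest transit station.
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

stations = [(42.3522, -71.0551), (42.3664, -71.0621)]  # sample transit stops
prop = (42.3584, -71.0598)                             # sample property

dist_to_transit = min(haversine_km(*prop, *s) for s in stations)
print(f"Distance to nearest station: {dist_to_transit:.2f} km")
```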

Not surprisingly, the previous year’s summer bookings and scores derived from the listing description ranked as the most important features of a property. Unexpectedly, though, distance to the nearest airport ranked third in importance. Other location-based features that surfaced as important predictors of summer demand included distance to Amtrak stations, highway exits and MBTA stations; block population and density measures; and block socio-economic measures.
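
As a rough sketch of how such a ranking can be surfaced, the snippet below trains a random forest on the enriched listing data and prints its feature importances. The file name, feature names, and target column are all invented for illustration; this is not the actual model from the talk. A nightly price feature is included because the what-if analysis further below varies it.

```python
# Illustrative sketch: rank features by importance with a random forest.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

df = pd.read_csv("enriched_listings.csv")  # hypothetical enriched data
features = ["prev_summer_bookings", "description_score", "nightly_price",
            "dist_airport_km", "dist_amtrak_km", "dist_mbta_km",
            "block_population", "block_density"]
X, y = df[features], df["summer_days_booked"]

model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)

# Higher importance = stronger predictor of summer demand in this model.
importances = pd.Series(model.feature_importances_, index=features)
print(importances.sort_values(ascending=False))
```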

By adding location data to our model, we increased the accuracy of our prediction of how frequently “our” property would be rented. Predicting that future is an important outcome, but more important is determining what we can do to change future results. In this scenario, we can change the price, for example, and rerun the model until we find the combination of price and number of days rented that we need to meet our revenue objective.
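
Continuing the sketch above, one hedged way to run that what-if loop is to sweep candidate nightly prices through the trained model until the predicted number of booked days meets a revenue target. The target, price grid, and feature row are invented for illustration.

```python
# What-if sketch: vary price and rerun the model until predicted
# bookings meet a revenue target. 'model' and 'X' continue the
# illustrative snippet above.
import numpy as np

revenue_target = 20_000.0      # hypothetical summer revenue goal
base_row = X.iloc[[0]].copy()  # feature row for "our" property

for price in np.arange(100, 401, 10):  # candidate nightly prices
    base_row["nightly_price"] = price
    days = model.predict(base_row)[0]  # predicted days rented
    if price * days >= revenue_target:
        print(f"${price:.0f}/night -> ~{days:.0f} days, meets target")
        break
```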

Building effective use cases for data science

As a Business Partner since 2015, Ironside Group often incorporates Pitney Bowes data — both pbKey flat-file data and traditional GIS-based datasets like geofences — into customized data science solutions built to help companies grow revenue, maximize efficiency, or understand and minimize risk. Here are some examples of use cases that incorporate some element of location-based data into the model design.

Retail loss prevention. A retailer wanting to analyze shortages, cash loss and safety risks expected that store location would be a strong predictor of losses or credit card fraud. However, models using historical store data and third-party crime risk data found that crime in the area was not a predictor of losses. Instead, the degree of manager training in loss prevention was the most significant predictor — a finding that influenced both store location decisions and investments in employee training programs.

Predictive policing. A city police department wanted a data-driven, data science-based approach to complement its fledgling “hot spot” policing program. The solution leverages historical crime incident data combined with weather data to produce an accurate crime forecast for each patrol shift. Patrol officers are deployed in real time to “hot spots” via a map-based mobile app. Over a 20-week study, the department saw a 43% reduction in targeted crime types.

Maximize efficiencies for utilities demand forecasting. A large natural gas and electricity utilities provider needed a better way to anticipate demand in different areas of its network to avoid supply problems and service gaps. The predictive analytics platform developed for the utility uses cleaned and transformed first-party data from over 40 geographic points of delivery, enriched with geographic and weather data to improve the model’s demand predictions. The result is a forecasting platform that triggers alerts automatically and allows proactive energy supply adjustments based on predicted trends.
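
As a rough sketch of that pattern (not the actual platform), the snippet below joins hypothetical usage history with a weather forecast, trains a regressor, and flags delivery points whose predicted demand approaches an assumed capacity limit. All file, column, and threshold values are invented.

```python
# Hedged sketch: weather-enriched demand forecast with automatic alerts.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

usage = pd.read_csv("delivery_point_usage.csv")  # hypothetical history
weather = pd.read_csv("weather_forecast.csv")    # hypothetical forecast
df = usage.merge(weather, on=["delivery_point", "date"])

features = ["temp_forecast_c", "wind_kph", "day_of_week", "prior_week_avg"]
model = GradientBoostingRegressor().fit(df[features], df["demand_mwh"])

# Predict demand for the most recent forecast date.
tomorrow = df[df["date"] == df["date"].max()]
predicted = model.predict(tomorrow[features])

CAPACITY_MWH = 950.0  # illustrative supply limit per delivery point
for point, demand in zip(tomorrow["delivery_point"], predicted):
    if demand > 0.9 * CAPACITY_MWH:
        print(f"ALERT: {point} forecast {demand:.0f} MWh, near capacity")
```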

About Ironside Group and Pitney Bowes

Ironside Group was founded in 1999 as an enterprise data and analytics solution provider and system integrator. Our data science practice is built on helping clients organize, enrich, report and predict outcomes with data. Our partnership with Pitney Bowes leads to client successes as we combine our use case-based approach to data science with Pitney Bowes data sets and tools.

In today’s “Big Data” era, data of enormous volume and variety is being generated continuously across channels within the enterprise and in the cloud. To drive exploratory analysis and make accurate predictions, we need to connect, collate, and consume all of this data so that clean, consistent data is quickly and easily available to analysts and data scientists.


Any journey requires a few things before getting started. Wandering through the forest can be a very pleasant experience, but if you don’t plan ahead and bring your compass and map, what happens if you get lost? (I know, you probably brought your smartphone, which has GPS. But then you find there is no signal, way out here in the forest…). Before starting an adventure like this, you need to prepare and make sure you are ready for any obstacles or unknowns that could occur.


When you think about the different ways that data gets used in your company, what comes to mind?

You surely have some executive dashboards, and some quarterly reports. There might be a reporting portal containing everything that IT created for anyone within the past decade.


Customer segmentation is defined as “the process of dividing customers into groups based on common characteristics so companies can market to each group effectively and appropriately.” Defining segments with the right attributes lets companies identify the right customers for targeted, relevant offers. Companies that successfully define and maintain customer segments can gain a competitive advantage by improving the customer experience.


As the cloud matures and gains acceptance across industries, and as data gravity gradually shifts to the cloud (that is, more data is being generated there), we are seeing some interesting cloud-based data and analytics platforms offering unique capabilities. With innovative thinking and a ground-up design that is “born in the cloud and for the cloud,” some of these platforms could disrupt the established market leaders.


Data democratization is the ability of an organization to provide information to end users in an easy and effective way. The goal is to provide self-service of information to end users with minimal IT support. There are many things that can go wrong when rolling out data democratization projects. The purpose of this article is to identify potential issues and provide guidance on how to avoid them in the democratization process.


When asked “What’s your data strategy?” do you reply “We’re getting Hadoop…” or “We just hired a data scientist…” or “If we only had a data lake, all our problems would be solved…”? Plotting a good data strategy requires more than buying a tool, hiring a resource, or adding a component to your architecture. You need something to describe:

  • the goals you are trying to achieve,
  • the stakeholders you are trying to serve, and
  • the internal capabilities required to satisfy those stakeholders and achieve those goals.


Is your company suffering from a case of “Bad Data”? Everyone is following the process and doing their job correctly, but you still face inaccurate reports, operational errors, audit anxiety about your data, and the like. Good data should be a given, right?

Well, it’s not that easy. In today’s business environment, rapid growth, organizational change, and mergers and acquisitions (M&A) are very difficult to absorb within a fragmented data ecosystem. Multiple disparate IT systems, siloed databases, and deficient master data often result in data that is fragmented, duplicated, and out of date.

Any discussion of Master Data Management automatically includes a discussion of Data Governance. The two go hand in hand. Successful MDM implementations require understanding data ownership, stewardship, and security, as well as determining business rules to be applied to the data. Specific business rules usually include rules for matching and consolidating data items as well as data quality checks.