Data Blog by Lizeo

Impacts of Dirty Data in your competitor price analysis

The volume of online and offline competitor price data available to businesses grows every day.
This price data comes in heterogeneous formats (because prices are displayed differently on each digital platform, or because of the way they were collected), which complicates the task of providing clean, uniform and matched competitor pricing data to pricing teams for analysis. Data in such heterogeneous formats is called dirty competitor price data.
At this stage, there is no point in building your price intelligence on this data unless you are ready to spend a tremendous amount of time trying to make it talk.
Let’s take a look at dirty competitor price data in the context of the tyre industry, and at its impacts.

What is Dirty Data?

Dirty data is a general expression defining data that is inaccurate, incorrect, inconsistent, duplicated, incomplete or violating business rules.

Below is a list of the six most common types of dirty data, each illustrated with an example from competitor price data in the tyre industry:

Incomplete data

This one is easy to understand. Incomplete data has missing fields or values that are mandatory to run your pricing process. In the tyre industry, technical attributes such as Load Index, Speed Index or OE Marking have a strong influence on the price. Example: a 205/55 R16 91V and a 205/55 R16 94H are two different tyres (different Load Index and Speed Index) with two different prices. If your competitor price dataset is missing these fields, your pricing analysis will be wrong.
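A completeness check of this kind could be sketched as follows. This is a minimal illustration, assuming each collected price is a Python dictionary; the field names (`size`, `load_index`, `speed_index`, `price`), the function `missing_fields` and the sample prices are hypothetical.

```python
# Hypothetical required fields for a competitor tyre price record.
REQUIRED_FIELDS = ("size", "load_index", "speed_index", "price")


def missing_fields(record):
    """Return the required fields that are absent or empty in a record."""
    return [
        field
        for field in REQUIRED_FIELDS
        if record.get(field) in (None, "")
    ]


# Made-up sample records, for illustration only.
complete = {"size": "205/55 R16", "load_index": 91, "speed_index": "V", "price": 89.90}
incomplete = {"size": "205/55 R16", "price": 84.50}

print(missing_fields(complete))    # []
print(missing_fields(incomplete))  # ['load_index', 'speed_index']
```

Records with a non-empty result can then be set aside before they reach the pricing analysis.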

Duplicate data

Duplicate data might be one of the best-known examples of dirty data. Most companies deal with duplicate customer records in their CRM, or duplicate products in their Master Data Management system or ERP. In the analysis of online tyre sell-out prices, duplicate data reduces the efficiency of the pricing analyst: two tyres look different because of an abbreviation or a misspelling but are in reality the same (Michelin Pilot Sport 4 and Mich PS4). Aggregation is then necessary to ‘attach’ these tyres to the same price line.
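One simple way to do this aggregation is an alias table that maps each known spelling to a canonical product name. The sketch below assumes such a table exists; `ALIASES`, `canonical_name`, `group_prices` and the prices are all hypothetical. Real-world matching usually needs more than a lookup table.

```python
from collections import defaultdict

# Hypothetical alias table: cleaned-up listing name -> canonical product name.
ALIASES = {
    "michelin pilot sport 4": "Michelin Pilot Sport 4",
    "mich ps4": "Michelin Pilot Sport 4",
}


def canonical_name(raw_name):
    """Lower-case, drop dots, collapse whitespace, then look up the alias table."""
    key = " ".join(raw_name.lower().replace(".", " ").split())
    # Unknown names are returned unchanged so they can be reviewed later.
    return ALIASES.get(key, raw_name)


def group_prices(listings):
    """Attach every (name, price) listing to its canonical product."""
    grouped = defaultdict(list)
    for name, price in listings:
        grouped[canonical_name(name)].append(price)
    return dict(grouped)


# Made-up listings, for illustration only.
listings = [("Michelin Pilot Sport 4", 112.0), ("Mich PS4", 109.5), ("Mich. PS4", 110.0)]
print(group_prices(listings))
# {'Michelin Pilot Sport 4': [112.0, 109.5, 110.0]}
```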

Incorrect data

Incorrect data can be defined as field values that fall outside the valid range of values. In the tyre industry, this could be illustrated with a tyre size (geobox) that does not exist: 195/25 R23.
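A validity check can be as simple as comparing each collected size against a reference catalogue. The sketch below assumes such a catalogue is available; `KNOWN_SIZES` is a tiny hypothetical stand-in and `is_known_size` is an illustrative name.

```python
# Hypothetical reference catalogue of sizes: (width in mm, aspect ratio, rim in inches).
# A real catalogue would come from a product master data source.
KNOWN_SIZES = {(205, 55, 16), (225, 45, 17), (195, 65, 15)}


def is_known_size(width, aspect_ratio, rim):
    """True when the size exists in the reference catalogue."""
    return (width, aspect_ratio, rim) in KNOWN_SIZES


print(is_known_size(205, 55, 16))  # True
print(is_known_size(195, 25, 23))  # False
```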

Inaccurate data

Accurate data can be summarised by the following question: does the data accurately represent the scope you defined in the first steps of the price intelligence requirement definition? Data can be intrinsically correct but inaccurate given the scope of the business context. An extreme example would be a price analysis of Nordic tyres that includes data from Spanish websites in its scope.

Business rule violations

Business rules are essential to turn ‘standard’ competitor price data into your vision of the business and market. These rules are specific to the industry, the business process and the context. For tyres, the season is critical to an accurate market price analysis. Mixing summer tyre price data with winter tyre price data would be a business rule violation.
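As a rough sketch, the season rule can be enforced by computing market prices per season and rejecting records whose season is missing or unknown. The record layout, the list of allowed seasons and the function name `average_price_by_season` are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean

# Assumed set of valid season labels.
ALLOWED_SEASONS = ("summer", "winter", "all-season")


def average_price_by_season(records):
    """Average price per season; records without a valid season are rejected."""
    by_season = defaultdict(list)
    for record in records:
        season = record.get("season")
        if season not in ALLOWED_SEASONS:
            raise ValueError(f"record violates the season rule: {record}")
        by_season[season].append(record["price"])
    return {season: mean(prices) for season, prices in by_season.items()}


# Made-up records, for illustration only.
records = [
    {"season": "summer", "price": 80.0},
    {"season": "summer", "price": 90.0},
    {"season": "winter", "price": 100.0},
]
print(average_price_by_season(records))
# {'summer': 85.0, 'winter': 100.0}
```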

Inconsistent data

Data consistency can be defined as a stable definition of the data and/or the field values over time. In other words, data is produced regularly within a regulated and predictable framework. For a tyre, the way the dimension is displayed online is a good example: 205/55/R16, 205-55-R 16 or 20555R16. Collected as they are, these variants lead to inconsistent data in your database.
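These spellings can be brought back to a single format with a regular expression. The pattern below is a sketch that only covers the variants listed above (three-digit width, two-digit aspect ratio, two-digit rim); the target format `205/55 R16` and the name `normalise_size` are choices made for this example.

```python
import re

# Width, aspect ratio and rim diameter, with optional "/", "-" or space separators.
# Illustrative pattern only: it does not cover every spelling found online.
SIZE_PATTERN = re.compile(
    r"^\s*(\d{3})\s*[/\- ]?\s*(\d{2})\s*[/\- ]?\s*R\s*(\d{2})\s*$",
    re.IGNORECASE,
)


def normalise_size(raw):
    """Return the size as 'WWW/AA RDD', or None when the text is not recognised."""
    match = SIZE_PATTERN.match(raw)
    if match is None:
        return None
    width, aspect_ratio, rim = match.groups()
    return f"{width}/{aspect_ratio} R{rim}"


for raw in ("205/55/R16", "205-55-R 16", "20555R16"):
    print(normalise_size(raw))  # 205/55 R16 each time
```

Returning `None` for unrecognised text, rather than guessing, keeps unknown formats visible so the pattern can be extended deliberately.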

What are the impacts of dirty data on your competitive price monitoring?

According to Gartner’s 2017 Data Quality Market Survey, dirty data costs companies an estimated $15 million per year on average.

That cost may be underestimated, as the survey was mainly directed at Marketing departments, which are huge data consumers but not the only ones.
For pricing teams, dirty data has an impact not only at the competitor price analysis step but throughout the whole pricing journey.

Impacts of Dirty Data in your Pricing Process

Beyond the extra time spent cleaning competitor price data, dirty data has direct impacts on your pricing journey.
In the tyre industry, a long list of issues in the tyre price data can make a market price analysis inaccurate:
  • a mix of tyres with and without OE Marking
  • duplicates: Michelin Pilot Sport 4 vs Mich. PS4
  • inconsistent price levels: unit prices mixed with group prices (baskets of 1 or 2 tyres)
  • etc.
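For the price-level issue in the list above, a minimal sketch is to bring every listing back to a price per tyre before comparing. It assumes the number of tyres in each basket was captured at collection time; `unit_price` and the sample amounts are hypothetical.

```python
def unit_price(total_price, quantity):
    """Price per tyre for a listing that sells `quantity` tyres together."""
    if quantity < 1:
        raise ValueError("quantity must be at least 1")
    return total_price / quantity


# Made-up amounts, for illustration only.
print(unit_price(90.0, 1))   # 90.0
print(unit_price(170.0, 2))  # 85.0
```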
Dirty data also has a direct impact on internal projects: it delays the deployment of a new process, tool or solution, and it can undermine trust in current analytical tools (Business Intelligence).
For pricing tools and platforms, dirty data will turn data flow matching into a nightmare, and the pricing strategy implementation will not deliver the expected value:
  • inexact matching between competitor price data and internal data (sell-in price data, sales volumes, etc.)
  • increased complexity in building tyre comparison panels to set the pricing rules strategy
In the end, the major cost will be a potential loss of revenue and market share due to a bad price setup.

Impacts of Dirty Data in your Pricing Data Science Project

Data Scientist is the dirtiest job of the 21st Century

According to a survey led by CrowdFlower, Data Scientists spend between 60% and 80% of their time cleaning dirty data… all before doing what they are good at: statistics, modelling, etc.

To get a rough estimate of the cost of dirty data on pricing Data Science projects, let’s do some simple math:

Average yearly cost of a Junior Data Scientist (according to Glassdoor): $200k (estimate). If they spend 60% of their time cleaning data, that is $120k per year per Data Scientist.
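The same arithmetic as a small sketch, reusing the figures quoted above (a $200k yearly cost, and 60% to 80% of the time spent cleaning). The function name `yearly_cleaning_cost` is made up for this example, and the inputs remain rough estimates.

```python
def yearly_cleaning_cost(yearly_cost, percent_of_time_cleaning):
    """Share of a yearly cost that goes into data cleaning."""
    return yearly_cost * percent_of_time_cleaning / 100


print(yearly_cleaning_cost(200_000, 60))  # 120000.0
print(yearly_cleaning_cost(200_000, 80))  # 160000.0
```

At the upper end of the survey range (80%), the same estimate rises to $160k per year per Data Scientist.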

And of course, dirty data also creates hidden and indirect costs:
  • Delays in your Data Science project and in the expected insights
  • A drop in the motivation of your Data Science team
  • The inability to run Machine Learning or AI tools

So, how to get rid of Dirty Data?

Without standard guidelines and processes to get and keep competitor price data clean, dirty data issues are bound to happen.
Productivity is lost when pricing analysts waste their time checking the accuracy and reliability of the price data they use to build their analyses and extract insights for their management. The same happens with data scientists, who spend most of their time cleaning, normalising and preparing data before they can work with statistical models or machine learning tools.
Concretely, the first step of your journey to get rid of dirty data is data cleansing.
