Data Quality – How to measure for best results?

Data flows in constantly and has to be extracted and acted upon, and keeping it accurate becomes cumbersome unless clear rules are in place. Data quality is a delicate balancing act between accuracy and completeness.
From an e-commerce perspective, search engine algorithms need to understand the context behind each search term, which ultimately leads to better results and higher bag conversion rates. The attributes provided in the product content help search engines understand both the context and the consumer's intent behind a specific search term.
Many retail organizations add structured attributes manually through BPO companies or crowdsourcing platforms such as Amazon Mechanical Turk. Some large retailers have their own associates manually validate the quality of supplier-provided data before products are made available to customers.

How to effectively assess the data quality of product content?

A comprehensive yet concise data quality checklist helps structure the analysis. DAMA UK created an excellent guide to "data quality dimensions" that gives a clearer picture of how data quality is determined.
Data quality can be assessed along six dimensions –

  • Completeness – Is data available across all required fields?
  • Uniqueness – When measured against other data sets, there is only one entry of its kind.
  • Accuracy – How well does the data reflect the real-world person or thing it identifies?
  • Timeliness – Is the data up to date at the point in time it is needed? This applies to previous sales, product launches or any information that is relied on to remain accurate over a period of time.
  • Validity – Does the data conform to the standards (format, type, range) set for it?
  • Consistency – How well does the data align with an expected pattern?

In this article, we measure product data against three of these dimensions – accuracy, completeness and consistency.

Accuracy

Improving data so that it produces the right information is the need of the hour. Accuracy measures whether the data is what it should be, and it is commonly expressed as the percentage of values that are correct.
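To make the percentage concrete, here is a minimal sketch that compares catalog values for one attribute against a manually verified sample, such as values checked by associates. The record shape, the function name `attribute_accuracy` and the sample data are hypothetical. Values missing from either side are skipped, on the assumption that missing data is a completeness issue rather than an accuracy one.

```python
from typing import Dict

# Hypothetical record shape: product ID -> {attribute name: value}.
Catalog = Dict[str, Dict[str, str]]


def attribute_accuracy(catalog: Catalog, verified: Catalog, attribute: str) -> float:
    """Percentage of checked values for `attribute` that match a verified sample.

    Only products that have a value both in the catalog and in the verified
    sample are scored; missing values are treated as a completeness problem,
    not an accuracy one, so they are skipped here.
    """
    checked = 0
    correct = 0
    for product_id, truth in verified.items():
        expected = truth.get(attribute)
        actual = catalog.get(product_id, {}).get(attribute)
        if expected is None or actual is None:
            continue
        checked += 1
        # Compare case- and whitespace-insensitively so "Floral" == "floral".
        if actual.strip().lower() == expected.strip().lower():
            correct += 1
    return 100.0 * correct / checked if checked else 0.0


# Hypothetical data: catalog values vs. values checked by an associate.
catalog = {
    "p1": {"pattern": "Floral"},
    "p2": {"pattern": "striped"},
    "p3": {"pattern": "solid"},
    "p4": {},  # no pattern value at all
}
verified = {
    "p1": {"pattern": "floral"},
    "p2": {"pattern": "plaid"},
    "p3": {"pattern": "solid"},
    "p4": {"pattern": "floral"},
}

print(f"{attribute_accuracy(catalog, verified, 'pattern'):.1f}%")  # prints 66.7%
```

Here p1 and p3 match, p2 is wrong and p4 is skipped, so two of three checked values are accurate.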
When the search query "floral cardigan" was run on two or three e-commerce sites, the terms were classified individually: "floral" as a pattern and "cardigan" as the clothing type, rather than being interpreted together. This led to incorrect results on the very first page.
[Screenshot: web page results for cardigans]
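To illustrate why the terms need to be read together, here is a minimal sketch assuming a small hypothetical vocabulary that maps query terms to attribute names. Each recognised term is tagged with its attribute, and a product is returned only if it matches every tagged attribute. This is not how any particular retailer's search engine works. It only shows that the quality of the result depends on products carrying accurate `pattern` and `product_type` values. The names `VOCABULARY`, `tag_query` and `search` are illustrative.

```python
from typing import Dict, List

# Hypothetical vocabulary mapping query terms to the attribute they describe.
VOCABULARY: Dict[str, str] = {
    "floral": "pattern",
    "striped": "pattern",
    "cardigan": "product_type",
    "dress": "product_type",
}


def tag_query(query: str) -> Dict[str, str]:
    """Map each recognised query term to {attribute: value}; unknown terms are ignored."""
    tags: Dict[str, str] = {}
    for term in query.lower().split():
        if term in VOCABULARY:
            tags[VOCABULARY[term]] = term
    return tags


def search(products: List[Dict[str, str]], query: str) -> List[str]:
    """Return IDs of products whose attributes match *all* tagged query terms."""
    tags = tag_query(query)
    return [
        p["id"]
        for p in products
        if tags and all(p.get(attr, "").lower() == value for attr, value in tags.items())
    ]


# Hypothetical products.
products = [
    {"id": "A", "product_type": "cardigan", "pattern": "floral"},
    {"id": "B", "product_type": "cardigan", "pattern": "solid"},
    {"id": "C", "product_type": "dress", "pattern": "floral"},
    {"id": "D", "product_type": "cardigan"},  # pattern attribute missing
]

print(search(products, "floral cardigan"))  # prints ['A']
```

Product D also shows the flip side: without a pattern value it can never be returned for a pattern query, even if it happens to be a floral cardigan.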
Improving accuracy benefits everyone: more accurate attributes and values enable search engines to produce better results, which in turn improves the customer experience and bag conversions.

Completeness

Completeness measures whether the data being searched for exists at all. It covers both whether attributes are defined across all products and how well attribute values fill the available fields in the product content.
For instance, if the product type "skirts" contains 100k products that should all carry a "pattern" attribute, but "pattern" is populated for only 60k of them, then "pattern" is only 60% filled.
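The fill-rate arithmetic in that example can be sketched as follows. The function names and the synthetic skirt data are hypothetical; a value counts as filled only if it is present and not blank.

```python
from typing import Dict, List, Optional

# Hypothetical product record: attribute name -> value (None if absent).
Product = Dict[str, Optional[str]]


def fill_rate(products: List[Product], attribute: str) -> float:
    """Percentage of products with a non-empty value for `attribute`."""
    if not products:
        return 0.0
    filled = sum(
        1
        for p in products
        if p.get(attribute) is not None and p[attribute].strip() != ""
    )
    return 100.0 * filled / len(products)


def completeness_report(products: List[Product], required: List[str]) -> Dict[str, float]:
    """Fill rate for each required attribute of a product type."""
    return {attr: fill_rate(products, attr) for attr in required}


# Synthetic data mirroring the example: 100k skirts, 60k with a pattern value.
skirts: List[Product] = [{"pattern": "floral"} for _ in range(60_000)] + [
    {"pattern": None} for _ in range(40_000)
]

print(completeness_report(skirts, ["pattern"]))  # prints {'pattern': 60.0}
```

Running the same report over every required attribute of a product type shows at a glance which attributes need enrichment first.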
If we now search for "striped cocktail dresses" at Macy's, we would expect the results to look something like this –
[Screenshot: web page, cocktail dresses (expected results)]
However, these are the results that you get instead –
[Screenshot: web page, cocktail dresses (actual results)]
Investing effort in capturing key data and attribute values while loading supplier product content pays off. The completeness of key attributes and their values is very important, because missing data can cost a retailer many potential customers.

Consistency

Consistency means that all attributes in the product content follow the same format, one that matches both internal and external standards. Keeping data formats standard across product types, attributes and values, with attribute names and value labels aligned to external standards, helps customers find the products they want easily and without friction.
For instance, searching for "fruity perfumes for women" on Target returns these products, and the results do not change even when a different search query is entered. This gives the impression that the different scent attributes have not been cataloged, since the number of results stays the same regardless of the query.
[Screenshots: web page for perfumes]
The scent values provided on the product pages are inconsistent and do not conform to external standards, and the scents may not be indexed for the search engine at all.
Certain attributes and values require consistency, and their conformity can be measured for reliability. Conformity to internal standards can be improved by allowing only valid labels that have been verified as consistent with external standards. This also makes it easier to match the same product content across retailers and to define a list of standard attributes and their labels.
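One way to measure that conformity is to check each raw value against a controlled list of standard labels and propose a mapping for known variants. The sketch below assumes a hypothetical list of scent labels and a hypothetical synonym table; in practice both would come from whichever internal or external standard the retailer adopts. A value conforms only if it already matches a standard label verbatim, values with a known mapping get a suggested fix, and everything else is flagged for manual review.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical standard labels for a "scent" attribute.
STANDARD_SCENTS = {"citrus", "floral", "fresh", "fruity", "woody"}

# Hypothetical supplier variants mapped to a standard label.
SYNONYMS: Dict[str, str] = {"fruit": "fruity", "flowery": "floral", "wood": "woody"}


def normalize(value: str) -> Optional[str]:
    """Return the standard label for a raw value, or None if it cannot be mapped."""
    key = value.strip().lower()
    if key in STANDARD_SCENTS:
        return key
    return SYNONYMS.get(key)


def audit_scents(values: List[str]) -> Tuple[float, Dict[str, str], List[str]]:
    """Return (conformity %, suggested fixes, values needing manual review).

    A value conforms only if it already is a standard label, verbatim.
    """
    if not values:
        return 0.0, {}, []
    conforming = sum(1 for v in values if v in STANDARD_SCENTS)
    fixes: Dict[str, str] = {}
    unmapped: List[str] = []
    for v in values:
        if v in STANDARD_SCENTS:
            continue
        label = normalize(v)
        if label is None:
            unmapped.append(v)
        else:
            fixes[v] = label
    return 100.0 * conforming / len(values), fixes, unmapped


# Hypothetical raw scent values from supplier product pages.
raw = ["fruity", "Fruit", "floral", "Flowery", "Sweet & Sparkly", "woody"]
rate, fixes, unmapped = audit_scents(raw)
print(rate)      # prints 50.0
print(fixes)     # prints {'Fruit': 'fruity', 'Flowery': 'floral'}
print(unmapped)  # prints ['Sweet & Sparkly']
```

Only the normalized labels would then be indexed, so a query for "fruity" matches every product that a supplier described as "Fruit" or "fruity".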
In conclusion, as more and more products and categories are loaded onto platforms, attributes and their values are critical to understanding the customer's purchase intent.
It is also essential to assess catalog data quality regularly along the three dimensions of consistency, completeness and accuracy, so that customers get better results and e-commerce portals can increase conversion rates efficiently.