Leveraging AI and Machine Learning for Product Matching
A vast number of products are sold online through outlets all over the world. Identifying, matching and cross-checking products for purposes such as price comparison is a challenge because no unique identifier is used consistently across all stores.
Accurate product matching is essential in many situations. Stores may want to compare competitor prices for products they also offer, and customers may use comparison tools to find the best deals. Amazon lists different sellers on a single product page only after verifying that they offer the same product.
Numerous products but no method to match them across different stores
Product titles and descriptions do not follow a standardized format. Each store, and each seller within a store, might use different titles and descriptions for the same product. Attribute listings pose another challenge, as different e-tailers follow different formats. Images of the same product also differ from one e-tailer to another.
Standardized unique identifiers such as UPC, MPN and GTIN do exist, but not every store selling a product mentions them on its product page. The attributes themselves may be described differently, for instance 9" and 9 inches. Images may be included, but they can differ in perspective, clarity, tone, etc. The brand name may also be written in different ways, such as GE and General Electric.
It is impractical for a human to visit every product page and verify that the listings refer to the same product. If the process is to be automated, however, how can the system make sense of all this information? This is where AI and machine learning come into the picture.
Machine Learning for Product Matching
In machine learning solutions for product matching, the solution provider must first build a database with billions of products. This can be done by collecting information through web crawls and feeds. The system then has to come up with a universal taxonomy. This is a particular challenge because different retailers classify their products differently, and the same product might be listed in more than one category. For instance, a particular shoe model might be listed under casual shoes as well as dress shoes. The taxonomy has to be standardized, irrespective of how a particular store classifies its products.
There are standard classification models such as the Google product taxonomy, GS1, and Amazon's own classification, but a product match solution may devise its own taxonomy. The universal taxonomy is designed by identifying patterns and signals from titles, product descriptions, attributes, and images.
Once a universal taxonomy is in place, the next step is matching individual products. This requires precise comparisons to confirm that two listings are indeed the same unique product, despite differences in titles, images, descriptions, etc. First, the product page is searched for unique identifiers such as UPC or GTIN. Then the product titles are compared. Note that titles for the same product are rarely identical across different stores.
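To make the title comparison concrete, here is a minimal sketch that scores two titles by the overlap of their word tokens (Jaccard similarity). The measure is an assumption chosen for illustration, not the method any particular matching solution uses. The first title is the phone listing used as an example in this article; the other two titles and the function names are hypothetical.

```python
import re

def tokenize(title):
    """Lowercase a title and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))

def jaccard(title_a, title_b):
    """Share of tokens two titles have in common (0.0 to 1.0)."""
    tokens_a, tokens_b = tokenize(title_a), tokenize(title_b)
    if not tokens_a and not tokens_b:
        return 1.0  # two empty titles are trivially identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

# Title from this article's phone example.
store_a = ("Samsung Galaxy Note 8 (US Version) Factory Unlocked Phone "
           "64GB - Midnight Black (Certified Refurbished)")
# Hypothetical listing of the same phone in another store.
store_b = "Samsung Galaxy Note 8 64GB Unlocked Smartphone, Midnight Black"
# Hypothetical listing of a different phone.
store_c = "Apple iPhone 8 64GB Space Gray"

print(jaccard(store_a, store_b))  # higher score: likely the same product
print(jaccard(store_a, store_c))  # lower score: likely a different product
```

Token overlap alone cannot settle a match, since a different memory size changes only one token, which is why further checks are needed.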
Neural networks play a key role
Neural networks and deep learning techniques are used extensively to learn from similarities and differences between listings, and to produce word-level embeddings that give common words a shared representation. This involves teaching the system to recognize different references to the same entity, such as GE and General Electric, or 7" and 7 inches, and to map them to one unique representation.
A product can be identified using its title, description, images, and attributes or specifications list. In many cases, the product title itself yields a lot of information, and the system needs to be trained to differentiate the product name (for instance, brand and model) from the attributes.
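The goal of that step, one representation per entity, can be illustrated with a much simpler stand-in. The sketch below uses a hand-written alias table and a regular expression in place of learned embeddings, so it shows the target output rather than the neural approach itself. The table contents and function names are assumptions made for the example.

```python
import re

# Hypothetical alias table; a real system would learn or curate these mappings.
BRAND_ALIASES = {
    "ge": "General Electric",
    "general electric": "General Electric",
    "hp": "Hewlett Packard",
    "hewlett packard": "Hewlett Packard",
}

# A number followed by an inch marker: 7", 7 inches, 7in, 7-inch.
INCH_PATTERN = re.compile(
    r'(\d+(?:\.\d+)?)\s*-?\s*(?:(?:inches|inch|in)\b|")', re.IGNORECASE
)

def normalize_brand(name):
    """Map known spellings of a brand to one canonical name."""
    key = re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()
    return BRAND_ALIASES.get(key, name.strip())

def normalize_inches(text):
    """Rewrite every inch notation in the text as '<number> in'."""
    return INCH_PATTERN.sub(lambda m: f"{m.group(1)} in", text)

print(normalize_brand("GE"))              # General Electric
print(normalize_brand("Hewlett-Packard")) # Hewlett Packard
print(normalize_inches('7" tablet'))      # 7 in tablet
print(normalize_inches("7 inches"))       # 7 in
```

A lookup table only covers the variants someone has listed, which is the gap that learned representations are meant to close.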
[Image: phone model images]
Samsung Galaxy Note 8 (US Version) Factory Unlocked Phone 64GB – Midnight Black (Certified Refurbished)
Samsung Galaxy Note 8 is the phone model, and the title provides additional information like the memory size, US version, Factory Unlocked, Refurbished, etc.
Identifying and sorting product matches
The information then needs to be extracted and sorted into the appropriate slots - Phone model, version, memory size, etc. Different techniques might be used to help the system learn to parse and sort the different sets of information.
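As a rough illustration of sorting a title into slots, the sketch below parses the phone title from this article with a handful of regular expressions. The patterns, slot names and the rule that the model is whatever precedes the first parenthesis are assumptions that fit this one title format; a trained system would be expected to generalize well beyond them.

```python
import re

def parse_phone_title(title):
    """Pull a few attribute slots out of a phone listing title."""
    slots = {}

    # Heuristic: the model name is everything before the first parenthesis.
    slots["model"] = re.split(r"\s*\(", title, maxsplit=1)[0].strip()

    # Regional version written as "(US Version)".
    version = re.search(r"\(([A-Za-z]+) Version\)", title, re.IGNORECASE)
    if version:
        slots["version"] = version.group(1).upper()

    # Memory size such as "64GB" or "128 GB".
    memory = re.search(r"(\d+)\s*(GB|TB)\b", title, re.IGNORECASE)
    if memory:
        slots["memory"] = memory.group(1) + memory.group(2).upper()

    # Color: the words between a dash and the next parenthesis or the end.
    color = re.search(r"[-\u2013]\s*([A-Za-z ]+?)\s*(?:\(|$)", title)
    if color:
        slots["color"] = color.group(1)

    # Simple keyword flags.
    slots["unlocked"] = bool(re.search(r"\bunlocked\b", title, re.IGNORECASE))
    slots["refurbished"] = bool(re.search(r"\brefurbished\b", title, re.IGNORECASE))
    return slots

title = ("Samsung Galaxy Note 8 (US Version) Factory Unlocked Phone 64GB "
         "\u2013 Midnight Black (Certified Refurbished)")
for slot, value in parse_phone_title(title).items():
    print(f"{slot}: {value}")
```

Once both listings are reduced to the same slots, the comparison becomes slot against slot instead of free text against free text.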
The next comparison draws on further information about the product, such as the description and the specs table. These add more knowledge about the product, so the machine is better able to identify an exact product match or mismatch in the comparisons that follow.
The standard identifying signals are positive matches for unique identification numbers (UPC or MPN), classification, brand, title, attributes, and image. For each comparison, the system follows a long procedure of checks, or safety valves. The checks include a search for the unique identification number, a test for keyword similarities, brand normalization and matching (for example, HP is the same as Hewlett Packard), attribute normalization and matching (9 inches is the same as 9in and 9"), image matching, etc. There is also a check for variations in attributes.
For the best product match result, at least 99% of the results have to be positive. A listing is considered a mismatch even if it is a variation of what is essentially the same product. Different product match solutions employ different techniques and training methods, and the process is complicated. One advantage, however, is that neural networks and machine learning models learn over time and get better with each use.
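The chain of checks described above can be sketched as a short cascade. This is a simplified illustration under stated assumptions: image matching is left out, the 99% requirement is not modeled numerically (any failed check counts as a mismatch), identifiers decide the outcome when both listings carry one, and the title threshold of 0.5 is arbitrary. The listings, alias table and function names are all hypothetical.

```python
import re

# Hypothetical alias table for brand normalization.
BRAND_ALIASES = {"hp": "hewlett packard", "ge": "general electric"}

def norm_brand(brand):
    key = re.sub(r"[^a-z0-9]+", " ", brand.lower()).strip()
    return BRAND_ALIASES.get(key, key)

def norm_attr(value):
    # Collapse inch notations (9", 9 inches, 9in), then drop case and spaces.
    value = re.sub(r'(\d+(?:\.\d+)?)\s*(?:(?:inches|inch|in)\b|")',
                   r"\g<1>in", value, flags=re.IGNORECASE)
    return re.sub(r"\s+", "", value.lower())

def title_overlap(a, b):
    ta = set(re.findall(r"[a-z0-9]+", a.lower()))
    tb = set(re.findall(r"[a-z0-9]+", b.lower()))
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def is_match(p, q, title_threshold=0.5):
    """Run the checks in order; the first failed check means a mismatch."""
    # 1. Unique identifier: decisive when both listings provide one.
    if p.get("upc") and q.get("upc"):
        return p["upc"] == q["upc"]
    # 2. Brand, after normalization.
    if norm_brand(p["brand"]) != norm_brand(q["brand"]):
        return False
    # 3. Attributes listed by both stores must agree after normalization.
    for key in p["attrs"].keys() & q["attrs"].keys():
        if norm_attr(p["attrs"][key]) != norm_attr(q["attrs"][key]):
            return False
    # 4. Keyword similarity of the titles.
    return title_overlap(p["title"], q["title"]) >= title_threshold

# Hypothetical listings: b is the same product as a, c is a storage variant.
a = {"title": "HP Model X 15 Laptop 8GB RAM 256GB SSD Silver", "brand": "HP",
     "attrs": {"screen": '15.6"', "storage": "256GB"}}
b = {"title": "Hewlett Packard Model X 15 Laptop, 8GB RAM, 256GB SSD (Silver)",
     "brand": "Hewlett-Packard",
     "attrs": {"screen": "15.6 inches", "storage": "256 GB"}}
c = {"title": "HP Model X 15 Laptop 8GB RAM 512GB SSD Silver", "brand": "HP",
     "attrs": {"screen": "15.6in", "storage": "512GB"}}

print(is_match(a, b))  # True: brand, attributes and title agree
print(is_match(a, c))  # False: the storage variant is treated as a mismatch
```

Treating every failed check as a mismatch mirrors the strictness described here, at the cost of rejecting true matches whenever one store lists an attribute incorrectly.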