Tag Archives for "Deep Learning"


How AI helps to optimize e-commerce product content


With online sales growing fast and technological innovation reshaping the e-commerce landscape, traditional retailers are increasingly investing in omnichannel strategies and doubling their efforts to meet consumer demands. An effective way to keep pace with e-commerce giants and stay relevant in the marketplace is to offer high-grade product discovery and selection. This requires providing detailed product content with product-specific attributes, along with semantic search.

The current product content problem

As more retail businesses move towards e-commerce, quality information and powerful search platforms have become crucial to entice shoppers and help them make effective purchase decisions. This is a challenge, however, because retailers struggle to deliver complete product content.

Retailers rely on suppliers to provide the coordinating images, videos, attributes, etc. for each product. Suppliers deliver content through various channels, such as printed or digital catalogs, and in different formats, such as Excel or PDF, making it difficult for retailers to properly source and extract the right data for the right product. In some cases, retailers even purchase content from third-party providers or online databases. The challenge persists, however, because content often differs between suppliers and third-party providers, and validating the information becomes tedious.

Besides the price of a product, detailed product information along with superior-quality images and videos plays an important role in a consumer's buying decision.

There are numerous technological challenges in extracting content from product images, including region segmentation, diverse product backgrounds, natural settings, typography and fonts, lighting conditions, and low-quality images. For instance, inconsistent product image sizes can prevent the system from capturing product details completely from all the images.

Impact of poor quality data

Missing information and uncertainty are two leading reasons consumers abandon their shopping journey. Consumers tend to leave when they sense that a product lacks clear or complete information. This could range from unclear product descriptions to missing or inaccurate product attributes such as size, materials, or ingredients, or even product reviews.

While there is no definitive rule stating an optimal number of product images or videos, or a recommended character limit for product information, the quality of product images and videos has a direct impact on an e-commerce business's ability to generate sales. Complete, comprehensive product information (a description along with attributes like size or weight) and high-quality images and videos give shoppers the information they need to make a purchase decision.

Effective Extraction of Product Content

With IceCream Labs CatalogIQ, retailers can effectively address the problems they face while onboarding product content to their catalogs. Leveraging machine learning algorithms, Optical Character Recognition (OCR) systems, and Natural Language Processing (NLP) techniques, it can effectively extract the right information needed for the retailer to optimize their content as well as maintain their content health. Some of its capabilities include:

CatalogIQ extracting content from a product

Attribute Extraction:

Images are captured from all angles of the product and fed into the system. Leveraging NLP techniques, brand attributes such as brand name, sub-brand, tagline, flavor, net weight/volume, and calorie information are extracted.

Brand Name Detection (Logo detection): 

Leveraging OCR, the product image is scanned for text, and the output is sent to an NLP engine to identify text logos (e.g., brand logos like Zara). If no text is detected, image processing is applied using the brand name parameters (e.g., brand logos like Nike).

Standard Certification Detection:

In this step, a preset database of standard food certification parameters is used to detect and extract food certification labels such as "gluten-free", "non-GMO", and "100% organic". The images are scanned against these parameters, similar to how brand name detection functions.
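As a rough illustration of the matching idea (not CatalogIQ's actual implementation; the label list and function names here are hypothetical), the step that checks OCR'd label text against a preset certification database might look like:

```python
# Hypothetical sketch (illustrative names, not CatalogIQ internals):
# match OCR output against a preset database of certification labels.
CERTIFICATION_DATABASE = ["gluten-free", "non-gmo", "100% organic"]

def detect_certifications(ocr_text):
    """Return the preset certification labels found in OCR'd label text."""
    normalized = ocr_text.lower()
    return [label for label in CERTIFICATION_DATABASE if label in normalized]

print(detect_certifications("Certified 100% Organic and Gluten-Free granola"))
# → ['gluten-free', '100% organic']
```

A production system would add fuzzy matching to tolerate OCR errors, much like the predefined-vocabulary correction the article describes.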

nutritional label data extraction

Nutrition Facts Extraction:

Using OCR and region segmentation, nutritional facts text is extracted. This text is further corrected using a predefined vocabulary to streamline the content. A rule-based approach is then applied to the corrected text to extract nutritional values.
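A toy sketch of the rule-based step (illustrative only; the nutrient names, regex pattern, and units are assumptions, not the production rules) that pulls values out of OCR-corrected label text:

```python
import re

# Hypothetical rule-based extraction: pull nutrient values out of
# OCR-corrected label text. Field names and units are illustrative.
NUTRIENT_PATTERN = re.compile(
    r"(calories|total fat|sodium|protein)\s*:?\s*(\d+(?:\.\d+)?)\s*(g|mg|kcal)?",
    re.IGNORECASE,
)

def extract_nutrition_facts(text):
    """Map each recognized nutrient name to a (value, unit) pair."""
    facts = {}
    for name, value, unit in NUTRIENT_PATTERN.findall(text):
        facts[name.lower()] = (float(value), unit or "")
    return facts

label = "Calories 150  Total Fat 8 g  Sodium 160 mg  Protein 3 g"
print(extract_nutrition_facts(label))
```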

Product label images are a trusted source of product information for consumers. AI can improve the quality of this information and maintain data consistency across all product pages. Retailers benefit further because it alleviates the burden of validating product data provided by various suppliers, online databases, or third-party providers, and it can surface additional information critical for product discovery, such as brand or certification logos.

The future of Product content

Applications leveraging AI and machine learning have shown tremendous potential for automating processes, reducing data inconsistency, and enhancing data quality, thereby improving product data extraction.


At IceCream Labs, we strive to address the challenges that businesses face in e-commerce using AI and machine learning. Are you ready to enhance your product content and take your e-commerce business to the next level? Reach out to us at sales@icecreamlabs.com for an AI-based solution for your business.



3×3 convolution filters - A popular choice

In image processing, a kernel, convolution matrix, or mask is a small matrix. It is used for blurring, sharpening, embossing, edge detection, and more. This is accomplished by doing a convolution between a kernel and an image.

In this article, here are some conventions that we are following —

  • We are specifically referring to 2D convolutions, which are usually applied to 2D objects such as images. These concepts also apply to 1D and 3D convolutions, but may not transfer directly.
  • When applying 2D convolutions such as 3x3 convolutions on images, a 3x3 convolution filter in general also has a third dimension. This dimension depends on (and is equal to) the number of channels of the input image. So we apply a 3x3x1 convolution filter on gray-scale images (number of channels = 1), whereas we apply a 3x3x3 convolution filter on a colored image (number of channels = 3).
  • We will refer to all convolutions by their first two dimensions, irrespective of the channels. (We assume zero padding.)

A convolution filter passes over all the pixels of the image; at each position, we take the dot product of the convolution filter and the underlying image pixels to get one output value. The hope is that the weights (or values) in the convolution filter, when multiplied with the corresponding image pixels, give a value that best represents those pixels. We can think of each convolution filter as extracting some kind of feature from the image.

Convolutions are therefore usually done keeping two things in mind -

  • Most features in an image are local. It therefore makes sense to take a few local pixels at a time and apply convolutions.
  • Most features may appear in more than one place in an image. It therefore makes sense to slide a single kernel all over the image, hoping to extract the same feature in different parts of the image.
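The sliding dot product described above can be sketched in a few lines of plain Python (an illustrative toy, not an optimized implementation):

```python
# Minimal 2D convolution sketch: slide a kernel over the image and
# take the dot product at each position (no padding, stride 1).
def convolve2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            output[i][j] = sum(
                image[i + m][j + n] * kernel[m][n]
                for m in range(kh)
                for n in range(kw)
            )
    return output

# A 3x3 kernel on a 4x4 image yields a 2x2 feature map.
image = [[1, 2, 3, 0],
         [4, 5, 6, 1],
         [7, 8, 9, 2],
         [0, 1, 2, 3]]
kernel = [[0, 0, 0],
          [0, 1, 0],
          [0, 0, 0]]  # identity kernel: output equals the centre pixels
print(convolve2d(image, kernel))  # → [[5, 6], [8, 9]]
```

Real networks learn the kernel weights instead of hand-picking them, but the sliding dot product is exactly this.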

Now that the convolution filter size is one of the hyper-parameters to choose, the choice can be made between smaller and larger filter sizes.

Here are the things to consider while choosing the size —

| Smaller Filter Sizes | Larger Filter Sizes |
| --- | --- |
| We only look at very few pixels at a time, so there is a smaller receptive field per layer. | We look at a lot of pixels at a time, so there is a larger receptive field per layer. |
| The extracted features are highly local, without a more general overview of the image. This helps capture smaller, complex features in the image. | The extracted features are generic and spread across the image. This helps capture the basic components of the image. |
| The amount of information or features extracted is vast, which can be useful in later layers. | The amount of information extracted is considerably less (the dimension of the next layer reduces greatly), though each feature summarizes a larger region. |
| In an extreme scenario, a 1x1 convolution treats each pixel as giving a useful feature independently. | In an extreme scenario, a filter equal to the size of the image essentially converts the convolution into a fully connected layer. |
| Better weight sharing, thanks to the smaller convolution applied across the complete image. | Poorer weight sharing, due to the larger convolution size. |

Now that you have a general idea of extraction with different sizes, let's follow up with a comparison of 3x3 and 5x5 convolutions —

| Smaller Filter Sizes | Larger Filter Sizes |
| --- | --- |
| Applying a 3x3 kernel twice to get one final value uses 9 + 9 = 18 weights. Smaller kernels mean fewer weights and more layers. | Applying a 5x5 kernel once uses 25 weights. Larger kernels mean more weights but fewer layers. |
| Computationally efficient, due to the lower number of weights. | Computationally expensive, due to the higher number of weights. |
| Due to the larger number of layers, it learns complex, more non-linear features. | Due to the lower number of layers, it learns simpler non-linear features. |
| With more layers, each of those layers must be kept in memory to perform backpropagation, which requires more storage. | With fewer layers, less storage memory is needed for backpropagation. |

Based on the points listed in the above table and on experimentation, smaller kernel sizes are a popular choice over larger sizes.
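The weight arithmetic from the comparison can be checked with a quick back-of-the-envelope script (single input/output channel, no bias terms; a simplification of real layers):

```python
# Two stacked 3x3 convolutions cover the same 5x5 receptive field as
# one 5x5 convolution, but with fewer weights (single channel, no bias).
def stacked_weights(kernel_size, num_layers):
    """Total weights in num_layers stacked square convolutions."""
    return kernel_size * kernel_size * num_layers

two_3x3 = stacked_weights(3, 2)   # 9 + 9 = 18 weights
one_5x5 = stacked_weights(5, 1)   # 25 weights
print(two_3x3, one_5x5)  # → 18 25
```

The gap widens with channels: for C input and output channels, two 3x3 layers use 18·C² weights versus 25·C² for one 5x5 layer.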

Another question is the preference for odd-sized filters or kernels over even sizes like 2x2 or 4x4.

Although we may use even-sized filters, odd-sized filters are preferable. Consider the final output pixel (of the next layer) obtained by convolving over the previous layer's pixels: with an odd-sized kernel, all those previous-layer pixels sit symmetrically around the output pixel. Without this symmetry, which is what happens with an even-sized kernel, we would have to account for distortions across the layers. Therefore, even-sized kernel filters aren't preferred.

1x1 is eliminated from the list because the features extracted from it are fine-grained and local, with no consideration for neighboring pixels. Hence, 3x3 works in most cases, and it is often the popular choice.


Capsule Network — Better approach for Deep Learning


Deep learning is a member of the machine learning family. Learning can be supervised, semi-supervised, or even unsupervised, rather than being tied to task-specific algorithms. These algorithms are used to model complex relations like those of the human biological nervous system. The interpretation of deep learning is grounded in concepts of information processing and communication patterns.

Deep learning architectures were introduced in the fields of computers and technology, and then rapidly adapted and implemented across areas such as speech recognition, social network filtering, bioinformatics, and drug design. The results were remarkable: machines produced the desired results with better speed and accuracy than humans, and there have been cases where these machines proved better than human experts.

Routing and networking has taken a new direction

Last year (2017), Geoffrey Hinton, one of the godfathers of deep learning, co-authored the paper "Dynamic Routing Between Capsules". Comparing against current state-of-the-art Convolutional Neural Networks, the authors proposed that the human brain has modules called "capsules". These capsules are adept at handling different types of visual stimuli such as pose (position, size, orientation), deformation, velocity, hue, and texture. The brain must have a mechanism for "routing" low-level visual information to the capsule it believes is best suited to handle it.

Convolutional Neural Networks (CNN) vs capsule network

Capsule networks are built around dynamic routing between capsules, unlike Convolutional Neural Networks (CNNs), where fixed connectivity patterns are derived between neurons or stimuli.

The term capsule refers to a collection of neurons whose activity vector represents the instantiation parameters of an object or object part in the visual field.

CNNs shine in recommendation systems, image and video recognition, natural language processing, and similar tasks. CapsNet, enabled by deep learning methods, instead represents images in an objectified manner, modelling objects and their parts rather than a flattened sketch.

There is a strong correlation between the hidden layers of the entity or the distant object in CNN. The visualization of the object references is made better with the Capsule Network.

A capsule is a nested set of neural layers.

In a regular neural network, more layers can be added; in CapsNet, the extra layers are nested inside a single layer. Neural networks use minimal pre-processing compared to other image classification algorithms, whose filters were predominantly hand-engineered.

CNNs brought a major change by reducing the human effort needed for feature design, making image tasks accessible even to practitioners without prior knowledge of feature engineering.

With the introduction of the CapsNet, the task has become much simpler. Capsnet observes a strong correlation among the varied layers of the interface. This correlation takes a new dimension in the image processing techniques. The layers are arranged in functional pods which enable designers to distinguish between the various elements.

For instance, the approximate positions of the nose, mouth, eyes can be drawn using CNN but the exact alignment or the 3D structure can be fixed with CapsNet.

Image by Aurélien Géron

 

Practical Advantages of CapsNet in Deep learning

The probabilistic interpretation derives from machine learning, where training and testing are framed as optimization. More specifically, it treats the non-linear activation as a cumulative distribution function. This concept led to techniques such as dropout for masking glitches in neural networks.

Capsules try to resolve the problems of max-pooling layers through equivariance. Instead of making the function translation-invariant, capsules make it viewpoint-equivariant: as a feature moves in the picture, its vector representation changes in the same way.
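To make a capsule's output length interpretable as a probability, the paper applies a "squashing" non-linearity: short vectors shrink toward zero and long vectors approach (but never exceed) unit length. A minimal sketch in plain Python, not tied to any particular framework:

```python
import math

# The "squashing" non-linearity from Dynamic Routing Between Capsules:
# v = (|s|^2 / (1 + |s|^2)) * (s / |s|), so |v| < 1 for any input s.
def squash(vector):
    squared_norm = sum(x * x for x in vector)
    norm = math.sqrt(squared_norm)
    if norm == 0:
        return [0.0 for _ in vector]
    scale = squared_norm / (1.0 + squared_norm) / norm
    return [scale * x for x in vector]

print(squash([3.0, 4.0]))  # input norm 5 → output norm 25/26 ≈ 0.96
```

The direction of the vector (the pose information) is preserved; only its length is rescaled.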


The core components of Artificial Intelligence


The massive surge in AI and the buzz surrounding it sometimes make it difficult to get a handle on the technology and the pieces around it. Here we make a small attempt to explain the core components, services, and pieces of AI.


Hardware

Graphics Processing Units, or GPUs - Nvidia, the largest maker of GPUs, had its niche in the gaming market, leaving such a lasting impression that no gamer would want to be caught without one. That was until the rise of Bitcoin and the heavy interest in blockchain: GPUs were no longer restricted to gaming, as people relied on them to mine bitcoins.
With the current buzz around AI and the rush to adopt it, people have realized the benefit of using GPUs for AI and deep learning. GPUs are essentially high-end graphics cards that slot into regular servers. The cards, as well as the extensive software stack, make it easy to process large volumes of data in complex AI models. GPUs are expensive, but they cut processing time from months to days; now you would rarely see a deep learning developer without one. Nvidia, for its part, has built an extensive software stack to support these GPUs, and libraries such as CUDA are a great resource.

Frameworks

While the hardware is in place, frameworks or libraries are what machine learning applications are built on. Matlab was widely used for experimentation, and the R programming language was also heavily used.
When Python libraries became available, developers switched almost immediately. Scikit-learn is the most popular Python framework for traditional machine learning.
The big push towards deep learning has led to some great competing frameworks. Most recently, TensorFlow from Google has gained popularity, as has PyTorch. Others in the same field are Keras, Caffe, and Theano.
Every framework has its pros and cons. Adoption is really driven by the community available for support, since all of these are free, open-source frameworks rather than commercial products. TensorFlow is pushing the limits with awesome features for handling text, images, etc., and has been widely used within Google. Our teams here are currently in love with PyTorch.

APIs

All the big cloud providers, such as Amazon Web Services (AWS), Google Cloud Platform, and Microsoft Azure, have a portfolio of AI or machine learning APIs offered as cloud services. They offer text classification, sentiment analysis, image classification, etc. These services can be plugged in to solve simple problems within a developer's application. For instance, you can use a sentiment analysis API to see whether your customer feedback is negative or positive.

AI Applications

These are the applications used by customers that leverage AI to solve specific problems. Siri and Alexa are the ideal examples. Another great example is the recommendation engines we see on Amazon, Walmart, etc. Some of the most common applications can be found in almost every modern car leveraging technology from Mobileye, with features such as lane assist, parking sensors, and pedestrian or obstacle detection.
Our teams at IceCream Labs have spent the better part of the last 18 months building applications focused on catalog management and merchandising for retailers and brands.


We leverage GPUs, TensorFlow, PyTorch, Keras, and Caffe. We don't use the standard cloud APIs, as they are not sufficient for the problems we cover. Our application, CatalogIQ, can intelligently score the quality of product content, automatically classify products, generate keywords, and improve content for SEO and search.

By leveraging secure private clouds, we can do this seamlessly, from 1 product to 100 million products. That, really, is the power of AI.


Pragmatic AI in the enterprise

The AI hype is so pervasive that most senior executives at any enterprise have AI on their radar. Every company today is thinking about AI or has some AI initiative in place. The question on everybody's mind right now is how to use AI and what more can be done. Most people treat AI as a black box, almost a promised land. Expectations are big: everyone has seen the power of Siri and Alexa and is hearing of self-driving cars.

There are some amazing AI-powered applications in the marketplace, such as mobile robots that roam store aisles and collect inventory data automatically, or warehouse robots that auto-fetch products for shipment.

The latest entrants are the delivery robots seen on the streets of Palo Alto, or the driverless delivery vehicles Kroger is testing out.

How does AI function?

AI is basically getting machines to see patterns in data. This process is called training or learning. This learning can be done in two ways:

Supervised learning

This is the process of training the machine by showing it large amounts of data of a particular type. The machine looks at this data and learns its patterns. Once trained, the machine can consistently detect the same patterns in any new data. A simple example: if we show a machine 100 images of a chair, then when it is shown a new, unseen image of a chair, it automatically maps the patterns it has learnt to say that it must be a chair.
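To make the idea concrete, here is a toy supervised learner in plain Python: a nearest-class-average classifier with made-up numbers, purely illustrative rather than the chair-image system itself.

```python
# Toy supervised learning: "train" by averaging each class's features,
# then label new data by the nearest class average (all numbers made up).
def train(examples):
    grouped = {}
    for features, label in examples:
        grouped.setdefault(label, []).append(features)
    return {
        label: [sum(col) / len(col) for col in zip(*rows)]
        for label, rows in grouped.items()
    }

def predict(centroids, features):
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: distance(centroids[label], features))

# Hypothetical features: (seat height, surface area) for chairs vs tables.
training_data = [((0.45, 0.2), "chair"), ((0.5, 0.25), "chair"),
                 ((0.75, 1.0), "table"), ((0.7, 1.2), "table")]
model = train(training_data)
print(predict(model, (0.48, 0.22)))  # → chair
```

Real systems learn from pixels with deep networks rather than two hand-picked numbers, but the train-then-predict loop is the same.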

Unsupervised learning

In this approach, we allow the machine to automatically start finding patterns in the data and then, based on these patterns, pool the data into different buckets. This process is called clustering: the machine clusters the data based on the patterns it sees.

A good example: if a machine is shown a mix of chairs and tables, it would create clusters of chairs and tables. It may not be able to say which cluster is tables and which is chairs, but it can group them separately.
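The clustering idea can be sketched with a minimal k-means in plain Python (illustrative only; real systems cluster far richer features than a single number):

```python
# Minimal k-means sketch: group points by pattern, with no labels given.
def kmeans(points, k, iterations=10):
    centroids = points[:k]  # naive initialisation, fine for a sketch
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        # Assign each point to its nearest centroid.
        for p in points:
            nearest = min(range(k), key=lambda i: (p - centroids[i]) ** 2)
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return clusters

# 1D "sizes": small items separate from large ones, unlabelled.
print(kmeans([0.4, 0.5, 0.45, 1.1, 1.2, 1.15], k=2))
# → [[0.4, 0.5, 0.45], [1.1, 1.2, 1.15]]
```

Notice that the algorithm never sees the words "chair" or "table"; it only discovers that the data falls into two groups.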

The application of detecting patterns is where the most interesting part lies. We can use this ability to get machines to:

  1. Find errors or anomalies in the data, such as identifying an irregular transaction in a bank statement by scanning the transactions, or detecting wrong customer data.
  2. Label or classify data, such as training a machine to automatically classify product images into chairs, tables, etc., or getting a machine to look at customer transaction data and create customer personas. In our retail example, we train models to look at a piece of furniture or clothing and automatically identify characteristics of the product like colours, shapes, and patterns.
  3. Generate data. This is the frontier of AI, where machines learn patterns and use them to generate new data or content. The best example is machines learning the painting styles of masters like Van Gogh and Monet and repainting a picture in the same style. There are models that can learn writing styles and reproduce new product descriptions based on these styles.

While these examples are not large-scale business applications, they can be leveraged to solve real problems.

The beauty of AI is that it can consistently perform its tasks.

Our approach has been to identify problems that don't have any alternate solution and that can be solved using AI in days or weeks rather than months or years.

A good example is looking at large volume product images and automatically getting all the useful information from it as text attributes. This comes in handy when consumers search for products. Another application would be searching for products that look similar.

AI can be easily applied to accounts receivable and payable reconciliation, cleansing and augmenting customer data, creating personalized shopping experiences for consumers, automatically creating custom product bundles for every consumer, automating team schedules, automatically identifying best candidates from a pool of resumes. The list can go on and on.

The key takeaway: while there are game-changing applications of AI like self-driving cars, there are far more applications that can have immediate impact. Our belief is that the impact of AI will be far greater in solving day-to-day problems and improving people's lives.


Everything you need to know about Convolutional Neural Nets

Machine Learning has been around for a while now and we are all aware of its impact in solving everyday problems. Initially, it was about solving simple problems of statistics, but with the advancements in technology over time, it picked up pace to give bigger and better results. It has grown to solve bigger problems such as image recognition and now even possesses the ability to distinguish a cat from a dog.

In this article, we will briefly look at how information is represented through a network, and how manipulating that representation helps solve some of the toughest problems in image recognition.

Prologue: a troublesome story of Real Estate Agents

Let’s start right at the beginning. Say we have input vectors — specifications of a house, and outputs like the price of the house. Not delving deeper into the details, visualize it as though we have information described as a set of concepts such as kitchen size, number of floors, location of the house and we need to represent information pertinent to another set of concepts such as the price of house, architecture quality, etc. This is basically conversion from one conceptual representation to another conceptual representation. Let’s now look at a human converting this –

He (say Alex) would probably have a mathematical way to convert this from one conceptual representation to another through some ‘if-else’ condition to start off.

If he (say Bob) was slightly smarter, he would have converted input concepts into some intermediary scores like simplicity, floor quality, noise in the neighbourhood, etc. He would also cleverly map these scores to the corresponding final output, say price of the house.

What has changed from an ordinary real estate agent (Alex) to a slightly smarter one (Bob) is that Bob mapped the input-output information flow in detail. In other words, he changed the framework in which he thought he could best represent the underlying architecture.

Lesson 1: The ‘Framework of thinking’ is everything

So the difference between Alex and Bob’s thought process was that Bob could figure out that secondary concepts are easy to calculate, and hence he combined them to represent the final desired output whereas Alex tried to apply an entire ‘if-else’ logic for each one of the input variables and mapped it with each one of the output variables. Bob in a way represented the same mapping in a more systematic way by breaking them into smaller concepts and just had to remember fewer concepts. Meanwhile, Alex had to remember how every input is connected to every output without breaking it into smaller concepts. So the big lesson here is that the ‘framework of thinking’ is everything.

This is what most researchers have realized. Every researcher has the same problem, let’s take for instance, the cat vs dog image.

Researchers have to convert information from one conceptual representation (pixels) to another (is-cat is True/False). They also have roughly the same computational power (memory, complexity, etc.), so the only way to solve this problem is to introduce a framework of thinking that decodes inputs with minimum resources and converts them from one form to another. You have probably already heard about many 'frameworks of thinking'. When people say Convolutional Networks, it simply means a framework for representing a particular mapping function. Most statistical models that predict house prices are also just mapping functions. They all try to best approximate a universal mapping function from input to output.

Lesson 2: Universal Mapping function like Convolutional Neural Networks

Convolutional Neural Networks, or CNNs, are a family of functions that exploit properties of images, such as positional invariance. The network can re-use the same sub-mapping function from the bottom part of the image to the top part. This greatly reduces the number of parameters needed to represent the universal mapping function, and it is why CNNs are cone-shaped: we move from concepts that are space-oriented (pixels) to concepts that are space-independent (cat-or-not, has-face). That's it. It's that simple. Information is smartly converted from one form to another.

Lesson 3: Convolutional Neural Networks and the Brain

Recent advancements in neuroscience suggest essentially the same thing about how we decode information in the visual cortex: we first decode lines, then objects like boxes, circles, and curves, and then decode those into faces, headphones, and so on.

Conclusion

A lot of Machine Learning/Deep Learning/AI technologies have very simple conceptual frameworks. The reason behind it solving gargantuan problems lies in the complexity that arises from a whole lot of simple-conceptual-frameworks that are attached end-to-end. It is so complex that we can’t really predict whether these networks can solve any kind of problem. Yet, we have been implementing them on a day to day basis based on some sort of assumption. It’s very similar to the human brain. We know its underlying structure and framework. We discovered it half a century ago. Yet, we’ve not been able to decipher this complex world and we are still unsure as to when we’ll reach such an understanding.


Deep Belief Networks — all you need to know



With the advancement of machine learning and the advent of deep learning, several tools and graphical representations were introduced to correlate huge chunks of data.

Deep Belief Networks are a graphical model that is generative in nature, i.e. it can produce all possible values for the case at hand. They are an amalgamation of probability and statistics with machine learning and neural networks. A Deep Belief Network consists of multiple layers of units, with connections between the layers but not between units within the same layer. The main aim is to help the system classify data into different categories.

How did Deep Belief Neural Networks Evolve?

The first generation of neural networks used Perceptrons, which identified an object by weighting pre-fed properties. However, Perceptrons were only effective at a basic level and not useful for advanced tasks. To address this, the second generation of neural networks introduced backpropagation, in which the produced output is compared with the desired output and the error is driven toward a minimum. Support Vector Machines then handled more test cases by generalizing from previously seen examples. Next came directed acyclic graphs called belief networks, which helped in solving inference and learning problems. These were followed by Deep Belief Networks, which helped create unbiased values to be stored in leaf nodes.

Restricted Boltzmann Machines

Deep Belief Networks are composed of unsupervised networks such as Restricted Boltzmann Machines (RBMs), where the hidden layer of each sub-network serves as the visible layer of the next. The units within a hidden layer are not connected to each other and are conditionally independent given the layer below. The probability of a joint configuration over the visible and hidden layers depends on that configuration's energy relative to the energy of all other joint configurations.
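The energy-relative-to-all-configurations idea can be checked directly on a tiny RBM. The sketch below (a NumPy illustration with made-up sizes, not production code) uses the standard binary-RBM energy E(v, h) = −a·v − b·h − vᵀWh and enumerates every joint configuration to form the partition function, so one configuration's probability is its Boltzmann weight divided by the sum over all others.

```python
import numpy as np
from itertools import product

def rbm_energy(v, h, W, a, b):
    """Binary-RBM energy: E(v, h) = -a.v - b.h - v.W.h.
    Lower energy corresponds to higher probability."""
    return -(a @ v) - (b @ h) - (v @ W @ h)

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 4          # tiny, so every configuration can be enumerated
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
a = np.zeros(n_visible)             # visible biases
b = np.zeros(n_hidden)              # hidden biases

# Partition function Z: sum of Boltzmann weights over ALL 2^6 * 2^4
# joint configurations of visible and hidden units.
Z = sum(np.exp(-rbm_energy(np.array(v, float), np.array(h, float), W, a, b))
        for v in product([0, 1], repeat=n_visible)
        for h in product([0, 1], repeat=n_hidden))

v = np.ones(n_visible)
h = np.ones(n_hidden)
p = np.exp(-rbm_energy(v, h, W, a, b)) / Z   # probability of this one configuration
print(p)
```

For realistic layer sizes Z is intractable to enumerate, which is exactly why approximate procedures like Contrastive Divergence are used for training.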

Training a Deep Belief Network

The first step is to train a layer of features that receives its input directly from the pixels. The next step is to treat the activations of this layer as if they were pixels and learn features of those features in a second hidden layer. Every time another layer of features is added to the belief network, the lower bound on the log probability of the training data improves.


Implementation

MATLAB can easily represent the visible layer, hidden layers, and weights as matrices and execute the algorithms efficiently, so we chose MATLAB to implement the DBN. The handwritten digits of MNIST are then used to compare performance against other classifiers. MNIST is a database of handwritten digits with 60,000 training examples and 10,000 testing examples. The digits range from 0 to 9 and appear in varying shapes and positions from image to image; each is normalized, centered in a 28×28 pixel image, and labeled. There are three schemes for deciding how often the weights are updated: mini-batch, online, and full-batch. Online learning takes the longest computation time because it updates the weights after each training instance. Full-batch goes through the entire training set before updating the weights, which is not advisable for big datasets. Mini-batch divides the dataset into smaller chunks and performs the learning update for each chunk, which takes less computation time. Hence, we use mini-batch learning for the implementation.
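The three update schedules differ only in how the training set is sliced between weight updates. A small NumPy sketch of the mini-batch slicing (the article's implementation is in MATLAB; sizes here are shrunk for brevity, with 6,000 examples standing in for MNIST's 60,000):

```python
import numpy as np

def minibatches(X, batch_size):
    """Yield successive mini-batches of rows. Weights are updated once per
    batch: a middle ground between online (one update per example) and
    full-batch (one update per pass over the whole dataset)."""
    for start in range(0, len(X), batch_size):
        yield X[start:start + batch_size]

X = np.random.rand(6000, 784)        # 6,000 flattened 28x28 "images"
batches = list(minibatches(X, 100))
print(len(batches))                  # 60 weight updates per epoch

# For comparison: online learning would do 6,000 updates per epoch,
# full-batch exactly 1 update per epoch.
```

With the real 60,000-example training set and a batch size of 100, this gives 600 updates per epoch.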

An important point is that implementing a Deep Belief Network requires training each RBM layer in turn. For this purpose, the units and parameters are first initialized, followed by the two phases of the Contrastive Divergence algorithm: positive and negative. In the positive phase, the binary states of the hidden units are obtained by computing their probabilities from the weights and visible units; because this raises the probability of the training data, it is called the positive phase. The negative phase lowers the probability of the samples generated by the model. The greedy learning algorithm is then used to train the entire Deep Belief Network.
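The two phases of one CD-1 update can be sketched as follows. This is a hedged NumPy illustration of the standard update rule (function names and sizes are mine, not from the article's MATLAB code): the positive phase computes hidden statistics from the data, the negative phase from a one-step reconstruction, and the weight change is their difference.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, a, b, lr=0.1, rng=None):
    """One Contrastive Divergence (CD-1) update for a binary RBM."""
    rng = rng or np.random.default_rng(0)
    # Positive phase: hidden probabilities and binary states given the data.
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: reconstruct the visible units, then re-infer the hiddens.
    pv1 = sigmoid(h0 @ W.T + a)
    ph1 = sigmoid(pv1 @ W + b)
    # Raise the probability of the data, lower that of the reconstruction.
    W = W + lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
    a = a + lr * (v0 - pv1)
    b = b + lr * (ph0 - ph1)
    return W, a, b

rng = np.random.default_rng(0)
n_v, n_h = 784, 100                          # one 28x28 image, 100 hidden units
W = rng.normal(scale=0.01, size=(n_v, n_h))
a, b = np.zeros(n_v), np.zeros(n_h)
v0 = (rng.random(n_v) > 0.5).astype(float)   # one binarized "image"
W, a, b = cd1_update(v0, W, a, b, rng=rng)
print(W.shape)                               # (784, 100)
```

In practice these updates are averaged over each mini-batch rather than applied per example.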

The greedy learning algorithm trains one RBM at a time, until all the RBMs have been trained.
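The greedy layer-wise procedure can be sketched end to end: train the first RBM on the data, push the data through it, and use the resulting hidden activations as "pixels" for the next RBM. The NumPy sketch below (illustrative sizes and a deliberately tiny, untuned CD-1 trainer; not the article's MATLAB implementation) stacks two RBMs this way.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, lr=0.1, epochs=2, rng=None):
    """Minimal CD-1 trainer for one binary RBM (illustrative only)."""
    rng = rng or np.random.default_rng(0)
    n_visible = data.shape[1]
    W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
    a, b = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        for v0 in data:
            ph0 = sigmoid(v0 @ W + b)                     # positive phase
            h0 = (rng.random(n_hidden) < ph0).astype(float)
            pv1 = sigmoid(h0 @ W.T + a)                   # negative phase
            ph1 = sigmoid(pv1 @ W + b)
            W += lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
            a += lr * (v0 - pv1)
            b += lr * (ph0 - ph1)
    return W, a, b

# Greedy layer-wise training: each trained RBM's hidden activations
# become the "visible" data for the next RBM in the stack.
data = (np.random.default_rng(1).random((200, 64)) > 0.5).astype(float)
layer_sizes = [32, 16]
stack, layer_input = [], data
for n_hidden in layer_sizes:
    W, a, b = train_rbm(layer_input, n_hidden)
    stack.append((W, a, b))
    layer_input = sigmoid(layer_input @ W + b)   # propagate data one layer up
print([w.shape for w, _, _ in stack])            # [(64, 32), (32, 16)]
```

Each RBM is trained in isolation, which is what makes the procedure "greedy": no layer's weights are revisited when later layers are added.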