In this article, we discuss the epistemic uncertainty of deep learning models and how to tackle it.
Introduction
While algorithmic bias in deep learning models may not sound serious at first, it is a significant concern. In sectors like medicine, national security, business automation, and climatology, it is important to recognize that deep learning is not a one-size-fits-all solution for predicting everything. For instance, companies that rely on data-driven decisions are vulnerable to the consequences of inaccurate predictions, and the same goes for other sectors. Deep learning models are trained on a sample dataset that is assumed to be representative of a broader population, but the randomness of real-world data can defeat this assumption. This is exactly what the epistemic uncertainty of a deep learning model means.
In simple terms, this kind of uncertainty arises from deficiencies in the variety of the historical data that the model is trained on. For instance, we cannot expect a neural network trained only on images of cats and dogs to recognize a picture of a bird. The situation gets worse as the problem gets more complex. In this article, we will discuss how epistemic uncertainty arises in deep learning and the methods that can be used to address it.
Breaking Down the Concept of Epistemic Uncertainty in Deep Learning Models
Understanding the epistemic uncertainty of a deep learning model is important in risk-based decision-making and safety-critical applications, because we need to know how reliable the model is. It arises mainly for two reasons:
- Inadequate amount of training data or noisy training data
Consider the following dataset of handwritten digits, the famous MNIST dataset:
While humans are equipped with a visual cortex that differentiates between digits effortlessly, training a neural network to do so is far from trivial: it has to learn from each pixel and determine which curves and edges form a digit. Moreover, the above training data is not enough for the model to recognize a digit written by a person with completely different handwriting. In statistical terms, the probability of correctly identifying a random person's handwriting is very low if it does not resemble the samples provided above.
Also, there is a lot of noise in this dataset. For example, take this image:
Is it a scar? Is it a point? Or is it even a number? According to the MNIST dataset, it's the digit “8”. Such noisy data values further contribute to the epistemic uncertainty of deep learning models.
- Noise that arises while training the deep learning model
Training a deep neural network involves a large number of weights and biases, which makes the whole network noisy and produces uncertain outputs.
A neural network with just two hidden layers already involves millions of calculations, and noise is bound to creep in when training on more complex data.
Methods Employed to Tackle the Uncertainty
- Monte-Carlo Dropout
- Deep Ensembles
- Quantile Regression
- Monte-Carlo Dropout
To understand Monte-Carlo Dropout, we first need to break down the terms. Dropout in deep learning means that some of the neurons in a deep neural network are randomly dropped during training to avoid overfitting. Monte-Carlo Dropout takes this further by keeping dropout active even at test time: the model is run several times on the same input, and the spread of the resulting predictions indicates how uncertain it is. Let us better understand this with the analogy of the roulette game.
The player picks a set of numbers and bets on them; the bets play the role of the training data points. The wheel is spun and the ball is dropped onto it, and this is repeated many times while randomly leaving out some of the numbers. The same is done during the testing process to generate the outcomes where the ball eventually comes to rest. Each run gives a slightly different outcome because of the numbers that were dropped, and the variation across runs tells us how uncertain or confident the prediction is.
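To make this concrete, here is a minimal sketch of Monte-Carlo Dropout in PyTorch. The network architecture, dropout rate, and number of forward passes are illustrative assumptions rather than a reference implementation; the key point is that dropout stays active at prediction time, and the spread of repeated predictions approximates the model's uncertainty.

```python
# Minimal Monte-Carlo Dropout sketch (illustrative assumptions throughout)
import torch
import torch.nn as nn

class MCDropoutNet(nn.Module):
    def __init__(self, in_dim=784, hidden=128, out_dim=10, p=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Dropout(p),                     # this layer stays active at test time
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)

def mc_dropout_predict(model, x, n_samples=50):
    """Run several stochastic forward passes; return mean prediction and spread."""
    model.train()                              # keep dropout "on" while predicting
    with torch.no_grad():
        preds = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(n_samples)]
        )
    return preds.mean(dim=0), preds.std(dim=0)  # std acts as an uncertainty estimate

# Example usage with random data standing in for a flattened MNIST image
model = MCDropoutNet()
x = torch.randn(1, 784)
mean_probs, uncertainty = mc_dropout_predict(model, x)
print(mean_probs.argmax(dim=-1), uncertainty.max())
```

A large spread across the sampled predictions signals that the model has not seen enough similar data and its answer should be treated with caution.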
- Deep Ensembles
Deep ensembles make use of multiple deep neural networks to improve their performance, hence reducing the uncertainty and increasing the accuracy of the model. Let’s take the example of Beth Harmon, the fictional American chess prodigy from the popular series, The Queen’s Gambit.
Beth learned to play chess from Mr. Shaibel, the janitor at the orphanage where she lived. He trained her well, and they played regularly in the basement. This is analogous to a single deep learning model trained on one dataset. Later in the show, Beth enters various tournaments where she meets other players and learns from their moves as well. The series culminates with her beating the world champion after a long run of tournaments: she learns from her opponents and turns those lessons against them. The other players are analogous to the individual models in an ensemble whose combined knowledge enhances overall performance.
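As a rough illustration, the sketch below builds a small deep ensemble in PyTorch. The architecture, ensemble size, and toy input are assumptions, and actual training of each member is omitted; in practice every member is trained independently from a different random initialisation, and the disagreement between their predictions is used as an uncertainty estimate.

```python
# Minimal deep-ensemble sketch (illustrative assumptions; training abbreviated)
import torch
import torch.nn as nn

def make_model():
    # Each ensemble member starts from a different random initialisation
    return nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))

ensemble = [make_model() for _ in range(5)]

# ... here each member would be trained independently on the same task ...

def ensemble_predict(models, x):
    """Average the members' predictions; their disagreement signals uncertainty."""
    with torch.no_grad():
        preds = torch.stack([torch.softmax(m(x), dim=-1) for m in models])
    return preds.mean(dim=0), preds.std(dim=0)

x = torch.randn(1, 784)                        # stand-in for a flattened image
mean_probs, disagreement = ensemble_predict(ensemble, x)
print(mean_probs.argmax(dim=-1), disagreement.max())
```

When the members agree, the averaged prediction can be trusted more; when they diverge, the input likely lies outside the data the models have learned from.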
- Quantile Regression
Quantile regression is a technique for estimating the uncertainty in a model's predictions by training it with a loss function that predicts quantiles of the target rather than a single value. Predicting house prices is one of the most common examples used to explain deep learning to beginners.
The sale price of a house is a crucial quantity in the housing industry, so it is the target variable. The target variable depends on various factors known as predictor variables, such as square-foot area, number of bedrooms, and location. These predictor variables are used to estimate the various quantiles of the house price.
Quantiles divide the predicted house prices into percentiles. Quantile regression captures how changes in the predictor variables affect the house price at different percentiles; for instance, location may have a stronger effect on the upper percentiles than on the lower ones. As a result, such models are more robust to outliers and provide a range of plausible prices rather than a single point estimate.
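Below is a minimal sketch of quantile regression in PyTorch using the pinball (quantile) loss. The tiny network, the chosen quantiles (10th, 50th, 90th), and the synthetic "house price" data are assumptions made purely for illustration.

```python
# Minimal quantile-regression sketch with the pinball loss (illustrative data)
import torch
import torch.nn as nn

quantiles = [0.1, 0.5, 0.9]                     # lower, median, and upper estimates

model = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, len(quantiles)))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

def pinball_loss(pred, target, quantiles):
    """Penalise under- and over-prediction asymmetrically for each quantile."""
    losses = []
    for i, q in enumerate(quantiles):
        error = target - pred[:, i:i + 1]
        losses.append(torch.max(q * error, (q - 1) * error).mean())
    return sum(losses)

# Synthetic data: predictors = [area, bedrooms, location score], target = price
X = torch.rand(256, 3)
y = 100 * X[:, :1] + 20 * X[:, 1:2] + 50 * X[:, 2:3] + 10 * torch.randn(256, 1)

for _ in range(200):
    optimizer.zero_grad()
    loss = pinball_loss(model(X), y, quantiles)
    loss.backward()
    optimizer.step()

print(model(X[:1]))   # 10th, 50th, and 90th percentile price estimates for one house
```

The gap between the predicted lower and upper quantiles for a given house is a direct, interpretable measure of how uncertain the model is about that price.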
Conclusion
In this article, we saw how the epistemic uncertainty of deep learning models arises and how it can be dealt with. The various techniques for estimating uncertainty make models more trustworthy and reliable in critical applications. The examples in the article connect these techniques to real-life situations so that readers gain a better understanding of the concepts used to tackle uncertainty in deep learning models.
E2E Cloud's competitive pricing and exceptional performance have positioned it as an ideal choice for data scientists, AI researchers, and technical professionals seeking an enhanced GPU Jupyter Notebook experience. By offering cost-effective GPU instances and ensuring top-notch performance, E2E Cloud enables users to leverage the power of NVIDIA GPUs without compromising on their budget.