Explaining deep models has become easier with the introduction of path attribution methods, a family of gradient-based techniques. These methods require a hyperparameter called the baseline input. What does the baseline input mean, and why does it matter? We will discuss this in detail. As a case study, we will use image classification networks and compare different ways of choosing a baseline input. Each baseline implies its own hypotheses about the data. The discussion of baselines also has a close connection with the concept of missingness in the feature space, a recurring idea in interpretability research.
If you have worked with trained neural networks, you may have used the integrated gradients method. It computes which features of a particular data point were important for a specific prediction, and it has been applied to many data types, from retinal fundus images to ECG recordings.
This makes a data scientist wonder: how sensitive are integrated gradients to the choice of this hyperparameter? Is a constant black image really a “natural baseline” for image data? What alternative choices exist? To answer these questions, we will explore several notions of missingness.
Plotting Integrated Gradients Attributions for Image Classification
We start with images because integrated gradients attributions can be plotted and compared against our visual intuition about which pixels should matter. As a case study, we use the Inception V4 architecture, trained on the ImageNet dataset to determine which class an image belongs to, with pre-trained weights downloaded from TensorFlow-Slim, and we visualize attributions on images from the validation set. When plotting, we clip attributions at the 99th percentile so that a few high-magnitude attributions do not dominate the colour scheme.
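As a minimal sketch of that clipping step (assuming `attributions` is a NumPy array of per-pixel scores; the function name is ours, not from the original article):

```python
import numpy as np

def clip_for_display(attributions, percentile=99):
    """Clip attribution magnitudes at a percentile so a few extreme
    values do not dominate the colour scheme when plotted."""
    magnitudes = np.abs(attributions)
    vmax = np.percentile(magnitudes, percentile)
    # Scale into [0, 1], saturating everything above the percentile.
    return np.clip(magnitudes / (vmax + 1e-12), 0.0, 1.0)
```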
A Better Understanding of Integrated Gradients
Assessing the attribution maps by eye alone can be unintuitive, so it helps to understand how the feature attributions are generated. Integrated gradients defines an importance value for each feature by accumulating local gradients over interpolated images that range between the baseline input and the current input.
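Concretely, for a model f, an input x, and a baseline x′, the integrated gradients attribution of the i-th feature is the standard path integral from the integrated gradients paper:

```latex
\mathrm{IG}_i(x) = (x_i - x'_i) \int_0^1
  \frac{\partial f\big(x' + \alpha (x - x')\big)}{\partial x_i} \, d\alpha
```

The multiplicative factor (x_i − x′_i) is worth noticing: wherever a pixel equals the baseline, its attribution is forced to zero.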
Why not use plain gradients? Unfortunately, several problems arise when using raw gradients to interpret deep neural networks (DNNs). One specific issue is saturation: the gradients of the input features have small magnitudes around a sample because the network function flattens once the features pass a certain magnitude. Integrating gradients along the interpolation path avoids this, since the path also passes through the region where the function is still changing.
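A toy sketch makes the saturation problem concrete. Assume a one-dimensional function f(x) = min(x, 1) that flattens at 1, with a zero baseline; the names and the finite-difference gradient are ours, for illustration only:

```python
import numpy as np

# A function that saturates: flat for x >= 1.
f = lambda x: np.minimum(x, 1.0)

def grad(x, eps=1e-4):
    # Central finite-difference approximation of df/dx.
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def integrated_gradient(x, baseline=0.0, steps=200):
    # Riemann-sum approximation of the path integral of gradients
    # along the straight line from the baseline to the input.
    alphas = (np.arange(steps) + 0.5) / steps
    path_grads = grad(baseline + alphas * (x - baseline))
    return (x - baseline) * path_grads.mean()

x = 2.0
print(grad(x))                 # ~0.0: the local gradient has saturated
print(integrated_gradient(x))  # ~1.0: matches f(x) - f(baseline)
```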
Game Theory and Missingness
Integrated gradients is derived from cooperative game theory, in particular from the Aumann-Shapley value. In cooperative game theory, a non-atomic game is a construction used to model large-scale economic systems with a continuum of participants, and the Aumann-Shapley value is a principled way to determine how much different groups of participants contribute to the system. In this game-theoretic setting, the notion of missingness is well-defined.
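For reference, one standard way to write the Aumann-Shapley cost share of participant i, for a cost function v and participation profile x (notation varies across texts), is:

```latex
\phi_i(v, x) = x_i \int_0^1 \frac{\partial v}{\partial x_i}(t \, x) \, dt
```

Substituting the network f for v and allowing a non-zero starting point x′ recovers the integrated gradients formula above.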
Are There Any Alternatives to the Baseline Choice?
Any constant colour baseline faces a blindness problem: the interpolation path never changes pixels that already match the baseline colour, so those pixels receive no attribution. Are there possible substitutes? There are four, described below, each followed by a short sketch of how it might be constructed -
- The Maximum Distance Baseline
Constant baselines are unable to distinguish pixels of the baseline colour; they are blind to them. One solution is to construct the baseline by taking the farthest image from the input in L1 distance that still lies in the valid pixel range. This is known as the maximum distance baseline, sketched below.
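A minimal sketch, assuming float images scaled to [0, 1]: because the L1 distance decomposes per pixel, each pixel independently jumps to whichever end of the valid range is farther away.

```python
import numpy as np

def maximum_distance_baseline(image, low=0.0, high=1.0):
    """Return the image in [low, high] farthest from `image` in L1 distance."""
    midpoint = (low + high) / 2.0
    # Pixels above the midpoint are farthest from `low`, and vice versa.
    return np.where(image > midpoint, low, high).astype(image.dtype)
```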
- The Blurred Baseline
The maximum distance baseline suffers from an issue of its own: it doesn’t really represent missingness, because it contains a lot of information about the original image. An alternative is to make the prediction relative to a baseline from which the original information has been removed, presented as a blurred version of the image; a sketch follows.
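A sketch using SciPy's Gaussian filter, assuming an H × W × C float image; the kernel width sigma is a free parameter, and blurring is applied only to the spatial axes:

```python
from scipy.ndimage import gaussian_filter

def blurred_baseline(image, sigma=20.0):
    """Use a heavily blurred copy of the input as the baseline."""
    # Blur height and width, but not the colour channels.
    return gaussian_filter(image, sigma=(sigma, sigma, 0))
```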
- The Uniform Baseline
A blurred baseline is biased towards highlighting high-frequency information, since fine detail is exactly what the blur removes. To avoid this bias, we can define missingness another way: sample a random uniform image in the valid pixel range and refer to that as the baseline, as sketched below.
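A sketch of the uniform baseline, again assuming float images in [0, 1]:

```python
import numpy as np

def uniform_baseline(image, low=0.0, high=1.0, rng=None):
    """Sample a baseline uniformly at random from the valid pixel range."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.uniform(low, high, size=image.shape).astype(image.dtype)
```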
- The Gaussian Baseline
The uniform distribution is not the only way to draw random noise for a baseline. You can also draw the baseline from a Gaussian distribution centred on the current image with standard deviation σ. Note that σ plays two different roles here: it sets the width of the smoothing kernel for the blurred baseline and the standard deviation of the noise for the Gaussian baseline. A sketch follows.
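A sketch of the Gaussian baseline; the clipping back into the valid pixel range is our assumption, to keep the sampled baseline a legal image:

```python
import numpy as np

def gaussian_baseline(image, sigma=0.3, low=0.0, high=1.0, rng=None):
    """Draw a baseline from a Gaussian centred on the input image."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    # Clip so the sampled baseline stays a valid image.
    return np.clip(noisy, low, high).astype(image.dtype)
```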
Averaging over multiple baselines and the dangers of qualitative assessment
Any single baseline, including one drawn at random, can still be blind to the particular values it happens to contain. To overcome this, it is better to average over multiple baselines: draw several samples from the chosen baseline distribution, compute attributions against each one, and average the resulting importance scores, as sketched below.
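A sketch of the averaging step. It assumes an `integrated_gradients(x, baseline)` function like the toy one earlier and a `sample_baseline()` callable that draws from whichever baseline distribution we chose; both are passed in, so any of the baselines above can be plugged in.

```python
import numpy as np

def averaged_attributions(x, sample_baseline, integrated_gradients,
                          n_samples=32):
    """Average attributions over several sampled baselines to reduce
    blindness to any single draw."""
    samples = [integrated_gradients(x, baseline=sample_baseline())
               for _ in range(n_samples)]
    return np.mean(samples, axis=0)
```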
Qualitative assessment is dangerous because it leans on human knowledge of the relationship between the data and the labels, and that knowledge can be naive or simply wrong about what an accurate model actually relies on. When we judge attribution maps by how they look, we are really evaluating which method best matches our expectations, not which method best explains the network's behaviour.
Conclusion
So what can be done? We have multiple baselines and no conclusive results. Still, comparing all of the feature attribution baselines mentioned above, both qualitatively and quantitatively, gives us a foundation for understanding them. Each baseline encodes an assumption about missingness in the model and in the distribution of the data. We need to define this quantity explicitly because most ML models cannot handle arbitrary patterns of missing inputs.
Reference Links
https://distill.pub/2020/attribution-baselines/
https://github.com/distillpub/post--attribution-baselines