Local interpretability

The most popular methods for local interpretability of complex models are based on either Shapley Additive Explanations (SHAP) [8] or integrated gradients [11]. Each method has a number of variants that are specific to a model type.

For tree ensemble models, use tree SHAP

In the case of tree-based models, dynamic programming allows for fast and exact computation of the Shapley values for each feature, and this is the recommended approach for local interpretations of tree ensemble models. (See [7]; the implementation is available at https://github.com/slundberg/shap.)
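The following sketch shows how exact Shapley values can be computed for a tree ensemble with the TreeExplainer class from the shap library; the scikit-learn gradient boosting model and the breast cancer dataset are illustrative placeholders for your own model and data.

```python
# Minimal sketch: exact Tree SHAP attributions for a gradient-boosted tree ensemble.
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

# Placeholder model and data; substitute your own trained tree ensemble.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = GradientBoostingClassifier().fit(X, y)

# TreeExplainer uses the tree structure to compute exact Shapley values efficiently.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:10])

# shap_values[i, j] is the contribution of feature j to the prediction for example i.
print(shap_values.shape)
```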

For neural networks and differentiable models, use integrated gradients and conductance

Integrated gradients provide a straightforward way to compute feature attributions in neural networks. Conductance builds on integrated gradients to help you interpret attributions from portions of neural networks, such as layers and individual neurons. (See [3, 11]; the implementation is available at https://captum.ai/.) These methods cannot be applied to models that do not expose a gradient; in such cases, you can use Kernel SHAP (discussed in the next section) instead. When the gradient is available, integrated gradient attributions can be computed more quickly than attributions from Kernel SHAP.

A challenge in using integrated gradients is choosing the best base point for deriving an interpretation. For example, if the base point for an image model is the image with zero intensity in all pixels, important regions of an image that are darker might not receive attributions that align with human intuition. One approach to this problem is to compute attributions from multiple base points and add them together. This is part of the approach taken in the XRAI feature attribution method for images [5], where the integrated gradient attributions that use a black reference image and a white reference image are added together to produce more consistent attributions.
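As a brief illustration, the following sketch computes integrated gradient attributions with Captum for a small PyTorch network; the network architecture, the input, and the all-zeros base point are assumptions made only for the example.

```python
# Minimal sketch: integrated gradient attributions with Captum for a small PyTorch model.
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

# Placeholder network and input; substitute your own trained model.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
model.eval()

x = torch.randn(1, 10)          # example input
baseline = torch.zeros(1, 10)   # base point; its choice affects the attributions

ig = IntegratedGradients(model)
# Attributions for one output class; target selects the output index to explain.
attributions, delta = ig.attribute(
    x, baselines=baseline, target=1, return_convergence_delta=True
)
print(attributions)   # one attribution per input feature
print(delta)          # approximation error of the path integral
```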

For all other cases, use Kernel SHAP

You can use Kernel SHAP to compute feature attributions for any model, but it is an approximation to the full Shapley values and remains computationally expensive (see [8]). The computational resources required for Kernel SHAP grow quickly with the number of features, so approximation methods are required, and these can reduce the fidelity, repeatability, and robustness of explanations. Amazon SageMaker Clarify provides convenience methods that deploy prebuilt containers for computing Kernel SHAP values in separate instances. (For an example, see the GitHub repository Fairness and Explainability with SageMaker Clarify.)
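The following sketch applies Kernel SHAP to a black-box predictor through the shap library; the support vector machine, the dataset, the k-means background summary, and the nsamples setting are illustrative choices rather than recommendations.

```python
# Minimal sketch: model-agnostic Kernel SHAP for an arbitrary black-box predictor.
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC

# Placeholder model without a usable gradient or tree structure.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = SVC(probability=True).fit(X, y)

# Kernel SHAP needs a background dataset to integrate out missing features;
# a small summary (for example, k-means centroids) keeps the cost manageable.
background = shap.kmeans(X, 10)
explainer = shap.KernelExplainer(model.predict_proba, background)

# nsamples trades runtime against attribution fidelity.
shap_values = explainer.shap_values(X.iloc[:5], nsamples=200)
```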

For single tree models, the split variables and leaf values provide an immediately explainable model, and the methods discussed previously do not provide additional insight. Similarly, for linear models, the coefficients provide a clear explanation of model behavior. (SHAP and integrated gradient methods both return contributions that are determined by the coefficients.)
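The following sketch illustrates this point for a linear regression model: assuming independent features, the Shapley value of each feature reduces to the coefficient multiplied by the feature's deviation from its mean, and the attributions sum to the difference between the prediction and the average prediction. The synthetic data is a placeholder.

```python
# Minimal sketch: closed-form Shapley values for a linear model with independent features.
import numpy as np
from sklearn.linear_model import LinearRegression

# Placeholder synthetic data with known coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = X @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=500)

model = LinearRegression().fit(X, y)

# phi_i(x) = coef_i * (x_i - mean(X_i)); the attributions sum to f(x) - E[f(X)].
x = X[0]
phi = model.coef_ * (x - X.mean(axis=0))
print(phi)
print(phi.sum(), model.predict(x.reshape(1, -1))[0] - model.predict(X).mean())
```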

Both SHAP and integrated gradient-based methods have weaknesses. SHAP requires attributions to be derived from a weighted average over all feature combinations, and attributions obtained in this way can be misleading estimates of feature importance when there are strong interactions between features. Methods that are based on integrated gradients can be difficult to interpret because of the large number of dimensions in large neural networks, and they are sensitive to the choice of base point. More generally, models can use features in unexpected ways to achieve a given level of performance, and this usage varies from model to model; feature importance is always model dependent.

Recommended visualizations

The following chart presents several recommended ways to visualize the local interpretations that were discussed in the previous sections. For tabular data, we recommend a simple bar graph of the attributions so that they can be easily compared and used to infer how the model makes its predictions.
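The following sketch produces such a bar graph with matplotlib; the feature names and attribution values are hypothetical and stand in for the output of one of the attribution methods described earlier.

```python
# Minimal sketch: horizontal bar graph of local attributions for one tabular example.
import matplotlib.pyplot as plt
import numpy as np

features = ["age", "income", "tenure", "balance"]      # hypothetical feature names
attributions = np.array([0.42, -0.17, 0.08, -0.31])    # hypothetical local attributions

order = np.argsort(np.abs(attributions))               # sort by magnitude for readability
plt.barh(np.array(features)[order], attributions[order],
         color=["tab:red" if a < 0 else "tab:blue" for a in attributions[order]])
plt.xlabel("Attribution")
plt.title("Local feature attributions for one prediction")
plt.tight_layout()
plt.show()
```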

Visualizing local interpretations by using a bar graph

For text data, embedding tokens leads to a large number of scalar inputs. The methods recommended in the previous sections produce an attribution for each dimension of the embedding and for each output. To distill this information into a visualization, the attributions for a given token can be summed. The following example shows the sum of the attributions for a BERT-based question answering model that was trained on the SQUAD dataset. In this case, the predicted and true label is the token for the word “france.”

Sum of attributions for a BERT-based question answering model that was trained on the SQUAD dataset, example 1

Alternatively, the vector norm of the token attributions can be used as the total attribution value, as shown in the following example.

Vector norm of attributions for a BERT-based question answering model that was trained on the SQUAD dataset, example 2
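The following sketch shows both aggregations, assuming a per-token attribution tensor of shape (sequence length, embedding dimension) such as the one produced by layer-based attribution methods in Captum; the tensor here is random placeholder data.

```python
# Minimal sketch: collapsing per-dimension embedding attributions into one score per token.
import torch

# Placeholder attributions: 12 tokens, 768-dimensional embeddings.
attributions = torch.randn(12, 768)

summed = attributions.sum(dim=-1)    # signed total attribution per token
normed = attributions.norm(dim=-1)   # non-negative total attribution per token

print(summed.shape, normed.shape)    # torch.Size([12]) torch.Size([12])
```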

For intermediate layers in deep learning models, similar aggregations can be applied to conductances for visualization, as shown in the following example. The vector norm of the token conductances for the transformer layers shows the eventual activation for the end token prediction (“france”).

Aggregated conductances for intermediate layers in a deep learning model
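The following sketch computes the conductance of one transformer layer with Captum's LayerConductance and aggregates it per token; the model name, the question and context pair, the choice of layer, and the target position are assumptions made only for illustration.

```python
# Minimal sketch: layer conductance for one transformer layer, aggregated per token.
import torch
from captum.attr import LayerConductance
from transformers import BertForQuestionAnswering, BertTokenizer

# Placeholder fine-tuned question answering model; substitute your own model and inputs.
name = "bert-large-uncased-whole-word-masking-finetuned-squad"
model = BertForQuestionAnswering.from_pretrained(name).eval()
tokenizer = BertTokenizer.from_pretrained(name)

enc = tokenizer("What is the capital of France?",
                "Paris is the capital of France.", return_tensors="pt")

# Use the word embeddings as the differentiable input; the model adds position
# and token type embeddings internally.
inputs_embeds = model.bert.embeddings.word_embeddings(enc["input_ids"]).detach()

def end_logits(inputs_embeds, attention_mask, token_type_ids):
    return model(inputs_embeds=inputs_embeds,
                 attention_mask=attention_mask,
                 token_type_ids=token_type_ids).end_logits

# Explain the predicted end-token position of the answer span.
target = int(end_logits(inputs_embeds, enc["attention_mask"],
                        enc["token_type_ids"]).argmax())

layer_cond = LayerConductance(end_logits, model.bert.encoder.layer[0])
conductance = layer_cond.attribute(
    inputs_embeds,
    additional_forward_args=(enc["attention_mask"], enc["token_type_ids"]),
    target=target,
)
if isinstance(conductance, tuple):   # the layer returns a tuple, so unwrap if needed
    conductance = conductance[0]

token_scores = conductance.norm(dim=-1)   # vector norm of conductance per token
```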

Concept activation vectors provide a method for studying deep neural networks in more detail [6]. This method extracts features from a layer in an already trained network and trains a linear classifier on those features to make inferences about the information in the layer. For example, you might want to determine which layer of a BERT-based language model contains the most information about the parts of speech. In this case, you could train a linear part-of-speech model on each layer output and make a rough estimate that the best performing classifier is associated with the layer that has the most part-of-speech information. Although we do not recommend this as a primary method for interpreting neural networks, it can be an option for more detailed study and aid in the design of network architecture.
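As a rough illustration of the probing idea, the following sketch extracts features from one intermediate layer of a pretrained BERT model and fits a linear classifier on them; the layer index, the two-sentence corpus, the sentence-level labels, and the mean pooling over tokens are placeholder assumptions (a real part-of-speech probe would use token-level labels and a much larger dataset).

```python
# Minimal sketch: probing one layer of a pretrained BERT model with a linear classifier.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import BertModel, BertTokenizer

model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True).eval()
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

sentences = ["The dog runs fast.", "She quickly reads books."]   # placeholder corpus
labels = [0, 1]                                                  # placeholder labels

features = []
with torch.no_grad():
    for text in sentences:
        enc = tokenizer(text, return_tensors="pt")
        hidden_states = model(**enc).hidden_states            # tuple: one tensor per layer
        layer = hidden_states[6]                               # probe an intermediate layer
        features.append(layer.mean(dim=1).squeeze(0).numpy())  # mean-pool over tokens

# A higher-accuracy probe suggests the layer carries more of the target information.
probe = LogisticRegression(max_iter=1000).fit(features, labels)
print(probe.score(features, labels))
```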