# Feature Attributions that Use Shapley Values

SageMaker Clarify provides feature attributions based on the concept of Shapley values.

SageMaker Clarify takes the concept of Shapley values from game theory and applies it in a machine learning context. The Shapley value quantifies the contribution of each player to a game, and so provides a way to distribute the total gain generated by a game among its players according to their contributions. In this machine learning context, SageMaker Clarify treats the model's prediction on a given instance as the *game* and the features included in the model as the *players*. As a first approximation, you might be tempted to determine the marginal contribution or effect of each feature by measuring the result of either *dropping* that feature from the model or *dropping* all other features from the model. However, this approach does not take into account that the features included in a model are often not independent of each other. For example, if two features are highly correlated, dropping either one of them might not change the model prediction significantly.
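
To make this failure mode concrete, the following minimal Python sketch compares the two naive approaches on a hypothetical two-feature model whose features carry the same signal. It assumes that "dropping" a feature means replacing it with a baseline value; the names `model`, `predict_with`, `leave_one_out`, and `keep_only` are illustrative only and are not part of any SageMaker Clarify API.

```python
def model(x):
    # Hypothetical toy model: features 0 and 1 carry the same (perfectly
    # correlated) signal, so the prediction is 1.0 if either one is present.
    return max(x[0], x[1])


def predict_with(kept, x, baseline):
    # Prediction when only the features in `kept` are used; every other
    # feature is "dropped" by replacing it with its baseline value (assumed).
    return model([x[i] if i in kept else baseline[i] for i in range(len(x))])


def leave_one_out(x, baseline):
    # Naive approach 1: drop just feature i and measure the change.
    everything = set(range(len(x)))
    return [
        predict_with(everything, x, baseline) - predict_with(everything - {i}, x, baseline)
        for i in range(len(x))
    ]


def keep_only(x, baseline):
    # Naive approach 2: drop every feature except i and measure the change.
    return [
        predict_with({i}, x, baseline) - predict_with(set(), x, baseline)
        for i in range(len(x))
    ]


x, baseline = [1.0, 1.0], [0.0, 0.0]
total_change = model(x) - model(baseline)  # 1.0
print(leave_one_out(x, baseline))  # [0.0, 0.0]: neither feature gets credit
print(keep_only(x, baseline))      # [1.0, 1.0]: credit sums to 2.0, not 1.0
```

Leave-one-out gives neither feature any credit, while keep-only counts the shared signal twice. For comparison, the Shapley values for this instance are 0.5 for each feature, which sum to the total change of 1.0.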

To address these potential dependencies, the Shapley value considers the outcome of every possible combination (or coalition) of features to determine the importance of each feature. Given *d* features, there are $2^{d}$ such feature combinations, each corresponding to a potential model. To determine the attribution for a given feature *f*, consider the marginal contribution of including *f* in every feature combination (and associated model) that does not contain *f*, and take a weighted average of these contributions. It can be shown that the Shapley value is the unique way of assigning the contribution or importance of each feature that satisfies certain desirable properties. In particular, the Shapley values of all features sum to the difference between the prediction of the model and the prediction of a dummy model with no features.

However, even for moderate values of *d*, say 50 features, training all $2^{d}$ possible models (about $10^{15}$ models for $d = 50$) is computationally prohibitive. As a result, SageMaker Clarify relies on approximation techniques. For this purpose, SageMaker Clarify uses SHapley Additive exPlanations (SHAP), which incorporates such approximations, and has devised a scalable and efficient implementation of the Kernel SHAP algorithm through additional optimizations.
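
The exact definition can be sketched by brute force. The following Python example is a minimal sketch, not how SageMaker Clarify computes attributions: it enumerates every coalition $S$ that excludes feature $i$ and weights the marginal contribution of $i$ by $|S|!\,(d-|S|-1)!/d!$, the standard Shapley weighting. It assumes that features outside a coalition are replaced by baseline values instead of retraining a model, and `exact_shapley` and the toy `model` are hypothetical names. Because each coalition's prediction is cached, the returned count shows the $2^{d}$ growth directly.

```python
from itertools import combinations
from math import factorial, isclose


def exact_shapley(model, x, baseline):
    """Exact Shapley values by enumerating every coalition of features.

    A feature outside a coalition is "dropped" by replacing it with its
    baseline value (an assumption of this sketch). Returns the attributions
    and the number of distinct coalitions that were evaluated.
    """
    d = len(x)
    cache = {}

    def value(coalition):
        # Model prediction when only the features in `coalition` are kept.
        key = frozenset(coalition)
        if key not in cache:
            z = [x[i] if i in key else baseline[i] for i in range(d)]
            cache[key] = model(z)
        return cache[key]

    phi = []
    for i in range(d):
        others = [j for j in range(d) if j != i]
        total = 0.0
        for k in range(d):  # k = |S|, size of a coalition that excludes i
            weight = factorial(k) * factorial(d - k - 1) / factorial(d)
            for s in combinations(others, k):
                total += weight * (value(set(s) | {i}) - value(s))
        phi.append(total)
    return phi, len(cache)


def model(x):
    # Hypothetical model with an interaction between features 0 and 1.
    return x[0] * x[1] + x[2]


x, baseline = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phi, n_coalitions = exact_shapley(model, x, baseline)
print([round(p, 6) for p in phi])  # [1.0, 1.0, 3.0]
print(n_coalitions)                # 8, that is, 2**3
# Efficiency: the attributions sum to f(x) - f(baseline).
print(isclose(sum(phi), model(x) - model(baseline)))  # True
```

The attributions sum to 5.0, the difference between the prediction for `x` and the prediction for the baseline, which is the summation property described above. The interaction term `x[0] * x[1]` depends on both features, so its contribution of 2.0 is split evenly between them.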

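Kernel SHAP replaces the explicit weighted average with a weighted linear regression over coalitions. The following minimal Python sketch follows the formulation from the SHAP paper under simplifying assumptions: dropped features take baseline values, every coalition is enumerated for clarity (practical implementations sample coalitions instead), and the constraints from the empty and full coalitions are enforced exactly by eliminating the last feature's attribution. It is not SageMaker Clarify's optimized implementation, and `kernel_shap` and `model` are hypothetical names.

```python
from itertools import combinations
from math import comb


def kernel_shap(model, x, baseline):
    """Kernel SHAP-style attributions via weighted least squares (sketch).

    Enumerates every coalition; dropped features take baseline values (assumed).
    """
    d = len(x)

    def value(coalition):
        return model([x[i] if i in coalition else baseline[i] for i in range(d)])

    phi0 = value(set())                  # empty coalition fixes the base value
    delta = value(set(range(d))) - phi0  # attributions must sum to this

    # Eliminate phi[d-1] = delta - sum(phi[:d-1]), leaving m = d-1 unknowns,
    # and accumulate the weighted normal equations (A^T W A) phi = A^T W b.
    m = d - 1
    ata = [[0.0] * m for _ in range(m)]
    atb = [0.0] * m
    for k in range(1, d):  # coalition sizes strictly between empty and full
        weight = (d - 1) / (comb(d, k) * k * (d - k))  # Shapley kernel
        for s in combinations(range(d), k):
            z = [1.0 if i in s else 0.0 for i in range(d)]
            row = [z[i] - z[d - 1] for i in range(m)]
            target = value(set(s)) - phi0 - z[d - 1] * delta
            for r in range(m):
                atb[r] += weight * row[r] * target
                for c in range(m):
                    ata[r][c] += weight * row[r] * row[c]

    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(ata[r][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        atb[col], atb[pivot] = atb[pivot], atb[col]
        for r in range(col + 1, m):
            factor = ata[r][col] / ata[col][col]
            for c in range(col, m):
                ata[r][c] -= factor * ata[col][c]
            atb[r] -= factor * atb[col]

    # Back substitution.
    phi = [0.0] * m
    for r in reversed(range(m)):
        acc = sum(ata[r][c] * phi[c] for c in range(r + 1, m))
        phi[r] = (atb[r] - acc) / ata[r][r]
    return phi + [delta - sum(phi)]


def model(x):
    # Hypothetical model with an interaction between features 0 and 1.
    return x[0] * x[1] + x[2]


phi = kernel_shap(model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
print([round(p, 6) for p in phi])  # [1.0, 1.0, 3.0]
```

With every coalition included, the regression recovers the exact Shapley values for this toy model. Fitting the same regression on a fixed-size sample of coalitions gives an approximation whose number of model evaluations no longer grows as $2^{d}$.
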
For additional information on Shapley values, see *A Unified Approach to Interpreting Model Predictions*.