Partial Dependence Plots: Analysis Configuration and Output
Partial dependence plots (PDPs) show how the predicted target response depends on a set of input features of interest. The response is marginalized over the values of all other input features, which are referred to as the complement features. Intuitively, you can interpret the partial dependence as the expected target response as a function of each input feature of interest.
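The marginalization described above can be sketched in a few lines of Python. This is a minimal illustration of the general technique, not SageMaker Clarify's implementation: the `partial_dependence` helper, the toy model, and the `HasLoan` column are hypothetical names introduced only for this example.

```python
from statistics import mean


def partial_dependence(predict, rows, feature, grid):
    """Average the model prediction with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        # Override the feature of interest in every row; the complement
        # features keep their observed values, so averaging marginalizes them.
        predictions = [predict({**row, feature: value}) for row in rows]
        curve.append(mean(predictions))
    return curve


# Hypothetical toy model and dataset, used only to exercise the helper.
def toy_model(row):
    return 0.01 * row["Age"] + 0.1 * row["HasLoan"]


rows = [
    {"Age": 25, "HasLoan": 0},
    {"Age": 40, "HasLoan": 1},
    {"Age": 60, "HasLoan": 1},
    {"Age": 35, "HasLoan": 0},
]

# One averaged prediction per grid value of the feature of interest.
pd_curve = partial_dependence(toy_model, rows, "Age", [20, 40, 60])
```

Each entry in `pd_curve` is the mean prediction over the dataset when `Age` is fixed at the corresponding grid value, which is the quantity a PDP places on the y-axis.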
Partial dependence plots analysis configuration
To create a partial dependence plot (PDP), Amazon SageMaker Clarify first looks for the feature columns specified in a JSON array in the analysis_config.json file. The other parameters that configure the analysis of a processing job must also be provided in this JSON file. For more information about configuring PDPs and other aspects of an analysis, see Configure the Analysis.
The following code contains an example of a JSON "pdp" object in the "methods" object of an analysis_config.json configuration file.
{
    "dataset_type":...
    "baseline": [[..]]
    .
    .
    "methods": {
        "shap": {
            "baseline": "..",
            "num_samples": 100
        },
        "pdp": {
            // The features for which we need to plot PDP.
            "features": ["Age", "MaturityMonths"],
            // Required for numerical columns only.
            // The number of buckets into which the range of values is divided.
            "grid_resolution": 20,
            // Specifies how many of the top features must be used for PDP plots. The default is 10.
            "top_k_features": 10
        },
        .
        .
    }
    .
    .
}
Note
If "features" is not specified in the "pdp" object but a "shap" configuration is provided, SageMaker Clarify uses the top ten features from the global SHAP results to plot the PDP visualizations.
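The "pdp" object can also be generated programmatically, which avoids hand-editing mistakes such as missing commas. The following is a minimal sketch that builds only the "methods" portion shown above; a real analysis_config.json also needs the dataset and model settings described in Configure the Analysis, which are omitted here. Note that the `//` comments in the example above are annotations: standard JSON parsers, including Python's `json` module, reject them, so the generated file contains none.

```python
import json

# Only the "pdp" method is sketched here. The values mirror the example above;
# all other sections of a real analysis_config.json are intentionally omitted.
pdp_config = {
    "features": ["Age", "MaturityMonths"],  # features to plot
    "grid_resolution": 20,                  # buckets for numerical columns
    "top_k_features": 10,                   # number of top features to plot
}
analysis_config = {"methods": {"pdp": pdp_config}}

# Serialize to comment-free JSON text and confirm that it parses back.
config_text = json.dumps(analysis_config, indent=2)
parsed = json.loads(config_text)
```

Writing `config_text` to analysis_config.json (after adding the omitted sections) would produce a file that any standard JSON parser accepts.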
Partial dependence plots analysis output
The following code shows an example of the partial dependence plot (PDP) schema returned in the analysis.json result file. The "pdp" section in this analysis output file contains the information required to generate the PDPs. Each dictionary in the list contains the specification for the PDP of the feature specified by feature_name. The data_type field indicates whether the data is numerical or categorical. The feature_values field contains the values present in the feature. If the data_type inferred by Clarify is categorical, feature_values contains all the unique values that the feature can assume. If the data_type inferred by Clarify is numerical, feature_values contains a list of the central values of each of the grid_resolution buckets generated by Clarify.
If the partial dependence plots cannot be computed for a particular feature, the feature_values, model_predictions, and data_distribution fields are replaced by the error field, which contains an error message.
{
    "version": "1.0",
    "explanations": {
        "kernel_shap": {
            .
            .
            .
        },
        "pdp": [
            {
                "feature_name": "Age",
                "data_type": "numerical",
                "feature_values": [
                    20.4,
                    23.2,
                    26.0,
                    28.799999999999997,
                    31.599999999999998,
                    34.4,
                    70.8,
                    73.6
                ],
                "model_predictions": [
                    [
                        0.6830344458296895,
                        0.6812452118471265,
                        0.6908621763065458,
                        0.7008252082392573,
                        0.733054383918643,
                        0.7352442337572574,
                        0.7337257475033403,
                        0.7395857129991055
                    ]
                ],
                "data_distribution": [
                    0.13,
                    0.25,
                    0.15,
                    0.35,
                    0.17
                ]
            },
            {
                "feature_name": "text_column",
                "data_type": "free_text",
                "error": "Detected data type is not supported for PDP. PDP can only be computed for numerical or categorical columns"
            }
        ]
    }
}
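The first numerical feature_values in the example above differ by 2.8, which is what equal-width buckets would produce. The sketch below computes bucket centers under that assumption. The source does not state how Clarify chooses bucket boundaries, so the equal-width rule, the `bucket_centers` helper, and the range of 19 to 75 are all assumptions chosen for illustration.

```python
def bucket_centers(low, high, grid_resolution):
    """Return the midpoint of each of `grid_resolution` equal-width buckets."""
    width = (high - low) / grid_resolution
    return [low + width * (i + 0.5) for i in range(grid_resolution)]


# Hypothetical feature range; with 20 buckets each bucket is 2.8 wide.
centers = bucket_centers(19.0, 75.0, 20)
```

Under these assumptions the first center is approximately 20.4 and the last is approximately 73.6, matching the first and last feature_values in the example; small floating-point differences are expected.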
The PDP entry for the Age feature in the preceding example produces a partial dependence plot that places the feature_values along the x-axis and the values in the model_predictions field along the y-axis. Each list in the model_predictions field corresponds to one class in the output from the model.

You can view the plot in the report.pdf file in the analysis output path that you provided.
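If you want to work with the PDP data directly instead of the report, the "pdp" section can be read with a short script. The following is a minimal sketch, assuming the schema described above; the `pdp_curves` helper is a hypothetical name, and the embedded JSON is an abridged sample with rounded values rather than real Clarify output.

```python
import json

# Abridged, hypothetical analysis.json content that follows the schema above.
analysis_text = """
{
  "version": "1.0",
  "explanations": {
    "pdp": [
      {
        "feature_name": "Age",
        "data_type": "numerical",
        "feature_values": [20.4, 23.2, 26.0],
        "model_predictions": [[0.683, 0.681, 0.691]],
        "data_distribution": [0.13, 0.25, 0.15]
      },
      {
        "feature_name": "text_column",
        "data_type": "free_text",
        "error": "Detected data type is not supported for PDP."
      }
    ]
  }
}
"""


def pdp_curves(analysis):
    """Split the "pdp" entries into plottable curves and error messages."""
    curves, errors = {}, {}
    for entry in analysis["explanations"]["pdp"]:
        name = entry["feature_name"]
        if "error" in entry:
            # Entries that could not be computed carry only an error message.
            errors[name] = entry["error"]
            continue
        xs = entry["feature_values"]
        # One list of (x, y) points per class in the model output.
        curves[name] = [list(zip(xs, ys)) for ys in entry["model_predictions"]]
    return curves, errors


curves, errors = pdp_curves(json.loads(analysis_text))
```

Here `curves["Age"]` holds one series of (feature value, prediction) points per model output class, and `errors` maps each feature that could not be plotted to its message.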