SageMaker Debugger XGBoost Training Report

For SageMaker XGBoost training jobs, use the Debugger CreateXgboostReport rule to receive a comprehensive report of training progress and results. This guide shows how to specify the CreateXgboostReport rule when constructing an XGBoost estimator, download the report using the Amazon SageMaker Python SDK or the Amazon S3 console, and interpret the results.


To use the new Debugger features, you need to upgrade the SageMaker Python SDK and the SMDebug client library. In your iPython kernel, Jupyter notebook, or JupyterLab environment, run the following code to install the latest versions of the libraries and restart the kernel.

    import sys
    import IPython
    !{sys.executable} -m pip install -U sagemaker smdebug
    IPython.Application.instance().kernel.do_shutdown(True)

Construct a SageMaker XGBoost Estimator with the Debugger XGBoost Report Rule

When you construct a SageMaker estimator for an XGBoost training job, specify the rule as shown in the following example code.

The CreateXgboostReport rule collects the following output tensors from your training job:

  • hyperparameters – Saves at the first step.

  • metrics – Saves loss and accuracy every 5 steps.

  • feature_importance – Saves every 5 steps.

  • predictions – Saves every 5 steps.

  • labels – Saves every 5 steps.

The output tensors are saved at a default S3 bucket. For example, s3://sagemaker-<region>-<12digit_account_id>/<base-job-name>/debug-output/.
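
To make the save schedule above concrete, the following minimal sketch lists which steps would be saved for each collection. It assumes steps are counted from 0 and that "every 5 steps" means step indices divisible by 5; the `SAVE_INTERVALS` table and the `saved_steps` helper are hypothetical names for illustration, not part of the SageMaker SDK.

```python
# Save intervals taken from the list above; None means "saved once, at the first step".
SAVE_INTERVALS = {
    "hyperparameters": None,
    "metrics": 5,
    "feature_importance": 5,
    "predictions": 5,
    "labels": 5,
}


def saved_steps(collection, num_steps):
    """Return the step indices at which a collection would be saved.

    Assumes step numbering starts at 0 (an assumption, not stated in the guide).
    """
    interval = SAVE_INTERVALS[collection]
    if interval is None:
        return [0] if num_steps > 0 else []
    return list(range(0, num_steps, interval))


for name in SAVE_INTERVALS:
    print(name, saved_steps(name, 12))
```

Under these assumptions, a 12-step job would save `metrics` at steps 0, 5, and 10, and `hyperparameters` only at step 0.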

Using the SageMaker generic estimator
    import boto3
    import sagemaker
    from sagemaker.estimator import Estimator
    from sagemaker import image_uris
    from sagemaker.debugger import Rule, rule_configs

    # Built-in Debugger rule that generates the XGBoost training report
    rules=[
        Rule.sagemaker(rule_configs.create_xgboost_report())
    ]

    region = boto3.Session().region_name
    xgboost_container=sagemaker.image_uris.retrieve("xgboost", region, "1.2-1")

    estimator=Estimator(
        role=sagemaker.get_execution_role(),
        image_uri=xgboost_container,
        base_job_name="debugger-xgboost-report-demo",
        instance_count=1,
        instance_type="ml.m5.2xlarge",

        # Add the Debugger XGBoost report rule
        rules=rules
    )

Download the Debugger XGBoost Training Report

Download the Debugger XGBoost training report while your training job is running or after the job has finished, using the Amazon SageMaker Python SDK and the AWS Command Line Interface (AWS CLI).

Download using the SageMaker Python SDK and AWS CLI
  1. Check the current job's default S3 output base URI.

    estimator.output_path

  2. Check the current job name.

    estimator.latest_training_job.job_name

  3. The Debugger XGBoost report is stored under <default-s3-output-base-uri>/<training-job-name>/rule-output. Configure the rule output path as follows:

    rule_output_path = estimator.output_path + estimator.latest_training_job.job_name + "/rule-output"
  4. To check if the report is generated, list directories and files recursively under the rule_output_path using aws s3 ls with the --recursive option.

    ! aws s3 ls {rule_output_path} --recursive

    This should return a complete list of files under autogenerated folders that are named CreateXgboostReport and ProfilerReport-1234567890. The XGBoost training report is stored in the CreateXgboostReport folder, and the profiling report is stored in the ProfilerReport-1234567890 folder. To learn more about the profiling report generated by default with the XGBoost training job, see SageMaker Debugger Profiling Report.

                                        An example of rule output.

    The xgboost_report.html file is the XGBoost training report autogenerated by Debugger. The xgboost_report.ipynb file is a Jupyter notebook that's used to aggregate training results into the report. You can download all of the files, browse the HTML report file, and modify the report using the notebook.

  5. Download the files recursively using aws s3 cp. The following command saves all of the rule output files, in the CreateXgboostReport and ProfilerReport-1234567890 folders, under the current working directory.

    ! aws s3 cp {rule_output_path} ./ --recursive

    If you are using a Jupyter notebook server, run !pwd to verify the current working directory.

  6. Under the /CreateXgboostReport directory, open xgboost_report.html. If you are using JupyterLab, choose Trust HTML to see the autogenerated Debugger training report.

                                        An example of rule output.
  7. Open the xgboost_report.ipynb file to explore how the report is generated. You can customize and extend the training report using the Jupyter notebook file.
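
The path handling in steps 3 and 4 can be sketched as follows. This is a minimal illustration using only the standard library: `rule_output_uri`, `split_s3_uri`, and `report_keys` are hypothetical helper names, and `SAMPLE_LISTING` is made-up output in the date, time, size, key layout that `aws s3 ls --recursive` prints. Stripping the trailing slash before joining guards against an output path that was specified without one.

```python
from urllib.parse import urlparse


def rule_output_uri(output_path, job_name):
    """Join the output base URI and job name without doubling or dropping '/'."""
    return f"{output_path.rstrip('/')}/{job_name}/rule-output"


def split_s3_uri(uri):
    """Split 's3://bucket/prefix' into a (bucket, prefix) pair."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def report_keys(ls_output):
    """Return keys under a CreateXgboostReport folder from `aws s3 ls --recursive` text."""
    keys = []
    for line in ls_output.splitlines():
        parts = line.split(None, 3)  # date, time, size, key
        if len(parts) == 4 and "/CreateXgboostReport/" in parts[3]:
            keys.append(parts[3])
    return keys


# Made-up listing for illustration only; sizes and timestamps are placeholders.
SAMPLE_LISTING = """\
2021-01-01 00:00:00     300000 demo-job/rule-output/CreateXgboostReport/xgboost_report.html
2021-01-01 00:00:00     100000 demo-job/rule-output/CreateXgboostReport/xgboost_report.ipynb
2021-01-01 00:00:00     400000 demo-job/rule-output/ProfilerReport-1234567890/profiler-output/profiler-report.html
"""

print(rule_output_uri("s3://example-bucket/", "demo-job"))
print(report_keys(SAMPLE_LISTING))
```

An empty result from `report_keys` would suggest that the rule has not produced the report yet.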

Download using the Amazon S3 console
  1. Sign in to the AWS Management Console and open the Amazon S3 console.

  2. Search for the base S3 bucket. For example, if you haven't specified any base job name, the base S3 bucket name should be in the following format: sagemaker-<region>-111122223333. Look up the base S3 bucket through the Find bucket by name field.

                                        The Find bucket by name field in the Amazon S3
  3. In the base S3 bucket, look up the training job name by entering your job name prefix in Find objects by prefix and then choosing the training job name.

                                        The Find objects by prefix field in the Amazon S3
  4. In the training job's S3 bucket, choose the rule-output/ subfolder. There should be three subfolders for training data collected by Debugger: debug-output/, profiler-output/, and rule-output/.

                                        An example of the rule output S3 bucket
  5. In the rule-output/ folder, choose the CreateXgboostReport/ folder. The folder contains xgboost_report.html (the autogenerated report in HTML) and xgboost_report.ipynb (a Jupyter notebook with scripts that are used for generating the report).

  6. Choose the xgboost_report.html file, choose Download actions, and then choose Download.

                                        An example of the rule output S3 bucket
  7. Open the downloaded xgboost_report.html file in a web browser.
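
If you prefer to open the downloaded report from a script, a minimal sketch with the Python standard library might look like the following. The helper names `find_report` and `open_report` are hypothetical, and the sketch assumes the report was downloaded somewhere under the directory you pass in.

```python
import webbrowser
from pathlib import Path


def find_report(download_dir):
    """Return the first xgboost_report.html found under download_dir, or None."""
    matches = sorted(Path(download_dir).rglob("xgboost_report.html"))
    return matches[0] if matches else None


def open_report(download_dir):
    """Open the report in the default web browser; return True if it was found."""
    report = find_report(download_dir)
    if report is None:
        return False
    # Browsers expect a file:// URI built from an absolute path.
    webbrowser.open(report.resolve().as_uri())
    return True
```

For example, `open_report(".")` searches the current working directory and its subfolders.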

Debugger XGBoost Training Report Walkthrough

This section walks you through the Debugger XGBoost training report. The report is aggregated automatically according to the output tensor regex, and it recognizes whether your training job is a binary classification, multiclass classification, or regression job.

Distribution of True Labels of the Dataset

This histogram shows the distribution of labeled classes (for classification) or values (for regression) in your original dataset. Skewness in your dataset could contribute to inaccuracies. This visualization is available for the following model types: binary classification, multiclass classification, and regression.

                        An example of a distribution of true labels of the dataset
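
As a rough way to quantify the skew that the histogram shows for a classification dataset, the following sketch computes class fractions and a simple imbalance ratio. The function names and the sample labels are made up for illustration; the report itself does not expose these helpers.

```python
from collections import Counter


def label_distribution(labels):
    """Return {label: fraction of samples} for a sequence of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def imbalance_ratio(labels):
    """Ratio of the most frequent class count to the least frequent class count."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)


labels = [0] * 90 + [1] * 10  # made-up binary labels
print(label_distribution(labels))
print(imbalance_ratio(labels))
```

A ratio far above 1 indicates that one class dominates, which is a case where accuracy alone can be misleading.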

Loss versus Step Graph

This line chart shows the progression of loss on the training data and validation data throughout the training steps. The loss is what you defined in your objective function, such as mean squared error. From this plot, you can gauge whether the model is overfit or underfit. This section of the report also provides insights that you can use to resolve overfitting and underfitting problems. This visualization is available for the following model types: binary classification, multiclass classification, and regression.

                        An example of a loss versus step graph.
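
One common way to read such a plot is to find the step with the lowest validation loss and check what happens afterward. The sketch below encodes that heuristic; it is an illustrative rule of thumb with hypothetical names and made-up loss values, not the logic that the report uses.

```python
def diagnose_loss_curves(train_loss, val_loss, tolerance=0.0):
    """Heuristic reading of loss curves recorded at the same steps.

    Returns (index of the lowest validation loss, verdict string).
    """
    if not train_loss or len(train_loss) != len(val_loss):
        raise ValueError("loss lists must be non-empty and of equal length")
    best = min(range(len(val_loss)), key=val_loss.__getitem__)
    last = len(val_loss) - 1
    val_rose = val_loss[last] > val_loss[best] + tolerance
    train_fell = train_loss[last] < train_loss[best]
    if val_rose and train_fell:
        verdict = "possible overfit"  # validation worsens while training improves
    elif best == last:
        verdict = "still improving"   # more rounds might help (possible underfit)
    else:
        verdict = "plateau"
    return best, verdict


train = [1.0, 0.6, 0.4, 0.3, 0.2]   # made-up training loss per saved step
val = [1.1, 0.8, 0.7, 0.75, 0.9]    # made-up validation loss per saved step
print(diagnose_loss_curves(train, val))
```

The returned index points at the saved step where validation loss was lowest, which is a natural candidate for early stopping.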

Feature Importance

The report provides three types of feature importance visualizations: Weight, Gain, and Coverage. The report includes detailed definitions for each of the three. Feature importance visualizations help you learn which features in your training dataset contributed to the predictions. Feature importance visualizations are available for the following model types: binary classification, multiclass classification, and regression.

                        An example of a feature importance graph.
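
To give a feel for how the three importance types differ, the sketch below computes them from a toy list of tree splits. It follows the definitions used by XGBoost's importance types (weight is the number of splits that use a feature, gain is the average gain of those splits, and cover is the average coverage of those splits); the split records are made up, and the report's own definitions remain the authoritative ones.

```python
from collections import defaultdict

# Each record is one split in some tree: (feature, gain, cover). Values are made up.
splits = [
    ("age", 10.0, 100.0),
    ("age", 4.0, 40.0),
    ("income", 6.0, 60.0),
]


def importance(split_records):
    """Aggregate per-feature weight, average gain, and average cover."""
    totals = defaultdict(lambda: {"weight": 0, "gain": 0.0, "cover": 0.0})
    for feature, gain, cover in split_records:
        totals[feature]["weight"] += 1
        totals[feature]["gain"] += gain
        totals[feature]["cover"] += cover
    return {
        feature: {
            "weight": t["weight"],
            "gain": t["gain"] / t["weight"],
            "cover": t["cover"] / t["weight"],
        }
        for feature, t in totals.items()
    }


print(importance(splits))
```

A feature can rank high by weight because it is split on often, yet rank lower by gain if each split improves the objective only slightly.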

Confusion Matrix

This visualization is only applicable to binary and multiclass classification models. Accuracy alone might not be sufficient for evaluating the model performance. For some use cases, such as healthcare and fraud detection, it’s also important to know the false positive rate and false negative rate. A confusion matrix gives you the additional dimensions for evaluating your model performance.

                        An example of a confusion matrix.
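
The following sketch shows how a confusion matrix and the two error rates mentioned above can be derived from labels and predictions. It assumes integer class labels, rows indexed by the true class, and class 1 as the positive class; the sample data is made up.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """matrix[i][j] counts samples whose true class is i and predicted class is j."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def binary_error_rates(matrix):
    """Return (false positive rate, false negative rate) for a 2x2 matrix."""
    (tn, fp), (fn, tp) = matrix
    return fp / (fp + tn), fn / (fn + tp)


y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]  # made-up labels
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]  # made-up predictions
matrix = confusion_matrix(y_true, y_pred, num_classes=2)
fpr, fnr = binary_error_rates(matrix)
print(matrix, fpr, fnr)
```

Here the accuracy is 0.7, but the false negative rate (one third) is higher than the false positive rate (one quarter), a difference that accuracy alone hides.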

Evaluation of the Confusion Matrix

This section provides more insight into your model through the micro, macro, and weighted averages of precision, recall, and F1-score.

                        Evaluation of the confusion matrix.
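
The three averaging schemes can be sketched from a confusion matrix as follows. The sketch assumes single-label classification with rows indexed by the true class, in which case micro-averaged precision, recall, and F1 all equal the overall accuracy; the function name and the example matrix are illustrative only.

```python
def averaged_metrics(matrix):
    """Return micro, macro, and weighted (precision, recall, F1) tuples."""
    n = len(matrix)
    per_class = []  # (precision, recall, f1, support) for each class
    for k in range(n):
        tp = matrix[k][k]
        predicted = sum(matrix[i][k] for i in range(n))  # column sum
        support = sum(matrix[k])                         # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class.append((precision, recall, f1, support))

    def average(index, weights):
        return sum(w * c[index] for w, c in zip(weights, per_class)) / sum(weights)

    total = sum(sum(row) for row in matrix)
    accuracy = sum(matrix[k][k] for k in range(n)) / total
    supports = [c[3] for c in per_class]
    return {
        "micro": (accuracy, accuracy, accuracy),
        "macro": tuple(average(i, [1] * n) for i in range(3)),
        "weighted": tuple(average(i, supports) for i in range(3)),
    }


example = [[3, 1], [2, 4]]  # made-up 2x2 confusion matrix
print(averaged_metrics(example))
```

Macro averaging treats every class equally, while weighted averaging scales each class by its number of true samples, so the two diverge when classes are imbalanced.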

Accuracy Rate of Each Diagonal Element Over Iteration

This visualization is only applicable to binary classification and multiclass classification models. This is a line chart that plots the diagonal values in the confusion matrix throughout the training steps for each class. This plot shows you how the accuracy of each class progresses throughout the training steps. You can identify the under-performing classes from this plot.

                        An example of an accuracy rate of each diagonal element over iteration graph.
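
A per-class accuracy series of this kind can be sketched from confusion matrices saved at several steps. The sketch assumes that each diagonal value is normalized by its row total (the number of true samples of that class); the guide does not state whether the report normalizes the values, and the matrices below are made up.

```python
def per_class_accuracy(matrix):
    """Diagonal count divided by the row total for each class (row = true class)."""
    return [
        row[k] / sum(row) if sum(row) else 0.0
        for k, row in enumerate(matrix)
    ]


def weakest_class(matrices_by_step):
    """Return (class index, rate) with the lowest rate at the last saved step."""
    last_step = max(matrices_by_step)
    rates = per_class_accuracy(matrices_by_step[last_step])
    k = min(range(len(rates)), key=rates.__getitem__)
    return k, rates[k]


# Made-up confusion matrices keyed by saved step.
matrices_by_step = {
    0: [[5, 5], [6, 4]],
    5: [[8, 2], [5, 5]],
    10: [[9, 1], [4, 6]],
}
for step in sorted(matrices_by_step):
    print(step, per_class_accuracy(matrices_by_step[step]))
print(weakest_class(matrices_by_step))
```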

Receiver Operating Characteristic Curve

This visualization is only applicable to binary classification models. The Receiver Operating Characteristic (ROC) curve is commonly used to evaluate binary classification model performance. The y-axis of the curve is the true positive rate (TPR) and the x-axis is the false positive rate (FPR). The plot also displays the value for the area under the curve (AUC). The higher the AUC value, the more predictive your classifier. You can also use the ROC curve to understand the trade-off between TPR and FPR and identify the optimum classification threshold for your use case. Adjusting the classification threshold tunes the model to reduce one type of error (false positives or false negatives) at the expense of the other.

                        An example of a receiver operating characteristic curve graph.
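
The ROC curve, its AUC, and one possible way to choose a threshold can be sketched in plain Python as follows. The sketch predicts positive when a score is at or above the threshold, integrates with the trapezoidal rule, and picks the threshold that maximizes TPR minus FPR (Youden's J statistic). That selection criterion is a swapped-in example, not something the report prescribes, and the data is made up.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr, threshold) points, ordered by descending threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")
    points = [(0.0, 0.0, float("inf"))]  # nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives, threshold))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0, _), (x1, y1, _) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def best_threshold(points):
    """Threshold that maximizes TPR - FPR (Youden's J statistic)."""
    return max(points[1:], key=lambda p: p[1] - p[0])[2]


y_true = [0, 0, 0, 0, 1, 1]              # made-up labels
scores = [0.1, 0.2, 0.3, 0.6, 0.5, 0.9]  # made-up predicted probabilities
points = roc_points(y_true, scores)
print(auc(points), best_threshold(points))
```

If false negatives are costlier than false positives in your use case, a lower threshold than the one this criterion selects may be more appropriate.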

Distribution of Residuals at the Last Saved Step

This visualization is a column chart that shows the distribution of residuals at the last step that Debugger captures. Use it to check whether the residual distribution is close to a normal distribution centered at zero. If the residuals are skewed, your features may not be sufficient for predicting the labels.

                        An example of a distribution of residuals at the last saved step
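
A quick numeric companion to the chart is to compute the mean and skewness of the residuals. The sketch below defines a residual as prediction minus label, which is an assumption (the opposite convention flips the sign of both numbers), and uses the population skewness formula; the sample values are made up.

```python
from statistics import mean, pstdev


def residual_summary(y_true, y_pred):
    """Return (mean, skewness) of residuals, defined here as prediction - label."""
    residuals = [p - t for t, p in zip(y_true, y_pred)]
    mu = mean(residuals)
    sigma = pstdev(residuals)
    if sigma == 0:
        return mu, 0.0
    skewness = mean([((r - mu) / sigma) ** 3 for r in residuals])
    return mu, skewness


# Made-up example: four small under-predictions and one large over-prediction.
labels = [0.0, 0.0, 0.0, 0.0, 0.0]
predictions = [-1.0, -1.0, -1.0, -1.0, 4.0]
print(residual_summary(labels, predictions))
```

A mean near zero with a skewness near zero is consistent with the roughly normal, zero-centered shape described above; a large positive or negative skewness suggests a long tail on one side.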

Absolute Validation Error per Label Bin Over Iteration

This visualization is only applicable to regression models. The actual target values are split into 10 intervals. The line plots show how the validation error for each interval progresses throughout the training steps. Absolute validation error is the absolute value of the difference between the predicted and actual values during validation. You can identify the underperforming intervals from this visualization.

                        An example of an absolute validation error per label bin over iteration
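
The per-interval error at a single step can be sketched as follows. The sketch assumes 10 equal-width intervals over the range of the actual target values, which may differ from the binning the report uses, and reports the mean absolute error in each interval; the data is made up.

```python
def abs_error_per_bin(y_true, y_pred, num_bins=10):
    """Mean absolute error within equal-width bins of the actual target values.

    Returns a list of length num_bins; bins without samples hold None.
    """
    low, high = min(y_true), max(y_true)
    width = (high - low) / num_bins or 1.0  # avoid zero width if all targets are equal
    sums = [0.0] * num_bins
    counts = [0] * num_bins
    for t, p in zip(y_true, y_pred):
        index = min(int((t - low) / width), num_bins - 1)  # maximum falls in the last bin
        sums[index] += abs(p - t)
        counts[index] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Made-up validation targets and predictions at one saved step.
targets = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
predicted = [t + 1 for t in targets[:-1]] + [103]
errors = abs_error_per_bin(targets, predicted)
print(errors)
```

Repeating this computation for every saved step and plotting one line per interval would produce a chart of the kind described above, with the last interval standing out in this example.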