Model monitoring
After an ML model has been deployed to a production environment, it is important to monitor the model based on:
- Infrastructure — To ensure that the model has adequate compute resources to support inference workloads
- Performance — To ensure that the model predictions do not degrade over time
Monitoring model performance is the more challenging of the two because the underlying patterns in the dataset are constantly evolving, which causes a static model to underperform over time.
In addition, obtaining ground truth labels for data in a production environment is expensive and time-consuming. An alternative approach is to monitor the change in data and model entities with respect to a baseline. Amazon SageMaker AI Model Monitor can help to nearly continuously monitor the quality of ML models in production, which may play a role in postmarket vigilance by manufacturers of Software as a Medical Device (SaMD).
SageMaker Model Monitor provides the ability to monitor drift in data quality, model quality, model bias, and feature attribution. A drift in data quality arises when the statistical distribution of data in production drifts away from the distribution of the data used during model training. This primarily occurs when there is a bias in selecting the training dataset; for example, when the sample of data that the model is trained on has a different distribution than the data it sees during inference, or in non-stationary environments where the data distribution varies over time.
A drift in model quality arises when there is a significant deviation between the predictions that the model makes and the actual ground truth labels. SageMaker Model Monitor provides the ability to create a baseline to analyze the input entities, define metrics to track drift, and nearly continuously monitor both the data and model in production based on these metrics. Additionally, Model Monitor is integrated with SageMaker Clarify to identify bias in ML models.
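As a concrete illustration, a data-quality baseline might be created with the SageMaker Python SDK roughly as in the following sketch. The IAM role ARN, S3 locations, and instance settings are placeholder assumptions, not values from this document.

```python
# Sketch: suggest a data-quality baseline from the training dataset.
# The role ARN, bucket names, and instance settings below are illustrative assumptions.
from sagemaker.model_monitor import DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat

role = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"  # placeholder

data_quality_monitor = DefaultModelMonitor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    volume_size_in_gb=20,
    max_runtime_in_seconds=3600,
)

# Computes baseline statistics and suggests constraints from the training data.
data_quality_monitor.suggest_baseline(
    baseline_dataset="s3://my-bucket/training/train.csv",   # placeholder
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri="s3://my-bucket/model-monitor/baseline",  # placeholder
    wait=True,
)
```

The baselining job writes statistics.json and constraints.json to the output location; the suggested constraints can be reviewed and adjusted before they are used for monitoring.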

Model deployment and monitoring for drift
For model monitoring, perform the following steps:
- After the model has been deployed to a SageMaker endpoint, enable the endpoint to capture data from incoming requests to a trained ML model and the resulting model predictions (a sketch of enabling data capture follows this list).
- Create a baseline from the dataset that was used to train the model. The baseline computes metrics and suggests constraints for these metrics. Real-time predictions from your model are compared to the constraints, and are reported as violations if they fall outside the constrained values.
- Create a monitoring schedule specifying what data to collect, how often to collect it, how to analyze it, and which reports to produce (a sketch of creating a schedule and inspecting its results also follows this list).
- Inspect the reports, which compare the latest data with the baseline, and watch for any reported violations as well as for metrics and notifications from Amazon CloudWatch.
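For illustration, enabling data capture (the first step above) is typically configured when the model is deployed. The sketch below assumes an existing SageMaker `model` object; the endpoint name, S3 path, and instance settings are placeholders.

```python
# Sketch: enable capture of inference requests and responses at deployment time.
# The endpoint name, S3 path, and instance settings are illustrative assumptions.
from sagemaker.model_monitor import DataCaptureConfig

data_capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,                                   # capture every request
    destination_s3_uri="s3://my-bucket/model-monitor/capture", # placeholder
)

predictor = model.deploy(                                      # `model` is an existing sagemaker.model.Model
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    endpoint_name="my-production-endpoint",                    # placeholder
    data_capture_config=data_capture_config,
)
```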
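A monitoring schedule and a check of the latest violations report (the third and fourth steps) might then look like the following sketch, which reuses the `data_quality_monitor` object from the baselining example. The schedule name, hourly cadence, and report fields shown are assumptions to verify against your SDK version.

```python
# Sketch: schedule hourly data-quality monitoring against the captured data
# and inspect the most recent constraint violations.
from sagemaker.model_monitor import CronExpressionGenerator

data_quality_monitor.create_monitoring_schedule(
    monitor_schedule_name="my-endpoint-data-quality",          # placeholder
    endpoint_input="my-production-endpoint",
    output_s3_uri="s3://my-bucket/model-monitor/reports",      # placeholder
    statistics=data_quality_monitor.baseline_statistics(),
    constraints=data_quality_monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
    enable_cloudwatch_metrics=True,                            # emit drift metrics to CloudWatch
)

# After at least one scheduled execution has completed, check for violations.
violations = data_quality_monitor.latest_monitoring_constraint_violations()
if violations is not None:
    for violation in violations.body_dict.get("violations", []):
        print(violation["feature_name"], violation["constraint_check_type"])
```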
Drift in data or model performance can occur for a variety of reasons, and it is essential for the technical, product, and business stakeholders to diagnose the root cause. Early and proactive detection of drift enables you to take corrective actions such as retraining the model, auditing upstream data preparation workflows, and resolving any data quality issues.
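Because the monitoring jobs can publish drift metrics to Amazon CloudWatch, one way to get early notification is an alarm on a per-feature drift metric. The namespace, metric name, dimensions, and SNS topic in the sketch below are assumptions; confirm them against the metrics your monitoring jobs actually emit.

```python
# Sketch: alarm on a feature drift metric emitted by the monitoring jobs.
# Namespace, metric name, dimensions, and SNS topic ARN are illustrative assumptions.
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="my-endpoint-feature-drift",
    Namespace="aws/sagemaker/Endpoints/data-metrics",         # assumed namespace
    MetricName="feature_baseline_drift_age",                  # assumed per-feature metric name
    Dimensions=[
        {"Name": "Endpoint", "Value": "my-production-endpoint"},
        {"Name": "MonitoringSchedule", "Value": "my-endpoint-data-quality"},
    ],
    Statistic="Average",
    Period=3600,
    EvaluationPeriods=1,
    Threshold=0.2,                                            # tune to your tolerance for drift
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:111122223333:model-drift-alerts"],  # placeholder
    TreatMissingData="notBreaching",
)
```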
If all else remains the same, then the decision to retrain the model is based on considerations such as:
- A reevaluation of the target performance metrics based on the use case
- The tradeoff between the improvement in model performance and the time and cost to retrain the model
- The availability of ground truth labeled data to support the desired retraining frequency
After the model is retrained, you can evaluate the candidate model's performance with a champion/challenger setup or with A/B testing prior to redeployment.
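One way to run such a comparison on SageMaker is to host the current (champion) and retrained (challenger) models behind a single endpoint as production variants with a weighted traffic split. The model names, endpoint names, and weights below are placeholder assumptions.

```python
# Sketch: champion/challenger A/B test using weighted production variants.
# Model names, endpoint names, and traffic weights are illustrative assumptions.
import boto3

sm = boto3.client("sagemaker")

sm.create_endpoint_config(
    EndpointConfigName="my-endpoint-config-ab",                # placeholder
    ProductionVariants=[
        {
            "VariantName": "Champion",
            "ModelName": "my-current-model",                   # placeholder
            "InstanceType": "ml.m5.xlarge",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.9,                       # ~90% of traffic
        },
        {
            "VariantName": "Challenger",
            "ModelName": "my-retrained-model",                 # placeholder
            "InstanceType": "ml.m5.xlarge",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.1,                       # ~10% of traffic
        },
    ],
)

# Point the existing endpoint at the new configuration to start the test.
sm.update_endpoint(
    EndpointName="my-production-endpoint",
    EndpointConfigName="my-endpoint-config-ab",
)
```

Per-variant invocation metrics in CloudWatch, together with the captured inference data, can then inform the decision to promote the challenger or roll back.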