MLPER-08: Establish feature statistics - Machine Learning Lens

Establish key statistics to measure changes in the data that affect model outcomes. How much a change in the data affects inference depends on the model's sensitivity to each feature, so analyze feature importance and model sensitivity to select the features to monitor. Monitor the statistics of the features that have the largest influence on inferences. Set acceptability limits on the range of the data, and alert when important features drift outside the statistical range of the training data. Significant drift in important features suggests that the model should be retrained.
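As an illustration of acceptability limits, the following minimal sketch derives per-feature limits from training data and flags features whose production values fall outside them too often. It is not the Amazon SageMaker Model Monitor API. The function names, the mean ± k standard deviations rule, and the 5% outlier threshold are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def build_baseline(training_rows, features, k=3.0):
    """Compute per-feature limits as mean +/- k standard deviations.

    The k-sigma rule is an illustrative assumption, not a prescribed limit.
    """
    baseline = {}
    for name in features:
        values = [row[name] for row in training_rows]
        mu, sigma = mean(values), stdev(values)
        baseline[name] = {
            "mean": mu,
            "std": sigma,
            "low": mu - k * sigma,
            "high": mu + k * sigma,
        }
    return baseline


def drift_alerts(baseline, production_rows, max_outlier_fraction=0.05):
    """Return {feature: fraction outside limits} for features that exceed
    the allowed fraction of out-of-range values (hypothetical threshold)."""
    alerts = {}
    for name, stats in baseline.items():
        values = [row[name] for row in production_rows]
        outside = sum(1 for v in values if v < stats["low"] or v > stats["high"])
        fraction = outside / len(values)
        if fraction > max_outlier_fraction:
            alerts[name] = fraction
    return alerts


# Toy data: a single monitored feature named "age" (hypothetical).
training = [{"age": a} for a in (20.0, 30.0, 40.0, 50.0, 60.0)]
baseline = build_baseline(training, ["age"])

in_range_batch = [{"age": a} for a in (25.0, 35.0, 45.0)]
drifted_batch = [{"age": a} for a in (100.0, 120.0, 30.0)]

print(drift_alerts(baseline, in_range_batch))
print(drift_alerts(baseline, drifted_batch))
```

In practice the limits would be computed only for the features that the importance and sensitivity analysis identified as influential, and an alert would feed into the decision on whether to retrain.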

Implementation plan

  • Analyze and evaluate data - Use Amazon SageMaker Data Wrangler to analyze the distribution of the selected features. After training the model, map out the regions of feature space where predictions change abruptly and where they are invariant. Perform a sensitivity analysis of changes in feature values near the model’s decision boundaries, and analyze feature importance to understand how new data will affect the model’s predictions. Amazon SageMaker Experiments helps organize model testing. Use Amazon SageMaker Clarify to check for data biases and imbalances. Establish a baseline for monitoring the data with Amazon SageMaker Model Monitor, then monitor the statistics of the data used in production inferences. Consider retraining the model if features fall outside the original distribution of the training data.
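The sensitivity analysis described above can be sketched as a one-at-a-time perturbation test: bump each feature by a small relative step and measure the average change in the prediction. This is a simplified sketch and does not use any SageMaker service. The `toy_model` function, the feature names, and the 10% step are hypothetical stand-ins for a trained model and its real inputs.

```python
def sensitivity(predict, rows, features, rel_step=0.1):
    """Mean absolute change in prediction when each feature is increased
    by rel_step (relative), one feature at a time."""
    scores = {}
    for name in features:
        total = 0.0
        for row in rows:
            bumped = dict(row)  # copy so the original row is not modified
            bumped[name] = row[name] * (1 + rel_step)
            total += abs(predict(bumped) - predict(row))
        scores[name] = total / len(rows)
    return scores


def toy_model(row):
    """Hypothetical stand-in for a trained model's predict function."""
    return 2.0 * row["income"] + 0.1 * row["age"]


rows = [
    {"income": 10.0, "age": 40.0},
    {"income": 20.0, "age": 50.0},
]

scores = sensitivity(toy_model, rows, ["income", "age"])

# Rank features from most to least influential; the top ones are the
# candidates for monitoring.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

Features that rank highest are the ones whose statistics are most worth monitoring. A step relative to each feature's value is only one possible choice, and a step scaled to the feature's standard deviation in the training data may be more appropriate when features have very different ranges or values near zero.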
