Clarify availability change
AWS-Published Monitoring Solutions + Standardized Bias Metrics + SHAP + Amazon Bedrock Evaluations
Note
After careful consideration, we have decided to close new customer access to Amazon SageMaker Clarify, effective June 30, 2026. Existing customers can continue to use the service as normal. AWS continues to invest in security and availability improvements for Clarify, but we do not plan to introduce new features. For more information, see Clarify availability change.
Together, the following serve as a replacement for Amazon SageMaker Clarify: the AWS-published open-source SageMaker AI monitoring solutions, direct computation of the standardized bias metrics that Clarify reports, the SHAP library that SageMaker Clarify itself is built on, Amazon SageMaker AI MLflow, Amazon CloudWatch, Amazon QuickSight, and Amazon Bedrock Evaluations. This guidance provides the following:
-
AWS-published reference solutions are the foundation. The open-source Amazon SageMaker AI monitoring solutions in the aws-samples GitHub organization are production-tested reference architectures built entirely on AWS managed services (Amazon SageMaker AI, SageMaker AI MLflow, Amazon Athena, AWS Lambda, Amazon EventBridge, Amazon SQS, Amazon SNS, Amazon QuickSight). They run inside your own AWS account and operationalize bias and explainability analysis as part of an end-to-end, governed pipeline rather than as ad hoc scripts. These are the same reference solutions used in the SageMaker Model Monitor migration guidance.
-
The bias metrics are standardized arithmetic, not a library dependency. Clarify's pre-training and post-training bias metrics (DPL, DPPL, DI, CI, and related measures) are published, industry-standard formulas computed from label counts and confusion-matrix values. You reproduce them with pandas and scikit-learn, in code that you own, test, and version inside your own pipeline.
-
SHAP is the same engine Clarify uses. SageMaker Clarify computes feature attribution using the SHAP library internally (see the Clarify SHAP values documentation). Adopting SHAP directly therefore produces the same kind of Shapley-value attributions without a Clarify Processing Job.
-
Amazon Bedrock Evaluations for foundation models only. Amazon Bedrock Evaluations and Amazon Bedrock Guardrails are the AWS-managed path for LLM and generative AI evaluation. They replace Clarify's foundation model evaluation (FMEval) capability. They are not a replacement for predictive (tabular/classical ML) bias detection or explainability.
Removing Clarify
Discontinue Clarify Processing Jobs for New Analysis
If your workflow includes running SageMaker Clarify Processing Jobs using the
SageMakerClarifyProcessor class, the clarify.BiasConfig /
clarify.SHAPConfig configurations, or the run_bias() /
run_explainability() methods from the SageMaker Python SDK, transition to the
alternatives described in the Configuring Replacements section below.
Delete or Retain Clarify Output Artifacts (Optional)
Clarify Processing Jobs store output artifacts in Amazon S3:
-
Bias analysis reports: analysis.json, with pre-training and post-training bias metrics
-
SHAP explanations: explanations_shap/out.csv, with per-instance local SHAP values
-
Global SHAP values: Aggregated feature importance scores
-
Partial dependence plots (PDPs): Feature effect visualizations
-
Constraints files: Used by Model Monitor for bias drift and feature attribution drift baselines
If you need to retain these for compliance, regulatory audit trails, or historical reference, leave them in S3 or archive to S3 Glacier. If they are no longer needed, delete the S3 prefix.
Remove Clarify from SageMaker Pipelines (If Used)
If you have SageMaker Pipelines with ClarifyCheckStep or
ProcessingStep steps that invoke Clarify, replace them with a
ProcessingStep that runs the standardized metric computation and SHAP,
following the same pattern the AWS-published reference solutions use inside SageMaker
Pipelines (see Configuring Replacements below).
Configuring Replacements
The replacement path computes the standardized bias metrics directly with pandas and scikit-learn, and computes feature attribution with SHAP (the same engine Clarify uses). You run these inside the AWS-published reference solutions and SageMaker Pipelines, logging results to a SageMaker AI MLflow App so your bias and explainability results carry the same lineage, versioning, and governance as your training runs.
Replacing Pre-Training and Post-Training Bias Detection
Clarify's pre-training metrics are published, standardized formulas based on label counts and class proportions. Computing them directly keeps the bias check simple, owned and testable by your team, and consistent with what the AWS-published reference solutions do inside their monitoring jobs.
Refer to the Clarify pre-training bias metrics reference for the exact formula behind each metric, so your implementation matches Clarify's output.
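As an illustration, two of the pre-training metrics, class imbalance (CI) and difference in proportions of labels (DPL), can be sketched with pandas. This is a minimal sketch, not the reference solutions' code: the function name, the column names, and the toy data are hypothetical, and the formulas used are CI = (n_a - n_d) / (n_a + n_d) and DPL = q_a - q_d, where a is the advantaged facet and d the disadvantaged facet. Check them against the metrics reference before relying on the output.

```python
import pandas as pd


def pretraining_bias(df, facet_col, disadvantaged_value, label_col, positive_label=1):
    """Compute CI and DPL for one facet (hypothetical helper, not an AWS API)."""
    is_d = df[facet_col] == disadvantaged_value   # rows in the disadvantaged facet d
    is_pos = df[label_col] == positive_label      # rows with the positive label

    n_d = int(is_d.sum())                         # facet d size
    n_a = int((~is_d).sum())                      # facet a size (everyone else)

    q_a = is_pos[~is_d].mean()                    # positive-label proportion in facet a
    q_d = is_pos[is_d].mean()                     # positive-label proportion in facet d

    return {
        "CI": (n_a - n_d) / (n_a + n_d),          # class imbalance
        "DPL": q_a - q_d,                         # difference in proportions of labels
    }


# Toy data: six rows in facet "a", four rows in facet "d".
df = pd.DataFrame({
    "group": ["a"] * 6 + ["d"] * 4,
    "approved": [1, 1, 1, 1, 0, 0, 1, 0, 0, 0],
})
metrics = pretraining_bias(df, "group", "d", "approved")
print(metrics)
```

Because the function is plain arithmetic over counts, it can be unit-tested with small hand-checked frames like the one above.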
Clarify's post-training metrics are standardized functions of per-group confusion-matrix counts. Computing them directly from predictions and ground truth keeps the check simple and owned by your team, and it matches the per-segment approach the AWS-published reference solutions use for bias drift. Log the per-segment results to a SageMaker AI MLflow App for governance and trend tracking.
Refer to the Clarify post-training bias metrics reference for the exact formula behind each metric.
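The post-training computation can be sketched the same way. The example below derives per-facet confusion-matrix counts with pandas boolean masks and computes difference in positive proportions in predicted labels (DPPL = q'_a - q'_d), disparate impact (DI = q'_d / q'_a), and accuracy difference (AD = ACC_a - ACC_d). The helper names, column names, and data are hypothetical, and binary 0/1 labels are assumed. scikit-learn's confusion_matrix would yield the same counts.

```python
import pandas as pd


def confusion_counts(y_true, y_pred):
    """Return TP/FP/TN/FN counts for binary 0/1 labels (hypothetical helper)."""
    return {
        "tp": int(((y_true == 1) & (y_pred == 1)).sum()),
        "fp": int(((y_true == 0) & (y_pred == 1)).sum()),
        "tn": int(((y_true == 0) & (y_pred == 0)).sum()),
        "fn": int(((y_true == 1) & (y_pred == 0)).sum()),
    }


def posttraining_bias(df, facet_col, disadvantaged_value, label_col, pred_col):
    """Compute DPPL, DI, and AD for one facet from per-facet confusion counts."""
    is_d = df[facet_col] == disadvantaged_value
    a = confusion_counts(df.loc[~is_d, label_col], df.loc[~is_d, pred_col])
    d = confusion_counts(df.loc[is_d, label_col], df.loc[is_d, pred_col])
    n_a, n_d = sum(a.values()), sum(d.values())

    q_a = (a["tp"] + a["fp"]) / n_a               # predicted-positive rate, facet a
    q_d = (d["tp"] + d["fp"]) / n_d               # predicted-positive rate, facet d
    acc_a = (a["tp"] + a["tn"]) / n_a
    acc_d = (d["tp"] + d["tn"]) / n_d

    return {
        "DPPL": q_a - q_d,
        "DI": q_d / q_a,                          # undefined when q_a is 0; guard in real use
        "AD": acc_a - acc_d,
    }


df = pd.DataFrame({
    "group": ["a"] * 6 + ["d"] * 4,
    "label": [1, 1, 1, 1, 0, 0, 1, 0, 0, 0],
    "pred":  [1, 1, 1, 0, 1, 0, 1, 0, 0, 0],
})
post_metrics = posttraining_bias(df, "group", "d", "label", "pred")
print(post_metrics)
```

Running the function once per segment produces the per-segment values that the paragraph above suggests logging for trend tracking.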
Replacing Model Explainability (SHAP Feature Attribution)
The SHAP library is the same engine that powers SageMaker Clarify's explainability. Using it directly gives you more flexibility:
pip install shap
For production use, the AWS-published reference solutions include a
6_shap_explainability.ipynb notebook that generates global feature importance,
individual prediction explanations, and feature interaction analysis for a deployed model,
logging the results to a SageMaker AI MLflow App.
Refer to the SHAP documentation.
Replacing Bias Drift and Feature Attribution Drift Monitoring
This approach operationalizes the same standardized metrics (directly computed fairness
metrics and SHAP feature importance) on a schedule, with governance, lineage, and alerting,
using the Amazon SageMaker AI monitoring solutions.
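The core of a scheduled drift check can be sketched as a comparison of freshly computed metrics against a stored baseline. The function name, the metric values, and the single absolute tolerance below are hypothetical; the reference solutions may structure their thresholds and alerting differently.

```python
def check_bias_drift(baseline, current, tolerance):
    """Return {metric: delta} for metrics that moved more than `tolerance`.

    Hypothetical helper: `baseline` and `current` map metric names to values.
    """
    drifted = {}
    for name, base_value in baseline.items():
        delta = current[name] - base_value
        if abs(delta) > tolerance:
            drifted[name] = delta
    return drifted


# Illustrative values only, not real measurements.
baseline = {"DPPL": 0.05, "DI": 0.95}
current = {"DPPL": 0.18, "DI": 0.93}

alerts = check_bias_drift(baseline, current, tolerance=0.10)
print(alerts)  # only DPPL exceeds the tolerance here
```

In a scheduled job, a non-empty result is the point at which you would publish a notification or record an alert.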
Replacing Foundation Model (LLM) Evaluation
Option 1: fmeval Library (Recommended for SageMaker-Hosted Models)
The fmeval library is the same evaluation engine Clarify's foundation
model evaluation is built on, and it can be used independently of Clarify Processing
Jobs:
pip install fmeval
The fmeval library supports: text summarization, question answering,
classification, open-ended generation, factual knowledge, toxicity, robustness (semantic
perturbations), and prompt stereotyping evaluation. For ongoing, in-production LLM quality
monitoring on SageMaker endpoints, see the AWS-published LLM Inference Monitoring solution.
Refer to "Use the fmeval library to run an automatic evaluation" and "Track LLM model evaluation using Amazon SageMaker managed MLflow and FMEval".
Option 2: Amazon Bedrock Evaluations (Managed Service)
Amazon Bedrock Evaluations provides a managed alternative for foundation model evaluation without requiring Clarify Processing Jobs:
-
Open the Amazon Bedrock console.
-
Choose Evaluations from the left navigation.
-
Choose Create evaluation job.
-
Select evaluation type:
-
Automatic evaluation: Predefined algorithms for accuracy, robustness, toxicity
-
Human evaluation: Bring human workers for subjective quality assessment
-
Select models to evaluate (JumpStart models, Bedrock models, or custom endpoints).
-
Choose built-in datasets or upload custom prompt datasets.
-
Review evaluation results with comparative metrics across models.
Bedrock Evaluations supports evaluating models hosted on Bedrock, custom models, and external models (including on-premises or multi-cloud deployments) as long as you provide evaluation data in the required format.
Refer to "Evaluate the performance of Amazon Bedrock resources" and "Evaluate models or RAG systems using Amazon Bedrock Evaluations".
Option 3: Amazon Bedrock Guardrails (Runtime Safety)
SageMaker Clarify evaluates toxicity, PII, and harmful content at assessment time only. For filtering that content at runtime, Amazon Bedrock Guardrails provides continuous protection for generative AI applications:
-
Content filters: Block harmful, hateful, sexual, violent, or insulting content
-
Denied topics: Prevent model responses on specific topics
-
Sensitive information filters: Detect and redact PII (names, addresses, SSN, credit cards)
-
Contextual grounding checks: Reduce hallucinations by grounding responses in source data
-
Automated Reasoning checks: Validate factual accuracy using formal reasoning
Refer to Detect and filter harmful content by using Amazon Bedrock Guardrails for detailed steps.