Evaluating and comparing Amazon SageMaker JumpStart text classification models

SageMaker AI JumpStart offers multiple text classification models that categorize text into predefined classes. These models handle tasks such as sentiment analysis, topic classification, and content moderation. Choosing the right model for production requires careful evaluation using key metrics including accuracy, F1-score, and Matthews Correlation Coefficient (MCC).
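As a quick illustration of how these metrics behave, the following sketch computes accuracy, F1-score, and MCC for a small set of binary predictions. This assumes scikit-learn is available in your environment, and the label arrays are made-up sample data, not output from a deployed model:

```python
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef

# Hypothetical ground-truth labels and model predictions (e.g., binary sentiment)
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]  # one false negative at index 2

accuracy = accuracy_score(y_true, y_pred)  # fraction of correct predictions
f1 = f1_score(y_true, y_pred)              # harmonic mean of precision and recall
mcc = matthews_corrcoef(y_true, y_pred)    # balanced measure in [-1, 1]

print(f"Accuracy: {accuracy:.3f}, F1: {f1:.3f}, MCC: {mcc:.3f}")
```

Accuracy counts every prediction equally, while F1 focuses on the positive class and MCC accounts for all four cells of the confusion matrix, which is why they can disagree on the same predictions.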

In this guide, you:

  • Deploy multiple text classification models (DistilBERT and BERT) from the JumpStart catalog.

  • Run comprehensive evaluations across balanced, skewed, and challenging datasets.

  • Interpret advanced metrics including Matthews Correlation Coefficient (MCC) and Area Under the Receiver Operating Characteristic Curve (AUC-ROC) scores.

  • Make data-driven model selection decisions using systematic comparison frameworks.

  • Set up production deployments with auto-scaling and CloudWatch monitoring.
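To preview why metrics such as MCC matter when you evaluate across skewed datasets, the following hedged sketch (again using scikit-learn with made-up labels, not real model output) contrasts accuracy and MCC for a degenerate baseline that always predicts the majority class:

```python
from sklearn.metrics import accuracy_score, matthews_corrcoef

# Made-up skewed dataset: 90% of examples belong to class 0
y_true = [0] * 9 + [1]
# A degenerate "model" that always predicts the majority class
y_majority = [0] * 10

acc = accuracy_score(y_true, y_majority)
mcc = matthews_corrcoef(y_true, y_majority)

# High accuracy, but MCC exposes that the model learned nothing
print(f"Accuracy: {acc:.2f}, MCC: {mcc:.2f}")
```

On this sample, accuracy is 0.90 while MCC is 0.00, which is why class-imbalanced evaluations in this guide lean on MCC rather than accuracy alone.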

Download the complete evaluation framework: JumpStart Model Evaluation Package. The package includes pre-run results with sample outputs so you can preview the evaluation process and metrics before deploying models yourself.

Prerequisites

Before you begin, make sure that you have the following:

Time and cost: This tutorial takes about 45 minutes to complete. Costs vary based on instance types and usage duration; see Amazon SageMaker AI Pricing for current rates.

This tutorial includes step-by-step cleanup instructions to help you remove all resources and avoid ongoing charges.