SageMaker Autopilot

Important

As of November 30, 2023, Autopilot's features are migrating to Amazon SageMaker Canvas as part of the updated Studio experience, providing data scientists with no-code capabilities for tasks such as data preparation, feature engineering, algorithm selection, training and tuning, inference, continuous model monitoring, and more. SageMaker Canvas supports a variety of use cases, including computer vision, demand forecasting, intelligent search, and generative AI.

Users of Studio can continue using Autopilot as a standalone feature. However, we encourage users who prefer the convenience of a user interface to explore executing their AutoML tasks within SageMaker Canvas. Users with coding experience can continue using all API instructions and any supported SDK for technical implementation.

All UI-related instructions in this guide pertain to Autopilot's standalone features as they existed before the migration to Amazon SageMaker Canvas. Users following these instructions should use Studio.

Amazon SageMaker Autopilot is a feature set that simplifies and accelerates various stages of the machine learning workflow by automating the process of building and deploying machine learning models (AutoML).

Autopilot performs the following key tasks, which you can run fully automatically or with varying degrees of human guidance:

  • Data analysis and preprocessing: Autopilot identifies your specific problem type, handles missing values, normalizes your data, selects features, and overall prepares the data for model training.

  • Model selection: Autopilot explores a variety of algorithms and uses a cross-validation resampling technique to generate metrics that evaluate the predictive quality of the algorithms based on predefined objective metrics.

  • Hyperparameter optimization: Autopilot automates the search for optimal hyperparameter configurations.

  • Model training and evaluation: Autopilot automates the process of training and evaluating various model candidates. It splits the data into training and validation sets, trains the selected model candidates using the training data, and evaluates their performance on the unseen data of the validation set. Lastly, it ranks the optimized model candidates based on their performance and identifies the best performing model.

  • Model deployment: Once Autopilot has identified the best performing model, it provides the option to deploy the model automatically by generating the model artifacts and the endpoint exposing an API. External applications can send data to the endpoint and receive the corresponding predictions or inferences.
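The model selection, evaluation, and ranking steps above can be illustrated with a small, self-contained sketch. This is not Autopilot's implementation: the two toy candidates (`fit_mean`, `fit_line`), the helper names, and the synthetic data are all assumptions made for illustration. The sketch only shows the general idea of scoring each candidate with k-fold cross-validation and ranking candidates by an objective metric (here, mean squared error).

```python
import random
from statistics import mean


def k_fold_indices(n_rows, k, seed=0):
    """Shuffle row indices and deal them into k nearly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(fit, rows, targets, k=5):
    """Return the mean validation MSE of one candidate across k folds."""
    scores = []
    for held_out in k_fold_indices(len(rows), k):
        held = set(held_out)
        train_x = [rows[i] for i in range(len(rows)) if i not in held]
        train_y = [targets[i] for i in range(len(rows)) if i not in held]
        predict = fit(train_x, train_y)  # train on everything except the fold
        errors = [(predict(rows[i]) - targets[i]) ** 2 for i in held_out]
        scores.append(mean(errors))  # evaluate on the unseen fold
    return mean(scores)


def fit_mean(xs, ys):
    """Hypothetical baseline candidate: always predict the training mean."""
    average = mean(ys)
    return lambda x: average


def fit_line(xs, ys):
    """Hypothetical candidate: least-squares line on a single feature."""
    mx, my = mean(xs), mean(ys)
    variance = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / variance
    return lambda x: my + slope * (x - mx)


def rank_candidates(candidates, rows, targets, k=5):
    """Score every candidate and sort by the objective (lower MSE is better)."""
    scored = [(name, cross_validate(fit, rows, targets, k))
              for name, fit in candidates.items()]
    return sorted(scored, key=lambda item: item[1])


# Synthetic, exactly linear data used only for this illustration.
rows = [float(i) for i in range(20)]
targets = [2.0 * x + 1.0 for x in rows]

leaderboard = rank_candidates(
    {"mean_baseline": fit_mean, "linear": fit_line}, rows, targets
)
best_name = leaderboard[0][0]
```

In this toy setup the first entry of `leaderboard` plays the role of the best-performing model candidate. A real AutoML system would also search hyperparameters for each candidate before ranking.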

Autopilot supports building machine learning models on large datasets of up to hundreds of GB.

The following diagram outlines the tasks of this AutoML process managed by Autopilot.


[Figure: Overview of the Amazon SageMaker Autopilot AutoML process.]

Depending on your comfort level with the machine learning process and coding experience, you can use Autopilot in different ways:

  • Using the Studio UI, users can choose between a fully no-code experience and one that allows some level of human input.

    Note

    Only experiments created from tabular data for problem types such as regression or classification are available via the Studio UI.

  • Using the AutoML API, users with coding experience can use available SDKs to create AutoML jobs. This approach provides greater flexibility and customization options and is available for all problem types.
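As a rough sketch of the API route, the following Python builds a request for the `CreateAutoMLJobV2` operation for a tabular classification job and, optionally, submits it with boto3. The job name, S3 URIs, IAM role ARN, target column, and the helper names `build_tabular_automl_request` and `submit` are placeholders assumed for illustration. Check the field values against the API reference for your account and Region before running anything.

```python
def build_tabular_automl_request(job_name, train_s3_uri, output_s3_uri,
                                 role_arn, target_column,
                                 problem_type="BinaryClassification",
                                 objective_metric="F1", max_candidates=10):
    """Assemble keyword arguments for create_auto_ml_job_v2 (tabular data)."""
    return {
        "AutoMLJobName": job_name,
        "AutoMLJobInputDataConfig": [
            {
                "ChannelType": "training",
                "ContentType": "text/csv;header=present",
                "CompressionType": "None",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                    }
                },
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "AutoMLProblemTypeConfig": {
            "TabularJobConfig": {
                "TargetAttributeName": target_column,
                "ProblemType": problem_type,
                "CompletionCriteria": {"MaxCandidates": max_candidates},
            }
        },
        "AutoMLJobObjective": {"MetricName": objective_metric},
        "RoleArn": role_arn,
    }


def submit(request):
    """Send the request to SageMaker. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    client = boto3.client("sagemaker")
    return client.create_auto_ml_job_v2(**request)["AutoMLJobArn"]


# All values below are hypothetical placeholders, not real resources.
request = build_tabular_automl_request(
    job_name="churn-automl-demo",
    train_s3_uri="s3://amzn-s3-demo-bucket/autopilot/train/",
    output_s3_uri="s3://amzn-s3-demo-bucket/autopilot/output/",
    role_arn="arn:aws:iam::111122223333:role/ExampleSageMakerRole",
    target_column="churn",
)
# job_arn = submit(request)  # uncomment to create the job in your account
```

Keeping request construction separate from submission makes the configuration easy to inspect or unit test without touching AWS. Other problem types use a different key under `AutoMLProblemTypeConfig` in place of `TabularJobConfig`.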

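Autopilot currently supports the following problem types:

  • Regression and classification on tabular data

  • Text classification

  • Image classification

  • Time-series forecasting

  • Fine-tuning of large language models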

Note

For regression or classification problems involving tabular data, users can choose between two options: the Studio user interface or the API.

Tasks such as text and image classification, time-series forecasting, and fine-tuning of large language models are available exclusively through version 2 of the Autopilot API. Users who prefer the convenience of a user interface can use Amazon SageMaker Canvas to access pre-trained models and generative AI foundation models, or to create custom models tailored to specific text classification, image classification, forecasting, or generative AI needs.

Additionally, Autopilot helps users understand how models make predictions by automatically generating reports that show the importance of each individual feature. This provides transparency and insight into the factors influencing the predictions, which can be used by risk and compliance teams and external regulators. Autopilot also provides a model performance report, which includes a summary of evaluation metrics, a confusion matrix, visualizations such as receiver operating characteristic curves and precision-recall curves, and more. The specific content of each report varies depending on the problem type of the Autopilot experiment.

The explainability and performance reports for the best model candidate in an Autopilot experiment are available for text, image, and tabular data classification problem types.
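To make the contents of a performance report more concrete, the following sketch computes a binary confusion matrix and the precision and recall derived from it. It does not reproduce how Autopilot generates its reports. The function names and the label data are assumptions chosen for illustration.

```python
from collections import Counter


def confusion_matrix(actual, predicted, positive):
    """Count true/false positives and negatives for one positive label."""
    counts = Counter()
    for truth, guess in zip(actual, predicted):
        if guess == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return {key: counts[key] for key in ("tp", "fp", "fn", "tn")}


def precision_recall(matrix):
    """Derive precision and recall, returning 0.0 when undefined."""
    tp, fp, fn = matrix["tp"], matrix["fp"], matrix["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels for eight validation rows.
actual = ["yes", "no", "yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "no", "yes", "yes", "no", "yes", "no"]

matrix = confusion_matrix(actual, predicted, positive="yes")
precision, recall = precision_recall(matrix)
# matrix is {"tp": 3, "fp": 1, "fn": 1, "tn": 3}; precision and recall are 0.75
```

Receiver operating characteristic and precision-recall curves are built by repeating this kind of calculation across a range of decision thresholds.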

For tabular data use cases such as regression or classification, Autopilot offers additional visibility into how the data was wrangled and how the model candidates were selected, trained, and tuned. It does this by generating notebooks that contain the code used to explore the data and find the best-performing model. These notebooks provide an interactive and exploratory environment to help you learn about the impact of various inputs or the trade-offs made in the experiments. You can experiment further with a higher-performing model candidate by making your own modifications to the data exploration and candidate definition notebooks provided by Autopilot.
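One way to locate these notebooks programmatically is to read the `AutoMLJobArtifacts` field of a describe call for the job. The sketch below assumes that field is present in the response for a tabular job. The helper names, the job name, and the sample response are hypothetical and shown only to illustrate the shape of the data.

```python
def notebook_locations(describe_response):
    """Extract the generated notebook S3 locations from a describe response."""
    artifacts = describe_response.get("AutoMLJobArtifacts", {})
    return {
        "data_exploration": artifacts.get("DataExplorationNotebookLocation"),
        "candidate_definition": artifacts.get(
            "CandidateDefinitionNotebookLocation"
        ),
    }


def fetch_notebook_locations(job_name):
    """Describe the job with boto3. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the parser works without boto3

    client = boto3.client("sagemaker")
    response = client.describe_auto_ml_job_v2(AutoMLJobName=job_name)
    return notebook_locations(response)


# Hypothetical, trimmed response used only to demonstrate the parsing.
sample_response = {
    "AutoMLJobName": "churn-automl-demo",
    "AutoMLJobArtifacts": {
        "DataExplorationNotebookLocation":
            "s3://amzn-s3-demo-bucket/output/notebooks/data-exploration.ipynb",
        "CandidateDefinitionNotebookLocation":
            "s3://amzn-s3-demo-bucket/output/notebooks/candidate-definition.ipynb",
    },
}
locations = notebook_locations(sample_response)
```

The returned S3 URIs can then be downloaded and opened in Studio or any Jupyter environment. A job that does not produce notebooks yields `None` for both entries.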

With Amazon SageMaker, you pay only for what you use. You pay for the underlying compute and storage resources within SageMaker or other AWS services, based on your usage. For more information about the cost of using SageMaker, see Amazon SageMaker Pricing.