
Use Amazon SageMaker Built-in Algorithms or Pre-trained Models

Amazon SageMaker provides a suite of built-in algorithms, pre-trained models, and pre-built solution templates to help data scientists and machine learning practitioners get started on training and deploying machine learning models quickly. If you are new to SageMaker, choosing the right algorithm for your particular use case can be challenging. The following table provides a quick cheat sheet that shows how you can start with an example problem or use case and find an appropriate built-in algorithm offered by SageMaker for that problem type. Additional guidance organized by learning paradigm (supervised and unsupervised) and by important data domain (text and images) is provided in the sections following the table.

Table: Mapping use cases to built-in algorithms
Example problems and use cases	Learning paradigm or domain	Problem types	Data input format	Built-in algorithms
Here are a few examples out of the 15 problem types that can be addressed by the pre-trained models and pre-built solution templates provided by SageMaker JumpStart: Question answering: a chatbot that outputs an answer for a given question. Text analysis: analyze text with models specific to an industry domain such as finance.	Pre-trained models and pre-built solution templates	Image Classification, Tabular Classification, Tabular Regression, Text Classification, Object Detection, Text Embedding, Question Answering, Sentence Pair Classification, Image Embedding, Named Entity Recognition, Instance Segmentation, Text Generation, Text Summarization, Semantic Segmentation, Machine Translation	Image, Text, Tabular	Popular models, including MobileNet, YOLO, Faster R-CNN, BERT, LightGBM, and CatBoost. For a list of pre-trained models available, see JumpStart Models. For a list of pre-built solution templates available, see JumpStart Solutions.
Predict if an item belongs to a category: an email spam filter	Supervised Learning	Binary/multi-class classification	Tabular	AutoGluon-Tabular, CatBoost, Factorization Machines Algorithm, K-Nearest Neighbors (k-NN) Algorithm, LightGBM, Linear Learner Algorithm, TabTransformer, XGBoost Algorithm
Predict a numeric/continuous value: estimate the value of a house		Regression	Tabular	AutoGluon-Tabular, CatBoost, Factorization Machines Algorithm, K-Nearest Neighbors (k-NN) Algorithm, LightGBM, Linear Learner Algorithm, TabTransformer, XGBoost Algorithm
Based on historical data for a behavior, predict future behavior: predict sales on a new product based on previous sales data.		Time-series forecasting	Tabular	DeepAR Forecasting Algorithm
Improve the data embeddings of the high-dimensional objects: identify duplicate support tickets or find the correct routing based on similarity of text in the tickets		Embeddings: convert high-dimensional objects into low-dimensional space.	Tabular	Object2Vec Algorithm
Drop those columns from a dataset that have a weak relation with the label/target variable: the color of a car when predicting its mileage.	Unsupervised Learning	Feature engineering: dimensionality reduction	Tabular	Principal Component Analysis (PCA) Algorithm
Detect abnormal behavior in application: spot when an IoT sensor is sending abnormal readings		Anomaly detection	Tabular	Random Cut Forest (RCF) Algorithm
Protect your application from suspicious users: detect if an IP address accessing a service might be from a bad actor		IP anomaly detection	Tabular	IP Insights
Group similar objects/data together: find high-, medium-, and low-spending customers from their transaction histories		Clustering or grouping	Tabular	K-Means Algorithm
Organize a set of documents into topics (not known in advance): tag a document as belonging to a medical category based on the terms used in the document.		Topic modeling	Text	Latent Dirichlet Allocation (LDA) Algorithm, Neural Topic Model (NTM) Algorithm
Assign pre-defined categories to documents in a corpus: categorize books in a library into academic disciplines	Textual Analysis	Text classification	Text	BlazingText algorithm, Text Classification - TensorFlow
Convert text from one language to another: Spanish to English		Machine translation	Text	Sequence-to-Sequence Algorithm
Summarize a long text corpus: an abstract for a research paper		Text summarization	Text	Sequence-to-Sequence Algorithm
Convert audio files to text: transcribe call center conversations for further analysis		Speech-to-text	Text	Sequence-to-Sequence Algorithm
Label/tag an image based on the content of the image: alerts about adult content in an image	Image Processing	Image and multi-label classification	Image	Image Classification - MXNet
Classify something in an image using transfer learning.		Image classification	Image	Image Classification - TensorFlow
Detect people and objects in an image: police review a large photo gallery for a missing person		Object detection and classification	Image	Object Detection - MXNet, Object Detection - TensorFlow
Tag every pixel of an image individually with a category: self-driving cars prepare to identify objects in their way		Computer vision	Image	Semantic Segmentation Algorithm

For important information about Docker registry paths, data formats, recommended Amazon EC2 instance types, and CloudWatch logs common to all of the built-in algorithms provided by SageMaker, see Common Information About Built-in Algorithms.

The following sections provide additional guidance for the Amazon SageMaker built-in algorithms grouped by the supervised and unsupervised learning paradigms to which they belong. For descriptions of these learning paradigms and their associated problem types, see Choose an Algorithm. Sections are also provided for the SageMaker built-in algorithms available to address two important machine learning domains: textual analysis and image processing.

Pre-trained Models and Solution Templates
Supervised Learning
Unsupervised Learning
Textual Analysis
Image Processing

Pre-trained Models and Solution Templates

SageMaker JumpStart provides a wide range of pre-trained models, pre-built solution templates, and examples for popular problem types that you can use through the SageMaker SDK or Studio Classic. For more information about these models, solutions, and the example notebooks provided by SageMaker JumpStart, see SageMaker JumpStart.

Supervised Learning

Amazon SageMaker provides several built-in general-purpose algorithms that can be used for either classification or regression problems.

AutoGluon-Tabular—an open-source AutoML framework that succeeds by ensembling models and stacking them in multiple layers.
CatBoost—an implementation of the gradient-boosted trees algorithm that introduces ordered boosting and an innovative algorithm for processing categorical features.
Factorization Machines Algorithm—an extension of a linear model that is designed to economically capture interactions between features within high-dimensional sparse datasets.
K-Nearest Neighbors (k-NN) Algorithm—a non-parametric method that uses the k nearest labeled points to assign a label to a new data point for classification or a predicted target value from the average of the k nearest points for regression.
LightGBM—an implementation of the gradient-boosted trees algorithm that adds two novel techniques for improved efficiency and scalability: Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB).
Linear Learner Algorithm—learns a linear function for regression or a linear threshold function for classification.
TabTransformer—a novel deep tabular data modeling architecture built on self-attention-based Transformers.
XGBoost Algorithm—an implementation of the gradient-boosted trees algorithm that combines an ensemble of estimates from a set of simpler and weaker models.
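The k-NN description above can be illustrated with a short, pure-Python sketch of the general technique: Euclidean distance to every labeled point, a majority vote among the k nearest for classification, and the mean of their targets for regression. It is not the SageMaker k-NN implementation, and the toy spam and house-price points are made up for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def nearest(train: List[Tuple[Sequence[float], object]], query: Sequence[float], k: int):
    """Return the k (features, value) pairs closest to query by Euclidean distance."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training points")
    return sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]


def knn_classify(train, query, k):
    """Classification: the most common label among the k nearest neighbors."""
    labels = [label for _, label in nearest(train, query, k)]
    return Counter(labels).most_common(1)[0][0]


def knn_regress(train, query, k):
    """Regression: the average target value of the k nearest neighbors."""
    targets = [target for _, target in nearest(train, query, k)]
    return sum(targets) / k


# Made-up 2-D feature vectors for a toy spam filter.
emails = [((1.0, 1.0), "spam"), ((1.2, 0.8), "spam"), ((0.9, 1.1), "spam"),
          ((5.0, 5.0), "ham"), ((5.5, 4.5), "ham")]
# Made-up one-feature houses (size) with prices in thousands.
houses = [((3.0,), 300.0), ((3.1,), 310.0), ((2.9,), 290.0), ((6.0,), 600.0)]

print(knn_classify(emails, (1.0, 0.9), k=3))  # spam
print(knn_regress(houses, (3.0,), k=3))       # 300.0
```

Choosing an odd k avoids tied votes in a two-class problem; the brute-force sort is fine for a sketch but scales poorly, which is why production k-NN implementations use indexing structures.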

Amazon SageMaker also provides several built-in supervised learning algorithms that are used for more specialized tasks during feature engineering and forecasting from time series data.

Object2Vec Algorithm—a highly customizable, multi-purpose algorithm used for feature engineering. It can learn low-dimensional dense embeddings of high-dimensional objects to produce features that improve training efficiency for downstream models. Although it is a supervised algorithm, because it requires labeled data for training, in many scenarios the relationship labels can be obtained purely from natural clusterings in the data, without any explicit human annotation.
DeepAR Forecasting Algorithm—a supervised learning algorithm for forecasting scalar (one-dimensional) time series using recurrent neural networks (RNN).
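To make the DeepAR input concrete for the sales-forecasting use case in the table, the sketch below assumes the JSON Lines layout described in the DeepAR input/output documentation: one series per line, with a `start` timestamp, a `target` array holding one scalar per time step, and an optional `cat` array of integer categorical features. The sales figures are invented, and encoding a missing value as the string "NaN" is also taken from that documentation; verify the field names against the current DeepAR docs before relying on them.

```python
import json
from datetime import datetime
from typing import List, Optional


def to_deepar_line(start: datetime, target: List[Optional[float]],
                   cat: Optional[List[int]] = None) -> str:
    """Serialize one time series as a single JSON Lines record (assumed DeepAR layout)."""
    record = {
        # Timestamp of the first observation, formatted as "YYYY-MM-DD HH:MM:SS".
        "start": start.strftime("%Y-%m-%d %H:%M:%S"),
        # One scalar value per time step; None (missing) is written as the string "NaN".
        "target": ["NaN" if value is None else value for value in target],
    }
    if cat is not None:
        # Optional integer categorical features that group related series.
        record["cat"] = cat
    return json.dumps(record)


# Made-up daily unit sales for two products; None marks a missing day.
history = [
    (datetime(2024, 1, 1), [12.0, 15.0, None, 18.0], [0]),
    (datetime(2024, 1, 1), [3.0, 4.0, 6.0, 5.0], [1]),
]
lines = [to_deepar_line(start, target, cat) for start, target, cat in history]
print("\n".join(lines))
```

Each printed line is an independent JSON object, so the output can be written straight to a `.json` training file with one series per line.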

Unsupervised Learning

Amazon SageMaker provides several built-in algorithms that can be used for a variety of unsupervised learning tasks such as clustering, dimensionality reduction, pattern recognition, and anomaly detection.

Principal Component Analysis (PCA) Algorithm—reduces the dimensionality (number of features) within a dataset by projecting data points onto the first few principal components. The objective is to retain as much information or variation as possible. Mathematically, the principal components are the eigenvectors of the data's covariance matrix.
K-Means Algorithm—finds discrete groupings within data, where members of a group are as similar as possible to one another and as different as possible from members of other groups.
IP Insights—learns the usage patterns for IPv4 addresses. It is designed to capture associations between IPv4 addresses and various entities, such as user IDs or account numbers.
Random Cut Forest (RCF) Algorithm—detects anomalous data points within a dataset that diverge from otherwise well-structured or patterned data.
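The PCA description above states that principal components are eigenvectors of the covariance matrix. The pure-Python sketch below illustrates that idea: it builds the sample covariance matrix, approximates its leading eigenvector with power iteration (a simple method chosen here for brevity, not necessarily what SageMaker PCA uses), and projects the centered points onto that direction to reduce two features to one. The points are made up; real workloads would use a linear-algebra library.

```python
import math
from typing import List, Sequence


def column_means(data: Sequence[Sequence[float]]) -> List[float]:
    """Mean of each feature (column)."""
    n = len(data)
    return [sum(row[j] for row in data) / n for j in range(len(data[0]))]


def covariance_matrix(data: Sequence[Sequence[float]]) -> List[List[float]]:
    """Sample covariance matrix of the features, computed from centered rows."""
    means = column_means(data)
    centered = [[x - m for x, m in zip(row, means)] for row in data]
    d, n = len(means), len(data)
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_principal_component(data, iterations: int = 100) -> List[float]:
    """Approximate the covariance matrix's leading eigenvector by power iteration."""
    cov = covariance_matrix(data)
    d = len(cov)
    v = [1.0] * d  # starting guess; must not be orthogonal to the leading eigenvector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # keep the vector at unit length
    return v


def project(data, component: Sequence[float]) -> List[float]:
    """Reduce each point to one number: its centered coordinate along component."""
    means = column_means(data)
    return [sum((x - m) * c for x, m, c in zip(row, means, component)) for row in data]


# Made-up points that lie close to the line y = x.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.0)]
pc1 = first_principal_component(points)
print(pc1)                   # roughly [0.71, 0.70], the direction of greatest variance
print(project(points, pc1))  # the 2-D points reduced to 1-D
```

Keeping only the first component here discards the small spread perpendicular to y = x, which is the "retain as much variation as possible" trade-off the description mentions.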

Textual Analysis

SageMaker provides algorithms that are tailored to the analysis of textual documents used in natural language processing, document classification or summarization, topic modeling or classification, and language transcription or translation.

BlazingText algorithm—a highly optimized implementation of the Word2vec and text classification algorithms that scale to large datasets easily. It is useful for many downstream natural language processing (NLP) tasks.
Sequence-to-Sequence Algorithm—a supervised algorithm commonly used for neural machine translation.
Latent Dirichlet Allocation (LDA) Algorithm—an algorithm suitable for determining topics in a set of documents. It is an unsupervised algorithm, which means that it doesn't use example data with answers during training.
Neural Topic Model (NTM) Algorithm—another unsupervised technique for determining topics in a set of documents, using a neural network approach.
Text Classification - TensorFlow—a supervised algorithm that supports transfer learning with available pretrained models for text classification.
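For the library example in the table (categorizing books into academic disciplines), BlazingText's supervised text classification mode reads training data in which each line holds one document: a label prefixed with `__label__`, followed by the document's space-separated tokens. The sketch below assumes that format; the regex tokenizer and the book titles are illustrative stand-ins, not part of the algorithm.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase, then split into word tokens and single punctuation tokens.

    A deliberately simple, ASCII-oriented stand-in for a real tokenizer.
    """
    return re.findall(r"[a-z0-9']+|[^\sa-z0-9']", text.lower())


def to_blazingtext_line(label: str, text: str) -> str:
    """Format one document as '__label__<label>' followed by space-separated tokens."""
    # A space inside the label would split it into two tokens, so replace spaces.
    safe_label = label.strip().replace(" ", "_")
    return f"__label__{safe_label} " + " ".join(tokenize(text))


# Made-up (discipline, title) pairs.
catalog = [
    ("computer science", "Introduction to Algorithms, 3rd ed."),
    ("history", "The Guns of August"),
]
for discipline, title in catalog:
    print(to_blazingtext_line(discipline, title))
```

Separating punctuation into its own tokens keeps "algorithms," and "algorithms" from being treated as different words.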

Image Processing

SageMaker also provides image processing algorithms that are used for image classification, object detection, and computer vision.

Image Classification - MXNet—uses example data with answers (referred to as a supervised algorithm). Use this algorithm to classify images.
Image Classification - TensorFlow—fine-tunes pretrained TensorFlow Hub models for specific tasks (referred to as a supervised algorithm). Use this algorithm to classify images.
Semantic Segmentation Algorithm—provides a fine-grained, pixel-level approach to developing computer vision applications.
Object Detection - MXNet—detects and classifies objects in images using a single deep neural network. It is a supervised learning algorithm that takes images as input and identifies all instances of objects within the image scene.
Object Detection - TensorFlow—detects bounding boxes and object labels in an image. It is a supervised learning algorithm that supports transfer learning with available pretrained TensorFlow models.
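The difference between image-level and pixel-level output can be shown with a small sketch. It assumes a segmentation model has already produced a score for every class at every pixel (the scores and the three class names are hypothetical) and converts them into a label mask by taking the highest-scoring class per pixel, which is the per-pixel tagging that semantic segmentation performs. It illustrates the concept only and does not reproduce the output format of the SageMaker Semantic Segmentation Algorithm.

```python
from collections import Counter
from typing import List, Sequence

CLASS_NAMES = ["background", "road", "car"]  # hypothetical label set


def scores_to_mask(scores: Sequence[Sequence[Sequence[float]]]) -> List[List[int]]:
    """Pick the highest-scoring class id for every pixel; scores are indexed [row][col][class]."""
    return [[max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
            for row in scores]


# Made-up scores for a 2x3 image: one score per class at each pixel.
scores = [
    [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1], [0.1, 0.8, 0.1]],
    [[0.1, 0.2, 0.7], [0.05, 0.15, 0.8], [0.3, 0.6, 0.1]],
]
mask = scores_to_mask(scores)
print(mask)  # [[0, 1, 1], [2, 2, 1]]

# Unlike image classification, which returns one label for the whole image,
# the mask assigns a class to every pixel, so per-class pixel counts are available.
pixel_counts = Counter(CLASS_NAMES[c] for row in mask for c in row)
print(pixel_counts)
```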
