Build a custom model - Amazon SageMaker

Build a custom model

Use Amazon SageMaker Canvas to build a custom model on the dataset that you've imported. Use the model that you've built to make predictions on new data. SageMaker Canvas uses the information in the dataset to build up to 250 models and choose the one that performs the best.

When you begin building a model, Canvas automatically recommends one or more model types. Model types fall into one of the following categories:

  • Numeric prediction – This is known as regression in machine learning. Use the numeric prediction model type when you want to make predictions for numeric data. For example, you might want to predict the price of houses based on features such as the house’s square footage.

  • Categorical prediction – This is known as classification in machine learning. When you want to categorize data into groups, use the categorical prediction model types:

    • 2 category prediction – Use the 2 category prediction model type (also known as binary classification in machine learning) when you have two categories that you want to predict for your data. For example, you might want to determine whether a customer is likely to churn.

    • 3+ category prediction – Use the 3+ category prediction model type (also known as multi-class classification in machine learning) when you have three or more categories that you want to predict for your data. For example, you might want to predict a customer's loan status based on features such as previous payments.

  • Time series forecasting – Use time series forecasts when you want to make predictions over a period of time. For example, you might want to predict the number of items you’ll sell in the next quarter. For information about time series forecasts, see Time Series Forecasts in Amazon SageMaker Canvas.

  • Image prediction – Use the single-label image prediction model type (also known as single-label image classification in machine learning) when you want to assign labels to images. For example, you might want to classify different types of manufacturing defects in images of your product.

  • Text prediction – Use the multi-category text prediction model type (also known as multi-class text classification in machine learning) when you want to assign labels to passages of text. For example, you might have a dataset of customer reviews for a product, and you want to determine whether customers liked or disliked the product. You might have your model predict whether a given passage of text is Positive, Negative, or Neutral.

For a table of the supported input data types for each model type, see Use custom models.

For each tabular data model that you build (which includes numeric, categorical, time series forecasting, and text prediction models), you choose the Target column. The Target column is the column that contains the information that you want to predict. For example, if you're building a model to predict whether people have cancelled their subscriptions, the Target column contains data points that are either a yes or a no about someone's cancellation status.
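As a rough illustration of how the values in a Target column relate to the numeric and categorical model types described earlier, the following Python sketch suggests a model type from a list of target values. The function name `suggest_model_type` and its rules are hypothetical simplifications for illustration only. They are not the logic that Canvas uses to make its recommendations.

```python
def suggest_model_type(target_values):
    """Suggest a model type from Target column values (illustrative assumption only)."""
    # Ignore missing entries when inspecting the column.
    values = [v for v in target_values if v is not None]
    if not values:
        raise ValueError("Target column has no values")

    # bool is a subclass of int in Python, so exclude it from the numeric check.
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if is_numeric:
        return "Numeric prediction"

    # Non-numeric targets are treated as categories and counted.
    distinct = len(set(values))
    if distinct < 2:
        raise ValueError("Target column needs at least two categories")
    if distinct == 2:
        return "2 category prediction"
    return "3+ category prediction"


print(suggest_model_type(["yes", "no", "yes"]))
print(suggest_model_type([250000, 310500.0]))
print(suggest_model_type(["paid", "late", "default"]))
```

A real tool would likely apply more nuanced rules. For example, this sketch always treats a numeric column as a numeric prediction target, even when it holds only a few distinct values.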

For image prediction models, you build the model with a dataset of images that have been assigned labels. For the unlabeled images that you provide, the model predicts a label. For example, if you’re building a model to predict whether an image is a cat or a dog, you provide images labeled as cats or dogs when building the model. Then, the model can accept unlabeled images and predict them as either cats or dogs.

What happens when you build a model

To build your model, you can choose either a Quick build or a Standard build. A Quick build finishes sooner, but a Standard build generally produces a more accurate model. The following table lists the typical build times for each model type and build type, along with the maximum number of entries (rows or images) for Quick builds.

| Limit | Numeric and categorical prediction | Time series forecasting | Image prediction | Text prediction |
|---|---|---|---|---|
| Quick build time | 2–20 minutes | 2–20 minutes | 15–30 minutes | 15–30 minutes |
| Standard build time | 2–4 hours | 2–4 hours | 2–5 hours | 2–5 hours |
| Maximum number of entries (rows or images) for Quick builds | 50,000 | 50,000 | 5,000 | 7,500 |

If you log out while a Quick build is running, the build might be interrupted. When you log in again, Canvas resumes the Quick build.
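If you prepare datasets outside of Canvas, a small check like the following Python sketch can tell you whether a dataset is within the Quick build entry limits from the table above. The dictionary keys and the function name `fits_quick_build` are hypothetical names chosen for this example. Only the numeric limits come from the table.

```python
# Maximum entries (rows or images) for Quick builds, copied from the table above.
# The keys are hypothetical labels chosen for this sketch.
QUICK_BUILD_MAX_ENTRIES = {
    "numeric_or_categorical": 50_000,
    "time_series": 50_000,
    "image": 5_000,
    "text": 7_500,
}


def fits_quick_build(model_type, entry_count):
    """Return True if entry_count is within the Quick build limit for model_type."""
    if model_type not in QUICK_BUILD_MAX_ENTRIES:
        raise KeyError(f"Unknown model type: {model_type}")
    if entry_count < 0:
        raise ValueError("entry_count must not be negative")
    return entry_count <= QUICK_BUILD_MAX_ENTRIES[model_type]


print(fits_quick_build("image", 5_000))
print(fits_quick_build("text", 7_501))
```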

Canvas predicts values by using the information in the rest of the dataset, depending on the model type:

  • For categorical prediction, Canvas puts each row into one of the categories listed in the Target column.

  • For numeric prediction, Canvas uses the information in the dataset to predict the numeric values in the Target column.

  • For time series forecasting, Canvas uses historical data to predict values for the Target column in the future.

  • For image prediction, Canvas uses images that have been assigned labels to predict labels for unlabeled images.

  • For text prediction, Canvas analyzes text data that has been assigned labels to predict labels for passages of unlabeled text.

Additional features to help you build your model

Note

The following features are available for numeric prediction, categorical prediction, and time series forecasting models.

Before building your model, you can filter your data or prepare it using advanced transforms. For more information about preparing your data for model building, see Prepare data with advanced transformations.

You can also use visualization and analytics to explore your data and determine which features are best to include in your model. For more information, see Explore and analyze your data.

To learn more about additional features such as previewing your model, validating your dataset, and changing the size of the random sample used to build your model, see Preview your model.

For tabular datasets with multiple columns (such as datasets for building categorical, numeric, or time series forecasting model types), some rows might have missing data points. While Canvas builds the model, it automatically fills in the missing values by using the other values in your dataset to compute a mathematical approximation. For the highest model accuracy, we recommend supplying the actual data if you can find it. Note that automatic handling of missing values is not supported for text prediction or image prediction models.
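To make the idea of approximating missing values concrete, the following Python sketch fills gaps in a single column. This page doesn't specify which approximation Canvas applies, so the strategy here is an assumption for illustration: the mean for numeric columns and the most common value for other columns. The function name `fill_missing` is also hypothetical.

```python
from statistics import mean, mode


def fill_missing(column):
    """Return a copy of column with None entries replaced by an approximation.

    Assumed strategy (not confirmed as the Canvas method): the mean for
    numeric columns, the most common value otherwise.
    """
    present = [v for v in column if v is not None]
    if not present:
        # Nothing to approximate from, so return the column unchanged.
        return list(column)

    # bool is a subclass of int in Python, so exclude it from the numeric check.
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    fill = mean(present) if numeric else mode(present)
    return [fill if v is None else v for v in column]


print(fill_missing([10.0, None, 20.0]))
print(fill_missing(["a", None, "a", "b"]))
```

An approximation like this keeps every row usable, but the filled-in values are estimates, which is why supplying the real data generally leads to a more accurate model.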

Get started

To get started with building a custom model, see Build a model and follow the procedure for the type of model that you want to build.