Select your cookie preferences

We use essential cookies and similar tools that are necessary to provide our site and services. We use performance cookies to collect anonymous statistics, so we can understand how customers use our site and make improvements. Essential cookies cannot be deactivated, but you can choose “Customize” or “Decline” to decline performance cookies.

If you agree, AWS and approved third parties will also use cookies to provide useful site features, remember your preferences, and display relevant content, including relevant advertising. To accept or decline all non-essential cookies, choose “Accept” or “Decline.” To make more detailed choices, choose “Customize.”

Batch prediction dataset requirements

Focus mode
Batch prediction dataset requirements - Amazon SageMaker AI

For batch predictions, make sure that your datasets meet the requirements outlined in Create a dataset. If your dataset is larger than 5 GB, then Canvas uses Amazon EMR Serverless to process your data and split it into smaller batches. After your data has been split, Canvas uses SageMaker AI Batch Transform to make predictions. You may see charges from both of these services after running batch predictions. For more information, see Canvas pricing.

You might not be able to make predictions on some datasets if they have incompatible schemas. A schema is an organizational structure. For a tabular dataset, the schema is the names of the columns and the data type of the data in the columns. An incompatible schema might happen for one of the following reasons:

  • The dataset that you're using to make predictions has fewer columns than the dataset that you're using to build the model.

  • The data types in the columns you used to build the dataset might be different from the data types in dataset that you're using to make predictions.

  • The dataset that you're using to make predictions and the dataset that you've used to build the model have column names that don't match. The column names are case sensitive. Column1 is not the same as column1.

To ensure that you can successfully generate batch predictions, match the schema of your batch predictions dataset to the dataset you used to train the model.

Note

For batch predictions, if you dropped any columns when building your model, Canvas adds the dropped columns back to the prediction results. However, Canvas does not add the dropped columns to your batch predictions for time series models.

PrivacySite termsCookie preferences
© 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.