Amazon SageMaker
Developer Guide

Explore and Preprocess Data

Before using a dataset to train a model, data scientists typically explore and preprocess it. For example, in one of the exercises in this guide, you use the MNIST dataset, a commonly available dataset of handwritten digits, for model training. Before you begin training, you transform the data into a format that is more efficient for training. For more information, see Step 2.2.3: Transform the Training Dataset and Upload It to Amazon S3.
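The exploration and preprocessing described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the conversion performed in the exercise referenced above: the images are hypothetical 28x28 grids of 0-255 pixel values (the shape of MNIST digits), and the helper names `explore`, `preprocess`, and `to_csv` are made up for this example. The output is CSV with the label in the first column and no header row, a layout that several SageMaker built-in algorithms accept for CSV training data; check the documentation of the algorithm you actually use.

```python
import csv
import io
from collections import Counter

# Hypothetical toy data shape: each image is a 28x28 grid of pixel
# intensities (0-255), like an MNIST digit. Real code would load the dataset.
IMAGE_SIDE = 28


def explore(labels):
    """Count the examples per label, a basic class-balance check."""
    return dict(sorted(Counter(labels).items()))


def preprocess(image):
    """Flatten a 28x28 image into one row and scale pixels from 0-255 to 0.0-1.0."""
    if len(image) != IMAGE_SIDE or any(len(row) != IMAGE_SIDE for row in image):
        raise ValueError("expected a 28x28 image")
    return [pixel / 255.0 for row in image for pixel in row]


def to_csv(images, labels):
    """Write one line per example: label first, then features, no header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for image, label in zip(images, labels):
        writer.writerow([label] + preprocess(image))
    return buffer.getvalue()


if __name__ == "__main__":
    blank = [[0] * IMAGE_SIDE for _ in range(IMAGE_SIDE)]
    bright = [[255] * IMAGE_SIDE for _ in range(IMAGE_SIDE)]
    labels = [7, 3, 7]
    print(explore(labels))  # {3: 1, 7: 2}
    data = to_csv([blank, bright, blank], labels)
    print(len(data.splitlines()))  # 3
```

Scaling pixels to the 0.0-1.0 range is a common preprocessing choice for image data; whether it helps depends on the training algorithm.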

To preprocess data, use one of the following methods:

  • Use a Jupyter notebook on an Amazon SageMaker notebook instance. You can also use the notebook instance to write code that creates model training jobs, deploys models to Amazon SageMaker hosting, and tests or validates your models. For more information, see Use Notebook Instances.

  • Use Amazon SageMaker batch transform to transform data with a model. For more information, see Step 2.4.2: Deploy the Model to Batch Transform.
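For the batch transform option, a job can be submitted with the `create_transform_job` operation of the boto3 SageMaker client. The sketch below only builds the request; the helper `build_transform_request` and every job, model, and bucket name in it are hypothetical placeholders, and the sketch assumes a model already exists in SageMaker and that the input is CSV split by line. The actual submission is left as a comment because it requires AWS credentials.

```python
import json


def build_transform_request(job_name, model_name, input_s3_uri, output_s3_uri,
                            instance_type="ml.m4.xlarge", instance_count=1):
    """Build keyword arguments for boto3's SageMaker create_transform_job call.

    All values are placeholders; the named model must already exist.
    """
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {
                # Use every object under this S3 prefix as input.
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3_uri}
            },
            "ContentType": "text/csv",  # input files hold CSV records
            "SplitType": "Line",        # split input files into records at line breaks
        },
        "TransformOutput": {"S3OutputPath": output_s3_uri},
        "TransformResources": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
        },
    }


if __name__ == "__main__":
    request = build_transform_request(
        "example-preprocess-job",
        "example-model",
        "s3://example-bucket/raw/",
        "s3://example-bucket/transformed/",
    )
    print(json.dumps(request, indent=2))
    # To submit for real (requires AWS credentials and an existing model):
    # import boto3
    # boto3.client("sagemaker").create_transform_job(**request)
```

Keeping request construction separate from submission makes the parameters easy to inspect before any AWS resources are used.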
