Amazon SageMaker
Developer Guide

Explore and Preprocess Data

Before using a dataset to train a model, data scientists typically explore and preprocess it. For example, in one of the exercises in this guide, you use the MNIST dataset, a publicly available dataset of images of handwritten digits, to train a model. Before you begin training, you transform the data into a format that is more efficient for training. For more information, see Step 4.3: Transform the Training Dataset and Upload It to Amazon S3.
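
As a rough illustration of the kind of transformation involved, the following minimal sketch flattens each image into a single feature vector, scales pixel values from the 0 to 255 range into [0.0, 1.0], and writes one CSV row per example with the label in the first column and no header row. This is not the conversion performed in Step 4.3: it assumes CSV (text/csv), a content type that many Amazon SageMaker built-in algorithms accept, and the function names are hypothetical.

```python
import csv
import io


def image_to_features(pixels):
    """Flatten a 2-D grid of 0-255 pixel values and scale each value to [0.0, 1.0]."""
    return [value / 255.0 for row in pixels for value in row]


def to_csv(images, labels):
    """Return CSV text with one row per example: label first, then features, no header."""
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for pixels, label in zip(images, labels):
        writer.writerow([label] + image_to_features(pixels))
    return buffer.getvalue()


if __name__ == "__main__":
    # A tiny 2x2 "image" stands in for a 28x28 MNIST image (784 features after flattening).
    print(to_csv([[[0, 255], [51, 102]]], [7]), end="")  # prints 7,0.0,1.0,0.2,0.4
```

For the real dataset you would typically write the resulting file to Amazon S3 so that a training job can read it.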

To preprocess data, use one of the following methods:

  • Use a Jupyter notebook on an Amazon SageMaker notebook instance. You can also use the notebook instance to do the following:

    • Write code to create model training jobs

    • Deploy models to Amazon SageMaker hosting

    • Test or validate your models

    For more information, see Use Notebook Instances.

  • Use Amazon SageMaker batch transform to transform data with a model. For more information, see Step 6.2: Deploy the Model with Batch Transform.
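
A batch transform job can be started from a notebook instance through the SageMaker CreateTransformJob API. The following sketch only builds the request parameters; it assumes a model named by the hypothetical model_name already exists in SageMaker, that the input is CSV stored under a hypothetical S3 prefix, and that ml.m4.xlarge is an acceptable instance type for your account and Region. The helper name is also hypothetical.

```python
def build_transform_job_request(job_name, model_name, input_s3_uri, output_s3_path,
                                instance_type="ml.m4.xlarge", instance_count=1):
    """Return keyword arguments for the SageMaker CreateTransformJob API (a sketch)."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,  # an existing SageMaker model (assumed)
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",  # use every object under this key prefix
                    "S3Uri": input_s3_uri,
                }
            },
            "ContentType": "text/csv",
            "SplitType": "Line",  # split each input file into records at line boundaries
        },
        "TransformOutput": {"S3OutputPath": output_s3_path},
        "TransformResources": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
        },
    }


if __name__ == "__main__":
    request = build_transform_job_request(
        job_name="mnist-batch-transform-example",   # hypothetical
        model_name="my-mnist-model",                # hypothetical
        input_s3_uri="s3://example-bucket/mnist/input/",     # hypothetical
        output_s3_path="s3://example-bucket/mnist/output/",  # hypothetical
    )
    print(request["TransformInput"]["DataSource"]["S3DataSource"]["S3Uri"])
    # To submit the job (requires AWS credentials and an existing model):
    # import boto3
    # boto3.client("sagemaker").create_transform_job(**request)
```

Keeping the request as a plain dictionary makes it easy to inspect before submitting; the transformed output is written to the S3OutputPath location.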
