Creating training and test datasets with images

You can start with a project that has a single dataset, or a project that has separate training and test datasets. If you start with a single dataset, Amazon Rekognition Custom Labels splits your dataset during training to create a training dataset (80%) and a test dataset (20%) for your project. Start with a single dataset if you want Amazon Rekognition Custom Labels to decide which images are used for training and which for testing. For complete control over training, testing, and performance tuning, we recommend that you start your project with separate training and test datasets.

You can create training and test datasets for a project by importing images from one of the following locations:

If you start your project with separate training and test datasets, you can use different source locations for each dataset.

Depending on where you import your images from, your images might be unlabeled. For example, images imported from a local computer aren't labeled. Images imported from an Amazon SageMaker Ground Truth manifest file are labeled. You can use the Amazon Rekognition Custom Labels console to add, change, and assign labels. For more information, see Labeling images.

If images are uploading with errors, images are missing, or labels are missing from images, read Debugging a failed model training.

For more information about datasets, see Managing datasets.

Create training and test datasets (SDK)

You can use the AWS SDK to create training and test datasets.

The CreateDataset operation lets you optionally specify tags when you create a dataset, so that you can categorize and manage your resources.
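As a rough illustration of tagging at creation time, the sketch below builds the arguments for a CreateDataset call and sends them through a client object passed in by the caller. It assumes the boto3 Rekognition client and its `create_dataset` method with `ProjectArn`, `DatasetType`, and `Tags` parameters. The helper names, the ARN, and the tag keys are hypothetical and not part of the original guide.

```python
from typing import Dict, Optional


def build_create_dataset_request(
    project_arn: str,
    dataset_type: str,
    tags: Optional[Dict[str, str]] = None,
) -> dict:
    """Build keyword arguments for a CreateDataset call (hypothetical helper)."""
    if dataset_type not in ("TRAIN", "TEST"):
        raise ValueError("dataset_type must be 'TRAIN' or 'TEST'")
    request = {"ProjectArn": project_arn, "DatasetType": dataset_type}
    if tags:
        # Tags are optional key-value pairs used to categorize the dataset.
        request["Tags"] = dict(tags)
    return request


def create_tagged_dataset(client, project_arn: str, dataset_type: str,
                          tags: Optional[Dict[str, str]] = None) -> str:
    """Call CreateDataset on the supplied client and return the new dataset ARN."""
    request = build_create_dataset_request(project_arn, dataset_type, tags)
    response = client.create_dataset(**request)
    return response["DatasetArn"]
```

With boto3 installed and credentials configured, a call might look like `create_tagged_dataset(boto3.client("rekognition"), "<project ARN>", "TRAIN", {"team": "vision"})`. Passing the client in as a parameter keeps the helpers easy to exercise with a stub.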

Training dataset

You can use the AWS SDK to create a training dataset in the following ways:

Use CreateDataset with an Amazon SageMaker format manifest file that you provide. For more information, see Creating a manifest file. For example code, see Creating a dataset with a SageMaker Ground Truth manifest file (SDK).
Use CreateDataset to copy an existing Amazon Rekognition Custom Labels dataset. For example code, see Creating a dataset using an existing dataset (SDK).
Create an empty dataset with CreateDataset and add dataset entries at a later time with UpdateDatasetEntries. To create an empty dataset, see Adding a dataset to a project. To add images to a dataset, see Adding more images (SDK). You need to add the dataset entries before you can train a model.
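The three options above can be sketched as follows. This is a minimal outline, assuming the boto3 Rekognition client, where `create_dataset` accepts an optional `DatasetSource` (either a `GroundTruthManifest` S3 object or a `DatasetArn` to copy) and `update_dataset_entries` accepts JSON Lines bytes under `Changes["GroundTruth"]`. All function names, bucket names, and ARNs are hypothetical.

```python
import json
from typing import Iterable, Optional


def manifest_source(bucket: str, key: str) -> dict:
    """DatasetSource for a SageMaker format manifest file stored in Amazon S3."""
    return {"GroundTruthManifest": {"S3Object": {"Bucket": bucket, "Name": key}}}


def copy_source(existing_dataset_arn: str) -> dict:
    """DatasetSource that copies an existing Custom Labels dataset."""
    return {"DatasetArn": existing_dataset_arn}


def create_training_dataset(client, project_arn: str,
                            source: Optional[dict] = None) -> str:
    """Create a training dataset; with no source, the dataset starts empty."""
    request = {"ProjectArn": project_arn, "DatasetType": "TRAIN"}
    if source is not None:
        request["DatasetSource"] = source
    return client.create_dataset(**request)["DatasetArn"]


def add_entries(client, dataset_arn: str, entries: Iterable[dict]) -> None:
    """Add entries to a dataset as JSON Lines (one JSON object per line)."""
    payload = "\n".join(json.dumps(entry) for entry in entries).encode("utf-8")
    client.update_dataset_entries(
        DatasetArn=dataset_arn,
        Changes={"GroundTruth": payload},
    )
```

For example, `create_training_dataset(client, project_arn, manifest_source("my-bucket", "train/output.manifest"))` would cover the first option, and `create_training_dataset(client, project_arn)` followed later by `add_entries(...)` would cover the third. Dataset creation is asynchronous, so real code would likely check the dataset status with DescribeDataset before adding entries or training.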

Test dataset

You can use the AWS SDK to create a test dataset in the following ways:

Use CreateDataset with an Amazon SageMaker format manifest file that you provide. For more information, see Creating a manifest file. For example code, see Creating a dataset with a SageMaker Ground Truth manifest file (SDK).
Use CreateDataset to copy an existing Amazon Rekognition Custom Labels dataset. For example code, see Creating a dataset using an existing dataset (SDK).
Create an empty dataset with CreateDataset and add dataset entries at a later time with UpdateDatasetEntries. To create an empty dataset, see Adding a dataset to a project. To add images to a dataset, see Adding more images (SDK). You need to add the dataset entries before you can train a model.
Split the training dataset into separate training and test datasets. First create an empty test dataset with CreateDataset. Then move 20% of the training dataset entries into the test dataset by calling DistributeDatasetEntries. To create an empty dataset, see Adding a dataset to a project (SDK). To split the training dataset, see Distributing a training dataset (SDK).
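The split described in the last option might look like the sketch below: create an empty test dataset, wait for it to finish creating, then call DistributeDatasetEntries with both dataset ARNs. It assumes the boto3 Rekognition client methods `create_dataset`, `describe_dataset` (returning `DatasetDescription["Status"]`), and `distribute_dataset_entries` (taking a `Datasets` list of `{"Arn": ...}` items). The function name, polling interval, and attempt limit are hypothetical choices.

```python
import time


def split_training_dataset(client, project_arn: str, training_dataset_arn: str,
                           poll_seconds: float = 5.0, max_attempts: int = 60) -> str:
    """Create an empty test dataset and distribute training entries into it."""
    # Step 1: create an empty test dataset (no DatasetSource supplied).
    test_arn = client.create_dataset(
        ProjectArn=project_arn, DatasetType="TEST"
    )["DatasetArn"]

    # Step 2: wait until the new dataset leaves the in-progress state.
    status = "CREATE_IN_PROGRESS"
    for _ in range(max_attempts):
        description = client.describe_dataset(DatasetArn=test_arn)
        status = description["DatasetDescription"]["Status"]
        if status != "CREATE_IN_PROGRESS":
            break
        time.sleep(poll_seconds)
    if status != "CREATE_COMPLETE":
        raise RuntimeError(f"Test dataset not ready, status: {status}")

    # Step 3: move a share of the training entries into the test dataset.
    client.distribute_dataset_entries(
        Datasets=[{"Arn": training_dataset_arn}, {"Arn": test_arn}]
    )
    return test_arn
```

The distribution itself also runs asynchronously, so a caller would probably poll both datasets again before starting training.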

