Creating an Amazon Rekognition Custom Labels Dataset

Datasets contain the images, labels, and bounding box information that is used to train and test an Amazon Rekognition Custom Labels model. Datasets are managed by Amazon Rekognition Custom Labels projects. You create the initial training dataset for a project during project creation. You can also add new and existing datasets to a project after the project is created. For example, you might create a dataset that is used to test a model.
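As a rough illustration of adding a test dataset to an existing project, the sketch below builds the parameters for the Rekognition CreateDataset operation as exposed by the AWS SDK for Python (boto3). It assumes the images are described by a manifest file already stored in Amazon S3. The project ARN, bucket name, and manifest key are hypothetical placeholders, and the helper function `build_create_dataset_request` is not part of any SDK.

```python
from typing import Any, Dict


def build_create_dataset_request(
    project_arn: str,
    dataset_type: str,
    bucket: str,
    manifest_key: str,
) -> Dict[str, Any]:
    """Build keyword arguments for the Rekognition CreateDataset operation."""
    if dataset_type not in ("TRAIN", "TEST"):
        raise ValueError("dataset_type must be 'TRAIN' or 'TEST'")
    return {
        "ProjectArn": project_arn,
        "DatasetType": dataset_type,
        "DatasetSource": {
            "GroundTruthManifest": {
                "S3Object": {"Bucket": bucket, "Name": manifest_key}
            }
        },
    }


# Hypothetical values, for illustration only.
request = build_create_dataset_request(
    project_arn="arn:aws:rekognition:us-east-1:111122223333:project/my-project/1234567890123",
    dataset_type="TEST",
    bucket="amzn-s3-demo-bucket",
    manifest_key="datasets/test/output.manifest",
)

# With AWS credentials configured, the request could be sent like this:
#   import boto3
#   boto3.client("rekognition").create_dataset(**request)
```

Building the request separately from the call keeps the example runnable without AWS credentials and makes the parameters easy to inspect before anything is created.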

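You can create a dataset using images from several locations. For example, you can use images stored in an Amazon S3 bucket, or you can import a SageMaker Ground Truth format manifest file.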

Purposing Datasets

We recommend that you create separate datasets for different image categories. For example, if you have images of scenic views and images of dogs, you should create a dogs dataset and a scenic views dataset.

Finding Objects, Scenes, and Concepts

If you want to create a model that predicts the presence of objects, scenes, and concepts in your images, you assign image-level labels to the images in your dataset. In this case, your dataset requires at least two labels. For example, the following image shows a scene. Your dataset could include image-level labels such as sunrise or countryside.

When you create your dataset, you can include image-level labels for your images. For example, if your images are stored in an Amazon S3 bucket, you can use folder names to add image-level labels. You can also add image-level labels to images after you create a dataset. For more information, see Assigning Image-Level Labels to an Image. You can add new labels as you need them. For more information, see Labeling the Dataset.
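The idea of using folder names as image-level labels can be sketched as follows. This is only an illustration of the concept, not how the service itself processes a bucket: it assumes the label is the name of the folder that directly contains each image, and the object keys and the function `labels_from_folders` are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, List


def labels_from_folders(keys: Iterable[str]) -> Dict[str, List[str]]:
    """Group object keys by the name of the folder that directly contains them."""
    grouped = defaultdict(list)
    for key in keys:
        parent = PurePosixPath(key).parent.name
        if parent:  # skip keys that are not inside any folder
            grouped[parent].append(key)
    return dict(grouped)


# Hypothetical object keys, for illustration only.
keys = [
    "images/sunrise/img1.jpg",
    "images/sunrise/img2.jpg",
    "images/countryside/img3.jpg",
    "notes.txt",
]
labels = labels_from_folders(keys)

# A dataset with image-level labels needs at least two labels.
has_enough_labels = len(labels) >= 2
```

Here the two folders yield the labels sunrise and countryside, which satisfies the two-label minimum for image-level datasets.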

The default number of images per dataset is 250,000. You can request an increase for up to 500,000 images per dataset. For more information, see Service Quotas.

Detecting Object Locations

To create a model that predicts the location of objects in your images, you define object location bounding boxes and labels for the images in your dataset. A bounding box is a box that tightly surrounds an object. For example, the following image shows bounding boxes around an Amazon Echo and an Amazon Echo Dot. Each bounding box has an assigned label (Amazon Echo or Amazon Echo Dot).

To detect object locations, your dataset needs at least one label. During model training, a further label is automatically created that represents the area outside of the bounding boxes on an image.

When you create your dataset, you can include bounding box information for your images. For example, you can import a SageMaker Ground Truth format manifest file that contains bounding boxes. You can also add bounding boxes after you create a dataset. For more information, see Drawing Bounding Boxes. You can add new labels as you need them. For more information, see Labeling the Dataset.
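To make the manifest idea more concrete, the following sketch writes one JSON line describing an image and its bounding boxes. The field names reflect the SageMaker Ground Truth object detection output format as commonly documented, but they are an assumption here and should be checked against the manifest documentation before use. The S3 path, pixel coordinates, job name, and the function `bounding_box_manifest_line` are hypothetical.

```python
import json
from typing import Dict, List


def bounding_box_manifest_line(
    source_ref: str,
    image_width: int,
    image_height: int,
    boxes: List[Dict],
    creation_date: str,
    attribute: str = "bounding-box",
) -> str:
    """Return one manifest line (JSON) for an image with labeled bounding boxes."""
    class_ids: Dict[str, int] = {}
    annotations = []
    for box in boxes:
        # Give each distinct label a numeric class ID in order of first use.
        class_id = class_ids.setdefault(box["label"], len(class_ids))
        annotations.append({
            "class_id": class_id,
            "left": box["left"],
            "top": box["top"],
            "width": box["width"],
            "height": box["height"],
        })
    record = {
        "source-ref": source_ref,
        attribute: {
            "image_size": [{"width": image_width, "height": image_height, "depth": 3}],
            "annotations": annotations,
        },
        attribute + "-metadata": {
            "objects": [{"confidence": 1} for _ in annotations],
            "class-map": {str(i): name for name, i in class_ids.items()},
            "type": "groundtruth/object-detection",
            "human-annotated": "yes",
            "creation-date": creation_date,
            "job-name": "example-job",  # hypothetical job name
        },
    }
    return json.dumps(record)


# Hypothetical image and box coordinates, in pixels.
line = bounding_box_manifest_line(
    source_ref="s3://amzn-s3-demo-bucket/images/devices.jpg",
    image_width=640,
    image_height=480,
    boxes=[
        {"label": "Amazon Echo", "left": 50, "top": 40, "width": 120, "height": 300},
        {"label": "Amazon Echo Dot", "left": 300, "top": 250, "width": 150, "height": 90},
    ],
    creation_date="2024-01-01T00:00:00",
)
```

Each image gets its own line in the manifest file, so a full manifest would be these lines written one after another.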

You can have up to 250,000 images per dataset. This value can't be increased. For more information, see Guidelines and Quotas in Amazon Rekognition Custom Labels.
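The label minimums and the default image limit described in this section and the preceding one can be checked before you create a dataset. The sketch below is a simple local check under those stated numbers. The function `check_dataset` and its arguments are hypothetical, and it does not query your account's actual quotas.

```python
from typing import List, Set

DEFAULT_MAX_IMAGES = 250_000  # default number of images per dataset


def check_dataset(image_count: int, labels: Set[str], uses_bounding_boxes: bool) -> List[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    # Object location datasets need one label; image-level datasets need two.
    min_labels = 1 if uses_bounding_boxes else 2
    if len(labels) < min_labels:
        problems.append(
            f"needs at least {min_labels} label(s), found {len(labels)}"
        )
    if image_count > DEFAULT_MAX_IMAGES:
        problems.append(
            f"{image_count} images exceeds the default limit of {DEFAULT_MAX_IMAGES}"
        )
    return problems


# One label is enough when detecting object locations...
object_location_problems = check_dataset(1_000, {"Amazon Echo"}, uses_bounding_boxes=True)

# ...but not when assigning image-level labels.
image_level_problems = check_dataset(1_000, {"sunrise"}, uses_bounding_boxes=False)
```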

Combining Image-Level and Bounding Box Labeled Images

You can combine image-level labels and bounding box labeled images in a single dataset. In this case, Amazon Rekognition Custom Labels chooses whether to create an image-level model or an object location model.

For information about adding labels to a dataset, see Labeling the Dataset.