You can create a dataset using an Amazon SageMaker AI Ground Truth format manifest file. You can use the manifest file from an Amazon SageMaker AI Ground Truth job. If your images and labels aren't in the format of a SageMaker AI Ground Truth manifest file, you can create a SageMaker AI format manifest file and use it to import your labeled images.
The CreateDataset operation lets you optionally specify tags when you create a new dataset. Tags are key-value pairs that you can use to categorize and manage your resources.
Topics
- Creating a dataset with a SageMaker AI Ground Truth manifest file (Console)
- Creating a dataset with a SageMaker AI Ground Truth manifest file (SDK)
- Create dataset request
- Labeling images with an Amazon SageMaker AI Ground Truth job
- Creating a manifest file
- Importing image-level labels in manifest files
- Object localization in manifest files
- Validation rules for manifest files
- Converting other dataset formats to a manifest file
Creating a dataset with a SageMaker AI Ground Truth manifest file (Console)
The following procedure shows you how to create a dataset by using a SageMaker AI Ground Truth format manifest file.
-
Create a manifest file for your training dataset by doing one of the following:
-
Create a manifest file with a SageMaker AI Ground Truth job by following the instructions at Labeling images with an Amazon SageMaker AI Ground Truth job.
-
Create your own manifest file by following the instructions at Creating a manifest file.
If you want to create a test dataset, repeat step 1 to create a manifest file for the test dataset.
-
Open the Amazon Rekognition console at https://console.aws.amazon.com/rekognition/.
-
Choose Use Custom Labels.
-
Choose Get started.
-
In the left navigation pane, choose Projects.
-
On the Projects page, choose the project to which you want to add a dataset. The details page for your project is displayed.
-
Choose Create dataset. The Create dataset page is shown.
-
In Starting configuration, choose either Start with a single dataset or Start with a training dataset. To create a higher quality model, we recommend starting with separate training and test datasets.
-
In the Training dataset details section, choose Import images labeled by SageMaker Ground Truth.
-
In .manifest file location, enter the location of the manifest file that you created in step 1.
-
Choose Create Dataset. The datasets page for your project opens.
-
If you need to add or change labels, see Labeling images.
-
Follow the steps in Training a model (Console) to train your model.
Creating a dataset with a SageMaker AI Ground Truth manifest file (SDK)
The following procedure shows you how to create training or test datasets from a manifest file by using the CreateDataset API.
You can use an existing manifest file, such as the output from a SageMaker AI Ground Truth job, or create your own manifest file.
-
If you haven't already done so, install and configure the AWS CLI and the AWS SDKs. For more information, see Step 4: Set up the AWS CLI and AWS SDKs.
-
Create a manifest file for your training dataset by doing one of the following:
-
Create a manifest file with a SageMaker AI Ground Truth job by following the instructions at Labeling images with an Amazon SageMaker AI Ground Truth job.
-
Create your own manifest file by following the instructions at Creating a manifest file.
If you want to create a test dataset, repeat step 2 to create a manifest file for the test dataset.
-
Use the following AWS CLI command to create a dataset. Run it once for the training dataset and, if you want a test dataset, once more for the test dataset. Replace the following:
-
project_arn: the ARN of the project that you want to add the dataset to.
-
type: the type of dataset that you want to create (TRAIN or TEST).
-
bucket: the bucket that contains the manifest file for the dataset.
-
manifest_file: the path and file name of the manifest file.
aws rekognition create-dataset --project-arn project_arn \
  --dataset-type type \
  --dataset-source '{ "GroundTruthManifest": { "S3Object": { "Bucket": "bucket", "Name": "manifest_file" } } }' \
  --profile custom-labels-access \
  --tags '{"key1": "value1", "key2": "value2"}'
-
If you need to add or change labels, see Managing Labels (SDK).
-
Follow the steps in Training a model (SDK) to train your model.
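The same request can also be assembled in Python. The following is a minimal sketch, not an official sample: the helper names build_create_dataset_request and create_dataset are hypothetical, and the client is assumed to be any object that exposes a create_dataset method accepting the request fields as keyword arguments (for example, a boto3 Rekognition client).

```python
from typing import Any, Dict, Optional


def build_create_dataset_request(
    project_arn: str,
    dataset_type: str,
    bucket: str,
    manifest_file: str,
    tags: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Build a CreateDataset request that points at a manifest file in Amazon S3.

    Hypothetical helper: the field names follow the request format in this
    section, but the function itself is illustrative.
    """
    if dataset_type not in ("TRAIN", "TEST"):
        raise ValueError("dataset_type must be TRAIN or TEST")

    request: Dict[str, Any] = {
        "ProjectArn": project_arn,
        "DatasetType": dataset_type,
        "DatasetSource": {
            "GroundTruthManifest": {
                "S3Object": {"Bucket": bucket, "Name": manifest_file}
            }
        },
    }
    # Tags are optional, so add the field only when tags were supplied.
    if tags:
        request["Tags"] = dict(tags)
    return request


def create_dataset(client: Any, **kwargs: Any) -> Any:
    """Send the request through a client that exposes create_dataset.

    Assumption: `client` behaves like a boto3 Rekognition client, e.g.
        import boto3
        session = boto3.Session(profile_name="custom-labels-access")
        client = session.client("rekognition")
    """
    return client.create_dataset(**build_create_dataset_request(**kwargs))
```

Keeping the request builder separate from the call makes it possible to inspect or log the request before anything is sent, and to call it twice (once with TRAIN, once with TEST) when you use separate training and test datasets.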
Create dataset request
The following is the format of the CreateDataset operation request:
{
   "DatasetSource": {
      "DatasetArn": "string",
      "GroundTruthManifest": {
         "S3Object": {
            "Bucket": "string",
            "Name": "string",
            "Version": "string"
         }
      }
   },
   "DatasetType": "string",
   "ProjectArn": "string",
   "Tags": {
      "string": "string"
   }
}
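The S3Object in the request takes the bucket and the object name as separate fields, whereas a manifest location is often written as a single s3:// URI. The following sketch shows one way to split such a URI into the S3Object shape shown above. The function name s3_uri_to_s3_object is hypothetical, and the sketch assumes the URI has the form s3://bucket/key.

```python
from typing import Dict, Optional


def s3_uri_to_s3_object(uri: str, version: Optional[str] = None) -> Dict[str, str]:
    """Split an s3://bucket/key URI into the S3Object fields of the request.

    Hypothetical helper; assumes the URI has the form s3://bucket/key.
    """
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError("expected a URI that starts with s3://")

    # Everything before the first slash is the bucket; the rest is the object name.
    bucket, _, name = uri[len(prefix):].partition("/")
    if not bucket or not name:
        raise ValueError("URI must contain both a bucket and an object name")

    s3_object = {"Bucket": bucket, "Name": name}
    # Version is optional, so include it only when one was given.
    if version is not None:
        s3_object["Version"] = version
    return s3_object
```

The returned dictionary can be placed under DatasetSource.GroundTruthManifest.S3Object in the request.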