Adding images to your dataset
After you create a dataset, you might want to add more images to the dataset. For example, if model
evaluation indicates a poor model, you can enhance the quality of your model by adding more images. If you have created a
test dataset, adding more images can increase the accuracy of your model's performance
metrics.
Retrain your model after updating your datasets.
Adding more images
You can add more images to your datasets by uploading images from your local
computer. To add more labeled images with the SDK, use the UpdateDatasetEntries
operation.
To add more images to your dataset (console)
-
Choose Actions and select the dataset that you want
to add images to.
-
Choose the images you want to upload to the dataset. You can drag the
images or choose the images that you want to upload from your local
computer. You can upload up to 30 images at a time.
-
Choose Upload images.
-
Choose Save changes.
When you are done adding more images, you need to label them so that they can be
used to train the model. For more information, see Classifying images (console).
Adding more images (SDK)
To add more labeled images with the SDK, use the UpdateDatasetEntries
operation. You supply a manifest file that contains the
images that you want to add. You can also update existing images by
specifying the image in the
source-ref
field of the JSON line in the manifest file. For more information, see
Creating a manifest file.
To add more images to a dataset (SDK)
-
If you haven't already done so, install and configure the AWS CLI and the AWS SDKs. For more information, see
Step 4: Set up the AWS CLI and AWS SDKs.
-
Use the following example code to add more images to a dataset.
- CLI
-
Change the following values:
-
project-name
to the name of the
project that contains the dataset you want to update.
-
dataset-type
to the type of dataset
that you want to update (train
or
test
).
-
changes
to the location the manifest file
that contain dataset updates.
aws lookoutvision update-dataset-entries\
--project-name project
\
--dataset-type train or test
\
--changes fileb://manifest file
\
--profile lookoutvision-access
- Python
-
This code is taken from the AWS Documentation SDK examples GitHub repository. See the full example
here.
@staticmethod
def update_dataset_entries(lookoutvision_client, project_name, dataset_type, updates_file):
"""
Adds dataset entries to an Amazon Lookout for Vision dataset.
:param lookoutvision_client: The Amazon Rekognition Custom Labels Boto3 client.
:param project_name: The project that contains the dataset that you want to update.
:param dataset_type: The type of the dataset that you want to update (train or test).
:param updates_file: The manifest file of JSON Lines that contains the updates.
"""
try:
status = ""
status_message = ""
manifest_file = ""
# Update dataset entries
logger.info(f"""Updating {dataset_type} dataset for project {project_name}
with entries from {updates_file}.""")
with open(updates_file) as f:
manifest_file = f.read()
lookoutvision_client.update_dataset_entries(
ProjectName=project_name,
DatasetType=dataset_type,
Changes=manifest_file,
)
finished = False
while finished == False:
dataset = lookoutvision_client.describe_dataset(ProjectName=project_name,
DatasetType=dataset_type)
status = dataset['DatasetDescription']['Status']
status_message = dataset['DatasetDescription']['StatusMessage']
if status == "UPDATE_IN_PROGRESS":
logger.info(
(f"Updating {dataset_type} dataset for project {project_name}."))
time.sleep(5)
continue
if status == "UPDATE_FAILED_ROLLBACK_IN_PROGRESS":
logger.info(
(f"Update failed, rolling back {dataset_type} dataset for project {project_name}."))
time.sleep(5)
continue
if status == "UPDATE_COMPLETE":
logger.info(
f"Dataset updated: {status} : {status_message} : {dataset_type} dataset for project {project_name}.")
finished = True
continue
if status == "UPDATE_FAILED_ROLLBACK_COMPLETE":
logger.info(
f"Rollback complated after update failure: {status} : {status_message} : {dataset_type} dataset for project {project_name}.")
finished = True
continue
logger.exception(
f"Failed. Unexpected state for dataset update: {status} : {status_message} : {dataset_type} dataset for project {project_name}.")
raise Exception(
f"Failed. Unexpected state for dataset update: {status} : {status_message} :{dataset_type} dataset for project {project_name}.")
logger.info(f"Added entries to dataset.")
return status, status_message
except ClientError as err:
logger.exception(
f"Couldn't update dataset: {err.response['Error']['Message']}")
raise
- Java V2
-
This code is taken from the AWS Documentation SDK examples GitHub repository. See the full example
here.
/**
* Updates an Amazon Lookout for Vision dataset from a manifest file.
* Returns after Lookout for Vision updates the dataset.
*
* @param lfvClient An Amazon Lookout for Vision client.
* @param projectName The name of the project in which you want to update a
* dataset.
* @param datasetType The type of the dataset that you want to update (train or
* test).
* @param manifestFile The name and location of a local manifest file that you want to
* use to update the dataset.
* @return DatasetStatus The status of the updated dataset.
*/
public static DatasetStatus updateDatasetEntries(LookoutVisionClient lfvClient, String projectName,
String datasetType, String updateFile) throws FileNotFoundException, LookoutVisionException,
InterruptedException {
logger.log(Level.INFO, "Updating {0} dataset for project {1}",
new Object[] { datasetType, projectName });
InputStream sourceStream = new FileInputStream(updateFile);
SdkBytes sourceBytes = SdkBytes.fromInputStream(sourceStream);
UpdateDatasetEntriesRequest updateDatasetEntriesRequest = UpdateDatasetEntriesRequest.builder()
.projectName(projectName)
.datasetType(datasetType)
.changes(sourceBytes)
.build();
lfvClient.updateDatasetEntries(updateDatasetEntriesRequest);
boolean finished = false;
DatasetStatus status = null;
// Wait until update completes.
do {
DescribeDatasetRequest describeDatasetRequest = DescribeDatasetRequest.builder()
.projectName(projectName)
.datasetType(datasetType)
.build();
DescribeDatasetResponse describeDatasetResponse = lfvClient
.describeDataset(describeDatasetRequest);
DatasetDescription datasetDescription = describeDatasetResponse.datasetDescription();
status = datasetDescription.status();
switch (status) {
case UPDATE_COMPLETE:
logger.log(Level.INFO, "{0} Dataset updated for project {1}.",
new Object[] { datasetType, projectName });
finished = true;
break;
case UPDATE_IN_PROGRESS:
logger.log(Level.INFO, "{0} Dataset update for project {1} in progress.",
new Object[] { datasetType, projectName });
TimeUnit.SECONDS.sleep(5);
break;
case UPDATE_FAILED_ROLLBACK_IN_PROGRESS:
logger.log(Level.SEVERE,
"{0} Dataset update failed for project {1}. Rolling back",
new Object[] { datasetType, projectName });
TimeUnit.SECONDS.sleep(5);
break;
case UPDATE_FAILED_ROLLBACK_COMPLETE:
logger.log(Level.SEVERE,
"{0} Dataset update failed for project {1}. Rollback completed.",
new Object[] { datasetType, projectName });
finished = true;
break;
default:
logger.log(Level.SEVERE,
"{0} Dataset update failed for project {1}. Unexpected error returned.",
new Object[] { datasetType, projectName });
finished = true;
}
} while (!finished);
return status;
}
-
Repeat the previous step and provide values for the other dataset
type.