向您的数据集中添加图像 - Amazon Lookout for Vision

本文属于机器翻译版本。若本译文内容与英语原文存在差异,则一律以英文原文为准。

向您的数据集中添加图像

创建数据集后,您可能会希望向数据集中添加更多图像。例如,如果模型评估表明模型性能不佳,则可以通过添加更多图像来提高模型的质量。如果您创建了测试数据集,添加更多图像可以提高模型性能指标的准确度。

更新数据集后重新训练您的模型。

添加更多图像

您可以从本地计算机上传图像,从而向您的数据集中添加更多图像。要使用 SDK 添加更多已标注的图像,请使用 UpdateDatasetEntries 操作。

向数据集中添加更多图像(控制台)
  1. 选择操作,然后选择要向其添加图像的数据集。

  2. 选择要上传到数据集的图像。可以从本地计算机拖动或选择要上传的图像。一次最多可以上传 30 张图像。

  3. 选择上传图像

  4. 选择 Save changes(保存更改)。

添加完更多图像后,您需要标注它们,以便它们可用来训练模型。有关更多信息,请参阅 图像分类(控制台)

添加更多图像 (SDK)

要使用 SDK 添加更多已标注的图像,请使用 UpdateDatasetEntries 操作。您需要提供一个清单文件,其中包含要添加的图像。您还可以在清单文件中 JSON 行的 source-ref 字段指定图像,从而更新现有图像。有关更多信息,请参阅 创建清单文件

向数据集中添加更多图像 (SDK)
  1. 安装并配置 AWS CLI 和 AWS SDK(如果尚未如此)。有关更多信息,请参阅 步骤 4:设置 AWS CLI 和 AWS 软件开发工具包

  2. 使用以下示例代码向数据集中添加更多图像。

    CLI

    更改以下值:

    • project-name 更改为包含要更新的数据集的项目的名称。

    • dataset-type 更改为要更新的数据集类型(traintest)。

    • changes 更改为包含数据集更新的清单文件的位置。

    aws lookoutvision update-dataset-entries\ --project-name project\ --dataset-type train or test\ --changes fileb://manifest file \ --profile lookoutvision-access
    Python

    此代码取自 AWS 文档 SDK 示例 GitHub 存储库。请在此处查看完整示例。

    @staticmethod def update_dataset_entries(lookoutvision_client, project_name, dataset_type, updates_file): """ Adds dataset entries to an Amazon Lookout for Vision dataset. :param lookoutvision_client: The Amazon Rekognition Custom Labels Boto3 client. :param project_name: The project that contains the dataset that you want to update. :param dataset_type: The type of the dataset that you want to update (train or test). :param updates_file: The manifest file of JSON Lines that contains the updates. """ try: status = "" status_message = "" manifest_file = "" # Update dataset entries logger.info(f"""Updating {dataset_type} dataset for project {project_name} with entries from {updates_file}.""") with open(updates_file) as f: manifest_file = f.read() lookoutvision_client.update_dataset_entries( ProjectName=project_name, DatasetType=dataset_type, Changes=manifest_file, ) finished = False while finished == False: dataset = lookoutvision_client.describe_dataset(ProjectName=project_name, DatasetType=dataset_type) status = dataset['DatasetDescription']['Status'] status_message = dataset['DatasetDescription']['StatusMessage'] if status == "UPDATE_IN_PROGRESS": logger.info( (f"Updating {dataset_type} dataset for project {project_name}.")) time.sleep(5) continue if status == "UPDATE_FAILED_ROLLBACK_IN_PROGRESS": logger.info( (f"Update failed, rolling back {dataset_type} dataset for project {project_name}.")) time.sleep(5) continue if status == "UPDATE_COMPLETE": logger.info( f"Dataset updated: {status} : {status_message} : {dataset_type} dataset for project {project_name}.") finished = True continue if status == "UPDATE_FAILED_ROLLBACK_COMPLETE": logger.info( f"Rollback complated after update failure: {status} : {status_message} : {dataset_type} dataset for project {project_name}.") finished = True continue logger.exception( f"Failed. Unexpected state for dataset update: {status} : {status_message} : {dataset_type} dataset for project {project_name}.") raise Exception( f"Failed. Unexpected state for dataset update: {status} : {status_message} :{dataset_type} dataset for project {project_name}.") logger.info(f"Added entries to dataset.") return status, status_message except ClientError as err: logger.exception( f"Couldn't update dataset: {err.response['Error']['Message']}") raise
    Java V2

    此代码取自 AWS 文档 SDK 示例 GitHub 存储库。请在此处查看完整示例。

    /** * Updates an Amazon Lookout for Vision dataset from a manifest file. * Returns after Lookout for Vision updates the dataset. * * @param lfvClient An Amazon Lookout for Vision client. * @param projectName The name of the project in which you want to update a * dataset. * @param datasetType The type of the dataset that you want to update (train or * test). * @param manifestFile The name and location of a local manifest file that you want to * use to update the dataset. * @return DatasetStatus The status of the updated dataset. */ public static DatasetStatus updateDatasetEntries(LookoutVisionClient lfvClient, String projectName, String datasetType, String updateFile) throws FileNotFoundException, LookoutVisionException, InterruptedException { logger.log(Level.INFO, "Updating {0} dataset for project {1}", new Object[] { datasetType, projectName }); InputStream sourceStream = new FileInputStream(updateFile); SdkBytes sourceBytes = SdkBytes.fromInputStream(sourceStream); UpdateDatasetEntriesRequest updateDatasetEntriesRequest = UpdateDatasetEntriesRequest.builder() .projectName(projectName) .datasetType(datasetType) .changes(sourceBytes) .build(); lfvClient.updateDatasetEntries(updateDatasetEntriesRequest); boolean finished = false; DatasetStatus status = null; // Wait until update completes. do { DescribeDatasetRequest describeDatasetRequest = DescribeDatasetRequest.builder() .projectName(projectName) .datasetType(datasetType) .build(); DescribeDatasetResponse describeDatasetResponse = lfvClient .describeDataset(describeDatasetRequest); DatasetDescription datasetDescription = describeDatasetResponse.datasetDescription(); status = datasetDescription.status(); switch (status) { case UPDATE_COMPLETE: logger.log(Level.INFO, "{0} Dataset updated for project {1}.", new Object[] { datasetType, projectName }); finished = true; break; case UPDATE_IN_PROGRESS: logger.log(Level.INFO, "{0} Dataset update for project {1} in progress.", new Object[] { datasetType, projectName }); TimeUnit.SECONDS.sleep(5); break; case UPDATE_FAILED_ROLLBACK_IN_PROGRESS: logger.log(Level.SEVERE, "{0} Dataset update failed for project {1}. Rolling back", new Object[] { datasetType, projectName }); TimeUnit.SECONDS.sleep(5); break; case UPDATE_FAILED_ROLLBACK_COMPLETE: logger.log(Level.SEVERE, "{0} Dataset update failed for project {1}. Rollback completed.", new Object[] { datasetType, projectName }); finished = true; break; default: logger.log(Level.SEVERE, "{0} Dataset update failed for project {1}. Unexpected error returned.", new Object[] { datasetType, projectName }); finished = true; } } while (!finished); return status; }
  3. 重复上一步并提供其他数据集类型的值。