Importing images with a manifest file - Rekognition



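You can create a dataset by using an Amazon SageMaker Ground Truth format manifest file. You can use the manifest file from an Amazon SageMaker Ground Truth job. If your images and labels aren't in the format of a SageMaker Ground Truth manifest file, you can create a SageMaker format manifest file and use it to import your labeled images.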
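If you write your own manifest file, each line is a standalone JSON object. The following Python sketch builds one line for an image-level classification manifest, assuming the SageMaker Ground Truth image-classification output format (a `source-ref` key, a label attribute holding the class index, and a matching `-metadata` object). The label attribute name `animal`, the job name, and the S3 URI are hypothetical placeholders; object-detection manifests use a different structure that this sketch does not cover.

```python
import json
from datetime import datetime, timezone


def classification_manifest_line(image_s3_uri, label_attribute, class_index, class_name,
                                 job_name="labeling-job/my-labeling-job"):
    """Return one JSON Lines entry for an image-level classification manifest.

    label_attribute and job_name are hypothetical placeholders; the remaining
    keys follow the SageMaker Ground Truth image-classification output format.
    """
    entry = {
        "source-ref": image_s3_uri,  # S3 location of the image
        label_attribute: class_index,  # integer index of the class
        f"{label_attribute}-metadata": {
            "confidence": 1,
            "job-name": job_name,
            "class-name": class_name,
            "human-annotated": "yes",
            "creation-date": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f"),
            "type": "groundtruth/image-classification",
        },
    }
    # A manifest file is JSON Lines: one compact JSON object per line.
    return json.dumps(entry, separators=(",", ":"))


line = classification_manifest_line(
    "s3://amzn-s3-demo-bucket/images/dog1.jpg", "animal", 0, "dog")
print(line)
```

Append one such line per image to a text file, upload it to Amazon S3, and use its location as the manifest file location.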

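The CreateDataset operation has been updated so that you can optionally specify tags when you create a new dataset. Tags are key-value pairs that you can use to categorize and manage your resources.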
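Tags are passed to CreateDataset as a map of string keys to string values. As a minimal sketch, assuming you collect tags as hypothetical `key=value` strings (for example, from a command line), the helper below converts them into that map. The commented call assumes a boto3 version whose `create_dataset` method accepts the `Tags` parameter.

```python
def parse_tags(pairs):
    """Convert hypothetical "key=value" strings into the Tags map for CreateDataset."""
    tags = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        # Reject entries without "=" or with an empty key.
        if not sep or not key:
            raise ValueError(f"Expected key=value, got {pair!r}")
        tags[key] = value
    return tags


tags = parse_tags(["team=vision", "stage=dev"])
# With boto3 the map would be passed as, for example:
# rek_client.create_dataset(ProjectArn=..., DatasetType="TRAIN", DatasetSource=..., Tags=tags)
print(tags)
```

Only the first `=` splits key from value, so a value may itself contain `=`.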

Creating a dataset with a SageMaker Ground Truth manifest file (Console)

The following procedure shows you how to create a dataset by using a SageMaker Ground Truth format manifest file.

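  1. Create the manifest file for your training dataset by doing one of the following:

    • Use the manifest file that's output by an Amazon SageMaker Ground Truth job.

    • Create your own SageMaker Ground Truth format manifest file.

    If you want to create a test dataset, repeat step 1 to create the manifest file for the test dataset.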

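  2. Open the Amazon Rekognition console at https://console.aws.amazon.com/rekognition/.

  3. Choose Use Custom Labels.

  4. Choose Get started.

  5. In the left navigation pane, choose Projects.

  6. On the Projects page, choose the project to which you want to add a dataset. The details page for your project is displayed.

  7. Choose Create dataset. The Create dataset page is displayed.

  8. In Starting configuration, choose either Start with a single dataset or Start with a training dataset. To create a higher quality model, we recommend that you start with separate training and test datasets.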

    Single dataset
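    1. In the Training dataset details section, choose Import images labeled by SageMaker Ground Truth.

    2. In .manifest file location, enter the location of the manifest file that you created in step 1.

    3. Choose Create Datasets. The datasets page for your project opens.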

    Separate training and test datasets
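    1. In the Training dataset details section, choose Import images labeled by SageMaker Ground Truth.

    2. In .manifest file location, enter the location of the training dataset manifest file that you created in step 1.

    3. In the Test dataset details section, choose Import images labeled by SageMaker Ground Truth.

      Note

      Your training and test datasets can have different image sources.

    4. In .manifest file location, enter the location of the test dataset manifest file that you created in step 1.

    5. Choose Create Datasets. The datasets page for your project opens.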

  9. If you need to add or change labels, do Labeling images.

  10. Follow the steps in Training a model (Console) to train your model.
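If you start from one labeled manifest file but want separate training and test datasets, as step 8 recommends, you need two manifest files. The Python sketch below shows one way to split manifest lines reproducibly. The 20 percent test fraction, the fixed seed, and the `split_manifest` name are assumptions for illustration, not console behavior.

```python
import random


def split_manifest(lines, test_fraction=0.2, seed=42):
    """Split manifest entries (JSON Lines strings) into training and test lists.

    Shuffles with a fixed seed so the split is reproducible. When there are two
    or more entries, each list keeps at least one entry.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    entries = [line for line in lines if line.strip()]  # skip blank lines
    random.Random(seed).shuffle(entries)
    n_test = round(len(entries) * test_fraction)
    if len(entries) >= 2:
        n_test = min(max(n_test, 1), len(entries) - 1)
    return entries[n_test:], entries[:n_test]


sample = [f'{{"source-ref": "s3://amzn-s3-demo-bucket/img{i}.jpg"}}' for i in range(10)]
train, test = split_manifest(sample)
print(len(train), len(test))  # 8 2
```

Write each list to its own file (one entry per line), upload both files to Amazon S3, and enter their locations in the training and test .manifest file location fields.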

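Creating a dataset with a SageMaker Ground Truth manifest file (SDK)

The following procedure shows you how to create a training or test dataset from a manifest file by using the CreateDataset API.

You can use an existing manifest file, such as the output from a SageMaker Ground Truth job, or create your own manifest file.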
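Before you call CreateDataset, a quick local check can catch malformed lines early. The sketch below is a hypothetical pre-flight check that assumes only the basic structure of a Ground Truth manifest: one JSON object per line, each with an `s3://` URI in `source-ref`. It does not replicate the validation that Amazon Rekognition Custom Labels performs when it imports the manifest.

```python
import json


def check_manifest_lines(lines):
    """Return (line_number, problem) tuples for basic manifest structure issues."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            entry = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append((number, f"invalid JSON: {err.msg}"))
            continue
        if not isinstance(entry, dict):
            problems.append((number, "line is not a JSON object"))
            continue
        source_ref = entry.get("source-ref")
        if not isinstance(source_ref, str) or not source_ref.startswith("s3://"):
            problems.append((number, "missing or non-S3 source-ref"))
    return problems


manifest = [
    '{"source-ref": "s3://amzn-s3-demo-bucket/img1.jpg", "animal": 0}',
    '{"animal": 1}',
    'not json',
]
print(check_manifest_lines(manifest))
```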

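  1. If you haven't already done so, install and configure the AWS CLI and the AWS SDKs. For more information, see Step 4: Set up the AWS CLI and AWS SDKs.

  2. Create the manifest file for your training dataset by doing one of the following:

    • Use the manifest file that's output by an Amazon SageMaker Ground Truth job.

    • Create your own SageMaker Ground Truth format manifest file.

    If you want to create a test dataset, repeat step 2 to create the manifest file for the test dataset.

  3. Use the following code examples to create the training and test datasets.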

    AWS CLI

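    Use the following command to create a dataset. Replace the following:

    • project_arn - the ARN of the project that you want to add the dataset to.

    • type - the type of dataset that you want to create (TRAIN or TEST).

    • bucket - the bucket that contains the manifest file for the dataset.

    • manifest_file - the path and file name of the manifest file.

    • The --tags value - example tags (key-value pairs). Replace them with your own tags, or remove the --tags option if you don't want to tag the dataset.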

    aws rekognition create-dataset --project-arn project_arn \
      --dataset-type type \
      --dataset-source '{ "GroundTruthManifest": { "S3Object": { "Bucket": "bucket", "Name": "manifest_file" } } }' \
      --profile custom-labels-access \
      --tags '{"key1": "value1", "key2": "value2"}'
    Python

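    Use the following code to create a dataset. Supply the following command line parameters:

    • project_arn - the ARN of the project that you want to add the dataset to.

    • dataset_type - the type of dataset that you want to create (train or test).

    • bucket - the bucket that contains the manifest file for the dataset.

    • manifest_file - the path and file name of the manifest file.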

    # Copyright 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
    # SPDX-License-Identifier: MIT-0 (For details, see https://github.com/awsdocs/amazon-rekognition-custom-labels-developer-guide/blob/master/LICENSE-SAMPLECODE.)

    import argparse
    import logging
    import time

    import boto3
    from botocore.exceptions import ClientError

    logger = logging.getLogger(__name__)


    def create_dataset(rek_client, project_arn, dataset_type, bucket, manifest_file):
        """
        Creates an Amazon Rekognition Custom Labels dataset.
        :param rek_client: The Amazon Rekognition Custom Labels Boto3 client.
        :param project_arn: The ARN of the project in which you want to create a dataset.
        :param dataset_type: The type of the dataset that you want to create (train or test).
        :param bucket: The S3 bucket that contains the manifest file.
        :param manifest_file: The path and filename of the manifest file.
        """

        try:
            # Create the dataset.
            logger.info("Creating %s dataset for project %s", dataset_type, project_arn)

            dataset_type = dataset_type.upper()

            # Point the dataset source at the manifest file in Amazon S3.
            dataset_source = {
                "GroundTruthManifest": {
                    "S3Object": {"Bucket": bucket, "Name": manifest_file}
                }
            }

            response = rek_client.create_dataset(
                ProjectArn=project_arn, DatasetType=dataset_type, DatasetSource=dataset_source
            )

            dataset_arn = response['DatasetArn']
            logger.info("dataset ARN: %s", dataset_arn)

            # Poll DescribeDataset until creation completes or fails.
            finished = False
            while finished is False:

                dataset = rek_client.describe_dataset(DatasetArn=dataset_arn)
                status = dataset['DatasetDescription']['Status']

                if status == "CREATE_IN_PROGRESS":
                    logger.info("Creating dataset: %s ", dataset_arn)
                    time.sleep(5)
                    continue

                if status == "CREATE_COMPLETE":
                    logger.info("Dataset created: %s", dataset_arn)
                    finished = True
                    continue

                if status == "CREATE_FAILED":
                    error_message = f"Dataset creation failed: {status} : {dataset_arn}"
                    logger.error(error_message)
                    raise Exception(error_message)

                error_message = f"Failed. Unexpected state for dataset creation: {status} : {dataset_arn}"
                logger.error(error_message)
                raise Exception(error_message)

            return dataset_arn

        except ClientError as err:
            logger.exception("Couldn't create dataset: %s", err.response['Error']['Message'])
            raise


    def add_arguments(parser):
        """
        Adds command line arguments to the parser.
        :param parser: The command line parser.
        """

        parser.add_argument(
            "project_arn", help="The ARN of the project in which you want to create the dataset."
        )

        parser.add_argument(
            "dataset_type", help="The type of the dataset that you want to create (train or test)."
        )

        parser.add_argument(
            "bucket", help="The S3 bucket that contains the manifest file."
        )

        parser.add_argument(
            "manifest_file", help="The path and filename of the manifest file."
        )


    def main():

        logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")

        try:

            # Get command line arguments.
            parser = argparse.ArgumentParser(usage=argparse.SUPPRESS)
            add_arguments(parser)
            args = parser.parse_args()

            print(f"Creating {args.dataset_type} dataset for project {args.project_arn}")

            # Create the dataset.
            session = boto3.Session(profile_name='custom-labels-access')
            rekognition_client = session.client("rekognition")

            dataset_arn = create_dataset(rekognition_client,
                                         args.project_arn,
                                         args.dataset_type,
                                         args.bucket,
                                         args.manifest_file)

            print(f"Finished creating dataset: {dataset_arn}")

        except ClientError as err:
            logger.exception("Problem creating dataset: %s", err)
            print(f"Problem creating dataset: {err}")


    if __name__ == "__main__":
        main()
    Java V2

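    Use the following code to create a dataset. Supply the following command line parameters:

    • project_arn - the ARN of the project that you want to add the dataset to.

    • dataset_type - the type of dataset that you want to create (train or test).

    • bucket - the bucket that contains the manifest file for the dataset.

    • manifest_file - the path and file name of the manifest file.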

    /*
       Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
       SPDX-License-Identifier: Apache-2.0
    */

    package com.example.rekognition;

    import software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider;
    import software.amazon.awssdk.regions.Region;
    import software.amazon.awssdk.services.rekognition.RekognitionClient;
    import software.amazon.awssdk.services.rekognition.model.CreateDatasetRequest;
    import software.amazon.awssdk.services.rekognition.model.CreateDatasetResponse;
    import software.amazon.awssdk.services.rekognition.model.DatasetDescription;
    import software.amazon.awssdk.services.rekognition.model.DatasetSource;
    import software.amazon.awssdk.services.rekognition.model.DatasetStatus;
    import software.amazon.awssdk.services.rekognition.model.DatasetType;
    import software.amazon.awssdk.services.rekognition.model.DescribeDatasetRequest;
    import software.amazon.awssdk.services.rekognition.model.DescribeDatasetResponse;
    import software.amazon.awssdk.services.rekognition.model.GroundTruthManifest;
    import software.amazon.awssdk.services.rekognition.model.RekognitionException;
    import software.amazon.awssdk.services.rekognition.model.S3Object;

    import java.util.logging.Level;
    import java.util.logging.Logger;

    public class CreateDatasetManifestFiles {

        public static final Logger logger = Logger.getLogger(CreateDatasetManifestFiles.class.getName());

        public static String createMyDataset(RekognitionClient rekClient, String projectArn, String datasetType,
                String bucket, String name) throws Exception, RekognitionException {

            try {

                logger.log(Level.INFO, "Creating {0} dataset for project : {1} from s3://{2}/{3} ",
                        new Object[] { datasetType, projectArn, bucket, name });

                // Map the command line value to the SDK dataset type.
                DatasetType requestDatasetType = null;

                switch (datasetType) {
                    case "train":
                        requestDatasetType = DatasetType.TRAIN;
                        break;
                    case "test":
                        requestDatasetType = DatasetType.TEST;
                        break;
                    default:
                        logger.log(Level.SEVERE, "Could not create dataset. Unrecognized dataset type: {0}", datasetType);
                        throw new Exception("Could not create dataset. Unrecognized dataset type: " + datasetType);
                }

                // Point the dataset source at the manifest file in Amazon S3.
                GroundTruthManifest groundTruthManifest = GroundTruthManifest.builder()
                        .s3Object(S3Object.builder().bucket(bucket).name(name).build()).build();

                DatasetSource datasetSource = DatasetSource.builder().groundTruthManifest(groundTruthManifest).build();

                CreateDatasetRequest createDatasetRequest = CreateDatasetRequest.builder().projectArn(projectArn)
                        .datasetType(requestDatasetType).datasetSource(datasetSource).build();

                CreateDatasetResponse response = rekClient.createDataset(createDatasetRequest);

                // Poll DescribeDataset until the dataset is created or creation fails.
                boolean created = false;

                do {

                    DescribeDatasetRequest describeDatasetRequest = DescribeDatasetRequest.builder()
                            .datasetArn(response.datasetArn()).build();
                    DescribeDatasetResponse describeDatasetResponse = rekClient.describeDataset(describeDatasetRequest);

                    DatasetDescription datasetDescription = describeDatasetResponse.datasetDescription();

                    DatasetStatus status = datasetDescription.status();

                    logger.log(Level.INFO, "Creating dataset ARN: {0} ", response.datasetArn());

                    switch (status) {

                        case CREATE_COMPLETE:
                            logger.log(Level.INFO, "Dataset created");
                            created = true;
                            break;

                        case CREATE_IN_PROGRESS:
                            Thread.sleep(5000);
                            break;

                        case CREATE_FAILED:
                            String error = "Dataset creation failed: " + datasetDescription.statusAsString() + " "
                                    + datasetDescription.statusMessage() + " " + response.datasetArn();
                            logger.log(Level.SEVERE, error);
                            throw new Exception(error);

                        default:
                            String unexpectedError = "Unexpected creation state: "
                                    + datasetDescription.statusAsString() + " "
                                    + datasetDescription.statusMessage() + " " + response.datasetArn();
                            logger.log(Level.SEVERE, unexpectedError);
                            throw new Exception(unexpectedError);
                    }

                } while (!created);

                return response.datasetArn();

            } catch (RekognitionException e) {
                logger.log(Level.SEVERE, "Could not create dataset: {0}", e.getMessage());
                throw e;
            }
        }

        public static void main(String[] args) {

            String datasetType = null;
            String bucket = null;
            String name = null;
            String projectArn = null;
            String datasetArn = null;

            final String USAGE = "\n" + "Usage: " + "<project_arn> <dataset_type> <bucket> <name>\n\n" + "Where:\n"
                    + "   project_arn - the ARN of the project that you want to add the dataset to.\n\n"
                    + "   dataset_type - the type of the dataset that you want to create (train or test).\n\n"
                    + "   bucket - the S3 bucket that contains the manifest file.\n\n"
                    + "   name - the location and name of the manifest file within the bucket.\n\n";

            if (args.length != 4) {
                System.out.println(USAGE);
                System.exit(1);
            }

            projectArn = args[0];
            datasetType = args[1];
            bucket = args[2];
            name = args[3];

            try {

                // Get the Rekognition client. Change the Region to the one that contains your project.
                RekognitionClient rekClient = RekognitionClient.builder()
                        .credentialsProvider(ProfileCredentialsProvider.create("custom-labels-access"))
                        .region(Region.US_WEST_2)
                        .build();

                // Create the dataset
                datasetArn = createMyDataset(rekClient, projectArn, datasetType, bucket, name);

                System.out.println(String.format("Created dataset: %s", datasetArn));

                rekClient.close();

            } catch (RekognitionException rekError) {
                logger.log(Level.SEVERE, "Rekognition client error: {0}", rekError.getMessage());
                System.exit(1);
            } catch (Exception rekError) {
                logger.log(Level.SEVERE, "Error: {0}", rekError.getMessage());
                System.exit(1);
            }
        }
    }
  4. If you need to add or change labels, see Managing Labels (SDK).

  5. Follow the steps in Training a model (SDK) to train your model.
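The Python and Java examples both poll DescribeDataset in a loop with no upper bound. As a sketch of the same pattern with a timeout, the hypothetical `wait_for_dataset` function below accepts any callable shaped like the boto3 client's `describe_dataset` method, which also lets you exercise it offline with a fake, as shown. The 5-second delay and the 120-attempt limit are arbitrary assumptions.

```python
import time


class DatasetCreationError(Exception):
    """Raised when the dataset reaches a state other than CREATE_COMPLETE."""


def wait_for_dataset(describe_dataset, dataset_arn, delay=5.0, max_attempts=120,
                     sleep=time.sleep):
    """Poll until the dataset leaves CREATE_IN_PROGRESS, then return its description."""
    for _ in range(max_attempts):
        description = describe_dataset(DatasetArn=dataset_arn)["DatasetDescription"]
        status = description["Status"]
        if status == "CREATE_COMPLETE":
            return description
        if status != "CREATE_IN_PROGRESS":
            # CREATE_FAILED or any unexpected state ends the wait with an error.
            message = description.get("StatusMessage", "")
            raise DatasetCreationError(f"{dataset_arn}: {status} {message}".strip())
        sleep(delay)
    raise TimeoutError(f"{dataset_arn} still CREATE_IN_PROGRESS after {max_attempts} checks")


# Offline usage with a fake describe function (no AWS call is made).
responses = iter(["CREATE_IN_PROGRESS", "CREATE_IN_PROGRESS", "CREATE_COMPLETE"])


def fake_describe(DatasetArn):
    return {"DatasetDescription": {"Status": next(responses)}}


result = wait_for_dataset(fake_describe, "arn:example:dataset", sleep=lambda s: None)
print(result["Status"])  # CREATE_COMPLETE
```

With a real client you would call `wait_for_dataset(rek_client.describe_dataset, dataset_arn)` after `create_dataset` returns the dataset ARN.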

CreateDataset request

The following is the format of the CreateDataset operation request:

{
    "DatasetSource": {
        "DatasetArn": "string",
        "GroundTruthManifest": {
            "S3Object": {
                "Bucket": "string",
                "Name": "string",
                "Version": "string"
            }
        }
    },
    "DatasetType": "string",
    "ProjectArn": "string",
    "Tags": {
        "string": "string"
    }
}
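In the request format, `DatasetSource` can hold either `DatasetArn` or `GroundTruthManifest`. The sketch below assumes these are alternatives (copy an existing dataset, or import a manifest file from Amazon S3) and assembles keyword arguments for the boto3 `create_dataset` call accordingly. The function name, its validation rules, and the example ARN and bucket are hypothetical; check the CreateDataset API reference for the authoritative constraints.

```python
def build_create_dataset_params(project_arn, dataset_type, *, bucket=None, name=None,
                                version=None, source_dataset_arn=None, tags=None):
    """Assemble keyword arguments shaped like the CreateDataset request format.

    Supply either bucket and name (a Ground Truth manifest in S3) or
    source_dataset_arn (an existing dataset), but not both. With neither,
    this sketch omits DatasetSource.
    """
    dataset_type = dataset_type.upper()
    if dataset_type not in ("TRAIN", "TEST"):
        raise ValueError("dataset_type must be TRAIN or TEST")
    if source_dataset_arn and (bucket or name):
        raise ValueError("Specify a manifest location or a dataset ARN, not both")

    params = {"ProjectArn": project_arn, "DatasetType": dataset_type}
    if bucket or name:
        if not (bucket and name):
            raise ValueError("bucket and name are both required for a manifest")
        s3_object = {"Bucket": bucket, "Name": name}
        if version:
            s3_object["Version"] = version  # S3 object version, if versioning is used
        params["DatasetSource"] = {"GroundTruthManifest": {"S3Object": s3_object}}
    elif source_dataset_arn:
        params["DatasetSource"] = {"DatasetArn": source_dataset_arn}
    if tags:
        params["Tags"] = dict(tags)
    return params


params = build_create_dataset_params(
    "arn:aws:rekognition:us-west-2:111122223333:project/my-project/1234567890123",
    "train", bucket="amzn-s3-demo-bucket", name="manifests/train.manifest",
    tags={"team": "vision"})
# With a boto3 client: rek_client.create_dataset(**params)
print(params["DatasetSource"])
```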