Importing images with a manifest file
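You can create a dataset from a manifest file in Amazon SageMaker Ground Truth format. You can use the manifest file that an Amazon SageMaker Ground Truth job produces. If your images and labels aren't in SageMaker Ground Truth manifest format, you can create a SageMaker format manifest file and use it to import your labeled images.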
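Before you import a manifest file that you wrote yourself, it can help to check its basic shape. The following is a minimal sketch, not an official validator. It assumes the manifest is a JSON Lines file in which every line is a JSON object with a `source-ref` field that points to an image in Amazon S3. The function name `check_manifest_lines` is hypothetical, and the sketch doesn't inspect the label attributes.

```python
import json


def check_manifest_lines(lines):
    """Return (line_number, problem) tuples for a JSON Lines manifest.

    Only the basic shape is checked: each non-blank line must be a JSON
    object with a "source-ref" string that starts with "s3://".
    Label attributes are not inspected (assumption: you verify those
    separately against the manifest format documentation).
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # Skip blank lines, such as a trailing newline.
        try:
            entry = json.loads(text)
        except json.JSONDecodeError as err:
            problems.append((number, f"not valid JSON: {err.msg}"))
            continue
        if not isinstance(entry, dict):
            problems.append((number, "not a JSON object"))
            continue
        source = entry.get("source-ref")
        if not isinstance(source, str) or not source.startswith("s3://"):
            problems.append((number, "missing or non-S3 source-ref"))
    return problems
```

To check a local copy of a manifest, you might call `check_manifest_lines(open("train.manifest", encoding="utf-8"))` and review any tuples that come back.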
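The CreateDataset operation has been updated so that you can optionally specify tags when you create a new dataset. Tags are key-value pairs that you can use to categorize and manage your resources.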
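As an illustration of the optional tags, the following sketch builds the keyword arguments for a Boto3 `create_dataset` call. It assumes that your Boto3 version accepts the `Tags` parameter and that tags are a flat map of string keys to string values. The helper name `create_dataset_kwargs` is hypothetical.

```python
def create_dataset_kwargs(project_arn, dataset_type, bucket, manifest_file, tags=None):
    """Build keyword arguments for a CreateDataset call.

    tags is an optional dict of key-value pairs. Keys and values are
    converted to strings because the request expects a string-to-string map.
    """
    kwargs = {
        "ProjectArn": project_arn,
        "DatasetType": dataset_type.upper(),  # TRAIN or TEST
        "DatasetSource": {
            "GroundTruthManifest": {
                "S3Object": {"Bucket": bucket, "Name": manifest_file}
            }
        },
    }
    if tags:
        kwargs["Tags"] = {str(key): str(value) for key, value in tags.items()}
    return kwargs
```

With an Amazon Rekognition client named `rek_client`, you could then call `rek_client.create_dataset(**create_dataset_kwargs(...))`. When `tags` is omitted, the sketch leaves out the `Tags` key entirely.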
Creating a dataset with a SageMaker Ground Truth manifest file (Console)
The following procedure shows you how to create a dataset by using a SageMaker Ground Truth format manifest file.
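1. Create a manifest file for your training dataset, either from the output of a SageMaker Ground Truth job or by creating your own manifest file.
   If you want to create a test dataset, repeat step 1 for the test dataset.
2. Open the Amazon Rekognition console at https://console.aws.amazon.com/rekognition/.
3. Choose Use Custom Labels.
4. Choose Get started.
5. In the left navigation pane, choose Projects.
6. On the Projects page, choose the project to which you want to add a dataset. The details page for your project is displayed.
7. Choose Create dataset. The Create dataset page is shown.
8. In Starting configuration, choose either Start with a single dataset or Start with a training dataset. To create a higher quality model, we recommend that you start with separate training and test datasets.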
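   - Single dataset
     - In the Training dataset details section, choose Import images labeled by SageMaker Ground Truth.
     - In .manifest file location, enter the location of the manifest file that you created in step 1.
     - Choose Create Dataset. The datasets page for your project opens.
   - Separate training and test datasets
     - In the Training dataset details section, choose Import images labeled by SageMaker Ground Truth.
     - In .manifest file location, enter the location of the training dataset manifest file that you created in step 1.
     - In the Test dataset details section, choose Import images labeled by SageMaker Ground Truth.
     - In .manifest file location, enter the location of the test dataset manifest file that you created in step 1.
     - Choose Create Datasets. The datasets page for your project opens.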
9. If you need to add or change labels, see Labeling files.
10. Follow the steps in Training a model (Console) to train your model.
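The console takes the manifest location as a single value, while the SDK examples in the next section take the bucket and the manifest file path as separate arguments. The following sketch converts one form to the other. It assumes the location is written as an `s3://bucket/key` URI. The helper name `split_s3_uri` is hypothetical.

```python
def split_s3_uri(uri):
    """Split an s3://bucket/key URI into a (bucket, key) tuple.

    Raises ValueError if the URI doesn't use the s3:// scheme or
    doesn't include both a bucket and an object key.
    """
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"Not an S3 URI: {uri}")
    # Everything before the first slash is the bucket. The rest is the key.
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI must include a bucket and a key: {uri}")
    return bucket, key
```

For example, `split_s3_uri("s3://my-bucket/datasets/train.manifest")` yields the values to pass as `bucket` and `manifest_file`.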
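Creating a dataset with a SageMaker Ground Truth manifest file (SDK)
The following procedure shows you how to create a training or test dataset from a manifest file by using the CreateDataset API.
You can use an existing manifest file, such as the output from a SageMaker Ground Truth job, or create your own manifest file.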
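1. If you haven't already done so, install and configure the AWS CLI and the AWS SDKs. For more information, see Step 4: Set up the AWS CLI and AWS SDKs.
2. Create a manifest file for your training dataset, either from the output of a SageMaker Ground Truth job or by creating your own manifest file.
   If you want to create a test dataset, repeat step 2 for the test dataset.
3. Use the following example code to create the training and test datasets.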
   - AWS CLI
     Use the following command to create a dataset. Replace the following:
     - project_arn - the ARN of the project to which you want to add the dataset.
     - type - the type of dataset that you want to create (TRAIN or TEST).
     - bucket - the bucket that contains the manifest file for the dataset.
     - manifest_file - the path and file name of the manifest file.

aws rekognition create-dataset --project-arn project_arn \
  --dataset-type type \
  --dataset-source '{ "GroundTruthManifest": { "S3Object": { "Bucket": "bucket", "Name": "manifest_file" } } }' \
  --profile custom-labels-access \
  --tags '{"key1": "value1", "key2": "value2"}'
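   - Python
     Use the following code to create a dataset. Supply the following command line arguments:
     - project_arn - the ARN of the project to which you want to add the dataset.
     - dataset_type - the type of dataset that you want to create (train or test).
     - bucket - the bucket that contains the manifest file for the dataset.
     - manifest_file - the path and file name of the manifest file.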

# Copyright 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0 (For details, see https://github.com/awsdocs/amazon-rekognition-custom-labels-developer-guide/blob/master/LICENSE-SAMPLECODE.)

import argparse
import logging
import time
import boto3

from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)


def create_dataset(rek_client, project_arn, dataset_type, bucket, manifest_file):
    """
    Creates an Amazon Rekognition Custom Labels dataset.
    :param rek_client: The Amazon Rekognition Custom Labels Boto3 client.
    :param project_arn: The ARN of the project in which you want to create a dataset.
    :param dataset_type: The type of the dataset that you want to create (train or test).
    :param bucket: The S3 bucket that contains the manifest file.
    :param manifest_file: The path and filename of the manifest file.
    :return: The ARN of the new dataset.
    """

    try:
        # Create the dataset.
        logger.info("Creating %s dataset for project %s", dataset_type, project_arn)

        dataset_type = dataset_type.upper()

        # Build the source as a dict so that special characters in the
        # bucket or file name can't break JSON parsing.
        dataset_source = {
            "GroundTruthManifest": {
                "S3Object": {"Bucket": bucket, "Name": manifest_file}
            }
        }

        response = rek_client.create_dataset(
            ProjectArn=project_arn, DatasetType=dataset_type, DatasetSource=dataset_source
        )

        dataset_arn = response['DatasetArn']

        logger.info("dataset ARN: %s", dataset_arn)

        # Poll until dataset creation finishes or fails.
        finished = False
        while finished is False:

            dataset = rek_client.describe_dataset(DatasetArn=dataset_arn)

            status = dataset['DatasetDescription']['Status']

            if status == "CREATE_IN_PROGRESS":
                logger.info("Creating dataset: %s ", dataset_arn)
                time.sleep(5)
                continue

            if status == "CREATE_COMPLETE":
                logger.info("Dataset created: %s", dataset_arn)
                finished = True
                continue

            if status == "CREATE_FAILED":
                error_message = f"Dataset creation failed: {status} : {dataset_arn}"
                # logger.error, not logger.exception: no exception is active here.
                logger.error(error_message)
                raise Exception(error_message)

            error_message = f"Failed. Unexpected state for dataset creation: {status} : {dataset_arn}"
            logger.error(error_message)
            raise Exception(error_message)

        return dataset_arn

    except ClientError as err:
        logger.exception("Couldn't create dataset: %s", err.response['Error']['Message'])
        raise


def add_arguments(parser):
    """
    Adds command line arguments to the parser.
    :param parser: The command line parser.
    """

    parser.add_argument(
        "project_arn", help="The ARN of the project in which you want to create the dataset."
    )

    parser.add_argument(
        "dataset_type", help="The type of the dataset that you want to create (train or test)."
    )

    parser.add_argument(
        "bucket", help="The S3 bucket that contains the manifest file."
    )

    parser.add_argument(
        "manifest_file", help="The path and filename of the manifest file."
    )


def main():

    logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")

    try:

        # Get command line arguments.
        parser = argparse.ArgumentParser(usage=argparse.SUPPRESS)
        add_arguments(parser)
        args = parser.parse_args()

        print(f"Creating {args.dataset_type} dataset for project {args.project_arn}")

        # Create the dataset.
        session = boto3.Session(profile_name='custom-labels-access')
        rekognition_client = session.client("rekognition")

        dataset_arn = create_dataset(rekognition_client,
                                     args.project_arn,
                                     args.dataset_type,
                                     args.bucket,
                                     args.manifest_file)

        print(f"Finished creating dataset: {dataset_arn}")

    except ClientError as err:
        logger.exception("Problem creating dataset: %s", err)
        print(f"Problem creating dataset: {err}")


if __name__ == "__main__":
    main()

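   - Java V2
     Use the following code to create a dataset. Supply the following command line arguments:
     - project_arn - the ARN of the project to which you want to add the dataset.
     - dataset_type - the type of dataset that you want to create (train or test).
     - bucket - the bucket that contains the manifest file for the dataset.
     - manifest_file - the path and file name of the manifest file.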

/*
   Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
   SPDX-License-Identifier: Apache-2.0
*/

package com.example.rekognition;

import software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.rekognition.RekognitionClient;
import software.amazon.awssdk.services.rekognition.model.CreateDatasetRequest;
import software.amazon.awssdk.services.rekognition.model.CreateDatasetResponse;
import software.amazon.awssdk.services.rekognition.model.DatasetDescription;
import software.amazon.awssdk.services.rekognition.model.DatasetSource;
import software.amazon.awssdk.services.rekognition.model.DatasetStatus;
import software.amazon.awssdk.services.rekognition.model.DatasetType;
import software.amazon.awssdk.services.rekognition.model.DescribeDatasetRequest;
import software.amazon.awssdk.services.rekognition.model.DescribeDatasetResponse;
import software.amazon.awssdk.services.rekognition.model.GroundTruthManifest;
import software.amazon.awssdk.services.rekognition.model.RekognitionException;
import software.amazon.awssdk.services.rekognition.model.S3Object;

import java.util.logging.Level;
import java.util.logging.Logger;

public class CreateDatasetManifestFiles {

    public static final Logger logger = Logger.getLogger(CreateDatasetManifestFiles.class.getName());

    public static String createMyDataset(RekognitionClient rekClient, String projectArn, String datasetType,
            String bucket, String name) throws Exception {

        try {

            logger.log(Level.INFO, "Creating {0} dataset for project : {1} from s3://{2}/{3} ",
                    new Object[] { datasetType, projectArn, bucket, name });

            // Map the command line value to the SDK dataset type.
            DatasetType requestDatasetType = null;

            switch (datasetType) {
            case "train":
                requestDatasetType = DatasetType.TRAIN;
                break;
            case "test":
                requestDatasetType = DatasetType.TEST;
                break;
            default:
                logger.log(Level.SEVERE, "Could not create dataset. Unrecognized dataset type: {0}", datasetType);
                throw new Exception("Could not create dataset. Unrecognized dataset type: " + datasetType);
            }

            GroundTruthManifest groundTruthManifest = GroundTruthManifest.builder()
                    .s3Object(S3Object.builder().bucket(bucket).name(name).build()).build();

            DatasetSource datasetSource = DatasetSource.builder().groundTruthManifest(groundTruthManifest).build();

            CreateDatasetRequest createDatasetRequest = CreateDatasetRequest.builder().projectArn(projectArn)
                    .datasetType(requestDatasetType).datasetSource(datasetSource).build();

            CreateDatasetResponse response = rekClient.createDataset(createDatasetRequest);

            // Poll until dataset creation finishes or fails.
            boolean created = false;

            do {

                DescribeDatasetRequest describeDatasetRequest = DescribeDatasetRequest.builder()
                        .datasetArn(response.datasetArn()).build();
                DescribeDatasetResponse describeDatasetResponse = rekClient.describeDataset(describeDatasetRequest);

                DatasetDescription datasetDescription = describeDatasetResponse.datasetDescription();

                DatasetStatus status = datasetDescription.status();

                logger.log(Level.INFO, "Creating dataset ARN: {0} ", response.datasetArn());

                switch (status) {

                case CREATE_COMPLETE:
                    logger.log(Level.INFO, "Dataset created");
                    created = true;
                    break;

                case CREATE_IN_PROGRESS:
                    Thread.sleep(5000);
                    break;

                case CREATE_FAILED:
                    String error = "Dataset creation failed: " + datasetDescription.statusAsString() + " "
                            + datasetDescription.statusMessage() + " " + response.datasetArn();
                    logger.log(Level.SEVERE, error);
                    throw new Exception(error);

                default:
                    String unexpectedError = "Unexpected creation state: " + datasetDescription.statusAsString() + " "
                            + datasetDescription.statusMessage() + " " + response.datasetArn();
                    logger.log(Level.SEVERE, unexpectedError);
                    throw new Exception(unexpectedError);
                }

            } while (!created);

            return response.datasetArn();

        } catch (RekognitionException e) {
            logger.log(Level.SEVERE, "Could not create dataset: {0}", e.getMessage());
            throw e;
        }

    }

    public static void main(String[] args) {

        String datasetType = null;
        String bucket = null;
        String name = null;
        String projectArn = null;
        String datasetArn = null;

        final String USAGE = "\n" + "Usage: " + "<project_arn> <dataset_type> <bucket> <name>\n\n" + "Where:\n"
                + "   project_arn - the ARN of the project that you want to add the dataset to.\n\n"
                + "   dataset_type - the type of the dataset that you want to create (train or test).\n\n"
                + "   bucket - the S3 bucket that contains the manifest file.\n\n"
                + "   name - the location and name of the manifest file within the bucket.\n\n";

        if (args.length != 4) {
            System.out.println(USAGE);
            System.exit(1);
        }

        projectArn = args[0];
        datasetType = args[1];
        bucket = args[2];
        name = args[3];

        try {

            // Get the Rekognition client
            RekognitionClient rekClient = RekognitionClient.builder()
                    .credentialsProvider(ProfileCredentialsProvider.create("custom-labels-access"))
                    .region(Region.US_WEST_2)
                    .build();

            // Create the dataset
            datasetArn = createMyDataset(rekClient, projectArn, datasetType, bucket, name);

            System.out.println(String.format("Created dataset: %s", datasetArn));

            rekClient.close();

        } catch (RekognitionException rekError) {
            logger.log(Level.SEVERE, "Rekognition client error: {0}", rekError.getMessage());
            System.exit(1);
        } catch (Exception rekError) {
            logger.log(Level.SEVERE, "Error: {0}", rekError.getMessage());
            System.exit(1);
        }

    }

}

4. If you need to add or change labels, see Managing labels (SDK).
5. Follow the steps in Training a model (SDK) to train your model.
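The examples in this procedure poll DescribeDataset every 5 seconds until creation finishes, with no upper bound on the wait. As one possible variation, the following sketch adds a timeout. It assumes a Boto3-style client whose `describe_dataset` response contains `DatasetDescription` with `Status` and, optionally, `StatusMessage`. The function name `wait_for_dataset`, the default timeout, and the injectable `sleep` argument are hypothetical choices made for illustration and testing.

```python
import time


def wait_for_dataset(rek_client, dataset_arn, timeout_seconds=600,
                     poll_seconds=5, sleep=time.sleep):
    """Poll describe_dataset until the dataset is created.

    Returns the final DatasetDescription dict on CREATE_COMPLETE.
    Raises RuntimeError on any state other than CREATE_IN_PROGRESS,
    and TimeoutError if creation takes longer than timeout_seconds.
    """
    waited = 0
    while True:
        response = rek_client.describe_dataset(DatasetArn=dataset_arn)
        description = response["DatasetDescription"]
        status = description["Status"]

        if status == "CREATE_COMPLETE":
            return description

        if status != "CREATE_IN_PROGRESS":
            message = description.get("StatusMessage", "")
            raise RuntimeError(
                f"Dataset creation ended in state {status}: {message}"
            )

        if waited >= timeout_seconds:
            raise TimeoutError(
                f"Dataset {dataset_arn} not created after {waited} seconds"
            )

        sleep(poll_seconds)
        waited += poll_seconds
```

You would call this after `create_dataset` returns, passing the `DatasetArn` from the response. The elapsed time is approximated by counting sleep intervals, which keeps the sketch simple but ignores the time spent in each API call.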
Create dataset request
The following is the format of the CreateDataset operation request:
{
   "DatasetSource": {
      "DatasetArn": "string",
      "GroundTruthManifest": {
         "S3Object": {
            "Bucket": "string",
            "Name": "string",
            "Version": "string"
         }
      }
   },
   "DatasetType": "string",
   "ProjectArn": "string",
   "Tags": {
      "string": "string"
   }
}
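The request format lists both `DatasetArn` and `GroundTruthManifest` under `DatasetSource`. The following sketch builds that field on the assumption that you supply one or the other, not both, and that `Version` is an optional S3 object version. The helper name `build_dataset_source` is hypothetical.

```python
def build_dataset_source(bucket=None, manifest_file=None, version=None, dataset_arn=None):
    """Build the DatasetSource part of a CreateDataset request.

    Pass dataset_arn to copy an existing dataset, or bucket and
    manifest_file (plus an optional version) to use a manifest in S3.
    Assumption: the two sources are mutually exclusive.
    """
    if dataset_arn is not None:
        if bucket or manifest_file or version:
            raise ValueError("Specify either dataset_arn or a manifest location, not both.")
        return {"DatasetArn": dataset_arn}

    if not bucket or not manifest_file:
        raise ValueError("Specify bucket and manifest_file, or dataset_arn.")

    s3_object = {"Bucket": bucket, "Name": manifest_file}
    if version:
        s3_object["Version"] = version  # Only sent when you pin an object version.
    return {"GroundTruthManifest": {"S3Object": s3_object}}
```

The returned dictionary can be passed as the `DatasetSource` argument of a Boto3 `create_dataset` call.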