Tutorial: Get Started Using the Amazon A2I API - Amazon SageMaker

This tutorial explains the API operations you can use to get started with Amazon A2I.

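To run these operations in a Jupyter notebook, select a notebook from Use Cases and Examples Using Amazon A2I, then see Use SageMaker Notebook Instance with Amazon A2I Jupyter Notebook to learn how to use it in a SageMaker notebook instance.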

To learn more about the API operations you can use with Amazon A2I, see Use APIs in Amazon Augmented AI.

Create a Private Work Team

You can create a private work team and add yourself as a worker so that you can preview Amazon A2I.

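If you are not familiar with Amazon Cognito, we recommend that you use the SageMaker console to create a private workforce and add yourself as a private worker. For instructions, see Step 1: Create a Work Team.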

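If you are familiar with Amazon Cognito, you can follow the instructions below to create a private work team using the SageMaker API. After you create the work team, note its ARN (WorkteamArn).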

To learn more about the private workforce and other available configurations, see Use a Private Workforce.

Create a private workforce

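If you have not yet created a private workforce, you can create one using an Amazon Cognito user pool. Make sure that you have added yourself to this user pool. You can create the private workforce using the AWS SDK for Python (Boto3) create_workforce function. For other language-specific SDKs, see the list in CreateWorkforce.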

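import boto3

# SageMaker client; uses the AWS Region from your default configuration.
client = boto3.client("sagemaker")

response = client.create_workforce(
    CognitoConfig={
        "UserPool": "Pool_ID",
        "ClientId": "app-client-id"
    },
    WorkforceName="workforce-name"
)

Create a private work team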

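After you have created a private workforce in the AWS Region where you want to configure and start your human loop, you can create a private work team using the AWS SDK for Python (Boto3) create_workteam function. For other language-specific SDKs, see the list in CreateWorkteam.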

response = client.create_workteam(
    WorkteamName="work-team-name",
    WorkforceName="workforce-name",
    # Description is a required parameter of CreateWorkteam.
    Description="work-team-description",
    MemberDefinitions=[
        {
            "CognitoMemberDefinition": {
                "UserPool": "<aws-region>_ID",
                "UserGroup": "user-group",
                "ClientId": "app-client-id"
            }
        }
    ]
)

Access your work team ARN as follows:

workteamArn = response["WorkteamArn"]
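If you need the ARN again in a later session, one option is to look the work team up by name. The following is a minimal sketch, assuming a SageMaker client like the one above; the helper name get_workteam_arn is hypothetical and introduced here only for illustration.

```python
def get_workteam_arn(client, workteam_name):
    """Return the ARN of an existing work team, looked up by name.

    `client` is expected to be a boto3 SageMaker client, for example
    boto3.client("sagemaker").
    """
    response = client.describe_workteam(WorkteamName=workteam_name)
    return response["Workteam"]["WorkteamArn"]
```

For example, workteamArn = get_workteam_arn(client, "work-team-name") would return the same value that create_workteam returned when the team was created.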
List private work teams in your account

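If you have already created a private work team, you can list all work teams in a given AWS Region in your account using the AWS SDK for Python (Boto3) list_workteams function. For other language-specific SDKs, see the list in ListWorkteams.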

response = client.list_workteams()

If you have many work teams in your account, you may want to use MaxResults, SortBy, and NameContains to filter your results.
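As a rough sketch of how these filters can be combined, the helper below pages through list_workteams results and collects the ARN of every matching team. The helper name find_workteams and its defaults are assumptions made for this example, not part of the SDK.

```python
def find_workteams(client, name_contains, page_size=10):
    """Return {WorkteamName: WorkteamArn} for teams whose name contains a string.

    `client` is expected to be a boto3 SageMaker client.
    """
    kwargs = {
        "NameContains": name_contains,
        "SortBy": "Name",
        "SortOrder": "Ascending",
        "MaxResults": page_size,
    }
    teams = {}
    while True:
        response = client.list_workteams(**kwargs)
        for team in response.get("Workteams", []):
            teams[team["WorkteamName"]] = team["WorkteamArn"]
        # NextToken is present only when more results remain.
        next_token = response.get("NextToken")
        if not next_token:
            return teams
        kwargs["NextToken"] = next_token
```

Calling find_workteams(client, "review") would then return only the teams whose names contain "review", following pagination as needed.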

Create a Human Review Workflow

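You can create a human review workflow using the Amazon A2I CreateFlowDefinition operation. Before you create the human review workflow, you need to create a human task UI. You can do this with the CreateHumanTaskUi operation.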

If you are using Amazon A2I with the Amazon Textract or Amazon Rekognition integrations, you can specify activation conditions using JSON.

Create a human task UI

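If you are creating a human review workflow for the Amazon Textract or Amazon Rekognition integrations, you need to use and modify a pre-made worker task template. For all custom integrations, you can use your own custom worker task template. Use the following table to learn how to create a human task UI from a worker task template for the two built-in integrations. To customize this request, replace the template with your own.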

Amazon Textract – Key-value pair extraction

To learn more about this template, see Custom Template Example for Amazon Textract.

template = r"""
<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>
{% capture s3_uri %}http://s3.amazonaws.com/{{ task.input.aiServiceRequest.document.s3Object.bucket }}/{{ task.input.aiServiceRequest.document.s3Object.name }}{% endcapture %}

<crowd-form>
  <crowd-textract-analyze-document
      src="{{ s3_uri | grant_read_access }}"
      initial-value="{{ task.input.selectedAiServiceResponse.blocks }}"
      header="Review the key-value pairs listed on the right and correct them if they don't match the following document."
      no-key-edit=""
      no-geometry-edit=""
      keys="{{ task.input.humanLoopContext.importantFormKeys }}"
      block-types='["KEY_VALUE_SET"]'>
    <short-instructions header="Instructions">
      <p>Click on a key-value block to highlight the corresponding key-value pair in the document.
      </p><p><br></p>
      <p>If it is a valid key-value pair, review the content for the value. If the content is incorrect, correct it.
      </p><p><br></p>
      <p>The text of the value is incorrect, correct it.</p>
      <p><img src="https://assets.crowd.aws/images/a2i-console/correct-value-text.png">
      </p><p><br></p>
      <p>A wrong value is identified, correct it.</p>
      <p><img src="https://assets.crowd.aws/images/a2i-console/correct-value.png">
      </p><p><br></p>
      <p>If it is not a valid key-value relationship, choose No.</p>
      <p><img src="https://assets.crowd.aws/images/a2i-console/not-a-key-value-pair.png">
      </p><p><br></p>
      <p>If you can’t find the key in the document, choose Key not found.</p>
      <p><img src="https://assets.crowd.aws/images/a2i-console/key-is-not-found.png">
      </p><p><br></p>
      <p>If the content of a field is empty, choose Value is blank.</p>
      <p><img src="https://assets.crowd.aws/images/a2i-console/value-is-blank.png">
      </p><p><br></p>
      <p><strong>Examples</strong></p>
      <p>Key and value are often displayed next to or below each other.
      </p><p><br></p>
      <p>Key and value displayed in one line.</p>
      <p><img src="https://assets.crowd.aws/images/a2i-console/sample-key-value-pair-1.png">
      </p><p><br></p>
      <p>Key and value displayed in two lines.</p>
      <p><img src="https://assets.crowd.aws/images/a2i-console/sample-key-value-pair-2.png">
      </p><p><br></p>
      <p>If the content of the value has multiple lines, enter all the text without line break. Include all value text even if it extends beyond the highlight box.</p>
      <p><img src="https://assets.crowd.aws/images/a2i-console/multiple-lines.png"></p>
    </short-instructions>
    <full-instructions header="Instructions"></full-instructions>
  </crowd-textract-analyze-document>
</crowd-form>
"""

Amazon Rekognition – Image moderation

To learn more about this template, see Custom Template Example for Amazon Rekognition.

template = r"""
<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>
{% capture s3_uri %}http://s3.amazonaws.com/{{ task.input.aiServiceRequest.image.s3Object.bucket }}/{{ task.input.aiServiceRequest.image.s3Object.name }}{% endcapture %}

<crowd-form>
  <crowd-rekognition-detect-moderation-labels
      categories='[
        {% for label in task.input.selectedAiServiceResponse.moderationLabels %}
          {
            name: "{{ label.name }}",
            parentName: "{{ label.parentName }}",
          },
        {% endfor %}
      ]'
      src="{{ s3_uri | grant_read_access }}"
      header="Review the image and choose all applicable categories."
  >
    <short-instructions header="Instructions">
      <style>
        .instructions {
          white-space: pre-wrap;
        }
      </style>
      <p class="instructions">Review the image and choose all applicable categories.
If no categories apply, choose None.

<b>Nudity</b>
Visuals depicting nude male or female person or persons

<b>Partial Nudity</b>
Visuals depicting covered up nudity, for example using hands or pose

<b>Revealing Clothes</b>
Visuals depicting revealing clothes and poses

<b>Physical Violence</b>
Visuals depicting violent physical assault, such as kicking or punching

<b>Weapon Violence</b>
Visuals depicting violence using weapons like firearms or blades, such as shooting

<b>Weapons</b>
Visuals depicting weapons like firearms and blades</p>
    </short-instructions>
    <full-instructions header="Instructions"></full-instructions>
  </crowd-rekognition-detect-moderation-labels>
</crowd-form>
"""

Custom Integration

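The following is an example template that can be used in a custom integration. This template is used in this notebook, which demonstrates a custom integration with Amazon Comprehend.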

template = r"""
<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>

<crowd-form>
  <crowd-classifier
      name="sentiment"
      categories='["Positive", "Negative", "Neutral", "Mixed"]'
      initial-value="{{ task.input.initialValue }}"
      header="What sentiment does this text convey?"
  >
    <classification-target>
      {{ task.input.taskObject }}
    </classification-target>

    <full-instructions header="Sentiment Analysis Instructions">
      <p><strong>Positive</strong> sentiment include: joy, excitement, delight</p>
      <p><strong>Negative</strong> sentiment include: anger, sarcasm, anxiety</p>
      <p><strong>Neutral</strong>: neither positive or negative, such as stating a fact</p>
      <p><strong>Mixed</strong>: when the sentiment is mixed</p>
    </full-instructions>

    <short-instructions>
      Choose the primary sentiment that is expressed by the text.
    </short-instructions>
  </crowd-classifier>
</crowd-form>
"""

Using the template specified above, you can create a human task UI using the AWS SDK for Python (Boto3) create_human_task_ui function. For other language-specific SDKs, see the list in CreateHumanTaskUi.

response = client.create_human_task_ui(
    HumanTaskUiName="human-task-ui-name",
    UiTemplate={
        "Content": template
    }
)

The response contains the human task UI ARN. Save it as follows:

humanTaskUiArn = response["HumanTaskUiArn"]

Create JSON to specify activation conditions

For the Amazon Textract and Amazon Rekognition built-in integrations, you can save activation conditions in a JSON object and use it in your CreateFlowDefinition request.

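Next, select a tab to see example activation conditions you can use for these built-in integrations. For additional information about activation condition options, see JSON Schema for Human Loop Activation Conditions in Amazon Augmented AI.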

Amazon Textract – Key-value pair extraction

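This example specifies conditions for specific keys (such as Mail address) in the document. If Amazon Textract's confidence falls outside of the thresholds set here, the document is sent to a human for review, and the worker is prompted with the specific keys that initiated the human loop.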

import json

humanLoopActivationConditions = json.dumps(
    {
        "Conditions": [
            {
                "Or": [
                    {
                        "ConditionType": "ImportantFormKeyConfidenceCheck",
                        "ConditionParameters": {
                            "ImportantFormKey": "Mail address",
                            "ImportantFormKeyAliases": ["Mail Address:", "Mail address:", "Mailing Add:", "Mailing Addresses"],
                            "KeyValueBlockConfidenceLessThan": 100,
                            "WordBlockConfidenceLessThan": 100
                        }
                    },
                    {
                        "ConditionType": "MissingImportantFormKey",
                        "ConditionParameters": {
                            "ImportantFormKey": "Mail address",
                            "ImportantFormKeyAliases": ["Mail Address:", "Mail address:", "Mailing Add:", "Mailing Addresses"]
                        }
                    },
                    {
                        "ConditionType": "ImportantFormKeyConfidenceCheck",
                        "ConditionParameters": {
                            "ImportantFormKey": "Phone Number",
                            "ImportantFormKeyAliases": ["Phone number:", "Phone No.:", "Number:"],
                            "KeyValueBlockConfidenceLessThan": 100,
                            "WordBlockConfidenceLessThan": 100
                        }
                    },
                    {
                        "ConditionType": "ImportantFormKeyConfidenceCheck",
                        "ConditionParameters": {
                            "ImportantFormKey": "*",
                            "KeyValueBlockConfidenceLessThan": 100,
                            "WordBlockConfidenceLessThan": 100
                        }
                    },
                    {
                        "ConditionType": "ImportantFormKeyConfidenceCheck",
                        "ConditionParameters": {
                            "ImportantFormKey": "*",
                            "KeyValueBlockConfidenceGreaterThan": 0,
                            "WordBlockConfidenceGreaterThan": 0
                        }
                    }
                ]
            }
        ]
    }
)

Amazon Rekognition – Image moderation

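The human loop activation conditions used here are tailored to Amazon Rekognition content moderation; they are based on confidence thresholds for the Suggestive and Female Swimwear Or Underwear moderation labels.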

import json

humanLoopActivationConditions = json.dumps(
    {
        "Conditions": [
            {
                "Or": [
                    {
                        "ConditionType": "ModerationLabelConfidenceCheck",
                        "ConditionParameters": {
                            "ModerationLabelName": "Suggestive",
                            "ConfidenceLessThan": 98
                        }
                    },
                    {
                        "ConditionType": "ModerationLabelConfidenceCheck",
                        "ConditionParameters": {
                            "ModerationLabelName": "Female Swimwear Or Underwear",
                            "ConfidenceGreaterThan": 98
                        }
                    }
                ]
            }
        ]
    }
)
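When the list of labels grows, it may be easier to build the same JSON from small helper functions than to write the nested dictionaries by hand. The sketch below reproduces the Amazon Rekognition conditions shown above; the helper names moderation_label_condition and any_of are hypothetical and exist only in this example.

```python
import json


def moderation_label_condition(label_name, less_than=None, greater_than=None):
    """Build one ModerationLabelConfidenceCheck condition (hypothetical helper)."""
    parameters = {"ModerationLabelName": label_name}
    if less_than is not None:
        parameters["ConfidenceLessThan"] = less_than
    if greater_than is not None:
        parameters["ConfidenceGreaterThan"] = greater_than
    return {
        "ConditionType": "ModerationLabelConfidenceCheck",
        "ConditionParameters": parameters,
    }


def any_of(*conditions):
    """Serialize conditions so that any single match starts a human loop."""
    return json.dumps({"Conditions": [{"Or": list(conditions)}]})


humanLoopActivationConditions = any_of(
    moderation_label_condition("Suggestive", less_than=98),
    moderation_label_condition("Female Swimwear Or Underwear", greater_than=98),
)
```

The resulting string has the same structure as the hand-written JSON, so it can be passed to a CreateFlowDefinition request in the same way.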

Create a human review workflow

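This section gives an example of a CreateFlowDefinition request made with the AWS SDK for Python (Boto3), using the resources created in the previous sections. For other language-specific SDKs, see the list in CreateFlowDefinition. Use the tabs in the following table to see the requests that create a human review workflow for the Amazon Textract and Amazon Rekognition built-in integrations.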

Amazon Textract – Key-value pair extraction

If you use the built-in integration with Amazon Textract, you must specify "AWS/Textract/AnalyzeDocument/Forms/V1" for "AwsManagedHumanLoopRequestSource" in HumanLoopRequestSource.

response = client.create_flow_definition(
    FlowDefinitionName="human-review-workflow-name",
    HumanLoopRequestSource={
        "AwsManagedHumanLoopRequestSource": "AWS/Textract/AnalyzeDocument/Forms/V1"
    },
    HumanLoopActivationConfig={
        "HumanLoopActivationConditionsConfig": {
            "HumanLoopActivationConditions": humanLoopActivationConditions
        }
    },
    HumanLoopConfig={
        "WorkteamArn": workteamArn,
        "HumanTaskUiArn": humanTaskUiArn,
        "TaskTitle": "Document entry review",
        "TaskDescription": "Review the document and instructions. Complete the task",
        "TaskCount": 1,
        "TaskAvailabilityLifetimeInSeconds": 43200,
        "TaskTimeLimitInSeconds": 3600,
        "TaskKeywords": [
            "document review",
        ],
    },
    OutputConfig={
        "S3OutputPath": "s3://amzn-s3-demo-bucket/prefix/",
    },
    RoleArn="arn:aws:iam::<account-number>:role/<role-name>",
    Tags=[
        {
            "Key": "string",
            "Value": "string"
        },
    ]
)

Amazon Rekognition – Image moderation

If you use the built-in integration with Amazon Rekognition, you must specify "AWS/Rekognition/DetectModerationLabels/Image/V3" for "AwsManagedHumanLoopRequestSource" in HumanLoopRequestSource.

response = client.create_flow_definition(
    FlowDefinitionName="human-review-workflow-name",
    HumanLoopRequestSource={
        "AwsManagedHumanLoopRequestSource": "AWS/Rekognition/DetectModerationLabels/Image/V3"
    },
    HumanLoopActivationConfig={
        "HumanLoopActivationConditionsConfig": {
            "HumanLoopActivationConditions": humanLoopActivationConditions
        }
    },
    HumanLoopConfig={
        "WorkteamArn": workteamArn,
        "HumanTaskUiArn": humanTaskUiArn,
        "TaskTitle": "Image content moderation",
        "TaskDescription": "Review the image and instructions. Complete the task",
        "TaskCount": 1,
        "TaskAvailabilityLifetimeInSeconds": 43200,
        "TaskTimeLimitInSeconds": 3600,
        "TaskKeywords": [
            "content moderation",
        ],
    },
    OutputConfig={
        "S3OutputPath": "s3://amzn-s3-demo-bucket/prefix/",
    },
    RoleArn="arn:aws:iam::<account-number>:role/<role-name>",
    Tags=[
        {
            "Key": "string",
            "Value": "string"
        },
    ]
)

Custom Integration

If you use a custom integration, exclude the following parameters: HumanLoopRequestSource and HumanLoopActivationConfig.

response = client.create_flow_definition(
    FlowDefinitionName="human-review-workflow-name",
    HumanLoopConfig={
        "WorkteamArn": workteamArn,
        "HumanTaskUiArn": humanTaskUiArn,
        "TaskTitle": "Image content moderation",
        "TaskDescription": "Review the image and instructions. Complete the task",
        "TaskCount": 1,
        "TaskAvailabilityLifetimeInSeconds": 43200,
        "TaskTimeLimitInSeconds": 3600,
        "TaskKeywords": [
            "content moderation",
        ],
    },
    OutputConfig={
        "S3OutputPath": "s3://amzn-s3-demo-bucket/prefix/",
    },
    RoleArn="arn:aws:iam::<account-number>:role/<role-name>",
    Tags=[
        {
            "Key": "string",
            "Value": "string"
        },
    ]
)

After you create the human review workflow, you can retrieve the flow definition ARN from the response:

humanReviewWorkflowArn = response["FlowDefinitionArn"]
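A newly created flow definition may not be usable immediately, so it can be worth waiting until its status is Active before starting human loops. The following is a minimal polling sketch built on describe_flow_definition; the helper name wait_for_flow_definition, the polling interval, and the attempt limit are assumptions chosen for illustration.

```python
import time


def wait_for_flow_definition(client, flow_definition_name, poll_seconds=5, max_attempts=60):
    """Poll until the flow definition is Active and return its ARN.

    `client` is expected to be a boto3 SageMaker client.
    """
    for _ in range(max_attempts):
        description = client.describe_flow_definition(
            FlowDefinitionName=flow_definition_name
        )
        status = description["FlowDefinitionStatus"]
        if status == "Active":
            return description["FlowDefinitionArn"]
        if status == "Failed":
            reason = description.get("FailureReason", "unknown reason")
            raise RuntimeError(f"Flow definition failed: {reason}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"Flow definition {flow_definition_name} did not become Active")
```

For example, humanReviewWorkflowArn = wait_for_flow_definition(client, "human-review-workflow-name") blocks until the workflow is ready or raises if creation failed.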

Create a Human Loop

The API operation you use to start a human loop depends on the Amazon A2I integration you use.

Select your task type in the following table to see example requests for Amazon Textract and Amazon Rekognition using the AWS SDK for Python (Boto3).

Amazon Textract – Key-value pair extraction

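The following example uses the AWS SDK for Python (Boto3) to call analyze_document in us-west-2. Replace the placeholder values with your own resources. Include the DataAttributes parameter if you are using the Amazon Mechanical Turk workforce. For more information, see the analyze_document documentation in the AWS SDK for Python (Boto) API Reference.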

import boto3

# Amazon Textract client in us-west-2.
client = boto3.client("textract", region_name="us-west-2")

response = client.analyze_document(
    Document={"S3Object": {"Bucket": "amzn-s3-demo-bucket", "Name": "document-name.pdf"}},
    HumanLoopConfig={
        "FlowDefinitionArn": "arn:aws:sagemaker:us-west-2:111122223333:flow-definition/flow-definition-name",
        "HumanLoopName": "human-loop-name",
        # Keep only the content classifiers that apply to your data.
        "DataAttributes": {
            "ContentClassifiers": ["FreeOfPersonallyIdentifiableInformation", "FreeOfAdultContent"]
        }
    },
    FeatureTypes=["FORMS"]
)

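A human loop is created only if Amazon Textract's confidence for the document analysis task meets the activation conditions you specified in your human review workflow. You can check the response element to determine whether a human loop has been created. To see everything included in this response, see HumanLoopActivationOutput.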

activation_output = response["HumanLoopActivationOutput"]
if "HumanLoopArn" in activation_output:
    # A human loop has been started!
    print(f"A human loop has been started with ARN: {activation_output['HumanLoopArn']}")

Amazon Rekognition – Image moderation

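The following example uses the AWS SDK for Python (Boto3) to call detect_moderation_labels in us-west-2. Replace the placeholder values with your own resources. Include the DataAttributes parameter if you are using the Amazon Mechanical Turk workforce. For more information, see the detect_moderation_labels documentation in the AWS SDK for Python (Boto) API Reference.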

import boto3

# Amazon Rekognition client in us-west-2.
client = boto3.client("rekognition", region_name="us-west-2")

response = client.detect_moderation_labels(
    Image={"S3Object": {"Bucket": "amzn-s3-demo-bucket", "Name": "image-name.png"}},
    HumanLoopConfig={
        "FlowDefinitionArn": "arn:aws:sagemaker:us-west-2:111122223333:flow-definition/flow-definition-name",
        "HumanLoopName": "human-loop-name",
        # Keep only the content classifiers that apply to your data.
        "DataAttributes": {
            "ContentClassifiers": ["FreeOfPersonallyIdentifiableInformation", "FreeOfAdultContent"]
        }
    }
)

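A human loop is created only if Amazon Rekognition's confidence for the image moderation task meets the activation conditions you specified in your human review workflow. You can check the response element to determine whether a human loop has been created. To see everything included in this response, see HumanLoopActivationOutput.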
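Beyond the ARN, HumanLoopActivationOutput can also carry the reasons a loop was started and the condition evaluation results. As a rough sketch, the helper below collects these fields from an analyze_document or detect_moderation_labels response. The helper name summarize_activation is hypothetical, and the code assumes the evaluation results arrive as a JSON-encoded string, while also tolerating an already-decoded value.

```python
import json


def summarize_activation(response):
    """Summarize HumanLoopActivationOutput from a built-in integration response."""
    output = response.get("HumanLoopActivationOutput", {})
    results = output.get("HumanLoopActivationConditionsEvaluationResults")
    if isinstance(results, str):
        # Assumption: the evaluation results are a JSON-encoded string.
        results = json.loads(results)
    return {
        "started": "HumanLoopArn" in output,
        "human_loop_arn": output.get("HumanLoopArn"),
        "reasons": output.get("HumanLoopActivationReasons", []),
        "evaluation_results": results,
    }
```

A caller could log summarize_activation(response)["reasons"] to see why a given request was routed to human review.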

activation_output = response["HumanLoopActivationOutput"]
if "HumanLoopArn" in activation_output:
    # A human loop has been started!
    print(f"A human loop has been started with ARN: {activation_output['HumanLoopArn']}")

Custom Integration

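The following example uses the AWS SDK for Python (Boto3) to call start_human_loop in us-west-2. Replace the placeholder values with your own resources. Include the DataAttributes parameter if you are using the Amazon Mechanical Turk workforce. For more information, see the start_human_loop documentation in the AWS SDK for Python (Boto) API Reference.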

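import boto3

# Amazon A2I runtime client in us-west-2.
client = boto3.client("sagemaker-a2i-runtime", region_name="us-west-2")

response = client.start_human_loop(
    HumanLoopName="human-loop-name",
    FlowDefinitionArn="arn:aws:sagemaker:us-west-2:111122223333:flow-definition/flow-definition-name",
    HumanLoopInput={"InputContent": inputContentJson},
    # Keep only the content classifiers that apply to your data.
    DataAttributes={
        "ContentClassifiers": ["FreeOfPersonallyIdentifiableInformation", "FreeOfAdultContent"]
    }
)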

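This example stores the input content in the variable inputContentJson. Assume that the input content contains two elements, a text blurb and a sentiment (such as Positive, Negative, or Neutral), and that it is formatted as follows: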

inputContent = { "initialValue": sentiment, "taskObject": blurb }

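The keys initialValue and taskObject must correspond to the keys used in the liquid elements of the worker task template. Refer to the custom template in Create a human task UI to see an example.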
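One way to catch a mismatch before starting a human loop is to compare the keys in the input content with the task.input references in the template. The sketch below does this with a regular expression; the helper names template_input_keys and missing_input_keys are hypothetical, and the pattern only recognizes simple task.input.<key> references, not every Liquid construct.

```python
import re


def template_input_keys(template):
    """Collect the top-level keys referenced as task.input.<key> in a template."""
    return set(re.findall(r"task\.input\.([A-Za-z_][A-Za-z0-9_]*)", template))


def missing_input_keys(template, input_content):
    """Return template keys that the input content does not provide."""
    return template_input_keys(template) - set(input_content)


# A shortened stand-in for the custom sentiment template.
template_snippet = """
<crowd-classifier initial-value="{{ task.input.initialValue }}">
  <classification-target>{{ task.input.taskObject }}</classification-target>
</crowd-classifier>
"""

input_content = {"initialValue": "Positive", "taskObject": "I love this product."}
print(sorted(missing_input_keys(template_snippet, input_content)))  # prints []
```

A non-empty result names the keys that the template expects but the input content does not supply.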

To create inputContentJson, do the following:

import json

inputContentJson = json.dumps(inputContent)

A human loop starts each time you call start_human_loop. To check the status of your human loop, use describe_human_loop:

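# client is the sagemaker-a2i-runtime client used to start the human loop.
human_loop_info = client.describe_human_loop(HumanLoopName="human-loop-name")
print(f"HumanLoop Status: {human_loop_info['HumanLoopStatus']}")
# HumanLoopOutput may be absent while the human loop is still in progress.
print(f"HumanLoop Output Destination: {human_loop_info.get('HumanLoopOutput')}")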
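Once the status is Completed, the output destination can be used to fetch the workers' answers from Amazon S3. The following is a minimal sketch, assuming the output object is a JSON document and that HumanLoopOutput contains an OutputS3Uri field; the helper names parse_s3_uri and load_human_loop_output are hypothetical.

```python
import json
from urllib.parse import urlparse


def parse_s3_uri(uri):
    """Split an s3://bucket/key URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def load_human_loop_output(s3_client, human_loop_info):
    """Return the parsed output of a completed human loop, or None otherwise.

    `s3_client` is expected to be a boto3 S3 client and `human_loop_info`
    the response of describe_human_loop.
    """
    if human_loop_info.get("HumanLoopStatus") != "Completed":
        return None
    bucket, key = parse_s3_uri(human_loop_info["HumanLoopOutput"]["OutputS3Uri"])
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
    return json.loads(body)
```

For example, load_human_loop_output(boto3.client("s3"), human_loop_info) returns None until the loop has completed, and the decoded output afterward.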