Training and running custom recognizers (API) - Amazon Comprehend



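To detect custom entities in documents, first use the Amazon Comprehend CreateEntityRecognizer operation to create and train an entity recognizer. Then use the StartEntitiesDetectionJob operation to identify those custom entities.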

Creating and detecting custom entities using the AWS Command Line Interface

The following examples show how to use the CreateEntityRecognizer operation, the StartEntitiesDetectionJob operation, and other associated APIs with the AWS CLI.

The examples are formatted for Unix, Linux, and macOS. For Windows, replace the backslash (\) Unix continuation character at the end of each line with a caret (^).

Create a custom entity recognizer using the CreateEntityRecognizer operation.

aws comprehend create-entity-recognizer \
     --language-code en \
     --recognizer-name test-6 \
     --data-access-role-arn "arn:aws:iam::account number:role/service-role/AmazonComprehendServiceRole-role" \
     --input-data-config "EntityTypes=[{Type=PERSON}],Documents={S3Uri=s3://Bucket Name/Bucket Path/documents}, Annotations={S3Uri=s3://Bucket Name/Bucket Path/annotations}" \
     --region region

List all entity recognizers in a Region using the ListEntityRecognizers operation.

aws comprehend list-entity-recognizers \
     --region region

Check the job status of a custom entity recognizer using the DescribeEntityRecognizer operation.

aws comprehend describe-entity-recognizer \
     --entity-recognizer-arn "arn:aws:comprehend:region:account number:entity-recognizer/test-6" \
     --region region
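The recognizer ARN passed to describe-entity-recognizer follows the pattern visible in the command above. As a minimal sketch, the helper below assembles that string from its parts. The function name recognizer_arn and the sample Region and account number are assumptions made for illustration; the helper is not part of the AWS CLI or any SDK, and it assumes the standard aws partition.

```python
def recognizer_arn(region: str, account_id: str, name: str) -> str:
    """Assemble an entity recognizer ARN (hypothetical helper, not an AWS API)."""
    # Pattern taken from the CLI example:
    # arn:aws:comprehend:<region>:<account number>:entity-recognizer/<name>
    return f"arn:aws:comprehend:{region}:{account_id}:entity-recognizer/{name}"


# Made-up values for illustration; substitute your own Region and account number.
example_arn = recognizer_arn("us-west-2", "111122223333", "test-6")
print(example_arn)
```

The resulting string can be passed as the --entity-recognizer-arn value in the describe and start commands.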

Start a custom entity recognition job using the StartEntitiesDetectionJob operation.

aws comprehend start-entities-detection-job \
     --entity-recognizer-arn "arn:aws:comprehend:region:account number:entity-recognizer/test-6" \
     --job-name infer-1 \
     --data-access-role-arn "arn:aws:iam::account number:role/service-role/AmazonComprehendServiceRole-role" \
     --language-code en \
     --input-data-config "S3Uri=s3://Bucket Name/Bucket Path" \
     --output-data-config "S3Uri=s3://Bucket Name/Bucket Path/" \
     --region region

Detecting custom entities using the AWS SDK for Java

This example creates a custom entity recognizer, trains the model, and then runs it in an entity recognizer job using Java.

import com.amazonaws.auth.AWSCredentialsProvider;
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
import com.amazonaws.services.comprehend.AmazonComprehend;
import com.amazonaws.services.comprehend.AmazonComprehendClientBuilder;
import com.amazonaws.services.comprehend.model.CreateEntityRecognizerRequest;
import com.amazonaws.services.comprehend.model.CreateEntityRecognizerResult;
import com.amazonaws.services.comprehend.model.DescribeEntityRecognizerRequest;
import com.amazonaws.services.comprehend.model.DescribeEntityRecognizerResult;
import com.amazonaws.services.comprehend.model.EntityRecognizerAnnotations;
import com.amazonaws.services.comprehend.model.EntityRecognizerDocuments;
import com.amazonaws.services.comprehend.model.EntityRecognizerInputDataConfig;
import com.amazonaws.services.comprehend.model.EntityTypesListItem;
import com.amazonaws.services.comprehend.model.InputDataConfig;
import com.amazonaws.services.comprehend.model.LanguageCode;
import com.amazonaws.services.comprehend.model.OutputDataConfig;
import com.amazonaws.services.comprehend.model.StartEntitiesDetectionJobRequest;
import com.amazonaws.services.comprehend.model.StartEntitiesDetectionJobResult;

public class CustomEntityRecognizerDemo {

    public static void main(String[] args) {

        // Create credentials using a provider chain. For more information, see
        // https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html
        AWSCredentialsProvider awsCreds = DefaultAWSCredentialsProviderChain.getInstance();

        AmazonComprehend comprehendClient =
            AmazonComprehendClientBuilder.standard()
                                         .withCredentials(awsCreds)
                                         .withRegion("region")
                                         .build();

        final String dataAccessRoleArn = "arn:aws:iam::account number:role/service-role/AmazonComprehendServiceRole-role";

        // Describe the recognizer: its name, language, IAM role, and training data.
        final CreateEntityRecognizerRequest createEntityRecognizerRequest = new CreateEntityRecognizerRequest()
                .withRecognizerName("recognizer name")
                .withDataAccessRoleArn(dataAccessRoleArn)
                .withLanguageCode(LanguageCode.En)
                .withInputDataConfig(new EntityRecognizerInputDataConfig()
                        .withEntityTypes(new EntityTypesListItem().withType("PERSON"))
                        .withDocuments(new EntityRecognizerDocuments()
                                .withS3Uri("s3://Bucket Name/Bucket Path/documents"))
                        .withAnnotations(new EntityRecognizerAnnotations()
                                .withS3Uri("s3://Bucket Name/Bucket Path/annotations")));

        // Submit the recognizer for training and keep its ARN for later calls.
        final CreateEntityRecognizerResult createEntityRecognizerResult =
            comprehendClient.createEntityRecognizer(createEntityRecognizerRequest);
        final String entityRecognizerArn = createEntityRecognizerResult.getEntityRecognizerArn();

        System.out.println("Entity Recognizer ARN: " + entityRecognizerArn);

        // Look up the current status of the recognizer.
        DescribeEntityRecognizerRequest describeEntityRecognizerRequest = new DescribeEntityRecognizerRequest()
                .withEntityRecognizerArn(entityRecognizerArn);

        final DescribeEntityRecognizerResult describeEntityRecognizerResult =
            comprehendClient.describeEntityRecognizer(describeEntityRecognizerRequest);

        System.out.println("describeEntityRecognizerResult: " + describeEntityRecognizerResult);

        // Training runs asynchronously, so this single check passes only if the
        // recognizer has already reached the TRAINED status.
        if ("TRAINED".equals(describeEntityRecognizerResult.getEntityRecognizerProperties().getStatus())) {
            // After the model is trained, launch a job to extract entities.
            final StartEntitiesDetectionJobRequest startEntitiesDetectionJobRequest = new StartEntitiesDetectionJobRequest()
                    .withJobName("Inference Job Name")
                    .withEntityRecognizerArn(entityRecognizerArn)
                    .withDataAccessRoleArn(dataAccessRoleArn)
                    .withLanguageCode(LanguageCode.En)
                    .withInputDataConfig(new InputDataConfig()
                            .withS3Uri("s3://Bucket Name/Bucket Path"))
                    .withOutputDataConfig(new OutputDataConfig()
                            .withS3Uri("s3://Bucket Name/Bucket Path/"));

            final StartEntitiesDetectionJobResult startEntitiesDetectionJobResult =
                comprehendClient.startEntitiesDetectionJob(startEntitiesDetectionJobRequest);

            System.out.println("startEntitiesDetectionJobResult: " + startEntitiesDetectionJobResult);
        }
    }
}

Detecting custom entities using the AWS SDK for Python (Boto3)

Instantiate the Boto3 SDK:

import sys
import time
import uuid

import boto3

# sys and time are used by the polling loop in a later step.
comprehend = boto3.client("comprehend", region_name="region")

Create the entity recognizer:

response = comprehend.create_entity_recognizer(
    RecognizerName="Recognizer-Name-Goes-Here-{}".format(str(uuid.uuid4())),
    LanguageCode="en",
    DataAccessRoleArn="Role ARN",
    InputDataConfig={
        "EntityTypes": [
            {
                "Type": "ENTITY_TYPE"
            }
        ],
        "Documents": {
            "S3Uri": "s3://Bucket Name/Bucket Path/documents"
        },
        "Annotations": {
            "S3Uri": "s3://Bucket Name/Bucket Path/annotations"
        }
    }
)
recognizer_arn = response["EntityRecognizerArn"]
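When a recognizer covers several entity types, the InputDataConfig argument can be assembled programmatically. The following is a minimal sketch under that assumption; build_input_data_config is a hypothetical helper introduced here for illustration, and it only builds a dictionary with the same keys used in the call above.

```python
def build_input_data_config(entity_types, documents_uri, annotations_uri):
    """Build an InputDataConfig dict for create_entity_recognizer.

    Hypothetical helper for illustration; it performs no AWS calls.
    """
    if not entity_types:
        raise ValueError("at least one entity type is required")
    return {
        # One {"Type": ...} item per custom entity type to train.
        "EntityTypes": [{"Type": entity_type} for entity_type in entity_types],
        "Documents": {"S3Uri": documents_uri},
        "Annotations": {"S3Uri": annotations_uri},
    }


# Placeholder S3 locations, matching the placeholders used in this topic.
config = build_input_data_config(
    ["PERSON"],
    "s3://Bucket Name/Bucket Path/documents",
    "s3://Bucket Name/Bucket Path/annotations",
)
```

The returned dictionary can be passed as InputDataConfig=config in the create_entity_recognizer call.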

List all recognizers:

response = comprehend.list_entity_recognizers()
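A single list_entity_recognizers call may return only part of the results when an account has many recognizers. The sketch below follows the NextToken value in each response until every page has been read, and optionally filters by status. The function name list_all_recognizers is an assumption made for illustration; the client argument is expected to behave like the comprehend client created earlier.

```python
def list_all_recognizers(client, status=None):
    """Collect recognizer properties across all result pages.

    Hypothetical helper; `client` is expected to behave like
    boto3.client("comprehend").
    """
    kwargs = {}
    if status is not None:
        # Restrict results to one status, for example "TRAINED".
        kwargs["Filter"] = {"Status": status}
    recognizers = []
    while True:
        response = client.list_entity_recognizers(**kwargs)
        recognizers.extend(response.get("EntityRecognizerPropertiesList", []))
        token = response.get("NextToken")
        if not token:
            # No further pages remain.
            return recognizers
        kwargs["NextToken"] = token
```

For example, list_all_recognizers(comprehend, status="TRAINED") would return only the recognizers that have finished training.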

Wait for the recognizer to reach the TRAINED status:

while True:
    response = comprehend.describe_entity_recognizer(
        EntityRecognizerArn=recognizer_arn
    )

    status = response["EntityRecognizerProperties"]["Status"]
    if "IN_ERROR" == status:
        sys.exit(1)
    if "TRAINED" == status:
        break

    time.sleep(10)
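The loop above runs until training succeeds or fails. A variation with an upper time limit might look like the following sketch. The function name wait_until_trained, its default intervals, and the injectable sleep and clock parameters are assumptions made for illustration; the status strings are those returned by describe_entity_recognizer.

```python
import time


def wait_until_trained(client, recognizer_arn, poll_seconds=10,
                       timeout_seconds=3600, sleep=time.sleep,
                       clock=time.monotonic):
    """Poll describe_entity_recognizer until the recognizer is TRAINED.

    Hypothetical helper; raises RuntimeError if training ends without
    success and TimeoutError if the time limit is exceeded.
    """
    deadline = clock() + timeout_seconds
    while True:
        response = client.describe_entity_recognizer(
            EntityRecognizerArn=recognizer_arn
        )
        status = response["EntityRecognizerProperties"]["Status"]
        if status == "TRAINED":
            return status
        if status in ("IN_ERROR", "STOPPED"):
            raise RuntimeError("training ended with status " + status)
        if clock() >= deadline:
            raise TimeoutError("recognizer not trained within the time limit")
        sleep(poll_seconds)
```

Raising exceptions instead of calling sys.exit lets the calling code decide how to handle a failed or slow training run.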

Start the entities detection job:

response = comprehend.start_entities_detection_job(
    EntityRecognizerArn=recognizer_arn,
    JobName="Detection-Job-Name-{}".format(str(uuid.uuid4())),
    LanguageCode="en",
    DataAccessRoleArn="Role ARN",
    InputDataConfig={
        "InputFormat": "ONE_DOC_PER_LINE",
        "S3Uri": "s3://Bucket Name/Bucket Path/documents"
    },
    OutputDataConfig={
        "S3Uri": "s3://Bucket Name/Bucket Path/output"
    }
)
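The detection job also runs asynchronously, and the response to start_entities_detection_job includes a JobId that identifies it. As a hedged sketch, the helper below looks up a job with describe_entities_detection_job and reports its status together with the output location once the job has completed. The function name job_outcome is an assumption made for illustration.

```python
def job_outcome(client, job_id):
    """Return (status, output_s3_uri) for an entities detection job.

    Hypothetical helper; the output URI is None until the job status
    is COMPLETED.
    """
    props = client.describe_entities_detection_job(JobId=job_id)[
        "EntitiesDetectionJobProperties"
    ]
    status = props["JobStatus"]
    output_uri = None
    if status == "COMPLETED":
        # The describe response carries the location where results were written.
        output_uri = props.get("OutputDataConfig", {}).get("S3Uri")
    return status, output_uri
```

A call such as job_outcome(comprehend, response["JobId"]) can be repeated at intervals until the status is COMPLETED, FAILED, or STOPPED.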