Asynchronous analysis for topic modeling - Amazon Comprehend


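Asynchronous analysis for topic modeling

To determine the topics in a document set, use the StartTopicsDetectionJob operation to start an asynchronous job. You can monitor topics in documents written in English or Spanish.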

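Before you start

Before you start, make sure that you have:

  • Input and output buckets: Identify the Amazon S3 buckets that you want to use for input and output. The buckets must be in the same Region as the API that you are calling.

  • IAM service role: You must have an IAM service role with permission to access your input and output buckets. For more information, see Role-Based Permissions Required for Asynchronous Operations.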
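The same-Region requirement above can be checked before a job is submitted. The following is a minimal sketch, not part of the original guide: the helper names `bucket_region` and `buckets_in_region` are hypothetical, and the S3 client is passed in as a parameter (for example, one created with `boto3.client("s3")`). It assumes the S3 `get_bucket_location` call, which reports buckets in us-east-1 with an empty location constraint.

```python
from typing import Any, Dict, List


def bucket_region(s3_client: Any, bucket: str) -> str:
    """Return the Region of `bucket` as reported by S3 (hypothetical helper)."""
    response = s3_client.get_bucket_location(Bucket=bucket)
    # S3 returns an empty (None) LocationConstraint for buckets in us-east-1.
    return response.get("LocationConstraint") or "us-east-1"


def buckets_in_region(s3_client: Any, buckets: List[str], api_region: str) -> Dict[str, bool]:
    """Map each bucket name to whether it is in the Region of the API being called."""
    return {name: bucket_region(s3_client, name) == api_region for name in buckets}
```

A result such as `{"input bucket": True, "output bucket": False}` would indicate that the output bucket needs to be changed before calling the API in that Region.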

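Topic modeling using the AWS Command Line Interface

The following example demonstrates using the StartTopicsDetectionJob operation with the AWS CLI.

The example is formatted for Unix, Linux, and macOS. For Windows, replace the backslash (\) Unix continuation character at the end of each line with a caret (^).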

aws comprehend start-topics-detection-job \
    --number-of-topics topics to return \
    --job-name "job name" \
    --region region \
    --cli-input-json file://path to JSON input file

For the cli-input-json parameter, supply the path to a JSON file that contains the request data, as shown in the following example.

{
    "InputDataConfig": {
        "S3Uri": "s3://input bucket/input path",
        "InputFormat": "ONE_DOC_PER_FILE"
    },
    "OutputDataConfig": {
        "S3Uri": "s3://output bucket/output path"
    },
    "DataAccessRoleArn": "arn:aws:iam::account ID:role/data access role"
}
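A request file with this shape can also be generated rather than written by hand. The sketch below is illustrative only: `build_request` and `write_request` are hypothetical helper names, and the set of accepted input formats is limited to the two values that appear in this guide.

```python
import json
from pathlib import Path

# The two input formats that appear in this guide.
VALID_INPUT_FORMATS = {"ONE_DOC_PER_FILE", "ONE_DOC_PER_LINE"}


def build_request(input_uri, output_uri, role_arn, input_format="ONE_DOC_PER_FILE"):
    """Build the request body expected by --cli-input-json (hypothetical helper)."""
    if input_format not in VALID_INPUT_FORMATS:
        raise ValueError(f"Unsupported input format: {input_format}")
    for uri in (input_uri, output_uri):
        if not uri.startswith("s3://"):
            raise ValueError(f"Not an S3 URI: {uri}")
    return {
        "InputDataConfig": {"S3Uri": input_uri, "InputFormat": input_format},
        "OutputDataConfig": {"S3Uri": output_uri},
        "DataAccessRoleArn": role_arn,
    }


def write_request(path, request):
    """Write the request body to `path` as indented JSON."""
    Path(path).write_text(json.dumps(request, indent=4), encoding="utf-8")
```

The path passed to `write_request` is the one to supply after `file://` in the `--cli-input-json` parameter.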

If the request to start the topic detection job succeeds, you receive the following response:

{
    "JobStatus": "SUBMITTED",
    "JobId": "job ID"
}

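Use the ListTopicsDetectionJobs operation to see a list of the topic detection jobs that you have submitted. The list includes information about the input and output locations that you used and the status of each detection job. The example is formatted for Unix, Linux, and macOS. For Windows, replace the backslash (\) Unix continuation character at the end of each line with a caret (^).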

aws comprehend list-topics-detection-jobs \
    --region region

You get JSON similar to the following in response:

{
    "TopicsDetectionJobPropertiesList": [
        {
            "InputDataConfig": {
                "S3Uri": "s3://input bucket/input path",
                "InputFormat": "ONE_DOC_PER_LINE"
            },
            "NumberOfTopics": topics to return,
            "JobId": "job ID",
            "JobStatus": "COMPLETED",
            "JobName": "job name",
            "SubmitTime": timestamp,
            "OutputDataConfig": {
                "S3Uri": "s3://output bucket/output path"
            },
            "EndTime": timestamp
        },
        {
            "InputDataConfig": {
                "S3Uri": "s3://input bucket/input path",
                "InputFormat": "ONE_DOC_PER_LINE"
            },
            "NumberOfTopics": topics to return,
            "JobId": "job ID",
            "JobStatus": "RUNNING",
            "JobName": "job name",
            "SubmitTime": timestamp,
            "OutputDataConfig": {
                "S3Uri": "s3://output bucket/output path"
            }
        }
    ]
}
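When many jobs have been submitted, it can help to summarize a response of this shape by status. The following sketch assumes the response has already been parsed into a Python dictionary (for example, the return value of the Boto `list_topics_detection_jobs` call); `jobs_by_status` is a hypothetical helper name.

```python
from collections import defaultdict


def jobs_by_status(response):
    """Group job IDs from a ListTopicsDetectionJobs response by their JobStatus."""
    grouped = defaultdict(list)
    for job in response.get("TopicsDetectionJobPropertiesList", []):
        grouped[job["JobStatus"]].append(job["JobId"])
    return dict(grouped)
```

For the two-job response shown above, the result would contain one job ID under "COMPLETED" and one under "RUNNING".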

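You can use the DescribeTopicsDetectionJob operation to get the status of an existing job. The example is formatted for Unix, Linux, and macOS. For Windows, replace the backslash (\) Unix continuation character at the end of each line with a caret (^).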

aws comprehend describe-topics-detection-job --job-id job ID

You get the following JSON in response:

{
    "TopicsDetectionJobProperties": {
        "InputDataConfig": {
            "S3Uri": "s3://input bucket/input path",
            "InputFormat": "ONE_DOC_PER_LINE"
        },
        "NumberOfTopics": topics to return,
        "JobId": "job ID",
        "JobStatus": "COMPLETED",
        "JobName": "job name",
        "SubmitTime": timestamp,
        "OutputDataConfig": {
            "S3Uri": "s3://output bucket/output path"
        },
        "EndTime": timestamp
    }
}
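Because the job runs asynchronously, a single describe call usually has to be repeated until the job finishes. The following is a minimal polling sketch under stated assumptions: `wait_for_job` is a hypothetical helper, the client is any object with the Boto-style `describe_topics_detection_job` method, and the set of terminal statuses (COMPLETED, FAILED, STOPPED) is an assumption about the service rather than something this page states.

```python
import time

# Assumed terminal states; any other status is treated as "still working".
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "STOPPED"}


def wait_for_job(client, job_id, poll_seconds=30.0, max_polls=120, sleep=time.sleep):
    """Poll DescribeTopicsDetectionJob until the job reaches a terminal status.

    Returns the final TopicsDetectionJobProperties dictionary, or raises
    TimeoutError after `max_polls` describe calls.
    """
    for attempt in range(max_polls):
        response = client.describe_topics_detection_job(JobId=job_id)
        props = response["TopicsDetectionJobProperties"]
        if props["JobStatus"] in TERMINAL_STATUSES:
            return props
        # Do not sleep after the final attempt; fall through to the timeout.
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    raise TimeoutError(f"Job {job_id} did not finish after {max_polls} polls")
```

The `sleep` parameter exists so that the wait can be replaced in tests; in normal use the default `time.sleep` applies.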

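Topic modeling using the AWS SDK for Java

The following Java program detects the topics in a document collection. It uses the StartTopicsDetectionJob operation to start detecting topics. Next, it uses the DescribeTopicsDetectionJob operation to check the status of the topic detection. Finally, it calls ListTopicsDetectionJobs to show a list of all jobs submitted for the account.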

import com.amazonaws.auth.AWSCredentialsProvider;
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
import com.amazonaws.services.comprehend.AmazonComprehend;
import com.amazonaws.services.comprehend.AmazonComprehendClientBuilder;
import com.amazonaws.services.comprehend.model.DescribeTopicsDetectionJobRequest;
import com.amazonaws.services.comprehend.model.DescribeTopicsDetectionJobResult;
import com.amazonaws.services.comprehend.model.InputDataConfig;
import com.amazonaws.services.comprehend.model.InputFormat;
import com.amazonaws.services.comprehend.model.ListTopicsDetectionJobsRequest;
import com.amazonaws.services.comprehend.model.ListTopicsDetectionJobsResult;
import com.amazonaws.services.comprehend.model.OutputDataConfig;
import com.amazonaws.services.comprehend.model.StartTopicsDetectionJobRequest;
import com.amazonaws.services.comprehend.model.StartTopicsDetectionJobResult;

public class App
{
    public static void main( String[] args )
    {
        // Create credentials using a provider chain. For more information, see
        // https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html
        AWSCredentialsProvider awsCreds = DefaultAWSCredentialsProviderChain.getInstance();

        AmazonComprehend comprehendClient =
            AmazonComprehendClientBuilder.standard()
                                         .withCredentials(awsCreds)
                                         .withRegion("region")
                                         .build();

        final String inputS3Uri = "s3://input bucket/input path";
        final InputFormat inputDocFormat = InputFormat.ONE_DOC_PER_FILE;
        final String outputS3Uri = "s3://output bucket/output path";
        final String dataAccessRoleArn = "arn:aws:iam::account ID:role/data access role";
        final int numberOfTopics = 10;

        // Start the asynchronous topic detection job.
        final StartTopicsDetectionJobRequest startTopicsDetectionJobRequest = new StartTopicsDetectionJobRequest()
                .withInputDataConfig(new InputDataConfig()
                        .withS3Uri(inputS3Uri)
                        .withInputFormat(inputDocFormat))
                .withOutputDataConfig(new OutputDataConfig()
                        .withS3Uri(outputS3Uri))
                .withDataAccessRoleArn(dataAccessRoleArn)
                .withNumberOfTopics(numberOfTopics);

        final StartTopicsDetectionJobResult startTopicsDetectionJobResult =
            comprehendClient.startTopicsDetectionJob(startTopicsDetectionJobRequest);

        final String jobId = startTopicsDetectionJobResult.getJobId();
        System.out.println("JobId: " + jobId);

        // Check the status of the job that was just started.
        final DescribeTopicsDetectionJobRequest describeTopicsDetectionJobRequest = new DescribeTopicsDetectionJobRequest()
                .withJobId(jobId);

        final DescribeTopicsDetectionJobResult describeTopicsDetectionJobResult =
            comprehendClient.describeTopicsDetectionJob(describeTopicsDetectionJobRequest);
        System.out.println("describeTopicsDetectionJobResult: " + describeTopicsDetectionJobResult);

        // List all topic detection jobs submitted for the account.
        ListTopicsDetectionJobsResult listTopicsDetectionJobsResult =
            comprehendClient.listTopicsDetectionJobs(new ListTopicsDetectionJobsRequest());
        System.out.println("listTopicsDetectionJobsResult: " + listTopicsDetectionJobsResult);
    }
}

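Topic modeling using the AWS SDK for Python (Boto)

The following Python program detects the topics in a document collection. It uses the StartTopicsDetectionJob operation to start detecting topics. Next, it uses the DescribeTopicsDetectionJob operation to check the status of the topic detection. Finally, it calls ListTopicsDetectionJobs to show a list of all jobs submitted for the account.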

import boto3
import json
# json_util comes from the bson package (installed with pymongo). It is used
# below to serialize the datetime values that the responses contain.
from bson import json_util

comprehend = boto3.client(service_name='comprehend', region_name='region')

input_s3_url = "s3://input bucket/input path"
input_doc_format = "ONE_DOC_PER_FILE"
output_s3_url = "s3://output bucket/output path"
data_access_role_arn = "arn:aws:iam::account ID:role/data access role"
number_of_topics = 10

input_data_config = {"S3Uri": input_s3_url, "InputFormat": input_doc_format}
output_data_config = {"S3Uri": output_s3_url}

# Start the asynchronous topic detection job.
start_topics_detection_job_result = comprehend.start_topics_detection_job(NumberOfTopics=number_of_topics,
                                                                          InputDataConfig=input_data_config,
                                                                          OutputDataConfig=output_data_config,
                                                                          DataAccessRoleArn=data_access_role_arn)

print('start_topics_detection_job_result: ' + json.dumps(start_topics_detection_job_result))

job_id = start_topics_detection_job_result["JobId"]

print('job_id: ' + job_id)

# Check the status of the job that was just started.
describe_topics_detection_job_result = comprehend.describe_topics_detection_job(JobId=job_id)

print('describe_topics_detection_job_result: ' + json.dumps(describe_topics_detection_job_result, default=json_util.default))

# List all topic detection jobs submitted for the account.
list_topics_detection_jobs_result = comprehend.list_topics_detection_jobs()

print('list_topics_detection_jobs_result: ' + json.dumps(list_topics_detection_jobs_result, default=json_util.default))
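The program above depends on `bson.json_util` only to serialize the timestamps in the describe and list responses. If installing that package is not desirable, a standard-library alternative is possible. The sketch below is one such option, with hypothetical helper names `json_default` and `to_json`; note that it writes timestamps as ISO 8601 strings, which differs from the output format that `json_util` produces.

```python
import json
from datetime import date, datetime


def json_default(value):
    """Fallback serializer for json.dumps: render dates and datetimes as ISO 8601."""
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


def to_json(response):
    """Serialize a response dictionary, including any datetime fields."""
    return json.dumps(response, default=json_default, indent=2)
```

With these helpers, a call such as `json.dumps(result, default=json_util.default)` could be replaced by `to_json(result)`.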

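Topic modeling using the AWS SDK for .NET

The following C# program detects the topics in a document collection. It uses the StartTopicsDetectionJob operation to start detecting topics. Next, it uses the DescribeTopicsDetectionJob operation to check the status of the topic detection. Finally, it calls ListTopicsDetectionJobs to show a list of all jobs submitted for the account.

The .NET example in this section uses the AWS SDK for .NET. You can use the AWS Toolkit for Visual Studio to develop AWS applications using .NET. It includes helpful templates and the AWS Explorer for deploying applications and managing services. For a .NET developer perspective of AWS, see the AWS Guide for .NET Developers.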

using System;
using Amazon.Comprehend;
using Amazon.Comprehend.Model;

namespace Comprehend
{
    class Program
    {
        // Helper method for printing properties
        static private void PrintJobProperties(TopicsDetectionJobProperties props)
        {
            Console.WriteLine("JobId: {0}, JobName: {1}, JobStatus: {2}, NumberOfTopics: {3}\nInputS3Uri: {4}, InputFormat: {5}, OutputS3Uri: {6}",
                props.JobId, props.JobName, props.JobStatus, props.NumberOfTopics,
                props.InputDataConfig.S3Uri, props.InputDataConfig.InputFormat, props.OutputDataConfig.S3Uri);
        }

        static void Main(string[] args)
        {
            AmazonComprehendClient comprehendClient = new AmazonComprehendClient(Amazon.RegionEndpoint.USWest2);

            String inputS3Uri = "s3://input bucket/input path";
            InputFormat inputDocFormat = InputFormat.ONE_DOC_PER_FILE;
            String outputS3Uri = "s3://output bucket/output path";
            String dataAccessRoleArn = "arn:aws:iam::account ID:role/data access role";
            int numberOfTopics = 10;

            // Start the asynchronous topic detection job.
            StartTopicsDetectionJobRequest startTopicsDetectionJobRequest = new StartTopicsDetectionJobRequest()
            {
                InputDataConfig = new InputDataConfig()
                {
                    S3Uri = inputS3Uri,
                    InputFormat = inputDocFormat
                },
                OutputDataConfig = new OutputDataConfig()
                {
                    S3Uri = outputS3Uri
                },
                DataAccessRoleArn = dataAccessRoleArn,
                NumberOfTopics = numberOfTopics
            };

            StartTopicsDetectionJobResponse startTopicsDetectionJobResponse = comprehendClient.StartTopicsDetectionJob(startTopicsDetectionJobRequest);

            String jobId = startTopicsDetectionJobResponse.JobId;
            Console.WriteLine("JobId: " + jobId);

            // Check the status of the job that was just started.
            DescribeTopicsDetectionJobRequest describeTopicsDetectionJobRequest = new DescribeTopicsDetectionJobRequest()
            {
                JobId = jobId
            };

            DescribeTopicsDetectionJobResponse describeTopicsDetectionJobResponse = comprehendClient.DescribeTopicsDetectionJob(describeTopicsDetectionJobRequest);
            PrintJobProperties(describeTopicsDetectionJobResponse.TopicsDetectionJobProperties);

            // List all topic detection jobs submitted for the account.
            ListTopicsDetectionJobsResponse listTopicsDetectionJobsResponse = comprehendClient.ListTopicsDetectionJobs(new ListTopicsDetectionJobsRequest());
            foreach (TopicsDetectionJobProperties props in listTopicsDetectionJobsResponse.TopicsDetectionJobPropertiesList)
                PrintJobProperties(props);
        }
    }
}