异步分析作业的输出 - Amazon Comprehend

本文属于机器翻译版本。若本译文内容与英语原文存在差异,则一律以英文原文为准。

异步分析作业的输出

分析作业完成后,它将结果存储到您在请求中指定的 S3 存储桶中。

文本输入的输出

无论哪种格式的文本输入文档(多类或多标签),作业输出都由一个名为 output.tar.gz 的文件组成。它是一个压缩的存档文件,其中包含一个带有输出的文本文件。

多类输出

当您使用在多类模式下训练的分类器时,结果会显示 classes。这些类中的每一个 classes 都是在训练分类器时用来创建类别集的类。

有关这些输出字段的更多详细信息,请参阅ClassifyDocumentAmazon Comprehend API 参考》。

以下示例使用以下互斥类。

DOCUMENTARY SCIENCE_FICTION ROMANTIC_COMEDY SERIOUS_DRAMA OTHER

如果您的输入数据格式为每行一个文档,则输出文件中的每行包含输入中的一行。每行都包括文件名、输入行的从零开始的行号以及文档中找到的一个或多个类。以 Amazon Comprehend 对单个实例正确分类的置信度作为结束。

例如:

{"File": "file1.txt", "Line": "0", "Classes": [{"Name": "Documentary", "Score": 0.8642}, {"Name": "Other", "Score": 0.0381}, {"Name": "Serious_Drama", "Score": 0.0372}]} {"File": "file1.txt", "Line": "1", "Classes": [{"Name": "Science_Fiction", "Score": 0.5}, {"Name": "Science_Fiction", "Score": 0.0381}, {"Name": "Science_Fiction", "Score": 0.0372}]} {"File": "file2.txt", "Line": "2", "Classes": [{"Name": "Documentary", "Score": 0.1}, {"Name": "Documentary", "Score": 0.0381}, {"Name": "Documentary", "Score": 0.0372}]} {"File": "file2.txt", "Line": "3", "Classes": [{"Name": "Serious_Drama", "Score": 0.3141}, {"Name": "Other", "Score": 0.0381}, {"Name": "Other", "Score": 0.0372}]}

如果您的输入数据格式为每个文件一个文档,则输出文件包含每个文档一行。每行都有文件名和文档中找到的一个或多个类。以 Amazon Comprehend 对单个实例正确分类的置信度作为结束。

例如:

{"File": "file0.txt", "Classes": [{"Name": "Documentary", "Score": 0.8642}, {"Name": "Other", "Score": 0.0381}, {"Name": "Serious_Drama", "Score": 0.0372}]} {"File": "file1.txt", "Classes": [{"Name": "Science_Fiction", "Score": 0.5}, {"Name": "Science_Fiction", "Score": 0.0381}, {"Name": "Science_Fiction", "Score": 0.0372}]} {"File": "file2.txt", "Classes": [{"Name": "Documentary", "Score": 0.1}, {"Name": "Documentary", "Score": 0.0381}, {"Name": "Domentary", "Score": 0.0372}]} {"File": "file3.txt", "Classes": [{"Name": "Serious_Drama", "Score": 0.3141}, {"Name": "Other", "Score": 0.0381}, {"Name": "Other", "Score": 0.0372}]}

多标签输出

当您使用在多标签模式下训练的分类器时,结果会显示 labels。这些标签中的每一个 labels 都是在训练分类器时用来创建类别集的标签。

以下示例使用这些独特的标签。

SCIENCE_FICTION ACTION DRAMA COMEDY ROMANCE

如果您的输入数据格式为每行一个文档,则输出文件中的每行包含输入中的一行。每行都包括文件名、输入行的从零开始的行号以及文档中找到的一个或多个类。以 Amazon Comprehend 对单个实例正确分类的置信度作为结束。

例如:

{"File": "file1.txt", "Line": "0", "Labels": [{"Name": "Action", "Score": 0.8642}, {"Name": "Drama", "Score": 0.650}, {"Name": "Science Fiction", "Score": 0.0372}]} {"File": "file1.txt", "Line": "1", "Labels": [{"Name": "Comedy", "Score": 0.5}, {"Name": "Action", "Score": 0.0381}, {"Name": "Drama", "Score": 0.0372}]} {"File": "file1.txt", "Line": "2", "Labels": [{"Name": "Action", "Score": 0.9934}, {"Name": "Drama", "Score": 0.0381}, {"Name": "Action", "Score": 0.0372}]} {"File": "file1.txt", "Line": "3", "Labels": [{"Name": "Romance", "Score": 0.9845}, {"Name": "Comedy", "Score": 0.8756}, {"Name": "Drama", "Score": 0.7723}, {"Name": "Science_Fiction", "Score": 0.6157}]}

如果您的输入数据格式为每个文件一个文档,则输出文件包含每个文档一行。每行都有文件名和文档中找到的一个或多个类。以 Amazon Comprehend 对单个实例正确分类的置信度作为结束。

例如:

{"File": "file0.txt", "Labels": [{"Name": "Action", "Score": 0.8642}, {"Name": "Drama", "Score": 0.650}, {"Name": "Science Fiction", "Score": 0.0372}]} {"File": "file1.txt", "Labels": [{"Name": "Comedy", "Score": 0.5}, {"Name": "Action", "Score": 0.0381}, {"Name": "Drama", "Score": 0.0372}]} {"File": "file2.txt", "Labels": [{"Name": "Action", "Score": 0.9934}, {"Name": "Drama", "Score": 0.0381}, {"Name": "Action", "Score": 0.0372}]} {"File": "file3.txt”, "Labels": [{"Name": "Romance", "Score": 0.9845}, {"Name": "Comedy", "Score": 0.8756}, {"Name": "Drama", "Score": 0.7723}, {"Name": "Science_Fiction", "Score": 0.6157}]}

半结构化输入文档的输出

对于半结构化输入文档,输出可以包括以下附加字段:

  • DocumentMetadata — 提取有关文档的信息。元数据包括文档中的页面列表,以及从每页中提取的字符数。如果请求包含 Byte 参数,则响应中会显示此字段。

  • DocumentType -输入文档中每页的文档类型。如果请求包含 Byte 参数,则响应中会显示此字段。

  • 错误:系统在处理输入文档时检测到的页面级错误。如果系统未遇到任何错误,则该字段为空。

有关这些输出字段的更多详细信息,请参阅ClassifyDocumentAmazon Comprehend API 参考》。

以下示例显示了两页扫描的 PDF 文件的输出。

[{ #First page output "Classes": [ { "Name": "__label__2 ", "Score": 0.9993996620178223 }, { "Name": "__label__3 ", "Score": 0.0004330444789957255 } ], "DocumentMetadata": { "PageNumber": 1, "Pages": 2 }, "DocumentType": "ScannedPDF", "File": "file.pdf", "Version": "VERSION_NUMBER" }, #Second page output { "Classes": [ { "Name": "__label__2 ", "Score": 0.9993996620178223 }, { "Name": "__label__3 ", "Score": 0.0004330444789957255 } ], "DocumentMetadata": { "PageNumber": 2, "Pages": 2 }, "DocumentType": "ScannedPDF", "File": "file.pdf", "Version": "VERSION_NUMBER" }]