Detecting labels in a video

Amazon Rekognition Video can detect labels (objects and concepts) in a video, along with the times that each label is detected. For SDK code examples, see Analyzing a video stored in an Amazon S3 bucket with Java or Python (SDK). For AWS CLI examples, see Analyzing a video with the AWS Command Line Interface.

Amazon Rekognition Video label detection is an asynchronous operation. To start detecting labels in a video, call StartLabelDetection.

Amazon Rekognition Video publishes the completion status of the video analysis to an Amazon Simple Notification Service topic. If the video analysis is successful, call GetLabelDetection to get the detected labels. For information about calling the video analysis API operations, see Calling Amazon Rekognition Video operations.
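
For illustration, here is a minimal sketch of this flow using the AWS SDK for Python (Boto3). The bucket, video, role, and topic values are placeholders taken from the examples below, and the polling loop is shown only for brevity; the recommended pattern is to react to the completion message published to the Amazon SNS topic.

import time

import boto3

rekognition = boto3.client("rekognition")

# Start the asynchronous label detection job. The S3 object, role ARN, and
# topic ARN are placeholder values.
start_response = rekognition.start_label_detection(
    Video={"S3Object": {"Bucket": "bucket", "Name": "video.mp4"}},
    MinConfidence=75,
    Features=["GENERAL_LABELS"],
    NotificationChannel={
        "RoleArn": "arn:aws:iam::012345678910:role/SNSAccessRole",
        "SNSTopicArn": "arn:aws:sns:us-east-1:012345678910:notification-topic",
    },
)
job_id = start_response["JobId"]

# Simple polling loop, for brevity only; in production, wait for the
# completion status delivered through Amazon SNS instead.
while True:
    result = rekognition.get_label_detection(JobId=job_id)
    if result["JobStatus"] != "IN_PROGRESS":
        break
    time.sleep(5)

print(result["JobStatus"])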

StartLabelDetection request

The following example is a request for the StartLabelDetection operation. You provide the StartLabelDetection operation with a video stored in an Amazon S3 bucket. The example request JSON specifies the Amazon S3 bucket and video name, along with MinConfidence, Features, Settings, and NotificationChannel.

MinConfidence is the minimum confidence that Amazon Rekognition Video must have in the accuracy of a detected label, or of an instance bounding box (if one is detected), for it to be returned in the response.

With Features, you specify that you want GENERAL_LABELS returned as part of the response.

With Settings, you can filter the items returned for GENERAL_LABELS. For labels, you can use inclusion and exclusion filters. You can filter by specific, individual labels or by label category:

  • LabelInclusionFilters – Used to specify which labels to include in the response.

  • LabelExclusionFilters – Used to specify which labels to exclude from the response.

  • LabelCategoryInclusionFilters – Used to specify which label categories to include in the response.

  • LabelCategoryExclusionFilters – Used to specify which label categories to exclude from the response.

You can also combine inclusion and exclusion filters as needed, excluding some labels or categories while including others.

NotificationChannel is the ARN of the Amazon SNS topic that you want Amazon Rekognition Video to publish the completion status of the label detection operation to. If you are using the AmazonRekognitionServiceRole permissions policy, the name of the Amazon SNS topic must begin with Rekognition.

The following is an example StartLabelDetection request in JSON format, including the filters:

{ "ClientRequestToken": "5a6e690e-c750-460a-9d59-c992e0ec8638", "JobTag": "5a6e690e-c750-460a-9d59-c992e0ec8638", "Video": { "S3Object": { "Bucket": "bucket", "Name": "video.mp4" } }, "Features": ["GENERAL_LABELS"], "MinConfidence": 75, "Settings": { "GeneralLabels": { "LabelInclusionFilters": ["Cat", "Dog"], "LabelExclusionFilters": ["Tiger"], "LabelCategoryInclusionFilters": ["Animals and Pets"], "LabelCategoryExclusionFilters": ["Popular Landmark"] } }, "NotificationChannel": { "RoleArn": "arn:aws:iam::012345678910:role/SNSAccessRole", "SNSTopicArn": "arn:aws:sns:us-east-1:012345678910:notification-topic", } }

GetLabelDetection operation response

GetLabelDetection returns an array (Labels) that contains information about the labels detected in the video. The array can be sorted by time or by detected label, depending on what you specify in the SortBy parameter. You can also select how the response items are aggregated by using the AggregateBy parameter.

The following example is the JSON response of GetLabelDetection. In the response, note the following:

  • Sort order – The array of labels returned is sorted by time. To sort by label, specify NAME in the SortBy input parameter for GetLabelDetection. If a label appears multiple times in the video, there will be multiple instances of the (LabelDetection) element. The default sort order is TIMESTAMP, and the secondary sort order is NAME.

  • Label information – A LabelDetection array element contains a (Label) object, which in turn contains the label name and the confidence that Amazon Rekognition has in the accuracy of the detected label. The Label object also includes the hierarchical taxonomy of the label and bounding box information for common labels. Timestamp is the time, in milliseconds from the start of the video, at which the label was detected.

    Information about any categories or aliases associated with a label is also returned. For results aggregated by video SEGMENTS, the StartTimestampMillis, EndTimestampMillis, and DurationMillis structures are returned, which define the start time, end time, and duration of a segment, respectively.

  • Aggregation – Specifies how results are aggregated when they are returned. The default is to aggregate by TIMESTAMPS. You can also choose to aggregate by SEGMENTS, which aggregates results over a time window. When aggregating by SEGMENTS, information about detected instances with bounding boxes is not returned; only the labels detected during the segments are returned.

  • Pagination information – The example shows one page of label detection information. You can specify how many LabelDetection objects to return in the MaxResults input parameter for GetLabelDetection. If more results than MaxResults exist, GetLabelDetection returns a token (NextToken) that you use to get the next page of results, as shown in the sketch after this list. For more information, see Getting Amazon Rekognition Video analysis results.

  • Video information – The response includes information about the video format (VideoMetadata) in each page of information returned by GetLabelDetection.
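
As a sketch of how the sorting, aggregation, and pagination parameters fit together, the following Boto3 snippet collects every page of results for a completed job. The job ID is assumed to come from an earlier StartLabelDetection call.

import boto3

rekognition = boto3.client("rekognition")

def get_all_label_detections(job_id):
    """Return every LabelDetection element across all pages of results."""
    labels = []
    next_token = None
    while True:
        kwargs = {
            "JobId": job_id,
            "MaxResults": 1000,
            "SortBy": "NAME",             # default is "TIMESTAMP"
            "AggregateBy": "TIMESTAMPS",  # or "SEGMENTS"
        }
        if next_token:
            kwargs["NextToken"] = next_token
        response = rekognition.get_label_detection(**kwargs)
        labels.extend(response["Labels"])
        next_token = response.get("NextToken")
        if not next_token:
            return labels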

The following is an example GetLabelDetection response in JSON format, aggregated by TIMESTAMPS:

{ "JobStatus": "SUCCEEDED", "LabelModelVersion": "3.0", "Labels": [ { "Timestamp": 1000, "Label": { "Name": "Car", "Categories": [ { "Name": "Vehicles and Automotive" } ], "Aliases": [ { "Name": "Automobile" } ], "Parents": [ { "Name": "Vehicle" } ], "Confidence": 99.9364013671875, // Classification confidence "Instances": [ { "BoundingBox": { "Width": 0.26779675483703613, "Height": 0.8562285900115967, "Left": 0.3604024350643158, "Top": 0.09245597571134567 }, "Confidence": 99.9364013671875 // Detection confidence } ] } }, { "Timestamp": 1000, "Label": { "Name": "Cup", "Categories": [ { "Name": "Kitchen and Dining" } ], "Aliases": [ { "Name": "Mug" } ], "Parents": [], "Confidence": 99.9364013671875, // Classification confidence "Instances": [ { "BoundingBox": { "Width": 0.26779675483703613, "Height": 0.8562285900115967, "Left": 0.3604024350643158, "Top": 0.09245597571134567 }, "Confidence": 99.9364013671875 // Detection confidence } ] } }, { "Timestamp": 2000, "Label": { "Name": "Kangaroo", "Categories": [ { "Name": "Animals and Pets" } ], "Aliases": [ { "Name": "Wallaby" } ], "Parents": [ { "Name": "Mammal" } ], "Confidence": 99.9364013671875, "Instances": [ { "BoundingBox": { "Width": 0.26779675483703613, "Height": 0.8562285900115967, "Left": 0.3604024350643158, "Top": 0.09245597571134567, }, "Confidence": 99.9364013671875 } ] } }, { "Timestamp": 4000, "Label": { "Name": "Bicycle", "Categories": [ { "Name": "Hobbies and Interests" } ], "Aliases": [ { "Name": "Bike" } ], "Parents": [ { "Name": "Vehicle" } ], "Confidence": 99.9364013671875, "Instances": [ { "BoundingBox": { "Width": 0.26779675483703613, "Height": 0.8562285900115967, "Left": 0.3604024350643158, "Top": 0.09245597571134567 }, "Confidence": 99.9364013671875 } ] } } ], "VideoMetadata": { "ColorRange": "FULL", "DurationMillis": 5000, "Format": "MP4", "FrameWidth": 1280, "FrameHeight": 720, "FrameRate": 24 } }

The following is an example GetLabelDetection response in JSON format, aggregated by SEGMENTS:

{ "JobStatus": "SUCCEEDED", "LabelModelVersion": "3.0", "Labels": [ { "StartTimestampMillis": 225, "EndTimestampMillis": 3578, "DurationMillis": 3353, "Label": { "Name": "Car", "Categories": [ { "Name": "Vehicles and Automotive" } ], "Aliases": [ { "Name": "Automobile" } ], "Parents": [ { "Name": "Vehicle" } ], "Confidence": 99.9364013671875 // Maximum confidence score for Segment mode } }, { "StartTimestampMillis": 7578, "EndTimestampMillis": 12371, "DurationMillis": 4793, "Label": { "Name": "Kangaroo", "Categories": [ { "Name": "Animals and Pets" } ], "Aliases": [ { "Name": "Wallaby" } ], "Parents": [ { "Name": "Mammal" } ], "Confidence": 99.9364013671875 } }, { "StartTimestampMillis": 22225, "EndTimestampMillis": 22578, "DurationMillis": 2353, "Label": { "Name": "Bicycle", "Categories": [ { "Name": "Hobbies and Interests" } ], "Aliases": [ { "Name": "Bike" } ], "Parents": [ { "Name": "Vehicle" } ], "Confidence": 99.9364013671875 } } ], "VideoMetadata": { "ColorRange": "FULL", "DurationMillis": 5000, "Format": "MP4", "FrameWidth": 1280, "FrameHeight": 720, "FrameRate": 24 } }

Transforming the GetLabelDetection response

When retrieving results with the GetLabelDetection API operation, you might need the response structure to mimic the older API response structure, in which both primary labels and aliases were contained in the same list.

The example JSON responses in the preceding section show the current form of the API response from GetLabelDetection.

The following example shows the previous response format of the GetLabelDetection API:

{ "Labels": [ { "Timestamp": 0, "Label": { "Instances": [], "Confidence": 60.51791763305664, "Parents": [], "Name": "Leaf" } }, { "Timestamp": 0, "Label": { "Instances": [], "Confidence": 99.53411102294922, "Parents": [], "Name": "Human" } }, { "Timestamp": 0, "Label": { "Instances": [ { "BoundingBox": { "Width": 0.11109819263219833, "Top": 0.08098889887332916, "Left": 0.8881205320358276, "Height": 0.9073750972747803 }, "Confidence": 99.5831298828125 }, { "BoundingBox": { "Width": 0.1268676072359085, "Top": 0.14018426835536957, "Left": 0.0003282368124928324, "Height": 0.7993982434272766 }, "Confidence": 99.46029663085938 } ], "Confidence": 99.63411102294922, "Parents": [], "Name": "Person" } }, . . . { "Timestamp": 166, "Label": { "Instances": [], "Confidence": 73.6471176147461, "Parents": [ { "Name": "Clothing" } ], "Name": "Sleeve" } } ], "LabelModelVersion": "2.0", "JobStatus": "SUCCEEDED", "VideoMetadata": { "Format": "QuickTime / MOV", "FrameRate": 23.976024627685547, "Codec": "h264", "DurationMillis": 5005, "FrameHeight": 674, "FrameWidth": 1280 } }

If needed, you can transform the current response so that it follows the format of the older response. You can use the following sample code to transform the latest API response into the previous API response structure:

from copy import deepcopy

VIDEO_LABEL_KEY = "Labels"
LABEL_KEY = "Label"
ALIASES_KEY = "Aliases"
INSTANCE_KEY = "Instances"
NAME_KEY = "Name"

# Latest API response sample for AggregatedBy SEGMENTS
EXAMPLE_SEGMENT_OUTPUT = {
    "Labels": [
        {
            "Timestamp": 0,
            "Label": {
                "Name": "Person",
                "Confidence": 97.530106,
                "Parents": [],
                "Aliases": [{"Name": "Human"}],
                "Categories": [{"Name": "Person Description"}],
            },
            "StartTimestampMillis": 0,
            "EndTimestampMillis": 500666,
            "DurationMillis": 500666,
        },
        {
            "Timestamp": 6400,
            "Label": {
                "Name": "Leaf",
                "Confidence": 89.77790069580078,
                "Parents": [{"Name": "Plant"}],
                "Aliases": [],
                "Categories": [{"Name": "Plants and Flowers"}],
            },
            "StartTimestampMillis": 6400,
            "EndTimestampMillis": 8200,
            "DurationMillis": 1800,
        },
    ]
}

# Output example after the transformation for AggregatedBy SEGMENTS
EXPECTED_EXPANDED_SEGMENT_OUTPUT = {
    "Labels": [
        {
            "Timestamp": 0,
            "Label": {
                "Name": "Person",
                "Confidence": 97.530106,
                "Parents": [],
                "Aliases": [{"Name": "Human"}],
                "Categories": [{"Name": "Person Description"}],
            },
            "StartTimestampMillis": 0,
            "EndTimestampMillis": 500666,
            "DurationMillis": 500666,
        },
        {
            "Timestamp": 6400,
            "Label": {
                "Name": "Leaf",
                "Confidence": 89.77790069580078,
                "Parents": [{"Name": "Plant"}],
                "Aliases": [],
                "Categories": [{"Name": "Plants and Flowers"}],
            },
            "StartTimestampMillis": 6400,
            "EndTimestampMillis": 8200,
            "DurationMillis": 1800,
        },
        {
            "Timestamp": 0,
            "Label": {
                "Name": "Human",
                "Confidence": 97.530106,
                "Parents": [],
                "Categories": [{"Name": "Person Description"}],
            },
            "StartTimestampMillis": 0,
            "EndTimestampMillis": 500666,
            "DurationMillis": 500666,
        },
    ]
}

# Latest API response sample for AggregatedBy TIMESTAMPS
EXAMPLE_TIMESTAMP_OUTPUT = {
    "Labels": [
        {
            "Timestamp": 0,
            "Label": {
                "Name": "Person",
                "Confidence": 97.530106,
                "Instances": [
                    {
                        "BoundingBox": {
                            "Height": 0.1549897,
                            "Width": 0.07747964,
                            "Top": 0.50858885,
                            "Left": 0.00018205095,
                        },
                        "Confidence": 97.530106,
                    },
                ],
                "Parents": [],
                "Aliases": [{"Name": "Human"}],
                "Categories": [{"Name": "Person Description"}],
            },
        },
        {
            "Timestamp": 6400,
            "Label": {
                "Name": "Leaf",
                "Confidence": 89.77790069580078,
                "Instances": [],
                "Parents": [{"Name": "Plant"}],
                "Aliases": [],
                "Categories": [{"Name": "Plants and Flowers"}],
            },
        },
    ]
}

# Output example after the transformation for AggregatedBy TIMESTAMPS
EXPECTED_EXPANDED_TIMESTAMP_OUTPUT = {
    "Labels": [
        {
            "Timestamp": 0,
            "Label": {
                "Name": "Person",
                "Confidence": 97.530106,
                "Instances": [
                    {
                        "BoundingBox": {
                            "Height": 0.1549897,
                            "Width": 0.07747964,
                            "Top": 0.50858885,
                            "Left": 0.00018205095,
                        },
                        "Confidence": 97.530106,
                    },
                ],
                "Parents": [],
                "Aliases": [{"Name": "Human"}],
                "Categories": [{"Name": "Person Description"}],
            },
        },
        {
            "Timestamp": 6400,
            "Label": {
                "Name": "Leaf",
                "Confidence": 89.77790069580078,
                "Instances": [],
                "Parents": [{"Name": "Plant"}],
                "Aliases": [],
                "Categories": [{"Name": "Plants and Flowers"}],
            },
        },
        {
            "Timestamp": 0,
            "Label": {
                "Name": "Human",
                "Confidence": 97.530106,
                "Parents": [],
                "Categories": [{"Name": "Person Description"}],
            },
        },
    ]
}

def expand_aliases(inferenceOutputsWithAliases):
    """Append one copy of each detection for every alias of its primary label,
    mimicking the older response structure in which aliases appeared as
    separate entries in the Labels list."""
    if VIDEO_LABEL_KEY in inferenceOutputsWithAliases:
        expandInferenceOutputs = []
        for segmentLabelDict in inferenceOutputsWithAliases[VIDEO_LABEL_KEY]:
            primaryLabelDict = segmentLabelDict[LABEL_KEY]
            if ALIASES_KEY in primaryLabelDict:
                for alias in primaryLabelDict[ALIASES_KEY]:
                    # Copy the whole detection, rename it after the alias, and
                    # drop the fields that alias entries do not carry.
                    aliasLabelDict = deepcopy(segmentLabelDict)
                    aliasLabelDict[LABEL_KEY][NAME_KEY] = alias[NAME_KEY]
                    del aliasLabelDict[LABEL_KEY][ALIASES_KEY]
                    if INSTANCE_KEY in aliasLabelDict[LABEL_KEY]:
                        del aliasLabelDict[LABEL_KEY][INSTANCE_KEY]
                    expandInferenceOutputs.append(aliasLabelDict)
        inferenceOutputsWithAliases[VIDEO_LABEL_KEY].extend(expandInferenceOutputs)
    return inferenceOutputsWithAliases

if __name__ == "__main__":
    segmentOutputWithExpandAliases = expand_aliases(EXAMPLE_SEGMENT_OUTPUT)
    assert segmentOutputWithExpandAliases == EXPECTED_EXPANDED_SEGMENT_OUTPUT
    timestampOutputWithExpandAliases = expand_aliases(EXAMPLE_TIMESTAMP_OUTPUT)
    assert timestampOutputWithExpandAliases == EXPECTED_EXPANDED_TIMESTAMP_OUTPUT