Detecting labels in a video

Amazon Rekognition Video can detect labels, and the time a label is detected, in a video. For an SDK code example, see Analyzing a video stored in an Amazon S3 bucket with Java or Python (SDK). For an AWS CLI example, see Analyzing a video with the AWS Command Line Interface.

Amazon Rekognition Video label detection is an asynchronous operation. To start the detection of labels in a video, call StartLabelDetection.

Amazon Rekognition Video publishes the completion status of the video analysis to an Amazon Simple Notification Service (Amazon SNS) topic. If the video analysis is successful, call GetLabelDetection to get the detected labels. For information about calling the video analysis API operations, see Calling Amazon Rekognition Video operations.
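If you use the AWS SDK for Python (Boto3), the overall flow might look like the following minimal sketch. This is not the SDK example referenced above: the bucket, video name, role ARN, and topic ARN are placeholders, and the sketch polls for completion instead of subscribing to the Amazon SNS topic, which is fine for experimenting but not recommended for production:

import time

import boto3

rekognition = boto3.client("rekognition")

# Start the asynchronous job. The bucket, video name, role ARN, and
# topic ARN are placeholders.
start_response = rekognition.start_label_detection(
    Video={"S3Object": {"Bucket": "bucket", "Name": "video.mp4"}},
    MinConfidence=75,
    Features=["GENERAL_LABELS"],
    NotificationChannel={
        "RoleArn": "arn:aws:iam::012345678910:role/SNSAccessRole",
        "SNSTopicArn": "arn:aws:sns:us-east-1:012345678910:notification-topic",
    },
)
job_id = start_response["JobId"]

# Poll until the job leaves IN_PROGRESS. In production, subscribe to
# the Amazon SNS topic instead of polling.
while True:
    result = rekognition.get_label_detection(JobId=job_id)
    if result["JobStatus"] != "IN_PROGRESS":
        break
    time.sleep(5)

if result["JobStatus"] == "SUCCEEDED":
    for detection in result["Labels"]:
        print(detection["Timestamp"], detection["Label"]["Name"])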

StartLabelDetection Request

The following example is a request for the StartLabelDetection operation. You provide the StartLabelDetection operation with a video stored in an Amazon S3 bucket. In the example request JSON, the Amazon S3 bucket and the video name are specified, along with MinConfidence, Features, Settings, and NotificationChannel.

MinConfidence is the minimum confidence that Amazon Rekognition Video must have in the accuracy of the detected label, or an instance bounding box (if detected), for it to be returned in the response.

With Features, you can specify that you want GENERAL_LABELS returned as part of the response.

With Settings, you can filter the items returned for GENERAL_LABELS. You can use inclusive and exclusive filters, and you can filter by specific, individual labels or by label category:

  • LabelInclusionFilters - Used to specify which labels you want included in the response.

  • LabelExclusionFilters - Used to specify which labels you want excluded from the response.

  • LabelCategoryInclusionFilters - Used to specify which label categories you want included in the response.

  • LabelCategoryExclusionFilters - Used to specify which label categories you want excluded from the response.

You can also combine inclusive and exclusive filters according to your needs, excluding some labels or categories and including others.

NotificationChannel is the ARN of the Amazon SNS topic you want Amazon Rekognition Video to publish the completion status of the label detection operation to. If you’re using the AmazonRekognitionServiceRole permissions policy, then the Amazon SNS topic must have a topic name that begins with Rekognition.

The following is a sample StartLabelDetection request in JSON form, including filters:

{ "ClientRequestToken": "5a6e690e-c750-460a-9d59-c992e0ec8638", "JobTag": "5a6e690e-c750-460a-9d59-c992e0ec8638", "Video": { "S3Object": { "Bucket": "bucket", "Name": "video.mp4" } }, "Features": ["GENERAL_LABELS"], "MinConfidence": 75, "Settings": { "GeneralLabels": { "LabelInclusionFilters": ["Cat", "Dog"], "LabelExclusionFilters": ["Tiger"], "LabelCategoryInclusionFilters": ["Animals and Pets"], "LabelCategoryExclusionFilters": ["Popular Landmark"] } }, "NotificationChannel": { "RoleArn": "arn:aws:iam::012345678910:role/SNSAccessRole", "SNSTopicArn": "arn:aws:sns:us-east-1:012345678910:notification-topic", } }

GetLabelDetection Operation Response

GetLabelDetection returns an array (Labels) that contains information about the labels detected in the video. The array can be sorted either by time or by the detected label by specifying the SortBy parameter. You can also select how response items are aggregated by using the AggregateBy parameter.

The following example is the JSON response of the GetLabelDetection. In the response, note the following:

  • Sort order – The array of labels returned is sorted by time. To sort by label, specify NAME in the SortBy input parameter for GetLabelDetection. If a label appears multiple times in the video, there are multiple instances of the LabelDetection element.

  • Label information – The LabelDetection array element contains a Label object, which in turn contains the label name and the confidence Amazon Rekognition has in the accuracy of the detected label. A Label object also includes a hierarchical taxonomy of labels and bounding box information for common labels. Timestamp is the time the label was detected, defined as the number of milliseconds elapsed since the start of the video.

    Information about any Categories or Aliases associated with a label is also returned. For results aggregated by video SEGMENTS, the StartTimestampMillis, EndTimestampMillis, and DurationMillis structures are returned, which define the start time, end time, and duration of a segment respectively.

  • Aggregation – Specifies how results are aggregated when returned. The default is to aggregate by TIMESTAMPS. You can also choose to aggregate by SEGMENTS, which aggregates results over a time window. When you aggregate by SEGMENTS, information about detected instances with bounding boxes is not returned; only labels detected during the segments are returned.

  • Paging information – The example shows one page of label detection information. You can specify how many LabelDetection objects to return in the MaxResults input parameter for GetLabelDetection. If more results than MaxResults exist, GetLabelDetection returns a token (NextToken) used to get the next page of results, as shown in the sketch following this list. For more information, see Getting Amazon Rekognition Video analysis results.

  • Video information – The response includes information about the video format (VideoMetadata) in each page of information returned by GetLabelDetection.
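Putting the sorting, aggregation, and paging parameters together, a minimal sketch with the AWS SDK for Python (Boto3) might look like the following; the job ID is a placeholder:

import boto3

rekognition = boto3.client("rekognition")

# Collect every page of results for a completed job, sorted by label
# name and aggregated by timestamps. "job-id" is a placeholder.
labels = []
next_token = None
while True:
    kwargs = {
        "JobId": "job-id",
        "MaxResults": 1000,
        "SortBy": "NAME",
        "AggregateBy": "TIMESTAMPS",
    }
    if next_token:
        kwargs["NextToken"] = next_token
    page = rekognition.get_label_detection(**kwargs)
    labels.extend(page["Labels"])
    next_token = page.get("NextToken")
    if not next_token:
        break

print(len(labels), "label detections")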

The following is a sample GetLabelDetection response in JSON form with aggregation by TIMESTAMPS:

{ "JobStatus": "SUCCEEDED", "LabelModelVersion": "3.0", "Labels": [ { "Timestamp": 1000, "Label": { "Name": "Car", "Categories": [ { "Name": "Vehicles and Automotive" } ], "Aliases": [ { "Name": "Automobile" } ], "Parents": [ { "Name": "Vehicle" } ], "Confidence": 99.9364013671875, // Classification confidence "Instances": [ { "BoundingBox": { "Width": 0.26779675483703613, "Height": 0.8562285900115967, "Left": 0.3604024350643158, "Top": 0.09245597571134567 }, "Confidence": 99.9364013671875 // Detection confidence } ] } }, { "Timestamp": 1000, "Label": { "Name": "Cup", "Categories": [ { "Name": "Kitchen and Dining" } ], "Aliases": [ { "Name": "Mug" } ], "Parents": [], "Confidence": 99.9364013671875, // Classification confidence "Instances": [ { "BoundingBox": { "Width": 0.26779675483703613, "Height": 0.8562285900115967, "Left": 0.3604024350643158, "Top": 0.09245597571134567 }, "Confidence": 99.9364013671875 // Detection confidence } ] } }, { "Timestamp": 2000, "Label": { "Name": "Kangaroo", "Categories": [ { "Name": "Animals and Pets" } ], "Aliases": [ { "Name": "Wallaby" } ], "Parents": [ { "Name": "Mammal" } ], "Confidence": 99.9364013671875, "Instances": [ { "BoundingBox": { "Width": 0.26779675483703613, "Height": 0.8562285900115967, "Left": 0.3604024350643158, "Top": 0.09245597571134567, }, "Confidence": 99.9364013671875 } ] } }, { "Timestamp": 4000, "Label": { "Name": "Bicycle", "Categories": [ { "Name": "Hobbies and Interests" } ], "Aliases": [ { "Name": "Bike" } ], "Parents": [ { "Name": "Vehicle" } ], "Confidence": 99.9364013671875, "Instances": [ { "BoundingBox": { "Width": 0.26779675483703613, "Height": 0.8562285900115967, "Left": 0.3604024350643158, "Top": 0.09245597571134567 }, "Confidence": 99.9364013671875 } ] } } ], "VideoMetadata": { "ColorRange": "FULL", "DurationMillis": 5000, "Format": "MP4", "FrameWidth": 1280, "FrameHeight": 720, "FrameRate": 24 } }

The following is a sample GetLabelDetection response in JSON form with aggregation by SEGMENTS:

{ "JobStatus": "SUCCEEDED", "LabelModelVersion": "3.0", "Labels": [ { "StartTimestampMillis": 225, "EndTimestampMillis": 3578, "DurationMillis": 3353, "Label": { "Name": "Car", "Categories": [ { "Name": "Vehicles and Automotive" } ], "Aliases": [ { "Name": "Automobile" } ], "Parents": [ { "Name": "Vehicle" } ], "Confidence": 99.9364013671875 // Maximum confidence score for Segment mode } }, { "StartTimestampMillis": 7578, "EndTimestampMillis": 12371, "DurationMillis": 4793, "Label": { "Name": "Kangaroo", "Categories": [ { "Name": "Animals and Pets" } ], "Aliases": [ { "Name": "Wallaby" } ], "Parents": [ { "Name": "Mammal" } ], "Confidence": 99.9364013671875 } }, { "StartTimestampMillis": 22225, "EndTimestampMillis": 22578, "DurationMillis": 2353, "Label": { "Name": "Bicycle", "Categories": [ { "Name": "Hobbies and Interests" } ], "Aliases": [ { "Name": "Bike" } ], "Parents": [ { "Name": "Vehicle" } ], "Confidence": 99.9364013671875 } } ], "VideoMetadata": { "ColorRange": "FULL", "DurationMillis": 5000, "Format": "MP4", "FrameWidth": 1280, "FrameHeight": 720, "FrameRate": 24 } }

Transforming the GetLabelDetection Response

When retrieving results with the GetLabelDetection API operation, you might need the response structure to mimic the older API response structure, where both primary labels and aliases were contained in the same list.

The example JSON response found in the preceding section displays the current form of the API response from GetLabelDetection.

The following example shows the previous response from the GetLabelDetection API:

{ "Labels": [ { "Timestamp": 0, "Label": { "Instances": [], "Confidence": 60.51791763305664, "Parents": [], "Name": "Leaf" } }, { "Timestamp": 0, "Label": { "Instances": [], "Confidence": 99.53411102294922, "Parents": [], "Name": "Human" } }, { "Timestamp": 0, "Label": { "Instances": [ { "BoundingBox": { "Width": 0.11109819263219833, "Top": 0.08098889887332916, "Left": 0.8881205320358276, "Height": 0.9073750972747803 }, "Confidence": 99.5831298828125 }, { "BoundingBox": { "Width": 0.1268676072359085, "Top": 0.14018426835536957, "Left": 0.0003282368124928324, "Height": 0.7993982434272766 }, "Confidence": 99.46029663085938 } ], "Confidence": 99.63411102294922, "Parents": [], "Name": "Person" } }, . . . { "Timestamp": 166, "Label": { "Instances": [], "Confidence": 73.6471176147461, "Parents": [ { "Name": "Clothing" } ], "Name": "Sleeve" } } ], "LabelModelVersion": "2.0", "JobStatus": "SUCCEEDED", "VideoMetadata": { "Format": "QuickTime / MOV", "FrameRate": 23.976024627685547, "Codec": "h264", "DurationMillis": 5005, "FrameHeight": 674, "FrameWidth": 1280 } }

If needed, you can use the following sample code to transform the latest API response into the previous API response structure:

from copy import deepcopy

VIDEO_LABEL_KEY = "Labels"
LABEL_KEY = "Label"
ALIASES_KEY = "Aliases"
INSTANCE_KEY = "Instances"
NAME_KEY = "Name"

# Latest API response sample for AggregateBy SEGMENTS
EXAMPLE_SEGMENT_OUTPUT = {
    "Labels": [
        {
            "Timestamp": 0,
            "Label": {
                "Name": "Person",
                "Confidence": 97.530106,
                "Parents": [],
                "Aliases": [{"Name": "Human"}],
                "Categories": [{"Name": "Person Description"}],
            },
            "StartTimestampMillis": 0,
            "EndTimestampMillis": 500666,
            "DurationMillis": 500666,
        },
        {
            "Timestamp": 6400,
            "Label": {
                "Name": "Leaf",
                "Confidence": 89.77790069580078,
                "Parents": [{"Name": "Plant"}],
                "Aliases": [],
                "Categories": [{"Name": "Plants and Flowers"}],
            },
            "StartTimestampMillis": 6400,
            "EndTimestampMillis": 8200,
            "DurationMillis": 1800,
        },
    ]
}

# Output example after the transformation for AggregateBy SEGMENTS
EXPECTED_EXPANDED_SEGMENT_OUTPUT = {
    "Labels": [
        {
            "Timestamp": 0,
            "Label": {
                "Name": "Person",
                "Confidence": 97.530106,
                "Parents": [],
                "Aliases": [{"Name": "Human"}],
                "Categories": [{"Name": "Person Description"}],
            },
            "StartTimestampMillis": 0,
            "EndTimestampMillis": 500666,
            "DurationMillis": 500666,
        },
        {
            "Timestamp": 6400,
            "Label": {
                "Name": "Leaf",
                "Confidence": 89.77790069580078,
                "Parents": [{"Name": "Plant"}],
                "Aliases": [],
                "Categories": [{"Name": "Plants and Flowers"}],
            },
            "StartTimestampMillis": 6400,
            "EndTimestampMillis": 8200,
            "DurationMillis": 1800,
        },
        {
            "Timestamp": 0,
            "Label": {
                "Name": "Human",
                "Confidence": 97.530106,
                "Parents": [],
                "Categories": [{"Name": "Person Description"}],
            },
            "StartTimestampMillis": 0,
            "EndTimestampMillis": 500666,
            "DurationMillis": 500666,
        },
    ]
}

# Latest API response sample for AggregateBy TIMESTAMPS
EXAMPLE_TIMESTAMP_OUTPUT = {
    "Labels": [
        {
            "Timestamp": 0,
            "Label": {
                "Name": "Person",
                "Confidence": 97.530106,
                "Instances": [
                    {
                        "BoundingBox": {"Height": 0.1549897, "Width": 0.07747964, "Top": 0.50858885, "Left": 0.00018205095},
                        "Confidence": 97.530106,
                    },
                ],
                "Parents": [],
                "Aliases": [{"Name": "Human"}],
                "Categories": [{"Name": "Person Description"}],
            },
        },
        {
            "Timestamp": 6400,
            "Label": {
                "Name": "Leaf",
                "Confidence": 89.77790069580078,
                "Instances": [],
                "Parents": [{"Name": "Plant"}],
                "Aliases": [],
                "Categories": [{"Name": "Plants and Flowers"}],
            },
        },
    ]
}

# Output example after the transformation for AggregateBy TIMESTAMPS
EXPECTED_EXPANDED_TIMESTAMP_OUTPUT = {
    "Labels": [
        {
            "Timestamp": 0,
            "Label": {
                "Name": "Person",
                "Confidence": 97.530106,
                "Instances": [
                    {
                        "BoundingBox": {"Height": 0.1549897, "Width": 0.07747964, "Top": 0.50858885, "Left": 0.00018205095},
                        "Confidence": 97.530106,
                    },
                ],
                "Parents": [],
                "Aliases": [{"Name": "Human"}],
                "Categories": [{"Name": "Person Description"}],
            },
        },
        {
            "Timestamp": 6400,
            "Label": {
                "Name": "Leaf",
                "Confidence": 89.77790069580078,
                "Instances": [],
                "Parents": [{"Name": "Plant"}],
                "Aliases": [],
                "Categories": [{"Name": "Plants and Flowers"}],
            },
        },
        {
            "Timestamp": 0,
            "Label": {
                "Name": "Human",
                "Confidence": 97.530106,
                "Parents": [],
                "Categories": [{"Name": "Person Description"}],
            },
        },
    ]
}

def expand_aliases(inferenceOutputsWithAliases):
    # For each label entry, append one copy per alias with the alias name
    # substituted for the primary label name. The Aliases key (and the
    # Instances key, if present) is removed from each copy, matching the
    # older response structure.
    if VIDEO_LABEL_KEY in inferenceOutputsWithAliases:
        expandInferenceOutputs = []
        for segmentLabelDict in inferenceOutputsWithAliases[VIDEO_LABEL_KEY]:
            primaryLabelDict = segmentLabelDict[LABEL_KEY]
            if ALIASES_KEY in primaryLabelDict:
                for alias in primaryLabelDict[ALIASES_KEY]:
                    aliasLabelDict = deepcopy(segmentLabelDict)
                    aliasLabelDict[LABEL_KEY][NAME_KEY] = alias[NAME_KEY]
                    del aliasLabelDict[LABEL_KEY][ALIASES_KEY]
                    if INSTANCE_KEY in aliasLabelDict[LABEL_KEY]:
                        del aliasLabelDict[LABEL_KEY][INSTANCE_KEY]
                    expandInferenceOutputs.append(aliasLabelDict)
        inferenceOutputsWithAliases[VIDEO_LABEL_KEY].extend(expandInferenceOutputs)
    return inferenceOutputsWithAliases

if __name__ == "__main__":
    segmentOutputWithExpandAliases = expand_aliases(EXAMPLE_SEGMENT_OUTPUT)
    assert segmentOutputWithExpandAliases == EXPECTED_EXPANDED_SEGMENT_OUTPUT

    timestampOutputWithExpandAliases = expand_aliases(EXAMPLE_TIMESTAMP_OUTPUT)
    assert timestampOutputWithExpandAliases == EXPECTED_EXPANDED_TIMESTAMP_OUTPUT