Frame-based Analysis for Your Videos

Architecture Overview

This solution contains both video-processing and face-search components. The video-processing component runs uploaded videos through a frame-based analysis workflow to extract image metadata associated with video frames. The face-search component uses uploaded images to search video metadata and identify videos that contain matching faces. The following sections describe these components in more detail.

Video Processing Component

Deploying this solution with the default parameters builds the following video-processing component in the AWS Cloud.

        Video Processing Architecture

Figure 1: Video Processing Architecture

Amazon S3 Video Bucket

This solution creates an Amazon Simple Storage Service (Amazon S3) bucket for storing videos and extracted frames. When a video is uploaded to this bucket, Amazon S3 sends an event notification to an Amazon Simple Queue Service (Amazon SQS) queue, which manages the backlog of videos that the solution will process.
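The consumer of that queue needs to recover the bucket and object key from each notification. As a minimal sketch, the following helper (the function name is our own) parses the standard S3 event notification format out of an SQS message body:

```python
import json

def parse_s3_event(sqs_body: str) -> list:
    """Extract (bucket, key) pairs from an S3 event notification
    delivered via SQS, following the standard S3 event record layout."""
    event = json.loads(sqs_body)
    videos = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        videos.append((s3["bucket"]["name"], s3["object"]["key"]))
    return videos
```

A processing instance would call this on each message body it receives, then download the referenced object before starting frame extraction.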

Frame Extraction and Preprocessing

To perform frame extraction and preprocessing, the solution launches Amazon Elastic Compute Cloud (Amazon EC2) instances in an Auto Scaling group. Each EC2 instance polls the Amazon SQS queue for messages indicating that a new video is ready for processing. The instance uses the Amazon S3 link in the SQS message to download the video, and then uses FFmpeg to extract one frame per second. A Lambda function is triggered and posts the preprocessing results to an AWS IoT topic. After preprocessing is complete, the instance uploads each processed frame back to the Amazon S3 video bucket and stores the frame information in an Amazon DynamoDB table in batches. To optimize processing time and control parallel Lambda function executions, batch sizes are calculated from the length of the video being processed. For example, a thirty-minute video batches in sets of 20, a two-hour video batches in sets of 60, and videos longer than two hours batch in sets of 80.
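The batch-sizing rule above can be sketched as a simple lookup on video duration. The thirty-minute and two-hour cutoffs mirror the examples in the text; the exact boundaries the solution uses internally may differ, so treat these as illustrative:

```python
def batch_size_for(duration_seconds: float) -> int:
    """Return the DynamoDB write batch size for a video of the given
    length, using the thresholds described in the text: up to thirty
    minutes -> 20, up to two hours -> 60, longer -> 80."""
    if duration_seconds <= 30 * 60:
        return 20
    if duration_seconds <= 2 * 60 * 60:
        return 60
    return 80
```

Larger batches for longer videos keep the number of downstream Lambda invocations roughly bounded as the frame count grows.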

Image Processing

When an EC2 instance stores frame information in the Amazon DynamoDB table, an image-processing AWS Lambda function is triggered. This Lambda function calls Amazon Rekognition to identify metadata for each frame and saves the metadata in the DynamoDB table. The function also creates and manages the Amazon Rekognition face collections that collect faces from any frames identified as containing one or more faces. The results of the image processing are posted to an AWS IoT topic. After all of the frames for a video are processed, a final Lambda function consolidates the individual frame metadata into a single list of tags and labels, which is stored in Amazon DynamoDB and Amazon S3, and sent to an Amazon SNS topic to alert subscribers to the new tags.
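The final consolidation step can be sketched as merging per-frame label lists into one de-duplicated tag list. The input shape here (a "labels" list per frame item) is an assumption for illustration, not the solution's actual DynamoDB schema:

```python
from collections import Counter

def consolidate_tags(frame_metadata: list) -> list:
    """Merge per-frame Rekognition labels into a single de-duplicated
    tag list for the whole video, ordered by the number of frames in
    which each label appears."""
    counts = Counter()
    for frame in frame_metadata:
        # Count each label at most once per frame.
        counts.update(set(frame.get("labels", [])))
    return [label for label, _ in counts.most_common()]
```

Ordering by frame frequency gives subscribers the most persistent labels first when the consolidated list is published to the SNS topic.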


This solution uses the video filename to organize copies of each frame in Amazon S3. Keep this in mind as you choose a video naming convention, and ensure that each video to be processed has a unique name.

AWS IoT Topic

This solution includes an AWS IoT topic as an optional feature that you can use to monitor the processing status of individual videos or groups of videos. See AWS IoT Topic for detailed information.

Face Search Component

Deploying this solution with the default parameters builds the following face-search component in the AWS Cloud.

        Face Search Architecture

Figure 2: Face Search Architecture

This solution creates another Amazon S3 bucket to store face images for searching. When you upload a face image to this bucket, it triggers an AWS Lambda function that initiates a search for matching faces in the previously extracted video metadata. Another AWS Lambda function retrieves the Amazon Rekognition collection IDs from the DynamoDB table and searches those collections for the uploaded face. The results of the search are stored in an Amazon DynamoDB table, and the status of the search is sent to an AWS IoT topic.
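The search step can be sketched with the Rekognition SearchFacesByImage API. This is a simplified illustration, not the solution's actual Lambda code: the client is passed in (so the sketch can run against a stub), and the collection IDs are assumed to have been read from the DynamoDB table beforehand:

```python
def search_collections(rekognition, collection_ids, image_bytes,
                       threshold=80.0):
    """Search each Rekognition face collection for the uploaded face
    image and return (collection_id, face_id, similarity) matches.
    `rekognition` is a boto3 Rekognition client (or a test stub)."""
    matches = []
    for collection_id in collection_ids:
        resp = rekognition.search_faces_by_image(
            CollectionId=collection_id,
            Image={"Bytes": image_bytes},
            FaceMatchThreshold=threshold,
        )
        for match in resp.get("FaceMatches", []):
            matches.append((collection_id,
                            match["Face"]["FaceId"],
                            match["Similarity"]))
    return matches
```

In the deployed solution, results like these would be written to the DynamoDB results table, with the search status published to the AWS IoT topic.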