Solution Components - Media2Cloud

Ingest Process

When a new video or image is uploaded to the ingest Amazon S3 bucket through the web interface, the ingest process starts. The workflow generates a unique asset identifier, computes and validates an MD5 checksum, and extracts media information such as bitrate, formats, audio channels, and container format for video, or EXIF information such as GPS location, model, and make for images. The workflow creates a proxy file and thumbnails using AWS Elemental MediaConvert. The proxy files and thumbnail images are stored in the proxy Amazon S3 bucket. The technical metadata is indexed in an Amazon Elasticsearch Service (Amazon ES) cluster.
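
The checksum step can be illustrated with a minimal sketch. It uses only the Python standard library and an in-memory stream as a stand-in for an uploaded file. The function names `md5_hex` and `validate_checksum` are hypothetical and are not taken from the solution's source code.

```python
import hashlib
import io


def md5_hex(stream, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a binary stream, reading it in chunks."""
    digest = hashlib.md5()
    # Reading in fixed-size chunks keeps memory use flat for large media files.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def validate_checksum(stream, expected_md5):
    """Return True when the computed digest matches the expected hex digest."""
    return md5_hex(stream) == expected_md5.strip().lower()


# Usage with an in-memory stand-in for an uploaded file (illustrative only).
sample = io.BytesIO(b"hello world")
checksum = md5_hex(sample)
sample.seek(0)  # rewind before reading the stream a second time
is_valid = validate_checksum(sample, checksum)
```

Chunked reading is a design choice for large video files; the solution itself may compute the checksum differently.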

When the workflow completes, the source files are tagged so that the Amazon S3 lifecycle policy can move them to the Amazon S3 Glacier storage class for archiving.
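
As a rough illustration of tag-based archiving, the sketch below builds a lifecycle configuration in the shape that the Amazon S3 `PutBucketLifecycleConfiguration` API accepts. The tag key and value, the rule ID, and the transition delay are assumptions made for this example; the text does not state which values the solution uses.

```python
def glacier_lifecycle_configuration(tag_key, tag_value, days=0):
    """Build a lifecycle configuration that archives objects carrying a tag.

    The rule applies only to objects tagged tag_key=tag_value and moves them
    to the GLACIER storage class after `days` days.
    """
    return {
        "Rules": [
            {
                "ID": "ArchiveTaggedSourceFiles",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Tag": {"Key": tag_key, "Value": tag_value}},
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


# Hypothetical tag; the actual tag applied by the workflow is not documented here.
config = glacier_lifecycle_configuration("IngestCompleted", "true")
```

With boto3, a dictionary like this could be passed as the `LifecycleConfiguration` argument of `put_bucket_lifecycle_configuration`.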

Analysis Process

The Media2Cloud solution provides the following preset options for the analysis process when you deploy the template: Default, All, and Audio and Text.

  • Default enables celebrity recognition, labels, transcription, key phrases, entities, and text processes.

  • All enables all detections including celebrity recognition, labels, transcription, key phrases, entities, text, faces, face matches, person, moderation, sentiment, and topic processes.

  • Audio and Text enables transcription, key phrases, entities, and text processes.
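
The three presets listed above nest inside one another, which a small sketch makes explicit. The detection names are taken from the list; the constant and function names are hypothetical and do not reflect the template's actual parameter values.

```python
# Detections enabled by each preset, as described in the list above.
AUDIO_AND_TEXT = frozenset({"transcription", "key phrases", "entities", "text"})
DEFAULT = AUDIO_AND_TEXT | {"celebrity recognition", "labels"}
ALL = DEFAULT | {
    "faces",
    "face matches",
    "person",
    "moderation",
    "sentiment",
    "topic",
}

PRESETS = {"Default": DEFAULT, "All": ALL, "Audio and Text": AUDIO_AND_TEXT}


def is_enabled(preset, detection):
    """Return True when the named preset enables the given detection."""
    return detection in PRESETS[preset]


# Example: labels are part of Default but not of Audio and Text.
labels_in_default = is_enabled("Default", "labels")
labels_in_audio_text = is_enabled("Audio and Text", "labels")
```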

Four state machines are deployed to process the analysis.

  • The video analysis state machine analyzes and extracts AI/ML metadata from the video proxy using Amazon Rekognition video APIs.

  • The audio analysis state machine analyzes and extracts AI/ML metadata from the audio stream of the proxy file using Amazon Transcribe and Amazon Comprehend.

  • The image analysis state machine analyzes and extracts image metadata with Amazon Rekognition image APIs.

  • The analysis monitoring state machine monitors the video analysis, audio analysis, and image analysis state machines and periodically reports the analysis process status to the web interface by sending the status to an AWS IoT Core MQTT topic. The machine learning metadata results are stored in the proxy Amazon S3 bucket and indexed in an Amazon ES cluster.
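
To give a feel for the monitoring step, the sketch below combines the statuses of the individual analysis state machines into a single JSON status message. The payload fields (`assetId`, `overall`, `progress`, `stateMachines`) and the status strings are assumptions for illustration; the text does not describe the real message format.

```python
import json


def build_status_message(asset_id, statuses):
    """Summarize per-state-machine statuses into one JSON payload.

    `statuses` maps a state machine name to "IN_PROGRESS", "COMPLETED",
    or "FAILED" (hypothetical status strings).
    """
    if not statuses:
        raise ValueError("at least one state machine status is required")
    values = list(statuses.values())
    if "FAILED" in values:
        overall = "FAILED"
    elif all(value == "COMPLETED" for value in values):
        overall = "COMPLETED"
    else:
        overall = "IN_PROGRESS"
    # Progress is the share of state machines that have completed.
    progress = round(100 * values.count("COMPLETED") / len(values))
    return json.dumps(
        {
            "assetId": asset_id,
            "overall": overall,
            "progress": progress,
            "stateMachines": statuses,
        },
        sort_keys=True,
    )


message = build_status_message(
    "example-asset-id", {"video": "COMPLETED", "audio": "IN_PROGRESS"}
)
```

A string like `message` could then be published to an MQTT topic, for example with the `publish` operation of the AWS IoT data plane client.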

Labeling Process

The labeling workflow manages the lifecycle of the labeling job from its creation to indexing the results. Using the web interface, you can crop faces from videos or images and either label them immediately or place all the cropped faces in a queue for batch processing.

If you select batch processing, use an Amazon SageMaker Ground Truth private workforce to send a batch job to your labeling work team for processing. Your labeling team can include staff within your organization as well as external workers (such as contractors and interns). The team receives an email notification containing the access details for the labeling job. When the job is completed, the workflow collects the annotated results and indexes them in the Amazon Rekognition face collection. The indexed faces are stored in an Amazon DynamoDB table.

Error Handling

The Media2Cloud solution applies a catch-and-retry approach to error handling in the state machines. To improve resiliency, a failed state execution is retried multiple times. When a state execution exhausts its retries, the state machine stops the execution and generates an error.
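
In AWS Step Functions, this pattern is usually expressed with the `Retry` and `Catch` fields of the Amazon States Language. The sketch below builds a minimal state machine definition as a Python dictionary. The state names, the Lambda ARN, the retry interval, the backoff rate, and the attempt count are all assumptions for illustration, not values from the solution.

```python
import json


def state_machine_with_retry(resource_arn, max_attempts=3):
    """Build a definition whose task retries, then fails the execution."""
    return {
        "StartAt": "RunTask",
        "States": {
            "RunTask": {
                "Type": "Task",
                "Resource": resource_arn,
                # Retry any error with exponential backoff.
                "Retry": [
                    {
                        "ErrorEquals": ["States.ALL"],
                        "IntervalSeconds": 1,
                        "MaxAttempts": max_attempts,
                        "BackoffRate": 2.0,
                    }
                ],
                # Once retries are exhausted, route to a Fail state.
                "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "Failed"}],
                "Next": "Done",
            },
            "Failed": {
                "Type": "Fail",
                "Error": "TaskFailed",
                "Cause": "Retries exhausted",
            },
            "Done": {"Type": "Succeed"},
        },
    }


# Placeholder ARN for a hypothetical Lambda function.
definition = state_machine_with_retry(
    "arn:aws:lambda:us-east-1:123456789012:function:example"
)
definition_json = json.dumps(definition, indent=2)
```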

The solution also uses Amazon CloudWatch Events to respond to execution errors caused by the state machines (ingest, analysis, and labeling). The error handling Lambda function processes the error by analyzing the execution history of the failed state machine and sends an Amazon Simple Notification Service (Amazon SNS) notification to subscribers.
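
The history analysis might look roughly like the following sketch, which scans a list of execution history events for the most recent failure and formats a notification message. It operates on plain dictionaries shaped like the events returned by the Step Functions `GetExecutionHistory` API in chronological order. The function names, the message layout, and the sample data are hypothetical.

```python
# Maps failure event types to the key that holds their error details.
FAILURE_DETAIL_KEYS = {
    "ExecutionFailed": "executionFailedEventDetails",
    "LambdaFunctionFailed": "lambdaFunctionFailedEventDetails",
    "TaskFailed": "taskFailedEventDetails",
}


def find_last_failure(history_events):
    """Return the most recent failure in chronologically ordered events."""
    for event in reversed(history_events):
        detail_key = FAILURE_DETAIL_KEYS.get(event.get("type"))
        if detail_key:
            details = event.get(detail_key, {})
            return {
                "type": event["type"],
                "error": details.get("error", "Unknown"),
                "cause": details.get("cause", ""),
            }
    return None


def format_notification(execution_arn, failure):
    """Compose a plain-text message suitable for an SNS notification."""
    if failure is None:
        return f"Execution {execution_arn} failed; no failure event found."
    return (
        f"Execution {execution_arn} failed with {failure['error']} "
        f"({failure['type']}): {failure['cause']}"
    )


# Illustrative history; real events carry more fields (id, timestamp, ...).
sample_history = [
    {"type": "ExecutionStarted"},
    {
        "type": "LambdaFunctionFailed",
        "lambdaFunctionFailedEventDetails": {
            "error": "TimeoutError",
            "cause": "Task timed out",
        },
    },
]
text = format_notification("arn:example", find_last_failure(sample_history))
```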

Proxy Files

When a new video is uploaded to Amazon S3, the Media2Cloud solution automatically converts the video to .mp4 format and creates a compressed version of the video, known as a proxy file. In this solution, proxy files allow users to upload videos of various sizes and formats without being subject to Amazon Rekognition and Amazon Transcribe limits. Additionally, the proxy files can be used as reference proxies in a Media Asset Manager (MAM) for search, discovery, and proxy editing.

Web Interface

The Media2Cloud solution deploys a web interface that makes it easy to upload, browse, and search video and image files, index faces to create your own face collection, and view artificial intelligence and machine learning information. This web interface can be used as a reference for building your own end-to-end ingest and analysis workflow applications. The interface automatically subscribes to the AWS IoT Core message broker to display the status and progress of the ingest, analysis, and labeling processes. You can use the web interface to search results in the Amazon ES cluster and start workflows.

The web interface includes an HTML5 video player by VideoJS that plays the MP4 proxy video files generated by the ingest workflow and displays the machine learning (ML) metadata created by the analysis workflow. The interface accesses these files using Amazon S3 signed URLs.

Amazon DynamoDB

The solution deploys the following Amazon DynamoDB tables, which are configured to use on-demand capacity and encryption at rest using server-side encryption (SSE):

  • A table to store ingest information

  • A table to store machine learning metadata

  • A table to store indexed faces

  • A table to store queued faces that are ready to be processed by labeling workers

  • A table to temporarily store analysis results used internally by the analysis workflow
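
A table with on-demand capacity and encryption at rest could be described with parameters like those in the sketch below, which follow the shape of the DynamoDB `CreateTable` API. The table name and partition key are hypothetical, and the text does not say which encryption key type the solution uses, so the `SSESpecification` setting here is an assumption.

```python
def on_demand_table_params(table_name, partition_key):
    """Build CreateTable parameters for an on-demand, encrypted table."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": partition_key, "AttributeType": "S"}
        ],
        "KeySchema": [{"AttributeName": partition_key, "KeyType": "HASH"}],
        # On-demand capacity: no provisioned read or write throughput.
        "BillingMode": "PAY_PER_REQUEST",
        # Server-side encryption with an AWS KMS key (assumed setting).
        "SSESpecification": {"Enabled": True},
    }


# Hypothetical names; the deployed tables are named by the template.
ingest_table = on_demand_table_params("example-ingest", "uuid")
```

With boto3, these parameters could be passed to `create_table` as keyword arguments.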

Amazon SNS

This solution deploys two Amazon Simple Notification Service (Amazon SNS) topics: one used to receive ingest, analysis, labeling, and error notifications from the workflows and one used by the Amazon SageMaker Ground Truth private workforce to send notifications to the labeling workers.