Solution components - Media Insights on AWS

Solution components

Workflow API

This API creates, updates, deletes, runs, and monitors workflows.
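As an illustration only, the sketch below builds the URL and JSON body that a client might send to the workflow API to run a workflow on a video. The `workflow/execution` path and the `Name` / `Input.Media.Video.S3Bucket` / `S3Key` body layout are assumptions modeled loosely on the developer guide's examples, and the endpoint, bucket, and key values are placeholders. Confirm all of them against your deployment. Sending the request and authenticating it are left out.

```python
import json
from urllib.parse import urljoin

# Hypothetical path for starting a workflow run; verify against your deployment.
EXECUTION_PATH = "workflow/execution"


def build_execution_request(endpoint: str, workflow_name: str,
                            bucket: str, key: str) -> tuple[str, str]:
    """Return (url, json_body) for a request that starts a workflow run.

    The body layout is an assumed shape for illustration, not a confirmed contract.
    """
    body = {
        "Name": workflow_name,
        "Input": {"Media": {"Video": {"S3Bucket": bucket, "S3Key": key}}},
    }
    # urljoin only appends the path if the base URL ends with a slash.
    url = urljoin(endpoint.rstrip("/") + "/", EXECUTION_PATH)
    return url, json.dumps(body)


url, payload = build_execution_request(
    "https://example.execute-api.us-east-1.amazonaws.com/api",
    "MyWorkflow", "my-dataplane-bucket", "uploads/video.mp4")
print(url)
print(payload)
```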


Figure 2: Media Insights on AWS control plane

Control plane

The control plane includes the workflow API and the state machines for workflows. Workflow state machines are composed of operators from the Media Insights on AWS operator library. When the operators in a state machine run, they interact with the Media Insights on AWS data plane to store and retrieve the derived assets and metadata that the workflow generates.
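To make "state machines composed of operators" more concrete, the following minimal sketch chains operators into an Amazon States Language (ASL) definition. It assumes a workflow is a sequence of stages whose operators run as branches of a `Parallel` state. That grouping, the stage and operator names, and the Lambda ARNs are hypothetical, and the definitions the solution actually generates may look different.

```python
import json


def operator_branch(name: str, lambda_arn: str) -> dict:
    """One Parallel branch that invokes a single operator's Lambda function."""
    return {
        "StartAt": name,
        "States": {name: {"Type": "Task", "Resource": lambda_arn, "End": True}},
    }


def build_workflow(stages: list[tuple[str, dict[str, str]]]) -> dict:
    """Chain stages in order; the operators inside one stage run in parallel."""
    states = {}
    for i, (stage_name, operators) in enumerate(stages):
        state = {
            "Type": "Parallel",
            "Branches": [operator_branch(op, arn) for op, arn in operators.items()],
        }
        # Every stage except the last hands off to the next stage.
        if i + 1 < len(stages):
            state["Next"] = stages[i + 1][0]
        else:
            state["End"] = True
        states[stage_name] = state
    return {"StartAt": stages[0][0], "States": states}


# Placeholder account, Region, and function names.
ARN = "arn:aws:lambda:us-east-1:123456789012:function:"
definition = build_workflow([
    ("PreprocessStage", {"MediaInfo": ARN + "MediaInfo"}),
    ("AnalysisStage", {"LabelDetection": ARN + "LabelDetection",
                       "Transcribe": ARN + "Transcribe"}),
])
print(json.dumps(definition, indent=2))
```

Keeping operators as independent branches lets a stage fan out to several analyses of the same media at once, while the `Next` links preserve ordering between stages.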

Use the control plane to create, read, update, and delete (CRUD) custom operators and workflows, and to execute those workflows.

Amazon DynamoDB

The following Amazon DynamoDB tables store workflow-related data:

  • Workflow – This table records user-defined workflows.

  • Workflow Execution – This table records the details of every workflow run.

  • Operations – This table records details for each operator in the operator library, such as references to Lambda functions and default runtime parameters.

  • Stage – This table records the auto-generated AWS Step Functions code needed for each operator.

  • System – This table records system-wide configurations, such as maximum concurrent workflows.
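The System table's maximum-concurrent-workflows setting implies an admission check before a new run starts. The in-memory sketch below illustrates that idea only. The `MaxConcurrentWorkflows` key, the status strings, and the choice to mark excess runs as queued are assumptions, not the solution's documented behavior.

```python
from dataclasses import dataclass, field

# Hypothetical status values; the Workflow Execution table may use others.
ACTIVE_STATUSES = {"Started", "Executing"}


@dataclass
class ControlPlaneState:
    # Stand-in for a System table item (hypothetical key name).
    system_config: dict = field(default_factory=lambda: {"MaxConcurrentWorkflows": 2})
    # Stand-in for Workflow Execution records: execution ID -> status.
    executions: dict = field(default_factory=dict)

    def running_count(self) -> int:
        return sum(1 for s in self.executions.values() if s in ACTIVE_STATUSES)

    def try_start(self, execution_id: str) -> bool:
        """Start the run if below the configured limit; otherwise mark it queued."""
        limit = int(self.system_config["MaxConcurrentWorkflows"])
        if self.running_count() < limit:
            self.executions[execution_id] = "Started"
            return True
        self.executions[execution_id] = "Queued"  # hypothetical deferred state
        return False


state = ControlPlaneState()
print([state.try_start(f"exec-{i}") for i in range(3)])  # [True, True, False]
```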

Operators

Operators are generated state machines that call AWS Lambda functions to perform media analysis or media transformation tasks. Users can define custom operators, and the Media Insights on AWS operator library also includes the following pre-built operators:

  • Celebrity Recognition – An asynchronous operator to identify celebrities in a video using Amazon Rekognition.

  • Content Moderation – An asynchronous operator to identify unsafe content in a video using Amazon Rekognition.

  • Face Detection – An asynchronous operator to identify faces in a video using Amazon Rekognition.

  • Face Search – An asynchronous operator to match faces in a video against a custom face collection using Amazon Rekognition.

  • Label Detection – An asynchronous operator to identify objects in a video using Amazon Rekognition.

  • Person Tracking – An asynchronous operator to identify and track people in a video using Amazon Rekognition.

  • Shot Detection – An asynchronous operator to identify camera shots in a video using Amazon Rekognition.

  • Text Detection – An asynchronous operator to identify text in a video using Amazon Rekognition.

  • Technical Cue Detection – An asynchronous operator to identify technical cues such as end credits, color bars, and black frames in a video using Amazon Rekognition.

  • Comprehend Key Phrases – An asynchronous operator to find key phrases in text using Amazon Comprehend.

  • Comprehend Entities – An asynchronous operator to find references to real-world objects, dates, and quantities in text using Amazon Comprehend.

  • Create SRT Captions – A synchronous operator to generate SRT-formatted caption files from a video transcript generated by Amazon Transcribe.

  • Create VTT Captions – A synchronous operator to generate VTT-formatted caption files from a video transcript generated by Amazon Transcribe.

  • Media Convert – An asynchronous operator to transcode input video to MPEG-4 format using AWS Elemental MediaConvert.

  • Media Info – A synchronous operator to read technical tag data from video files.

  • Polly – An asynchronous operator to convert input text to speech using Amazon Polly.

  • Thumbnail – An asynchronous operator to generate thumbnail images for an input video file using AWS Elemental MediaConvert.

  • Transcribe – An asynchronous operator to convert input audio to text using Amazon Transcribe.

  • Translate – An asynchronous operator to translate input text using Amazon Translate.
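Most operators in this list are asynchronous: they start a long-running service job and must then wait for it to finish. One common way to express that in Amazon States Language is a start task, a status-check task, and a `Choice`/`Wait` polling loop, as in the sketch below. The state names, the `$.Status` field and its `Complete`/`Error` values, the 10-second interval, and the ARNs are assumptions, not the definitions the solution generates. A synchronous operator would need only the single start task.

```python
import json


def async_operator_definition(name: str, start_arn: str, check_arn: str,
                              wait_seconds: int = 10) -> dict:
    """Build an ASL polling loop for one hypothetical asynchronous operator."""
    start, check, is_done = f"{name}Start", f"{name}CheckStatus", f"{name}IsDone"
    wait, complete, failed = f"{name}Wait", f"{name}Complete", f"{name}Failed"
    return {
        "StartAt": start,
        "States": {
            # Starts the long-running service job (for example, a video analysis).
            start: {"Type": "Task", "Resource": start_arn, "Next": check},
            # Looks up the job; assumed to return an object with a Status field.
            check: {"Type": "Task", "Resource": check_arn, "Next": is_done},
            is_done: {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.Status", "StringEquals": "Complete",
                     "Next": complete},
                    {"Variable": "$.Status", "StringEquals": "Error",
                     "Next": failed},
                ],
                "Default": wait,  # any other status: treat as still running
            },
            # Pause before polling again instead of checking in a tight loop.
            wait: {"Type": "Wait", "Seconds": wait_seconds, "Next": check},
            complete: {"Type": "Succeed"},
            failed: {"Type": "Fail"},
        },
    }


ARN = "arn:aws:lambda:us-east-1:123456789012:function:"  # placeholder
definition = async_operator_definition(
    "LabelDetection", ARN + "StartLabelDetection", ARN + "CheckLabelDetection")
print(json.dumps(definition, indent=2))
```

Polling through a `Wait` state keeps each Lambda invocation short, since no function has to stay running while the service job is in progress.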


Figure 3: Media Insights on AWS data plane

Data plane

The data plane stores the media assets and metadata that workflows generate. To make this data available to end-user applications, implement a consumer of the data plane's Kinesis data stream that extracts, transforms, and loads (ETL) data from the master Media Insights on AWS data store into downstream databases that support the data access patterns those applications require.

Data plane API

This API creates, updates, deletes, and retrieves media assets and metadata.

Data plane pipeline

This pipeline stores metadata for an asset, which can be retrieved using the asset's AssetId and metadata type. Each write to the pipeline also copies the data to Kinesis Data Streams. This data stream is the interface that end-user applications connect to in order to use data stored in the Media Insights on AWS data plane.
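The in-memory sketch below models the pipeline behavior described above: metadata is addressed by the pair (AssetId, metadata type), and every write also emits a change record to a list that stands in for the Kinesis data stream. The class, its method names, and the change-record fields are hypothetical and are not the data plane API.

```python
import json


class DataplaneSketch:
    """In-memory stand-in for the data plane pipeline (hypothetical names).

    Metadata is keyed by (AssetId, metadata type). Each write also appends a
    JSON change record to `stream`, mimicking the copy sent to Kinesis.
    """

    def __init__(self) -> None:
        self.metadata: dict[str, dict[str, object]] = {}
        self.stream: list[str] = []

    def store(self, asset_id: str, metadata_type: str, data: object) -> None:
        self.metadata.setdefault(asset_id, {})[metadata_type] = data
        # Hypothetical change-record shape emitted for downstream consumers.
        self.stream.append(json.dumps(
            {"AssetId": asset_id, "MetadataType": metadata_type, "Data": data}))

    def retrieve(self, asset_id: str, metadata_type: str) -> object:
        # Raises KeyError if the asset or metadata type has not been stored.
        return self.metadata[asset_id][metadata_type]


dp = DataplaneSketch()
dp.store("asset-123", "labelDetection", [{"Name": "Dog", "Confidence": 98.1}])
print(dp.retrieve("asset-123", "labelDetection"))
print(len(dp.stream))  # 1
```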

Data pipeline consumers

Changes to the data plane DynamoDB table are reflected in a Kinesis data stream. For each record in that data stream, data pipeline consumers perform the necessary extract, transform, and load (ETL) tasks needed to replicate data, such as media metadata, to the data stores used by external applications. These ETL tasks are entirely use-case dependent and therefore must be user-defined. The Media Insights on AWS Developer Guide includes detailed instructions for implementing data pipeline consumers.
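As a starting point for a user-defined consumer, the sketch below is a Lambda-style handler for a Kinesis event source. It relies on the standard Lambda event layout, in which each record's payload is base64-encoded under `Records[].kinesis.data`. The payload fields (`AssetId`, `MetadataType`, `Data`), the `transform` logic, and the `DOWNSTREAM` dictionary (standing in for an external data store) are hypothetical. Replace them with your own schema and target database.

```python
import base64
import json

# Hypothetical downstream store; a real consumer would write to an external database.
DOWNSTREAM: dict[str, dict] = {}


def transform(record: dict) -> tuple[str, dict]:
    """Use-case-specific transform; the field names here are hypothetical."""
    doc_id = f'{record["AssetId"]}:{record["MetadataType"]}'
    return doc_id, {"asset": record["AssetId"], "type": record["MetadataType"],
                    "data": record["Data"]}


def lambda_handler(event: dict, context: object = None) -> dict:
    """Extract, transform, and load each Kinesis record in the batch."""
    processed = 0
    for rec in event["Records"]:
        # Extract: Kinesis payloads arrive base64-encoded under kinesis.data.
        payload = json.loads(base64.b64decode(rec["kinesis"]["data"]))
        doc_id, doc = transform(payload)
        # Load: upsert by key, so replaying the same record is harmless.
        DOWNSTREAM[doc_id] = doc
        processed += 1
    return {"processed": processed}


# Build a one-record sample event in the Kinesis-to-Lambda shape.
sample = {"AssetId": "asset-123", "MetadataType": "labelDetection", "Data": []}
event = {"Records": [{"kinesis": {
    "data": base64.b64encode(json.dumps(sample).encode()).decode()}}]}
print(lambda_handler(event))  # {'processed': 1}
```

Upserting by a deterministic key is a deliberate choice here: Lambda retries failed Kinesis batches, so a consumer can see the same record more than once, and keyed writes keep the downstream copy consistent.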