Solution components - Content Localization on AWS

Solution components

The Content Localization on AWS solution consists of a number of components that are deployed to AWS using CloudFormation nested stacks. The components include the AWS Media Insights Engine (MIE) solution, a Media Insights Engine (MIE) workflow, a web application, web application authentication components, and an Amazon OpenSearch Service MIE custom pipeline consumer.

Media Insights Engine (MIE) solution

MIE is deployed in one of two modes, depending on which option you choose: 

  1. As a nested stack from the deployment/content-localization-on-aws.yaml template.

  2. As a standalone stack whose name is provided as a parameter input to the deployment/content-localization-on-aws-use-existing-mie-stack.yaml template.

MIE provides services for creating and running the content localization workflow, and for storing and retrieving the media objects and metadata that the workflow generates for each input video (asset). At deployment time, this solution integrates with MIE to create a workflow using the MIE operator library and MIE workflow custom resources. At runtime, the Content Localization on AWS solution integrates with MIE through its REST APIs to run the content localization workflow, and to store and retrieve the media objects and analysis outputs that the workflow creates. Finally, as workflows run, the solution consumes DynamoDB stream events emitted by the MIE data pipeline to store analysis results in Amazon OpenSearch Service.

MIE content localization workflow

Template for nested stack: deployment/content-localization-on-aws-video-workflow.yaml

The ContentLocalizationWorkflow MIE workflow orchestrates the analysis and application logic to automatically generate multi-language video subtitles. The workflow is composed of Media Insights Engine operators as shown in the following diagram:

        Content localization workflow diagram showing stages including
        PreprocessVideo, AnalyzeVideo, AnalyzeText, WebCaptions, and Translate.
Figure 2: Content localization workflow diagram

Like all MIE workflows, the content localization workflow is composed of a number of stages that are run sequentially by MIE when the application calls the POST /workflow/execution API. Each stage contains MIE operators that are run in parallel.
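As a sketch of how an application starts this workflow, the following builds the request body for POST /workflow/execution and submits it. The endpoint URL and authorization header are placeholders; a real deployment's MIE API requires AWS SigV4 authentication (for example, via botocore), and the S3 location shown is illustrative.

```python
import json
import urllib.request

# Placeholder endpoint; the real WorkflowApiEndpoint is a CloudFormation output.
WORKFLOW_API = "https://example.execute-api.us-east-1.amazonaws.com/api"

def build_execution_request(workflow_name: str, bucket: str, key: str) -> dict:
    """Build the body for POST /workflow/execution: the workflow to run
    and the S3 location of the input video (asset)."""
    return {
        "Name": workflow_name,
        "Input": {"Media": {"Video": {"S3Bucket": bucket, "S3Key": key}}},
    }

def start_workflow(body: dict, auth_header: str) -> dict:
    """Submit the workflow execution request (auth header is a placeholder
    for real SigV4 signing)."""
    req = urllib.request.Request(
        f"{WORKFLOW_API}/workflow/execution",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json", "Authorization": auth_header},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

body = build_execution_request(
    "ContentLocalizationWorkflow", "my-upload-bucket", "uploads/video.mp4"
)
```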

The workflow uses MIE workflow reprocessing to update selected outputs of the workflow for an existing asset after application users edit and save source and target language subtitles. By only reprocessing dependent, downstream operators, the solution can save the cost of rerunning operators whose outputs will not change.


The workflow entry points used by the Content Localization on AWS application are shown by the green dots in the diagram.

Operators in the default workflow

MIE operators that run within the content localization workflow analyze and/or transform the input video. Operators write analysis results (metadata about the input video) to the MIE data pipeline, which stores raw output in the Amazon S3 DataplaneBucket bucket and makes the data available to downstream MIE data pipeline consumers. The solution employs one built-in pipeline consumer, an Amazon OpenSearch Service resource. The web application retrieves results of operators run in the workflow using the MIE DataplaneApiEndpoint or the Amazon OpenSearch Service API. Transformed media outputs (such as WebVTT subtitle tracks) are also stored using the MIE Dataplane API. Output media objects are stored in the MIE S3 Dataplane bucket. The object paths can also be retrieved using the MIE Dataplane API for the operator.

Both the solution’s web application and the content localization workflow can retrieve the result of any operator that has already run by calling GET /metadata/<assetid>/<operator-name> on the MIE DataplaneApiEndpoint.
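A minimal sketch of that retrieval call follows. The endpoint URL is a placeholder, real requests require AWS SigV4 authentication, and the example operator name is illustrative.

```python
import json
import urllib.request

# Placeholder; the real DataplaneApiEndpoint is a CloudFormation output.
DATAPLANE_API = "https://example.execute-api.us-east-1.amazonaws.com/api"

def metadata_url(asset_id: str, operator_name: str) -> str:
    """Build the GET /metadata/<assetid>/<operator-name> URL."""
    return f"{DATAPLANE_API}/metadata/{asset_id}/{operator_name}"

def get_operator_metadata(asset_id: str, operator_name: str, auth_header: str) -> dict:
    """Fetch one operator's stored results for an asset (auth header stands
    in for real SigV4 signing)."""
    req = urllib.request.Request(
        metadata_url(asset_id, operator_name),
        headers={"Authorization": auth_header},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```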

Pre-built MIE operators:

  • Mediainfo - performs analysis on the video package and provides information about the format of the video, including the number and types of tracks, encoding formats for tracks, and other structural information.

  • Thumbnail - uses AWS Elemental MediaConvert to generate thumbnail images, transcode the video into a uniformly formatted proxy format that is used as input to downstream operators, and create an audio-only file from the input.

  • TranscribeVideo - generates a transcript of the spoken audio in the video using Amazon Transcribe.

  • WebCaptions - converts the transcript generated by Amazon Transcribe into subtitle blocks, stored in JSON format, that contain the original word-level transcript timing plus the subtitle text.

    The WebCaptions JSON metadata is stored in the following format:

    {
      "WebCaptions": [
        {
          "start": "0.04",
          "end": "0.69",
          "caption": "En Austin. ",
          "sourceCaption": "In Austin."
        },
        {
          "start": "0.7",
          "end": "4.66",
          "caption": "Es 60° con relájalo. ",
          "sourceCaption": "It's 60° with with a chance of…"
        },
        …
      ]
    }

    WebCaptions data structure attributes:

    start – the start time of the caption relative to the start of the video.

    end – the end time of the caption relative to the start of the video.

    caption – the caption for this translation including the most recent edits made through the application.

    sourceCaption – the original caption generated from the transcript of the source video.

  • TranslateWebcaptions - generates a collection of target-language subtitles in JSON format from the source-language subtitles. Items in the collection contain the language code and the language-specific operator name that can be used to retrieve the translated WebCaptions output for each language in the collection.

  • WebToVTTCaptions - Generates a collection of target-language subtitles in WebVTT format from the source-language subtitles. Items in the collection contain the language codes and object paths of each available subtitle track.

  • WebToSRTCaptions - generates a collection of target-language subtitles in SRT format from the source-language subtitles. Items in the collection contain the language codes and object paths of each available subtitle track.

  • PollyWebcaptions - generates a collection of stand-alone audio files for all languages included in the workflow. Audio timing is not synchronized to the video; the output is meant for listen-only use (for example, in situations where there is not enough bandwidth to play the video). Items in the collection contain the language code and S3 object path of each available audio track.
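As an illustration of how downstream operators consume the WebCaptions structure shown above, the following parses WebCaptions entries and renders them as a WebVTT track, the kind of conversion WebToVTTCaptions performs. This is a sketch only; the sample data and formatting details are illustrative, not the operator's actual implementation.

```python
# Sample WebCaptions entries, fields as documented above (illustrative data).
sample = [
    {"start": "0.04", "end": "0.69", "caption": "En Austin.", "sourceCaption": "In Austin."},
    {"start": "0.7", "end": "4.66", "caption": "Es 60° con relájalo.", "sourceCaption": "It's 60° with with a chance of…"},
]

def to_vtt_timestamp(seconds_str: str) -> str:
    """Format a seconds value ("4.66") as a WebVTT timestamp (00:00:04.660)."""
    total = float(seconds_str)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{int(hours):02d}:{int(minutes):02d}:{secs:06.3f}"

def webcaptions_to_vtt(webcaptions: list) -> str:
    """Render WebCaptions entries as a WebVTT subtitle track."""
    lines = ["WEBVTT", ""]
    for item in webcaptions:
        lines.append(f"{to_vtt_timestamp(item['start'])} --> {to_vtt_timestamp(item['end'])}")
        lines.append(item["caption"])
        lines.append("")
    return "\n".join(lines)

vtt = webcaptions_to_vtt(sample)
```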

Web application

Template for nested stack: deployment/content-localization-on-aws-web.yaml

Web application component with a web source S3 bucket and a CloudFront distribution.

Figure 3: Web application component

The Content Localization on AWS solution features a simple static web application hosted in Amazon S3 for uploading, analyzing, and browsing video collections and creating subtitles for them. The deployed application accesses MIE-created Amazon API Gateway endpoints for running the MIE workflow API and the MIE DataplaneApiEndpoint for asset storage and retrieval. It accesses the Amazon OpenSearch Service API Gateway endpoint for search and metadata retrieval.

Web application authentication

Template for nested stack: deployment/content-localization-on-aws-auth.yaml

The web application authentication component uses Amazon Cognito and IAM.

Figure 4: Web application authentication component

The Content Localization on AWS web application uses Amazon Cognito user pools and identity pools for user authentication. The web application uses AWS Amplify prebuilt UI components for interacting with the authentication services.

When authenticated users upload files through the application, the files are stored in private folders that correspond to their unique Amazon Cognito identifier to ensure fine-grained access control using AWS Identity and Access Management (IAM) policies.
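A hedged sketch of that fine-grained access pattern follows: an IAM policy statement that scopes each authenticated user to a private S3 prefix keyed by their Cognito identity ID. The bucket name and prefix layout are assumptions for illustration, not the solution's exact policy.

```python
def private_prefix_statement(bucket: str) -> dict:
    """Build an IAM policy statement granting each authenticated user
    access only to their own private prefix (illustrative sketch)."""
    return {
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:PutObject"],
        # ${cognito-identity.amazonaws.com:sub} is substituted by IAM with
        # the caller's unique Amazon Cognito identity ID at request time.
        "Resource": [
            f"arn:aws:s3:::{bucket}/private/${{cognito-identity.amazonaws.com:sub}}/*"
        ],
    }

stmt = private_prefix_statement("my-upload-bucket")
```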

Amazon OpenSearch Service MIE pipeline consumer

Template for nested stack: deployment/content-localization-on-aws-opensearch.yaml

The Amazon OpenSearch Service MIE pipeline consumer uses Amazon DynamoDB Streams, a Lambda consumer function, and Amazon OpenSearch Service.

Figure 5: Amazon OpenSearch Service MIE pipeline consumer

This solution supports full-featured search of the metadata generated by the ContentLocalizationWorkflow workflow using Amazon OpenSearch Service. It indexes the metadata generated by the MIE operators in Amazon OpenSearch Service by attaching a consumer to the MIE data pipeline. The OpenSearchConsumer Lambda function is invoked by the MIE data plane DynamoDB stream whenever metadata for an MIE operator is stored in the MIE data plane. The function reads the new or updated metadata and stores it in an index in the Amazon OpenSearch Service ESDomain instance.
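An illustrative sketch of what such a stream consumer does: for each DynamoDB stream record, extract the new or updated item image and hand it to an indexer. The record shape below is the standard DynamoDB Streams event format; the indexing call itself is stubbed out, and the index-naming convention is an assumption.

```python
def extract_documents(event: dict) -> list:
    """Pull new/updated item images out of a DynamoDB Streams Lambda event."""
    docs = []
    for record in event.get("Records", []):
        if record.get("eventName") in ("INSERT", "MODIFY"):
            image = record["dynamodb"].get("NewImage", {})
            # DynamoDB attribute values are typed, e.g. {"S": "abc123"};
            # unwrap each to its plain value.
            docs.append({k: list(v.values())[0] for k, v in image.items()})
    return docs

def handler(event, context):
    """Lambda entry point: index each changed metadata item (stubbed)."""
    for doc in extract_documents(event):
        index_name = doc.get("OperatorName", "mie").lower()  # assumed naming
        # index_document(index_name, doc)  # e.g. via an OpenSearch client (stubbed)
        pass
```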

The Content Localization on AWS application sends Apache Lucene queries to Amazon OpenSearch Service to support direct search by end users on the Collection page and, internally, to provide fine-grained search over Amazon Rekognition operator results on the Analyze page.
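A sketch of the kind of Lucene-style query the application might send, restricting a term search to one asset's results for a particular operator. The index, field, and operator names here are assumptions for illustration, not the application's actual query shape.

```python
def build_query(term: str, operator: str, asset_id: str) -> dict:
    """Build an OpenSearch query_string request body using Lucene syntax
    (field and operator names are illustrative)."""
    return {
        "query": {
            "query_string": {
                "query": f'Operator:{operator} AND AssetId:"{asset_id}" AND "{term}"'
            }
        }
    }

q = build_query("sunset", "labelDetection", "abc123")
```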