Guidance for a Media Lake on AWS

Overview

This Guidance demonstrates how to deploy a media lake, which addresses media management challenges for organizations of all sizes using AWS services and partner integrations. It shows how to create a centralized system for managing digital media assets throughout their lifecycle, featuring automated and manual media workflows, global namespace organization, and advanced metadata management. The Guidance helps implement human-in-the-loop review capabilities and a unified media, archive, and metadata data catalog that connects through API and user interface layers. By following this Guidance, organizations can optimize processing times, reduce costs, and enhance content monetization.

Benefits

Accelerate decision-making for digital media assets

Automate metadata enrichment, intelligent search, and proxy generation for efficient asset discovery, repurposing, and monetization while reducing costs. Optimize content lifecycle management, speed up approval cycles, and enhance workflow efficiency.

Reduce production workflow complexity

Streamline media operations with customizable, event-driven pipelines and automated quality controls such as similarity detection. Eliminate manual handoffs while maintaining creative oversight through human-in-the-loop review capabilities.

Streamline media processing workflows

Accelerate content delivery with configurable pipelines that combine automated processing and controlled review stages while maintaining quality standards through structured approval processes.

How it works

Overview

This architecture diagram provides a functional overview of the capabilities of a media lake on AWS.

Step 1
Upload new media files to Amazon Simple Storage Service (Amazon S3). The upload triggers an event that initiates processing.
Step 2
AWS Lambda, Amazon Simple Queue Service (Amazon SQS), and Amazon EventBridge coordinate the flow of events after ingestion. Lambda functions handle initial processing, and EventBridge routes events to transformation, enrichment, and pipeline components.
Step 3
Search features support semantic and keyword search, along with filtering of indexed assets.
Step 4
Organization logic groups related assets using metadata or similarity scoring. A storage browser is used to explore assets in the connector.
Step 5
Media transformation creates proxies, thumbnails, or derivative assets when triggered.
Step 6
Metadata management extracts technical and user-defined metadata to support search and discovery.
Step 7
Default or custom pipelines coordinate analysis, enrichment, and transformation using AWS and partner services.
Step 8
RESTful APIs enable integration with external systems, allowing ingestion, search, and asset retrieval.
Step 9
Lambda and EventBridge coordinate the execution of custom analysis and transformation pipelines. To keep workflows secure, the pipelines retrieve credentials from AWS Secrets Manager.
Step 10
Amazon S3, Amazon API Gateway, Lambda, Amazon OpenSearch Service, Amazon DynamoDB, EventBridge, Amazon SQS, Amazon Bedrock AgentCore, Amazon CloudWatch, and AWS X-Ray power the media lake functions.
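The ingestion flow in Steps 1 and 2 can be sketched as follows. This is a minimal illustration, not the Guidance's actual code: it assumes a Lambda function receives a standard Amazon S3 event notification and reshapes each record into an entry suitable for the EventBridge PutEvents API. The bus name, event source, detail type, and detail fields are hypothetical.

```python
import json
from urllib.parse import unquote_plus

# Hypothetical names for illustration; the Guidance does not specify them.
EVENT_BUS_NAME = "media-lake-bus"
EVENT_SOURCE = "medialake.ingest"


def s3_records_to_entries(s3_event: dict) -> list:
    """Convert S3 notification records into EventBridge PutEvents entries."""
    entries = []
    for record in s3_event.get("Records", []):
        # Only object-creation events start processing in this sketch.
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in notifications (spaces arrive as "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        detail = {
            "bucket": bucket,
            "key": key,
            "size": record["s3"]["object"].get("size", 0),
        }
        entries.append(
            {
                "EventBusName": EVENT_BUS_NAME,
                "Source": EVENT_SOURCE,
                "DetailType": "AssetUploaded",  # hypothetical detail type
                "Detail": json.dumps(detail),  # PutEvents expects a JSON string
            }
        )
    return entries


# A trimmed-down S3 notification with one upload and one deletion.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "media-ingest"},
                "object": {"key": "incoming/my+clip.mp4", "size": 1024},
            },
        },
        {
            "eventName": "ObjectRemoved:Delete",
            "s3": {
                "bucket": {"name": "media-ingest"},
                "object": {"key": "incoming/old.mp4"},
            },
        },
    ]
}

entries = s3_records_to_entries(sample_event)
print(json.dumps(entries, indent=2))
```

In a deployed function, the returned list would be passed as the `Entries` argument of an EventBridge `put_events` call; that call is left out here so the sketch runs without AWS credentials.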

High-level application architecture

This architecture diagram shows the high-level API, storage, and back-end architecture of a media lake on AWS.

Step 1
Operators access the media lake user interface through Amazon CloudFront, which is protected by AWS WAF. CloudFront serves the static web application from Amazon S3.
Step 2
Amazon Cognito authenticates users, and Amazon Verified Permissions manages authorization.
Step 3
API Gateway routes authenticated requests, which are processed by Lambda functions that invoke backend services as needed.
Step 4
Lambda queries OpenSearch Service to return search and retrieval results. Amazon DynamoDB manages asset and service metadata.
Step 5
EventBridge receives internal events from the media lake through its API layer and pipeline layer, powering downstream processes such as pipeline execution, audit logging, and compliance tracking.
Step 6
Amazon S3 buckets store media files and assets, host infrastructure as code packages, and hold the templates used for translation pipelines.
Step 7
EventBridge triggers pipelines upon receiving events. These pipelines pull media from Amazon S3, metadata from Amazon DynamoDB, and credentials from Secrets Manager. Lambda functions carry out operations such as proxy generation, embedding generation, and media enrichment, all orchestrated through AWS Step Functions.
Step 8
Amazon Bedrock AgentCore hosts and runs the coordinator agent. Users interact with the agent through the media lake's user interface using natural language requests such as "Provide a summary of what's in this video." The coordinator agent connects with specialized agents to process these requests. Specialized agents use agent tools to interact with services and fulfill the user's request.
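The request path in Steps 3 and 4 can be illustrated with a small sketch. It assumes an API Gateway Lambda proxy integration and builds an OpenSearch query body that combines keyword matching with a filter. The index field names (`title`, `description`, `transcript`, `asset_type`) and the `q` and `type` query parameters are hypothetical, and the sketch returns the query instead of sending it to a domain.

```python
import json

# Hypothetical index fields; the Guidance does not document the index mapping.
SEARCH_FIELDS = ["title", "description", "transcript"]
JSON_HEADERS = {"Content-Type": "application/json"}


def build_search_query(text: str, asset_type=None, size: int = 20) -> dict:
    """Build an OpenSearch query DSL body: keyword match plus optional filter."""
    bool_query = {
        "must": [{"multi_match": {"query": text, "fields": SEARCH_FIELDS}}]
    }
    if asset_type:
        # Filters narrow the result set without affecting relevance scores.
        bool_query["filter"] = [{"term": {"asset_type": asset_type}}]
    return {"size": size, "query": {"bool": bool_query}}


def handler(event: dict, context=None) -> dict:
    """Lambda proxy handler: validate input and respond in proxy format."""
    # API Gateway sends None when the request has no query string.
    params = event.get("queryStringParameters") or {}
    text = params.get("q", "").strip()
    if not text:
        return {
            "statusCode": 400,
            "headers": JSON_HEADERS,
            "body": json.dumps({"error": "missing query parameter 'q'"}),
        }
    query = build_search_query(text, params.get("type"))
    # A real handler would submit `query` to the OpenSearch Service domain
    # and return the hits; this sketch echoes the query it would send.
    return {"statusCode": 200, "headers": JSON_HEADERS, "body": json.dumps(query)}


ok_response = handler({"queryStringParameters": {"q": "sunset", "type": "video"}})
bad_response = handler({"queryStringParameters": None})
print(ok_response["statusCode"], bad_response["statusCode"])
```

Keeping query construction in its own function makes it possible to unit test the search logic without an API Gateway event or a live OpenSearch Service domain.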

Pipeline execution and deployment

This architecture diagram shows the deployment and execution of pipelines used in a media lake to process media and produce metadata to aid search and render new versions for use with downstream systems.

Step 1
Users define media processing workflows through a no-code drag-and-drop canvas, save them, and deploy them as pipelines.
Step 2
The user creates a pipeline by sending a request to AWS Step Functions.
Step 3
Amazon S3 generates event notifications when new media is uploaded. These notifications are copied and sent to the internal media lake analysis event bus.
Step 4
The media lake creates EventBridge event rules that trigger pipelines based on new asset events or the completion of previous pipelines.
Step 5
Amazon SQS queues incoming events, allowing them to be buffered and processed asynchronously.
Step 6
Lambda handles events from the queue and starts the Step Functions state machines that represent deployed pipelines.
Step 7
Step Functions defines each pipeline as an individual state machine that executes the logic configured in the canvas.
Step 8
Step Functions enables pipelines to integrate with AWS services, AWS independent software vendor (ISV) partners, or third-party systems as needed.
Step 9
Step Functions coordinates the entire pipeline: it reads media from Amazon S3, invokes Lambda (monitored through CloudWatch and X-Ray) to extract metadata and write it to DynamoDB, and uses AWS Elemental MediaConvert to generate proxies. It then stores the outputs back in Amazon S3. Amazon Bedrock AgentCore hosts the agents that pipelines use.
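To make Steps 7 through 9 more concrete, the sketch below expresses a three-stage pipeline as an Amazon States Language definition and walks it from start to end. It is an assumption-laden illustration: the state names and Lambda function names are hypothetical, and every stage is modeled as a Lambda task for simplicity, including proxy generation, which the Guidance performs with AWS Elemental MediaConvert.

```python
import json

# Step Functions service integration for invoking a Lambda function.
LAMBDA_INVOKE = "arn:aws:states:::lambda:invoke"


def lambda_task(function_name: str, next_state=None) -> dict:
    """Build a Task state that invokes a Lambda function with the state input."""
    state = {
        "Type": "Task",
        "Resource": LAMBDA_INVOKE,
        "Parameters": {"FunctionName": function_name, "Payload.$": "$"},
        "OutputPath": "$.Payload",  # pass only the function result onward
    }
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True
    return state


# Hypothetical pipeline: function and state names are illustrative only.
pipeline_definition = {
    "Comment": "Sketch of a media lake pipeline as one state machine",
    "StartAt": "ExtractMetadata",
    "States": {
        "ExtractMetadata": lambda_task("extract-metadata", "GenerateProxy"),
        # Assumption: a Lambda function stands in for the MediaConvert job.
        "GenerateProxy": lambda_task("submit-proxy-job", "StoreOutputs"),
        "StoreOutputs": lambda_task("store-outputs"),
    },
}


def execution_order(definition: dict) -> list:
    """Follow Next links from StartAt and return the linear state order."""
    order = []
    name = definition["StartAt"]
    while True:
        if name in order:
            raise ValueError(f"cycle detected at state {name!r}")
        state = definition["States"][name]  # KeyError if Next is dangling
        order.append(name)
        if state.get("End"):
            return order
        name = state["Next"]


order = execution_order(pipeline_definition)
print(" -> ".join(order))
print(json.dumps(pipeline_definition, indent=2))
```

Checking that every `Next` target exists and that the chain reaches an `End` state before deployment catches wiring mistakes early, which matters when definitions are generated from a visual canvas.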

Deploy with confidence

Everything you need to launch this Guidance in your account is right here.

Let's make it happen

Ready to deploy? Review the sample code on GitHub for detailed deployment instructions to deploy as-is or customize to fit your needs.