Architecture details
This section describes the components and AWS services that make up this solution and how these components work together.
AWS services in this solution
AWS service | Description |
---|---|
Amazon Athena | Core. Accesses the AWS Glue Data Catalog and queries the transformed data in the stage Amazon S3 buckets. |
AWS Glue | Core. Applies heavy transformations in the data lake, including partitioning pre-stage data, and outputs the data as Apache Parquet files. |
AWS Lambda | Core. Lambda functions add AMC instances as part of the microservices and register provisioned customers in the data lake. Lambda is also used to process workflow requests, check responses, notify users, transform raw data, partition pre-stage data, and manage metadata stored in Amazon S3. |
AWS Lake Formation | Core. Provides data lake governance and security. |
Amazon S3 | Core. The solution uses Amazon S3 buckets to store the data delivered to the intake buckets and the transformed data in the stage buckets. |
AWS Step Functions | Core. Step Functions orchestrates the Lambda functions and user notifications in the Tenant Provisioning Service, Workflow Manager, and data lake. |
Amazon DynamoDB | Supporting. DynamoDB tables store details of tenants, workflows, and data lake transformations. |
Amazon EventBridge | Supporting. EventBridge captures raw data landing in Amazon S3 buckets and invokes the data lake on a recurring basis. |
AWS KMS | Supporting. The solution uses KMS keys to encrypt and decrypt the data in Amazon S3 buckets, SQS queues, and DynamoDB tables. |
Amazon SNS | Supporting. The solution uses Amazon SNS to publish the execution status of the Workflow Manager. |
Amazon SQS | Supporting. The solution uses Amazon SQS to send, store, and receive messages between tenants, workflows, and the data lake. |
AWS Systems Manager | Supporting. Provides application-level resource monitoring and visualization of resource operations and cost data. |
AWS Secrets Manager | Supporting. Secrets Manager stores the user-specified OAuth credentials. |
Amazon QuickSight | Optional. Provides business intelligence, analytics, interactive dashboards, and visualizations for business stakeholders. |
Amazon SageMaker Jupyter notebook | Optional. Amazon SageMaker hosts sample Jupyter notebooks that analysts can use to provision tenants and manage workflows. |
Microservices
This solution deploys six microservices: Platform Management Notebooks, Tenant Provisioning Service, Workflow Manager, Amazon Ads Reporting, Selling Partner Reporting, and the Serverless Data Lake.
Platform Management Notebooks
The Platform Management Notebooks serve as sample code for interfacing with the Tenant Provisioning Service, Workflow Manager, Amazon Ads Reporting, and Selling Partner Reporting microservices.
Tenant Provisioning Service
The Tenant Provisioning Service manages AMC customers onboarded through the solution. Each onboarded AMC customer is mapped to an AMC instance and deployed as a stack in the solution.
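To make the customer-to-instance mapping concrete, here is a minimal in-memory sketch of tenant registration. It is an illustration under stated assumptions, not the solution's implementation: the `Tenant` and `TenantRegistry` names, the `amc-tenant-<customer>` stack naming convention, and the dictionary standing in for the DynamoDB tenant table are all hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tenant:
    """Hypothetical record for one onboarded AMC customer."""
    customer_id: str
    amc_instance_id: str
    stack_name: str


@dataclass
class TenantRegistry:
    """Hypothetical in-memory stand-in for the tenant table kept in DynamoDB."""
    tenants: dict = field(default_factory=dict)

    def onboard(self, customer_id: str, amc_instance_id: str) -> Tenant:
        existing = self.tenants.get(customer_id)
        if existing is not None:
            # Re-onboarding with the same instance is a no-op; a different instance is a conflict.
            if existing.amc_instance_id != amc_instance_id:
                raise ValueError(
                    f"{customer_id} is already mapped to {existing.amc_instance_id}"
                )
            return existing
        # Hypothetical naming convention: one stack per tenant, with any character
        # other than letters, digits, and hyphens replaced by a hyphen.
        safe_id = re.sub(r"[^A-Za-z0-9-]", "-", customer_id)
        tenant = Tenant(customer_id, amc_instance_id, f"amc-tenant-{safe_id}")
        self.tenants[customer_id] = tenant
        return tenant
```

Making `onboard` idempotent means a retried provisioning request returns the existing tenant instead of creating a second stack for the same customer.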
Workflow Manager
The Workflow Manager manages requests sent to the AMC API. In addition to synchronizing data between the solution and a customer's AMC instance, the Workflow Manager schedules AMC workflows with cron-based expressions and uses queue-based routing to ensure that all requests are processed.
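The interplay of cron-based scheduling and queue-based routing can be sketched as follows. This is a simplified illustration, not the Workflow Manager's code: it assumes classic five-field Unix cron syntax (a subset supporting `*`, `*/n`, ranges, and comma lists), and an in-memory `deque` stands in for the solution's SQS queue. The functions `cron_matches`, `dispatch_due`, and `drain` and the `submit` callback are hypothetical.

```python
from collections import deque
from datetime import datetime


def _field_matches(expr: str, value: int, start: int) -> bool:
    """Match one cron field ('*', '*/n', 'a', 'a-b', or a comma list of these)."""
    for part in expr.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            # Steps count from the field's first valid value (0 for minutes, 1 for days).
            if (value - start) % int(part[2:]) == 0:
                return True
        elif "-" in part:
            low, high = (int(x) for x in part.split("-"))
            if low <= value <= high:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if a five-field cron expression is due at `when`."""
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (when.weekday() + 1) % 7  # cron counts Sunday as 0; Python counts Monday as 0
    if not (_field_matches(minute, when.minute, 0)
            and _field_matches(hour, when.hour, 0)
            and _field_matches(month, when.month, 1)):
        return False
    dom_ok = _field_matches(dom, when.day, 1)
    dow_ok = _field_matches(dow, cron_dow, 0)
    # Classic cron semantics: when both day fields are restricted, either may match.
    if dom != "*" and dow != "*":
        return dom_ok or dow_ok
    return dom_ok and dow_ok


def dispatch_due(schedules: dict, when: datetime, queue: deque) -> None:
    """Enqueue every workflow whose schedule is due at `when`."""
    for workflow_id, expr in schedules.items():
        if cron_matches(expr, when):
            queue.append(workflow_id)


def drain(queue: deque, submit, max_attempts: int = 3):
    """Process queued requests, re-queueing failures so none are silently dropped."""
    attempts, done, failed = {}, [], []
    while queue:
        workflow_id = queue.popleft()
        attempts[workflow_id] = attempts.get(workflow_id, 0) + 1
        if submit(workflow_id):
            done.append(workflow_id)
        elif attempts[workflow_id] < max_attempts:
            queue.append(workflow_id)  # retry later instead of losing the request
        else:
            failed.append(workflow_id)  # give up after max_attempts tries
    return done, failed
```

Separating "is it due?" from "has it been processed?" is what lets the queue absorb transient API failures: a failed submission goes back on the queue rather than waiting for the next scheduled run.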
Amazon Ads Reporting
The Amazon Ads Reporting microservice schedules and fetches reports from the Amazon Ads reporting API endpoint.
Selling Partner Reporting
The Selling Partner Reporting microservice schedules and fetches reports from the Selling Partner API.
Serverless Data Lake
The Serverless Data Lake transforms data that the other microservices deliver to any of the intake S3 buckets deployed by the solution: the reporting bucket for Amazon Ads and Selling Partner reports, the AMC buckets for AMC data, and the general-purpose Raw bucket for custom data uploaded by an external provider or AWS service. The data lake detects objects created in these buckets and starts the transformations if the dataset is configured. It routes the data to the corresponding pipeline and applies custom transformations to the datasets provided by customers. The transformed data is stored in the Amazon S3 stage buckets and can be accessed through the AWS Glue Data Catalog.
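The detect-then-route step can be sketched as a function that receives an EventBridge "Object Created" event and picks a pipeline. This assumes EventBridge notifications are enabled on the intake buckets, so S3 emits events carrying `detail.bucket.name` and `detail.object.key`. The bucket names, the `<dataset>/...` key layout, the `source:dataset` pipeline naming, and `route_object_created` itself are hypothetical illustrations, not the solution's configuration.

```python
from typing import Optional

# Hypothetical mapping from intake bucket name to the kind of data it receives.
INTAKE_BUCKETS = {
    "example-reporting-bucket": "reporting",  # Amazon Ads and Selling Partner reports
    "example-amc-bucket": "amc",              # AMC data
    "example-raw-bucket": "raw",              # custom data from external providers or AWS services
}

# An EventBridge rule with this event pattern would forward matching S3 events.
OBJECT_CREATED_PATTERN = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {"bucket": {"name": list(INTAKE_BUCKETS)}},
}


def route_object_created(event: dict, configured_datasets: set) -> Optional[str]:
    """Return a pipeline name for an S3 'Object Created' event, or None to skip it."""
    if event.get("source") != "aws.s3" or event.get("detail-type") != "Object Created":
        return None
    bucket = event["detail"]["bucket"]["name"]
    key = event["detail"]["object"]["key"]
    source_type = INTAKE_BUCKETS.get(bucket)
    if source_type is None:
        return None  # not one of the intake buckets
    # Assumed key layout: "<dataset>/<partitions...>/<file>".
    dataset = key.split("/", 1)[0]
    if dataset not in configured_datasets:
        return None  # only configured datasets are transformed
    return f"{source_type}:{dataset}"
```

Returning `None` for unconfigured datasets mirrors the behavior described above: objects still land in the intake bucket, but no transformation starts until the dataset is configured.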
Orchestration
AWS Step Functions
- The Step Functions state machines in the Tenant Provisioning Service orchestrate Lambda functions to add AMC instances and register provisioned customers in the data lake.
- The Workflow Manager uses Step Functions to coordinate Lambda functions that process workflow requests, create workflow runs, check workflow status, and notify the user.
- Step Functions in the data lake automate transformations after data is delivered to any of the intake S3 buckets.
- The Amazon Ads Reporting and Selling Partner Reporting Step Functions orchestrate the Lambda functions to schedule and handle report requests, check the status of reports, and download the completed reports to the reporting S3 bucket.