Data Lake Solution

Architecture Overview

Deploying this solution with the default parameters builds the following environment in the AWS Cloud.


Figure 1: Data lake solution architecture on AWS

The solution uses AWS CloudFormation to deploy the infrastructure components that support this data lake reference implementation. At its core, the solution implements a data lake API, which uses Amazon API Gateway to provide access to data lake microservices (AWS Lambda functions). These microservices provide the business logic to create data packages, upload data, search for existing packages, add data of interest to a cart, generate data manifests, and perform administrative functions. The microservices interact with Amazon S3, AWS Glue, Amazon Athena, Amazon DynamoDB, Amazon Elasticsearch Service (Amazon ES), and Amazon CloudWatch Logs to provide data storage, management, and audit functions.
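To make the API Gateway to Lambda pattern concrete, the following is a minimal sketch of a package microservice, assuming the Lambda proxy integration event format. It is not the solution's actual code: the `/packages` route, the `term` query parameter, the function names, and the in-memory store (standing in for Amazon DynamoDB and the other data stores) are all hypothetical and chosen only for illustration.

```python
import json

# Hypothetical in-memory store standing in for the solution's real data stores
# (Amazon DynamoDB, Amazon S3, and so on); used only to keep the sketch runnable.
_PACKAGES = {}


def create_package(body):
    """Register a data package under a caller-supplied name."""
    name = body.get("name")
    if not name:
        return 400, {"message": "package name is required"}
    _PACKAGES[name] = {"name": name, "description": body.get("description", "")}
    return 201, _PACKAGES[name]


def search_packages(term):
    """Return every stored package whose name contains the search term."""
    return 200, [p for p in _PACKAGES.values() if term in p["name"]]


def handler(event, context):
    """Route an API Gateway proxy event to the matching package operation."""
    method = event.get("httpMethod")
    path = event.get("path")
    if method == "POST" and path == "/packages":  # hypothetical route
        status, payload = create_package(json.loads(event.get("body") or "{}"))
    elif method == "GET" and path == "/packages":
        params = event.get("queryStringParameters") or {}
        status, payload = search_packages(params.get("term", ""))
    else:
        status, payload = 404, {"message": "route not found"}
    # The proxy integration expects a status code and a string body.
    return {"statusCode": status, "body": json.dumps(payload)}
```

In a real deployment, each operation would call the relevant AWS service instead of the dictionary; the routing and response shape are the part this sketch is meant to illustrate.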

The solution creates a console and deploys it into an Amazon S3 bucket configured for static website hosting. During initial configuration, the solution also creates a default Administrator role and sends an access invite to a customer-specified email address. The solution uses an Amazon Cognito user pool to manage user access to the console and the data lake API. See Appendix A for detailed information on each of the solution's components.
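As an illustration of how a microservice might gate administrative functions on the caller's role, here is a minimal sketch. It assumes the API is fronted by an API Gateway Cognito user pool authorizer, which places validated token claims under `requestContext.authorizer.claims`; the source does not state that this is how the solution authorizes requests. The claim name `custom:role` and the helper names are hypothetical.

```python
# Hypothetical claim name and role value; the solution's real token layout may differ.
ROLE_CLAIM = "custom:role"
ADMIN_ROLE = "Administrator"


def caller_claims(event):
    """Return the claims attached to a proxy event by a Cognito authorizer, if any."""
    authorizer = (event.get("requestContext") or {}).get("authorizer") or {}
    return authorizer.get("claims") or {}


def is_administrator(event):
    """True only when the caller's role claim matches the Administrator role."""
    return caller_claims(event).get(ROLE_CLAIM) == ADMIN_ROLE
```

A handler for an administrative operation could call `is_administrator(event)` first and return a 403 response when it is false. Missing or malformed authorizer data is treated as non-administrator, so the check fails closed.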