Creating a data lake from an AWS CloudTrail source
This tutorial guides you through the actions to take on the Lake Formation console to create and load your first data lake from an AWS CloudTrail source.
High-level steps for creating a data lake
1. Register an Amazon Simple Storage Service (Amazon S3) path as a data lake.
2. Grant Lake Formation permissions to write to the Data Catalog and to Amazon S3 locations in the data lake.
3. Create a database to organize the metadata tables in the Data Catalog.
4. Use a blueprint to create a workflow. Run the workflow to ingest data from a data source.
5. Set up your Lake Formation permissions to allow others to manage data in the Data Catalog and the data lake.
6. Set up Amazon Athena to query the data that you imported into your Amazon S3 data lake.
7. For some data store types, set up Amazon Redshift Spectrum to query the data that you imported into your Amazon S3 data lake.
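The registration, database, and permission steps above can also be performed with the AWS CLI instead of the console. A minimal sketch follows; the bucket name `your-datalake-bucket`, account ID `111122223333`, database name `lakeformation_cloudtrail`, and user `datalake_user` are all placeholders, and the commands assume your CLI credentials have Lake Formation administrator rights.

```shell
# Register an Amazon S3 path as a data lake location (Step 4),
# using the Lake Formation service-linked role for access.
aws lakeformation register-resource \
    --resource-arn arn:aws:s3:::your-datalake-bucket \
    --use-service-linked-role

# Create a database in the Data Catalog to hold the metadata tables (Step 6).
aws glue create-database \
    --database-input '{"Name": "lakeformation_cloudtrail"}'

# Grant data location permissions on the registered path (Step 5).
aws lakeformation grant-permissions \
    --principal DataLakePrincipalIdentifier=arn:aws:iam::111122223333:user/datalake_user \
    --permissions DATA_LOCATION_ACCESS \
    --resource '{"DataLocation": {"ResourceArn": "arn:aws:s3:::your-datalake-bucket"}}'

# Grant SELECT on all tables in the database (Step 10),
# so the data analyst can query them with Athena.
aws lakeformation grant-permissions \
    --principal DataLakePrincipalIdentifier=arn:aws:iam::111122223333:user/datalake_user \
    --permissions SELECT \
    --resource '{"Table": {"DatabaseName": "lakeformation_cloudtrail", "TableWildcard": {}}}'
```

These commands mirror the console actions one-to-one; the workflow creation from a blueprint (Steps 8 and 9) has no single CLI equivalent and is done on the console as described in this tutorial.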
Topics
- Intended audience
- Prerequisites
- Step 1: Create a data analyst user
- Step 2: Add permissions to read AWS CloudTrail logs to the workflow role
- Step 3: Create an Amazon S3 bucket for the data lake
- Step 4: Register an Amazon S3 path
- Step 5: Grant data location permissions
- Step 6: Create a database in the Data Catalog
- Step 7: Grant data permissions
- Step 8: Use a blueprint to create a workflow
- Step 9: Run the workflow
- Step 10: Grant SELECT on the tables
- Step 11: Query the data lake using Amazon Athena