Security and Access Control to Metadata and Data in Lake Formation - AWS Lake Formation

Security and Access Control to Metadata and Data in Lake Formation

AWS Lake Formation provides a permissions model that is based on a simple grant/revoke mechanism. Lake Formation permissions combine with AWS Identity and Access Management (IAM) permissions to control access to data stored in data lakes and to the metadata that describes that data.

Before you learn about the details of the Lake Formation permissions model, it is helpful to review the following background information:

  • Data lakes managed by Lake Formation reside in designated locations in Amazon Simple Storage Service (Amazon S3).

  • Lake Formation maintains a Data Catalog that contains metadata about source data to be imported into your data lakes, such as data in logs and relational databases, and about data in your data lakes in Amazon S3. The metadata is organized as databases and tables. Metadata tables contain schema, location, partitioning, and other information about the data that they represent. Metadata databases are collections of tables.

  • The Lake Formation Data Catalog is the same Data Catalog used by AWS Glue. You can use AWS Glue crawlers to create Data Catalog tables, and you can use AWS Glue extract, transform, and load (ETL) jobs to populate the underlying data in your data lakes.

  • The databases and tables in the Data Catalog are referred to as Data Catalog resources. Tables in the Data Catalog are referred to as metadata tables to distinguish them from tables in data sources or tabular data in Amazon S3. The data that the metadata tables point to in Amazon S3 or in data sources is referred to as underlying data.

  • AWS Glue crawlers create metadata tables, but you can also manually create metadata tables with the Lake Formation console, the API, or the AWS Command Line Interface (AWS CLI). When creating a metadata table, you must specify a location. When you create a database, the location is optional. Table locations can be Amazon S3 locations or data source locations such as an Amazon Relational Database Service (Amazon RDS) database. Database locations are always Amazon S3 locations.

  • Services that integrate with Lake Formation, such as Amazon Athena and Amazon Redshift, can access the Data Catalog to obtain metadata and to check authorization for running queries. For a complete list of integrated services, see AWS Service Integrations with Lake Formation.