Managing a data lake using Lake Formation tag-based access control - AWS Lake Formation

Thousands of customers are building petabyte-scale data lakes on AWS. Many of these customers use AWS Lake Formation to easily build and share their data lakes across the organization. As the number of tables and users increases, data stewards and administrators are looking for ways to manage data lake permissions easily at scale. Lake Formation tag-based access control (LF-TBAC) solves this problem by allowing data stewards to create LF-tags (based on their data classification and ontology) that can then be attached to resources.

LF-TBAC is an authorization strategy that defines permissions based on attributes. In Lake Formation, these attributes are called LF-tags. You can attach LF-tags to Data Catalog resources and Lake Formation principals. Data lake administrators can assign and revoke permissions on Lake Formation resources using LF-tags. For more information, see Lake Formation tag-based access control.

This tutorial demonstrates how to create a Lake Formation tag-based access control policy using an AWS public dataset. In addition, it shows how to query tables, databases, and columns that have Lake Formation tag-based access policies associated with them.

You can use LF-TBAC for the following use cases:

  • You have a large number of tables and principals that the data lake administrator has to grant access to

  • You want to classify your data based on an ontology and grant permissions based on classification

  • The data lake administrator wants to assign permissions dynamically, in a loosely coupled way
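To make the "dynamic, loosely coupled" point concrete, the following is a minimal sketch of how a tag-based grant can be evaluated. It is a simplified model for illustration, not Lake Formation's implementation: the table names (`orders`, `clicks`, `payments`) and the helper names `matches` and `visible_tables` are hypothetical. It assumes an expression matches a resource when every tag key in the expression is present on the resource with one of the listed values.

```python
from typing import Dict, List

Tags = Dict[str, str]              # LF-tags on a resource: key -> single value
Expression = Dict[str, List[str]]  # grant expression: key -> allowed values


def matches(resource_tags: Tags, expression: Expression) -> bool:
    """Every key in the expression must be present with one of the allowed values."""
    return all(resource_tags.get(key) in allowed for key, allowed in expression.items())


def visible_tables(catalog: Dict[str, Tags], expression: Expression) -> List[str]:
    """Names of the tables whose tags satisfy the grant expression."""
    return sorted(name for name, tags in catalog.items() if matches(tags, expression))


# Hypothetical catalog: table name -> LF-tags attached to it.
catalog = {
    "orders": {"Confidential": "True"},
    "clicks": {"Confidential": "False"},
}

# The grant names a tag expression, not individual tables.
analyst_grant = {"Confidential": ["True"]}
print(visible_tables(catalog, analyst_grant))  # ['orders']

# A table tagged later is covered by the same grant without any new grant.
catalog["payments"] = {"Confidential": "True"}
print(visible_tables(catalog, analyst_grant))  # ['orders', 'payments']
```

Because the grant refers to the expression rather than to table names, adding or retagging a table changes who can see it without touching the grant itself.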

Following are the high-level steps for configuring permissions using LF-TBAC:

  1. The data steward defines the tag ontology with two LF-tags: Confidential and Sensitive. Data with Confidential=True has tighter access controls. Data with Sensitive=True requires specific analysis from the analyst.

  2. The data steward assigns different permission levels to the data engineer to build tables with different LF-tags.

  3. The data engineer builds two databases: tag_database and col_tag_database. All tables in tag_database are configured with Confidential=True. All tables in col_tag_database are configured with Confidential=False. Some columns of the table in col_tag_database are tagged with Sensitive=True for specific analysis needs.

  4. The data engineer grants read permission to the analyst on tables that match either of two LF-tag expressions: Confidential=True, or Confidential=False combined with Sensitive=True.

  5. With this configuration, the data analyst can focus on performing analysis with the right data.
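The steps above can be sketched as request payloads for the boto3 Lake Formation client operations `create_lf_tag`, `add_lf_tags_to_resource`, and `grant_permissions`. This is a rough sketch under several assumptions, not the tutorial's own procedure: the principal ARN is hypothetical, the `["True", "False"]` value lists are assumed, tagging each database so that its tables inherit the tag is one possible way to realize step 3, and `SELECT` stands in for "read permission". Step 2 and the column-level Sensitive tagging are omitted because this overview does not name the table or columns involved. The sketch also runs everything through one client, whereas the steps are carried out by different personas.

```python
from typing import Any, Dict, List

# Hypothetical analyst principal; replace with a real IAM user or role ARN.
ANALYST_ARN = "arn:aws:iam::111122223333:role/lf-data-analyst"


def tag_ontology() -> List[Dict[str, Any]]:
    """Step 1: keyword arguments for create_lf_tag, one entry per LF-tag."""
    return [
        {"TagKey": "Confidential", "TagValues": ["True", "False"]},  # values assumed
        {"TagKey": "Sensitive", "TagValues": ["True", "False"]},     # values assumed
    ]


def database_tagging() -> List[Dict[str, Any]]:
    """Step 3: keyword arguments for add_lf_tags_to_resource on each database."""
    return [
        {"Resource": {"Database": {"Name": "tag_database"}},
         "LFTags": [{"TagKey": "Confidential", "TagValues": ["True"]}]},
        {"Resource": {"Database": {"Name": "col_tag_database"}},
         "LFTags": [{"TagKey": "Confidential", "TagValues": ["False"]}]},
    ]


def analyst_grants(principal_arn: str) -> List[Dict[str, Any]]:
    """Step 4: keyword arguments for grant_permissions, one per tag expression."""
    expressions = [
        [{"TagKey": "Confidential", "TagValues": ["True"]}],
        [{"TagKey": "Confidential", "TagValues": ["False"]},
         {"TagKey": "Sensitive", "TagValues": ["True"]}],
    ]
    return [
        {"Principal": {"DataLakePrincipalIdentifier": principal_arn},
         "Resource": {"LFTagPolicy": {"ResourceType": "TABLE", "Expression": expr}},
         "Permissions": ["SELECT"]}
        for expr in expressions
    ]


def apply(client: Any, principal_arn: str = ANALYST_ARN) -> None:
    """Send the requests in order; the databases must already exist."""
    for kwargs in tag_ontology():
        client.create_lf_tag(**kwargs)
    for kwargs in database_tagging():
        client.add_lf_tags_to_resource(**kwargs)
    for kwargs in analyst_grants(principal_arn):
        client.grant_permissions(**kwargs)
```

With boto3 installed and suitable credentials configured, calling `apply(boto3.client("lakeformation"))` would send these requests. Keeping the payload builders separate from `apply` makes it possible to inspect or test the requests without contacting AWS.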