Managing a data lake using Lake Formation tag-based access control
Thousands of customers are building petabyte-scale data lakes on AWS. Many of these customers use AWS Lake Formation to easily build and share their data lakes across the organization. As the number of tables and users increases, data stewards and administrators are looking for ways to manage permissions on data lakes easily at scale. Lake Formation tag-based access control (LF-TBAC) solves this problem by allowing data stewards to create LF-tags (based on their data classification and ontology) that can then be attached to resources.
LF-TBAC is an authorization strategy that defines permissions based on attributes. In Lake Formation, these attributes are called LF-tags. You can attach LF-tags to Data Catalog resources and Lake Formation principals. Data lake administrators can assign and revoke permissions on Lake Formation resources using LF-tags. For more information, see Lake Formation tag-based access control.
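As a rough sketch of what attaching an LF-tag and granting by LF-tag can look like programmatically, the following Python builds the request parameters for the boto3 Lake Formation client operations create_lf_tag, add_lf_tags_to_resource, and grant_permissions. The tag key Confidential and the database name tag_database are borrowed from the scenario later in this tutorial. The helper function names and the analyst ARN passed in by the caller are assumptions made for illustration, and the client is passed in as a parameter so that the sketch makes no AWS calls on its own.

```python
def lf_tag_pair(key, values):
    """One LF-tag key with its values, in the shape the Lake Formation API uses."""
    return {"TagKey": key, "TagValues": list(values)}


def tag_database_request(database, key, value):
    """Parameters for add_lf_tags_to_resource: attach key=value to a database."""
    return {
        "Resource": {"Database": {"Name": database}},
        "LFTags": [lf_tag_pair(key, [value])],
    }


def grant_by_tag_request(principal_arn, key, values, permissions):
    """Parameters for grant_permissions on every table matching an LF-tag expression."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "LFTagPolicy": {
                "ResourceType": "TABLE",
                "Expression": [lf_tag_pair(key, values)],
            }
        },
        "Permissions": list(permissions),
    }


def apply_tag_setup(client, analyst_arn):
    """Define a tag, attach it to a database, then grant access by tag.

    `client` is expected to be a boto3 Lake Formation client;
    `analyst_arn` is a hypothetical principal supplied by the caller.
    """
    client.create_lf_tag(TagKey="Confidential", TagValues=["True", "False"])
    client.add_lf_tags_to_resource(
        **tag_database_request("tag_database", "Confidential", "True")
    )
    client.grant_permissions(
        **grant_by_tag_request(
            analyst_arn, "Confidential", ["True"], ["SELECT", "DESCRIBE"]
        )
    )
```

With credentials configured, the function could be called as apply_tag_setup(boto3.client("lakeformation"), analyst_arn). The grant names no table: it applies to whichever tables carry Confidential=True, which is what keeps permissions loosely coupled to individual resources.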
This tutorial demonstrates how to create a Lake Formation tag-based access control policy using an AWS public dataset. In addition, it shows how to query tables, databases, and columns that have Lake Formation tag-based access policies associated with them.
You can use LF-TBAC for the following use cases:
- You have a large number of tables and principals to which the data lake administrator has to grant access
- You want to classify your data based on an ontology and grant permissions based on classification
- The data lake administrator wants to assign permissions dynamically, in a loosely coupled way
Following are the high-level steps for configuring permissions using LF-TBAC:
- The data steward defines the tag ontology with two LF-tags: Confidential and Sensitive. Data with Confidential=True has tighter access controls. Data with Sensitive=True requires specific analysis from the analyst.
- The data steward assigns different permission levels to the data engineer to build tables with different LF-tags.
- The data engineer builds two databases: tag_database and col_tag_database. All tables in tag_database are configured with Confidential=True. All tables in the col_tag_database are configured with Confidential=False. Some columns of the table in col_tag_database are tagged with Sensitive=True for specific analysis needs.
- The data engineer grants read permission to the analyst for tables with specific expression condition Confidential=True and Confidential=False, Sensitive=True.
- With this configuration, the data analyst can focus on performing analysis with the right data.
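To see what the analyst ends up with under these grants, the steps above can be modeled in a few lines of plain Python. This is an illustrative model, not Lake Formation's implementation. It assumes that a resource inherits LF-tags from its parent unless it overrides them, that a tag expression matches when every key in it has one of its listed values, and that a principal can read a resource when any one of its granted expressions matches. The column names col_a and col_b are hypothetical.

```python
def effective_tags(*levels):
    """Merge LF-tags from database to table to column; the most specific level wins."""
    merged = {}
    for level in levels:
        merged.update(level)
    return merged


def matches(tags, expression):
    """True when every key in the expression has one of its accepted values."""
    return all(tags.get(key) in accepted for key, accepted in expression.items())


def can_read(tags, grants):
    """A principal can read a resource when any one granted expression matches."""
    return any(matches(tags, expression) for expression in grants)


def readable_columns(database_tags, columns, grants):
    """Names of the columns whose effective tags satisfy at least one grant."""
    return sorted(
        name
        for name, column_tags in columns.items()
        if can_read(effective_tags(database_tags, column_tags), grants)
    )


# Database-level tags from the steps above.
TAG_DATABASE = {"Confidential": "True"}
COL_TAG_DATABASE = {"Confidential": "False"}

# The two expressions granted to the analyst.
ANALYST_GRANTS = [
    {"Confidential": {"True"}},
    {"Confidential": {"False"}, "Sensitive": {"True"}},
]

# Hypothetical columns; only col_a carries Sensitive=True.
COLUMNS = {
    "col_a": {"Sensitive": "True"},
    "col_b": {},
}
```

Under these assumptions, readable_columns(TAG_DATABASE, COLUMNS, ANALYST_GRANTS) returns both columns, while readable_columns(COL_TAG_DATABASE, COLUMNS, ANALYST_GRANTS) returns only col_a, because col_b inherits Confidential=False and has no Sensitive=True tag.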
Topics
- Intended audience
- Prerequisites
- Step 1: Provision your resources
- Step 2: Register your data location, create an LF-tag ontology, and grant permissions
- Step 3: Create Lake Formation databases
- Step 4: Grant table permissions
- Step 5: Run a query in Amazon Athena to verify the permissions
- Step 6: Clean up AWS resources