Overview of Lake Formation Tag-Based Access Control - AWS Lake Formation

Lake Formation tag-based access control (LF-TBAC) works with IAM's attribute-based access control (ABAC) to provide fine-grained access to your data lake resources and data.

Note

IAM tags are not the same as LF-tags. These tags are not interchangeable. LF-tags are used to grant Lake Formation permissions and IAM tags are used to define IAM policies.

What is Lake Formation Tag-based Access Control?

Lake Formation tag-based access control (LF-TBAC) is an authorization strategy that defines permissions based on attributes. In Lake Formation, these attributes are called LF-tags. You can attach LF-tags to Data Catalog resources, Lake Formation principals, and table columns. You can assign and revoke permissions on Lake Formation resources using these LF-tags. Lake Formation allows operations on those resources when the principal's tag matches the resource tag. LF-TBAC is helpful in rapidly growing environments and in situations where policy management becomes cumbersome.

Comparison of Lake Formation tag-based access control to IAM attribute-based access control

Attribute-based access control (ABAC) is an authorization strategy that defines permissions based on attributes. In AWS, these attributes are called tags. You can attach tags to IAM resources, including IAM entities (users or roles), and to AWS resources. You can create a single ABAC policy or a small set of policies for your IAM principals. These ABAC policies can be designed to allow operations when the principal's tag matches the resource tag. ABAC is helpful in rapidly growing environments and in situations where policy management becomes cumbersome.

Cloud security and governance teams use IAM to define access policies and security permissions for all resources including Amazon S3 buckets, Amazon EC2 instances and any resources you can reference with an ARN. The IAM policies define broad (coarse-grained) permissions to your data lake resources, for example, to allow or deny access at Amazon S3 bucket or prefix level or database level. For more information about IAM ABAC, see What is ABAC for AWS? in the IAM User Guide.

For example, you can create three roles with the access-project tag key. Set the tag value of the first role to Heart, the second to Sun, and the third to Lightning. You can then use a single policy that allows access when the role and the resource are tagged with the same value for access-project.
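As a rough illustration of the single policy described above, the following Python sketch builds an IAM policy document whose condition compares the resource tag with the principal tag for the access-project key. The condition keys aws:ResourceTag and aws:PrincipalTag are standard IAM global condition keys; the EC2 actions and the wildcard resource are placeholders chosen for this sketch, not part of the original example.

```python
import json

# Tag key taken from the example above.
TAG_KEY = "access-project"

# Hypothetical policy: the actions and resource are illustrative placeholders.
abac_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["ec2:StartInstances", "ec2:StopInstances"],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    # Allow only when the resource's tag value equals the
                    # calling principal's tag value for the same key.
                    f"aws:ResourceTag/{TAG_KEY}": f"${{aws:PrincipalTag/{TAG_KEY}}}"
                }
            },
        }
    ],
}

print(json.dumps(abac_policy, indent=2))
```

Because the comparison uses a policy variable rather than a literal value, the same document could serve the Heart, Sun, and Lightning roles without modification.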

Data governance teams use Lake Formation to define fine-grained permissions to specific data lake resources. LF-tags are assigned to Data Catalog resources (databases, tables, and columns) and are granted to principals. A principal with LF-tags that match the LF-tags of a resource can access that resource. Lake Formation permissions are secondary to IAM permissions. For example, if IAM permissions don't allow a user access to a data lake, Lake Formation doesn't grant access to any resource within that data lake to that user, even if the principal and resource have matching LF-tags.

Lake Formation tag-based access control (LF-TBAC) works with IAM ABAC to provide additional levels of permissions for your Lake Formation data and resources.

  • Lake Formation TBAC permissions scale with innovation. It's no longer necessary for an administrator to update existing policies to allow access to new resources. For example, assume that you use an IAM ABAC strategy with the access-project tag to provide access to specific databases within Lake Formation. Using LF-TBAC, the LF-tag RodeoDrive is assigned to specific tables or columns, and the same LF-tag is granted to a developer. Through IAM, the developer can access the database, and LF-TBAC permissions further restrict the developer's access to specific tables or columns within tables. If a new table is added to the project, the Lake Formation administrator only needs to assign the tag to the new table for the developer to be given access to the table.

  • Lake Formation TBAC requires fewer IAM policies. Because you use IAM policies to grant high level access to Lake Formation resources and Lake Formation TBAC for managing more precise data access, you create fewer IAM policies. Those policies are also easier to manage.

  • Using Lake Formation TBAC, teams can change and grow quickly. This is because permissions for new resources are automatically granted based on attributes. For example, if a new developer joins the project, it's easy to grant this developer access by associating the IAM role to the user and then assigning the required LF-tags to the user. It's not necessary to change the policy to support a new project or to create new LF-tags.

  • Finer-grained permissions are possible using Lake Formation TBAC. IAM policies grant access to the top-level resources, such as Data Catalog databases or tables. Using Lake Formation TBAC, you can grant access to specific tables or only columns that contain specific data values.

How Lake Formation Tag-based Access Control Works

Each LF-tag is a key-value pair, such as department=sales or classification=restricted. A key can have multiple defined values, such as department=sales,marketing,engineering,finance.
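The key with multiple values can be pictured as a small request structure. The sketch below is a minimal illustration that only builds a dictionary shaped like the parameters of the Lake Formation CreateLFTag operation (TagKey and TagValues); the helper name build_create_lf_tag_request is hypothetical, and nothing is sent to AWS.

```python
def build_create_lf_tag_request(key, values):
    """Return keyword arguments shaped like a CreateLFTag request.

    Hypothetical helper for illustration; it does not call AWS.
    """
    if not values:
        raise ValueError("an LF-tag key needs at least one value")
    # Drop duplicate values while preserving their order.
    return {"TagKey": key, "TagValues": list(dict.fromkeys(values))}


request = build_create_lf_tag_request(
    "department", ["sales", "marketing", "engineering", "finance"]
)
print(request)

# Assumption: with boto3 installed and credentials configured, the same
# dictionary could be passed on as
#   boto3.client("lakeformation").create_lf_tag(**request)
```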

To use the LF-TBAC method, data lake administrators and data engineers perform the following tasks.

Task | Task Details
1. Define the properties and relationships of LF-tags. | -
2. Create the LF-tags in Lake Formation. | Creating LF-Tags
3. Assign LF-tags to Data Catalog resources. | Assigning LF-Tags to Data Catalog Resources
4. Grant permissions to other principals to assign LF-tags to resources, optionally with the grant option. | Granting, Revoking, and Listing LF-Tag Permissions
5. Grant LF-tag expressions to principals, optionally with the grant option. | Granting Data Catalog Permissions Using the LF-TBAC Method
6. (Recommended) After verifying that principals have access to the correct resources through the LF-TBAC method, revoke permissions that were granted by using the named resource method. | -

Consider the case where a data lake administrator must grant permissions to three principals on three databases and seven tables.


Three figures of users are at the left, arranged vertically. At the right are three databases labeled A, B, and C, arranged vertically. Database A has two tables labeled A.1 and A.2, database B has tables labeled B.1 and B.2, and database C has three tables labeled C.1, C.2, and C.3. Seventeen arrows connect the users to the databases and tables, indicating grants on the databases and tables to the users.

To achieve the permissions indicated in the preceding diagram by using the named resource method, the data lake administrator would have to make 17 grants, as follows (in pseudo-code).

GRANT CREATE_TABLE ON Database A TO PRINCIPAL 1
GRANT SELECT, INSERT ON Table A.1 TO PRINCIPAL 1
GRANT SELECT, INSERT ON Table A.2 TO PRINCIPAL 1
GRANT SELECT, INSERT ON Table B.2 TO PRINCIPAL 1
...
GRANT SELECT, INSERT ON Table A.2 TO PRINCIPAL 2
GRANT CREATE_TABLE ON Database B TO PRINCIPAL 2
...
GRANT SELECT, INSERT ON Table C.3 TO PRINCIPAL 3

Now consider how the data lake administrator would grant permissions by using LF-TBAC. The following diagram indicates that the data lake administrator has assigned LF-tags to databases and tables, and has granted permissions on LF-tags to principals.

In this example, the LF-tags represent areas of the data lake that contain analytics for different modules of an enterprise resource planning (ERP) application suite. The data lake administrator wants to control access to the analytics data for the various modules. All LF-tags have the key module and possible values Sales, Orders, and Customers. An example LF-tag looks like this:

module=Sales

The diagram shows only the LF-tag values.


Like the previous diagram, three figures of users are at the left, arranged vertically, and at the right are three databases labeled A, B, and C, arranged vertically. Database A has two tables labeled A.1 and A.2, database B has tables labeled B.1 and B.2, and database C has three tables labeled C.1, C.2, and C.3. There are no arrows between the users and the databases and tables. Instead, labeled "flags" next to the users indicate that user 1 has been granted the LF-tags Sales and Customers, user 2 has been granted the LF-tag Orders, and user 3 has been granted the LF-tag Customers. Flags next to the databases and tables indicate the following assignments of LF-tags to databases and tables: Database A: Sales. Table A.1: A dimmed flag indicates that Sales was inherited from database A. Table A.2: Orders, but a dimmed flag indicates that Sales was inherited from database A. Database B: Orders. Tables B.1 and B.2 inherit Orders, and table B.2 has Customers. Database C has Customers, and tables C.1, C.2, and C.3 inherit Customers. The C tables don't have any other assignments.

Tag Assignments to Data Catalog Resources and Inheritance

Tables inherit LF-tags from databases and columns inherit LF-tags from tables. Inherited values can be overridden. In the preceding diagram, dimmed LF-tags are inherited.

Because of inheritance, the data lake administrator needs to make only the following five LF-tag assignments to resources (in pseudo-code).

ASSIGN TAGS module=Sales TO database A
ASSIGN TAGS module=Orders TO table A.2
ASSIGN TAGS module=Orders TO database B
ASSIGN TAGS module=Customers TO table B.2
ASSIGN TAGS module=Customers TO database C
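The five assignments and the inheritance they rely on can be modeled in a few lines of Python. This is a toy model under a stated assumption: following the diagram, a table is treated as carrying both the values it inherits from its database and the values assigned to it directly. It is not a statement about how Lake Formation resolves overridden values internally, and the names database_tags, table_tags, and effective_values are made up for this sketch.

```python
# Explicit assignments from the pseudo-code above (all use the key "module").
database_tags = {"A": {"Sales"}, "B": {"Orders"}, "C": {"Customers"}}
table_tags = {"A.2": {"Orders"}, "B.2": {"Customers"}}
tables = ["A.1", "A.2", "B.1", "B.2", "C.1", "C.2", "C.3"]


def effective_values(table):
    """Return the module values a table carries in the diagram.

    Assumption for this sketch: inherited and explicit values are combined,
    which is how the diagram depicts tables A.2 and B.2.
    """
    database = table.split(".")[0]
    inherited = database_tags.get(database, set())
    explicit = table_tags.get(table, set())
    return inherited | explicit


for table in tables:
    print(table, sorted(effective_values(table)))
```

Five explicit assignments are enough to give all ten resources (three databases and seven tables) at least one module value.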

Tag Grants to Principals

After assigning LF-tags to the databases and tables, the data lake administrator must make only four grants of LF-tags to principals, as follows (in pseudo-code).

GRANT TAGS module=Sales TO Principal 1
GRANT TAGS module=Customers TO Principal 1
GRANT TAGS module=Orders TO Principal 2
GRANT TAGS module=Customers TO Principal 3

Now, a principal with the module=Sales LF-tag can access Data Catalog resources with the module=Sales LF-tag (for example, database A), a principal with the module=Customers LF-tag can access resources with the module=Customers LF-tag, and so on.

The preceding grant commands are incomplete. This is because although they indicate through LF-tags the Data Catalog resources that the principals have permissions on, they don't indicate exactly which Lake Formation permissions (such as SELECT, ALTER) the principals have on those resources. Therefore, the following pseudo-code commands are a more accurate representation of how Lake Formation permissions are granted on Data Catalog resources through LF-tags.

GRANT (CREATE_TABLE ON DATABASES) ON TAGS module=Sales TO Principal 1
GRANT (SELECT, INSERT ON TABLES) ON TAGS module=Sales TO Principal 1
GRANT (CREATE_TABLE ON DATABASES) ON TAGS module=Customers TO Principal 1
GRANT (SELECT, INSERT ON TABLES) ON TAGS module=Customers TO Principal 1
GRANT (CREATE_TABLE ON DATABASES) ON TAGS module=Orders TO Principal 2
GRANT (SELECT, INSERT ON TABLES) ON TAGS module=Orders TO Principal 2
GRANT (CREATE_TABLE ON DATABASES) ON TAGS module=Customers TO Principal 3
GRANT (SELECT, INSERT ON TABLES) ON TAGS module=Customers TO Principal 3
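To make the shape of such a grant more concrete, the sketch below builds request dictionaries modeled on the Lake Formation GrantPermissions operation, where an LF-tag expression is supplied through an LFTagPolicy resource. The helper name build_grant_request and the role ARN are hypothetical, and the code only constructs the dictionaries; it makes no AWS calls.

```python
def build_grant_request(principal_arn, resource_type, permissions, tag_key, tag_values):
    """Return keyword arguments shaped like a GrantPermissions request.

    Hypothetical helper for illustration; resource_type is "DATABASE" or "TABLE".
    """
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "LFTagPolicy": {
                "ResourceType": resource_type,
                "Expression": [{"TagKey": tag_key, "TagValues": list(tag_values)}],
            }
        },
        "Permissions": list(permissions),
    }


# Placeholder ARN standing in for Principal 1.
PRINCIPAL_1 = "arn:aws:iam::111122223333:role/principal-1"

# The first two pseudo-code grants above, expressed as request dictionaries.
requests = [
    build_grant_request(PRINCIPAL_1, "DATABASE", ["CREATE_TABLE"], "module", ["Sales"]),
    build_grant_request(PRINCIPAL_1, "TABLE", ["SELECT", "INSERT"], "module", ["Sales"]),
]

for request in requests:
    print(request["Resource"]["LFTagPolicy"]["ResourceType"], request["Permissions"])

# Assumption: with boto3 installed and credentials configured, each dictionary
# could be passed to boto3.client("lakeformation").grant_permissions(**request).
```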

Putting It Together - Resulting Principal Permissions on Resources

Given the LF-tags assigned to the databases and tables in the preceding diagram, and the LF-tags granted to the principals in the diagram, the following table lists the Lake Formation permissions that the principals have on the databases and tables.

Principal | Permissions Granted Through LF-tags
Principal 1
  • CREATE_TABLE on database A

  • SELECT, INSERT on table A.1

  • SELECT, INSERT on table A.2

  • SELECT, INSERT on table B.2

  • CREATE_TABLE on database C

  • SELECT, INSERT on table C.1

  • SELECT, INSERT on table C.2

  • SELECT, INSERT on table C.3

Principal 2
  • SELECT, INSERT on table A.2

  • CREATE_TABLE on database B

  • SELECT, INSERT on table B.1

  • SELECT, INSERT on table B.2

Principal 3
  • SELECT, INSERT on table B.2

  • CREATE_TABLE on database C

  • SELECT, INSERT on table C.1

  • SELECT, INSERT on table C.2

  • SELECT, INSERT on table C.3
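The table above can be reproduced with a short simulation. The following Python sketch is a simplified model, not Lake Formation's actual evaluation logic: it assumes, as the diagram depicts, that a table carries both its inherited and its directly assigned module values, and that a principal receives CREATE_TABLE on matching databases and SELECT, INSERT on matching tables. All names in it are made up for illustration.

```python
# LF-tag values (key "module") assigned to resources in the example.
database_tags = {"A": {"Sales"}, "B": {"Orders"}, "C": {"Customers"}}
table_tags = {"A.2": {"Orders"}, "B.2": {"Customers"}}
tables = ["A.1", "A.2", "B.1", "B.2", "C.1", "C.2", "C.3"]

# LF-tag values granted to principals in the example.
principal_tags = {
    "Principal 1": {"Sales", "Customers"},
    "Principal 2": {"Orders"},
    "Principal 3": {"Customers"},
}


def table_values(table):
    """Inherited plus explicit values, as drawn in the diagram (assumption)."""
    database = table.split(".")[0]
    return database_tags[database] | table_tags.get(table, set())


def resolve(principal):
    """List the permissions a principal ends up with in this toy model."""
    granted = principal_tags[principal]
    permissions = []
    for database in sorted(database_tags):
        if database_tags[database] & granted:
            permissions.append(f"CREATE_TABLE on database {database}")
    for table in tables:
        if table_values(table) & granted:
            permissions.append(f"SELECT, INSERT on table {table}")
    return permissions


result = {principal: resolve(principal) for principal in principal_tags}
total = sum(len(permissions) for permissions in result.values())

for principal, permissions in result.items():
    print(principal, len(permissions))
print("total permissions:", total)
```

The model lists database permissions before table permissions, so the order within each principal differs from the table, but the set of permissions is the same.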

Bottom Line

In this simple example, using five assignment operations and eight grant operations, the data lake administrator was able to specify 17 permissions. When there are tens of databases and hundreds of tables, the advantage of the LF-TBAC method over the named resource method becomes clear. Consider the hypothetical case where every principal needs access to every resource, and where n(P) is the number of principals and n(R) is the number of resources:

  • With the named resource method, the number of grants required is n(P) × n(R).

  • With the LF-TBAC method, using a single LF-tag, the total of the number of grants to principals and assignments to resources is n(P) + n(R).
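The difference between the two counts grows quickly with scale. The short Python sketch below evaluates both formulas for a made-up environment of 10 principals and 500 resources; the function names and the sample numbers are illustrative only.

```python
def named_resource_grants(n_principals, n_resources):
    """Grants needed when every principal is granted every resource by name."""
    return n_principals * n_resources


def lf_tbac_operations(n_principals, n_resources):
    """Grants to principals plus assignments to resources, using one LF-tag."""
    return n_principals + n_resources


# Hypothetical environment sizes.
principals, resources = 10, 500
print("named resource method:", named_resource_grants(principals, resources))
print("LF-TBAC method:", lf_tbac_operations(principals, resources))
```

For these sample sizes the named resource method needs 5000 grants, while the LF-TBAC method needs 510 operations in total.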