Lake Formation Tag-based access control - AWS Lake Formation

Lake Formation Tag-based access control

Lake Formation tag-based access control (LF-TBAC) is the recommended method for granting Lake Formation permissions when there is a large number of Data Catalog resources. LF-TBAC scales better than the named resource method and requires less permission management overhead.

How Lake Formation tag-based access control works

Each LF-tag is a key-value pair, such as department=sales or classification=restricted. A key can have multiple defined values. For example, the key department might have the defined values sales, marketing, engineering, and finance.
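The key-value structure can be illustrated with a minimal Python sketch. It models LF-tag definitions as a mapping from each key to its set of defined values. It then checks that a proposed assignment, such as department=sales, uses a defined key and one of that key's defined values. `LF_TAG_DEFINITIONS`, `parse_tag`, and `is_valid_assignment` are hypothetical names. This is an in-memory illustration of the concept, not a Lake Formation API.

```python
# Hypothetical in-memory model of LF-tag definitions (not a Lake Formation API):
# each LF-tag key maps to the set of values defined for it.
LF_TAG_DEFINITIONS = {
    "department": {"sales", "marketing", "engineering", "finance"},
    "classification": {"restricted"},
}


def parse_tag(text):
    """Split a 'key=value' string such as 'department=sales' into (key, value)."""
    key, sep, value = text.partition("=")
    if not sep or not key or not value:
        raise ValueError(f"expected key=value, got {text!r}")
    return key, value


def is_valid_assignment(text):
    """Return True if the key is defined and the value is one of its defined values."""
    key, value = parse_tag(text)
    return value in LF_TAG_DEFINITIONS.get(key, set())


print(is_valid_assignment("department=sales"))           # True
print(is_valid_assignment("classification=restricted"))  # True
print(is_valid_assignment("department=legal"))           # False: value not defined for the key
```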

To use the LF-TBAC method, data lake administrators and data engineers perform the following tasks.

1. Define the properties and relationships of LF-tags.

2. Create the LF-tags in Lake Formation. For details, see Creating LF-Tags.

3. Assign LF-tags to Data Catalog resources. For details, see Assigning LF-Tags to Data Catalog resources.

4. Grant permissions to other principals to assign LF-tags to resources, optionally with the grant option. For details, see Granting, Revoking, and Listing LF-Tag Permissions.

5. Grant LF-tag expressions to principals, optionally with the grant option. For details, see Granting Data Catalog permissions using the LF-TBAC method.

6. (Recommended) After verifying that principals have access to the correct resources through the LF-TBAC method, revoke permissions that were granted by using the named resource method.
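Tasks 2 through 4 correspond to Lake Formation API operations. The following Python helpers are a minimal sketch that builds the request arguments for each task. They assume the boto3 Lake Formation client operations `create_lf_tag`, `add_lf_tags_to_resource`, and `grant_permissions`, with the `ASSOCIATE` permission used for task 4. The helper names are hypothetical. Optional parameters such as `CatalogId` are omitted, so confirm the exact request shapes against the API reference before relying on them.

```python
from typing import List, Optional

# Sketch: keyword arguments for the boto3 Lake Formation client calls behind
# tasks 2-4. Building plain dicts keeps the example runnable without AWS access.


def create_tag_request(key: str, values: List[str]) -> dict:
    """Task 2: arguments for create_lf_tag."""
    return {"TagKey": key, "TagValues": list(values)}


def assign_tag_request(database: str, key: str, values: List[str],
                       table: Optional[str] = None) -> dict:
    """Task 3: arguments for add_lf_tags_to_resource on a database, or on a table if given."""
    if table is None:
        resource = {"Database": {"Name": database}}
    else:
        resource = {"Table": {"DatabaseName": database, "Name": table}}
    return {"Resource": resource, "LFTags": [{"TagKey": key, "TagValues": list(values)}]}


def grant_associate_request(principal_arn: str, key: str, values: List[str],
                            with_grant_option: bool = False) -> dict:
    """Task 4: arguments for grant_permissions giving ASSOCIATE on an LF-tag's values."""
    request = {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"LFTag": {"TagKey": key, "TagValues": list(values)}},
        "Permissions": ["ASSOCIATE"],
    }
    if with_grant_option:
        # The grant option lets the principal pass ASSOCIATE on to others.
        request["PermissionsWithGrantOption"] = ["ASSOCIATE"]
    return request
```

With credentials that allow these operations, each dict could be passed as keyword arguments. For example: `boto3.client("lakeformation").create_lf_tag(**create_tag_request("module", ["Sales", "Orders", "Customers"]))`.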

Consider the case where a data lake administrator must grant permissions to three principals on three databases and seven tables.


[Figure: Three users (principals) are on the left, arranged vertically. On the right are databases A, B, and C, arranged vertically. Database A contains tables A.1 and A.2, database B contains tables B.1 and B.2, and database C contains tables C.1, C.2, and C.3. Seventeen arrows connect the users to the databases and tables, indicating grants on the databases and tables to the users.]

To achieve the permissions indicated in the preceding diagram by using the named resource method, the data lake administrator would have to make 17 grants, as follows (in pseudo-code).

GRANT CREATE_TABLE ON Database A TO PRINCIPAL 1
GRANT SELECT, INSERT ON Table A.1 TO PRINCIPAL 1
GRANT SELECT, INSERT ON Table A.2 TO PRINCIPAL 1
GRANT SELECT, INSERT ON Table B.2 TO PRINCIPAL 1
...
GRANT SELECT, INSERT ON Table A.2 TO PRINCIPAL 2
GRANT CREATE_TABLE ON Database B TO PRINCIPAL 2
...
GRANT SELECT, INSERT ON Table C.3 TO PRINCIPAL 3
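The count of 17 comes from needing one grant per principal-resource pair. The following Python sketch regenerates the pseudo-code grants from the resources each principal needs. Those resources are taken from the resulting permissions listed later in this topic. The sketch only produces pseudo-code strings and does not call Lake Formation; `ACCESS` and `named_resource_grants` are hypothetical names.

```python
# Resources each principal needs in the diagram (named resource method).
ACCESS = {
    "Principal 1": ["Database A", "Table A.1", "Table A.2", "Table B.2",
                    "Database C", "Table C.1", "Table C.2", "Table C.3"],
    "Principal 2": ["Table A.2", "Database B", "Table B.1", "Table B.2"],
    "Principal 3": ["Table B.2", "Database C", "Table C.1", "Table C.2", "Table C.3"],
}


def named_resource_grants(access):
    """Emit one pseudo-code GRANT per (principal, resource) pair."""
    grants = []
    for principal, resources in access.items():
        for resource in resources:
            # Databases get CREATE_TABLE; tables get SELECT, INSERT.
            perms = "CREATE_TABLE" if resource.startswith("Database") else "SELECT, INSERT"
            grants.append(f"GRANT {perms} ON {resource} TO {principal.upper()}")
    return grants


grants = named_resource_grants(ACCESS)
print(len(grants))  # 17
print(grants[0])    # GRANT CREATE_TABLE ON Database A TO PRINCIPAL 1
```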

Now consider how the data lake administrator would grant permissions by using LF-TBAC. The following diagram indicates that the data lake administrator has assigned LF-tags to databases and tables, and has granted permissions on LF-tags to principals.

In this example, the LF-tags represent areas of the data lake that contain analytics for different modules of an enterprise resource planning (ERP) application suite. The data lake administrator wants to control access to the analytics data for the various modules. All LF-tags use the key module, with the possible values Sales, Orders, and Customers. An example LF-tag looks like this:

module=Sales

The diagram shows only the LF-tag values.


[Figure: As in the previous diagram, three users (principals) are on the left, and databases A, B, and C with their tables are on the right. There are no arrows between the users and the resources. Instead, labeled flags next to the users show the LF-tags granted to them: user 1 has Sales and Customers, user 2 has Orders, and user 3 has Customers. Flags next to the resources show the LF-tag assignments. Database A has Sales, which tables A.1 and A.2 inherit (shown as dimmed flags), and table A.2 is also assigned Orders. Database B has Orders, which tables B.1 and B.2 inherit, and table B.2 is also assigned Customers. Database C has Customers, which tables C.1, C.2, and C.3 inherit; the C tables have no other assignments.]

Tag Assignments to Data Catalog Resources and Inheritance

Tables inherit LF-tags from databases and columns inherit LF-tags from tables. Inherited values can be overridden. In the preceding diagram, dimmed LF-tags are inherited.

Because of inheritance, the data lake administrator needs to make only the following five LF-tag assignments to resources (in pseudo-code).

ASSIGN TAGS module=Sales TO database A
ASSIGN TAGS module=Orders TO table A.2
ASSIGN TAGS module=Orders TO database B
ASSIGN TAGS module=Customers TO table B.2
ASSIGN TAGS module=Customers TO database C
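The effective tags on each table can be computed mechanically from these five assignments. This minimal Python sketch follows the diagram: a table's effective module values are the values inherited from its database plus any values assigned directly to the table. It does not model overriding an inherited value. The names are hypothetical.

```python
# Direct LF-tag assignments from the five ASSIGN TAGS statements (key "module").
DATABASE_TAGS = {"A": {"Sales"}, "B": {"Orders"}, "C": {"Customers"}}
TABLE_TAGS = {"A.2": {"Orders"}, "B.2": {"Customers"}}
TABLES = {"A": ["A.1", "A.2"], "B": ["B.1", "B.2"], "C": ["C.1", "C.2", "C.3"]}


def effective_table_tags(database, table):
    """Values inherited from the table's database plus values assigned to the table itself.

    This matches the example's diagram; overriding an inherited value is not modeled.
    """
    return DATABASE_TAGS.get(database, set()) | TABLE_TAGS.get(table, set())


assignments = sum(len(v) for v in DATABASE_TAGS.values()) + sum(len(v) for v in TABLE_TAGS.values())
print("assignments:", assignments)  # assignments: 5
for database, tables in TABLES.items():
    for table in tables:
        print(table, sorted(effective_table_tags(database, table)))
```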

Tag Grants to Principals

After assigning LF-tags to the databases and tables, the data lake administrator needs to make only four grants of LF-tags to principals, as follows (in pseudo-code).

GRANT TAGS module=Sales TO Principal 1
GRANT TAGS module=Customers TO Principal 1
GRANT TAGS module=Orders TO Principal 2
GRANT TAGS module=Customers TO Principal 3

Now, a principal with the module=Sales LF-tag can access Data Catalog resources with the module=Sales LF-tag (for example, database A), a principal with the module=Customers LF-tag can access resources with the module=Customers LF-tag, and so on.

The preceding grant commands are incomplete. They use LF-tags to indicate which Data Catalog resources the principals have permissions on, but they don't specify which Lake Formation permissions (such as SELECT or ALTER) the principals have on those resources. The following pseudo-code commands are therefore a more accurate representation of how Lake Formation permissions are granted on Data Catalog resources through LF-tags.

GRANT (CREATE_TABLE ON DATABASES) ON TAGS module=Sales TO Principal 1
GRANT (SELECT, INSERT ON TABLES) ON TAGS module=Sales TO Principal 1
GRANT (CREATE_TABLE ON DATABASES) ON TAGS module=Customers TO Principal 1
GRANT (SELECT, INSERT ON TABLES) ON TAGS module=Customers TO Principal 1
GRANT (CREATE_TABLE ON DATABASES) ON TAGS module=Orders TO Principal 2
GRANT (SELECT, INSERT ON TABLES) ON TAGS module=Orders TO Principal 2
GRANT (CREATE_TABLE ON DATABASES) ON TAGS module=Customers TO Principal 3
GRANT (SELECT, INSERT ON TABLES) ON TAGS module=Customers TO Principal 3
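Each pseudo-code grant pairs an LF-tag expression with a resource type and a set of permissions. The following Python helper is a minimal sketch of how one such grant might be expressed as arguments for the boto3 Lake Formation `grant_permissions` call. It assumes an `LFTagPolicy` resource with a `ResourceType` of `DATABASE` or `TABLE`. The helper name and the principal ARN are hypothetical, and the request shape should be checked against the API reference.

```python
from typing import Sequence

# Sketch: grant_permissions arguments for an LF-tag expression grant.
# The principal ARN below is a hypothetical placeholder.


def tag_expression_grant(principal_arn: str, resource_type: str, key: str,
                         values: Sequence[str], permissions: Sequence[str],
                         grantable: Sequence[str] = ()) -> dict:
    """Grant `permissions` on resources of `resource_type` whose LF-tags match key=values."""
    request = {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "LFTagPolicy": {
                "ResourceType": resource_type,  # "DATABASE" or "TABLE"
                "Expression": [{"TagKey": key, "TagValues": list(values)}],
            }
        },
        "Permissions": list(permissions),
    }
    if grantable:
        # Permissions the principal may in turn grant to others (grant option).
        request["PermissionsWithGrantOption"] = list(grantable)
    return request


PRINCIPAL_1 = "arn:aws:iam::111122223333:role/Principal1"  # hypothetical

# The first two pseudo-code grants for Principal 1 (module=Sales).
principal_1_sales = [
    tag_expression_grant(PRINCIPAL_1, "DATABASE", "module", ["Sales"], ["CREATE_TABLE"]),
    tag_expression_grant(PRINCIPAL_1, "TABLE", "module", ["Sales"], ["SELECT", "INSERT"]),
]
```

Each dict could then be passed to `boto3.client("lakeformation").grant_permissions(**request)`. A non-empty `grantable` argument corresponds to granting with the grant option.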

Putting It Together - Resulting Principal Permissions on Resources

Given the LF-tags assigned to the databases and tables and the LF-tags granted to the principals in the preceding diagram, the following table lists the Lake Formation permissions that each principal has on the databases and tables.

Principal Permissions Granted Through LF-tags
Principal 1
  • CREATE_TABLE on database A

  • SELECT, INSERT on table A.1

  • SELECT, INSERT on table A.2

  • SELECT, INSERT on table B.2

  • CREATE_TABLE on database C

  • SELECT, INSERT on table C.1

  • SELECT, INSERT on table C.2

  • SELECT, INSERT on table C.3

Principal 2
  • SELECT, INSERT on table A.2

  • CREATE_TABLE on database B

  • SELECT, INSERT on table B.1

  • SELECT, INSERT on table B.2

Principal 3
  • SELECT, INSERT on table B.2

  • CREATE_TABLE on database C

  • SELECT, INSERT on table C.1

  • SELECT, INSERT on table C.2

  • SELECT, INSERT on table C.3
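The preceding table can be reproduced with a short Python sketch. It simplifies LF-tag matching to a single rule: a resource matches if it shares at least one module value with the values granted to the principal. That rule is enough for this single-key example, but it is not Lake Formation's evaluation logic. As in the diagram, tables carry their database's values plus their own. All names are hypothetical.

```python
# Example data (LF-tag key "module"): direct assignments to databases and tables.
DATABASE_TAGS = {"A": {"Sales"}, "B": {"Orders"}, "C": {"Customers"}}
TABLE_TAGS = {"A.2": {"Orders"}, "B.2": {"Customers"}}
TABLES = {"A": ["A.1", "A.2"], "B": ["B.1", "B.2"], "C": ["C.1", "C.2", "C.3"]}

# Module values granted to each principal. Every grant pairs CREATE_TABLE on
# matching databases with SELECT, INSERT on matching tables.
PRINCIPAL_TAGS = {
    "Principal 1": {"Sales", "Customers"},
    "Principal 2": {"Orders"},
    "Principal 3": {"Customers"},
}


def resulting_permissions(principal):
    """List permissions on every resource sharing at least one module value with the principal."""
    granted = PRINCIPAL_TAGS[principal]
    result = []
    for database, tables in TABLES.items():
        db_tags = DATABASE_TAGS.get(database, set())
        if db_tags & granted:
            result.append(f"CREATE_TABLE on database {database}")
        for table in tables:
            # Tables carry their database's values plus their own, as in the diagram.
            if (db_tags | TABLE_TAGS.get(table, set())) & granted:
                result.append(f"SELECT, INSERT on table {table}")
    return result


for principal in PRINCIPAL_TAGS:
    print(principal, resulting_permissions(principal))
print(sum(len(resulting_permissions(p)) for p in PRINCIPAL_TAGS))  # 17
```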

Bottom Line

In this simple example, using five assignment operations and eight grant operations, the data lake administrator was able to specify 17 permissions. When there are tens of databases and hundreds of tables, the advantage of the LF-TBAC method over the named resource method becomes clear. Consider the hypothetical case where every principal needs access to every resource, where n(P) is the number of principals and n(R) is the number of resources:

  • With the named resource method, the number of grants required is n(P) × n(R).

  • With the LF-TBAC method, using a single LF-tag, the total of the number of grants to principals and assignments to resources is n(P) + n(R).
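The gap between the two formulas grows quickly, as evaluating them for a couple of sizes shows. The sizes below are hypothetical. The first pair, 3 principals and 10 resources, matches the size of this topic's example (3 databases and 7 tables). The second pair, 20 principals and 500 resources, stands in for a larger catalog.

```python
def named_resource_grant_count(num_principals, num_resources):
    # Named resource method: one grant per (principal, resource) pair.
    return num_principals * num_resources


def lf_tbac_operation_count(num_principals, num_resources):
    # LF-TBAC with a single LF-tag: one tag grant per principal plus
    # one tag assignment per resource.
    return num_principals + num_resources


# Hypothetical sizes: the example's 3 principals and 10 resources, then a larger catalog.
for principals, resources in [(3, 10), (20, 500)]:
    print(principals, resources,
          named_resource_grant_count(principals, resources),
          lf_tbac_operation_count(principals, resources))
# 3 10 30 13
# 20 500 10000 520
```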