Step 3: Create Lake Formation databases - AWS Lake Formation

Step 3: Create Lake Formation databases

In this step, you create two databases and attach LF-Tags to the databases and specific columns for testing purposes.

Create your database and table for database-level access
  1. First, create the database tag_database. In the steps that follow, you create the table source_data and attach the appropriate LF-Tags.

    1. On the Lake Formation console (https://console.aws.amazon.com/lakeformation/), under Data Catalog, choose Databases.

    2. Choose Create database.

    3. For Name, enter tag_database.

    4. For Location, enter the Amazon S3 location created by the AWS CloudFormation template (s3://lf-tagbased-demo-Account-ID/tag_database/).

    5. Deselect Use only IAM access control for new tables in this database.

    6. Choose Create database.

  2. Next, create a new table within tag_database.

    1. On the Databases page, select the database tag_database.

    2. Choose View tables, and then choose Create table.

    3. For Name, enter source_data.

    4. For Database, choose the database tag_database.

    5. For Table format, choose Standard AWS Glue table.

    6. For Data is located in, select Specified path in my account.

    7. For Include path, enter the path to tag_database created by the AWS CloudFormation template (s3://lf-tagbased-demo-Account-ID/tag_database/).

    8. For Data format, select CSV.

    9. Under Upload schema, enter the following JSON array of column structure to create a schema:

      [
        { "Name": "vendorid", "Type": "string" },
        { "Name": "lpep_pickup_datetime", "Type": "string" },
        { "Name": "lpep_dropoff_datetime", "Type": "string" },
        { "Name": "store_and_fwd_flag", "Type": "string" },
        { "Name": "ratecodeid", "Type": "string" },
        { "Name": "pulocationid", "Type": "string" },
        { "Name": "dolocationid", "Type": "string" },
        { "Name": "passenger_count", "Type": "string" },
        { "Name": "trip_distance", "Type": "string" },
        { "Name": "fare_amount", "Type": "string" },
        { "Name": "extra", "Type": "string" },
        { "Name": "mta_tax", "Type": "string" },
        { "Name": "tip_amount", "Type": "string" },
        { "Name": "tolls_amount", "Type": "string" },
        { "Name": "ehail_fee", "Type": "string" },
        { "Name": "improvement_surcharge", "Type": "string" },
        { "Name": "total_amount", "Type": "string" },
        { "Name": "payment_type", "Type": "string" }
      ]
    10. Choose Upload. After you upload the schema, the table schema should list the 18 columns from the JSON array, each of type string.

    11. Choose Submit.

  3. Next, attach LF-Tags at the database level.

    1. On the Databases page, find and select tag_database.

    2. On the Actions menu, choose Edit LF-Tags.

    3. Choose Assign new LF-Tag.

    4. For Assigned keys, choose the Confidential LF-Tag you created earlier.

    5. For Values, choose True.

    6. Choose Save.

    This completes the LF-Tag assignment to the tag_database database.
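
The console steps above can also be sketched with the AWS SDK for Python (Boto3). This is a minimal sketch under stated assumptions, not the procedure this tutorial prescribes: it assumes that an empty CreateTableDefaultPermissions list corresponds to deselecting Use only IAM access control for new tables in this database, and that the console's CSV format maps to the LazySimpleSerDe settings shown. The helper names and the BUCKET placeholder are illustrative only.

```python
# Placeholder from the tutorial; replace Account-ID with your own account ID.
BUCKET = "s3://lf-tagbased-demo-Account-ID"

# Column names from the schema JSON in this step; every column is a string.
COLUMN_NAMES = [
    "vendorid", "lpep_pickup_datetime", "lpep_dropoff_datetime",
    "store_and_fwd_flag", "ratecodeid", "pulocationid", "dolocationid",
    "passenger_count", "trip_distance", "fare_amount", "extra", "mta_tax",
    "tip_amount", "tolls_amount", "ehail_fee", "improvement_surcharge",
    "total_amount", "payment_type",
]


def database_input(name, location):
    """Build a Glue DatabaseInput (hypothetical helper).

    Assumption: an empty CreateTableDefaultPermissions list is the API
    counterpart of deselecting "Use only IAM access control".
    """
    return {
        "Name": name,
        "LocationUri": location,
        "CreateTableDefaultPermissions": [],
    }


def csv_table_input(name, location, column_names):
    """Build a Glue TableInput for a CSV table (hypothetical helper).

    Assumption: the SerDe and format classes below approximate what the
    console generates for the CSV data format.
    """
    return {
        "Name": name,
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {"classification": "csv"},
        "StorageDescriptor": {
            "Columns": [{"Name": c, "Type": "string"} for c in column_names],
            "Location": location,
            "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
                "Parameters": {"field.delim": ","},
            },
        },
    }


def database_tag_request(database, key, value):
    """Build the arguments for lakeformation.add_lf_tags_to_resource."""
    return {
        "Resource": {"Database": {"Name": database}},
        "LFTags": [{"TagKey": key, "TagValues": [value]}],
    }


def create_tagged_database(glue, lakeformation):
    """Create tag_database and source_data, then assign Confidential=True."""
    location = f"{BUCKET}/tag_database/"
    glue.create_database(DatabaseInput=database_input("tag_database", location))
    glue.create_table(
        DatabaseName="tag_database",
        TableInput=csv_table_input("source_data", location, COLUMN_NAMES),
    )
    lakeformation.add_lf_tags_to_resource(
        **database_tag_request("tag_database", "Confidential", "True")
    )
```

To run the sketch against your account, pass in real clients, for example create_tagged_database(boto3.client("glue"), boto3.client("lakeformation")). The calling principal needs permission to create the database and to assign the Confidential LF-Tag.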

Create your database and table for column-level access

Complete the following steps to create the database col_tag_database and the table source_data_col_lvl, and to attach LF-Tags at the column level.

  1. On the Databases page, choose Create database.

  2. For Name, enter col_tag_database.

  3. For Location, enter the Amazon S3 location created by the AWS CloudFormation template (s3://lf-tagbased-demo-Account-ID/col_tag_database/).

  4. Deselect Use only IAM access control for new tables in this database.

  5. Choose Create database.

  6. On the Databases page, select your new database (col_tag_database).

  7. Choose View tables, and then choose Create table.

  8. For Name, enter source_data_col_lvl.

  9. For Database, choose your new database (col_tag_database).

  10. For Table format, choose Standard AWS Glue table.

  11. For Data is located in, select Specified path in my account.

  12. For Include path, enter the Amazon S3 path for col_tag_database (s3://lf-tagbased-demo-Account-ID/col_tag_database/).

  13. For Data format, select CSV.

  14. Under Upload schema, enter the following schema JSON:

    [
      { "Name": "vendorid", "Type": "string" },
      { "Name": "lpep_pickup_datetime", "Type": "string" },
      { "Name": "lpep_dropoff_datetime", "Type": "string" },
      { "Name": "store_and_fwd_flag", "Type": "string" },
      { "Name": "ratecodeid", "Type": "string" },
      { "Name": "pulocationid", "Type": "string" },
      { "Name": "dolocationid", "Type": "string" },
      { "Name": "passenger_count", "Type": "string" },
      { "Name": "trip_distance", "Type": "string" },
      { "Name": "fare_amount", "Type": "string" },
      { "Name": "extra", "Type": "string" },
      { "Name": "mta_tax", "Type": "string" },
      { "Name": "tip_amount", "Type": "string" },
      { "Name": "tolls_amount", "Type": "string" },
      { "Name": "ehail_fee", "Type": "string" },
      { "Name": "improvement_surcharge", "Type": "string" },
      { "Name": "total_amount", "Type": "string" },
      { "Name": "payment_type", "Type": "string" }
    ]
  15. Choose Upload. After you upload the schema, the table schema should list the 18 columns from the JSON array, each of type string.

  16. Choose Submit to complete the creation of the table.

  17. Now, associate the Sensitive=True LF-Tag with the columns vendorid and fare_amount.

    1. On the Tables page, select the table you created (source_data_col_lvl).

    2. On the Actions menu, choose Schema.

    3. Select the column vendorid and choose Edit LF-Tags.

    4. For Assigned keys, choose Sensitive.

    5. For Values, choose True.

    6. Choose Save.

    7. Repeat steps 3 through 6 for the column fare_amount.

  18. Next, associate the Confidential=False LF-Tag with col_tag_database. This is required so that lf-data-analyst can describe the database col_tag_database when logged in from Amazon Athena.

    1. On the Databases page, find and select col_tag_database.

    2. On the Actions menu, choose Edit LF-Tags.

    3. Choose Assign new LF-Tag.

    4. For Assigned keys, choose the Confidential LF-Tag you created earlier.

    5. For Values, choose False.

    6. Choose Save.
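
The two LF-Tag assignments in this procedure can also be expressed as requests to the Lake Formation AddLFTagsToResource API. The following is a minimal sketch using the AWS SDK for Python (Boto3), assuming the database and table already exist and that the Sensitive and Confidential LF-Tags were created earlier. The helper names are hypothetical and are not part of this tutorial.

```python
def column_tag_request(database, table, columns, key, value):
    """Build add_lf_tags_to_resource arguments for specific columns."""
    return {
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "LFTags": [{"TagKey": key, "TagValues": [value]}],
    }


def database_tag_request(database, key, value):
    """Build add_lf_tags_to_resource arguments for a whole database."""
    return {
        "Resource": {"Database": {"Name": database}},
        "LFTags": [{"TagKey": key, "TagValues": [value]}],
    }


def tag_column_level_resources(lakeformation):
    """Assign Sensitive=True to two columns and Confidential=False to the database.

    Returns any entries the service reports under "Failures" so that the
    caller can inspect tags that were not applied.
    """
    requests = [
        column_tag_request(
            "col_tag_database", "source_data_col_lvl",
            ["vendorid", "fare_amount"], "Sensitive", "True",
        ),
        database_tag_request("col_tag_database", "Confidential", "False"),
    ]
    failures = []
    for request in requests:
        response = lakeformation.add_lf_tags_to_resource(**request)
        failures.extend(response.get("Failures", []))
    return failures
```

For example, tag_column_level_resources(boto3.client("lakeformation")) sends both requests. Tagging both columns in one request is a convenience of the API; the console steps above tag one column at a time.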