Setting Up AWS Lake Formation

Complete the following tasks to set up Lake Formation:

Sign Up for AWS

When you sign up for AWS, your AWS account is automatically signed up for all services in AWS, including Lake Formation. You are charged only for the services that you use.

If you have an AWS account already, skip to the next task. If you don't have an AWS account, use the following procedure to create one.

To create an AWS account

  1. Open https://portal.aws.amazon.com/billing/signup.

  2. Follow the online instructions.

    Part of the sign-up procedure involves receiving a phone call and entering a verification code on the phone keypad.

Note your AWS account number, because you'll need it for the next task.

Create an Administrator IAM User

Services in AWS, such as Lake Formation, require that you provide credentials when you access them, so that the service can determine whether you have permission to access its resources. We don't recommend that you access AWS using the credentials for your AWS account. Instead, we recommend that you use AWS Identity and Access Management (IAM). You can create an IAM user, and then add the user to an IAM group with administrative permissions, or grant this user administrative permissions. You can then access AWS using the credentials for the IAM user.

If you signed up for AWS but have not created an administrative IAM user for yourself, you can create one using the IAM console. If you aren't familiar with using the console, see Working with the AWS Management Console for an overview.

To create an administrator user for yourself and add the user to an administrators group (console)

  1. Sign in to the IAM console as the account owner by choosing Root user and entering your AWS account email address. On the next page, enter your password.

    Note

    We strongly recommend that you adhere to the best practice of using the Administrator IAM user below and securely lock away the root user credentials. Sign in as the root user only to perform a few account and service management tasks.

  2. In the navigation pane, choose Users and then choose Add user.

  3. For User name, enter Administrator.

  4. Select the check box next to AWS Management Console access. Then select Custom password, and then enter your new password in the text box.

  5. (Optional) By default, AWS requires the new user to create a new password when first signing in. You can clear the check box next to User must create a new password at next sign-in to allow the new user to reset their password after they sign in.

  6. Choose Next: Permissions.

  7. Under Set permissions, choose Add user to group.

  8. Choose Create group.

  9. In the Create group dialog box, for Group name enter Administrators.

  10. Choose Filter policies, and then select AWS managed - job function to filter the table contents.

  11. In the policy list, select the check box for AdministratorAccess. Then choose Create group.

    Note

    You must activate IAM user and role access to Billing before you can use the AdministratorAccess permissions to access the AWS Billing and Cost Management console. To do this, follow the instructions in step 1 of the tutorial about delegating access to the billing console.

  12. Back in the list of groups, select the check box for your new group. Choose Refresh if necessary to see the group in the list.

  13. Choose Next: Tags.

  14. (Optional) Add metadata to the user by attaching tags as key-value pairs. For more information about using tags in IAM, see Tagging IAM Entities in the IAM User Guide.

  15. Choose Next: Review to see the list of group memberships to be added to the new user. When you are ready to proceed, choose Create user.

You can use this same process to create more groups and users and to give your users access to your AWS account resources. To learn about using policies that restrict user permissions to specific AWS resources, see Access Management and Example Policies.
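
The console procedure above can also be scripted. The following is a minimal sketch in Python, assuming the boto3 SDK is installed and credentials are configured; the helper name create_admin_user is hypothetical and not part of any AWS library. It creates the group, attaches the AdministratorAccess managed policy, creates the user, and adds the user to the group. It does not set a console password, so that step would still need to be done separately.

```python
ADMIN_POLICY_ARN = "arn:aws:iam::aws:policy/AdministratorAccess"


def create_admin_user(iam, user_name="Administrator", group_name="Administrators"):
    """Mirror the console steps: group, policy attachment, user, membership.

    `iam` is expected to be a boto3 IAM client (boto3.client("iam")).
    The function name and its defaults are illustrative only.
    """
    # Steps 8-11: create the group and attach AdministratorAccess to it.
    iam.create_group(GroupName=group_name)
    iam.attach_group_policy(GroupName=group_name, PolicyArn=ADMIN_POLICY_ARN)
    # Steps 2-3: create the user.
    iam.create_user(UserName=user_name)
    # Step 12: add the user to the new group.
    iam.add_user_to_group(GroupName=group_name, UserName=user_name)
    return {"user": user_name, "group": group_name, "policy": ADMIN_POLICY_ARN}


# Example use (requires boto3 and valid credentials):
#   import boto3
#   create_admin_user(boto3.client("iam"))
```

Passing the client in as a parameter keeps the function easy to exercise against a stub before it touches a real account.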

Create an IAM Role for Workflows

With AWS Lake Formation, you can import your data using workflows. A workflow defines the data source and schedule to import data into your data lake. You can easily define workflows using the blueprints, or templates, that Lake Formation provides.

When you create a workflow, you must assign it an AWS Identity and Access Management (IAM) role that grants Lake Formation the necessary permissions to ingest the data.

The following procedure assumes familiarity with IAM.

To create an IAM role for workflows

  1. Open the IAM console at https://console.aws.amazon.com/iam and sign in as the IAM administrator user that you created in Create an Administrator IAM User or as an IAM user with the AdministratorAccess AWS managed policy.

  2. In the navigation pane, choose Roles, then Create role.

  3. On the Create role page, choose AWS service, and then choose Glue. Choose Next: Permissions.

  4. Search for the AWSGlueServiceRole managed policy, and select the check box next to the policy name in the list. Then complete the Create role wizard, naming the role LakeFormationWorkflowRole. To finish, choose Create role.

  5. Back on the Roles page, search for LakeFormationWorkflowRole and choose the role name.

  6. On the role Summary page, under the Permissions tab, choose Add inline policy, and add the following inline policy. A suggested name for the policy is LakeFormationWorkflow.

    Important

    In the following policy, replace <account-id> with a valid AWS account number.

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "lakeformation:GetDataAccess",
                    "lakeformation:GrantPermissions"
                ],
                "Resource": "*"
            },
            {
                "Effect": "Allow",
                "Action": ["iam:PassRole"],
                "Resource": [
                    "arn:aws:iam::<account-id>:role/LakeFormationWorkflowRole"
                ]
            }
        ]
    }

    The following are brief descriptions of the permissions in this policy:

    • lakeformation:GetDataAccess enables jobs created by the workflow to write to the target location.

    • lakeformation:GrantPermissions enables the workflow to grant the SELECT permission on target tables.

    • iam:PassRole enables the service to assume the role LakeFormationWorkflowRole to create crawlers and jobs, and to attach the role to the created crawlers and jobs.

  7. Verify that the role LakeFormationWorkflowRole has two policies attached.

  8. If you are ingesting data that is outside the data lake location, add an inline policy granting permissions to read the source data.
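
The role creation steps above could be scripted along the following lines. This is a sketch under stated assumptions, not a definitive implementation: it assumes boto3, it assumes that choosing Glue in the console corresponds to a trust policy for the glue.amazonaws.com service principal, and the helper names (glue_trust_policy, workflow_inline_policy, create_workflow_role) are hypothetical. The inline policy is the one shown in step 6, with the account number filled in.

```python
import json

ROLE_NAME = "LakeFormationWorkflowRole"
GLUE_SERVICE_POLICY_ARN = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"


def glue_trust_policy():
    """Trust policy letting AWS Glue assume the role (assumed console equivalent)."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }


def workflow_inline_policy(account_id):
    """Build the LakeFormationWorkflow inline policy for a 12-digit account ID."""
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("account_id must be a 12-digit AWS account number")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["lakeformation:GetDataAccess",
                           "lakeformation:GrantPermissions"],
                "Resource": "*",
            },
            {
                "Effect": "Allow",
                "Action": ["iam:PassRole"],
                "Resource": [f"arn:aws:iam::{account_id}:role/{ROLE_NAME}"],
            },
        ],
    }


def create_workflow_role(iam, account_id):
    """Create the role, attach AWSGlueServiceRole, and add the inline policy.

    `iam` is expected to be a boto3 IAM client.
    """
    policy = workflow_inline_policy(account_id)  # validate before any API call
    iam.create_role(RoleName=ROLE_NAME,
                    AssumeRolePolicyDocument=json.dumps(glue_trust_policy()))
    iam.attach_role_policy(RoleName=ROLE_NAME, PolicyArn=GLUE_SERVICE_POLICY_ARN)
    iam.put_role_policy(RoleName=ROLE_NAME, PolicyName="LakeFormationWorkflow",
                        PolicyDocument=json.dumps(policy))
```

After a run, the role should show the two policies that step 7 asks you to verify: the managed AWSGlueServiceRole policy and the LakeFormationWorkflow inline policy.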

Create a Data Lake Administrator

Data lake administrators are initially the only AWS Identity and Access Management (IAM) users or roles that can grant Lake Formation permissions on data locations and Data Catalog resources to any principal (including themselves). For more information about data lake administrator capabilities, see Implicit Lake Formation Permissions.

You can create a data lake administrator using the Lake Formation console or the PutDataLakeSettings operation of the Lake Formation API.

The following permissions are required to create a data lake administrator. The Administrator IAM user has these permissions implicitly.

  • lakeformation:PutDataLakeSettings

  • lakeformation:GetDataLakeSettings

To create a data lake administrator (console)

  1. If the IAM user who is to be a data lake administrator does not yet exist, use the IAM console to create it. Otherwise, view the existing IAM user who is to be the data lake administrator.

    Note

    We recommend that you do not select an IAM administrative user (user with the AdministratorAccess AWS managed policy) to be the data lake administrator.

    Attach the following managed policies to the user:

    • AWSLakeFormationDataAdmin (mandatory): Basic data lake administrator permissions.

    • AWSGlueConsoleFullAccess, CloudWatchLogsReadOnlyAccess (optional): Attach these policies if the data lake administrator will be troubleshooting workflows created from Lake Formation blueprints. These policies enable the data lake administrator to view troubleshooting information in the AWS Glue console and the Amazon CloudWatch Logs console. For information about workflows, see Importing Data Using Workflows in Lake Formation.

    • AmazonAthenaFullAccess (optional): Attach this policy if the data lake administrator will be running queries in Amazon Athena.

  2. Attach the following inline policy, which grants the data lake administrator permission to create the Lake Formation service-linked role. A suggested name for the policy is LakeFormationSLR.

    The service-linked role enables the data lake administrator to more easily register Amazon S3 locations with Lake Formation. For more information about the Lake Formation service-linked role, see Using Service-Linked Roles for Lake Formation.

    Important

    In the following policy, replace <account-id> with a valid AWS account number.

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "iam:CreateServiceLinkedRole",
                "Resource": "*",
                "Condition": {
                    "StringEquals": {
                        "iam:AWSServiceName": "lakeformation.amazonaws.com"
                    }
                }
            },
            {
                "Effect": "Allow",
                "Action": [
                    "iam:PutRolePolicy"
                ],
                "Resource": "arn:aws:iam::<account-id>:role/aws-service-role/lakeformation.amazonaws.com/AWSServiceRoleForLakeFormationDataAccess"
            }
        ]
    }

  3. Attach the following PassRole inline policy to the user. This policy enables the data lake administrator to create and run workflows. The iam:PassRole permission enables the workflow to assume the role LakeFormationWorkflowRole to create crawlers and jobs, and to attach the role to the created crawlers and jobs. A suggested name for the policy is UserPassRole.

    Important

    Replace <account-id> with a valid AWS account number.

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PassRolePermissions",
                "Effect": "Allow",
                "Action": [
                    "iam:PassRole"
                ],
                "Resource": [
                    "arn:aws:iam::<account-id>:role/LakeFormationWorkflowRole"
                ]
            }
        ]
    }

  4. Attach the following inline policy to the user. This policy enables the data lake administrator to grant and revoke cross-account permissions on Data Catalog resources, and to view and accept AWS Resource Access Manager (AWS RAM) resource share invitations. For more information, see Cross-Account Access.

    A suggested name for the policy is GrantCrossAccount.

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ram:CreateResourceShare"
                ],
                "Resource": "*",
                "Condition": {
                    "ForAllValues:StringEquals": {
                        "ram:RequestedResourceType": [
                            "glue:Table",
                            "glue:Database",
                            "glue:Catalog"
                        ]
                    }
                }
            },
            {
                "Effect": "Allow",
                "Action": [
                    "ram:UpdateResourceShare",
                    "ram:DeleteResourceShare"
                ],
                "Resource": "*",
                "Condition": {
                    "ForAllValues:StringLike": {
                        "ram:ResourceShareName": [
                            "LakeFormation*"
                        ]
                    }
                }
            },
            {
                "Effect": "Allow",
                "Action": [
                    "glue:PutResourcePolicy",
                    "organizations:DescribeOrganization",
                    "organizations:DescribeAccount",
                    "ram:Get*",
                    "ram:List*"
                ],
                "Resource": "*"
            },
            {
                "Effect": "Allow",
                "Action": [
                    "organizations:ListRoots",
                    "organizations:ListAccountsForParent",
                    "organizations:ListOrganizationalUnitsForParent"
                ],
                "Resource": "*"
            }
        ]
    }

  5. Attach this additional inline policy for granting cross-account permissions. This policy enables the data lake administrator to view and accept AWS RAM resource share invitations by using the AWS RAM console. For more information, see Cross-Account Access.

    A suggested name for the policy is RAMConsole.

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "ec2:DescribeAvailabilityZones",
                "Resource": "*"
            }
        ]
    }

  6. Open the AWS Lake Formation console at https://console.aws.amazon.com/lakeformation/ and sign in as the IAM Administrator user that you created in Create an Administrator IAM User or as any IAM administrative user.

  7. Do one of the following:

    • If a welcome message appears, choose Add administrators.

    • In the navigation pane, under Permissions, choose Admins and database creators. Then under Data lake administrators, choose Grant.

  8. In the Manage data lake administrators dialog box, for IAM users and roles, choose the IAM user that you created or selected in Step 1, and then choose Save.
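
As noted earlier in this section, a data lake administrator can also be created with the PutDataLakeSettings operation. The following is a minimal sketch using the boto3 Lake Formation client. The helper name add_data_lake_admin is hypothetical, and the sketch assumes the settings returned by GetDataLakeSettings can be passed back unchanged to PutDataLakeSettings. It reads the current settings first so that existing administrators are preserved, because PutDataLakeSettings replaces the whole settings object.

```python
def add_data_lake_admin(lakeformation, principal_arn):
    """Add one IAM user or role ARN to the list of data lake administrators.

    `lakeformation` is expected to be boto3.client("lakeformation").
    The caller needs lakeformation:GetDataLakeSettings and
    lakeformation:PutDataLakeSettings.
    """
    settings = lakeformation.get_data_lake_settings()["DataLakeSettings"]
    admins = settings.setdefault("DataLakeAdmins", [])
    entry = {"DataLakePrincipalIdentifier": principal_arn}
    # Skip the append if the principal is already an administrator.
    if entry not in admins:
        admins.append(entry)
    lakeformation.put_data_lake_settings(DataLakeSettings=settings)
    return settings


# Example use (requires boto3 and valid credentials; the ARN is a placeholder):
#   import boto3
#   add_data_lake_admin(boto3.client("lakeformation"),
#                       "arn:aws:iam::<account-id>:user/<user-name>")
```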

Change Data Catalog Settings

Lake Formation starts with the "Use only IAM access control" settings enabled for compatibility with existing AWS Glue Data Catalog behavior. We recommend that you disable these settings to enable fine-grained access control with Lake Formation permissions.

For more information, see Changing the Default Security Settings for Your Data Lake.

Important

If you have existing AWS Glue Data Catalog databases and tables, do not follow the instructions in this section. Instead, follow the instructions in Upgrading AWS Glue Data Permissions to the AWS Lake Formation Model.

Warning

If you have automation in place that creates databases and tables in the Data Catalog, the following steps might cause the automation and downstream extract, transform, and load (ETL) jobs to fail. Proceed only after you have either modified your existing processes or granted explicit Lake Formation permissions to the required principals. For information about Lake Formation permissions, see Lake Formation Permissions Reference.

To change the default Data Catalog settings

  1. Continue in the Lake Formation console at https://console.aws.amazon.com/lakeformation/. Ensure that you are signed in as the IAM administrator user that you created in Create an Administrator IAM User or as an IAM user with the AdministratorAccess AWS managed policy.

  2. In the navigation pane, under Data catalog, choose Settings.

  3. Clear both check boxes and choose Save.

    
    [Figure: The Data catalog settings dialog box, subtitled "Default permissions for newly created databases and tables," with two check boxes.]

  4. Sign out of the Lake Formation console and sign back in as the data lake administrator.

  5. In the navigation pane, under Permissions, choose Admins and database creators.

  6. Under Database creators, select the IAMAllowedPrincipals group, and choose Revoke.

    The Revoke permissions dialog box appears, showing that IAMAllowedPrincipals has the Create database permission.

  7. Choose Revoke.
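
The console steps above can be approximated with the Lake Formation API. The sketch below rests on two assumptions that you should verify against the current API reference before relying on them: that clearing the two check boxes corresponds to setting CreateDatabaseDefaultPermissions and CreateTableDefaultPermissions to empty lists, and that the IAMAllowedPrincipals group is addressed by the principal identifier IAM_ALLOWED_PRINCIPALS. Both function names are hypothetical.

```python
def disable_iam_only_defaults(lakeformation):
    """Clear the default permissions for newly created databases and tables.

    `lakeformation` is expected to be boto3.client("lakeformation").
    Existing settings (such as the administrator list) are read first and kept.
    """
    settings = lakeformation.get_data_lake_settings()["DataLakeSettings"]
    settings["CreateDatabaseDefaultPermissions"] = []
    settings["CreateTableDefaultPermissions"] = []
    lakeformation.put_data_lake_settings(DataLakeSettings=settings)
    return settings


def revoke_default_database_creator(lakeformation):
    """Revoke the Create database permission from IAMAllowedPrincipals."""
    lakeformation.revoke_permissions(
        Principal={"DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"},
        Resource={"Catalog": {}},
        Permissions=["CREATE_DATABASE"],
    )
```

The warning earlier in this section applies equally to a scripted change: run it only after automation that creates databases and tables has been updated or granted explicit Lake Formation permissions.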

Grant Access to the Data Catalog Encryption Key

If the AWS Glue Data Catalog is encrypted, grant AWS Identity and Access Management (IAM) permissions on the AWS KMS key to any principals who must grant Lake Formation permissions on Data Catalog databases and tables.
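
One way to grant these IAM permissions is an inline policy on each principal. The following sketch only builds such a policy document. The set of KMS actions shown (kms:Decrypt, kms:Encrypt, kms:GenerateDataKey) is an assumption, because this section does not list the required actions, and the helper name catalog_key_policy is hypothetical. Confirm the actions your setup needs before attaching the policy.

```python
import json

# Assumed action set; this section does not specify which KMS actions are required.
ASSUMED_KMS_ACTIONS = ("kms:Decrypt", "kms:Encrypt", "kms:GenerateDataKey")


def catalog_key_policy(key_arn, actions=ASSUMED_KMS_ACTIONS):
    """Build an IAM policy allowing the given KMS actions on one key."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": list(actions),
            "Resource": key_arn,
        }],
    }


# Example use with a boto3 IAM client (user name, policy name, and key ARN
# are placeholders):
#   iam.put_user_policy(
#       UserName="<user-name>",
#       PolicyName="DataCatalogKeyAccess",
#       PolicyDocument=json.dumps(catalog_key_policy("<key-arn>")),
#   )
```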