Before you begin

Before you launch an Amazon EMR cluster with AWS Lake Formation, complete the following tasks:

  1. Set up a trust relationship between your identity provider and AWS to enable SAML 2.0-based federation, and create IAM roles for Lake Formation. For instructions, see Configure a trust relationship between your IdP and Lake Formation.

  2. Create a new Amazon EC2 instance profile. For instructions, see Create a customized Amazon EC2 instance profile.

  3. Configure Amazon EMR security features. For instructions, see Configure EMR security.
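As a rough illustration of task 1, a trust relationship with an identity provider is expressed as a trust policy attached to an IAM role. The sketch below builds the general shape of a SAML federation trust statement using only the Python standard library. The account ID, the provider name, and the function name `saml_trust_policy` are hypothetical placeholders, and the statement may not be the complete policy that Lake Formation roles need; the linked instructions remain the authority on that.

```python
import json


def saml_trust_policy(account_id: str, provider_name: str) -> dict:
    """Build a trust policy that lets users of a SAML provider assume a role.

    Assumption: the standard "aws" partition. Both arguments are
    illustrative placeholders, not values taken from this guide.
    """
    provider_arn = f"arn:aws:iam::{account_id}:saml-provider/{provider_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The SAML identity provider registered in IAM is the trusted principal.
                "Principal": {"Federated": provider_arn},
                # STS action used for SAML 2.0-based federation.
                "Action": "sts:AssumeRoleWithSAML",
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical account ID and provider name, for illustration only.
    print(json.dumps(saml_trust_policy("111122223333", "ExampleIdP"), indent=2))
```

The printed JSON can be compared against the trust policy described in the linked instructions before it is attached to any role.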

Also complete the following Lake Formation tasks, which are covered in the AWS Lake Formation Developer Guide:

  1. Opt in to allow data filtering for data lakes on Amazon EMR. You can opt in before or after you launch an Amazon EMR cluster with Lake Formation, but Amazon EMR cannot access data in Amazon S3 locations registered with Lake Formation until you explicitly allow data filtering. For more information and instructions, see Allow data filtering on Amazon EMR in the Lake Formation Developer Guide.

  2. Create a user-defined service role for Lake Formation to register data locations that will be accessed by Amazon EMR. For instructions, see Requirements for roles used to register locations.

    Warning

    You must use a user-defined role, not the Lake Formation service-linked role, when you register data locations. Lake Formation does not support using its service-linked role when you integrate with Amazon EMR.

  3. Set up and control user access to resources through Lake Formation policies in the AWS Lake Formation console. For more information, see Lake Formation permissions.
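The warning under task 2 can be turned into a small pre-flight check. The sketch below assembles the parameters for a Lake Formation `RegisterResource` request and rejects a service-linked role, which it recognizes by the `aws-service-role/` path in the role ARN. The function name, bucket ARN, and role name are hypothetical, and the ARN check is a heuristic assumption rather than a documented validation step.

```python
# Service-linked roles live under this path in their ARN.
SERVICE_LINKED_PATH = ":role/aws-service-role/"


def build_register_params(resource_arn: str, role_arn: str) -> dict:
    """Return RegisterResource parameters that use a user-defined role.

    Raises ValueError if role_arn looks like a service-linked role,
    because that role is not supported for the Amazon EMR integration.
    """
    if SERVICE_LINKED_PATH in role_arn:
        raise ValueError(
            "Use a user-defined role, not the Lake Formation "
            "service-linked role, to register data locations."
        )
    return {
        "ResourceArn": resource_arn,
        "UseServiceLinkedRole": False,  # explicitly opt out of the service-linked role
        "RoleArn": role_arn,
    }


if __name__ == "__main__":
    # Hypothetical ARNs, for illustration only.
    params = build_register_params(
        "arn:aws:s3:::example-bucket/data",
        "arn:aws:iam::111122223333:role/ExampleLakeFormationRegisterRole",
    )
    print(params)
    # With boto3 installed and credentials configured, the same dict could be
    # passed on as: boto3.client("lakeformation").register_resource(**params)
```

Keeping the check separate from the API call means it can run in a review or CI step without AWS credentials.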