Amazon EMR
Management Guide

Configure a Cross-Realm Trust

When you set up a cross-realm trust, you allow principals (usually users) from a different Kerberos realm to authenticate to application components on the EMR cluster. The cluster-dedicated KDC establishes a trust relationship with another KDC using a cross-realm principal that exists in both KDCs. The principal name and password must match exactly in both KDCs.

A cross-realm trust requires that the KDCs can reach each other over the network and resolve each other's domain names. Steps for establishing a cross-realm trust relationship with a Microsoft AD domain controller running as an EC2 instance are provided below, along with an example network setup that provides the required connectivity and domain-name resolution.

Setting Up a Cross-Realm Trust with an AD Domain Controller

Step 1: Set Up the VPC and Subnet

Step 2: Launch and Install the AD Domain Controller

Step 3: Add User Accounts to the Domain for the EMR Cluster

Step 4: Configure an Incoming Trust on the AD Domain Controller

Step 5: Use a DHCP Option Set to Specify the AD Domain Controller as a VPC DNS Server

Step 6: Launch a Kerberized EMR Cluster

Step 7: Create HDFS Users and Set Permissions on the Cluster for AD User Accounts

Step 8: Use SSH to log in to the cluster

Step 1: Set Up the VPC and Subnet

The following steps demonstrate creating a VPC and subnet so that the cluster-dedicated KDC can reach the AD domain controller and resolve its domain name. In these steps, domain-name resolution is provided by referencing the AD domain controller as the domain name server in the DHCP option set. For more information, see Step 5: Use a DHCP Option Set to Specify the AD Domain Controller as a VPC DNS Server.

The KDC and the AD domain controller must be able to resolve each other's domain name. This allows Amazon EMR to join computers to the domain and automatically configure corresponding Linux user accounts and SSH parameters on cluster instances.

If Amazon EMR can't resolve the domain name, you can reference the trust using the AD domain controller's IP address. However, you must manually add Linux user accounts, add corresponding principals to the cluster-dedicated KDC, and configure SSH.

  1. Create an Amazon VPC with a single public subnet. For more information, see Step 1: Create the VPC in the Amazon VPC Getting Started Guide.

    Important

    When you use a Microsoft AD domain controller, choose a CIDR block for the EMR cluster so that all IPv4 addresses are fewer than nine characters in length (for example, 10.0.0.0/16). This is because the DNS names of cluster computers are used when the computers join the AD directory. AWS derives DNS hostnames from IPv4 addresses, so longer IP addresses can produce DNS names longer than 15 characters. AD has a 15-character limit for registering joined computer names and truncates longer names, which can cause unpredictable errors.

  2. Remove the default DHCP option set assigned to the VPC. For more information, see Changing a VPC to use No DHCP Options. Later on, you add a new one that specifies the AD domain controller as the DNS server.

  3. Confirm that DNS support is enabled for the VPC, that is, that DNS Hostnames and DNS Resolution are both enabled. They are enabled by default. For more information, see Updating DNS Support for Your VPC.

  4. Confirm that your VPC has an Internet gateway attached, which is the default. For more information, see Creating and Attaching an Internet Gateway.

    Note

    An Internet gateway is used in this example because you are establishing a new domain controller for the VPC. An Internet gateway may not be required for your application. The only requirement is that the cluster-dedicated KDC can access the AD domain controller.

  5. Create a custom route table, add a route that targets the Internet Gateway, and then attach it to your subnet. For more information, see Create a Custom Route Table.

  6. When you launch the EC2 instance for the domain controller, it must have a static public IPv4 address for you to connect to it using RDP. The easiest way to do this is to configure your subnet to auto-assign public IPv4 addresses. This is not the default setting when a subnet is created. For more information, see Modifying the Public IPv4 Addressing Attribute of your Subnet. Optionally, you can assign the address when you launch the instance. For more information, see Assigning a Public IPv4 Address During Instance Launch.

  7. When you finish, make a note of your VPC and subnet IDs. You use them later when you launch the AD domain controller and the cluster.
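The 15-character limit described in the Important note under step 1 can be checked before you commit to a CIDR block. The following is a minimal sketch, not an official tool. It assumes that EC2 private DNS host labels take the form ip-A-B-C-D, and the function names are hypothetical. It only tests the 15-character limit itself, so treat the guidance in the note as the safer rule.

```shell
#!/bin/bash
# Sketch: estimate the host label EC2 would derive from an IPv4 address
# (ASSUMED form: ip-A-B-C-D) and compare its length with AD's 15-character limit.

AD_NAME_LIMIT=15

# Print the assumed host label for an IPv4 address, e.g. 10.0.0.12 -> ip-10-0-0-12
host_label_for_ip() {
  printf 'ip-%s\n' "$(printf '%s' "$1" | tr '.' '-')"
}

# Return 0 if the assumed label fits within the AD limit, non-zero otherwise.
fits_ad_limit() {
  local label
  label="$(host_label_for_ip "$1")"
  [ "${#label}" -le "$AD_NAME_LIMIT" ]
}

# Report on a few sample addresses.
for ip in 10.0.0.12 10.0.255.254 192.168.100.200; do
  if fits_ad_limit "$ip"; then
    echo "$ip -> $(host_label_for_ip "$ip") (fits)"
  else
    echo "$ip -> $(host_label_for_ip "$ip") (too long for AD)"
  fi
done
```

Addresses in a block such as 10.0.0.0/16 stay short because the first two octets are fixed, which is why the example CIDR block in the note is a comfortable choice.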

Step 2: Launch and Install the AD Domain Controller

  1. Launch an EC2 instance based on the Microsoft Windows Server 2016 Base AMI. We recommend an m4.xlarge or better instance type. For more information, see Launching an Instance in the Amazon EC2 User Guide for Linux Instances.

  2. Connect to the EC2 instance using RDP. For more information, see Connecting to Your Windows Instance in the Amazon EC2 User Guide for Linux Instances.

  3. Start Server Manager to install and configure the Active Directory Domain Services role on the server. Promote the server to a domain controller and assign a domain name (the example we use here is ad.domain.com). Make a note of the domain name because you need it later when you create the EMR security configuration and cluster. If you are new to setting up AD, you can follow the instructions in How to Set Up Active Directory (AD) in Windows Server 2016.

    The instance restarts when you finish.

Step 3: Add User Accounts to the Domain for the EMR Cluster

RDP to the AD domain controller to create user accounts in Active Directory Users and Computers for each cluster user. For instructions, see Create a User Account in Active Directory Users and Computers. Make a note of each user's User logon name. You need these later when you configure the cluster.

Note

Mapping AD groups to Hadoop requires additional configuration. For more information, see (Optional) Configure AD Group Mappings.

In addition, create a user account with sufficient privileges to join computers to the domain. Amazon EMR uses this account to join cluster instances to the domain. You specify the account and its password when you create the cluster in Step 6: Launch a Kerberized EMR Cluster. To delegate computer join privileges to the user account, we recommend that you create a group with join privileges and then assign the user to the group. For instructions, see Delegating Directory Join Privileges in the AWS Directory Service Administration Guide.

Step 4: Configure an Incoming Trust on the AD Domain Controller

The example commands below create a one-way, incoming, non-transitive realm trust in AD with the cluster-dedicated KDC. The example we use for the cluster's realm is EC2.INTERNAL, which is derived from the default domain name for a cluster in us-east-1. The /passwordt parameter specifies the cross-realm principal password, which you specify along with the cluster realm when you create a cluster.

Open the Windows command prompt with administrator privileges and type the following commands to create the trust relationship on the AD domain controller:

C:\Users\Administrator> ksetup /addkdc EC2.INTERNAL
C:\Users\Administrator> netdom trust EC2.INTERNAL /Domain:AD.DOMAIN.COM /add /realm /passwordt:MyVeryStrongPassword
C:\Users\Administrator> ksetup /SetEncTypeAttr EC2.INTERNAL AES256-CTS-HMAC-SHA1-96

Step 5: Use a DHCP Option Set to Specify the AD Domain Controller as a VPC DNS Server

Now that the AD domain controller is configured, you can configure the VPC to use it as a domain name server for name resolution within your VPC. To do this, attach a DHCP option set. For Domain name, specify the domain name of your cluster: for example, ec2.internal if your cluster is in us-east-1 or region.compute.internal for other regions. For Domain name servers, specify your AD domain controller's private IP address as well as AmazonProvidedDNS. For more information, see Changing DHCP Option Sets.
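The values for the DHCP option set can be derived from the cluster's region. The sketch below is illustrative only. It assumes the default private DNS domain is ec2.internal in us-east-1 and region.compute.internal elsewhere, and that the realm is the uppercase form of that domain, as in the EC2.INTERNAL example. The function names, the IP address, and the resource IDs in the comments are hypothetical.

```shell
#!/bin/bash
# Sketch: derive the default cluster domain name and realm for a region,
# and print the two DHCP configuration entries described in this step.

# ASSUMPTION: us-east-1 uses ec2.internal; other regions use REGION.compute.internal.
cluster_domain_name() {
  if [ "$1" = "us-east-1" ]; then
    echo "ec2.internal"
  else
    echo "$1.compute.internal"
  fi
}

# ASSUMPTION: the realm is the domain name in uppercase (for example, EC2.INTERNAL).
cluster_realm() {
  cluster_domain_name "$1" | tr '[:lower:]' '[:upper:]'
}

# Print one DHCP configuration entry per line for a region and the
# private IP address of the AD domain controller.
dhcp_configurations() {
  local region="$1" dc_private_ip="$2"
  printf 'Key=domain-name,Values=%s\n' "$(cluster_domain_name "$region")"
  printf 'Key=domain-name-servers,Values=%s,AmazonProvidedDNS\n' "$dc_private_ip"
}

# Example usage (hypothetical IP and IDs; requires the AWS CLI and credentials):
#   aws ec2 create-dhcp-options --dhcp-configurations \
#     "Key=domain-name,Values=ec2.internal" \
#     "Key=domain-name-servers,Values=10.0.0.10,AmazonProvidedDNS"
#   aws ec2 associate-dhcp-options --dhcp-options-id dopt-EXAMPLE --vpc-id vpc-EXAMPLE
```

Keeping AmazonProvidedDNS in the server list follows the instruction above to specify it alongside the domain controller's private IP address.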

Step 6: Launch a Kerberized EMR Cluster

  1. In Amazon EMR, create a security configuration that specifies the AD domain controller you created in the previous steps. An example command is shown below. Replace the domain, ad.domain.com, with the name of the domain you specified in Step 2: Launch and Install the AD Domain Controller.

    aws emr create-security-configuration --name MyKerberosConfig \
      --security-configuration '{
        "AuthenticationConfiguration": {
          "KerberosConfiguration": {
            "Provider": "ClusterDedicatedKdc",
            "ClusterDedicatedKdcConfiguration": {
              "TicketLifetimeInHours": 24,
              "CrossRealmTrustConfiguration": {
                "Realm": "AD.DOMAIN.COM",
                "Domain": "ad.domain.com",
                "AdminServer": "ad.domain.com",
                "KdcServer": "ad.domain.com"
              }
            }
          }
        }
      }'

  2. Create the cluster, specifying the security configuration (in this example, MyKerberosConfig) and the same subnet you created in Step 1: Set Up the VPC and Subnet.

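    Also specify the following cluster-specific kerberos-attributes: Realm (the realm for the cluster, EC2.INTERNAL in this example), KdcAdminPassword (the administrator password for the cluster-dedicated KDC), CrossRealmTrustPrincipalPassword (the cross-realm principal password, which must match the /passwordt value from Step 4), and ADDomainJoinUser and ADDomainJoinPassword (the logon name and password of the AD account with domain join privileges that you created in Step 3).

    The following example launches a kerberized cluster.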

    aws emr create-cluster --name "MyKerberosCluster" \
      --release-label emr-5.10.0 \
      --instance-type m3.xlarge \
      --instance-count 3 \
      --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole,KeyName=MyEC2KeyPair \
      --service-role EMR_DefaultRole \
      --security-configuration MyKerberosConfig \
      --applications Name=Hadoop Name=Hive Name=Oozie Name=Hue Name=HCatalog Name=Spark \
      --kerberos-attributes Realm=EC2.INTERNAL,KdcAdminPassword=MyClusterKDCAdminPwd,ADDomainJoinUser=ADUserLogonName,ADDomainJoinPassword=ADUserPassword,CrossRealmTrustPrincipalPassword=MatchADTrustPwd
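Because the security configuration repeats the AD domain name in several fields, it can help to generate the JSON from a single value. The following sketch assumes, as in the example above, that the realm is the uppercase form of the domain and that the domain controller also serves as the admin server and KDC server. The function name and output file name are hypothetical.

```shell
#!/bin/bash
# Sketch: print the security configuration JSON for a given AD domain name.
# ASSUMPTION: Realm is the uppercase domain, and AdminServer and KdcServer
# are the domain controller itself, as in the example in this step.

security_config_json() {
  local domain realm
  domain="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  realm="$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')"
  cat <<EOF
{
  "AuthenticationConfiguration": {
    "KerberosConfiguration": {
      "Provider": "ClusterDedicatedKdc",
      "ClusterDedicatedKdcConfiguration": {
        "TicketLifetimeInHours": 24,
        "CrossRealmTrustConfiguration": {
          "Realm": "${realm}",
          "Domain": "${domain}",
          "AdminServer": "${domain}",
          "KdcServer": "${domain}"
        }
      }
    }
  }
}
EOF
}

# Example usage (hypothetical file name; requires the AWS CLI and credentials):
#   security_config_json ad.domain.com > kerberos-config.json
#   aws emr create-security-configuration --name MyKerberosConfig \
#     --security-configuration file://kerberos-config.json
```

Writing the JSON to a file and passing it with file:// avoids shell quoting problems with the inline form.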

Step 7: Create HDFS Users and Set Permissions on the Cluster for AD User Accounts

When setting up a trust relationship with AD, Amazon EMR creates Linux users on the cluster for each AD user account. For example, the user logon name LiJuan in AD has a Linux user account of lijuan. AD logon names can contain uppercase letters, but the corresponding Linux user names do not preserve AD casing.

To allow your users to log in to the cluster to run Hadoop jobs, you must add HDFS user directories for their Linux user accounts and grant each user ownership of their directory. To do this, we recommend that you run a script saved to Amazon S3 as a cluster step. Alternatively, you can run the commands in the script below from the command line on the master node. Use the EC2 key pair that you specified when you created the cluster to connect to the master node over SSH as the hadoop user. For more information, see Use an Amazon EC2 Key Pair for SSH Credentials.

Run the following command to add a step to the cluster that runs a script, AddHDFSUsers.sh.

aws emr add-steps --cluster-id ClusterID \
  --steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,Jar=s3://MyRegion.elasticmapreduce/libs/script-runner/script-runner.jar,Args=["s3://MyBucketPath/AddHDFSUsers.sh"]

The contents of the file AddHDFSUsers.sh are as follows.

#!/bin/bash
# AddHDFSUsers.sh script

# Initialize an array of user names from AD
ADUSERS=("lijuan" "marymajor" "richardroe" "myusername")

# For each user listed, create an HDFS user directory
# and change ownership to the user
for username in "${ADUSERS[@]}"; do
  hdfs dfs -mkdir "/user/$username"
  hdfs dfs -chown "$username:$username" "/user/$username"
done
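If you keep a list of AD logon names in their original casing, you can convert them before filling in the ADUSERS array. The sketch below assumes that the Linux user name is simply the lowercase form of the AD logon name, as in the LiJuan example. It prints the HDFS commands instead of running them so that you can review them first. The function names and sample logon names are hypothetical.

```shell
#!/bin/bash
# Sketch: map AD logon names to Linux user names and print the HDFS commands.
# ASSUMPTION: the Linux user name is the AD logon name in lowercase.

linux_username() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

# Print (do not run) the mkdir and chown commands for each logon name given.
print_hdfs_commands() {
  local logon user
  for logon in "$@"; do
    user="$(linux_username "$logon")"
    echo "hdfs dfs -mkdir /user/$user"
    echo "hdfs dfs -chown $user:$user /user/$user"
  done
}

print_hdfs_commands LiJuan MaryMajor RichardRoe
```

After checking the printed commands, you could paste the lowercase names into the script above or run the commands directly on the master node.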

(Optional) Configure AD Group Mappings

Additional configuration is required to map AD groups to Hadoop, which is done through customizing core-site configuration classification properties. For more information, see Configuring Applications in the Amazon EMR Release Guide. An example configuration classification is shown below. For more information about the configuration properties, see Hadoop Groups Mapping in the Apache Hadoop documentation.

[
  {
    "Classification": "core-site",
    "Properties": {
      "hadoop.security.group.mapping": "org.apache.hadoop.security.CompositeGroupsMapping",
      "hadoop.security.group.mapping.providers": "ad4users,shell4services",
      "hadoop.security.group.mapping.provider.ad4users": "org.apache.hadoop.security.LdapGroupsMapping",
      "hadoop.security.group.mapping.provider.ad4users.ldap.url": "ldap://ad.test.com",
      "hadoop.security.group.mapping.provider.ad4users.ldap.bind.user": "Administrator@ad.test.com",
      "hadoop.security.group.mapping.provider.ad4users.ldap.bind.password": "Abc123456",
      "hadoop.security.group.mapping.provider.ad4users.ldap.base": "dc=ad,dc=test,dc=com",
      "hadoop.security.group.mapping.provider.ad4users.ldap.search.filter.user": "(&(objectClass=user)(sAMAccountName={0}))",
      "hadoop.security.group.mapping.provider.ad4users.ldap.search.filter.group": "(objectClass=group)",
      "hadoop.security.group.mapping.provider.ad4users.ldap.search.attr.member": "member",
      "hadoop.security.group.mapping.provider.ad4users.ldap.search.attr.group.name": "cn",
      "hadoop.security.group.mapping.provider.shell4services": "org.apache.hadoop.security.ShellBasedUnixGroupsMapping"
    }
  }
]

For the hadoop.security.group.mapping.provider.ad4users.ldap.bind.password property, you can use a password file to avoid exposing clear text in the general configuration setting parameters. To do this, specify the hadoop.security.group.mapping.provider.ad4users.ldap.bind.password.file property.

If an AD user belongs to multiple groups, this example configuration maps all of those groups into Hadoop. You can adjust the ldap.search configuration properties to narrow this down. We recommend starting with only one group per user to reduce complexity. Linux and HDFS assign only one group to each file or directory, although it is possible to enable multi-group access with ACLs on Hadoop.

Step 8: Use SSH to log in to the cluster

Users in the AD domain should now be able to log on to the cluster with their domain credentials. Linux users can connect using ssh as shown in the example below, replacing myusername with the user logon name from AD and replacing ec2-xx-xxx-xx-xx.compute-1.amazonaws.com with the Master public DNS value listed on the cluster's Summary page.

ssh myusername@ec2-xx-xxx-xx-xx.compute-1.amazonaws.com

Your Linux computer most likely includes an SSH client by default. For example, OpenSSH is installed on most Linux, Unix, and Mac OS X operating systems. You can check for an SSH client by typing ssh at the command line. If your computer does not recognize the command, install an SSH client to connect to the master node. The OpenSSH project provides a free implementation of the full suite of SSH tools. For more information, see the OpenSSH website.

Similarly, Windows users can use PuTTY, specifying myusername@ec2-xx-xxx-xx-xx.compute-1.amazonaws.com for the Host Name. Make sure that the default Attempt GSSAPI Authentication is still enabled under Connection, SSH, Auth, GSSAPI.

For more information about SSH connections, see Connect to the Cluster.