Use Amazon EMR clusters from Studio Classic notebooks - Amazon SageMaker

Use Amazon EMR clusters from Studio Classic notebooks

In this section, you learn how to discover, connect to, or terminate an Amazon EMR cluster from SageMaker Studio Classic notebooks.

When you connect to your Amazon EMR cluster from SageMaker Studio Classic, you can authenticate with Kerberos, Lightweight Directory Access Protocol (LDAP), or runtime IAM roles. The method you use depends on how your cluster is configured. For an example of setting up an Amazon EMR cluster that uses Kerberos, see Access Apache Livy using a Network Load Balancer on a Kerberos-enabled Amazon EMR cluster. Alternatively, see the CloudFormation example templates that use Kerberos or LDAP in the aws-samples/sagemaker-studio-emr GitHub repository.
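To check how an existing cluster is configured before choosing a method, you can inspect its DescribeCluster response. The following is a minimal sketch, assuming the dictionary shape returned by boto3's `describe_cluster` (`{"Cluster": {...}}`). It reports whether Kerberos attributes are present and which EMR security configuration, if any, is attached. The function name `cluster_auth_hints` is hypothetical, and the sketch does not detect LDAP.

```python
from typing import Any, Mapping


def cluster_auth_hints(describe_cluster_response: Mapping[str, Any]) -> dict:
    """Summarize authentication-related fields of an EMR DescribeCluster response.

    Assumption: the input has the shape returned by
    boto3.client("emr").describe_cluster(ClusterId=...), i.e. {"Cluster": {...}}.
    """
    cluster = describe_cluster_response.get("Cluster", {})
    return {
        # KerberosAttributes is expected only on clusters created with Kerberos.
        "kerberos": "KerberosAttributes" in cluster,
        # Name of the attached security configuration, or None. Runtime IAM role
        # support is enabled through a security configuration; LDAP is not detected here.
        "security_configuration": cluster.get("SecurityConfiguration"),
    }
```

In a notebook you might call `cluster_auth_hints(boto3.client("emr").describe_cluster(ClusterId="j-EXAMPLE"))`, where `j-EXAMPLE` is a placeholder cluster ID. To see whether runtime roles are enabled, inspect the named security configuration separately, for example with `describe_security_configuration`.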

For the connection commands available for each authentication method, see Enter the connection command to an Amazon EMR cluster manually.
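As a rough illustration of how the authentication method maps to a connection command, the following sketch assembles a `%sm_analytics emr connect` line. The flag names (`--cluster-id`, `--auth-type`, `--emr-execution-role-arn`, `--language`) and the `--auth-type` values (`None`, `Kerberos`, `Basic_Access`) reflect our understanding of sagemaker-studio-analytics-extension and are assumptions; verify them against the page referenced above. The helper `build_connect_command` and its method names (`none`, `kerberos`, `ldap`, `runtime_role`) are hypothetical.

```python
from typing import Optional

# Assumed mapping from authentication method to the extension's --auth-type value.
# Verify the exact values on the "Enter the connection command ... manually" page.
AUTH_TYPES = {
    "none": "None",
    "kerberos": "Kerberos",
    "ldap": "Basic_Access",
}


def build_connect_command(cluster_id: str, auth: str, language: str = "python",
                          execution_role_arn: Optional[str] = None) -> str:
    """Return a %sm_analytics connect line for a hypothetical auth method name."""
    parts = ["%sm_analytics", "emr", "connect", "--cluster-id", cluster_id]
    if auth == "runtime_role":
        # Runtime IAM roles: assumed to pass the role ARN with auth type None.
        if not execution_role_arn:
            raise ValueError("runtime_role authentication needs execution_role_arn")
        parts += ["--auth-type", "None", "--emr-execution-role-arn", execution_role_arn]
    elif auth in AUTH_TYPES:
        parts += ["--auth-type", AUTH_TYPES[auth]]
    else:
        raise ValueError(f"unknown auth method: {auth!r}")
    parts += ["--language", language]
    return " ".join(parts)
```

For example, `build_connect_command("j-EXAMPLE", "kerberos")` returns `%sm_analytics emr connect --cluster-id j-EXAMPLE --auth-type Kerberos --language python`, which you would run in a notebook cell after loading the extension (assumed to be `%load_ext sagemaker_studio_analytics_extension.magics`).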

Supported images and kernels to connect to an Amazon EMR cluster from SageMaker Studio Classic

SageMaker Studio Classic provides built-in support to connect to Amazon EMR clusters in the following images and kernels:

  • DataScience – Python 3 kernel

  • DataScience 2.0 – Python 3 kernel

  • DataScience 3.0 – Python 3 kernel

  • SparkAnalytics 1.0 – SparkMagic and PySpark kernels

  • SparkAnalytics 2.0 – SparkMagic and PySpark kernels

  • SparkMagic – SparkMagic and PySpark kernels

  • PyTorch 1.8 – Python 3 kernels

  • TensorFlow 2.6 – Python 3 kernel

  • TensorFlow 2.11 – Python 3 kernel

These images and kernels come with sagemaker-studio-analytics-extension, a notebook extension that connects notebooks to a remote Spark (Amazon EMR) cluster through the SparkMagic library using Apache Livy.
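SparkMagic talks to the cluster through Apache Livy's REST API. For illustration only (SparkMagic and the extension handle this for you, including authentication), the following sketch builds, but does not send, the request that creates an interactive PySpark session. `POST /sessions` and the `kind` field come from the Livy REST API; the helper name `livy_create_session_request` and the host name in the usage note are hypothetical.

```python
import json
import urllib.request


def livy_create_session_request(livy_url: str, kind: str = "pyspark") -> urllib.request.Request:
    """Build (but do not send) a Livy REST request that creates an interactive session."""
    body = json.dumps({"kind": kind}).encode("utf-8")
    return urllib.request.Request(
        url=livy_url.rstrip("/") + "/sessions",  # Livy's session-creation endpoint
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

For example, `livy_create_session_request("http://emr-primary.example:8998")` targets port 8998, Livy's default. Actually sending the request from Studio Classic would also require network access to the cluster and whatever Kerberos or LDAP credentials it expects.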

To connect to Amazon EMR clusters using another built-in image or your own image, follow the instructions in Bring your own image.

Bring your own image

To use your own image in SageMaker Studio Classic and let your notebooks connect to Amazon EMR clusters, install the following packages in your kernel. The sagemaker-studio-analytics-extension extension supports connecting SageMaker Studio Classic notebooks to Spark (Amazon EMR) clusters through the SparkMagic library.

pip install sparkmagic
pip install sagemaker-studio-sparkmagic-lib
pip install sagemaker-studio-analytics-extension
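After installing the packages, you can confirm from a notebook that the kernel can import them. This minimal sketch uses `importlib.util.find_spec`, which returns `None` for modules that cannot be found. The import names in `EMR_KERNEL_MODULES` are assumed to correspond to the three pip packages above, and the helper `missing_modules` is hypothetical.

```python
import importlib.util
from typing import Iterable, List

# Import names assumed to correspond to the three pip packages above.
EMR_KERNEL_MODULES = (
    "sparkmagic",
    "sagemaker_studio_sparkmagic_lib",
    "sagemaker_studio_analytics_extension",
)


def missing_modules(names: Iterable[str] = EMR_KERNEL_MODULES) -> List[str]:
    """Return the module names that cannot be found in the current kernel."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty list from `missing_modules()` suggests the image is ready. Any names it returns point to packages that were not installed into the kernel's Python environment.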

Additionally, to connect to Amazon EMR with Kerberos authentication, you must install the kinit client. The install command varies by operating system. For an Ubuntu (Debian-based) image, use the apt-get install -y -qq krb5-user command.
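Because the command depends on the operating system, an image build script might pick it from `/etc/os-release`. The following sketch returns the Debian/Ubuntu command given above. The Red Hat family mapping (`yum install -y krb5-workstation`, covering Amazon Linux via `ID=amzn`) is an assumption, not taken from this page. The helper names are hypothetical, and `kinit_available` only checks whether a `kinit` executable is on the `PATH`.

```python
import shutil
from typing import Optional


def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines in /etc/os-release format, stripping quotes."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def kinit_install_command(os_release_text: str) -> Optional[str]:
    """Return a kinit install command for the OS family, or None if unknown."""
    fields = parse_os_release(os_release_text)
    families = {fields.get("ID", "")} | set(fields.get("ID_LIKE", "").split())
    if families & {"debian", "ubuntu"}:
        return "apt-get install -y -qq krb5-user"  # command given on this page
    if families & {"rhel", "fedora", "centos", "amzn"}:
        return "yum install -y krb5-workstation"  # assumption: Red Hat family package
    return None


def kinit_available() -> bool:
    """True if a kinit executable is found on PATH."""
    return shutil.which("kinit") is not None
```

For example, `kinit_install_command(open("/etc/os-release").read())` returns the Debian command on an Ubuntu image, and `None` for distributions the sketch does not cover.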

For more information on bringing your own image in SageMaker Studio Classic, see Bring your own SageMaker image.