Amazon EMR Studio

Amazon EMR Studio is a web-based integrated development environment (IDE) for fully managed Jupyter notebooks that run on Amazon EMR clusters. You can set up an EMR Studio for your team to develop, visualize, and debug applications written in R, Python, Scala, and PySpark. EMR Studio is integrated with AWS Identity and Access Management (IAM) and IAM Identity Center so users can log in using their corporate credentials.

You can create an EMR Studio at no cost. Standard charges for Amazon S3 storage and Amazon EMR clusters apply when you use EMR Studio. For product details and highlights, see the service page for Amazon EMR Studio.
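As a rough illustration of what setting up a Studio involves, the following Python sketch assembles the arguments for the Amazon EMR CreateStudio API and passes them to the boto3 `create_studio` call. The helper names `build_create_studio_request` and `create_studio` are hypothetical, every resource ID you pass in is a placeholder you must supply, and the validation shown is an assumption for illustration rather than the full set of rules the service enforces.

```python
from typing import Any, Dict, List, Optional


def build_create_studio_request(
    name: str,
    auth_mode: str,
    vpc_id: str,
    subnet_ids: List[str],
    service_role: str,
    workspace_security_group_id: str,
    engine_security_group_id: str,
    default_s3_location: str,
    user_role: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for the EMR CreateStudio API (hypothetical helper)."""
    # CreateStudio accepts "IAM" or "SSO" (IAM Identity Center) as the auth mode.
    if auth_mode not in ("IAM", "SSO"):
        raise ValueError("auth_mode must be 'IAM' or 'SSO'")
    # Illustrative local check: Workspaces and notebook files are backed up to S3.
    if not default_s3_location.startswith("s3://"):
        raise ValueError("default_s3_location must be an s3:// URI")

    request: Dict[str, Any] = {
        "Name": name,
        "AuthMode": auth_mode,
        "VpcId": vpc_id,
        "SubnetIds": list(subnet_ids),
        "ServiceRole": service_role,
        "WorkspaceSecurityGroupId": workspace_security_group_id,
        "EngineSecurityGroupId": engine_security_group_id,
        "DefaultS3Location": default_s3_location,
    }
    # A user role is only relevant with IAM Identity Center authentication.
    if user_role is not None:
        request["UserRole"] = user_role
    return request


def create_studio(request: Dict[str, Any]) -> str:
    """Send the request to Amazon EMR and return the new Studio ID."""
    import boto3  # imported lazily so the builder works without AWS access

    emr = boto3.client("emr")
    response = emr.create_studio(**request)
    return response["StudioId"]
```

Keeping the request builder separate from the API call lets you review or unit test the configuration before anything is created in your account.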

Key features of EMR Studio

Amazon EMR Studio provides the following features:

  • Authenticate users with AWS Identity and Access Management (IAM), or with AWS IAM Identity Center (with or without trusted identity propagation) and your enterprise identity provider.

  • Access and launch Amazon EMR clusters on demand to run Jupyter Notebook jobs.

  • Connect to Amazon EMR on EKS clusters to submit work as job runs.

  • Explore and save example notebooks. For more information about example notebooks, see the EMR Studio Notebook examples GitHub repository.

  • Analyze data using Python, PySpark, Spark Scala, Spark R, or SparkSQL, and install custom kernels and libraries.

  • Collaborate in real time with other users in the same Workspace. For more information, see Configure Workspace collaboration.

  • Use the EMR Studio SQL Explorer to browse your data catalog, run SQL queries, and download results before you work with the data in a notebook.

  • Run parameterized notebooks as part of scheduled workflows with an orchestration tool such as Apache Airflow or Amazon Managed Workflows for Apache Airflow. For more information, see Orchestrating analytics jobs on EMR Notebooks using MWAA in the AWS Big Data Blog.

  • Link code repositories such as GitHub and Bitbucket.

  • Track and debug jobs using the Spark History Server, Tez UI, or YARN timeline server.
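To make the parameterized-notebook feature more concrete, here is a minimal Python sketch of how an orchestration task (for example, an Airflow task) might start a notebook execution through the boto3 `start_notebook_execution` call. The helper names, the default execution name, and all IDs and paths are hypothetical placeholders; treat this as one possible approach under those assumptions, not the method described in the referenced blog post.

```python
import json
from typing import Any, Dict


def build_notebook_execution_request(
    workspace_id: str,
    notebook_path: str,
    cluster_id: str,
    service_role: str,
    parameters: Dict[str, Any],
    execution_name: str = "scheduled-run",
) -> Dict[str, Any]:
    """Assemble keyword arguments for the EMR StartNotebookExecution API (hypothetical helper)."""
    return {
        # The Workspace (editor) that contains the notebook file.
        "EditorId": workspace_id,
        # Path of the notebook file relative to the Workspace.
        "RelativePath": notebook_path,
        "NotebookExecutionName": execution_name,
        # Notebook input parameters are passed as a JSON-formatted string.
        "NotebookParams": json.dumps(parameters, sort_keys=True),
        # Run the notebook on an existing Amazon EMR cluster.
        "ExecutionEngine": {"Id": cluster_id, "Type": "EMR"},
        "ServiceRole": service_role,
    }


def start_notebook_execution(request: Dict[str, Any]) -> str:
    """Start the execution and return its ID."""
    import boto3  # imported lazily so the builder works without AWS access

    emr = boto3.client("emr")
    response = emr.start_notebook_execution(**request)
    return response["NotebookExecutionId"]
```

A scheduler could call `build_notebook_execution_request` with a different `parameters` dictionary on each run, for instance a run date, and then pass the result to `start_notebook_execution`.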

EMR Studio is also HIPAA eligible and is certified under HITRUST CSF and SOC 2. For more information about HIPAA compliance for AWS services, see https://aws.amazon.com/compliance/hipaa-compliance/. To learn more about HITRUST CSF compliance for AWS services, see https://aws.amazon.com/compliance/hitrust/. For more information about other compliance programs for AWS services, see AWS Services in Scope by Compliance Program.

Amazon EMR Studio feature history

This table lists updates to Amazon EMR Studio capabilities.

Release date         Capability
January 5, 2024      Added support for EMR Studio in AWS GovCloud (US-East) and AWS GovCloud (US-West).
November 26, 2023    Added support for trusted identity propagation for EMR Studio with IAM Identity Center authentication.
October 26, 2023     Added ability to create an EMR Serverless application with interactive capability.
February 28, 2023    Added AWS KMS customer-managed key support for application log storage for EMR Serverless applications.
February 23, 2023    Added one-click IAM role creation for EMR Serverless job submission. Added ECR lookup for when you select a custom image for EMR Serverless applications.
January 27, 2023     Headless execution notebooks can track the progress of each cell execution with the %execute_notebook magic.
January 23, 2023     Persistent applications have been optimized for faster launch times.