What's New?

This topic covers new features and resolved issues in the current releases of the Amazon EMR 6.x and 5.x series. These release notes are also available on the Release 6.0.0 Tab and Release 5.30.0 Tab, along with the application versions, component versions, and available configuration classifications for each release.

Subscribe to the RSS feed for Amazon EMR release notes at https://docs.aws.amazon.com/emr/latest/ReleaseGuide/amazon-emr-release-notes.rss to receive updates when a new Amazon EMR release version is available.

For earlier release notes going back to release version 4.2.0, see Amazon EMR What's New History.

Note

Twenty-five previous Amazon EMR release versions now use AWS Signature Version 4 to authenticate requests to Amazon S3. AWS Signature Version 2 is being phased out, and new S3 buckets created after June 24, 2020 will not support Signature Version 2 signed requests. Existing buckets will continue to support Signature Version 2. We recommend migrating to an Amazon EMR release that supports Signature Version 4 so that you can continue to access new S3 buckets and avoid any potential interruption to your workloads.

The following EMR releases that support Signature Version 4 are now available: emr-4.7.4, emr-4.8.5, emr-4.9.6, emr-4.10.1, emr-5.1.1, emr-5.2.3, emr-5.3.2, emr-5.4.1, emr-5.5.4, emr-5.6.1, emr-5.7.1, emr-5.8.3, emr-5.9.1, emr-5.10.1, emr-5.11.4, emr-5.12.3, emr-5.13.1, emr-5.14.2, emr-5.15.1, emr-5.16.1, emr-5.17.2, emr-5.18.1, emr-5.19.1, emr-5.20.1, and emr-5.21.2. EMR version 5.22.0 and later already support Signature Version 4.

You do not need to change your application code to use Signature Version 4 if you are using applications included with Amazon EMR, such as Apache Spark, Apache Hive, and Presto. If you are using custom applications that are not included with Amazon EMR, you may need to update your code to use Signature Version 4. For more information about what updates may be required, see Moving from Signature Version 2 to Signature Version 4.
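As a quick way to apply the release list in this note, the following Python sketch checks whether a release label is one of the patched releases named above or is version 5.22.0 or later. The function names are hypothetical, and the check only knows about the labels listed in this note, so treat a False result as "not confirmed by this note" rather than a definitive answer.

```python
# Patched release labels copied from the note above.
PATCHED_SIGV4_RELEASES = {
    "emr-4.7.4", "emr-4.8.5", "emr-4.9.6", "emr-4.10.1",
    "emr-5.1.1", "emr-5.2.3", "emr-5.3.2", "emr-5.4.1", "emr-5.5.4",
    "emr-5.6.1", "emr-5.7.1", "emr-5.8.3", "emr-5.9.1", "emr-5.10.1",
    "emr-5.11.4", "emr-5.12.3", "emr-5.13.1", "emr-5.14.2", "emr-5.15.1",
    "emr-5.16.1", "emr-5.17.2", "emr-5.18.1", "emr-5.19.1", "emr-5.20.1",
    "emr-5.21.2",
}


def parse_release(label):
    """Turn a label such as 'emr-5.22.0' into the tuple (5, 22, 0)."""
    prefix = "emr-"
    if not label.startswith(prefix):
        raise ValueError(f"unexpected release label: {label!r}")
    return tuple(int(part) for part in label[len(prefix):].split("."))


def supports_sigv4(label):
    """Return True if this note says the release supports Signature Version 4."""
    if label in PATCHED_SIGV4_RELEASES:
        return True
    # Version 5.22.0 and later already support Signature Version 4.
    return parse_release(label) >= (5, 22, 0)


for release in ("emr-5.21.0", "emr-5.21.2", "emr-6.0.0"):
    print(release, supports_sigv4(release))
```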

Release 6.0.0 (Latest version of Amazon EMR 6.x series)

New Amazon EMR release versions are made available in different regions over a period of several days, beginning with the first region on the initial release date. The latest release version may not be available in your region during this period.

The following release notes include information for Amazon EMR release version 6.0.0.

Initial release date: March 10, 2020

Supported Applications

  • AWS SDK for Java version 1.11.711

  • Ganglia version 3.7.2

  • Hadoop version 3.2.1

  • HBase version 2.2.3

  • HCatalog version 3.1.2

  • Hive version 3.1.2

  • Hudi version 0.5.0-incubating

  • Hue version 4.4.0

  • JupyterHub version 1.0.0

  • Livy version 0.6.0

  • MXNet version 1.5.1

  • Oozie version 5.1.0

  • Phoenix version 5.0.0

  • Presto version 0.230

  • Spark version 2.4.4

  • TensorFlow version 1.14.0

  • Zeppelin version 0.9.0-SNAPSHOT

  • Zookeeper version 3.4.14

  • Connectors and drivers: DynamoDB Connector 4.14.0

Note

Flink, Sqoop, Pig, and Mahout are not available in Amazon EMR version 6.0.0.

New Features

  • YARN Docker Runtime Support - YARN applications, such as Spark jobs, can now run in the context of a Docker container. This allows you to easily define dependencies in a Docker image without the need to install custom libraries on your Amazon EMR cluster. For more information, see Configure Docker Integration and Run Spark applications with Docker using Amazon EMR 6.0.0.

  • Hive LLAP Support - Hive now supports the LLAP execution mode for improved query performance. For more information, see Using Hive LLAP.
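To illustrate the YARN Docker runtime feature, the following Python sketch assembles spark-submit arguments that ask YARN to run the Spark driver and executors inside a Docker image. It relies on the YARN container runtime environment variables (YARN_CONTAINER_RUNTIME_TYPE and YARN_CONTAINER_RUNTIME_DOCKER_IMAGE) passed through Spark's spark.executorEnv.* and spark.yarn.appMasterEnv.* settings. The helper name, the image name, and the job file are hypothetical, and the cluster is assumed to already trust the registry that hosts the image; see Configure Docker Integration for the cluster-side setup.

```python
def docker_spark_conf(image):
    """Build --conf arguments that request the YARN Docker runtime for a Spark job."""
    runtime_env = {
        "YARN_CONTAINER_RUNTIME_TYPE": "docker",
        "YARN_CONTAINER_RUNTIME_DOCKER_IMAGE": image,
    }
    args = []
    # Apply the same environment to the executors and to the application master.
    for prefix in ("spark.executorEnv.", "spark.yarn.appMasterEnv."):
        for name, value in runtime_env.items():
            args += ["--conf", f"{prefix}{name}={value}"]
    return args


# Hypothetical image and script names, for illustration only.
command = [
    "spark-submit", "--master", "yarn", "--deploy-mode", "cluster",
    *docker_spark_conf("example-registry/spark-deps:latest"),
    "job.py",
]
print(" ".join(command))
```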

Changes, Enhancements, and Resolved Issues

  • Amazon Linux

    • Amazon Linux 2 is the operating system for the EMR 6.x release series.

    • systemd is used for service management, replacing the upstart system used in Amazon Linux 1.

  • Java Development Kit (JDK)

    • Corretto JDK 8 is the default JDK for the EMR 6.x release series.

  • Scala

    • Scala 2.12 is used with Apache Spark and Apache Livy.

  • Python 3

    • Python 3 is now the default version of Python in EMR.

  • YARN node labels

    • Beginning with the Amazon EMR 6.x release series, the YARN node labels feature is disabled by default. The application master processes can run on both core and task nodes by default. You can enable the YARN node labels feature by configuring the following properties: yarn.node-labels.enabled and yarn.node-labels.am.default-node-label-expression. For more information, see Understanding Master, Core, and Task Nodes.
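As an illustration of the YARN node labels item above, the following Python sketch builds a configuration object that sets the two named properties. The yarn-site classification and the CORE label value are assumptions that are not stated in these notes; they are chosen here to restrict application masters to core nodes, so confirm both against the configuration reference for your release before using them.

```python
import json


def yarn_node_label_configuration(am_label="CORE"):
    """Build a configuration list that enables YARN node labels.

    Assumption: the properties belong to the 'yarn-site' classification and
    'CORE' is the label that pins application masters to core nodes.
    """
    return [
        {
            "Classification": "yarn-site",
            "Properties": {
                "yarn.node-labels.enabled": "true",
                "yarn.node-labels.am.default-node-label-expression": am_label,
            },
        }
    ]


# Print JSON that could be supplied as cluster configuration at creation time.
print(json.dumps(yarn_node_label_configuration(), indent=2))
```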

Known Issues

  • Spark interactive shell, including PySpark, SparkR, and spark-shell, does not support using Docker with additional libraries.

Release 5.30.0 (Latest version of Amazon EMR 5.x series)

New Amazon EMR release versions are made available in different regions over a period of several days, beginning with the first region on the initial release date. The latest release version may not be available in your region during this period.

The following release notes include information for Amazon EMR release version 5.30.0. Changes are relative to 5.29.0.

Initial release date: May 13, 2020

Upgrades

  • Upgraded AWS SDK for Java to version 1.11.759

  • Upgraded Amazon SageMaker Spark SDK to version 1.3.0

  • Upgraded EMR Record Server to version 1.6.0

  • Upgraded Flink to version 1.10.0

  • Upgraded Ganglia to version 3.7.2

  • Upgraded HBase to version 1.4.13

  • Upgraded Hudi to version 0.5.2-incubating

  • Upgraded Hue to version 4.6.0

  • Upgraded JupyterHub to version 1.1.0

  • Upgraded Livy to version 0.7.0-incubating

  • Upgraded Oozie to version 5.2.0

  • Upgraded Presto to version 0.232

  • Upgraded Spark to version 2.4.5

  • Upgraded Connectors and drivers: Amazon Glue Connector 1.12.0; Amazon Kinesis Connector 3.5.0; EMR DynamoDB Connector 4.14.0

New Features

  • Managed Scaling – With Amazon EMR version 5.30.0 and later, you can enable EMR managed scaling to automatically increase or decrease the number of instances or units in your cluster based on workload. EMR continuously evaluates cluster metrics to make scaling decisions that optimize your clusters for cost and speed. For more information, see Scaling Cluster Resources in the Amazon EMR Management Guide.

  • Amazon Linux 2 support – In EMR version 5.30.0 and later, EMR uses the Amazon Linux 2 operating system. New custom AMIs (Amazon Machine Images) must be based on the Amazon Linux 2 AMI. For more information, see Using a Custom AMI.

  • Presto Graceful Auto Scale – EMR clusters using 5.30.0 can be configured with an auto scaling timeout period that gives Presto tasks time to finish running before their node is decommissioned. For more information, see Using Presto Auto Scaling with Graceful Decommission.
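To make the managed scaling feature more concrete, the following Python sketch builds a policy document that bounds how far a cluster can scale. The field names (ComputeLimits, UnitType, MinimumCapacityUnits, MaximumCapacityUnits) follow the managed scaling policy shape as we understand it from the Amazon EMR API; the helper name and the limits are hypothetical, so check the API reference before relying on this structure.

```python
ALLOWED_UNIT_TYPES = {"Instances", "InstanceFleetUnits", "VCPU"}


def managed_scaling_policy(min_units, max_units, unit_type="Instances"):
    """Build a managed scaling policy dict with basic sanity checks."""
    if unit_type not in ALLOWED_UNIT_TYPES:
        raise ValueError(f"unsupported unit type: {unit_type!r}")
    if not 1 <= min_units <= max_units:
        raise ValueError("expected 1 <= min_units <= max_units")
    return {
        "ComputeLimits": {
            "UnitType": unit_type,
            "MinimumCapacityUnits": min_units,
            "MaximumCapacityUnits": max_units,
        }
    }


# Hypothetical limits: keep the cluster between 2 and 10 instances.
policy = managed_scaling_policy(2, 10)
print(policy)
```

The resulting dictionary is the kind of structure that would be passed as the managed scaling policy when creating or updating a cluster.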

Changes, Enhancements, and Resolved Issues

  • EMR version 5.30.0 doesn't install Ganglia by default. You can explicitly select Ganglia to install when you create a cluster.

  • Spark performance optimizations.

  • Presto performance optimizations.

  • The default managed security group for service access in private subnets has been updated with new rules. If you use a custom security group for service access, you must include the same rules as the default managed security group. For more information, see Amazon EMR-Managed Security Group for Service Access (Private Subnets). If you use a custom service role for Amazon EMR, you must grant permission to ec2:describeSecurityGroups so that EMR can validate whether the security groups are correctly created. If you use the EMR_DefaultRole, this permission is already included in the default managed policy.
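For a custom service role, the permission described above could be granted with a policy statement like the one this Python sketch generates. This is a minimal illustration, not the full policy a service role needs: the helper name is hypothetical, and the statement uses "*" as the resource on the assumption that describe actions cannot be scoped to individual security groups. The action is written in its canonical casing, ec2:DescribeSecurityGroups; IAM matches action names case-insensitively.

```python
import json


def describe_security_groups_policy():
    """Build an IAM policy document that allows ec2:DescribeSecurityGroups."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "ec2:DescribeSecurityGroups",
                # Assumption: describe actions are granted on all resources.
                "Resource": "*",
            }
        ],
    }


# Print JSON that could be attached to a custom service role as an inline policy.
print(json.dumps(describe_security_groups_policy(), indent=2))
```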