What's new?
This page describes the changes and functionality available in the latest releases of Amazon EMR 6.x and Amazon EMR 5.x. These release notes are also available on the Amazon EMR release 6.15.0 page and Amazon EMR release 5.36.1 page, along with the application versions, component versions, and available configuration classifications for each release.
Subscribe to the RSS feed for Amazon EMR release notes at https://docs.aws.amazon.com/emr/latest/ReleaseGuide/amazon-emr-release-notes.rss to receive updates when a new Amazon EMR release is available.
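If you would rather poll the feed from a script than use a feed reader, the following minimal sketch shows one way to list entries with the Python standard library. It assumes the feed is a plain RSS 2.0 document with `item`, `title`, and `link` elements, which this page does not confirm. The function names and the `SAMPLE` document are hypothetical and exist only for illustration.

```python
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "https://docs.aws.amazon.com/emr/latest/ReleaseGuide/amazon-emr-release-notes.rss"


def parse_feed(xml_text):
    """Return (title, link) pairs for each <item> in an RSS 2.0 document.

    Assumption: un-namespaced RSS 2.0 element names (item/title/link).
    """
    root = ET.fromstring(xml_text)
    entries = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        entries.append((title, link))
    return entries


def fetch_feed(url=FEED_URL, timeout=10):
    """Download the feed body as text (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Hypothetical sample document, used so the parser can run offline.
SAMPLE = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Example feed</title>
<item><title>Amazon EMR release 6.15.0</title><link>https://example.com/emr-6150</link></item>
</channel></rss>"""

if __name__ == "__main__":
    for title, link in parse_feed(SAMPLE):
        print(title, link)
```

To read the live feed, pass the result of `fetch_feed()` to `parse_feed()` instead of `SAMPLE`.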
For release notes from prior releases, see the Amazon EMR archive of release notes.
Note
Amazon EMR releases now use AWS Signature Version 4 (SigV4) to authenticate requests to Amazon S3. We recommend that you use an Amazon EMR release that supports SigV4 so that you can access new S3 buckets and avoid interruption to your workloads. For more information and a list of Amazon EMR releases that support SigV4, see Amazon EMR and AWS Signature Version 4.
Amazon EMR 6.15.0 (latest release of 6.x series)
New Amazon EMR releases are made available in different Regions over a period of several days, beginning with the first Region on the initial release date. The latest release version may not be available in your Region during this period.
The following release notes include information for Amazon EMR release 6.15.0. Changes are relative to 6.14.0. For information on the release timeline, see the 6.15.0 change log.
New features
- Application upgrades – Amazon EMR 6.15.0 application upgrades include Apache Hadoop 3.3.6, Apache Hudi 0.14.0-amzn-0, Iceberg 1.4.0-amzn-0, and Trino 426.
- CodeWhisperer for EMR Studio – You can now use Amazon CodeWhisperer with Amazon EMR Studio to get real-time recommendations as you write code in JupyterLab. CodeWhisperer can complete your comments, finish single lines of code, make line-by-line recommendations, and generate fully-formed functions.
- Faster job restart times with Flink – With Amazon EMR 6.15.0 and higher, several new mechanisms are available for Apache Flink to improve the job restart time during task recovery or scaling operations. This optimizes the speed of recovery and restart of execution graphs to improve job stability.
- Table-level and fine-grained access control for open-table formats – With Amazon EMR 6.15.0 and higher, when you run Spark jobs on Amazon EMR on EC2 clusters that access data in the AWS Glue Data Catalog, you can use AWS Lake Formation to apply table, row, column, and cell level permissions on Hudi, Iceberg, or Delta Lake based tables.
- Hadoop upgrade – Amazon EMR 6.15.0 includes an upgrade of Apache Hadoop to version 3.3.6. Hadoop 3.3.6 was the latest version at the time of the Amazon EMR 6.15 deployment, released by Apache in June 2023. Prior releases of Amazon EMR (6.9.0 to 6.14.x) used Hadoop 3.3.3.
  The upgrade includes hundreds of improvements and fixes, and features that include reconfigurable datanode parameters, a DFSAdmin option to initiate bulk reconfiguration operations on all live datanodes, and a vectored API that allows seek-heavy readers to specify multiple ranges to read. Hadoop 3.3.6 also adds support for HDFS APIs and semantics for its write-ahead log (WAL), so that HBase can run on other storage system implementations. For more information, see the changelogs for versions 3.3.4, 3.3.5, and 3.3.6 in the Apache Hadoop documentation.
- Support for AWS SDK for Java, version 2 – Amazon EMR 6.15.0 applications can use AWS SDK for Java versions 1.12.569 or 2.20.160 if the application supports v2. The AWS SDK for Java 2.x is a major rewrite of the version 1.x code base. It’s built on top of Java 8+ and adds several frequently requested features. These include support for non-blocking I/O, and the ability to plug in a different HTTP implementation at runtime. For more information, including a Migration Guide from SDK for Java v1 to v2, see the AWS SDK for Java, version 2 guide.
Changes, enhancements, and resolved issues
To improve your high-availability EMR clusters, this release enables connectivity to Amazon EMR daemons on the local host that use IPv6 endpoints.
This release enables TLS 1.2 for communication with ZooKeeper provisioned on all the primary nodes of your high-availability cluster.
This release improves the management of ZooKeeper transaction log files that are maintained on primary nodes to minimize scenarios where the log files grow out of bounds and interrupt cluster operations.
This release makes intra-node communication more resilient for high-availability EMR clusters. This improvement reduces the chance of bootstrap action failures or cluster start failures.
Tez in Amazon EMR 6.15.0 introduces configurations that you can specify to asynchronously open the input splits in a Tez grouped split. This results in faster performance of read queries when there are a large number of input splits in a single Tez grouped split. For more information, see Tez asynchronous split opening.
When you launch a cluster with the latest patch release of Amazon EMR 5.36 or higher, or 6.6 or higher, Amazon EMR uses the latest Amazon Linux 2 release for the default Amazon EMR AMI. For more information, see Using the default Amazon Linux AMI for Amazon EMR.
| OsReleaseLabel (Amazon Linux version) | Amazon Linux kernel version | Available date | Supported Regions |
| --- | --- | --- | --- |
| 2.0.20231101.0 | 4.14.327 | November 13, 2023 | US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), Europe (Stockholm), Europe (Milan), Europe (Spain), Europe (Frankfurt), Europe (Zurich), Europe (Ireland), Europe (London), Europe (Paris), Asia Pacific (Hong Kong), Asia Pacific (Mumbai), Asia Pacific (Hyderabad), Asia Pacific (Tokyo), Asia Pacific (Seoul), Asia Pacific (Osaka), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Jakarta), Asia Pacific (Melbourne), Africa (Cape Town), South America (São Paulo), Middle East (Bahrain), Middle East (UAE), Canada (Central), Israel (Tel Aviv) |
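As a rough illustration of how a release label and an Amazon Linux release fit into a cluster launch, the sketch below builds a request dictionary shaped for the EMR `RunJobFlow` API. It makes several assumptions that do not come from this page: the helper name `build_run_job_flow_request`, the cluster name, the `m5.xlarge` instance type, the instance counts, the default role names, and the use of the `OSReleaseLabel` request field to pin an Amazon Linux release. Check the field names against the current API reference before relying on them.

```python
def build_run_job_flow_request(release_label, os_release_label=None):
    """Build keyword arguments for an EMR RunJobFlow call (illustrative only).

    release_label:     an Amazon EMR release, for example "emr-6.15.0".
    os_release_label:  optional Amazon Linux release to pin; when omitted,
                       Amazon EMR picks the default AMI for the release.
    """
    request = {
        "Name": "release-notes-example",          # hypothetical cluster name
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                # Instance types and counts are placeholders, not recommendations.
                {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",     # assumed default instance profile
        "ServiceRole": "EMR_DefaultRole",         # assumed default service role
    }
    if os_release_label is not None:
        # Assumption: OSReleaseLabel is the request field that pins the OS release.
        request["OSReleaseLabel"] = os_release_label
    return request


if __name__ == "__main__":
    print(build_run_job_flow_request("emr-6.15.0", "2.0.20231101.0"))
```

With boto3 installed and credentials configured, the dictionary could be passed as `boto3.client("emr").run_job_flow(**request)`. The sketch deliberately stops short of making that call.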
Amazon EMR 5.36.1 (latest release of 5.x series)
New Amazon EMR releases are made available in different Regions over a period of several days, beginning with the first Region on the initial release date. The latest release version may not be available in your Region during this period.
The following release notes include information for Amazon EMR release 5.36.1. Changes are relative to 5.36.0. For information on the release timeline, see the change log.
Changes, enhancements, and resolved issues
Amazon EMR release 5.36.1 adds support for archiving logs to Amazon S3 during cluster scale-down. In previous 5.x releases, you could only archive log files to Amazon S3 during cluster termination. This improvement ensures that log files generated on the cluster persist on Amazon S3 even after the node is terminated. For more information, see Configure cluster logging and debugging.
The 5.36.1 release improves the on-cluster log management daemon to monitor additional log folders in your EMR cluster. This improvement minimizes disk over-utilization scenarios.
The 5.36.1 release automatically restarts the on-cluster log management daemon when it stops. This improvement reduces the risk that nodes appear unhealthy due to disk over-utilization.
The 5.36.1 release fixes an issue where Amazon EMR daemons on the primary node would maintain stale metadata for terminated instances in the cluster. Maintaining stale data might cause on-cluster CPU and memory usage to grow without bounds, and ultimately cause cluster failures.
For clusters that are launched with multiple primary nodes, the 5.36.1 release fixes an issue where an Amazon EC2 hardware failure on one of the primary nodes could cause a second primary node to fail and render your cluster unstable.
For clusters that are configured with in-transit encryption, Managed Scaling is now Spark shuffle data aware. Spark shuffle data is data that Spark redistributes across partitions to perform specific operations. During scale-down, Managed Scaling ignores the instances with shuffle data. This prevents job re-attempts and re-computations, which are costly in both price and performance. For more information on shuffle operations, see the Spark Programming Guide.
When you launch a cluster with the latest patch release of Amazon EMR 5.36 or higher, or 6.6 or higher, Amazon EMR uses the latest Amazon Linux 2 release for the default Amazon EMR AMI. For more information, see Using the default Amazon Linux AMI for Amazon EMR.
| OsReleaseLabel (Amazon Linux Version) | Amazon Linux Kernel Version | Available Date | Supported Regions |
| --- | --- | --- | --- |
| 2.0.20230727.0 | 4.14.320 | August 14, 2023 | US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), Europe (Stockholm), Europe (Milan), Europe (Spain), Europe (Frankfurt), Europe (Zurich), Europe (Ireland), Europe (London), Europe (Paris), Asia Pacific (Hong Kong), Asia Pacific (Mumbai), Asia Pacific (Hyderabad), Asia Pacific (Tokyo), Asia Pacific (Seoul), Asia Pacific (Osaka), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Jakarta), Asia Pacific (Melbourne), Africa (Cape Town), South America (São Paulo), Middle East (Bahrain), Middle East (UAE), Canada (Central), Israel (Tel Aviv) |
| 2.0.20230719.0 | 4.14.320 | August 2, 2023 | US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), Europe (Stockholm), Europe (Milan), Europe (Spain), Europe (Frankfurt), Europe (Zurich), Europe (Ireland), Europe (London), Europe (Paris), Asia Pacific (Hong Kong), Asia Pacific (Mumbai), Asia Pacific (Hyderabad), Asia Pacific (Tokyo), Asia Pacific (Seoul), Asia Pacific (Osaka), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Jakarta), Asia Pacific (Melbourne), Africa (Cape Town), South America (São Paulo), Middle East (Bahrain), Middle East (UAE), Canada (Central), Israel (Tel Aviv) |
| 2.0.20230628.0 | 4.14.318 | July 12, 2023 | US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), Canada (Central), Europe (Stockholm), Europe (Ireland), Europe (London), Europe (Paris), Europe (Frankfurt), Europe (Milan), Asia Pacific (Hong Kong), Asia Pacific (Mumbai), Asia Pacific (Jakarta), Asia Pacific (Tokyo), Asia Pacific (Seoul), Asia Pacific (Osaka), Asia Pacific (Singapore), Asia Pacific (Sydney), Africa (Cape Town), South America (São Paulo), Middle East (Bahrain) |
| 2.0.20230612.0 | 4.14.314 | June 23, 2023 | US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), Canada (Central), Europe (Stockholm), Europe (Ireland), Europe (London), Europe (Paris), Europe (Frankfurt), Europe (Milan), Asia Pacific (Hong Kong), Asia Pacific (Mumbai), Asia Pacific (Jakarta), Asia Pacific (Tokyo), Asia Pacific (Seoul), Asia Pacific (Osaka), Asia Pacific (Singapore), Asia Pacific (Sydney), Africa (Cape Town), South America (São Paulo), Middle East (Bahrain) |
| 2.0.20230404.1 | 4.14.311 | April 18, 2023 | US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), Europe (Stockholm), Europe (Ireland), Europe (London), Europe (Paris), Europe (Frankfurt), Europe (Milan), Asia Pacific (Hong Kong), Asia Pacific (Mumbai), Asia Pacific (Jakarta), Asia Pacific (Tokyo), Asia Pacific (Seoul), Asia Pacific (Osaka), Asia Pacific (Singapore), Asia Pacific (Sydney), Africa (Cape Town), South America (São Paulo), Middle East (Bahrain), Canada (Central) |
Amazon EMR and AWS Signature Version 4
Amazon EMR releases now use AWS Signature Version 4 (SigV4) to authenticate requests to Amazon S3. Buckets created in Amazon S3 after June 24, 2020 don't support requests signed by Signature Version 2 (SigV2). Buckets created on or before June 24, 2020 will continue to support SigV2. We recommend that you migrate to an Amazon EMR release that supports SigV4 so that you can access new S3 buckets and avoid interruption to your workloads.
If you use applications that are included with Amazon EMR such as Apache Spark, Apache Hive, and Presto, you don't need to change your application code to use SigV4. If you use custom applications that are not included with Amazon EMR, you might need to update your code to use SigV4. For more information, see Moving from Signature Version 2 to Signature Version 4 in the Amazon S3 User Guide.
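For custom code that signs its own requests rather than relying on an AWS SDK, the step that most visibly separates SigV4 from SigV2 is the derived signing key. The sketch below shows that derivation with the Python standard library. It is illustrative only: the function names are hypothetical, the secret key and date are placeholders, and it omits the canonical request and string-to-sign steps that a complete SigV4 implementation needs. In most cases, an up-to-date AWS SDK handles all of this for you.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    """Return the raw HMAC-SHA256 digest of message under key."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str, region: str, service: str = "s3") -> bytes:
    """Derive a SigV4 signing key.

    date_stamp is the request date in YYYYMMDD form. The key is scoped to
    one day, one Region, and one service.
    """
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(signing_key: bytes, string_to_sign: str) -> str:
    """Return the hex signature of an already-built string to sign."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    # Placeholder credentials and inputs, for illustration only.
    key = derive_signing_key("EXAMPLE-SECRET-KEY", "20200624", "us-east-1")
    print(sign(key, "placeholder-string-to-sign"))
```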
The following Amazon EMR releases support SigV4: emr-4.7.4, emr-4.8.5, emr-4.9.6, emr-4.10.1, emr-5.1.1, emr-5.2.3, emr-5.3.2, emr-5.4.1, emr-5.5.4, emr-5.6.1, emr-5.7.1, emr-5.8.3, emr-5.9.1, emr-5.10.1, emr-5.11.4, emr-5.12.3, emr-5.13.1, emr-5.14.2, emr-5.15.1, emr-5.16.1, emr-5.17.2, emr-5.18.1, emr-5.19.1, emr-5.20.1, emr-5.21.2, and emr-5.22.0 and later.
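The list above can be turned into a small check, for example to audit the release labels that your clusters use. The following sketch is one possible encoding, and the function and constant names are hypothetical. It treats only the exact releases named above as supported below emr-5.22.0. Whether later patch releases in those older series also support SigV4 is not stated here, so the sketch does not assume it.

```python
import re

# Releases below emr-5.22.0 that the list above names as supporting SigV4.
SIGV4_NAMED_RELEASES = {
    "emr-4.7.4", "emr-4.8.5", "emr-4.9.6", "emr-4.10.1",
    "emr-5.1.1", "emr-5.2.3", "emr-5.3.2", "emr-5.4.1", "emr-5.5.4",
    "emr-5.6.1", "emr-5.7.1", "emr-5.8.3", "emr-5.9.1", "emr-5.10.1",
    "emr-5.11.4", "emr-5.12.3", "emr-5.13.1", "emr-5.14.2", "emr-5.15.1",
    "emr-5.16.1", "emr-5.17.2", "emr-5.18.1", "emr-5.19.1", "emr-5.20.1",
    "emr-5.21.2",
}

_LABEL_PATTERN = re.compile(r"^emr-(\d+)\.(\d+)\.(\d+)$")


def supports_sigv4(release_label: str) -> bool:
    """Return True if the release label appears in the SigV4 list above."""
    match = _LABEL_PATTERN.match(release_label)
    if match is None:
        raise ValueError(f"not an emr-X.Y.Z release label: {release_label!r}")
    version = tuple(int(part) for part in match.groups())
    # emr-5.22.0 and later (including all 6.x releases) support SigV4.
    return release_label in SIGV4_NAMED_RELEASES or version >= (5, 22, 0)


if __name__ == "__main__":
    for label in ("emr-6.15.0", "emr-5.36.1", "emr-5.11.4", "emr-5.11.0"):
        print(label, supports_sigv4(label))
```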