What's new? - Amazon EMR

What's new?

This page describes the changes and functionality available in the latest releases of Amazon EMR 6.x and Amazon EMR 5.x. These release notes are also available on the Amazon EMR release 6.10.0 page and Amazon EMR release 5.36.1 page, along with the application versions, component versions, and available configuration classifications for each release.

Subscribe to the RSS feed for Amazon EMR release notes at https://docs.aws.amazon.com/emr/latest/ReleaseGuide/amazon-emr-release-notes.rss to receive updates when a new Amazon EMR release is available.

For release notes from prior releases, see the Amazon EMR archive of release notes.

Note

Amazon EMR releases now use AWS Signature Version 4 (SigV4) to authenticate requests to Amazon S3. We recommend that you use an Amazon EMR release that supports SigV4 so that you can access new S3 buckets and avoid interruption to your workloads. For more information and a list of Amazon EMR releases that support SigV4, see Amazon EMR and AWS Signature Version 4.

Amazon EMR 6.10.0 (latest release of 6.x series)

New Amazon EMR releases are made available in different Regions over a period of several days, beginning with the first Region on the initial release date. The latest release version may not be available in your Region during this period.

The following release notes include information for Amazon EMR release 6.10.0. Changes are relative to 6.9.0. For information on the release timeline, see the change log.

New features
  • Amazon EMR 6.10.0 supports Apache Spark 3.3.1, Apache Spark RAPIDS 22.12.0, CUDA 11.8.0, Apache Hudi 0.12.2-amzn-0, Apache Iceberg 1.1.0-amzn-0, Trino 403, and PrestoDB 0.278.1.

  • Amazon EMR 6.10.0 includes a native Trino-Hudi connector that provides read access to data in Hudi tables. You can activate the connector with trino-cli --catalog hudi, and configure the connector for your requirements with trino-connector-hudi. The native integration with Amazon EMR means that you no longer need to use trino-connector-hive to query Hudi tables. For a list of supported configurations with the new connector, see the Hudi connector page of the Trino documentation.

Changes, enhancements, and resolved issues
  • Amazon EMR 6.10.0 removes the dependency on minimal-json.jar for the Amazon Redshift integration for Apache Spark, and automatically adds the required Spark-Redshift related jars to the executor class path for Spark: spark-redshift.jar, spark-avro.jar, and RedshiftJDBC.jar.

  • The 6.10.0 release improves the on-cluster log management daemon to monitor additional log folders in your EMR cluster. This improvement minimizes disk over-utilization scenarios.

  • The 6.10.0 release automatically restarts the on-cluster log management daemon when it stops. This improvement reduces the risk for nodes to appear unhealthy due to disk over-utilization.

  • Amazon EMR 6.10.0 supports regional endpoints for EMRFS user mapping.

  • The default root volume size has increased to 15 GB in Amazon EMR 6.10.0 and higher. Earlier releases have default root volume size of 10 GB.

  • The 6.10.0 release fixes an issue that caused Spark jobs to stall when all remaining Spark executors are on a decommissioning host with the YARN resource manager.

  • With Amazon EMR 6.6.0 through 6.9.0, INSERT queries with dynamic partition and an ORDER BY or SORT BY clause will always have two reducers. This issue is caused by OSS change HIVE-20703, which puts dynamic sort partition optimization under cost-based decision. If your workload doesn't require sorting of dynamic partitions, we recommend that you set the hive.optimize.sort.dynamic.partition.threshold property to -1 to disable the new feature and get the correctly calculated number of reducers. This issue is fixed in OSS Hive as part of HIVE-22269 and is fixed in Amazon EMR 6.10.0.

  • With Amazon EMR release 6.6.0 and later, when you launch new Amazon EMR clusters with the default Amazon Linux (AL) AMI option, Amazon EMR automatically uses the latest Amazon Linux AMI. In earlier releases, Amazon EMR does not update the Amazon Linux AMIs after the initial release. See Using the default Amazon Linux AMI for Amazon EMR.

    OsReleaseLabel (Amazon Linux version) Amazon Linux kernel version Available date Supported Regions
    2.0.20230418.0 4.14.311 May 3, 2023 US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), Canada (Central), Europe (Stockholm), Europe (Ireland), Europe (London), Europe (Paris), Europe (Frankfurt), Europe (Milan), Asia Pacific (Hong Kong), Asia Pacific (Mumbai), Asia Pacific (Jakarta), Asia Pacific (Tokyo), Asia Pacific (Seoul), Asia Pacific (Osaka), Asia Pacific (Singapore), Asia Pacific (Sydney), Africa (Cape Town), South America (São Paulo), Middle East (Bahrain), Middle East (UAE)
    2.0.20230404.1 4.14.311 April 18, 2023 US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), Canada (Central), Europe (Stockholm), Europe (Ireland), Europe (London), Europe (Paris), Europe (Frankfurt), Europe (Milan), Asia Pacific (Hong Kong), Asia Pacific (Mumbai), Asia Pacific (Jakarta), Asia Pacific (Tokyo), Asia Pacific (Seoul), Asia Pacific (Osaka), Asia Pacific (Singapore), Asia Pacific (Sydney), Africa (Cape Town), South America (São Paulo), Middle East (Bahrain), Middle East (UAE)
    2.0.20230404.0 4.14.311 April 10, 2023 US East (N. Virginia), Europe (Paris)
    2.0.20230320.0 4.14.309 March 30, 2023 US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), Canada (Central), Europe (Stockholm), Europe (Ireland), Europe (London), Europe (Paris), Europe (Frankfurt), Europe (Milan), Asia Pacific (Hong Kong), Asia Pacific (Mumbai), Asia Pacific (Jakarta), Asia Pacific (Tokyo), Asia Pacific (Seoul), Asia Pacific (Osaka), Asia Pacific (Singapore), Asia Pacific (Sydney), Africa (Cape Town), South America (São Paulo), Middle East (Bahrain), Middle East (UAE)
    2.0.20230207.0 4.14.304 February 22, 2023 US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), Canada (Central), Europe (Stockholm), Europe (Ireland), Europe (London), Europe (Paris), Europe (Frankfurt), Europe (Milan), Asia Pacific (Hong Kong), Asia Pacific (Mumbai), Asia Pacific (Jakarta), Asia Pacific (Tokyo), Asia Pacific (Seoul), Asia Pacific (Osaka), Asia Pacific (Singapore), Asia Pacific (Sydney), Africa (Cape Town), South America (São Paulo), Middle East (Bahrain), Middle East (UAE)

Amazon EMR 5.36.1 (latest release of 5.x series)

New Amazon EMR releases are made available in different Regions over a period of several days, beginning with the first Region on the initial release date. The latest release version may not be available in your Region during this period.

The following release notes include information for Amazon EMR release 5.36.1. Changes are relative to 5.36.0. For information on the release timeline, see the change log.

Changes, enhancements, and resolved issues
  • Amazon EMR release 5.36.1 adds support for archiving logs to Amazon S3 during cluster scale-down. In previous 5.x releases, you could only archive log files to Amazon S3 during cluster termination. This improvement ensures that log files generated on the cluster persist on Amazon S3 even after the node is terminated. For more information, see Configure cluster logging and debugging.

  • The 5.36.1 release improves the on-cluster log management daemon to monitor additional log folders in your EMR cluster. This improvement minimizes disk over-utilization scenarios.

  • The 5.36.1 release automatically restarts the on-cluster log management daemon when it stops. This improvement reduces the risk for nodes to appear unhealthy due to disk over-utilization.

  • The 5.36.1 release fixes an issue where Amazon EMR daemons on the primary node would maintain stale metadata for terminated instances in the cluster. Maintaining stale data might cause on-cluster CPU and memory usage to grow without bounds, and ultimately cause cluster failures.

  • For clusters that are launched with multiple primary nodes, the 5.36.1 release fixes an issue where an Amazon EC2 hardware failure on one of the primary nodes could cause a second primary node to fail and render your cluster unstable.

  • For clusters that are configured with in-transit encryption, Managed Scaling is now Spark shuffle data aware. Spark shuffle data is data that Spark redistributes across partitions to perform specific operations. During scale down, Managed Scaling ignores the instances with shuffle data. This prevents job re-attempts and re-computations, which are costly for price and performance. For more information on shuffle operations, see the Spark Programming Guide.

  • Amazon EMR 5.36.1 supports automatic Amazon Linux updates for clusters using a default AMI. See Using the default Amazon Linux AMI for Amazon EMR.

    OsReleaseLabel (Amazon Linux Version) Amazon Linux Kernel Version Available Date Supported Regions
    2.0.20230404.1 4.14.311 April 18, 2023

    US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), Europe (Stockholm), Europe (Ireland), Europe (London), Europe (Paris), Europe (Frankfurt), Europe (Milan), Asia Pacific (Hong Kong), Asia Pacific (Mumbai), Asia Pacific (Jakarta), Asia Pacific (Tokyo), Asia Pacific (Seoul), Asia Pacific (Osaka), Asia Pacific (Singapore), Asia Pacific (Sydney), Africa (Cape Town), South America (São Paulo), Middle East (Bahrain), Canada (Central)

Amazon EMR and AWS Signature Version 4

Amazon EMR releases now use AWS Signature Version 4 (SigV4) to authenticate requests to Amazon S3. Buckets created in Amazon S3 after June 24, 2020 don't support requests signed by Signature Version 2 (SigV2). Buckets created on or before June 24, 2020 will continue to support SigV2. We recommend that you migrate to an Amazon EMR release that supports SigV4 so that you can access new S3 buckets and avoid interruption to your workloads.

If you use applications that are included with Amazon EMR such as Apache Spark, Apache Hive, and Presto, you don't need to change your application code to use SigV4 . If you use custom applications that are not included with Amazon EMR, you might need to update your code to use SigV4. For more information, see Moving from Signature Version 2 to Signature Version 4 in the Amazon S3 User Guide.

The following Amazon EMR releases support SigV4: emr-4.7.4, emr-4.8.5, emr-4.9.6, emr-4.10.1, emr-5.1.1, emr-5.2.3, emr-5.3.2, emr-5.4.1, emr-5.5.4, emr-5.6.1, emr-5.7.1, emr-5.8.3, emr-5.9.1, emr-5.10.1, emr-5.11.4, emr-5.12.3, emr-5.13.1, emr-5.14.2, emr-5.15.1, emr-5.16.1, emr-5.17.2, emr-5.18.1, emr-5.19.1, emr-5.20.1, and emr-5.21.2, and emr-5.22.0 and later.