What's new? - Amazon EMR

What's new?

This topic covers new features and resolved issues in the current releases of the Amazon EMR 6.x and 5.x series. These release notes are also available on the Release 6.6.0 Tab and Release 5.35.0 Tab, along with the application versions, component versions, and available configuration classifications for each release.

Subscribe to the RSS feed for Amazon EMR release notes at https://docs.aws.amazon.com/emr/latest/ReleaseGuide/amazon-emr-release-notes.rss to receive updates when a new Amazon EMR release version is available.
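
If you prefer to check the feed from a script rather than a feed reader, the following is a minimal sketch using only the Python standard library. It assumes the feed is a standard RSS 2.0 document in which each entry is an `<item>` element with a `<title>` child; the function names `fetch_feed` and `entry_titles` are illustrative and not part of any AWS tool.

```python
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = (
    "https://docs.aws.amazon.com/emr/latest/ReleaseGuide/"
    "amazon-emr-release-notes.rss"
)


def fetch_feed(url: str = FEED_URL, timeout: float = 10.0) -> str:
    """Download the feed and return it as text (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def entry_titles(rss_xml: str, limit: int = 5) -> list[str]:
    """Return the titles of the first `limit` <item> elements.

    Assumes an RSS 2.0 layout: <rss><channel><item><title>...</title></item>.
    """
    root = ET.fromstring(rss_xml)
    titles = [
        item.findtext("title", default="").strip()
        for item in root.iter("item")
    ]
    # Drop entries without a title, then keep only the first `limit`.
    return [title for title in titles if title][:limit]
```

Calling `entry_titles(fetch_feed())` would list the most recent entries, provided the feed matches the assumed layout.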

For earlier release notes going back to release version 4.2.0, see Amazon EMR what's new history.

Note

Twenty-five previous Amazon EMR release versions now use AWS Signature Version 4 to authenticate requests to Amazon S3. The use of AWS Signature Version 2 is being phased out, and new S3 buckets created after June 24, 2020 will not support Signature Version 2 signed requests. Existing buckets will continue to support Signature Version 2. We recommend migrating to an Amazon EMR release that supports Signature Version 4 so you can continue accessing new S3 buckets and avoid any potential interruption to your workloads.

The following Amazon EMR releases that support Signature Version 4 are now available: emr-4.7.4, emr-4.8.5, emr-4.9.6, emr-4.10.1, emr-5.1.1, emr-5.2.3, emr-5.3.2, emr-5.4.1, emr-5.5.4, emr-5.6.1, emr-5.7.1, emr-5.8.3, emr-5.9.1, emr-5.10.1, emr-5.11.4, emr-5.12.3, emr-5.13.1, emr-5.14.2, emr-5.15.1, emr-5.16.1, emr-5.17.2, emr-5.18.1, emr-5.19.1, emr-5.20.1, and emr-5.21.2. Amazon EMR versions 5.22.0 and later already support Signature Version 4.

You do not need to change your application code to use Signature Version 4 if you are using applications included with Amazon EMR, such as Apache Spark, Apache Hive, and Presto. If you are using custom applications that are not included with Amazon EMR, you may need to update your code to use Signature Version 4. For more information about what updates may be required, see Moving from Signature Version 2 to Signature Version 4.
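
To audit which clusters are affected, the rule in this note can be expressed as a short check on the release label. The following is a sketch only: the function names are illustrative, and it treats just the exact labels listed above as patched, so any other release earlier than 5.22.0 is reported as not supporting Signature Version 4.

```python
import re

# Patched releases named in the note above.
PATCHED_RELEASES = {
    "emr-4.7.4", "emr-4.8.5", "emr-4.9.6", "emr-4.10.1",
    "emr-5.1.1", "emr-5.2.3", "emr-5.3.2", "emr-5.4.1", "emr-5.5.4",
    "emr-5.6.1", "emr-5.7.1", "emr-5.8.3", "emr-5.9.1", "emr-5.10.1",
    "emr-5.11.4", "emr-5.12.3", "emr-5.13.1", "emr-5.14.2", "emr-5.15.1",
    "emr-5.16.1", "emr-5.17.2", "emr-5.18.1", "emr-5.19.1", "emr-5.20.1",
    "emr-5.21.2",
}

# Releases from 5.22.0 onward already support Signature Version 4.
FIRST_NATIVE_RELEASE = (5, 22, 0)


def parse_release_label(label: str) -> tuple[int, int, int]:
    """Turn a label such as 'emr-5.21.2' into (5, 21, 2)."""
    match = re.fullmatch(r"emr-(\d+)\.(\d+)\.(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognized release label: {label!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def supports_sigv4(label: str) -> bool:
    """Return True if the release is listed as patched or is 5.22.0 or later."""
    if label in PATCHED_RELEASES:
        return True
    return parse_release_label(label) >= FIRST_NATIVE_RELEASE
```

Comparing version tuples rather than strings avoids ordering mistakes such as treating 5.9.1 as later than 5.22.0.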

Release 6.6.0 (latest version of Amazon EMR 6.x series)

New Amazon EMR release versions are made available in different Regions over a period of several days, beginning with the first Region on the initial release date. The latest release version may not be available in your Region during this period.

The following release notes include information for Amazon EMR release version 6.6.0. Changes are relative to 6.5.0.

Initial release date: May 9, 2022

New Features

  • Amazon EMR 6.6 now supports Apache Spark 3.2, Apache Spark RAPIDS 22.02, CUDA 11, Apache Hudi 0.10.1, Apache Iceberg 0.13, Trino 0.367 and PrestoDB 0.267.

  • With Amazon EMR release 6.6 and later, when you launch new Amazon EMR clusters with the default Amazon Linux (AL) AMI option, Amazon EMR automatically uses the latest Amazon Linux AMI. In earlier versions, Amazon EMR does not update the Amazon Linux AMIs after the initial release. See Using the default Amazon Linux AMI for Amazon EMR.

  • With Amazon EMR 6.6 and later, applications that use Log4j 1.x and Log4j 2.x are upgraded to use Log4j 1.2.17 (or higher) and Log4j 2.17.1 (or higher) respectively, and do not require using the bootstrap actions provided to mitigate the CVE issues.

  • [Managed scaling] Spark shuffle data managed scaling optimization - For Amazon EMR versions 5.34.0 and later, and 6.4.0 and later, managed scaling is now aware of Spark shuffle data (data that Spark redistributes across partitions to perform specific operations). For more information on shuffle operations, see Using EMR managed scaling in Amazon EMR in the Amazon EMR Management Guide and the Spark Programming Guide.

  • Starting with Amazon EMR 5.32.0 and 6.5.0, dynamic executor sizing for Apache Spark is enabled by default. To turn this feature on or off, you can use the spark.yarn.heterogeneousExecutors.enabled configuration parameter.
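
As an illustration of the last item, the following sketch builds a configuration object that sets spark.yarn.heterogeneousExecutors.enabled. It assumes the property is supplied through the spark-defaults configuration classification, which is how Spark properties are usually set on Amazon EMR; the helper name `dynamic_executor_sizing_config` is hypothetical.

```python
import json


def dynamic_executor_sizing_config(enabled: bool) -> list[dict]:
    """Build an EMR configuration list that toggles dynamic executor sizing.

    Assumption: the property belongs in the spark-defaults classification.
    """
    return [
        {
            "Classification": "spark-defaults",
            "Properties": {
                # EMR configuration property values are strings.
                "spark.yarn.heterogeneousExecutors.enabled":
                    "true" if enabled else "false",
            },
        }
    ]


# Print the JSON that turns the feature off.
print(json.dumps(dynamic_executor_sizing_config(False), indent=2))
```

The printed JSON can be supplied as the cluster configuration when you create a cluster, for example with the `--configurations` option of `aws emr create-cluster`.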

Changes, Enhancements, and Resolved Issues

  • Amazon EMR reduces cluster startup time by up to 80 seconds on average for clusters that use the EMR default AMI option and only install common applications, such as Apache Hadoop, Apache Spark and Apache Hive.

Release 5.35.0 (latest version of Amazon EMR 5.x series)

New Amazon EMR release versions are made available in different Regions over a period of several days, beginning with the first Region on the initial release date. The latest release version may not be available in your Region during this period.

The following release notes include information for Amazon EMR release version 5.35.0. Changes are relative to 5.34.0.

Initial release date: March 30, 2022

New Features

  • With Amazon EMR release 5.35, applications that use Log4j 1.x and Log4j 2.x are upgraded to use Log4j 1.2.17 (or higher) and Log4j 2.17.1 (or higher) respectively, and do not require the bootstrap actions that were needed to mitigate the CVE issues in previous releases. See Approach to mitigate CVE-2021-44228.

Changes, Enhancements, and Resolved Issues

Flink changes
Change type Description
Upgrades
  • Flink upgraded to version 1.14.2.

  • Log4j upgraded to 2.17.1.

Hadoop changes
Change type Description
Hadoop open source backports since EMR 5.34.0
  • YARN-10438: Handle null containerId in ClientRMService#getContainerReport()

  • YARN-7266: Timeline Server event handler threads locked

  • YARN-10438: ATS 1.5 fails to start if RollingLevelDb files are corrupt or missing

  • HADOOP-13500: Synchronizing iteration of Configuration properties object

  • YARN-10651: CapacityScheduler crashed with NPE in AbstractYarnScheduler.updateNodeResource()

  • HDFS-12221: Replace xerces in XmlEditsVisitor

  • HDFS-16410: Insecure Xml parsing in OfflineEditsXmlLoader

Hadoop changes and fixes
  • Tomcat used in KMS and HttpFS is upgraded to 8.5.75

  • In FileSystemOptimizedCommitterV2, the success marker was written in the commitJob output path defined while creating the committer. Because commitJob and task-level output paths can differ, the path has been corrected to use the one defined in manifest files. For Hive jobs, this results in the success marker being written correctly when performing operations such as dynamic partition or UNION ALL.

Hive changes
Change type Description
Hive upgraded to open source release 2.3.9, including these JIRA fixes
  • HIVE-17155: findConfFile() in HiveConf.java has some issues with the conf path

  • HIVE-24797: Disable validate default values when parsing Avro schemas

  • HIVE-21563: Improve Table#getEmptyTable performance by disable registerAllFunctionsOnce

  • HIVE-18147: Tests can fail with java.net.BindException: Address already in use

  • HIVE-24608: Switch back to get_table in HMS client for Hive 2.3.x

  • HIVE-21200: Vectorization - date column throwing java.lang.UnsupportedOperationException for parquet

  • HIVE-19228: Remove commons-httpclient 3.x usage

Hive open source backports since EMR 5.34.0
  • HIVE-19990: Query with interval literal in join condition fails

  • HIVE-25824: Upgrade branch-2.3 to log4j 2.17.0

  • TEZ-4062: Speculative attempt scheduling should be aborted when Task has completed

  • TEZ-4108: NullPointerException during speculative execution race condition

  • TEZ-3918: Setting tez.task.log.level does not work

Hive upgrades and fixes
  • Upgrade Log4j version to 2.17.1

  • Upgrade ORC version to 1.4.3

  • Fixed deadlock due to penalty thread in ShuffleScheduler

New features
  • Added a feature to print the Hive query in the AM logs. This feature is disabled by default. Flag/Conf: tez.am.emr.print.hive.query.in.log. Status (default): FALSE.

Oozie changes
Change type Description
Oozie open source backports since EMR 5.34.0
  • OOZIE-3652: Oozie launcher should retry directory listing when NoSuchFileException occurs

Pig changes
Change type Description
Upgrades
  • Log4j upgraded to 1.2.17.