Understanding Amazon EMR on EKS release versions

An Amazon EMR release is a set of open-source applications from the big data ecosystem. Each release bundles a different set of big data applications, components, and features, and you select which of them Amazon EMR on EKS deploys and configures when you run your job.

Beginning with Amazon EMR versions 5.32.0 and 6.2.0, you can deploy Amazon EMR on EKS. This deployment option is not available in earlier Amazon EMR release versions. You must specify a supported release version when you submit your job.
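
As a rough illustration of specifying a release version at submission time, the following Python sketch checks a release label against the minimum versions named above and assembles the arguments for a job submission. It assumes the emr-x.x.x-latest label form that Amazon EMR on EKS uses. The helper names, the job name, and the angle-bracket placeholders are hypothetical, and the sketch only knows the 5.x and 6.x lines mentioned in this topic.

```python
import re

# Earliest releases that can be deployed on EKS, keyed by major version.
MIN_SUPPORTED = {5: (5, 32, 0), 6: (6, 2, 0)}


def parse_release_label(label):
    """Return (major, minor, patch) for a label such as 'emr-6.2.0-latest'."""
    match = re.fullmatch(r"emr-(\d+)\.(\d+)\.(\d+)-latest", label)
    if match is None:
        raise ValueError(f"expected the form emr-x.x.x-latest, got {label!r}")
    return tuple(int(part) for part in match.groups())


def is_supported_on_eks(label):
    """Check the label against the minimum versions named in the text."""
    version = parse_release_label(label)
    minimum = MIN_SUPPORTED.get(version[0])
    # Assumption: major versions not listed above are treated as unknown.
    return minimum is not None and version >= minimum


def build_job_request(label):
    """Assemble keyword arguments for a StartJobRun call (placeholders only)."""
    if not is_supported_on_eks(label):
        raise ValueError(f"{label} cannot be deployed with Amazon EMR on EKS")
    return {
        "name": "example-job",                        # hypothetical job name
        "virtualClusterId": "<virtual-cluster-id>",   # placeholder
        "executionRoleArn": "<execution-role-arn>",   # placeholder
        "releaseLabel": label,
        "jobDriver": {
            "sparkSubmitJobDriver": {"entryPoint": "<s3-path-to-script>"}
        },
    }


request = build_job_request("emr-6.2.0-latest")
print(request["releaseLabel"])
```

If you use the AWS SDK for Python, a dictionary shaped like this could be passed as `boto3.client("emr-containers").start_job_run(**request)` once the placeholders are replaced with real values.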

When you deploy using Amazon EMR on EKS, be aware of the following differences in release versions between Amazon EMR on EKS and Amazon EMR running on Amazon EC2.

  • Amazon EMR on EKS uses its own release label format, emr-x.x.x-latest, which differs from the format used by Amazon EMR running on Amazon EC2. Using -latest ensures that your Amazon EMR version always includes the latest security updates.

  • In Amazon EMR releases 5.32.0 and 6.2.0, Amazon EMR on EKS supports:

    • Only these applications: Spark, Jupyter Enterprise Gateway (endpoints, public preview).

    • Only these components: aws-hm-client (Glue connector), aws-sagemaker-spark-sdk, emr-s3-select, emrfs, emr-ddb, hudi-spark.

    • Configuration classifications in the following table. Configuration classifications allow you to customize applications. These often correspond to a configuration XML file for the application, such as spark-hive-site.xml. For more information, see Configuring Applications.

      Classification     Description
      core-site          Change values in Hadoop's core-site.xml file.
      emrfs-site         Change EMRFS settings.
      spark-metrics      Change values in Spark's metrics.properties file.
      spark-defaults     Change values in Spark's spark-defaults.conf file.
      spark-env          Change values in the Spark environment.
      spark-hive-site    Change values in Spark's hive-site.xml file.
      spark-log4j        Change values in Spark's log4j.properties file.

    For the list of applications, components, and configuration classifications supported by Amazon EMR deployed on EC2, see About Amazon EMR Releases in the Amazon EMR Release Guide.

  • Amazon EMR on EKS supports a different set of features than Amazon EMR running on EC2. For a comparison between Amazon EMR on EKS and Amazon EMR running on EC2, see Amazon EMR FAQs.
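
To make the configuration classifications listed above more concrete, here is a minimal Python sketch that turns a mapping of classification names to properties into the applicationConfiguration structure accepted in a job's configuration overrides. The helper name and the example property value are assumptions for illustration. The set of allowed classifications is taken from the table for releases 5.32.0 and 6.2.0.

```python
# Classifications supported by Amazon EMR on EKS in releases 5.32.0 and 6.2.0,
# as listed in the table above.
SUPPORTED_CLASSIFICATIONS = {
    "core-site",
    "emrfs-site",
    "spark-metrics",
    "spark-defaults",
    "spark-env",
    "spark-hive-site",
    "spark-log4j",
}


def build_application_configuration(overrides):
    """Convert {classification: {property: value}} into a list of entries.

    Raises ValueError if any classification is not in the supported set.
    """
    unsupported = sorted(set(overrides) - SUPPORTED_CLASSIFICATIONS)
    if unsupported:
        raise ValueError(f"unsupported classifications: {unsupported}")
    return [
        {"classification": name, "properties": dict(properties)}
        for name, properties in overrides.items()
    ]


# Example: override one value that would otherwise come from spark-defaults.conf.
configuration_overrides = {
    "applicationConfiguration": build_application_configuration(
        {"spark-defaults": {"spark.executor.memory": "2G"}}
    )
}
print(configuration_overrides)
```

Validating against the set before submission is a design choice of this sketch: it surfaces a classification that exists on Amazon EMR running on EC2 but is not supported on EKS before the job is submitted.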