Amazon EMR 5.x Release Versions

Each tab below lists application versions, release notes, component versions, and configuration classifications available in each Amazon EMR 5.x release version.

For a comprehensive diagram of application versions in every release, see Application Versions in Amazon EMR 5.x Releases (PNG).

When you launch a cluster, you can choose from multiple release versions of Amazon EMR. This allows you to test and use application versions that fit your compatibility requirements. You specify the release version using the release label. Release labels are in the form emr-x.x.x. For example, emr-5.31.0.
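Because the label format is fixed, a label can be checked before it is used in a cluster request. The following is a minimal Python sketch, assuming each part of the label is a non-negative integer; the function name `parse_release_label` is hypothetical and is not part of any AWS SDK.

```python
import re

# Release labels take the form emr-x.x.x, for example emr-5.31.0.
# Assumption: each "x" is a non-negative integer.
_RELEASE_LABEL = re.compile(r"emr-(\d+)\.(\d+)\.(\d+)")


def parse_release_label(label):
    """Return (major, minor, patch) for a label such as 'emr-5.31.0'.

    Raises ValueError if the label does not match the emr-x.x.x form.
    """
    match = _RELEASE_LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"not a valid release label: {label!r}")
    return tuple(int(part) for part in match.groups())


if __name__ == "__main__":
    print(parse_release_label("emr-5.31.0"))
```

Returning a tuple of integers means two labels can be compared directly, which may help when a script needs to confirm that a cluster runs at least a given release.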

New Amazon EMR release versions are made available in different regions over a period of several days, beginning with the first region on the initial release date. The latest release version may not be available in your region during this period.

5.31.0

Release 5.31.0 Application Versions

The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, JupyterHub, Livy, Mahout, MXNet, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, TensorFlow, Tez, Zeppelin, and ZooKeeper.

The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.

For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:

Release 5.31.0 Release Notes

The following release notes include information for Amazon EMR release version 5.31.0. Changes are relative to 5.30.1.

Initial release date: Oct 9, 2020

Last updated date: Oct 15, 2020

Upgrades

  • Upgraded Amazon Glue connector to version 1.13.0

  • Upgraded Amazon SageMaker Spark SDK to version 1.4.0

  • Upgraded Amazon Kinesis connector to version 3.5.9

  • Upgraded AWS Java SDK to version 1.11.852

  • Upgraded Bigtop-tomcat to version 8.5.56

  • Upgraded EMR FS to version 2.43.0

  • Upgraded EMR MetricsAndEventsApiGateway Client to version 1.4.0

  • Upgraded EMR S3 Dist CP to version 2.15.0

  • Upgraded EMR S3 Select to version 1.6.0

  • Upgraded Flink to version 1.11.0

  • Upgraded Hadoop to version 2.10.0

  • Upgraded Hive to version 2.3.7

  • Upgraded Hudi to version 0.6.0

  • Upgraded Hue to version 4.7.1

  • Upgraded JupyterHub to version 1.1.0

  • Upgraded MXNet to version 1.6.0

  • Upgraded OpenCV to version 4.3.0

  • Upgraded Presto to version 0.238.3

  • Upgraded TensorFlow to version 2.1.0

Changes, Enhancements, and Resolved Issues

New Features

  • With Amazon EMR 5.31.0, you can launch a cluster that integrates with Lake Formation. This integration provides fine-grained, column-level data filtering to databases and tables in the AWS Glue Data Catalog. It also enables federated single sign-on to EMR Notebooks or Apache Zeppelin from an enterprise identity system. For more information, see Integrating Amazon EMR with AWS Lake Formation in the Amazon EMR Management Guide.

    Amazon EMR with Lake Formation is currently available in 16 AWS Regions: US East (Ohio and N. Virginia), US West (N. California and Oregon), Asia Pacific (Mumbai, Seoul, Singapore, Sydney, and Tokyo), Canada (Central), Europe (Frankfurt, Ireland, London, Paris, and Stockholm), and South America (São Paulo).

Known Issues

  • Known issue in clusters with multiple master nodes and Kerberos authentication

    If you run clusters with multiple master nodes and Kerberos authentication in EMR releases 5.20.0 and later, you may encounter problems with cluster operations such as scale-down or step submission after the cluster has been running for some time. The time period depends on the Kerberos ticket validity period that you defined. The scale-down problem affects both automatic scale-down and explicit scale-down requests that you submit. Additional cluster operations can also be affected.

    Workaround:

    • SSH as the hadoop user to the lead master node of the EMR cluster with multiple master nodes.

    • Run the following command to renew the Kerberos ticket for the hadoop user.

      kinit -kt <keytab_file> <principal>

      Typically, the keytab file is located at /etc/hadoop.keytab and the principal is in the form of hadoop/<hostname>@<REALM>.

    Note

    This workaround is effective for as long as the Kerberos ticket is valid. This duration is 10 hours by default, but can be configured in your Kerberos settings. You must rerun the above command once the Kerberos ticket expires.
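The renewal command in the workaround can also be assembled by a script, for example one that a scheduler runs before the ticket expires. The following is a minimal Python sketch, assuming the typical keytab location and principal form given above; the function names, host name, and realm are hypothetical.

```python
import subprocess

# Typical keytab location named in the workaround; adjust if yours differs.
DEFAULT_KEYTAB = "/etc/hadoop.keytab"


def build_kinit_command(hostname, realm, keytab=DEFAULT_KEYTAB):
    """Build the argument list for: kinit -kt <keytab_file> <principal>."""
    principal = f"hadoop/{hostname}@{realm}"
    return ["kinit", "-kt", keytab, principal]


def renew_ticket(hostname, realm, keytab=DEFAULT_KEYTAB):
    """Run kinit as the current user; raises CalledProcessError on failure."""
    subprocess.run(build_kinit_command(hostname, realm, keytab), check=True)


if __name__ == "__main__":
    # Hypothetical host name and realm, for illustration only.
    print(" ".join(build_kinit_command("ip-10-0-0-1.ec2.internal", "EC2.INTERNAL")))
```

The sketch only builds and runs the command. It must still be run as the hadoop user on the lead master node, and how often to run it depends on the ticket validity period configured for your cluster.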

Release 5.31.0 Component Versions

The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.

Some components in Amazon EMR differ from community versions. These components have a version label in the form CommunityVersion-amzn-EmrVersion. The EmrVersion starts at 0. For example, if an open source community component named myapp-component with version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-2.
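The version label form described above can be split into its two parts with a short helper. The following is a minimal Python sketch; the function name `split_component_version` is hypothetical. It treats any label without an -amzn- suffix as a plain community version, which is an assumption of the sketch.

```python
import re

# Matches labels of the form CommunityVersion-amzn-EmrVersion, e.g. 2.10.0-amzn-0.
_AMZN_VERSION = re.compile(r"(?P<community>.+)-amzn-(?P<emr>\d+)")


def split_component_version(version):
    """Return (community_version, emr_version).

    emr_version is None when the label has no -amzn- suffix.
    """
    match = _AMZN_VERSION.fullmatch(version)
    if match is None:
        return version, None
    return match.group("community"), int(match.group("emr"))


if __name__ == "__main__":
    for label in ("2.2-amzn-2", "2.10.0-amzn-0", "1.4.13"):
        print(label, "->", split_component_version(label))
```

Because EmrVersion starts at 0, a result such as ("2.2", 2) corresponds to the third Amazon EMR modification of community version 2.2.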

Component Version Description
aws-sagemaker-spark-sdk 1.4.0 Amazon SageMaker Spark SDK
emr-ddb 4.15.0 Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies 2.13.0 Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis 3.5.0 Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp 2.15.0 Distributed copy application optimized for Amazon S3.
emr-s3-select 1.6.0 EMR S3Select Connector
emrfs 2.43.0 Amazon S3 connector for Hadoop ecosystem applications.
flink-client 1.11.0 Apache Flink command line client scripts and applications.
flink-jobmanager-config 1.11.0 Managing resources on EMR nodes for Apache Flink JobManager.
ganglia-monitor 3.7.2 Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector 3.7.2 Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web 3.7.1 Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client 2.10.0-amzn-0 Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode 2.10.0-amzn-0 HDFS node-level service for storing blocks.
hadoop-hdfs-library 2.10.0-amzn-0 HDFS command-line client and library
hadoop-hdfs-namenode 2.10.0-amzn-0 HDFS service for tracking file names and block locations.
hadoop-hdfs-journalnode 2.10.0-amzn-0 HDFS service for managing the Hadoop filesystem journal on HA clusters.
hadoop-httpfs-server 2.10.0-amzn-0 HTTP endpoint for HDFS operations.
hadoop-kms-server 2.10.0-amzn-0 Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred 2.10.0-amzn-0 MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager 2.10.0-amzn-0 YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager 2.10.0-amzn-0 YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server 2.10.0-amzn-0 Service for retrieving current and historical information for YARN applications.
hbase-hmaster 1.4.13 Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server 1.4.13 Service for serving one or more HBase regions.
hbase-client 1.4.13 HBase command-line client.
hbase-rest-server 1.4.13 Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server 1.4.13 Service providing a Thrift endpoint to HBase.
hcatalog-client 2.3.7-amzn-1 The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server 2.3.7-amzn-1 Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server 2.3.7-amzn-1 HTTP endpoint providing a REST interface to HCatalog.
hive-client 2.3.7-amzn-1 Hive command line client.
hive-hbase 2.3.7-amzn-1 Hive-hbase client.
hive-metastore-server 2.3.7-amzn-1 Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server2 2.3.7-amzn-1 Service for accepting Hive queries as web requests.
hudi 0.6.0-amzn-0 Incremental processing framework to power data pipeline at low latency and high efficiency.
hudi-spark 0.6.0-amzn-0 Bundle library for running Spark with Hudi.
hudi-presto 0.6.0-amzn-0 Bundle library for running Presto with Hudi.
hue-server 4.7.1 Web application for analyzing data using Hadoop ecosystem applications
jupyterhub 1.1.0 Multi-user server for Jupyter notebooks
livy-server 0.7.0-incubating REST interface for interacting with Apache Spark
nginx 1.12.1 nginx [engine x] is an HTTP and reverse proxy server
mahout-client 0.13.0 Library for machine learning.
mxnet 1.6.0 A flexible, scalable, and efficient library for deep learning.
mariadb-server 5.5.64 MySQL database server.
nvidia-cuda 9.2.88 Nvidia drivers and Cuda toolkit
oozie-client 5.2.0 Oozie command-line client.
oozie-server 5.2.0 Service for accepting Oozie workflow requests.
opencv 4.3.0 Open Source Computer Vision Library.
phoenix-library 4.14.3-HBase-1.4 The phoenix libraries for server and client
phoenix-query-server 4.14.3-HBase-1.4 A light weight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API
presto-coordinator 0.238.3-amzn-0 Service for accepting queries and managing query execution among presto-workers.
presto-worker 0.238.3-amzn-0 Service for executing pieces of a query.
presto-client 0.238.3-amzn-0 Presto command-line client which is installed on an HA cluster's stand-by masters where Presto server is not started.
pig-client 0.17.0 Pig command-line client.
r 3.4.3 The R Project for Statistical Computing
ranger-kms-server 1.2.0 Apache Ranger Key Management System
spark-client 2.4.6-amzn-0 Spark command-line clients.
spark-history-server 2.4.6-amzn-0 Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn 2.4.6-amzn-0 In-memory execution engine for YARN.
spark-yarn-slave 2.4.6-amzn-0 Apache Spark libraries needed by YARN slaves.
sqoop-client 1.4.7 Apache Sqoop command-line client.
tensorflow 2.1.0 TensorFlow open source software library for high performance numerical computation.
tez-on-yarn 0.9.2 The tez YARN application and libraries.
webserver 2.4.25+ Apache HTTP server.
zeppelin-server 0.8.2 Web-based notebook that enables interactive data analytics.
zookeeper-server 3.4.14 Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client 3.4.14 ZooKeeper command line client.

Release 5.31.0 Configuration Classifications

Configuration classifications allow you to customize applications. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.
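Classifications are typically supplied as a list of configuration objects when a cluster is created. The following is a minimal Python sketch that builds such a list and serializes it to JSON. The helper name `classification` is hypothetical, and the property values are illustrative assumptions rather than recommended settings.

```python
import json


def classification(name, properties, nested=None):
    """Build one configuration object for the given classification name."""
    config = {"Classification": name, "Properties": dict(properties)}
    if nested:
        # Some classifications carry their settings in nested
        # configuration objects instead of top-level properties.
        config["Configurations"] = list(nested)
    return config


configurations = [
    # Illustrative values only; choose settings that suit your workload.
    classification("spark-defaults", {"spark.executor.memory": "4g"}),
    classification("hive-site", {"hive.exec.parallel": "true"}),
]

if __name__ == "__main__":
    print(json.dumps(configurations, indent=2))
```

A JSON document of this shape can be saved to a file and passed to the AWS CLI `aws emr create-cluster` command through its `--configurations` option.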

emr-5.31.0 Classifications
Classifications Description

capacity-scheduler

Change values in Hadoop's capacity-scheduler.xml file.

container-log4j

Change values in Hadoop YARN's container-log4j.properties file.

core-site

Change values in Hadoop's core-site.xml file.

emrfs-site

Change EMRFS settings.

flink-conf

Change flink-conf.yaml settings.

flink-log4j

Change Flink log4j.properties settings.

flink-log4j-yarn-session

Change Flink log4j-yarn-session.properties settings.

flink-log4j-cli

Change Flink log4j-cli.properties settings.

hadoop-env

Change values in the Hadoop environment for all Hadoop components.

hadoop-log4j

Change values in Hadoop's log4j.properties file.

hadoop-ssl-server

Change the Hadoop SSL server configuration.

hadoop-ssl-client

Change the Hadoop SSL client configuration.

hbase

Amazon EMR-curated settings for Apache HBase.

hbase-env

Change values in HBase's environment.

hbase-log4j

Change values in HBase's hbase-log4j.properties file.

hbase-metrics

Change values in HBase's hadoop-metrics2-hbase.properties file.

hbase-policy

Change values in HBase's hbase-policy.xml file.

hbase-site

Change values in HBase's hbase-site.xml file.

hdfs-encryption-zones

Configure HDFS encryption zones.

hdfs-site

Change values in HDFS's hdfs-site.xml.

hcatalog-env

Change values in HCatalog's environment.

hcatalog-server-jndi

Change values in HCatalog's jndi.properties.

hcatalog-server-proto-hive-site

Change values in HCatalog's proto-hive-site.xml.

hcatalog-webhcat-env

Change values in HCatalog WebHCat's environment.

hcatalog-webhcat-log4j2

Change values in HCatalog WebHCat's log4j2.properties.

hcatalog-webhcat-site

Change values in HCatalog WebHCat's webhcat-site.xml file.

hive-beeline-log4j2

Change values in Hive's beeline-log4j2.properties file.

hive-parquet-logging

Change values in Hive's parquet-logging.properties file.

hive-env

Change values in the Hive environment.

hive-exec-log4j2

Change values in Hive's hive-exec-log4j2.properties file.

hive-llap-daemon-log4j2

Change values in Hive's llap-daemon-log4j2.properties file.

hive-log4j2

Change values in Hive's hive-log4j2.properties file.

hive-site

Change values in Hive's hive-site.xml file

hiveserver2-site

Change values in Hive Server2's hiveserver2-site.xml file

hue-ini

Change values in Hue's ini file

httpfs-env

Change values in the HTTPFS environment.

httpfs-site

Change values in Hadoop's httpfs-site.xml file.

hadoop-kms-acls

Change values in Hadoop's kms-acls.xml file.

hadoop-kms-env

Change values in the Hadoop KMS environment.

hadoop-kms-log4j

Change values in Hadoop's kms-log4j.properties file.

hadoop-kms-site

Change values in Hadoop's kms-site.xml file.

hudi-env

Change values in the Hudi environment.

jupyter-notebook-conf

Change values in Jupyter Notebook's jupyter_notebook_config.py file.

jupyter-hub-conf

Change values in JupyterHub's jupyterhub_config.py file.

jupyter-s3-conf

Configure Jupyter Notebook S3 persistence.

jupyter-sparkmagic-conf

Change values in Sparkmagic's config.json file.

livy-conf

Change values in Livy's livy.conf file.

livy-env

Change values in the Livy environment.

livy-log4j

Change Livy log4j.properties settings.

mapred-env

Change values in the MapReduce application's environment.

mapred-site

Change values in the MapReduce application's mapred-site.xml file.

oozie-env

Change values in Oozie's environment.

oozie-log4j

Change values in Oozie's oozie-log4j.properties file.

oozie-site

Change values in Oozie's oozie-site.xml file.

phoenix-hbase-metrics

Change values in Phoenix's hadoop-metrics2-hbase.properties file.

phoenix-hbase-site

Change values in Phoenix's hbase-site.xml file.

phoenix-log4j

Change values in Phoenix's log4j.properties file.

phoenix-metrics

Change values in Phoenix's hadoop-metrics2-phoenix.properties file.

pig-env

Change values in the Pig environment.

pig-properties

Change values in Pig's pig.properties file.

pig-log4j

Change values in Pig's log4j.properties file.

presto-log

Change values in Presto's log.properties file.

presto-config

Change values in Presto's config.properties file.

presto-password-authenticator

Change values in Presto's password-authenticator.properties file.

presto-env

Change values in Presto's presto-env.sh file.

presto-node

Change values in Presto's node.properties file.

presto-connector-blackhole

Change values in Presto's blackhole.properties file.

presto-connector-cassandra

Change values in Presto's cassandra.properties file.

presto-connector-hive

Change values in Presto's hive.properties file.

presto-connector-jmx

Change values in Presto's jmx.properties file.

presto-connector-kafka

Change values in Presto's kafka.properties file.

presto-connector-localfile

Change values in Presto's localfile.properties file.

presto-connector-memory

Change values in Presto's memory.properties file.

presto-connector-mongodb

Change values in Presto's mongodb.properties file.

presto-connector-mysql

Change values in Presto's mysql.properties file.

presto-connector-postgresql

Change values in Presto's postgresql.properties file.

presto-connector-raptor

Change values in Presto's raptor.properties file.

presto-connector-redis

Change values in Presto's redis.properties file.

presto-connector-redshift

Change values in Presto's redshift.properties file.

presto-connector-tpch

Change values in Presto's tpch.properties file.

presto-connector-tpcds

Change values in Presto's tpcds.properties file.

ranger-kms-dbks-site

Change values in dbks-site.xml file of Ranger KMS.

ranger-kms-site

Change values in ranger-kms-site.xml file of Ranger KMS.

ranger-kms-env

Change values in the Ranger KMS environment.

ranger-kms-log4j

Change values in kms-log4j.properties file of Ranger KMS.

ranger-kms-db-ca

Change values for CA file on S3 for MySQL SSL connection with Ranger KMS.

recordserver-env

Change values in the EMR RecordServer environment.

recordserver-conf

Change values in EMR RecordServer's server.properties file.

recordserver-log4j

Change values in EMR RecordServer's log4j.properties file.

spark

Amazon EMR-curated settings for Apache Spark.

spark-defaults

Change values in Spark's spark-defaults.conf file.

spark-env

Change values in the Spark environment.

spark-hive-site

Change values in Spark's hive-site.xml file

spark-log4j

Change values in Spark's log4j.properties file.

spark-metrics

Change values in Spark's metrics.properties file.

sqoop-env

Change values in Sqoop's environment.

sqoop-oraoop-site

Change values in Sqoop OraOop's oraoop-site.xml file.

sqoop-site

Change values in Sqoop's sqoop-site.xml file.

tez-site

Change values in Tez's tez-site.xml file.

yarn-env

Change values in the YARN environment.

yarn-site

Change values in YARN's yarn-site.xml file.

zeppelin-env

Change values in the Zeppelin environment.

zookeeper-config

Change values in ZooKeeper's zoo.cfg file.

zookeeper-log4j

Change values in ZooKeeper's log4j.properties file.

5.30.x

There are multiple releases within the 5.30 series. Choose a link below to see information for a specific release within this tab.

5.30.1 (Latest) | 5.30.0

Amazon EMR Release 5.30.1

Release 5.30.1 Application Versions

The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, JupyterHub, Livy, Mahout, MXNet, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, TensorFlow, Tez, Zeppelin, and ZooKeeper.

The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.

For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:

Release 5.30.1 Release Notes

The following release notes include information for Amazon EMR release version 5.30.1. Changes are relative to 5.30.0.

Initial release date: June 30, 2020

Last updated date: August 24, 2020

Changes, Enhancements, and Resolved Issues

  • Fixed an issue where the instance controller process spawned an infinite number of processes.

  • Fixed an issue where Hue was unable to run a Hive query, showing a "database is locked" message and preventing the execution of queries.

  • Fixed a Spark issue to enable more tasks to run concurrently on the EMR cluster.

  • Fixed a Jupyter notebook issue causing a "too many files open" error in the Jupyter server.

  • Fixed an issue with cluster start times.

New Features

  • Tez UI and YARN timeline server persistent application interfaces are available with Amazon EMR versions 6.x, and EMR version 5.30.1 and later. One-click link access to persistent application history lets you quickly access job history without setting up a web proxy through an SSH connection. Logs for active and terminated clusters are available for 30 days after the application ends. For more information, see View Persistent Application User Interfaces in the Amazon EMR Management Guide.

  • EMR Notebook execution APIs are available to execute EMR notebooks via a script or command line. The ability to start, stop, list, and describe EMR notebook executions without the AWS console enables you to programmatically control an EMR notebook. Using a parameterized notebook cell, you can pass different parameter values to a notebook without having to create a copy of the notebook for each new set of parameter values. See EMR API Actions. For sample code, see Sample commands to execute EMR Notebooks programmatically.
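As an illustration of passing parameter values to a notebook execution, the following minimal Python sketch builds the request for a start call. It assumes the request fields used by the boto3 EMR client's `start_notebook_execution` method. The function name and all identifiers in the example (editor ID, cluster ID, notebook path, parameters) are hypothetical, and the sketch does not call AWS.

```python
import json


def notebook_execution_request(editor_id, relative_path, cluster_id,
                               service_role, params, name=None):
    """Build keyword arguments for a start-notebook-execution call."""
    request = {
        "EditorId": editor_id,
        "RelativePath": relative_path,
        "ExecutionEngine": {"Id": cluster_id, "Type": "EMR"},
        "ServiceRole": service_role,
        # Parameter values are sent as a JSON string, so one notebook can
        # be run with many parameter sets without being copied.
        "NotebookParams": json.dumps(params),
    }
    if name:
        request["NotebookExecutionName"] = name
    return request


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    request = notebook_execution_request(
        editor_id="e-EXAMPLE",
        relative_path="reports/daily.ipynb",
        cluster_id="j-EXAMPLE",
        service_role="EMR_Notebooks_DefaultRole",
        params={"run_date": "2020-08-24"},
        name="daily-report",
    )
    print(json.dumps(request, indent=2))
```

With boto3 installed and credentials configured, a dictionary like this could be passed as `boto3.client("emr").start_notebook_execution(**request)`. Check the field names against the current API reference before relying on them.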

Known Issues

  • EMR Notebooks

    The feature that allows you to install kernels and additional Python libraries on the cluster master node is disabled by default on EMR version 5.30.1. For more information about this feature, see Installing Kernels and Python Libraries on a Cluster Master Node.

    To enable the feature, do the following:

    1. Make sure that the permissions policy attached to the service role for EMR Notebooks allows the following action:

      elasticmapreduce:ListSteps

      For more information, see Service Role for EMR Notebooks.

    2. Use the AWS CLI to run a step on the cluster that sets up EMR Notebooks as shown in the following example. For more information, see Adding Steps to a Cluster Using the AWS CLI.

      aws emr add-steps --cluster-id MyClusterID --steps Type=CUSTOM_JAR,Name=EMRNotebooksSetup,ActionOnFailure=CONTINUE,Jar=s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar,Args=["s3://awssupportdatasvcs.com/bootstrap-actions/EMRNotebooksSetup/emr-notebooks-setup.sh"]

  • Managed scaling

    Managed scaling operations on 5.30.0 and 5.30.1 clusters without Presto installed may cause application failures or cause a uniform instance group or instance fleet to stay in the ARRESTED state, particularly when a scale down operation is followed quickly by a scale up operation.

    As a workaround, choose Presto as an application to install when you create a cluster, even if your job does not require Presto.

  • Known issue in clusters with multiple master nodes and Kerberos authentication

    If you run clusters with multiple master nodes and Kerberos authentication in EMR releases 5.20.0 and later, you may encounter problems with cluster operations such as scale-down or step submission after the cluster has been running for some time. The time period depends on the Kerberos ticket validity period that you defined. The scale-down problem affects both automatic scale-down and explicit scale-down requests that you submit. Additional cluster operations can also be affected.

    Workaround:

    • SSH as the hadoop user to the lead master node of the EMR cluster with multiple master nodes.

    • Run the following command to renew the Kerberos ticket for the hadoop user.

      kinit -kt <keytab_file> <principal>

      Typically, the keytab file is located at /etc/hadoop.keytab and the principal is in the form of hadoop/<hostname>@<REALM>.

    Note

    This workaround is effective for as long as the Kerberos ticket is valid. This duration is 10 hours by default, but can be configured in your Kerberos settings. You must rerun the above command once the Kerberos ticket expires.
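The EMR Notebooks setup step from the first known issue in this list can also be written as a step definition for an SDK call instead of the AWS CLI. The following is a minimal Python sketch, assuming the step structure accepted by the boto3 EMR client's `add_job_flow_steps` method. The function name is hypothetical, and the JAR and script locations are copied from the CLI example above.

```python
# Locations copied from the AWS CLI example in the EMR Notebooks known issue.
SCRIPT_RUNNER_JAR = (
    "s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar"
)
SETUP_SCRIPT = (
    "s3://awssupportdatasvcs.com/bootstrap-actions/"
    "EMRNotebooksSetup/emr-notebooks-setup.sh"
)


def notebooks_setup_step():
    """Return a step definition equivalent to the add-steps CLI example."""
    return {
        "Name": "EMRNotebooksSetup",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": SCRIPT_RUNNER_JAR,
            "Args": [SETUP_SCRIPT],
        },
    }


if __name__ == "__main__":
    print(notebooks_setup_step())
```

With boto3 installed and credentials configured, the step could be submitted as `boto3.client("emr").add_job_flow_steps(JobFlowId="MyClusterID", Steps=[notebooks_setup_step()])`, where MyClusterID stands in for your cluster ID as in the CLI example.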

Release 5.30.1 Component Versions

The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.

Some components in Amazon EMR differ from community versions. These components have a version label in the form CommunityVersion-amzn-EmrVersion. The EmrVersion starts at 0. For example, if an open source community component named myapp-component with version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-2.

Component Version Description
aws-sagemaker-spark-sdk 1.3.0 Amazon SageMaker Spark SDK
emr-ddb 4.14.0 Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies 2.13.0 Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis 3.5.0 Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp 2.14.0 Distributed copy application optimized for Amazon S3.
emr-s3-select 1.5.0 EMR S3Select Connector
emrfs 2.40.0 Amazon S3 connector for Hadoop ecosystem applications.
flink-client 1.10.0 Apache Flink command line client scripts and applications.
ganglia-monitor 3.7.2 Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector 3.7.2 Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web 3.7.1 Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client 2.8.5-amzn-6 Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode 2.8.5-amzn-6 HDFS node-level service for storing blocks.
hadoop-hdfs-library 2.8.5-amzn-6 HDFS command-line client and library
hadoop-hdfs-namenode 2.8.5-amzn-6 HDFS service for tracking file names and block locations.
hadoop-hdfs-journalnode 2.8.5-amzn-6 HDFS service for managing the Hadoop filesystem journal on HA clusters.
hadoop-httpfs-server 2.8.5-amzn-6 HTTP endpoint for HDFS operations.
hadoop-kms-server 2.8.5-amzn-6 Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred 2.8.5-amzn-6 MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager 2.8.5-amzn-6 YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager 2.8.5-amzn-6 YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server 2.8.5-amzn-6 Service for retrieving current and historical information for YARN applications.
hbase-hmaster 1.4.13 Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server 1.4.13 Service for serving one or more HBase regions.
hbase-client 1.4.13 HBase command-line client.
hbase-rest-server 1.4.13 Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server 1.4.13 Service providing a Thrift endpoint to HBase.
hcatalog-client 2.3.6-amzn-2 The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server 2.3.6-amzn-2 Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server 2.3.6-amzn-2 HTTP endpoint providing a REST interface to HCatalog.
hive-client 2.3.6-amzn-2 Hive command line client.
hive-hbase 2.3.6-amzn-2 Hive-hbase client.
hive-metastore-server 2.3.6-amzn-2 Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server2 2.3.6-amzn-2 Service for accepting Hive queries as web requests.
hudi 0.5.2-incubating Incremental processing framework to power data pipeline at low latency and high efficiency.
hudi-presto 0.5.2-incubating Bundle library for running Presto with Hudi.
hue-server 4.6.0 Web application for analyzing data using Hadoop ecosystem applications
jupyterhub 1.1.0 Multi-user server for Jupyter notebooks
livy-server 0.7.0-incubating REST interface for interacting with Apache Spark
nginx 1.12.1 nginx [engine x] is an HTTP and reverse proxy server
mahout-client 0.13.0 Library for machine learning.
mxnet 1.5.1 A flexible, scalable, and efficient library for deep learning.
mariadb-server 5.5.64 MySQL database server.
nvidia-cuda 9.2.88 Nvidia drivers and Cuda toolkit
oozie-client 5.2.0 Oozie command-line client.
oozie-server 5.2.0 Service for accepting Oozie workflow requests.
opencv 3.4.0 Open Source Computer Vision Library.
phoenix-library 4.14.3-HBase-1.4 The phoenix libraries for server and client
phoenix-query-server 4.14.3-HBase-1.4 A light weight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API
presto-coordinator 0.232 Service for accepting queries and managing query execution among presto-workers.
presto-worker 0.232 Service for executing pieces of a query.
presto-client 0.232 Presto command-line client which is installed on an HA cluster's stand-by masters where Presto server is not started.
pig-client 0.17.0 Pig command-line client.
r 3.4.3 The R Project for Statistical Computing
ranger-kms-server 1.2.0 Apache Ranger Key Management System
spark-client 2.4.5-amzn-0 Spark command-line clients.
spark-history-server 2.4.5-amzn-0 Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn 2.4.5-amzn-0 In-memory execution engine for YARN.
spark-yarn-slave 2.4.5-amzn-0 Apache Spark libraries needed by YARN slaves.
sqoop-client 1.4.7 Apache Sqoop command-line client.
tensorflow 1.14.0 TensorFlow open source software library for high performance numerical computation.
tez-on-yarn 0.9.2 The tez YARN application and libraries.
webserver 2.4.25+ Apache HTTP server.
zeppelin-server 0.8.2 Web-based notebook that enables interactive data analytics.
zookeeper-server 3.4.14 Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client 3.4.14 ZooKeeper command line client.

Release 5.30.1 Configuration Classifications

Configuration classifications allow you to customize applications. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.

emr-5.30.1 Classifications
Classifications Description

capacity-scheduler

Change values in Hadoop's capacity-scheduler.xml file.

container-log4j

Change values in Hadoop YARN's container-log4j.properties file.

core-site

Change values in Hadoop's core-site.xml file.

emrfs-site

Change EMRFS settings.

flink-conf

Change flink-conf.yaml settings.

flink-log4j

Change Flink log4j.properties settings.

flink-log4j-yarn-session

Change Flink log4j-yarn-session.properties settings.

flink-log4j-cli

Change Flink log4j-cli.properties settings.

hadoop-env

Change values in the Hadoop environment for all Hadoop components.

hadoop-log4j

Change values in Hadoop's log4j.properties file.

hadoop-ssl-server

Change the Hadoop SSL server configuration.

hadoop-ssl-client

Change the Hadoop SSL client configuration.

hbase

Amazon EMR-curated settings for Apache HBase.

hbase-env

Change values in HBase's environment.

hbase-log4j

Change values in HBase's hbase-log4j.properties file.

hbase-metrics

Change values in HBase's hadoop-metrics2-hbase.properties file.

hbase-policy

Change values in HBase's hbase-policy.xml file.

hbase-site

Change values in HBase's hbase-site.xml file.

hdfs-encryption-zones

Configure HDFS encryption zones.

hdfs-site

Change values in HDFS's hdfs-site.xml.

hcatalog-env

Change values in HCatalog's environment.

hcatalog-server-jndi

Change values in HCatalog's jndi.properties.

hcatalog-server-proto-hive-site

Change values in HCatalog's proto-hive-site.xml.

hcatalog-webhcat-env

Change values in HCatalog WebHCat's environment.

hcatalog-webhcat-log4j2

Change values in HCatalog WebHCat's log4j2.properties.

hcatalog-webhcat-site

Change values in HCatalog WebHCat's webhcat-site.xml file.

hive-beeline-log4j2

Change values in Hive's beeline-log4j2.properties file.

hive-parquet-logging

Change values in Hive's parquet-logging.properties file.

hive-env

Change values in the Hive environment.

hive-exec-log4j2

Change values in Hive's hive-exec-log4j2.properties file.

hive-llap-daemon-log4j2

Change values in Hive's llap-daemon-log4j2.properties file.

hive-log4j2

Change values in Hive's hive-log4j2.properties file.

hive-site

Change values in Hive's hive-site.xml file.

hiveserver2-site

Change values in HiveServer2's hiveserver2-site.xml file.

hue-ini

Change values in Hue's ini file.

httpfs-env

Change values in the HTTPFS environment.

httpfs-site

Change values in Hadoop's httpfs-site.xml file.

hadoop-kms-acls

Change values in Hadoop's kms-acls.xml file.

hadoop-kms-env

Change values in the Hadoop KMS environment.

hadoop-kms-log4j

Change values in Hadoop's kms-log4j.properties file.

hadoop-kms-site

Change values in Hadoop's kms-site.xml file.

hudi-env

Change values in the Hudi environment.

jupyter-notebook-conf

Change values in Jupyter Notebook's jupyter_notebook_config.py file.

jupyter-hub-conf

Change values in JupyterHub's jupyterhub_config.py file.

jupyter-s3-conf

Configure Jupyter Notebook S3 persistence.

jupyter-sparkmagic-conf

Change values in Sparkmagic's config.json file.

livy-conf

Change values in Livy's livy.conf file.

livy-env

Change values in the Livy environment.

livy-log4j

Change Livy log4j.properties settings.

mapred-env

Change values in the MapReduce application's environment.

mapred-site

Change values in the MapReduce application's mapred-site.xml file.

oozie-env

Change values in Oozie's environment.

oozie-log4j

Change values in Oozie's oozie-log4j.properties file.

oozie-site

Change values in Oozie's oozie-site.xml file.

phoenix-hbase-metrics

Change values in Phoenix's hadoop-metrics2-hbase.properties file.

phoenix-hbase-site

Change values in Phoenix's hbase-site.xml file.

phoenix-log4j

Change values in Phoenix's log4j.properties file.

phoenix-metrics

Change values in Phoenix's hadoop-metrics2-phoenix.properties file.

pig-env

Change values in the Pig environment.

pig-properties

Change values in Pig's pig.properties file.

pig-log4j

Change values in Pig's log4j.properties file.

presto-log

Change values in Presto's log.properties file.

presto-config

Change values in Presto's config.properties file.

presto-password-authenticator

Change values in Presto's password-authenticator.properties file.

presto-env

Change values in Presto's presto-env.sh file.

presto-node

Change values in Presto's node.properties file.

presto-connector-blackhole

Change values in Presto's blackhole.properties file.

presto-connector-cassandra

Change values in Presto's cassandra.properties file.

presto-connector-hive

Change values in Presto's hive.properties file.

presto-connector-jmx

Change values in Presto's jmx.properties file.

presto-connector-kafka

Change values in Presto's kafka.properties file.

presto-connector-localfile

Change values in Presto's localfile.properties file.

presto-connector-memory

Change values in Presto's memory.properties file.

presto-connector-mongodb

Change values in Presto's mongodb.properties file.

presto-connector-mysql

Change values in Presto's mysql.properties file.

presto-connector-postgresql

Change values in Presto's postgresql.properties file.

presto-connector-raptor

Change values in Presto's raptor.properties file.

presto-connector-redis

Change values in Presto's redis.properties file.

presto-connector-redshift

Change values in Presto's redshift.properties file.

presto-connector-tpch

Change values in Presto's tpch.properties file.

presto-connector-tpcds

Change values in Presto's tpcds.properties file.

ranger-kms-dbks-site

Change values in dbks-site.xml file of Ranger KMS.

ranger-kms-site

Change values in ranger-kms-site.xml file of Ranger KMS.

ranger-kms-env

Change values in the Ranger KMS environment.

ranger-kms-log4j

Change values in kms-log4j.properties file of Ranger KMS.

ranger-kms-db-ca

Change values for CA file on S3 for MySQL SSL connection with Ranger KMS.

recordserver-env

Change values in the EMR RecordServer environment.

recordserver-conf

Change values in EMR RecordServer's server.properties file.

recordserver-log4j

Change values in EMR RecordServer's log4j.properties file.

spark

Amazon EMR-curated settings for Apache Spark.

spark-defaults

Change values in Spark's spark-defaults.conf file.

spark-env

Change values in the Spark environment.

spark-hive-site

Change values in Spark's hive-site.xml file.

spark-log4j

Change values in Spark's log4j.properties file.

spark-metrics

Change values in Spark's metrics.properties file.

sqoop-env

Change values in Sqoop's environment.

sqoop-oraoop-site

Change values in Sqoop OraOop's oraoop-site.xml file.

sqoop-site

Change values in Sqoop's sqoop-site.xml file.

tez-site

Change values in Tez's tez-site.xml file.

yarn-env

Change values in the YARN environment.

yarn-site

Change values in YARN's yarn-site.xml file.

zeppelin-env

Change values in the Zeppelin environment.

zookeeper-config

Change values in ZooKeeper's zoo.cfg file.

zookeeper-log4j

Change values in ZooKeeper's log4j.properties file.

Amazon EMR Release 5.30.0

Release 5.30.0 Application Versions

The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, JupyterHub, Livy, Mahout, MXNet, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, TensorFlow, Tez, Zeppelin, and ZooKeeper.

The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.

For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:

Release 5.30.0 Release Notes

The following release notes include information for Amazon EMR release version 5.30.0. Changes are relative to 5.29.0.

Initial release date: May 13, 2020

Last updated date: June 25, 2020

Upgrades

  • Upgraded AWS SDK for Java to version 1.11.759

  • Upgraded Amazon SageMaker Spark SDK to version 1.3.0

  • Upgraded EMR Record Server to version 1.6.0

  • Upgraded Flink to version 1.10.0

  • Upgraded Ganglia to version 3.7.2

  • Upgraded HBase to version 1.4.13

  • Upgraded Hudi to version 0.5.2-incubating

  • Upgraded Hue to version 4.6.0

  • Upgraded JupyterHub to version 1.1.0

  • Upgraded Livy to version 0.7.0-incubating

  • Upgraded Oozie to version 5.2.0

  • Upgraded Presto to version 0.232

  • Upgraded Spark to version 2.4.5

  • Upgraded Connectors and drivers: Amazon Glue Connector 1.12.0; Amazon Kinesis Connector 3.5.0; EMR DynamoDB Connector 4.14.0

New Features

  • EMR Notebooks – When used with EMR clusters created using 5.30.0, EMR notebook kernels run on the cluster. This improves notebook performance and allows you to install and customize kernels. You can also install Python libraries on the cluster master node. For more information, see Installing and Using Kernels and Libraries in the EMR Management Guide.

  • Managed Scaling – With Amazon EMR version 5.30.0 and later, you can enable EMR managed scaling to automatically increase or decrease the number of instances or units in your cluster based on workload. EMR continuously evaluates cluster metrics to make scaling decisions that optimize your clusters for cost and speed. For more information, see Scaling Cluster Resources in the Amazon EMR Management Guide.

  • Encrypt log files stored in Amazon S3 – With Amazon EMR version 5.30.0 and later, you can encrypt log files stored in Amazon S3 with an AWS KMS customer managed key. For more information, see Encrypt log files stored in Amazon S3 in the Amazon EMR Management Guide.

  • Amazon Linux 2 support – In EMR version 5.30.0 and later, EMR uses Amazon Linux 2 OS. New custom AMIs (Amazon Machine Image) must be based on the Amazon Linux 2 AMI. For more information, see Using a Custom AMI.

  • Presto Graceful Auto Scale – EMR clusters using 5.30.0 can be set with an auto scaling timeout period that gives Presto tasks time to finish running before their node is decommissioned. For more information, see Using Presto Auto Scaling with Graceful Decommission.

  • Fleet Instance creation with new allocation strategy option – A new allocation strategy option is available in EMR version 5.12.1 and later. It offers faster cluster provisioning, more accurate Spot allocation, and fewer Spot Instance interruptions. Updates to non-default EMR service roles are required. See Configure Instance Fleets.

  • sudo systemctl stop and sudo systemctl start commands – In EMR version 5.30.0 and later, which use Amazon Linux 2 OS, EMR uses sudo systemctl stop and sudo systemctl start commands to restart services. For more information, see How do I restart a service in Amazon EMR?.
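
The Managed Scaling item above describes limits within which EMR adds or removes capacity. The following is a minimal sketch of a policy expressed as a Python dictionary in the shape that the boto3 EMR client's put_managed_scaling_policy call accepts. The helper name, the example limits, and the sanity checks are assumptions of this sketch.

```python
def managed_scaling_policy(min_units, max_units, unit_type="Instances"):
    """Return a managed scaling policy dict after basic sanity checks."""
    if unit_type not in {"Instances", "InstanceFleetUnits", "VCPU"}:
        raise ValueError(f"unsupported unit type: {unit_type}")
    if not 1 <= min_units <= max_units:
        raise ValueError("expected 1 <= min_units <= max_units")
    return {
        "ComputeLimits": {
            "UnitType": unit_type,
            "MinimumCapacityUnits": min_units,
            "MaximumCapacityUnits": max_units,
        }
    }

# Example limits only: scale between 2 and 10 instances.
policy = managed_scaling_policy(2, 10)
print(policy)

# With boto3 installed and credentials configured, the policy could be
# attached to a cluster like this (the cluster ID is a placeholder):
#   import boto3
#   boto3.client("emr").put_managed_scaling_policy(
#       ClusterId="j-XXXXXXXXXXXXX", ManagedScalingPolicy=policy)
```

Building and checking the dictionary separately from the API call keeps the limits easy to review before they are applied to a running cluster.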

Changes, Enhancements, and Resolved Issues

  • EMR version 5.30.0 doesn't install Ganglia by default. You can explicitly select Ganglia to install when you create a cluster.

  • Spark performance optimizations.

  • Presto performance optimizations.

  • Python 3 is the default for Amazon EMR version 5.30.0 and later.

  • The default managed security group for service access in private subnets has been updated with new rules. If you use a custom security group for service access, you must include the same rules as the default managed security group. For more information, see Amazon EMR-Managed Security Group for Service Access (Private Subnets). If you use a custom service role for Amazon EMR, you must grant permission to ec2:describeSecurityGroups so that EMR can validate if the security groups are correctly created. If you use the EMR_DefaultRole, this permission is already included in the default managed policy.
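
For a custom service role, the permission mentioned above can be sketched as an IAM policy statement. IAM action names are not case-sensitive, so the capitalized form used here is equivalent. The wildcard resource is an assumption of this sketch; review it against your own security requirements before attaching it to a role.

```python
import json

# Statement granting the permission that EMR uses to validate security groups.
# "Resource": "*" is an assumption of this sketch.
statement = {
    "Effect": "Allow",
    "Action": "ec2:DescribeSecurityGroups",
    "Resource": "*",
}

policy_document = {"Version": "2012-10-17", "Statement": [statement]}
print(json.dumps(policy_document, indent=2))
```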

Known Issues

  • Managed scaling

    Managed scaling operations on 5.30.0 and 5.30.1 clusters without Presto installed may cause application failures or cause a uniform instance group or instance fleet to stay in the ARRESTED state, particularly when a scale down operation is followed quickly by a scale up operation.

    As a workaround, choose Presto as an application to install when you create a cluster, even if your job does not require Presto.

  • Known issue in clusters with multiple master nodes and Kerberos authentication

    If you run clusters with multiple master nodes and Kerberos authentication in EMR releases 5.20.0 and later, you may encounter problems with cluster operations such as scale down or step submission, after the cluster has been running for some time. The time period depends on the Kerberos ticket validity period that you defined. The scale-down problem impacts both automatic scale-down and explicit scale down requests that you submitted. Additional cluster operations can also be impacted.

    Workaround:

    • SSH as the hadoop user to the lead master node of the EMR cluster with multiple master nodes.

    • Run the following command to renew the Kerberos ticket for the hadoop user.

      kinit -kt <keytab_file> <principal>

      Typically, the keytab file is located at /etc/hadoop.keytab and the principal is in the form of hadoop/<hostname>@<REALM>.

    Note

    This workaround will be effective for the time period the Kerberos ticket is valid. This duration is 10 hours by default, but can be configured by your Kerberos settings. You must re-run the above command once the Kerberos ticket expires.
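
Because the renewal command must be repeated whenever the ticket expires, it can help to assemble it programmatically. This is a minimal sketch that fills in the keytab path and principal form given in the workaround. The function name, host name, and realm are hypothetical values chosen for illustration.

```python
import shlex

def kinit_command(hostname, realm, keytab="/etc/hadoop.keytab"):
    """Build the kinit argument list for the hadoop user's principal."""
    principal = f"hadoop/{hostname}@{realm}"
    return ["kinit", "-kt", keytab, principal]

# Hypothetical host name and realm, for illustration only.
cmd = kinit_command("ip-10-0-0-12.ec2.internal", "EC2.INTERNAL")
print(shlex.join(cmd))

# To run it on the lead master node as the hadoop user:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems if a keytab path or principal ever contains unusual characters.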

Release 5.30.0 Component Versions

Component Version Description
aws-sagemaker-spark-sdk 1.3.0 Amazon SageMaker Spark SDK
emr-ddb 4.14.0 Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies 2.13.0 Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis 3.5.0 Amazon Kinesis connector for Hadoop ecosystem applications.
emr-notebook-env 1.0.0 Conda environment for EMR Notebooks.
emr-s3-dist-cp 2.14.0 Distributed copy application optimized for Amazon S3.
emr-s3-select 1.5.0 EMR S3Select Connector
emrfs 2.40.0 Amazon S3 connector for Hadoop ecosystem applications.
flink-client 1.10.0 Apache Flink command line client scripts and applications.
ganglia-monitor 3.7.2 Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector 3.7.2 Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web 3.7.1 Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client 2.8.5-amzn-6 Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode 2.8.5-amzn-6 HDFS node-level service for storing blocks.
hadoop-hdfs-library 2.8.5-amzn-6 HDFS command-line client and library
hadoop-hdfs-namenode 2.8.5-amzn-6 HDFS service for tracking file names and block locations.
hadoop-hdfs-journalnode 2.8.5-amzn-6 HDFS service for managing the Hadoop filesystem journal on HA clusters.
hadoop-httpfs-server 2.8.5-amzn-6 HTTP endpoint for HDFS operations.
hadoop-kms-server 2.8.5-amzn-6 Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred 2.8.5-amzn-6 MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager 2.8.5-amzn-6 YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager 2.8.5-amzn-6 YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server 2.8.5-amzn-6 Service for retrieving current and historical information for YARN applications.
hbase-hmaster 1.4.13 Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server 1.4.13 Service for serving one or more HBase regions.
hbase-client 1.4.13 HBase command-line client.
hbase-rest-server 1.4.13 Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server 1.4.13 Service providing a Thrift endpoint to HBase.
hcatalog-client 2.3.6-amzn-2 The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server 2.3.6-amzn-2 Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server 2.3.6-amzn-2 HTTP endpoint providing a REST interface to HCatalog.
hive-client 2.3.6-amzn-2 Hive command line client.
hive-hbase 2.3.6-amzn-2 Hive-hbase client.
hive-metastore-server 2.3.6-amzn-2 Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server2 2.3.6-amzn-2 Service for accepting Hive queries as web requests.
hudi 0.5.2-incubating Incremental processing framework to power data pipelines at low latency and high efficiency.
hudi-presto 0.5.2-incubating Bundle library for running Presto with Hudi.
hue-server 4.6.0 Web application for analyzing data using Hadoop ecosystem applications
jupyterhub 1.1.0 Multi-user server for Jupyter notebooks
livy-server 0.7.0-incubating REST interface for interacting with Apache Spark
nginx 1.12.1 nginx [engine x] is an HTTP and reverse proxy server
mahout-client 0.13.0 Library for machine learning.
mxnet 1.5.1 A flexible, scalable, and efficient library for deep learning.
mariadb-server 5.5.64 MySQL database server.
nvidia-cuda 9.2.88 Nvidia drivers and Cuda toolkit
oozie-client 5.2.0 Oozie command-line client.
oozie-server 5.2.0 Service for accepting Oozie workflow requests.
opencv 3.4.0 Open Source Computer Vision Library.
phoenix-library 4.14.3-HBase-1.4 The phoenix libraries for server and client
phoenix-query-server 4.14.3-HBase-1.4 A lightweight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API
presto-coordinator 0.232 Service for accepting queries and managing query execution among presto-workers.
presto-worker 0.232 Service for executing pieces of a query.
presto-client 0.232 Presto command-line client which is installed on an HA cluster's stand-by masters where Presto server is not started.
pig-client 0.17.0 Pig command-line client.
r 3.4.3 The R Project for Statistical Computing
ranger-kms-server 1.2.0 Apache Ranger Key Management System
spark-client 2.4.5-amzn-0 Spark command-line clients.
spark-history-server 2.4.5-amzn-0 Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn 2.4.5-amzn-0 In-memory execution engine for YARN.
spark-yarn-slave 2.4.5-amzn-0 Apache Spark libraries needed by YARN slaves.
sqoop-client 1.4.7 Apache Sqoop command-line client.
tensorflow 1.14.0 TensorFlow open source software library for high performance numerical computation.
tez-on-yarn 0.9.2 The tez YARN application and libraries.
webserver 2.4.25+ Apache HTTP server.
zeppelin-server 0.8.2 Web-based notebook that enables interactive data analytics.
zookeeper-server 3.4.14 Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client 3.4.14 ZooKeeper command line client.
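
Each row of the table above has three fields: component name, version, and description. As a minimal sketch, assuming that names and versions never contain spaces (which holds for every row above), a row can be split into those fields as follows. The class and function names are hypothetical.

```python
from typing import NamedTuple

class Component(NamedTuple):
    name: str
    version: str
    description: str

def parse_component_row(row):
    """Split a 'name version description' row on its first two whitespace runs."""
    name, version, description = row.split(maxsplit=2)
    return Component(name, version, description)

# Two rows copied from the table above.
rows = [
    "emrfs 2.40.0 Amazon S3 connector for Hadoop ecosystem applications.",
    "spark-client 2.4.5-amzn-0 Spark command-line clients.",
]

components = {c.name: c for c in map(parse_component_row, rows)}
print(components["spark-client"].version)
```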

Release 5.30.0 Configuration Classifications

Configuration classifications allow you to customize applications. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.
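
Classifications whose names end in -env, such as spark-env and hadoop-env in the table below, set environment variables rather than values in an XML file. The following is a sketch of one such configuration, assuming the nested export form that Amazon EMR uses for environment classifications. The helper name, variable, and value are illustrative.

```python
import json

def env_configuration(classification, exports):
    """Wrap environment variables in a nested "export" classification."""
    return {
        "Classification": classification,
        "Configurations": [
            {"Classification": "export", "Properties": dict(exports)}
        ],
    }

# Illustrative variable and value; adjust for your own cluster.
spark_env = env_configuration("spark-env", {"PYSPARK_PYTHON": "/usr/bin/python3"})
print(json.dumps([spark_env], indent=2))
```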

emr-5.30.0 Classifications
Classifications Description

capacity-scheduler

Change values in Hadoop's capacity-scheduler.xml file.

container-log4j

Change values in Hadoop YARN's container-log4j.properties file.

core-site

Change values in Hadoop's core-site.xml file.

emrfs-site

Change EMRFS settings.

flink-conf

Change flink-conf.yaml settings.

flink-log4j

Change Flink log4j.properties settings.

flink-log4j-yarn-session

Change Flink log4j-yarn-session.properties settings.

flink-log4j-cli

Change Flink log4j-cli.properties settings.

hadoop-env

Change values in the Hadoop environment for all Hadoop components.

hadoop-log4j

Change values in Hadoop's log4j.properties file.

hadoop-ssl-server

Change Hadoop SSL server configuration.

hadoop-ssl-client

Change Hadoop SSL client configuration.

hbase

Amazon EMR-curated settings for Apache HBase.

hbase-env

Change values in HBase's environment.

hbase-log4j

Change values in HBase's hbase-log4j.properties file.

hbase-metrics

Change values in HBase's hadoop-metrics2-hbase.properties file.

hbase-policy

Change values in HBase's hbase-policy.xml file.

hbase-site

Change values in HBase's hbase-site.xml file.

hdfs-encryption-zones

Configure HDFS encryption zones.

hdfs-site

Change values in HDFS's hdfs-site.xml.

hcatalog-env

Change values in HCatalog's environment.

hcatalog-server-jndi

Change values in HCatalog's jndi.properties.

hcatalog-server-proto-hive-site

Change values in HCatalog's proto-hive-site.xml.

hcatalog-webhcat-env

Change values in HCatalog WebHCat's environment.

hcatalog-webhcat-log4j2

Change values in HCatalog WebHCat's log4j2.properties.

hcatalog-webhcat-site

Change values in HCatalog WebHCat's webhcat-site.xml file.

hive-beeline-log4j2

Change values in Hive's beeline-log4j2.properties file.

hive-parquet-logging

Change values in Hive's parquet-logging.properties file.

hive-env

Change values in the Hive environment.

hive-exec-log4j2

Change values in Hive's hive-exec-log4j2.properties file.

hive-llap-daemon-log4j2

Change values in Hive's llap-daemon-log4j2.properties file.

hive-log4j2

Change values in Hive's hive-log4j2.properties file.

hive-site

Change values in Hive's hive-site.xml file.

hiveserver2-site

Change values in HiveServer2's hiveserver2-site.xml file.

hue-ini

Change values in Hue's ini file.

httpfs-env

Change values in the HTTPFS environment.

httpfs-site

Change values in Hadoop's httpfs-site.xml file.

hadoop-kms-acls

Change values in Hadoop's kms-acls.xml file.

hadoop-kms-env

Change values in the Hadoop KMS environment.

hadoop-kms-log4j

Change values in Hadoop's kms-log4j.properties file.

hadoop-kms-site

Change values in Hadoop's kms-site.xml file.

hudi-env

Change values in the Hudi environment.

jupyter-notebook-conf

Change values in Jupyter Notebook's jupyter_notebook_config.py file.

jupyter-hub-conf

Change values in JupyterHub's jupyterhub_config.py file.

jupyter-s3-conf

Configure Jupyter Notebook S3 persistence.

jupyter-sparkmagic-conf

Change values in Sparkmagic's config.json file.

livy-conf

Change values in Livy's livy.conf file.

livy-env

Change values in the Livy environment.

livy-log4j

Change Livy log4j.properties settings.

mapred-env

Change values in the MapReduce application's environment.

mapred-site

Change values in the MapReduce application's mapred-site.xml file.

oozie-env

Change values in Oozie's environment.

oozie-log4j

Change values in Oozie's oozie-log4j.properties file.

oozie-site

Change values in Oozie's oozie-site.xml file.

phoenix-hbase-metrics

Change values in Phoenix's hadoop-metrics2-hbase.properties file.

phoenix-hbase-site

Change values in Phoenix's hbase-site.xml file.

phoenix-log4j

Change values in Phoenix's log4j.properties file.

phoenix-metrics

Change values in Phoenix's hadoop-metrics2-phoenix.properties file.

pig-env

Change values in the Pig environment.

pig-properties

Change values in Pig's pig.properties file.

pig-log4j

Change values in Pig's log4j.properties file.

presto-log

Change values in Presto's log.properties file.

presto-config

Change values in Presto's config.properties file.

presto-password-authenticator

Change values in Presto's password-authenticator.properties file.

presto-env

Change values in Presto's presto-env.sh file.

presto-node

Change values in Presto's node.properties file.

presto-connector-blackhole

Change values in Presto's blackhole.properties file.

presto-connector-cassandra

Change values in Presto's cassandra.properties file.

presto-connector-hive

Change values in Presto's hive.properties file.

presto-connector-jmx

Change values in Presto's jmx.properties file.

presto-connector-kafka

Change values in Presto's kafka.properties file.

presto-connector-localfile

Change values in Presto's localfile.properties file.

presto-connector-memory

Change values in Presto's memory.properties file.

presto-connector-mongodb

Change values in Presto's mongodb.properties file.

presto-connector-mysql

Change values in Presto's mysql.properties file.

presto-connector-postgresql

Change values in Presto's postgresql.properties file.

presto-connector-raptor

Change values in Presto's raptor.properties file.

presto-connector-redis

Change values in Presto's redis.properties file.

presto-connector-redshift

Change values in Presto's redshift.properties file.

presto-connector-tpch

Change values in Presto's tpch.properties file.

presto-connector-tpcds

Change values in Presto's tpcds.properties file.

ranger-kms-dbks-site

Change values in dbks-site.xml file of Ranger KMS.

ranger-kms-site

Change values in ranger-kms-site.xml file of Ranger KMS.

ranger-kms-env

Change values in the Ranger KMS environment.

ranger-kms-log4j

Change values in kms-log4j.properties file of Ranger KMS.

ranger-kms-db-ca

Change values for CA file on S3 for MySQL SSL connection with Ranger KMS.

recordserver-env

Change values in the EMR RecordServer environment.

recordserver-conf

Change values in EMR RecordServer's server.properties file.

recordserver-log4j

Change values in EMR RecordServer's log4j.properties file.

spark

Amazon EMR-curated settings for Apache Spark.

spark-defaults

Change values in Spark's spark-defaults.conf file.

spark-env

Change values in the Spark environment.

spark-hive-site

Change values in Spark's hive-site.xml file.

spark-log4j

Change values in Spark's log4j.properties file.

spark-metrics

Change values in Spark's metrics.properties file.

sqoop-env

Change values in Sqoop's environment.

sqoop-oraoop-site

Change values in Sqoop OraOop's oraoop-site.xml file.

sqoop-site

Change values in Sqoop's sqoop-site.xml file.

tez-site

Change values in Tez's tez-site.xml file.

yarn-env

Change values in the YARN environment.

yarn-site

Change values in YARN's yarn-site.xml file.

zeppelin-env

Change values in the Zeppelin environment.

zookeeper-config

Change values in ZooKeeper's zoo.cfg file.

zookeeper-log4j

Change values in ZooKeeper's log4j.properties file.

5.29.0

5.29.0 Application Versions

The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, JupyterHub, Livy, Mahout, MXNet, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, TensorFlow, Tez, Zeppelin, and ZooKeeper.

The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.

For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:

5.29.0 Release Notes

The following release notes include information for Amazon EMR release version 5.29.0. Changes are relative to 5.28.1.

Initial release date: Jan 17, 2020

Upgrades

  • Upgraded AWS Java SDK to version 1.11.682

  • Upgraded Hive to version 2.3.6

  • Upgraded Flink to version 1.9.1

  • Upgraded EmrFS to version 2.38.0

  • Upgraded EMR DynamoDB Connector to version 4.13.0

Changes, Enhancements, and Resolved Issues

  • Spark

    • Spark performance optimizations.

  • EMRFS

    • Management Guide updates to emrfs-site.xml default settings for consistent view.

Known Issues

  • Known issue in clusters with multiple master nodes and Kerberos authentication

    If you run clusters with multiple master nodes and Kerberos authentication in EMR releases 5.20.0 and later, you may encounter problems with cluster operations such as scale down or step submission, after the cluster has been running for some time. The time period depends on the Kerberos ticket validity period that you defined. The scale-down problem impacts both automatic scale-down and explicit scale down requests that you submitted. Additional cluster operations can also be impacted.

    Workaround:

    • SSH as the hadoop user to the lead master node of the EMR cluster with multiple master nodes.

    • Run the following command to renew the Kerberos ticket for the hadoop user.

      kinit -kt <keytab_file> <principal>

      Typically, the keytab file is located at /etc/hadoop.keytab and the principal is in the form of hadoop/<hostname>@<REALM>.

    Note

    This workaround will be effective for the time period the Kerberos ticket is valid. This duration is 10 hours by default, but can be configured by your Kerberos settings. You must re-run the above command once the Kerberos ticket expires.

5.29.0 Component Versions

The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.

Some components in Amazon EMR differ from community versions. These components have a version label in the form CommunityVersion-amzn-EmrVersion. The EmrVersion starts at 0. For example, if an open source community component named myapp-component with version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-2.
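
The label form described above can be split into its two parts with a short parser. This is a sketch only; the function name and the choice to return None for unmodified community versions are assumptions.

```python
import re

# CommunityVersion-amzn-EmrVersion, where EmrVersion is a non-negative integer.
LABEL = re.compile(r"^(?P<community>.+)-amzn-(?P<emr>\d+)$")

def parse_component_version(version):
    """Split a version into (community_version, emr_revision).

    emr_revision is None when the version has no -amzn- suffix.
    """
    match = LABEL.match(version)
    if match is None:
        return version, None
    return match.group("community"), int(match.group("emr"))

print(parse_component_version("2.2-amzn-2"))
print(parse_component_version("1.4.10"))
```

Because EmrVersion starts at 0, a revision of 2 corresponds to the third modification, matching the myapp-component example.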

Component Version Description
aws-sagemaker-spark-sdk 1.2.6 Amazon SageMaker Spark SDK
emr-ddb 4.13.0 Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies 2.12.0 Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis 3.4.0 Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp 2.13.0 Distributed copy application optimized for Amazon S3.
emr-s3-select 1.4.0 EMR S3Select Connector
emrfs 2.38.0 Amazon S3 connector for Hadoop ecosystem applications.
flink-client 1.9.1 Apache Flink command line client scripts and applications.
ganglia-monitor 3.7.2 Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector 3.7.2 Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web 3.7.1 Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client 2.8.5-amzn-5 Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode 2.8.5-amzn-5 HDFS node-level service for storing blocks.
hadoop-hdfs-library 2.8.5-amzn-5 HDFS command-line client and library
hadoop-hdfs-namenode 2.8.5-amzn-5 HDFS service for tracking file names and block locations.
hadoop-hdfs-journalnode 2.8.5-amzn-5 HDFS service for managing the Hadoop filesystem journal on HA clusters.
hadoop-httpfs-server 2.8.5-amzn-5 HTTP endpoint for HDFS operations.
hadoop-kms-server 2.8.5-amzn-5 Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred 2.8.5-amzn-5 MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager 2.8.5-amzn-5 YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager 2.8.5-amzn-5 YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server 2.8.5-amzn-5 Service for retrieving current and historical information for YARN applications.
hbase-hmaster 1.4.10 Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server 1.4.10 Service for serving one or more HBase regions.
hbase-client 1.4.10 HBase command-line client.
hbase-rest-server 1.4.10 Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server 1.4.10 Service providing a Thrift endpoint to HBase.
hcatalog-client 2.3.6-amzn-1 The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server 2.3.6-amzn-1 Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server 2.3.6-amzn-1 HTTP endpoint providing a REST interface to HCatalog.
hive-client 2.3.6-amzn-1 Hive command line client.
hive-hbase 2.3.6-amzn-1 Hive-hbase client.
hive-metastore-server 2.3.6-amzn-1 Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server2 2.3.6-amzn-1 Service for accepting Hive queries as web requests.
hudi 0.5.0-incubating Incremental processing framework to power data pipelines at low latency and high efficiency.
hudi-presto 0.5.0-incubating Bundle library for running Presto with Hudi.
hue-server 4.4.0 Web application for analyzing data using Hadoop ecosystem applications
jupyterhub 1.0.0 Multi-user server for Jupyter notebooks
livy-server 0.6.0-incubating REST interface for interacting with Apache Spark
nginx 1.12.1 nginx [engine x] is an HTTP and reverse proxy server
mahout-client 0.13.0 Library for machine learning.
mxnet 1.5.1 A flexible, scalable, and efficient library for deep learning.
mysql-server 5.5.54+ MySQL database server.
nvidia-cuda 9.2.88 Nvidia drivers and Cuda toolkit
oozie-client 5.1.0 Oozie command-line client.
oozie-server 5.1.0 Service for accepting Oozie workflow requests.
opencv 3.4.0 Open Source Computer Vision Library.
phoenix-library 4.14.3-HBase-1.4 The phoenix libraries for server and client
phoenix-query-server 4.14.3-HBase-1.4 A lightweight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API
presto-coordinator 0.227 Service for accepting queries and managing query execution among presto-workers.
presto-worker 0.227 Service for executing pieces of a query.
presto-client 0.227 Presto command-line client which is installed on an HA cluster's stand-by masters where Presto server is not started.
pig-client 0.17.0 Pig command-line client.
r 3.4.1 The R Project for Statistical Computing
spark-client 2.4.4 Spark command-line clients.
spark-history-server 2.4.4 Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn 2.4.4 In-memory execution engine for YARN.
spark-yarn-slave 2.4.4 Apache Spark libraries needed by YARN slaves.
sqoop-client 1.4.7 Apache Sqoop command-line client.
tensorflow 1.14.0 TensorFlow open source software library for high performance numerical computation.
tez-on-yarn 0.9.2 The tez YARN application and libraries.
webserver 2.4.25+ Apache HTTP server.
zeppelin-server 0.8.2 Web-based notebook that enables interactive data analytics.
zookeeper-server 3.4.14 Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client 3.4.14 ZooKeeper command line client.

5.29.0 Configuration Classifications

Configuration classifications allow you to customize applications. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.
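
As a rough sketch of how a classification is used: a configuration object pairs a classification name from the list below with the properties to override. The Python below builds such a list and serializes it to JSON. The `Classification` and `Properties` keys follow the format covered in Configuring Applications; the helper name `make_configuration` and the property names and values are illustrative assumptions, not recommended settings.

```python
import json


def make_configuration(classification, properties):
    """Return one configuration object for the given classification."""
    # Property values are written as strings in the JSON document.
    return {
        "Classification": classification,
        "Properties": {key: str(value) for key, value in properties.items()},
    }


# Illustrative overrides only; these property names and values are assumptions.
configurations = [
    make_configuration("hive-site", {"hive.exec.dynamic.partition": "true"}),
    make_configuration("spark-defaults", {"spark.executor.memory": "4g"}),
]

configurations_json = json.dumps(configurations, indent=2)
print(configurations_json)
```

The resulting JSON is the kind of document you would typically supply when creating a cluster; see Configuring Applications for where it is accepted.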

emr-5.29.0 Classifications
Classifications Description

capacity-scheduler

Change values in Hadoop's capacity-scheduler.xml file.

container-log4j

Change values in Hadoop YARN's container-log4j.properties file.

core-site

Change values in Hadoop's core-site.xml file.

emrfs-site

Change EMRFS settings.

flink-conf

Change flink-conf.yaml settings.

flink-log4j

Change Flink log4j.properties settings.

flink-log4j-yarn-session

Change Flink log4j-yarn-session.properties settings.

flink-log4j-cli

Change Flink log4j-cli.properties settings.

hadoop-env

Change values in the Hadoop environment for all Hadoop components.

hadoop-log4j

Change values in Hadoop's log4j.properties file.

hadoop-ssl-server

Change the Hadoop SSL server configuration.

hadoop-ssl-client

Change the Hadoop SSL client configuration.

hbase

Amazon EMR-curated settings for Apache HBase.

hbase-env

Change values in HBase's environment.

hbase-log4j

Change values in HBase's hbase-log4j.properties file.

hbase-metrics

Change values in HBase's hadoop-metrics2-hbase.properties file.

hbase-policy

Change values in HBase's hbase-policy.xml file.

hbase-site

Change values in HBase's hbase-site.xml file.

hdfs-encryption-zones

Configure HDFS encryption zones.

hdfs-site

Change values in HDFS's hdfs-site.xml.

hcatalog-env

Change values in HCatalog's environment.

hcatalog-server-jndi

Change values in HCatalog's jndi.properties.

hcatalog-server-proto-hive-site

Change values in HCatalog's proto-hive-site.xml.

hcatalog-webhcat-env

Change values in HCatalog WebHCat's environment.

hcatalog-webhcat-log4j2

Change values in HCatalog WebHCat's log4j2.properties.

hcatalog-webhcat-site

Change values in HCatalog WebHCat's webhcat-site.xml file.

hive-beeline-log4j2

Change values in Hive's beeline-log4j2.properties file.

hive-parquet-logging

Change values in Hive's parquet-logging.properties file.

hive-env

Change values in the Hive environment.

hive-exec-log4j2

Change values in Hive's hive-exec-log4j2.properties file.

hive-llap-daemon-log4j2

Change values in Hive's llap-daemon-log4j2.properties file.

hive-log4j2

Change values in Hive's hive-log4j2.properties file.

hive-site

Change values in Hive's hive-site.xml file.

hiveserver2-site

Change values in Hive Server2's hiveserver2-site.xml file.

hue-ini

Change values in Hue's ini file.

httpfs-env

Change values in the HTTPFS environment.

httpfs-site

Change values in Hadoop's httpfs-site.xml file.

hadoop-kms-acls

Change values in Hadoop's kms-acls.xml file.

hadoop-kms-env

Change values in the Hadoop KMS environment.

hadoop-kms-log4j

Change values in Hadoop's kms-log4j.properties file.

hadoop-kms-site

Change values in Hadoop's kms-site.xml file.

jupyter-notebook-conf

Change values in Jupyter Notebook's jupyter_notebook_config.py file.

jupyter-hub-conf

Change values in JupyterHub's jupyterhub_config.py file.

jupyter-s3-conf

Configure Jupyter Notebook S3 persistence.

jupyter-sparkmagic-conf

Change values in Sparkmagic's config.json file.

livy-conf

Change values in Livy's livy.conf file.

livy-env

Change values in the Livy environment.

livy-log4j

Change Livy log4j.properties settings.

mapred-env

Change values in the MapReduce application's environment.

mapred-site

Change values in the MapReduce application's mapred-site.xml file.

oozie-env

Change values in Oozie's environment.

oozie-log4j

Change values in Oozie's oozie-log4j.properties file.

oozie-site

Change values in Oozie's oozie-site.xml file.

phoenix-hbase-metrics

Change values in Phoenix's hadoop-metrics2-hbase.properties file.

phoenix-hbase-site

Change values in Phoenix's hbase-site.xml file.

phoenix-log4j

Change values in Phoenix's log4j.properties file.

phoenix-metrics

Change values in Phoenix's hadoop-metrics2-phoenix.properties file.

pig-env

Change values in the Pig environment.

pig-properties

Change values in Pig's pig.properties file.

pig-log4j

Change values in Pig's log4j.properties file.

presto-log

Change values in Presto's log.properties file.

presto-config

Change values in Presto's config.properties file.

presto-password-authenticator

Change values in Presto's password-authenticator.properties file.

presto-env

Change values in Presto's presto-env.sh file.

presto-node

Change values in Presto's node.properties file.

presto-connector-blackhole

Change values in Presto's blackhole.properties file.

presto-connector-cassandra

Change values in Presto's cassandra.properties file.

presto-connector-hive

Change values in Presto's hive.properties file.

presto-connector-jmx

Change values in Presto's jmx.properties file.

presto-connector-kafka

Change values in Presto's kafka.properties file.

presto-connector-localfile

Change values in Presto's localfile.properties file.

presto-connector-memory

Change values in Presto's memory.properties file.

presto-connector-mongodb

Change values in Presto's mongodb.properties file.

presto-connector-mysql

Change values in Presto's mysql.properties file.

presto-connector-postgresql

Change values in Presto's postgresql.properties file.

presto-connector-raptor

Change values in Presto's raptor.properties file.

presto-connector-redis

Change values in Presto's redis.properties file.

presto-connector-redshift

Change values in Presto's redshift.properties file.

presto-connector-tpch

Change values in Presto's tpch.properties file.

presto-connector-tpcds

Change values in Presto's tpcds.properties file.

ranger-kms-dbks-site

Change values in dbks-site.xml file of Ranger KMS.

ranger-kms-site

Change values in ranger-kms-site.xml file of Ranger KMS.

ranger-kms-env

Change values in the Ranger KMS environment.

ranger-kms-log4j

Change values in kms-log4j.properties file of Ranger KMS.

ranger-kms-db-ca

Change values for CA file on S3 for MySQL SSL connection with Ranger KMS.

recordserver-env

Change values in the EMR RecordServer environment.

recordserver-conf

Change values in EMR RecordServer's server.properties file.

recordserver-log4j

Change values in EMR RecordServer's log4j.properties file.

spark

Amazon EMR-curated settings for Apache Spark.

spark-defaults

Change values in Spark's spark-defaults.conf file.

spark-env

Change values in the Spark environment.

spark-hive-site

Change values in Spark's hive-site.xml file.

spark-log4j

Change values in Spark's log4j.properties file.

spark-metrics

Change values in Spark's metrics.properties file.

sqoop-env

Change values in Sqoop's environment.

sqoop-oraoop-site

Change values in Sqoop OraOop's oraoop-site.xml file.

sqoop-site

Change values in Sqoop's sqoop-site.xml file.

tez-site

Change values in Tez's tez-site.xml file.

yarn-env

Change values in the YARN environment.

yarn-site

Change values in YARN's yarn-site.xml file.

zeppelin-env

Change values in the Zeppelin environment.

zookeeper-config

Change values in ZooKeeper's zoo.cfg file.

zookeeper-log4j

Change values in ZooKeeper's log4j.properties file.

5.28.x

There are multiple releases within the 5.28 series. Choose a link below to see information for a specific release within this tab.

5.28.1 (Latest) | 5.28.0

Amazon EMR Release 5.28.1

Release 5.28.1 Application Versions

The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, JupyterHub, Livy, Mahout, MXNet, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, TensorFlow, Tez, Zeppelin, and ZooKeeper.

The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.

For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:

Release 5.28.1 Release Notes

The following release notes include information for Amazon EMR release version 5.28.1. Changes are relative to 5.28.0.

Initial release date: Jan 10, 2020

Changes, Enhancements, and Resolved Issues

  • Spark

    • Fixed Spark compatibility issues.

  • CloudWatch Metrics

    • Fixed Amazon CloudWatch Metrics publishing on an EMR cluster with multiple master nodes.

  • Disabled log message

    • Disabled false log message, "...using old version (<4.5.8) of Apache http client."

Known Issues

  • Known issue in clusters with multiple master nodes and Kerberos authentication

    If you run clusters with multiple master nodes and Kerberos authentication in EMR releases 5.20.0 and later, you may encounter problems with cluster operations such as scale down or step submission, after the cluster has been running for some time. The time period depends on the Kerberos ticket validity period that you defined. The scale-down problem impacts both automatic scale-down and explicit scale down requests that you submitted. Additional cluster operations can also be impacted.

    Workaround:

    • SSH as the hadoop user to the lead master node of the EMR cluster with multiple master nodes.

    • Run the following command to renew the Kerberos ticket for the hadoop user.

      kinit -kt <keytab_file> <principal>

      Typically, the keytab file is located at /etc/hadoop.keytab and the principal is in the form of hadoop/<hostname>@<REALM>.

    Note

    This workaround remains effective for as long as the Kerberos ticket is valid. This duration is 10 hours by default, but can be configured by your Kerberos settings. You must re-run the above command once the Kerberos ticket expires.
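
The renewal command in the workaround above can be assembled programmatically, for example when scripting it ahead of ticket expiry. The following Python is a minimal sketch: it only builds and prints the `kinit` invocation from the typical keytab path and principal form given above. The function name, hostname, and realm are hypothetical placeholders, and your cluster's values may differ.

```python
import shlex

# Typical keytab location according to the workaround text; yours may differ.
DEFAULT_KEYTAB = "/etc/hadoop.keytab"


def build_kinit_command(hostname, realm, keytab=DEFAULT_KEYTAB):
    """Return the kinit argument list for the hadoop principal."""
    # The principal takes the form hadoop/<hostname>@<REALM>.
    principal = f"hadoop/{hostname}@{realm}"
    return ["kinit", "-kt", keytab, principal]


# Hypothetical hostname and realm, for illustration only.
command = build_kinit_command("ip-10-0-0-1.ec2.internal", "EC2.INTERNAL")
print(shlex.join(command))
```

On the lead master node, the returned list could be passed to `subprocess.run(command, check=True)` as the hadoop user. Because the ticket expires, any such script would need to run again within the ticket validity period.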

Release 5.28.1 Component Versions

The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.

Some components in Amazon EMR differ from community versions. These components have a version label in the form CommunityVersion-amzn-EmrVersion. The EmrVersion starts at 0. For example, if an open source community component named myapp-component with version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-2.
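
To make the label format concrete, the sketch below splits a component version into its community version and its EMR revision. It is an illustration rather than an official parser: the names `ComponentVersion` and `parse_component_version` are hypothetical, and labels without an `-amzn-` suffix are assumed to be unmodified community versions.

```python
import re
from typing import NamedTuple, Optional


class ComponentVersion(NamedTuple):
    community_version: str
    emr_revision: Optional[int]  # None when the label has no -amzn- suffix


# Matches labels of the form CommunityVersion-amzn-EmrVersion.
_AMZN_LABEL = re.compile(r"^(?P<community>.+)-amzn-(?P<revision>\d+)$")


def parse_component_version(label: str) -> ComponentVersion:
    """Split a version label into community version and EMR revision."""
    match = _AMZN_LABEL.match(label)
    if match is None:
        # No -amzn- suffix: treat the whole label as the community version.
        return ComponentVersion(label, None)
    return ComponentVersion(match.group("community"), int(match.group("revision")))


print(parse_component_version("2.8.5-amzn-5"))
print(parse_component_version("1.4.10"))
```

Because EmrVersion starts at 0, a revision of 2 corresponds to the third modified build, as in the 2.2-amzn-2 example above.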

Component Version Description
aws-sagemaker-spark-sdk 1.2.6 Amazon SageMaker Spark SDK
emr-ddb 4.12.0 Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies 2.11.0 Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis 3.4.0 Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp 2.13.0 Distributed copy application optimized for Amazon S3.
emr-s3-select 1.3.0 EMR S3Select Connector
emrfs 2.37.0 Amazon S3 connector for Hadoop ecosystem applications.
flink-client 1.9.0 Apache Flink command line client scripts and applications.
ganglia-monitor 3.7.2 Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector 3.7.2 Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web 3.7.1 Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client 2.8.5-amzn-5 Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode 2.8.5-amzn-5 HDFS node-level service for storing blocks.
hadoop-hdfs-library 2.8.5-amzn-5 HDFS command-line client and library
hadoop-hdfs-namenode 2.8.5-amzn-5 HDFS service for tracking file names and block locations.
hadoop-hdfs-journalnode 2.8.5-amzn-5 HDFS service for managing the Hadoop filesystem journal on HA clusters.
hadoop-httpfs-server 2.8.5-amzn-5 HTTP endpoint for HDFS operations.
hadoop-kms-server 2.8.5-amzn-5 Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred 2.8.5-amzn-5 MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager 2.8.5-amzn-5 YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager 2.8.5-amzn-5 YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server 2.8.5-amzn-5 Service for retrieving current and historical information for YARN applications.
hbase-hmaster 1.4.10 Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server 1.4.10 Service for serving one or more HBase regions.
hbase-client 1.4.10 HBase command-line client.
hbase-rest-server 1.4.10 Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server 1.4.10 Service providing a Thrift endpoint to HBase.
hcatalog-client 2.3.6-amzn-0 The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server 2.3.6-amzn-0 Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server 2.3.6-amzn-0 HTTP endpoint providing a REST interface to HCatalog.
hive-client 2.3.6-amzn-0 Hive command line client.
hive-hbase 2.3.6-amzn-0 Hive-hbase client.
hive-metastore-server 2.3.6-amzn-0 Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server2 2.3.6-amzn-0 Service for accepting Hive queries as web requests.
hudi 0.5.0-incubating Incremental processing framework to power data pipelines at low latency and high efficiency.
hudi-presto 0.5.0-incubating Bundle library for running Presto with Hudi.
hue-server 4.4.0 Web application for analyzing data using Hadoop ecosystem applications
jupyterhub 1.0.0 Multi-user server for Jupyter notebooks
livy-server 0.6.0-incubating REST interface for interacting with Apache Spark
nginx 1.12.1 nginx [engine x] is an HTTP and reverse proxy server
mahout-client 0.13.0 Library for machine learning.
mxnet 1.5.1 A flexible, scalable, and efficient library for deep learning.
mysql-server 5.5.54+ MySQL database server.
nvidia-cuda 9.2.88 Nvidia drivers and Cuda toolkit
oozie-client 5.1.0 Oozie command-line client.
oozie-server 5.1.0 Service for accepting Oozie workflow requests.
opencv 3.4.0 Open Source Computer Vision Library.
phoenix-library 4.14.3-HBase-1.4 The phoenix libraries for server and client
phoenix-query-server 4.14.3-HBase-1.4 A light weight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API
presto-coordinator 0.227 Service for accepting queries and managing query execution among presto-workers.
presto-worker 0.227 Service for executing pieces of a query.
presto-client 0.227 Presto command-line client which is installed on an HA cluster's stand-by masters where Presto server is not started.
pig-client 0.17.0 Pig command-line client.
r 3.4.1 The R Project for Statistical Computing
spark-client 2.4.4 Spark command-line clients.
spark-history-server 2.4.4 Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn 2.4.4 In-memory execution engine for YARN.
spark-yarn-slave 2.4.4 Apache Spark libraries needed by YARN slaves.
sqoop-client 1.4.7 Apache Sqoop command-line client.
tensorflow 1.14.0 TensorFlow open source software library for high performance numerical computation.
tez-on-yarn 0.9.2 The tez YARN application and libraries.
webserver 2.4.25+ Apache HTTP server.
zeppelin-server 0.8.2 Web-based notebook that enables interactive data analytics.
zookeeper-server 3.4.14 Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client 3.4.14 ZooKeeper command line client.

Release 5.28.1 Configuration Classifications

Configuration classifications allow you to customize applications. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.

emr-5.28.1 Classifications
Classifications Description

capacity-scheduler

Change values in Hadoop's capacity-scheduler.xml file.

container-log4j

Change values in Hadoop YARN's container-log4j.properties file.

core-site

Change values in Hadoop's core-site.xml file.

emrfs-site

Change EMRFS settings.

flink-conf

Change flink-conf.yaml settings.

flink-log4j

Change Flink log4j.properties settings.

flink-log4j-yarn-session

Change Flink log4j-yarn-session.properties settings.

flink-log4j-cli

Change Flink log4j-cli.properties settings.

hadoop-env

Change values in the Hadoop environment for all Hadoop components.

hadoop-log4j

Change values in Hadoop's log4j.properties file.

hadoop-ssl-server

Change the Hadoop SSL server configuration.

hadoop-ssl-client

Change the Hadoop SSL client configuration.

hbase

Amazon EMR-curated settings for Apache HBase.

hbase-env

Change values in HBase's environment.

hbase-log4j

Change values in HBase's hbase-log4j.properties file.

hbase-metrics

Change values in HBase's hadoop-metrics2-hbase.properties file.

hbase-policy

Change values in HBase's hbase-policy.xml file.

hbase-site

Change values in HBase's hbase-site.xml file.

hdfs-encryption-zones

Configure HDFS encryption zones.

hdfs-site

Change values in HDFS's hdfs-site.xml.

hcatalog-env

Change values in HCatalog's environment.

hcatalog-server-jndi

Change values in HCatalog's jndi.properties.

hcatalog-server-proto-hive-site

Change values in HCatalog's proto-hive-site.xml.

hcatalog-webhcat-env

Change values in HCatalog WebHCat's environment.

hcatalog-webhcat-log4j2

Change values in HCatalog WebHCat's log4j2.properties.

hcatalog-webhcat-site

Change values in HCatalog WebHCat's webhcat-site.xml file.

hive-beeline-log4j2

Change values in Hive's beeline-log4j2.properties file.

hive-parquet-logging

Change values in Hive's parquet-logging.properties file.

hive-env

Change values in the Hive environment.

hive-exec-log4j2

Change values in Hive's hive-exec-log4j2.properties file.

hive-llap-daemon-log4j2

Change values in Hive's llap-daemon-log4j2.properties file.

hive-log4j2

Change values in Hive's hive-log4j2.properties file.

hive-site

Change values in Hive's hive-site.xml file.

hiveserver2-site

Change values in Hive Server2's hiveserver2-site.xml file.

hue-ini

Change values in Hue's ini file.

httpfs-env

Change values in the HTTPFS environment.

httpfs-site

Change values in Hadoop's httpfs-site.xml file.

hadoop-kms-acls

Change values in Hadoop's kms-acls.xml file.

hadoop-kms-env

Change values in the Hadoop KMS environment.

hadoop-kms-log4j

Change values in Hadoop's kms-log4j.properties file.

hadoop-kms-site

Change values in Hadoop's kms-site.xml file.

jupyter-notebook-conf

Change values in Jupyter Notebook's jupyter_notebook_config.py file.

jupyter-hub-conf

Change values in JupyterHub's jupyterhub_config.py file.

jupyter-s3-conf

Configure Jupyter Notebook S3 persistence.

jupyter-sparkmagic-conf

Change values in Sparkmagic's config.json file.

livy-conf

Change values in Livy's livy.conf file.

livy-env

Change values in the Livy environment.

livy-log4j

Change Livy log4j.properties settings.

mapred-env

Change values in the MapReduce application's environment.

mapred-site

Change values in the MapReduce application's mapred-site.xml file.

oozie-env

Change values in Oozie's environment.

oozie-log4j

Change values in Oozie's oozie-log4j.properties file.

oozie-site

Change values in Oozie's oozie-site.xml file.

phoenix-hbase-metrics

Change values in Phoenix's hadoop-metrics2-hbase.properties file.

phoenix-hbase-site

Change values in Phoenix's hbase-site.xml file.

phoenix-log4j

Change values in Phoenix's log4j.properties file.

phoenix-metrics

Change values in Phoenix's hadoop-metrics2-phoenix.properties file.

pig-env

Change values in the Pig environment.

pig-properties

Change values in Pig's pig.properties file.

pig-log4j

Change values in Pig's log4j.properties file.

presto-log

Change values in Presto's log.properties file.

presto-config

Change values in Presto's config.properties file.

presto-password-authenticator

Change values in Presto's password-authenticator.properties file.

presto-env

Change values in Presto's presto-env.sh file.

presto-node

Change values in Presto's node.properties file.

presto-connector-blackhole

Change values in Presto's blackhole.properties file.

presto-connector-cassandra

Change values in Presto's cassandra.properties file.

presto-connector-hive

Change values in Presto's hive.properties file.

presto-connector-jmx

Change values in Presto's jmx.properties file.

presto-connector-kafka

Change values in Presto's kafka.properties file.

presto-connector-localfile

Change values in Presto's localfile.properties file.

presto-connector-memory

Change values in Presto's memory.properties file.

presto-connector-mongodb

Change values in Presto's mongodb.properties file.

presto-connector-mysql

Change values in Presto's mysql.properties file.

presto-connector-postgresql

Change values in Presto's postgresql.properties file.

presto-connector-raptor

Change values in Presto's raptor.properties file.

presto-connector-redis

Change values in Presto's redis.properties file.

presto-connector-redshift

Change values in Presto's redshift.properties file.

presto-connector-tpch

Change values in Presto's tpch.properties file.

presto-connector-tpcds

Change values in Presto's tpcds.properties file.

ranger-kms-dbks-site

Change values in dbks-site.xml file of Ranger KMS.

ranger-kms-site

Change values in ranger-kms-site.xml file of Ranger KMS.

ranger-kms-env

Change values in the Ranger KMS environment.

ranger-kms-log4j

Change values in kms-log4j.properties file of Ranger KMS.

ranger-kms-db-ca

Change values for CA file on S3 for MySQL SSL connection with Ranger KMS.

recordserver-env

Change values in the EMR RecordServer environment.

recordserver-conf

Change values in EMR RecordServer's server.properties file.

recordserver-log4j

Change values in EMR RecordServer's log4j.properties file.

spark

Amazon EMR-curated settings for Apache Spark.

spark-defaults

Change values in Spark's spark-defaults.conf file.

spark-env

Change values in the Spark environment.

spark-hive-site

Change values in Spark's hive-site.xml file.

spark-log4j

Change values in Spark's log4j.properties file.

spark-metrics

Change values in Spark's metrics.properties file.

sqoop-env

Change values in Sqoop's environment.

sqoop-oraoop-site

Change values in Sqoop OraOop's oraoop-site.xml file.

sqoop-site

Change values in Sqoop's sqoop-site.xml file.

tez-site

Change values in Tez's tez-site.xml file.

yarn-env

Change values in the YARN environment.

yarn-site

Change values in YARN's yarn-site.xml file.

zeppelin-env

Change values in the Zeppelin environment.

zookeeper-config

Change values in ZooKeeper's zoo.cfg file.

zookeeper-log4j

Change values in ZooKeeper's log4j.properties file.

Amazon EMR Release 5.28.0

Release 5.28.0 Application Versions

The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, JupyterHub, Livy, Mahout, MXNet, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, TensorFlow, Tez, Zeppelin, and ZooKeeper.

The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.

For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:

Release 5.28.0 Release Notes

The following release notes include information for Amazon EMR release version 5.28.0. Changes are relative to 5.27.0.

Initial release date: Nov 12, 2019

Upgrades

  • Upgraded Flink to version 1.9.0

  • Upgraded Hive to version 2.3.6

  • Upgraded MXNet to version 1.5.1

  • Upgraded Phoenix to version 4.14.3

  • Upgraded Presto to version 0.227

  • Upgraded Zeppelin to version 0.8.2

New Features

  • Apache Hudi is now available for Amazon EMR to install when you create a cluster. For more information, see Hudi.

  • (Nov 25, 2019) You can now choose to run multiple steps in parallel to improve cluster utilization and save cost. You can also cancel both pending and running steps. For more information, see Work with Steps Using the AWS CLI and Console.

  • (Dec 3, 2019) You can now create and run EMR clusters on AWS Outposts. AWS Outposts enables native AWS services, infrastructure, and operating models in on-premises facilities. In AWS Outposts environments, you can use the same AWS APIs, tools, and infrastructure that you use in the AWS cloud. For more information, see EMR Clusters on AWS Outposts.

  • (Mar 11, 2020) Beginning with Amazon EMR version 5.28.0, you can create and run Amazon EMR clusters on an AWS Local Zones subnet as a logical extension of an AWS Region that supports Local Zones. A Local Zone enables Amazon EMR features and a subset of AWS services, like compute and storage services, to be located closer to users, providing very low latency access to applications running locally. For a list of available Local Zones, see AWS Local Zones. For information about accessing available AWS Local Zones, see Regions, Availability Zones, and Local Zones.

    Local Zones do not currently support Amazon EMR Notebooks, and they do not support connections directly to Amazon EMR using an interface VPC endpoint (AWS PrivateLink).

Changes, Enhancements, and Resolved Issues

Known Issues

  • Known issue in clusters with multiple master nodes and Kerberos authentication

    If you run clusters with multiple master nodes and Kerberos authentication in EMR releases 5.20.0 and later, you may encounter problems with cluster operations such as scale down or step submission, after the cluster has been running for some time. The time period depends on the Kerberos ticket validity period that you defined. The scale-down problem impacts both automatic scale-down and explicit scale down requests that you submitted. Additional cluster operations can also be impacted.

    Workaround:

    • SSH as the hadoop user to the lead master node of the EMR cluster with multiple master nodes.

    • Run the following command to renew the Kerberos ticket for the hadoop user.

      kinit -kt <keytab_file> <principal>

      Typically, the keytab file is located at /etc/hadoop.keytab and the principal is in the form of hadoop/<hostname>@<REALM>.

    Note

    This workaround remains effective for as long as the Kerberos ticket is valid. This duration is 10 hours by default, but can be configured by your Kerberos settings. You must re-run the above command once the Kerberos ticket expires.

Release 5.28.0 Component Versions

Component Version Description
aws-sagemaker-spark-sdk 1.2.6 Amazon SageMaker Spark SDK
emr-ddb 4.12.0 Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies 2.11.0 Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis 3.4.0 Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp 2.13.0 Distributed copy application optimized for Amazon S3.
emr-s3-select 1.3.0 EMR S3Select Connector
emrfs 2.37.0 Amazon S3 connector for Hadoop ecosystem applications.
flink-client 1.9.0 Apache Flink command line client scripts and applications.
ganglia-monitor 3.7.2 Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector 3.7.2 Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web 3.7.1 Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client 2.8.5-amzn-5 Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode 2.8.5-amzn-5 HDFS node-level service for storing blocks.
hadoop-hdfs-library 2.8.5-amzn-5 HDFS command-line client and library
hadoop-hdfs-namenode 2.8.5-amzn-5 HDFS service for tracking file names and block locations.
hadoop-hdfs-journalnode 2.8.5-amzn-5 HDFS service for managing the Hadoop filesystem journal on HA clusters.
hadoop-httpfs-server 2.8.5-amzn-5 HTTP endpoint for HDFS operations.
hadoop-kms-server 2.8.5-amzn-5 Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred 2.8.5-amzn-5 MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager 2.8.5-amzn-5 YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager 2.8.5-amzn-5 YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server 2.8.5-amzn-5 Service for retrieving current and historical information for YARN applications.
hbase-hmaster 1.4.10 Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server 1.4.10 Service for serving one or more HBase regions.
hbase-client 1.4.10 HBase command-line client.
hbase-rest-server 1.4.10 Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server 1.4.10 Service providing a Thrift endpoint to HBase.
hcatalog-client 2.3.6-amzn-0 The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server 2.3.6-amzn-0 Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server 2.3.6-amzn-0 HTTP endpoint providing a REST interface to HCatalog.
hive-client 2.3.6-amzn-0 Hive command line client.
hive-hbase 2.3.6-amzn-0 Hive-hbase client.
hive-metastore-server 2.3.6-amzn-0 Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server2 2.3.6-amzn-0 Service for accepting Hive queries as web requests.
hudi 0.5.0-incubating Incremental processing framework to power data pipelines at low latency and high efficiency.
hudi-presto 0.5.0-incubating Bundle library for running Presto with Hudi.
hue-server 4.4.0 Web application for analyzing data using Hadoop ecosystem applications
jupyterhub 1.0.0 Multi-user server for Jupyter notebooks
livy-server 0.6.0-incubating REST interface for interacting with Apache Spark
nginx 1.12.1 nginx [engine x] is an HTTP and reverse proxy server
mahout-client 0.13.0 Library for machine learning.
mxnet 1.5.1 A flexible, scalable, and efficient library for deep learning.
mysql-server 5.5.54+ MySQL database server.
nvidia-cuda 9.2.88 Nvidia drivers and Cuda toolkit
oozie-client 5.1.0 Oozie command-line client.
oozie-server 5.1.0 Service for accepting Oozie workflow requests.
opencv 3.4.0 Open Source Computer Vision Library.
phoenix-library 4.14.3-HBase-1.4 The phoenix libraries for server and client
phoenix-query-server 4.14.3-HBase-1.4 A light weight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API
presto-coordinator 0.227 Service for accepting queries and managing query execution among presto-workers.
presto-worker 0.227 Service for executing pieces of a query.
presto-client 0.227 Presto command-line client which is installed on an HA cluster's stand-by masters where Presto server is not started.
pig-client 0.17.0 Pig command-line client.
r 3.4.1 The R Project for Statistical Computing
spark-client 2.4.4 Spark command-line clients.
spark-history-server 2.4.4 Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn 2.4.4 In-memory execution engine for YARN.
spark-yarn-slave 2.4.4 Apache Spark libraries needed by YARN slaves.
sqoop-client 1.4.7 Apache Sqoop command-line client.
tensorflow 1.14.0 TensorFlow open source software library for high performance numerical computation.
tez-on-yarn 0.9.2 The tez YARN application and libraries.
webserver 2.4.25+ Apache HTTP server.
zeppelin-server 0.8.2 Web-based notebook that enables interactive data analytics.
zookeeper-server 3.4.14 Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client 3.4.14 ZooKeeper command line client.

Release 5.28.0 Configuration Classifications

Configuration classifications allow you to customize applications. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.
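As a rough illustration of how a classification is expressed, the sketch below builds a configuration object in Python and serializes it to JSON. The Classification and Properties keys follow the shape Amazon EMR accepts when you supply configurations at cluster creation. The helper name make_classification is hypothetical, and the property shown is only a placeholder value, not a recommended setting.

```python
import json


def make_classification(name, properties):
    """Return one configuration object for the given classification name.

    make_classification is a hypothetical helper, not part of any AWS SDK.
    """
    # Property values are passed as strings in the configuration JSON.
    return {
        "Classification": name,
        "Properties": {key: str(value) for key, value in properties.items()},
    }


# Illustrative only: this property is a placeholder, not a tuning recommendation.
configurations = [
    make_classification("hive-site", {"hive.exec.parallel": "true"}),
]

print(json.dumps(configurations, indent=2))
```

The resulting JSON list is the kind of document you would pass as the cluster's configurations; each entry targets one of the classifications listed in the table below.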

emr-5.28.0 Classifications
Classifications Description

capacity-scheduler

Change values in Hadoop's capacity-scheduler.xml file.

container-log4j

Change values in Hadoop YARN's container-log4j.properties file.

core-site

Change values in Hadoop's core-site.xml file.

emrfs-site

Change EMRFS settings.

flink-conf

Change flink-conf.yaml settings.

flink-log4j

Change Flink log4j.properties settings.

flink-log4j-yarn-session

Change Flink log4j-yarn-session.properties settings.

flink-log4j-cli

Change Flink log4j-cli.properties settings.

hadoop-env

Change values in the Hadoop environment for all Hadoop components.

hadoop-log4j

Change values in Hadoop's log4j.properties file.

hadoop-ssl-server

Change Hadoop SSL server configuration.

hadoop-ssl-client

Change Hadoop SSL client configuration.

hbase

Amazon EMR-curated settings for Apache HBase.

hbase-env

Change values in HBase's environment.

hbase-log4j

Change values in HBase's hbase-log4j.properties file.

hbase-metrics

Change values in HBase's hadoop-metrics2-hbase.properties file.

hbase-policy

Change values in HBase's hbase-policy.xml file.

hbase-site

Change values in HBase's hbase-site.xml file.

hdfs-encryption-zones

Configure HDFS encryption zones.

hdfs-site

Change values in HDFS's hdfs-site.xml.

hcatalog-env

Change values in HCatalog's environment.

hcatalog-server-jndi

Change values in HCatalog's jndi.properties.

hcatalog-server-proto-hive-site

Change values in HCatalog's proto-hive-site.xml.

hcatalog-webhcat-env

Change values in HCatalog WebHCat's environment.

hcatalog-webhcat-log4j2

Change values in HCatalog WebHCat's log4j2.properties.

hcatalog-webhcat-site

Change values in HCatalog WebHCat's webhcat-site.xml file.

hive-beeline-log4j2

Change values in Hive's beeline-log4j2.properties file.

hive-parquet-logging

Change values in Hive's parquet-logging.properties file.

hive-env

Change values in the Hive environment.

hive-exec-log4j2

Change values in Hive's hive-exec-log4j2.properties file.

hive-llap-daemon-log4j2

Change values in Hive's llap-daemon-log4j2.properties file.

hive-log4j2

Change values in Hive's hive-log4j2.properties file.

hive-site

Change values in Hive's hive-site.xml file

hiveserver2-site

Change values in Hive Server2's hiveserver2-site.xml file

hue-ini

Change values in Hue's ini file

httpfs-env

Change values in the HTTPFS environment.

httpfs-site

Change values in Hadoop's httpfs-site.xml file.

hadoop-kms-acls

Change values in Hadoop's kms-acls.xml file.

hadoop-kms-env

Change values in the Hadoop KMS environment.

hadoop-kms-log4j

Change values in Hadoop's kms-log4j.properties file.

hadoop-kms-site

Change values in Hadoop's kms-site.xml file.

jupyter-notebook-conf

Change values in Jupyter Notebook's jupyter_notebook_config.py file.

jupyter-hub-conf

Change values in JupyterHub's jupyterhub_config.py file.

jupyter-s3-conf

Configure Jupyter Notebook S3 persistence.

jupyter-sparkmagic-conf

Change values in Sparkmagic's config.json file.

livy-conf

Change values in Livy's livy.conf file.

livy-env

Change values in the Livy environment.

livy-log4j

Change Livy log4j.properties settings.

mapred-env

Change values in the MapReduce application's environment.

mapred-site

Change values in the MapReduce application's mapred-site.xml file.

oozie-env

Change values in Oozie's environment.

oozie-log4j

Change values in Oozie's oozie-log4j.properties file.

oozie-site

Change values in Oozie's oozie-site.xml file.

phoenix-hbase-metrics

Change values in Phoenix's hadoop-metrics2-hbase.properties file.

phoenix-hbase-site

Change values in Phoenix's hbase-site.xml file.

phoenix-log4j

Change values in Phoenix's log4j.properties file.

phoenix-metrics

Change values in Phoenix's hadoop-metrics2-phoenix.properties file.

pig-env

Change values in the Pig environment.

pig-properties

Change values in Pig's pig.properties file.

pig-log4j

Change values in Pig's log4j.properties file.

presto-log

Change values in Presto's log.properties file.

presto-config

Change values in Presto's config.properties file.

presto-password-authenticator

Change values in Presto's password-authenticator.properties file.

presto-env

Change values in Presto's presto-env.sh file.

presto-node

Change values in Presto's node.properties file.

presto-connector-blackhole

Change values in Presto's blackhole.properties file.

presto-connector-cassandra

Change values in Presto's cassandra.properties file.

presto-connector-hive

Change values in Presto's hive.properties file.

presto-connector-jmx

Change values in Presto's jmx.properties file.

presto-connector-kafka

Change values in Presto's kafka.properties file.

presto-connector-localfile

Change values in Presto's localfile.properties file.

presto-connector-memory

Change values in Presto's memory.properties file.

presto-connector-mongodb

Change values in Presto's mongodb.properties file.

presto-connector-mysql

Change values in Presto's mysql.properties file.

presto-connector-postgresql

Change values in Presto's postgresql.properties file.

presto-connector-raptor

Change values in Presto's raptor.properties file.

presto-connector-redis

Change values in Presto's redis.properties file.

presto-connector-redshift

Change values in Presto's redshift.properties file.

presto-connector-tpch

Change values in Presto's tpch.properties file.

presto-connector-tpcds

Change values in Presto's tpcds.properties file.

ranger-kms-dbks-site

Change values in dbks-site.xml file of Ranger KMS.

ranger-kms-site

Change values in ranger-kms-site.xml file of Ranger KMS.

ranger-kms-env

Change values in the Ranger KMS environment.

ranger-kms-log4j

Change values in kms-log4j.properties file of Ranger KMS.

ranger-kms-db-ca

Change values for CA file on S3 for MySQL SSL connection with Ranger KMS.

recordserver-env

Change values in the EMR RecordServer environment.

recordserver-conf

Change values in EMR RecordServer's server.properties file.

recordserver-log4j

Change values in EMR RecordServer's log4j.properties file.

spark

Amazon EMR-curated settings for Apache Spark.

spark-defaults

Change values in Spark's spark-defaults.conf file.

spark-env

Change values in the Spark environment.

spark-hive-site

Change values in Spark's hive-site.xml file

spark-log4j

Change values in Spark's log4j.properties file.

spark-metrics

Change values in Spark's metrics.properties file.

sqoop-env

Change values in Sqoop's environment.

sqoop-oraoop-site

Change values in Sqoop OraOop's oraoop-site.xml file.

sqoop-site

Change values in Sqoop's sqoop-site.xml file.

tez-site

Change values in Tez's tez-site.xml file.

yarn-env

Change values in the YARN environment.

yarn-site

Change values in YARN's yarn-site.xml file.

zeppelin-env

Change values in the Zeppelin environment.

zookeeper-config

Change values in ZooKeeper's zoo.cfg file.

zookeeper-log4j

Change values in ZooKeeper's log4j.properties file.

5.27.0

5.27.0 Application Versions

The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, JupyterHub, Livy, Mahout, MXNet, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, TensorFlow, Tez, Zeppelin, and ZooKeeper.

The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.

For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:

5.27.0 Release Notes

The following release notes include information for Amazon EMR release version 5.27.0. Changes are relative to 5.26.0.

Initial release date: Sep 23, 2019

Upgrades

  • AWS SDK for Java 1.11.615

  • Flink 1.8.1

  • JupyterHub 1.0.0

  • Spark 2.4.4

  • TensorFlow 1.14.0

  • Connectors and drivers:

    • DynamoDB Connector 4.12.0

New Features

  • (Oct 24, 2019) The following new features in EMR notebooks are available with all Amazon EMR releases.

    • You can now associate Git repositories with EMR notebooks to store your notebooks in a version controlled environment. You can share code with peers and reuse existing Jupyter notebooks through remote Git repositories. For more information, see Associate Git Repositories with Amazon EMR Notebooks in the Amazon EMR Management Guide.

    • The nbdime utility is now available in EMR notebooks to simplify comparing and merging notebooks.  

    • EMR notebooks now support JupyterLab. JupyterLab is a web-based interactive development environment fully compatible with Jupyter notebooks. You can now choose to open your notebook in either JupyterLab or the Jupyter notebook editor.

  • (Oct 30, 2019) With Amazon EMR versions 5.25.0 and later, you can connect to Spark history server UI from the cluster Summary page or the Application history tab in the console. Instead of setting up a web proxy through an SSH connection, you can quickly access the Spark history server UI to view application metrics and access relevant log files for active and terminated clusters. For more information, see Off-cluster access to persistent application user interfaces in the Amazon EMR Management Guide.

Changes, Enhancements, and Resolved Issues

Known Issues

  • Known issue in clusters with multiple master nodes and Kerberos authentication

    If you run clusters with multiple master nodes and Kerberos authentication in EMR releases 5.20.0 and later, you may encounter problems with cluster operations such as scale down or step submission, after the cluster has been running for some time. The time period depends on the Kerberos ticket validity period that you defined. The scale-down problem impacts both automatic scale-down and explicit scale down requests that you submitted. Additional cluster operations can also be impacted.

    Workaround:

    • Connect over SSH as the hadoop user to the lead master node of the EMR cluster with multiple master nodes.

    • Run the following command to renew the Kerberos ticket for the hadoop user.

      kinit -kt <keytab_file> <principal>

      Typically, the keytab file is located at /etc/hadoop.keytab and the principal is in the form of hadoop/<hostname>@<REALM>.

    Note

    This workaround is effective for as long as the Kerberos ticket is valid. This duration is 10 hours by default, but can be configured in your Kerberos settings. You must re-run the above command once the Kerberos ticket expires.
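The renewal command in the workaround above can be assembled programmatically, for example if you want to script it. The following is a minimal sketch: the function name build_kinit_command is hypothetical, and the host name and realm in the principal are placeholders that you would replace with your cluster's values.

```python
import shlex


def build_kinit_command(keytab_file, principal):
    """Build the argument list for `kinit -kt <keytab_file> <principal>`."""
    return ["kinit", "-kt", keytab_file, principal]


# Placeholder values: substitute the host name and realm of your own cluster.
keytab = "/etc/hadoop.keytab"
principal = "hadoop/ip-10-0-0-1.ec2.internal@EXAMPLE.COM"

command = build_kinit_command(keytab, principal)
print(shlex.join(command))
# To actually renew the ticket on the lead master node, one could run:
#   subprocess.run(command, check=True)
```

Keeping the command as a list rather than a single string avoids shell quoting problems if you later pass it to subprocess.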

5.27.0 Component Versions

The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.

Some components in Amazon EMR differ from community versions. These components have a version label in the form CommunityVersion-amzn-EmrVersion. The EmrVersion starts at 0. For example, if an open source community component named myapp-component with version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-2.
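To make the label format concrete, the sketch below splits a version label into its community version and EMR revision. The function name parse_component_version is hypothetical, and treating a label without the -amzn- suffix as an unmodified community version is an assumption drawn from the description above.

```python
import re

# Matches labels such as "2.8.5-amzn-4": community version, then the EMR revision.
_AMZN_LABEL = re.compile(r"^(?P<community>.+)-amzn-(?P<emr>\d+)$")


def parse_component_version(label):
    """Split a version label into (community_version, emr_revision).

    emr_revision is None when the label has no -amzn- suffix, which is
    treated here as an unmodified community version.
    """
    match = _AMZN_LABEL.match(label)
    if match is None:
        return label, None
    return match.group("community"), int(match.group("emr"))


for label in ("2.8.5-amzn-4", "2.3.5-amzn-1", "1.4.10"):
    print(label, "->", parse_component_version(label))
```

Because EmrVersion starts at 0, a revision of 2 corresponds to the third Amazon EMR modification of that community version.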

Component Version Description
aws-sagemaker-spark-sdk 1.2.4 Amazon SageMaker Spark SDK
emr-ddb 4.12.0 Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies 2.11.0 Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis 3.4.0 Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp 2.13.0 Distributed copy application optimized for Amazon S3.
emr-s3-select 1.3.0 EMR S3Select Connector
emrfs 2.36.0 Amazon S3 connector for Hadoop ecosystem applications.
flink-client 1.8.1 Apache Flink command line client scripts and applications.
ganglia-monitor 3.7.2 Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector 3.7.2 Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web 3.7.1 Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client 2.8.5-amzn-4 Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode 2.8.5-amzn-4 HDFS node-level service for storing blocks.
hadoop-hdfs-library 2.8.5-amzn-4 HDFS command-line client and library
hadoop-hdfs-namenode 2.8.5-amzn-4 HDFS service for tracking file names and block locations.
hadoop-hdfs-journalnode 2.8.5-amzn-4 HDFS service for managing the Hadoop filesystem journal on HA clusters.
hadoop-httpfs-server 2.8.5-amzn-4 HTTP endpoint for HDFS operations.
hadoop-kms-server 2.8.5-amzn-4 Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred 2.8.5-amzn-4 MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager 2.8.5-amzn-4 YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager 2.8.5-amzn-4 YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server 2.8.5-amzn-4 Service for retrieving current and historical information for YARN applications.
hbase-hmaster 1.4.10 Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server 1.4.10 Service for serving one or more HBase regions.
hbase-client 1.4.10 HBase command-line client.
hbase-rest-server 1.4.10 Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server 1.4.10 Service providing a Thrift endpoint to HBase.
hcatalog-client 2.3.5-amzn-1 The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server 2.3.5-amzn-1 Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server 2.3.5-amzn-1 HTTP endpoint providing a REST interface to HCatalog.
hive-client 2.3.5-amzn-1 Hive command line client.
hive-hbase 2.3.5-amzn-1 Hive-hbase client.
hive-metastore-server 2.3.5-amzn-1 Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server2 2.3.5-amzn-1 Service for accepting Hive queries as web requests.
hue-server 4.4.0 Web application for analyzing data using Hadoop ecosystem applications
jupyterhub 1.0.0 Multi-user server for Jupyter notebooks
livy-server 0.6.0-incubating REST interface for interacting with Apache Spark
nginx 1.12.1 nginx [engine x] is an HTTP and reverse proxy server
mahout-client 0.13.0 Library for machine learning.
mxnet 1.4.0 A flexible, scalable, and efficient library for deep learning.
mysql-server 5.5.54+ MySQL database server.
nvidia-cuda 9.2.88 Nvidia drivers and Cuda toolkit
oozie-client 5.1.0 Oozie command-line client.
oozie-server 5.1.0 Service for accepting Oozie workflow requests.
opencv 3.4.0 Open Source Computer Vision Library.
phoenix-library 4.14.2-HBase-1.4 The phoenix libraries for server and client
phoenix-query-server 4.14.2-HBase-1.4 A light weight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API
presto-coordinator 0.224 Service for accepting queries and managing query execution among presto-workers.
presto-worker 0.224 Service for executing pieces of a query.
pig-client 0.17.0 Pig command-line client.
r 3.4.1 The R Project for Statistical Computing
spark-client 2.4.4 Spark command-line clients.
spark-history-server 2.4.4 Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn 2.4.4 In-memory execution engine for YARN.
spark-yarn-slave 2.4.4 Apache Spark libraries needed by YARN slaves.
sqoop-client 1.4.7 Apache Sqoop command-line client.
tensorflow 1.14.0 TensorFlow open source software library for high performance numerical computation.
tez-on-yarn 0.9.2 The tez YARN application and libraries.
webserver 2.4.25+ Apache HTTP server.
zeppelin-server 0.8.1 Web-based notebook that enables interactive data analytics.
zookeeper-server 3.4.14 Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client 3.4.14 ZooKeeper command line client.

5.27.0 Configuration Classifications

Configuration classifications allow you to customize applications. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.

emr-5.27.0 Classifications
Classifications Description

capacity-scheduler

Change values in Hadoop's capacity-scheduler.xml file.

container-log4j

Change values in Hadoop YARN's container-log4j.properties file.

core-site

Change values in Hadoop's core-site.xml file.

emrfs-site

Change EMRFS settings.

flink-conf

Change flink-conf.yaml settings.

flink-log4j

Change Flink log4j.properties settings.

flink-log4j-yarn-session

Change Flink log4j-yarn-session.properties settings.

flink-log4j-cli

Change Flink log4j-cli.properties settings.

hadoop-env

Change values in the Hadoop environment for all Hadoop components.

hadoop-log4j

Change values in Hadoop's log4j.properties file.

hadoop-ssl-server

Change Hadoop SSL server configuration.

hadoop-ssl-client

Change Hadoop SSL client configuration.

hbase

Amazon EMR-curated settings for Apache HBase.

hbase-env

Change values in HBase's environment.

hbase-log4j

Change values in HBase's hbase-log4j.properties file.

hbase-metrics

Change values in HBase's hadoop-metrics2-hbase.properties file.

hbase-policy

Change values in HBase's hbase-policy.xml file.

hbase-site

Change values in HBase's hbase-site.xml file.

hdfs-encryption-zones

Configure HDFS encryption zones.

hdfs-site

Change values in HDFS's hdfs-site.xml.

hcatalog-env

Change values in HCatalog's environment.

hcatalog-server-jndi

Change values in HCatalog's jndi.properties.

hcatalog-server-proto-hive-site

Change values in HCatalog's proto-hive-site.xml.

hcatalog-webhcat-env

Change values in HCatalog WebHCat's environment.

hcatalog-webhcat-log4j2

Change values in HCatalog WebHCat's log4j2.properties.

hcatalog-webhcat-site

Change values in HCatalog WebHCat's webhcat-site.xml file.

hive-beeline-log4j2

Change values in Hive's beeline-log4j2.properties file.

hive-parquet-logging

Change values in Hive's parquet-logging.properties file.

hive-env

Change values in the Hive environment.

hive-exec-log4j2

Change values in Hive's hive-exec-log4j2.properties file.

hive-llap-daemon-log4j2

Change values in Hive's llap-daemon-log4j2.properties file.

hive-log4j2

Change values in Hive's hive-log4j2.properties file.

hive-site

Change values in Hive's hive-site.xml file

hiveserver2-site

Change values in Hive Server2's hiveserver2-site.xml file

hue-ini

Change values in Hue's ini file

httpfs-env

Change values in the HTTPFS environment.

httpfs-site

Change values in Hadoop's httpfs-site.xml file.

hadoop-kms-acls

Change values in Hadoop's kms-acls.xml file.

hadoop-kms-env

Change values in the Hadoop KMS environment.

hadoop-kms-log4j

Change values in Hadoop's kms-log4j.properties file.

hadoop-kms-site

Change values in Hadoop's kms-site.xml file.

jupyter-notebook-conf

Change values in Jupyter Notebook's jupyter_notebook_config.py file.

jupyter-hub-conf

Change values in JupyterHub's jupyterhub_config.py file.

jupyter-s3-conf

Configure Jupyter Notebook S3 persistence.

jupyter-sparkmagic-conf

Change values in Sparkmagic's config.json file.

livy-conf

Change values in Livy's livy.conf file.

livy-env

Change values in the Livy environment.

livy-log4j

Change Livy log4j.properties settings.

mapred-env

Change values in the MapReduce application's environment.

mapred-site

Change values in the MapReduce application's mapred-site.xml file.

oozie-env

Change values in Oozie's environment.

oozie-log4j

Change values in Oozie's oozie-log4j.properties file.

oozie-site

Change values in Oozie's oozie-site.xml file.

phoenix-hbase-metrics

Change values in Phoenix's hadoop-metrics2-hbase.properties file.

phoenix-hbase-site

Change values in Phoenix's hbase-site.xml file.

phoenix-log4j

Change values in Phoenix's log4j.properties file.

phoenix-metrics

Change values in Phoenix's hadoop-metrics2-phoenix.properties file.

pig-env

Change values in the Pig environment.

pig-properties

Change values in Pig's pig.properties file.

pig-log4j

Change values in Pig's log4j.properties file.

presto-log

Change values in Presto's log.properties file.

presto-config

Change values in Presto's config.properties file.

presto-password-authenticator

Change values in Presto's password-authenticator.properties file.

presto-env

Change values in Presto's presto-env.sh file.

presto-node

Change values in Presto's node.properties file.

presto-connector-blackhole

Change values in Presto's blackhole.properties file.

presto-connector-cassandra

Change values in Presto's cassandra.properties file.

presto-connector-hive

Change values in Presto's hive.properties file.

presto-connector-jmx

Change values in Presto's jmx.properties file.

presto-connector-kafka

Change values in Presto's kafka.properties file.

presto-connector-localfile

Change values in Presto's localfile.properties file.

presto-connector-memory

Change values in Presto's memory.properties file.

presto-connector-mongodb

Change values in Presto's mongodb.properties file.

presto-connector-mysql

Change values in Presto's mysql.properties file.

presto-connector-postgresql

Change values in Presto's postgresql.properties file.

presto-connector-raptor

Change values in Presto's raptor.properties file.

presto-connector-redis

Change values in Presto's redis.properties file.

presto-connector-redshift

Change values in Presto's redshift.properties file.

presto-connector-tpch

Change values in Presto's tpch.properties file.

presto-connector-tpcds

Change values in Presto's tpcds.properties file.

ranger-kms-dbks-site

Change values in dbks-site.xml file of Ranger KMS.

ranger-kms-site

Change values in ranger-kms-site.xml file of Ranger KMS.

ranger-kms-env

Change values in the Ranger KMS environment.

ranger-kms-log4j

Change values in kms-log4j.properties file of Ranger KMS.

ranger-kms-db-ca

Change values for CA file on S3 for MySQL SSL connection with Ranger KMS.

recordserver-env

Change values in the EMR RecordServer environment.

recordserver-conf

Change values in EMR RecordServer's server.properties file.

recordserver-log4j

Change values in EMR RecordServer's log4j.properties file.

spark

Amazon EMR-curated settings for Apache Spark.

spark-defaults

Change values in Spark's spark-defaults.conf file.

spark-env

Change values in the Spark environment.

spark-hive-site

Change values in Spark's hive-site.xml file

spark-log4j

Change values in Spark's log4j.properties file.

spark-metrics

Change values in Spark's metrics.properties file.

sqoop-env

Change values in Sqoop's environment.

sqoop-oraoop-site

Change values in Sqoop OraOop's oraoop-site.xml file.

sqoop-site

Change values in Sqoop's sqoop-site.xml file.

tez-site

Change values in Tez's tez-site.xml file.

yarn-env

Change values in the YARN environment.

yarn-site

Change values in YARN's yarn-site.xml file.

zeppelin-env

Change values in the Zeppelin environment.

zookeeper-config

Change values in ZooKeeper's zoo.cfg file.

zookeeper-log4j

Change values in ZooKeeper's log4j.properties file.

5.26.0

5.26.0 Application Versions

The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, JupyterHub, Livy, Mahout, MXNet, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, TensorFlow, Tez, Zeppelin, and ZooKeeper.

The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.

For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:

5.26.0 Release Notes

The following release notes include information for Amazon EMR release version 5.26.0. Changes are relative to 5.25.0.

Initial release date: Aug 8, 2019

Last updated date: Aug 19, 2019

Upgrades

  • AWS SDK for Java 1.11.595

  • HBase 1.4.10

  • Phoenix 4.14.2

  • Connectors and drivers:

    • DynamoDB Connector 4.11.0

    • MariaDB Connector 2.4.2

    • Amazon Redshift JDBC Driver 1.2.32.1056

New Features

  • (Beta) With Amazon EMR 5.26.0, you can launch a cluster that integrates with Lake Formation. This integration provides fine-grained, column-level access to databases and tables in the AWS Glue Data Catalog. It also enables federated single sign-on to EMR Notebooks or Apache Zeppelin from an enterprise identity system. For more information, see Integrating Amazon EMR with AWS Lake Formation (Beta).

  • (Aug 19, 2019) Amazon EMR block public access is now available with all Amazon EMR releases that support security groups. Block public access is an account-wide setting applied to each AWS Region. Block public access prevents a cluster from launching when any security group associated with the cluster has a rule that allows inbound traffic from IPv4 0.0.0.0/0 or IPv6 ::/0 (public access) on a port, unless a port is specified as an exception. Port 22 is an exception by default. For more information, see Using Amazon EMR Block Public Access in the Amazon EMR Management Guide.
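The block public access rule described above can be modeled in a few lines. This is a simplified sketch of the stated rule, not the service's actual evaluation logic: the rule dictionaries, their field names, and the function name violates_block_public_access are all hypothetical.

```python
import ipaddress


def is_public(cidr):
    """True for 0.0.0.0/0 or ::/0, that is, a network with prefix length 0."""
    return ipaddress.ip_network(cidr, strict=False).prefixlen == 0


def violates_block_public_access(rules, exception_ports=frozenset({22})):
    """Return True if any inbound rule opens a non-excepted port to the public.

    Each rule is a hypothetical dict:
    {"cidr": str, "from_port": int, "to_port": int}.
    Port 22 is the only exception by default, as described in the text.
    """
    for rule in rules:
        if not is_public(rule["cidr"]):
            continue
        ports = range(rule["from_port"], rule["to_port"] + 1)
        if any(port not in exception_ports for port in ports):
            return True
    return False


ssh_only = [{"cidr": "0.0.0.0/0", "from_port": 22, "to_port": 22}]
open_web = [{"cidr": "::/0", "from_port": 8080, "to_port": 8080}]
print(violates_block_public_access(ssh_only))
print(violates_block_public_access(open_web))
```

In this model, a cluster whose security groups contain only the SSH rule would be allowed to launch, while the rule that opens port 8080 to ::/0 would prevent it unless 8080 were added to the exceptions.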

Changes, Enhancements, and Resolved Issues

  • EMR Notebooks

    • With EMR 5.26.0 and later, EMR Notebooks supports notebook-scoped Python libraries in addition to the default Python libraries. You can install notebook-scoped libraries from within the notebook editor without having to re-create a cluster or re-attach a notebook to a cluster. Notebook-scoped libraries are created in a Python virtual environment, so they apply only to the current notebook session. This allows you to isolate notebook dependencies. For more information, see Using Notebook Scoped Libraries in the Amazon EMR Management Guide.

  • EMRFS

    • You can enable an ETag verification feature (Beta) by setting fs.s3.consistent.metadata.etag.verification.enabled to true. With this feature, EMRFS uses Amazon S3 ETags to verify that objects being read are the latest available version. This feature is helpful for read-after-update use cases in which files on Amazon S3 are overwritten while retaining the same name. This ETag verification capability currently does not work with S3 Select. For more information, see Configure Consistent View.

  • Spark

    • The following optimizations are now enabled by default: dynamic partition pruning, DISTINCT before INTERSECT, improvements in SQL plan statistics inference for JOIN followed by DISTINCT queries, flattening scalar subqueries, optimized join reorder, and bloom filter join. For more information, see Optimizing Spark Performance.

    • Improved whole stage code generation for Sort Merge Join.

    • Improved query fragment and subquery reuse.

    • Improved pre-allocation of executors on Spark startup.

    • Bloom filter joins are no longer applied when the smaller side of the join includes a broadcast hint.

  • Tez

    • Resolved an issue with Tez. Tez UI now works on an EMR cluster with multiple master nodes.
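For the EMRFS ETag verification feature mentioned above, a configuration might look like the following sketch. It assumes the property is supplied through the emrfs-site classification, which the release note does not state explicitly. Check the Configure Consistent View documentation before relying on it.

```python
import json

# Assumption: the ETag verification property is set via the emrfs-site classification.
etag_verification = {
    "Classification": "emrfs-site",
    "Properties": {
        "fs.s3.consistent.metadata.etag.verification.enabled": "true",
    },
}

print(json.dumps([etag_verification], indent=2))
```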

Known Issues

  • The improved whole stage code generation capabilities for Sort Merge Join can increase memory pressure when enabled. This optimization improves performance, but may result in job retries or failures if the spark.yarn.executor.memoryOverheadFactor is not tuned to provide enough memory. To disable this feature, set spark.sql.sortMergeJoinExec.extendedCodegen.enabled to false.

  • Known issue in clusters with multiple master nodes and Kerberos authentication

    If you run clusters with multiple master nodes and Kerberos authentication in EMR releases 5.20.0 and later, you may encounter problems with cluster operations such as scale down or step submission, after the cluster has been running for some time. The time period depends on the Kerberos ticket validity period that you defined. The scale-down problem impacts both automatic scale-down and explicit scale down requests that you submitted. Additional cluster operations can also be impacted.

    Workaround:

    • Connect over SSH as the hadoop user to the lead master node of the EMR cluster with multiple master nodes.

    • Run the following command to renew the Kerberos ticket for the hadoop user.

      kinit -kt <keytab_file> <principal>

      Typically, the keytab file is located at /etc/hadoop.keytab and the principal is in the form of hadoop/<hostname>@<REALM>.

    Note

    This workaround is effective for as long as the Kerberos ticket is valid. This duration is 10 hours by default, but can be configured in your Kerberos settings. You must re-run the above command once the Kerberos ticket expires.
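For the Sort Merge Join known issue listed first above, one way to disable the extended code generation for a single job is to pass the setting with spark-submit's --conf option. The sketch below only builds the argument list. The helper name spark_submit_args and the application name my_job.py are placeholders, and setting the property per job rather than cluster-wide is an assumption about how you might choose to apply it.

```python
import shlex


def spark_submit_args(app, conf):
    """Build a spark-submit argument list with one --conf flag per setting."""
    args = ["spark-submit"]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


# my_job.py is a placeholder application name.
args = spark_submit_args(
    "my_job.py",
    {"spark.sql.sortMergeJoinExec.extendedCodegen.enabled": "false"},
)
print(shlex.join(args))
```

The same key and value could instead be placed in a spark-defaults classification if you want the setting to apply to every application on the cluster.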

5.26.0 Component Versions

The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.

Some components in Amazon EMR differ from community versions. These components have a version label in the form CommunityVersion-amzn-EmrVersion. The EmrVersion starts at 0. For example, if an open source community component named myapp-component with version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-2.

Component Version Description
aws-sagemaker-spark-sdk 1.2.4 Amazon SageMaker Spark SDK
emr-ddb 4.11.0 Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies 2.10.0 Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis 3.4.0 Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp 2.12.0 Distributed copy application optimized for Amazon S3.
emr-s3-select 1.3.0 EMR S3Select Connector
emrfs 2.35.0 Amazon S3 connector for Hadoop ecosystem applications.
flink-client 1.8.0 Apache Flink command line client scripts and applications.
ganglia-monitor 3.7.2 Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector 3.7.2 Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web 3.7.1 Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client 2.8.5-amzn-4 Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode 2.8.5-amzn-4 HDFS node-level service for storing blocks.
hadoop-hdfs-library 2.8.5-amzn-4 HDFS command-line client and library
hadoop-hdfs-namenode 2.8.5-amzn-4 HDFS service for tracking file names and block locations.
hadoop-hdfs-journalnode 2.8.5-amzn-4 HDFS service for managing the Hadoop filesystem journal on HA clusters.
hadoop-httpfs-server 2.8.5-amzn-4 HTTP endpoint for HDFS operations.
hadoop-kms-server 2.8.5-amzn-4 Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred 2.8.5-amzn-4 MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager 2.8.5-amzn-4 YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager 2.8.5-amzn-4 YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server 2.8.5-amzn-4 Service for retrieving current and historical information for YARN applications.
hbase-hmaster 1.4.10 Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server 1.4.10 Service for serving one or more HBase regions.
hbase-client 1.4.10 HBase command-line client.
hbase-rest-server 1.4.10 Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server 1.4.10 Service providing a Thrift endpoint to HBase.
hcatalog-client 2.3.5-amzn-0 The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server 2.3.5-amzn-0 Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server 2.3.5-amzn-0 HTTP endpoint providing a REST interface to HCatalog.
hive-client 2.3.5-amzn-0 Hive command line client.
hive-hbase 2.3.5-amzn-0 Hive-hbase client.
hive-metastore-server 2.3.5-amzn-0 Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server2 2.3.5-amzn-0 Service for accepting Hive queries as web requests.
hue-server 4.4.0 Web application for analyzing data using Hadoop ecosystem applications
jupyterhub 0.9.6 Multi-user server for Jupyter notebooks
livy-server 0.6.0-incubating REST interface for interacting with Apache Spark
nginx 1.12.1 nginx [engine x] is an HTTP and reverse proxy server
mahout-client 0.13.0 Library for machine learning.
mxnet 1.4.0 A flexible, scalable, and efficient library for deep learning.
mysql-server 5.5.54+ MySQL database server.
nvidia-cuda 9.2.88 Nvidia drivers and Cuda toolkit
oozie-client 5.1.0 Oozie command-line client.
oozie-server 5.1.0 Service for accepting Oozie workflow requests.
opencv 3.4.0 Open Source Computer Vision Library.
phoenix-library 4.14.2-HBase-1.4 The phoenix libraries for server and client
phoenix-query-server 4.14.2-HBase-1.4 A light weight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API
presto-coordinator 0.220 Service for accepting queries and managing query execution among presto-workers.
presto-worker 0.220 Service for executing pieces of a query.
pig-client 0.17.0 Pig command-line client.
r 3.4.1 The R Project for Statistical Computing
spark-client 2.4.3 Spark command-line clients.
spark-history-server 2.4.3 Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn 2.4.3 In-memory execution engine for YARN.
spark-yarn-slave 2.4.3 Apache Spark libraries needed by YARN slaves.
sqoop-client 1.4.7 Apache Sqoop command-line client.
tensorflow 1.13.1 TensorFlow open source software library for high performance numerical computation.
tez-on-yarn 0.9.2 The tez YARN application and libraries.
webserver 2.4.25+ Apache HTTP server.
zeppelin-server 0.8.1 Web-based notebook that enables interactive data analytics.
zookeeper-server 3.4.14 Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client 3.4.14 ZooKeeper command line client.

5.26.0 Configuration Classifications

Configuration classifications allow you to customize applications. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.

emr-5.26.0 Classifications
Classifications Description

capacity-scheduler

Change values in Hadoop's capacity-scheduler.xml file.

container-log4j

Change values in Hadoop YARN's container-log4j.properties file.

core-site

Change values in Hadoop's core-site.xml file.

emrfs-site

Change EMRFS settings.

flink-conf

Change flink-conf.yaml settings.

flink-log4j

Change Flink log4j.properties settings.

flink-log4j-yarn-session

Change Flink log4j-yarn-session.properties settings.

flink-log4j-cli

Change Flink log4j-cli.properties settings.

hadoop-env

Change values in the Hadoop environment for all Hadoop components.

hadoop-log4j

Change values in Hadoop's log4j.properties file.

hadoop-ssl-server

Change the Hadoop SSL server configuration.

hadoop-ssl-client

Change the Hadoop SSL client configuration.

hbase

Amazon EMR-curated settings for Apache HBase.

hbase-env

Change values in HBase's environment.

hbase-log4j

Change values in HBase's hbase-log4j.properties file.

hbase-metrics

Change values in HBase's hadoop-metrics2-hbase.properties file.

hbase-policy

Change values in HBase's hbase-policy.xml file.

hbase-site

Change values in HBase's hbase-site.xml file.

hdfs-encryption-zones

Configure HDFS encryption zones.

hdfs-site

Change values in HDFS's hdfs-site.xml.

hcatalog-env

Change values in HCatalog's environment.

hcatalog-server-jndi

Change values in HCatalog's jndi.properties.

hcatalog-server-proto-hive-site

Change values in HCatalog's proto-hive-site.xml.

hcatalog-webhcat-env

Change values in HCatalog WebHCat's environment.

hcatalog-webhcat-log4j2

Change values in HCatalog WebHCat's log4j2.properties.

hcatalog-webhcat-site

Change values in HCatalog WebHCat's webhcat-site.xml file.

hive-beeline-log4j2

Change values in Hive's beeline-log4j2.properties file.

hive-parquet-logging

Change values in Hive's parquet-logging.properties file.

hive-env

Change values in the Hive environment.

hive-exec-log4j2

Change values in Hive's hive-exec-log4j2.properties file.

hive-llap-daemon-log4j2

Change values in Hive's llap-daemon-log4j2.properties file.

hive-log4j2

Change values in Hive's hive-log4j2.properties file.

hive-site

Change values in Hive's hive-site.xml file

hiveserver2-site

Change values in Hive Server2's hiveserver2-site.xml file

hue-ini

Change values in Hue's ini file

httpfs-env

Change values in the HTTPFS environment.

httpfs-site

Change values in Hadoop's httpfs-site.xml file.

hadoop-kms-acls

Change values in Hadoop's kms-acls.xml file.

hadoop-kms-env

Change values in the Hadoop KMS environment.

hadoop-kms-log4j

Change values in Hadoop's kms-log4j.properties file.

hadoop-kms-site

Change values in Hadoop's kms-site.xml file.

jupyter-notebook-conf

Change values in Jupyter Notebook's jupyter_notebook_config.py file.

jupyter-hub-conf

Change values in JupyterHub's jupyterhub_config.py file.

jupyter-s3-conf

Configure Jupyter Notebook S3 persistence.

jupyter-sparkmagic-conf

Change values in Sparkmagic's config.json file.

livy-conf

Change values in Livy's livy.conf file.

livy-env

Change values in the Livy environment.

livy-log4j

Change Livy log4j.properties settings.

mapred-env

Change values in the MapReduce application's environment.

mapred-site

Change values in the MapReduce application's mapred-site.xml file.

oozie-env

Change values in Oozie's environment.

oozie-log4j

Change values in Oozie's oozie-log4j.properties file.

oozie-site

Change values in Oozie's oozie-site.xml file.

phoenix-hbase-metrics

Change values in Phoenix's hadoop-metrics2-hbase.properties file.

phoenix-hbase-site

Change values in Phoenix's hbase-site.xml file.

phoenix-log4j

Change values in Phoenix's log4j.properties file.

phoenix-metrics

Change values in Phoenix's hadoop-metrics2-phoenix.properties file.

pig-env

Change values in the Pig environment.

pig-properties

Change values in Pig's pig.properties file.

pig-log4j

Change values in Pig's log4j.properties file.

presto-log

Change values in Presto's log.properties file.

presto-config

Change values in Presto's config.properties file.

presto-password-authenticator

Change values in Presto's password-authenticator.properties file.

presto-env

Change values in Presto's presto-env.sh file.

presto-node

Change values in Presto's node.properties file.

presto-connector-blackhole

Change values in Presto's blackhole.properties file.

presto-connector-cassandra

Change values in Presto's cassandra.properties file.

presto-connector-hive

Change values in Presto's hive.properties file.

presto-connector-jmx

Change values in Presto's jmx.properties file.

presto-connector-kafka

Change values in Presto's kafka.properties file.

presto-connector-localfile

Change values in Presto's localfile.properties file.

presto-connector-memory

Change values in Presto's memory.properties file.

presto-connector-mongodb

Change values in Presto's mongodb.properties file.

presto-connector-mysql

Change values in Presto's mysql.properties file.

presto-connector-postgresql

Change values in Presto's postgresql.properties file.

presto-connector-raptor

Change values in Presto's raptor.properties file.

presto-connector-redis

Change values in Presto's redis.properties file.

presto-connector-redshift

Change values in Presto's redshift.properties file.

presto-connector-tpch

Change values in Presto's tpch.properties file.

presto-connector-tpcds

Change values in Presto's tpcds.properties file.

recordserver-env

Change values in the EMR RecordServer environment.

recordserver-conf

Change values in EMR RecordServer's server.properties file.

recordserver-log4j

Change values in EMR RecordServer's log4j.properties file.

spark

Amazon EMR-curated settings for Apache Spark.

spark-defaults

Change values in Spark's spark-defaults.conf file.

spark-env

Change values in the Spark environment.

spark-hive-site

Change values in Spark's hive-site.xml file

spark-log4j

Change values in Spark's log4j.properties file.

spark-metrics

Change values in Spark's metrics.properties file.

sqoop-env

Change values in Sqoop's environment.

sqoop-oraoop-site

Change values in Sqoop OraOop's oraoop-site.xml file.

sqoop-site

Change values in Sqoop's sqoop-site.xml file.

tez-site

Change values in Tez's tez-site.xml file.

yarn-env

Change values in the YARN environment.

yarn-site

Change values in YARN's yarn-site.xml file.

zeppelin-env

Change values in the Zeppelin environment.

zookeeper-config

Change values in ZooKeeper's zoo.cfg file.

zookeeper-log4j

Change values in ZooKeeper's log4j.properties file.

5.25.0

5.25.0 Application Versions

The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, JupyterHub, Livy, Mahout, MXNet, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, TensorFlow, Tez, Zeppelin, and ZooKeeper.

The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.

For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:

5.25.0 Release Notes

The following release notes include information for Amazon EMR release version 5.25.0. Changes are relative to 5.24.1.

Initial release date: July 17, 2019

Last updated date: Oct 30, 2019

Amazon EMR 5.25.0

Upgrades

  • AWS SDK for Java 1.11.566

  • Hive 2.3.5

  • Presto 0.220

  • Spark 2.4.3

  • TensorFlow 1.13.1

  • Tez 0.9.2

  • Zookeeper 3.4.14

New Features

  • (Oct 30, 2019) Beginning with Amazon EMR version 5.25.0, you can connect to Spark history server UI from the cluster Summary page or the Application history tab in the console. Instead of setting up a web proxy through an SSH connection, you can quickly access the Spark history server UI to view application metrics and access relevant log files for active and terminated clusters. For more information, see Off-cluster access to persistent application user interfaces in the Amazon EMR Management Guide.

Changes, Enhancements, and Resolved Issues

  • Spark

    • Improved the performance of some joins by using Bloom filters to pre-filter inputs. The optimization is disabled by default and can be enabled by setting the Spark configuration parameter spark.sql.bloomFilterJoin.enabled to true.

    • Improved the performance of grouping by string type columns.

    • Improved the default Spark executor memory and cores configuration of R4 instance types for clusters without HBase installed.

    • Resolved a previous issue with the dynamic partition pruning feature where the pruned table has to be on the left side of the join.

    • Improved DISTINCT before INTERSECT optimization to apply to additional cases involving aliases.

    • Improved SQL plan statistics inference for JOIN followed by DISTINCT queries. This improvement is disabled by default and can be enabled by setting the Spark configuration parameter spark.sql.statsImprovements.enabled to true. This optimization is required by the Distinct before Intersect feature and will be enabled automatically when spark.sql.optimizer.distinctBeforeIntersect.enabled is set to true.

    • Optimized join order based on table size and filters. This optimization is disabled by default and can be enabled by setting the Spark configuration parameter spark.sql.optimizer.sizeBasedJoinReorder.enabled to true.

    For more information, see Optimizing Spark Performance.

  • EMRFS

    • The EMRFS setting fs.s3.buckets.create.enabled is now disabled by default. In testing, we found that disabling this setting improves performance and prevents unintentional creation of S3 buckets. If your application relies on this functionality, you can enable it by setting the property fs.s3.buckets.create.enabled to true in the emrfs-site configuration classification. For more information, see Supplying a Configuration when Creating a Cluster.

  • Local Disk Encryption and S3 Encryption Improvements in Security Configurations (August 5, 2019)

    • Separated Amazon S3 encryption settings from local disk encryption settings in security configuration setup.

    • Added an option to enable EBS encryption with release 5.24.0 and later. Selecting this option encrypts the root device volume in addition to storage volumes. Previous versions required using a custom AMI to encrypt the root device volume.

    • For more information, see Encryption Options in the Amazon EMR Management Guide.
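
The opt-in Spark parameters and the EMRFS property described above could be supplied together as configuration objects when a cluster is created. The following Python sketch builds such a list and prints it as JSON. It assumes that spark-defaults is the appropriate classification for the Spark parameters (the notes above only call them Spark configuration parameters), and the helper names classification and build_configurations are hypothetical. Enable only the settings your workload actually needs.

```python
import json

# Spark optimizations that the release notes describe as disabled by default.
# Assumption: they are set through the spark-defaults classification.
SPARK_OPT_IN_PROPERTIES = {
    "spark.sql.bloomFilterJoin.enabled": "true",
    "spark.sql.statsImprovements.enabled": "true",
    "spark.sql.optimizer.sizeBasedJoinReorder.enabled": "true",
}

# Re-enables S3 bucket creation, which this release disables by default.
EMRFS_PROPERTIES = {"fs.s3.buckets.create.enabled": "true"}


def classification(name, properties):
    """Build one configuration object; property values are kept as strings."""
    return {"Classification": name, "Properties": dict(properties)}


def build_configurations():
    """Return the list of configuration objects for both applications."""
    return [
        classification("spark-defaults", SPARK_OPT_IN_PROPERTIES),
        classification("emrfs-site", EMRFS_PROPERTIES),
    ]


if __name__ == "__main__":
    print(json.dumps(build_configurations(), indent=2))
```

The printed JSON can be saved to a file and supplied when creating the cluster, for example through the AWS CLI --configurations option.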

Known Issues

  • Known issue in clusters with multiple master nodes and Kerberos authentication

    If you run clusters with multiple master nodes and Kerberos authentication in Amazon EMR releases 5.20.0 and later, you may encounter problems with cluster operations such as scale-down or step submission after the cluster has been running for some time. How long this takes depends on the Kerberos ticket validity period that you defined. The scale-down problem affects both automatic scale-down and explicit scale-down requests that you submit. Other cluster operations can also be affected.

    Workaround:

    • SSH as the hadoop user to the lead master node of the EMR cluster with multiple master nodes.

    • Run the following command to renew the Kerberos ticket for the hadoop user.

      kinit -kt <keytab_file> <principal>

      Typically, the keytab file is located at /etc/hadoop.keytab and the principal is in the form of hadoop/<hostname>@<REALM>.

    Note

    This workaround is effective for as long as the Kerberos ticket is valid. The duration is 10 hours by default, but can be configured in your Kerberos settings. You must re-run the above command once the Kerberos ticket expires.
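
Because the renewal has to be repeated each time the ticket expires, it may be convenient to wrap the command from the workaround above in a small script. The following Python sketch is one possible approach, not an official tool: the function names are hypothetical, and the hostname and realm in the example are placeholders that you would replace with your cluster's values. By default it only prints the command. Pass dry_run=False on the lead master node, as the hadoop user, to actually run kinit.

```python
import shlex
import subprocess


def hadoop_principal(hostname, realm):
    """Build a principal in the form hadoop/<hostname>@<REALM>."""
    return f"hadoop/{hostname}@{realm}"


def build_kinit_command(keytab_file, principal):
    """Return the argument list for: kinit -kt <keytab_file> <principal>."""
    return ["kinit", "-kt", keytab_file, principal]


def renew_ticket(keytab_file, principal, dry_run=True):
    """Print the renewal command, or run it when dry_run is False."""
    command = build_kinit_command(keytab_file, principal)
    if dry_run:
        print(" ".join(shlex.quote(part) for part in command))
        return command
    # check=True raises CalledProcessError if kinit exits with an error.
    subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    # Placeholder hostname and realm; substitute your cluster's values.
    principal = hadoop_principal("master-host.example.internal", "EXAMPLE.INTERNAL")
    renew_ticket("/etc/hadoop.keytab", principal, dry_run=True)
```

A script like this could be scheduled at an interval shorter than the ticket lifetime so that the ticket is renewed before it expires.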

5.25.0 Component Versions

The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.

Some components in Amazon EMR differ from community versions. These components have a version label in the form CommunityVersion-amzn-EmrVersion. The EmrVersion starts at 0. For example, if an open source community component named myapp-component with version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-2.

Component Version Description
aws-sagemaker-spark-sdk 1.2.4 Amazon SageMaker Spark SDK
emr-ddb 4.10.0 Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies 2.9.0 Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis 3.4.0 Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp 2.11.0 Distributed copy application optimized for Amazon S3.
emr-s3-select 1.3.0 EMR S3Select Connector
emrfs 2.34.0 Amazon S3 connector for Hadoop ecosystem applications.
flink-client 1.8.0 Apache Flink command line client scripts and applications.
ganglia-monitor 3.7.2 Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector 3.7.2 Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web 3.7.1 Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client 2.8.5-amzn-4 Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode 2.8.5-amzn-4 HDFS node-level service for storing blocks.
hadoop-hdfs-library 2.8.5-amzn-4 HDFS command-line client and library
hadoop-hdfs-namenode 2.8.5-amzn-4 HDFS service for tracking file names and block locations.
hadoop-hdfs-journalnode 2.8.5-amzn-4 HDFS service for managing the Hadoop filesystem journal on HA clusters.
hadoop-httpfs-server 2.8.5-amzn-4 HTTP endpoint for HDFS operations.
hadoop-kms-server 2.8.5-amzn-4 Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred 2.8.5-amzn-4 MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager 2.8.5-amzn-4 YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager 2.8.5-amzn-4 YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server 2.8.5-amzn-4 Service for retrieving current and historical information for YARN applications.
hbase-hmaster 1.4.9 Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server 1.4.9 Service for serving one or more HBase regions.
hbase-client 1.4.9 HBase command-line client.
hbase-rest-server 1.4.9 Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server 1.4.9 Service providing a Thrift endpoint to HBase.
hcatalog-client 2.3.5-amzn-0 The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server 2.3.5-amzn-0 Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server 2.3.5-amzn-0 HTTP endpoint providing a REST interface to HCatalog.
hive-client 2.3.5-amzn-0 Hive command line client.
hive-hbase 2.3.5-amzn-0 Hive-hbase client.
hive-metastore-server 2.3.5-amzn-0 Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server2 2.3.5-amzn-0 Service for accepting Hive queries as web requests.
hue-server 4.4.0 Web application for analyzing data using Hadoop ecosystem applications
jupyterhub 0.9.6 Multi-user server for Jupyter notebooks
livy-server 0.6.0-incubating REST interface for interacting with Apache Spark
nginx 1.12.1 nginx [engine x] is an HTTP and reverse proxy server
mahout-client 0.13.0 Library for machine learning.
mxnet 1.4.0 A flexible, scalable, and efficient library for deep learning.
mysql-server 5.5.54+ MySQL database server.
nvidia-cuda 9.2.88 Nvidia drivers and Cuda toolkit
oozie-client 5.1.0 Oozie command-line client.
oozie-server 5.1.0 Service for accepting Oozie workflow requests.
opencv 3.4.0 Open Source Computer Vision Library.
phoenix-library 4.14.1-HBase-1.4 The phoenix libraries for server and client
phoenix-query-server 4.14.1-HBase-1.4 A light weight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API
presto-coordinator 0.220 Service for accepting queries and managing query execution among presto-workers.
presto-worker 0.220 Service for executing pieces of a query.
pig-client 0.17.0 Pig command-line client.
r 3.4.1 The R Project for Statistical Computing
spark-client 2.4.3 Spark command-line clients.
spark-history-server 2.4.3 Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn 2.4.3 In-memory execution engine for YARN.
spark-yarn-slave 2.4.3 Apache Spark libraries needed by YARN slaves.
sqoop-client 1.4.7 Apache Sqoop command-line client.
tensorflow 1.13.1 TensorFlow open source software library for high performance numerical computation.
tez-on-yarn 0.9.2 The tez YARN application and libraries.
webserver 2.4.25+ Apache HTTP server.
zeppelin-server 0.8.1 Web-based notebook that enables interactive data analytics.
zookeeper-server 3.4.14 Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client 3.4.14 ZooKeeper command line client.

5.25.0 Configuration Classifications

Configuration classifications allow you to customize applications. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.

emr-5.25.0 Classifications
Classifications Description

capacity-scheduler

Change values in Hadoop's capacity-scheduler.xml file.

container-log4j

Change values in Hadoop YARN's container-log4j.properties file.

core-site

Change values in Hadoop's core-site.xml file.

emrfs-site

Change EMRFS settings.

flink-conf

Change flink-conf.yaml settings.

flink-log4j

Change Flink log4j.properties settings.

flink-log4j-yarn-session

Change Flink log4j-yarn-session.properties settings.

flink-log4j-cli

Change Flink log4j-cli.properties settings.

hadoop-env

Change values in the Hadoop environment for all Hadoop components.

hadoop-log4j

Change values in Hadoop's log4j.properties file.

hadoop-ssl-server

Change the Hadoop SSL server configuration.

hadoop-ssl-client

Change the Hadoop SSL client configuration.

hbase

Amazon EMR-curated settings for Apache HBase.

hbase-env

Change values in HBase's environment.

hbase-log4j

Change values in HBase's hbase-log4j.properties file.

hbase-metrics

Change values in HBase's hadoop-metrics2-hbase.properties file.

hbase-policy

Change values in HBase's hbase-policy.xml file.

hbase-site

Change values in HBase's hbase-site.xml file.

hdfs-encryption-zones

Configure HDFS encryption zones.

hdfs-site

Change values in HDFS's hdfs-site.xml.

hcatalog-env

Change values in HCatalog's environment.

hcatalog-server-jndi

Change values in HCatalog's jndi.properties.

hcatalog-server-proto-hive-site

Change values in HCatalog's proto-hive-site.xml.

hcatalog-webhcat-env

Change values in HCatalog WebHCat's environment.

hcatalog-webhcat-log4j2

Change values in HCatalog WebHCat's log4j2.properties.

hcatalog-webhcat-site

Change values in HCatalog WebHCat's webhcat-site.xml file.

hive-beeline-log4j2

Change values in Hive's beeline-log4j2.properties file.

hive-parquet-logging

Change values in Hive's parquet-logging.properties file.

hive-env

Change values in the Hive environment.

hive-exec-log4j2

Change values in Hive's hive-exec-log4j2.properties file.

hive-llap-daemon-log4j2

Change values in Hive's llap-daemon-log4j2.properties file.

hive-log4j2

Change values in Hive's hive-log4j2.properties file.

hive-site

Change values in Hive's hive-site.xml file

hiveserver2-site

Change values in Hive Server2's hiveserver2-site.xml file

hue-ini

Change values in Hue's ini file

httpfs-env

Change values in the HTTPFS environment.

httpfs-site

Change values in Hadoop's httpfs-site.xml file.

hadoop-kms-acls

Change values in Hadoop's kms-acls.xml file.

hadoop-kms-env

Change values in the Hadoop KMS environment.

hadoop-kms-log4j

Change values in Hadoop's kms-log4j.properties file.

hadoop-kms-site

Change values in Hadoop's kms-site.xml file.

jupyter-notebook-conf

Change values in Jupyter Notebook's jupyter_notebook_config.py file.

jupyter-hub-conf

Change values in JupyterHub's jupyterhub_config.py file.

jupyter-s3-conf

Configure Jupyter Notebook S3 persistence.

jupyter-sparkmagic-conf

Change values in Sparkmagic's config.json file.

livy-conf

Change values in Livy's livy.conf file.

livy-env

Change values in the Livy environment.

livy-log4j

Change Livy log4j.properties settings.

mapred-env

Change values in the MapReduce application's environment.

mapred-site

Change values in the MapReduce application's mapred-site.xml file.

oozie-env

Change values in Oozie's environment.

oozie-log4j

Change values in Oozie's oozie-log4j.properties file.

oozie-site

Change values in Oozie's oozie-site.xml file.

phoenix-hbase-metrics

Change values in Phoenix's hadoop-metrics2-hbase.properties file.

phoenix-hbase-site

Change values in Phoenix's hbase-site.xml file.

phoenix-log4j

Change values in Phoenix's log4j.properties file.

phoenix-metrics

Change values in Phoenix's hadoop-metrics2-phoenix.properties file.

pig-env

Change values in the Pig environment.

pig-properties

Change values in Pig's pig.properties file.

pig-log4j

Change values in Pig's log4j.properties file.

presto-log

Change values in Presto's log.properties file.

presto-config

Change values in Presto's config.properties file.

presto-password-authenticator

Change values in Presto's password-authenticator.properties file.

presto-env

Change values in Presto's presto-env.sh file.

presto-node

Change values in Presto's node.properties file.

presto-connector-blackhole

Change values in Presto's blackhole.properties file.

presto-connector-cassandra

Change values in Presto's cassandra.properties file.

presto-connector-hive

Change values in Presto's hive.properties file.

presto-connector-jmx

Change values in Presto's jmx.properties file.

presto-connector-kafka

Change values in Presto's kafka.properties file.

presto-connector-localfile

Change values in Presto's localfile.properties file.

presto-connector-memory

Change values in Presto's memory.properties file.

presto-connector-mongodb

Change values in Presto's mongodb.properties file.

presto-connector-mysql

Change values in Presto's mysql.properties file.

presto-connector-postgresql

Change values in Presto's postgresql.properties file.

presto-connector-raptor

Change values in Presto's raptor.properties file.

presto-connector-redis