
Amazon EMR 5.x Release Versions

Each tab below lists application versions, release notes, component versions, and configuration classifications available in each Amazon EMR 5.x release version.

For a comprehensive diagram of application versions in every release, see Application Versions in Amazon EMR 5.x Releases (PNG).

When you launch a cluster, you can choose from multiple release versions of Amazon EMR. This allows you to test and use application versions that fit your compatibility requirements. You specify the release version using the release label. Release labels take the form emr-x.x.x, for example, emr-5.16.0.
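As a rough illustration of the release label format described above, the following Python sketch validates a label and splits it into numeric parts so that labels can be compared. The function name `parse_release_label` is hypothetical and is not part of any Amazon EMR tool or SDK.

```python
import re
from typing import Tuple

# Pattern for labels of the form emr-x.x.x, as described above.
RELEASE_LABEL = re.compile(r"emr-(\d+)\.(\d+)\.(\d+)")


def parse_release_label(label: str) -> Tuple[int, int, int]:
    """Return (major, minor, patch) for a label such as 'emr-5.16.0'."""
    match = RELEASE_LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"not a release label of the form emr-x.x.x: {label!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch)


print(parse_release_label("emr-5.16.0"))
```

Comparing the parsed tuples rather than the raw strings avoids ordering mistakes such as treating emr-5.9.0 as newer than emr-5.16.0.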

5.16.0 5.15.0 5.14.0 5.13.0 5.12.x 5.11.x 5.10.0 5.9.0 5.8.x 5.7.0 5.6.0 5.5.x 5.4.0 5.3.0 5.2.x 5.1.0 5.0.x
5.16.0

5.16.0 Application Versions

The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, JupyterHub, Livy, Mahout, MXNet, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, Tez, Zeppelin, and ZooKeeper.

The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.

For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:

5.16.0 Release Notes

The following release notes include information for Amazon EMR release version 5.16.0. Changes are relative to 5.15.0.

Initial release date: July 19, 2018

Upgrades

  • Hadoop 2.8.4

  • Flink 1.5.0

  • Livy 0.5.0

  • MXNet 1.2.0

  • Phoenix 4.14.0

  • Presto 0.203

  • Spark 2.3.1

  • AWS SDK for Java 1.11.336

  • CUDA 9.2

  • Redshift JDBC Driver 1.2.15.1025

Changes, Enhancements, and Resolved Issues

Known Issues

  • This release version does not support the c1.medium or m1.small instance types. Clusters using either of these instance types fail to start. As a workaround, specify a different instance type or use a different release version.
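One way to catch this known issue before launching a cluster is a simple pre-flight check. The sketch below is only an illustration: the set contains the two instance types named above, and the helper `find_unsupported` and the sample instance list are hypothetical.

```python
# Instance types that the known issue above lists as unsupported in emr-5.16.0.
UNSUPPORTED_INSTANCE_TYPES = {"c1.medium", "m1.small"}


def find_unsupported(instance_types):
    """Return the requested instance types that this release cannot start."""
    return sorted(set(instance_types) & UNSUPPORTED_INSTANCE_TYPES)


# Hypothetical cluster request mixing a supported and an unsupported type.
requested = ["m4.large", "m1.small"]
problems = find_unsupported(requested)
if problems:
    print("Choose a different instance type or release version for:", problems)
```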

5.16.0 Component Versions

The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.

Some components need changes from community versions for Amazon EMR. These components have a version label in the form CommunityVersion-amzn-EmrVersion. The EmrVersion starts at 0. For example, if a big-data community component named myapp-component of version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-2.
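The version label convention above can be split mechanically. The following is a minimal Python sketch, assuming that any label without the -amzn- marker is an unmodified community version; the names `ComponentVersion` and `parse_component_version` are hypothetical.

```python
from typing import NamedTuple, Optional


class ComponentVersion(NamedTuple):
    community_version: str
    emr_version: Optional[int]  # None when the label has no -amzn- suffix


def parse_component_version(label: str) -> ComponentVersion:
    """Split a label of the form CommunityVersion-amzn-EmrVersion."""
    community, marker, emr_version = label.partition("-amzn-")
    if not marker:
        # Labels such as 1.4.4 or 4.14.0-HBase-1.4 carry no Amazon EMR suffix.
        return ComponentVersion(label, None)
    return ComponentVersion(community, int(emr_version))


print(parse_component_version("2.8.4-amzn-0"))
print(parse_component_version("1.4.4"))
```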

Component Version Description
aws-sagemaker-spark-sdk 1.1.0 Amazon SageMaker Spark SDK
emr-ddb 4.6.0 Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies 2.4.0 Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis 3.4.0 Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp 2.10.0 Distributed copy application optimized for Amazon S3.
emrfs 2.25.0 Amazon S3 connector for Hadoop ecosystem applications.
flink-client 1.5.0 Apache Flink command line client scripts and applications.
ganglia-monitor 3.7.2 Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector 3.7.2 Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web 3.7.1 Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client 2.8.4-amzn-0 Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode 2.8.4-amzn-0 HDFS node-level service for storing blocks.
hadoop-hdfs-library 2.8.4-amzn-0 HDFS command-line client and library
hadoop-hdfs-namenode 2.8.4-amzn-0 HDFS service for tracking file names and block locations.
hadoop-httpfs-server 2.8.4-amzn-0 HTTP endpoint for HDFS operations.
hadoop-kms-server 2.8.4-amzn-0 Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred 2.8.4-amzn-0 MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager 2.8.4-amzn-0 YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager 2.8.4-amzn-0 YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server 2.8.4-amzn-0 Service for retrieving current and historical information for YARN applications.
hbase-hmaster 1.4.4 Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server 1.4.4 Service for serving one or more HBase regions.
hbase-client 1.4.4 HBase command-line client.
hbase-rest-server 1.4.4 Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server 1.4.4 Service providing a Thrift endpoint to HBase.
hcatalog-client 2.3.3-amzn-1 The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server 2.3.3-amzn-1 Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server 2.3.3-amzn-1 HTTP endpoint providing a REST interface to HCatalog.
hive-client 2.3.3-amzn-1 Hive command line client.
hive-hbase 2.3.3-amzn-1 Hive-hbase client.
hive-metastore-server 2.3.3-amzn-1 Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server2 2.3.3-amzn-1 Service for accepting Hive queries as web requests.
hue-server 4.2.0 Web application for analyzing data using Hadoop ecosystem applications
jupyterhub 0.8.1 Multi-user server for Jupyter notebooks
livy-server 0.5.0-incubating REST interface for interacting with Apache Spark
mahout-client 0.13.0 Library for machine learning.
mxnet 1.2.0 A flexible, scalable, and efficient library for deep learning.
mysql-server 5.5.54+ MySQL database server.
oozie-client 5.0.0 Oozie command-line client.
oozie-server 5.0.0 Service for accepting Oozie workflow requests.
opencv 3.4.0 Open Source Computer Vision Library.
phoenix-library 4.14.0-HBase-1.4 The phoenix libraries for server and client
phoenix-query-server 4.14.0-HBase-1.4 A light weight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API
presto-coordinator 0.203 Service for accepting queries and managing query execution among presto-workers.
presto-worker 0.203 Service for executing pieces of a query.
pig-client 0.17.0 Pig command-line client.
r 3.4.1 The R Project for Statistical Computing
spark-client 2.3.1 Spark command-line clients.
spark-history-server 2.3.1 Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn 2.3.1 In-memory execution engine for YARN.
spark-yarn-slave 2.3.1 Apache Spark libraries needed by YARN slaves.
sqoop-client 1.4.7 Apache Sqoop command-line client.
tez-on-yarn 0.8.4 The tez YARN application and libraries.
webserver 2.4.25+ Apache HTTP server.
zeppelin-server 0.7.3 Web-based notebook that enables interactive data analytics.
zookeeper-server 3.4.12 Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client 3.4.12 ZooKeeper command line client.

5.16.0 Configuration Classifications

Configuration classifications allow you to customize applications when you create a cluster. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.
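As an illustration, a set of configuration classifications can be assembled as a list of objects and serialized to JSON. This sketch assumes the configuration-object shape with a `Classification` name and a `Properties` map. The helper `make_configuration`, the short list of known names, and the property values are illustrative only; see Configuring Applications for the authoritative format.

```python
import json

# A few classification names copied from the classifications listed for this release.
KNOWN_CLASSIFICATIONS = {"core-site", "hive-site", "spark-defaults", "yarn-site"}


def make_configuration(classification, properties):
    """Build one configuration object; reject names outside the known set."""
    if classification not in KNOWN_CLASSIFICATIONS:
        raise ValueError(f"unknown classification: {classification}")
    return {"Classification": classification, "Properties": dict(properties)}


# Property names and values below are examples, not recommendations.
configurations = [
    make_configuration("spark-defaults", {"spark.executor.memory": "2G"}),
    make_configuration("hive-site", {"hive.exec.parallel": "true"}),
]
print(json.dumps(configurations, indent=2))
```

Checking the classification name up front surfaces typos before the cluster request is submitted.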

emr-5.16.0 Classifications

Classifications Description

capacity-scheduler

Change values in Hadoop's capacity-scheduler.xml file.

container-log4j

Change values in Hadoop YARN's container-log4j.properties file.

core-site

Change values in Hadoop's core-site.xml file.

emrfs-site

Change EMRFS settings.

flink-conf

Change flink-conf.yaml settings.

flink-log4j

Change Flink log4j.properties settings.

flink-log4j-yarn-session

Change Flink log4j-yarn-session.properties settings.

flink-log4j-cli

Change Flink log4j-cli.properties settings.

hadoop-env

Change values in the Hadoop environment for all Hadoop components.

hadoop-log4j

Change values in Hadoop's log4j.properties file.

hadoop-ssl-server

Change the Hadoop SSL server configuration.

hadoop-ssl-client

Change the Hadoop SSL client configuration.

hbase

Amazon EMR-curated settings for Apache HBase.

hbase-env

Change values in HBase's environment.

hbase-log4j

Change values in HBase's hbase-log4j.properties file.

hbase-metrics

Change values in HBase's hadoop-metrics2-hbase.properties file.

hbase-policy

Change values in HBase's hbase-policy.xml file.

hbase-site

Change values in HBase's hbase-site.xml file.

hdfs-encryption-zones

Configure HDFS encryption zones.

hdfs-site

Change values in HDFS's hdfs-site.xml.

hcatalog-env

Change values in HCatalog's environment.

hcatalog-server-jndi

Change values in HCatalog's jndi.properties.

hcatalog-server-proto-hive-site

Change values in HCatalog's proto-hive-site.xml.

hcatalog-webhcat-env

Change values in HCatalog WebHCat's environment.

hcatalog-webhcat-log4j2

Change values in HCatalog WebHCat's log4j2.properties.

hcatalog-webhcat-site

Change values in HCatalog WebHCat's webhcat-site.xml file.

hive-beeline-log4j2

Change values in Hive's beeline-log4j2.properties file.

hive-parquet-logging

Change values in Hive's parquet-logging.properties file.

hive-env

Change values in the Hive environment.

hive-exec-log4j2

Change values in Hive's hive-exec-log4j2.properties file.

hive-llap-daemon-log4j2

Change values in Hive's llap-daemon-log4j2.properties file.

hive-log4j2

Change values in Hive's hive-log4j2.properties file.

hive-site

Change values in Hive's hive-site.xml file.

hiveserver2-site

Change values in HiveServer2's hiveserver2-site.xml file.

hue-ini

Change values in Hue's ini file.

httpfs-env

Change values in the HTTPFS environment.

httpfs-site

Change values in Hadoop's httpfs-site.xml file.

hadoop-kms-acls

Change values in Hadoop's kms-acls.xml file.

hadoop-kms-env

Change values in the Hadoop KMS environment.

hadoop-kms-log4j

Change values in Hadoop's kms-log4j.properties file.

hadoop-kms-site

Change values in Hadoop's kms-site.xml file.

jupyter-notebook-conf

Change values in Jupyter Notebook's jupyter_notebook_config.py file.

jupyter-hub-conf

Change values in JupyterHub's jupyterhub_config.py file.

jupyter-sparkmagic-conf

Change values in Sparkmagic's config.json file.

livy-conf

Change values in Livy's livy.conf file.

livy-env

Change values in the Livy environment.

livy-log4j

Change Livy log4j.properties settings.

mapred-env

Change values in the MapReduce application's environment.

mapred-site

Change values in the MapReduce application's mapred-site.xml file.

oozie-env

Change values in Oozie's environment.

oozie-log4j

Change values in Oozie's oozie-log4j.properties file.

oozie-site

Change values in Oozie's oozie-site.xml file.

phoenix-hbase-metrics

Change values in Phoenix's hadoop-metrics2-hbase.properties file.

phoenix-hbase-site

Change values in Phoenix's hbase-site.xml file.

phoenix-log4j

Change values in Phoenix's log4j.properties file.

phoenix-metrics

Change values in Phoenix's hadoop-metrics2-phoenix.properties file.

pig-env

Change values in the Pig environment.

pig-properties

Change values in Pig's pig.properties file.

pig-log4j

Change values in Pig's log4j.properties file.

presto-log

Change values in Presto's log.properties file.

presto-config

Change values in Presto's config.properties file.

presto-password-authenticator

Change values in Presto's password-authenticator.properties file.

presto-env

Change values in Presto's presto-env.sh file.

presto-node

Change values in Presto's node.properties file.

presto-connector-blackhole

Change values in Presto's blackhole.properties file.

presto-connector-cassandra

Change values in Presto's cassandra.properties file.

presto-connector-hive

Change values in Presto's hive.properties file.

presto-connector-jmx

Change values in Presto's jmx.properties file.

presto-connector-kafka

Change values in Presto's kafka.properties file.

presto-connector-localfile

Change values in Presto's localfile.properties file.

presto-connector-mongodb

Change values in Presto's mongodb.properties file.

presto-connector-mysql

Change values in Presto's mysql.properties file.

presto-connector-postgresql

Change values in Presto's postgresql.properties file.

presto-connector-raptor

Change values in Presto's raptor.properties file.

presto-connector-redis

Change values in Presto's redis.properties file.

presto-connector-redshift

Change values in Presto's redshift.properties file.

presto-connector-tpch

Change values in Presto's tpch.properties file.

spark

Amazon EMR-curated settings for Apache Spark.

spark-defaults

Change values in Spark's spark-defaults.conf file.

spark-env

Change values in the Spark environment.

spark-hive-site

Change values in Spark's hive-site.xml file.

spark-log4j

Change values in Spark's log4j.properties file.

spark-metrics

Change values in Spark's metrics.properties file.

sqoop-env

Change values in Sqoop's environment.

sqoop-oraoop-site

Change values in Sqoop OraOop's oraoop-site.xml file.

sqoop-site

Change values in Sqoop's sqoop-site.xml file.

tez-site

Change values in Tez's tez-site.xml file.

yarn-env

Change values in the YARN environment.

yarn-site

Change values in YARN's yarn-site.xml file.

zeppelin-env

Change values in the Zeppelin environment.

zookeeper-config

Change values in ZooKeeper's zoo.cfg file.

zookeeper-log4j

Change values in ZooKeeper's log4j.properties file.

5.15.0

5.15.0 Application Versions

The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, JupyterHub, Livy, Mahout, MXNet, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, Tez, Zeppelin, and ZooKeeper.

The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.

For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:

5.15.0 Release Notes

The following release notes include information for Amazon EMR release version 5.15.0. Changes are relative to 5.14.0.

Initial release date: June 21, 2018

Upgrades

  • Upgraded HBase to 1.4.4

  • Upgraded Hive to 2.3.3

  • Upgraded Hue to 4.2.0

  • Upgraded Oozie to 5.0.0

  • Upgraded ZooKeeper to 3.4.12

  • Upgraded AWS SDK to 1.11.333

Changes, Enhancements, and Resolved Issues

  • Hive

  • Hue

    • Updated Hue to correctly authenticate with Livy when Kerberos is enabled. Livy is now supported when using Kerberos with Amazon EMR.

  • JupyterHub

    • Updated JupyterHub so that Amazon EMR installs LDAP client libraries by default.

    • Fixed an error in the script that generates self-signed certificates. For more information about the issue, see Release Notes.

Known Issues

  • This release version does not support the c1.medium or m1.small instance types. Clusters using either of these instance types fail to start. As a workaround, specify a different instance type or use a different release version.

5.15.0 Component Versions

The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.

Some components need changes from community versions for Amazon EMR. These components have a version label in the form CommunityVersion-amzn-EmrVersion. The EmrVersion starts at 0. For example, if a big-data community component named myapp-component of version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-2.

Component Version Description
aws-sagemaker-spark-sdk 1.0.1 Amazon SageMaker Spark SDK
emr-ddb 4.5.0 Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies 2.4.0 Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis 3.4.0 Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp 2.10.0 Distributed copy application optimized for Amazon S3.
emrfs 2.24.0 Amazon S3 connector for Hadoop ecosystem applications.
flink-client 1.4.2 Apache Flink command line client scripts and applications.
ganglia-monitor 3.7.2 Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector 3.7.2 Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web 3.7.1 Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client 2.8.3-amzn-1 Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode 2.8.3-amzn-1 HDFS node-level service for storing blocks.
hadoop-hdfs-library 2.8.3-amzn-1 HDFS command-line client and library
hadoop-hdfs-namenode 2.8.3-amzn-1 HDFS service for tracking file names and block locations.
hadoop-httpfs-server 2.8.3-amzn-1 HTTP endpoint for HDFS operations.
hadoop-kms-server 2.8.3-amzn-1 Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred 2.8.3-amzn-1 MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager 2.8.3-amzn-1 YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager 2.8.3-amzn-1 YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server 2.8.3-amzn-1 Service for retrieving current and historical information for YARN applications.
hbase-hmaster 1.4.4 Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server 1.4.4 Service for serving one or more HBase regions.
hbase-client 1.4.4 HBase command-line client.
hbase-rest-server 1.4.4 Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server 1.4.4 Service providing a Thrift endpoint to HBase.
hcatalog-client 2.3.3-amzn-0 The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server 2.3.3-amzn-0 Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server 2.3.3-amzn-0 HTTP endpoint providing a REST interface to HCatalog.
hive-client 2.3.3-amzn-0 Hive command line client.
hive-hbase 2.3.3-amzn-0 Hive-hbase client.
hive-metastore-server 2.3.3-amzn-0 Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server2 2.3.3-amzn-0 Service for accepting Hive queries as web requests.
hue-server 4.2.0 Web application for analyzing data using Hadoop ecosystem applications
jupyterhub 0.8.1 Multi-user server for Jupyter notebooks
livy-server 0.4.0-incubating REST interface for interacting with Apache Spark
mahout-client 0.13.0 Library for machine learning.
mxnet 1.1.0 A flexible, scalable, and efficient library for deep learning.
mysql-server 5.5.54+ MySQL database server.
oozie-client 5.0.0 Oozie command-line client.
oozie-server 5.0.0 Service for accepting Oozie workflow requests.
opencv 3.4.0 Open Source Computer Vision Library.
phoenix-library 4.13.0-HBase-1.4 The phoenix libraries for server and client
phoenix-query-server 4.13.0-HBase-1.4 A light weight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API
presto-coordinator 0.194 Service for accepting queries and managing query execution among presto-workers.
presto-worker 0.194 Service for executing pieces of a query.
pig-client 0.17.0 Pig command-line client.
r 3.4.1 The R Project for Statistical Computing
spark-client 2.3.0 Spark command-line clients.
spark-history-server 2.3.0 Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn 2.3.0 In-memory execution engine for YARN.
spark-yarn-slave 2.3.0 Apache Spark libraries needed by YARN slaves.
sqoop-client 1.4.7 Apache Sqoop command-line client.
tez-on-yarn 0.8.4 The tez YARN application and libraries.
webserver 2.4.25+ Apache HTTP server.
zeppelin-server 0.7.3 Web-based notebook that enables interactive data analytics.
zookeeper-server 3.4.12 Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client 3.4.12 ZooKeeper command line client.
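Because release notes describe changes relative to the preceding release, it can help to compare component tables directly. The sketch below hard-codes a small subset of rows from the 5.15.0 table above and the 5.16.0 component table. The function `changed_components` is a hypothetical helper, not an Amazon EMR API.

```python
# Versions copied from the 5.15.0 and 5.16.0 component tables (a small subset).
EMR_5_15_0 = {
    "emrfs": "2.24.0",
    "flink-client": "1.4.2",
    "hadoop-client": "2.8.3-amzn-1",
    "hbase-client": "1.4.4",
    "presto-coordinator": "0.194",
    "spark-client": "2.3.0",
}
EMR_5_16_0 = {
    "emrfs": "2.25.0",
    "flink-client": "1.5.0",
    "hadoop-client": "2.8.4-amzn-0",
    "hbase-client": "1.4.4",
    "presto-coordinator": "0.203",
    "spark-client": "2.3.1",
}


def changed_components(old, new):
    """Map each component whose version differs to (old_version, new_version)."""
    return {
        name: (old.get(name), version)
        for name, version in new.items()
        if old.get(name) != version
    }


changes = changed_components(EMR_5_15_0, EMR_5_16_0)
for name, (before, after) in sorted(changes.items()):
    print(f"{name}: {before} -> {after}")
```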

5.15.0 Configuration Classifications

Configuration classifications allow you to customize applications when you create a cluster. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.

emr-5.15.0 Classifications

Classifications Description

capacity-scheduler

Change values in Hadoop's capacity-scheduler.xml file.

container-log4j

Change values in Hadoop YARN's container-log4j.properties file.

core-site

Change values in Hadoop's core-site.xml file.

emrfs-site

Change EMRFS settings.

flink-conf

Change flink-conf.yaml settings.

flink-log4j

Change Flink log4j.properties settings.

flink-log4j-yarn-session

Change Flink log4j-yarn-session.properties settings.

flink-log4j-cli

Change Flink log4j-cli.properties settings.

hadoop-env

Change values in the Hadoop environment for all Hadoop components.

hadoop-log4j

Change values in Hadoop's log4j.properties file.

hadoop-ssl-server

Change the Hadoop SSL server configuration.

hadoop-ssl-client

Change the Hadoop SSL client configuration.

hbase

Amazon EMR-curated settings for Apache HBase.

hbase-env

Change values in HBase's environment.

hbase-log4j

Change values in HBase's hbase-log4j.properties file.

hbase-metrics

Change values in HBase's hadoop-metrics2-hbase.properties file.

hbase-policy

Change values in HBase's hbase-policy.xml file.

hbase-site

Change values in HBase's hbase-site.xml file.

hdfs-encryption-zones

Configure HDFS encryption zones.

hdfs-site

Change values in HDFS's hdfs-site.xml.

hcatalog-env

Change values in HCatalog's environment.

hcatalog-server-jndi

Change values in HCatalog's jndi.properties.

hcatalog-server-proto-hive-site

Change values in HCatalog's proto-hive-site.xml.

hcatalog-webhcat-env

Change values in HCatalog WebHCat's environment.

hcatalog-webhcat-log4j2

Change values in HCatalog WebHCat's log4j2.properties.

hcatalog-webhcat-site

Change values in HCatalog WebHCat's webhcat-site.xml file.

hive-beeline-log4j2

Change values in Hive's beeline-log4j2.properties file.

hive-parquet-logging

Change values in Hive's parquet-logging.properties file.

hive-env

Change values in the Hive environment.

hive-exec-log4j2

Change values in Hive's hive-exec-log4j2.properties file.

hive-llap-daemon-log4j2

Change values in Hive's llap-daemon-log4j2.properties file.

hive-log4j2

Change values in Hive's hive-log4j2.properties file.

hive-site

Change values in Hive's hive-site.xml file.

hiveserver2-site

Change values in HiveServer2's hiveserver2-site.xml file.

hue-ini

Change values in Hue's ini file.

httpfs-env

Change values in the HTTPFS environment.

httpfs-site

Change values in Hadoop's httpfs-site.xml file.

hadoop-kms-acls

Change values in Hadoop's kms-acls.xml file.

hadoop-kms-env

Change values in the Hadoop KMS environment.

hadoop-kms-log4j

Change values in Hadoop's kms-log4j.properties file.

hadoop-kms-site

Change values in Hadoop's kms-site.xml file.

jupyter-notebook-conf

Change values in Jupyter Notebook's jupyter_notebook_config.py file.

jupyter-hub-conf

Change values in JupyterHub's jupyterhub_config.py file.

jupyter-sparkmagic-conf

Change values in Sparkmagic's config.json file.

livy-conf

Change values in Livy's livy.conf file.

livy-env

Change values in the Livy environment.

livy-log4j

Change Livy log4j.properties settings.

mapred-env

Change values in the MapReduce application's environment.

mapred-site

Change values in the MapReduce application's mapred-site.xml file.

oozie-env

Change values in Oozie's environment.

oozie-log4j

Change values in Oozie's oozie-log4j.properties file.

oozie-site

Change values in Oozie's oozie-site.xml file.

phoenix-hbase-metrics

Change values in Phoenix's hadoop-metrics2-hbase.properties file.

phoenix-hbase-site

Change values in Phoenix's hbase-site.xml file.

phoenix-log4j

Change values in Phoenix's log4j.properties file.

phoenix-metrics

Change values in Phoenix's hadoop-metrics2-phoenix.properties file.

pig-env

Change values in the Pig environment.

pig-properties

Change values in Pig's pig.properties file.

pig-log4j

Change values in Pig's log4j.properties file.

presto-log

Change values in Presto's log.properties file.

presto-config

Change values in Presto's config.properties file.

presto-env

Change values in Presto's presto-env.sh file.

presto-node

Change values in Presto's node.properties file.

presto-connector-blackhole

Change values in Presto's blackhole.properties file.

presto-connector-cassandra

Change values in Presto's cassandra.properties file.

presto-connector-hive

Change values in Presto's hive.properties file.

presto-connector-jmx

Change values in Presto's jmx.properties file.

presto-connector-kafka

Change values in Presto's kafka.properties file.

presto-connector-localfile

Change values in Presto's localfile.properties file.

presto-connector-mongodb

Change values in Presto's mongodb.properties file.

presto-connector-mysql

Change values in Presto's mysql.properties file.

presto-connector-postgresql

Change values in Presto's postgresql.properties file.

presto-connector-raptor

Change values in Presto's raptor.properties file.

presto-connector-redis

Change values in Presto's redis.properties file.

presto-connector-redshift

Change values in Presto's redshift.properties file.

presto-connector-tpch

Change values in Presto's tpch.properties file.

spark

Amazon EMR-curated settings for Apache Spark.

spark-defaults

Change values in Spark's spark-defaults.conf file.

spark-env

Change values in the Spark environment.

spark-hive-site

Change values in Spark's hive-site.xml file.

spark-log4j

Change values in Spark's log4j.properties file.

spark-metrics

Change values in Spark's metrics.properties file.

sqoop-env

Change values in Sqoop's environment.

sqoop-oraoop-site

Change values in Sqoop OraOop's oraoop-site.xml file.

sqoop-site

Change values in Sqoop's sqoop-site.xml file.

tez-site

Change values in Tez's tez-site.xml file.

yarn-env

Change values in the YARN environment.

yarn-site

Change values in YARN's yarn-site.xml file.

zeppelin-env

Change values in the Zeppelin environment.

zookeeper-config

Change values in ZooKeeper's zoo.cfg file.

zookeeper-log4j

Change values in ZooKeeper's log4j.properties file.

5.14.0

5.14.0 Application Versions

The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, JupyterHub, Livy, Mahout, MXNet, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, Tez, Zeppelin, and ZooKeeper.

The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.

For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:

5.14.0 Release Notes

The following release notes include information for Amazon EMR release version 5.14.0. Changes are relative to 5.13.0.

Initial release date: June 4, 2018

Upgrades

  • Upgraded Apache Flink to 1.4.2

  • Upgraded Apache MXNet to 1.1.0

  • Upgraded Apache Sqoop to 1.4.7

New Features

  • Added JupyterHub support. For more information, see JupyterHub.

Changes, Enhancements, and Resolved Issues

  • EMRFS

    • The userAgent string in requests to Amazon S3 has been updated to contain the user and group information of the invoking principal. This can be used with AWS CloudTrail logs for more comprehensive request tracking.

  • HBase

    • Included HBASE-20447, which fixes a bug that could cause cache problems, especially with split regions.

  • MXNet

    • Added OpenCV libraries.

  • Spark

    • When Spark writes Parquet files to an Amazon S3 location using EMRFS, the FileOutputCommitter algorithm has been updated to use version 2 instead of version 1. This reduces the number of renames, which improves application performance. This change does not affect:

      • Applications other than Spark.

      • Applications that write to other file systems, such as HDFS (which still use version 1 of FileOutputCommitter).

      • Applications that use other output formats, such as text or csv, that already use EMRFS direct write.
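For reference, the committer algorithm version is controlled in open-source Hadoop by the `mapreduce.fileoutputcommitter.algorithm.version` property, which Spark forwards when it is given the `spark.hadoop.` prefix. The sketch below builds a spark-defaults configuration object that sets the value explicitly. These notes do not name the property or say whether Amazon EMR honors an explicit override for EMRFS writes, so treat both as assumptions; the helper `committer_configuration` is hypothetical.

```python
# Hadoop's standard setting for the FileOutputCommitter algorithm version.
# The "spark.hadoop." prefix is how Spark passes a property through to Hadoop.
COMMITTER_PROPERTY = "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version"


def committer_configuration(version):
    """Return a spark-defaults configuration object pinning the algorithm version."""
    if version not in (1, 2):
        raise ValueError("FileOutputCommitter algorithm version must be 1 or 2")
    return {
        "Classification": "spark-defaults",
        "Properties": {COMMITTER_PROPERTY: str(version)},
    }


print(committer_configuration(2))
```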

Known Issues

  • JupyterHub

    • Using configuration classifications to set up JupyterHub and individual Jupyter notebooks when you create a cluster is not supported. Instead, manually edit the jupyterhub_config.py file and the jupyter_notebook_config.py file for each user. For more information, see Configuring JupyterHub.

    • JupyterHub fails to start on clusters within a private subnet, failing with the message Error: ENOENT: no such file or directory, open '/etc/jupyter/conf/server.crt'. This is caused by an error in the script that generates self-signed certificates. Use the following workaround to generate self-signed certificates. Run all commands while connected to the master node.

      1. Copy the certificate generation script from the container to the master node:

        sudo docker cp jupyterhub:/tmp/gen_self_signed_cert.sh ./

      2. Use a text editor to edit line 23 of the script so that it uses the local hostname instead of the public hostname, as shown below:

        local hostname=$(curl -s $EC2_METADATA_SERVICE_URI/local-hostname)

      3. Run the script to generate self-signed certificates:

        sudo bash ./gen_self_signed_cert.sh

      4. Move the certificate files that the script generates to the /etc/jupyter/conf/ directory:

        sudo mv /tmp/server.crt /tmp/server.key /etc/jupyter/conf/

      You can tail the jupyter.log file to verify that JupyterHub restarted and is returning a 200 response code. For example:

      tail -f /var/log/jupyter/jupyter.log

      This should return a response similar to the following:

      # [I 2018-06-14 18:56:51.356 JupyterHub app:1581] JupyterHub is now running at https://:9443/
      # 19:01:51.359 - info: [ConfigProxy] 200 GET /api/routes
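The manual log check above could also be scripted. The following Python sketch scans log lines for the two messages in the sample output. It assumes that the log format matches that sample; the function name `jupyterhub_is_healthy` is hypothetical.

```python
import re

# Matches the proxy line from the sample output that reports a 200 response.
ROUTE_OK = re.compile(r"\[ConfigProxy\]\s+200\s+GET\s+/api/routes")


def jupyterhub_is_healthy(log_lines):
    """Return True if the log shows JupyterHub started and answered with 200."""
    lines = list(log_lines)
    started = any("JupyterHub is now running" in line for line in lines)
    routed = any(ROUTE_OK.search(line) for line in lines)
    return started and routed


# Sample lines taken from the response shown above.
sample = [
    "[I 2018-06-14 18:56:51.356 JupyterHub app:1581] JupyterHub is now running at https://:9443/",
    "19:01:51.359 - info: [ConfigProxy] 200 GET /api/routes",
]
print(jupyterhub_is_healthy(sample))
```

To check a live cluster, the same function could be given the lines read from /var/log/jupyter/jupyter.log on the master node.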

5.14.0 Component Versions

The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.

Some components need changes from community versions for Amazon EMR. These components have a version label in the form CommunityVersion-amzn-EmrVersion. The EmrVersion starts at 0. For example, if a big-data community component named myapp-component of version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-2.

Component Version Description
aws-sagemaker-spark-sdk 1.0.1 Amazon SageMaker Spark SDK
emr-ddb 4.5.0 Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies 2.4.0 Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis 3.4.0 Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp 2.10.0 Distributed copy application optimized for Amazon S3.
emrfs 2.23.0 Amazon S3 connector for Hadoop ecosystem applications.
flink-client 1.4.2 Apache Flink command line client scripts and applications.
ganglia-monitor 3.7.2 Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector 3.7.2 Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web 3.7.1 Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client 2.8.3-amzn-1 Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode 2.8.3-amzn-1 HDFS node-level service for storing blocks.
hadoop-hdfs-library 2.8.3-amzn-1 HDFS command-line client and library
hadoop-hdfs-namenode 2.8.3-amzn-1 HDFS service for tracking file names and block locations.
hadoop-httpfs-server 2.8.3-amzn-1 HTTP endpoint for HDFS operations.
hadoop-kms-server 2.8.3-amzn-1 Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred 2.8.3-amzn-1 MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager 2.8.3-amzn-1 YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager 2.8.3-amzn-1 YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server 2.8.3-amzn-1 Service for retrieving current and historical information for YARN applications.
hbase-hmaster 1.4.2 Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server 1.4.2 Service for serving one or more HBase regions.
hbase-client 1.4.2 HBase command-line client.
hbase-rest-server 1.4.2 Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server 1.4.2 Service providing a Thrift endpoint to HBase.
hcatalog-client 2.3.2-amzn-2 The 'hcat' command-line client for manipulating hcatalog-server.
hcatalog-server 2.3.2-amzn-2 Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server 2.3.2-amzn-2 HTTP endpoint providing a REST interface to HCatalog.
hive-client 2.3.2-amzn-2 Hive command-line client.
hive-hbase 2.3.2-amzn-2 Hive-HBase client.
hive-metastore-server 2.3.2-amzn-2 Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server2 2.3.2-amzn-2 Service for accepting Hive queries as web requests.
hue-server 4.1.0 Web application for analyzing data using Hadoop ecosystem applications.
jupyterhub 0.8.1 Multi-user server for Jupyter notebooks.
livy-server 0.4.0-incubating REST interface for interacting with Apache Spark.
mahout-client 0.13.0 Library for machine learning.
mxnet 1.1.0 A flexible, scalable, and efficient library for deep learning.
mysql-server 5.5.54+ MySQL database server.
oozie-client 4.3.0 Oozie command-line client.
oozie-server 4.3.0 Service for accepting Oozie workflow requests.
opencv 3.4.0 Open Source Computer Vision Library.
phoenix-library 4.13.0-HBase-1.4 The Phoenix libraries for server and client.
phoenix-query-server 4.13.0-HBase-1.4 A lightweight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API.
presto-coordinator 0.194 Service for accepting queries and managing query execution among presto-workers.
presto-worker 0.194 Service for executing pieces of a query.
pig-client 0.17.0 Pig command-line client.
r 3.4.1 The R Project for Statistical Computing
spark-client 2.3.0 Spark command-line clients.
spark-history-server 2.3.0 Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn 2.3.0 In-memory execution engine for YARN.
spark-yarn-slave 2.3.0 Apache Spark libraries needed by YARN slaves.
sqoop-client 1.4.7 Apache Sqoop command-line client.
tez-on-yarn 0.8.4 The Tez YARN application and libraries.
webserver 2.4.25+ Apache HTTP server.
zeppelin-server 0.7.3 Web-based notebook that enables interactive data analytics.
zookeeper-server 3.4.10 Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client 3.4.10 ZooKeeper command line client.

5.14.0 Configuration Classifications

Configuration classifications allow you to customize applications when you create a cluster. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.
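
As a hedged illustration, a classification is supplied as a JSON list of objects, each naming a classification and the properties to set. The sketch below writes a hive-site override to a file. The helper name write_hive_site_config, the output file name, and the chosen property and value are illustrative assumptions; the commented AWS CLI call shows one way such a file might be passed when creating a cluster, with placeholder instance settings.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a configurations file that overrides a single
# hive-site.xml property. The property shown is only an example.
write_hive_site_config() {
  local out_file="$1"
  cat > "$out_file" <<'EOF'
[
  {
    "Classification": "hive-site",
    "Properties": {
      "hive.exec.dynamic.partition.mode": "nonstrict"
    }
  }
]
EOF
}

# Example usage (the create-cluster call requires AWS credentials and is shown
# as a comment; instance type and count are placeholders):
#   write_hive_site_config ./configurations.json
#   aws emr create-cluster --release-label emr-5.14.0 \
#     --applications Name=Hive \
#     --configurations file://./configurations.json \
#     --instance-type m4.large --instance-count 3 --use-default-roles
```

Keeping the overrides in a file rather than inline makes it easier to reuse the same settings across clusters.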

emr-5.14.0 Classifications

Classifications Description

capacity-scheduler

Change values in Hadoop's capacity-scheduler.xml file.

container-log4j

Change values in Hadoop YARN's container-log4j.properties file.

core-site

Change values in Hadoop's core-site.xml file.

emrfs-site

Change EMRFS settings.

flink-conf

Change flink-conf.yaml settings.

flink-log4j

Change Flink log4j.properties settings.

flink-log4j-yarn-session

Change Flink log4j-yarn-session.properties settings.

flink-log4j-cli

Change Flink log4j-cli.properties settings.

hadoop-env

Change values in the Hadoop environment for all Hadoop components.

hadoop-log4j

Change values in Hadoop's log4j.properties file.

hadoop-ssl-server

Change Hadoop SSL server configuration.

hadoop-ssl-client

Change Hadoop SSL client configuration.

hbase

Amazon EMR-curated settings for Apache HBase.

hbase-env

Change values in HBase's environment.

hbase-log4j

Change values in HBase's hbase-log4j.properties file.

hbase-metrics

Change values in HBase's hadoop-metrics2-hbase.properties file.

hbase-policy

Change values in HBase's hbase-policy.xml file.

hbase-site

Change values in HBase's hbase-site.xml file.

hdfs-encryption-zones

Configure HDFS encryption zones.

hdfs-site

Change values in HDFS's hdfs-site.xml.

hcatalog-env

Change values in HCatalog's environment.

hcatalog-server-jndi

Change values in HCatalog's jndi.properties.

hcatalog-server-proto-hive-site

Change values in HCatalog's proto-hive-site.xml.

hcatalog-webhcat-env

Change values in HCatalog WebHCat's environment.

hcatalog-webhcat-log4j2

Change values in HCatalog WebHCat's log4j2.properties.

hcatalog-webhcat-site

Change values in HCatalog WebHCat's webhcat-site.xml file.

hive-beeline-log4j2

Change values in Hive's beeline-log4j2.properties file.

hive-parquet-logging

Change values in Hive's parquet-logging.properties file.

hive-env

Change values in the Hive environment.

hive-exec-log4j2

Change values in Hive's hive-exec-log4j2.properties file.

hive-llap-daemon-log4j2

Change values in Hive's llap-daemon-log4j2.properties file.

hive-log4j2

Change values in Hive's hive-log4j2.properties file.

hive-site

Change values in Hive's hive-site.xml file.

hiveserver2-site

Change values in HiveServer2's hiveserver2-site.xml file.

hue-ini

Change values in Hue's ini file.

httpfs-env

Change values in the HTTPFS environment.

httpfs-site

Change values in Hadoop's httpfs-site.xml file.

hadoop-kms-acls

Change values in Hadoop's kms-acls.xml file.

hadoop-kms-env

Change values in the Hadoop KMS environment.

hadoop-kms-log4j

Change values in Hadoop's kms-log4j.properties file.

hadoop-kms-site

Change values in Hadoop's kms-site.xml file.

jupyter-notebook-conf

Change values in Jupyter Notebook's jupyter_notebook_config.py file.

jupyter-hub-conf

Change values in JupyterHub's jupyterhub_config.py file.

jupyter-sparkmagic-conf

Change values in Sparkmagic's config.json file.

livy-conf

Change values in Livy's livy.conf file.

livy-env

Change values in the Livy environment.

livy-log4j

Change Livy log4j.properties settings.

mapred-env

Change values in the MapReduce application's environment.

mapred-site

Change values in the MapReduce application's mapred-site.xml file.

oozie-env

Change values in Oozie's environment.

oozie-log4j

Change values in Oozie's oozie-log4j.properties file.

oozie-site

Change values in Oozie's oozie-site.xml file.

phoenix-hbase-metrics

Change values in Phoenix's hadoop-metrics2-hbase.properties file.

phoenix-hbase-site

Change values in Phoenix's hbase-site.xml file.

phoenix-log4j

Change values in Phoenix's log4j.properties file.

phoenix-metrics

Change values in Phoenix's hadoop-metrics2-phoenix.properties file.

pig-env

Change values in the Pig environment.

pig-properties

Change values in Pig's pig.properties file.

pig-log4j

Change values in Pig's log4j.properties file.

presto-log

Change values in Presto's log.properties file.

presto-config

Change values in Presto's config.properties file.

presto-env

Change values in Presto's presto-env.sh file.

presto-node

Change values in Presto's node.properties file.

presto-connector-blackhole

Change values in Presto's blackhole.properties file.

presto-connector-cassandra

Change values in Presto's cassandra.properties file.

presto-connector-hive

Change values in Presto's hive.properties file.

presto-connector-jmx

Change values in Presto's jmx.properties file.

presto-connector-kafka

Change values in Presto's kafka.properties file.

presto-connector-localfile

Change values in Presto's localfile.properties file.

presto-connector-mongodb

Change values in Presto's mongodb.properties file.

presto-connector-mysql

Change values in Presto's mysql.properties file.

presto-connector-postgresql

Change values in Presto's postgresql.properties file.

presto-connector-raptor

Change values in Presto's raptor.properties file.

presto-connector-redis

Change values in Presto's redis.properties file.

presto-connector-redshift

Change values in Presto's redshift.properties file.

presto-connector-tpch

Change values in Presto's tpch.properties file.

spark

Amazon EMR-curated settings for Apache Spark.

spark-defaults

Change values in Spark's spark-defaults.conf file.

spark-env

Change values in the Spark environment.

spark-hive-site

Change values in Spark's hive-site.xml file

spark-log4j

Change values in Spark's log4j.properties file.

spark-metrics

Change values in Spark's metrics.properties file.

sqoop-env

Change values in Sqoop's environment.

sqoop-oraoop-site

Change values in Sqoop OraOop's oraoop-site.xml file.

sqoop-site

Change values in Sqoop's sqoop-site.xml file.

tez-site

Change values in Tez's tez-site.xml file.

yarn-env

Change values in the YARN environment.

yarn-site

Change values in YARN's yarn-site.xml file.

zeppelin-env

Change values in the Zeppelin environment.

zookeeper-config

Change values in ZooKeeper's zoo.cfg file.

zookeeper-log4j

Change values in ZooKeeper's log4j.properties file.

5.13.0

5.13.0 Application Versions

The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, Livy, Mahout, MXNet, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, Tez, Zeppelin, and ZooKeeper.

The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.

For a comprehensive history of application versions for each release of Amazon EMR, see Application Versions in Amazon EMR 5.x Releases (PNG).

5.13.0 Release Notes

The following release notes include information for Amazon EMR release version 5.13.0. Changes are relative to 5.12.0.

Upgrades

  • Upgraded Spark to 2.3.0

  • Upgraded HBase to 1.4.2

  • Upgraded Presto to 0.194

  • Upgraded AWS Java SDK to 1.11.297

Changes, Enhancements, and Resolved Issues

  • Hive

    • Backported HIVE-15436. Enhanced Hive APIs to return only views.

Known Issues

  • MXNet does not currently include OpenCV libraries.

5.13.0 Component Versions

The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.

Some components need changes from community versions for Amazon EMR. These components have a version label in the form CommunityVersion-amzn-EmrVersion. For example, if a big-data community component named myapp-component of version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-3.

Component Version Description
aws-sagemaker-spark-sdk 1.0.1 Amazon SageMaker Spark SDK
emr-ddb 4.5.0 Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies 2.4.0 Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis 3.4.0 Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp 2.10.0 Distributed copy application optimized for Amazon S3.
emrfs 2.22.0 Amazon S3 connector for Hadoop ecosystem applications.
flink-client 1.4.0 Apache Flink command line client scripts and applications.
ganglia-monitor 3.7.2 Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector 3.7.2 Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web 3.7.1 Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client 2.8.3-amzn-0 Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode 2.8.3-amzn-0 HDFS node-level service for storing blocks.
hadoop-hdfs-library 2.8.3-amzn-0 HDFS command-line client and library
hadoop-hdfs-namenode 2.8.3-amzn-0 HDFS service for tracking file names and block locations.
hadoop-httpfs-server 2.8.3-amzn-0 HTTP endpoint for HDFS operations.
hadoop-kms-server 2.8.3-amzn-0 Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred 2.8.3-amzn-0 MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager 2.8.3-amzn-0 YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager 2.8.3-amzn-0 YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server 2.8.3-amzn-0 Service for retrieving current and historical information for YARN applications.
hbase-hmaster 1.4.2 Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server 1.4.2 Service for serving one or more HBase regions.
hbase-client 1.4.2 HBase command-line client.
hbase-rest-server 1.4.2 Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server 1.4.2 Service providing a Thrift endpoint to HBase.
hcatalog-client 2.3.2-amzn-2 The 'hcat' command-line client for manipulating hcatalog-server.
hcatalog-server 2.3.2-amzn-2 Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server 2.3.2-amzn-2 HTTP endpoint providing a REST interface to HCatalog.
hive-client 2.3.2-amzn-2 Hive command-line client.
hive-hbase 2.3.2-amzn-2 Hive-HBase client.
hive-metastore-server 2.3.2-amzn-2 Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server2 2.3.2-amzn-2 Service for accepting Hive queries as web requests.
hue-server 4.1.0 Web application for analyzing data using Hadoop ecosystem applications.
livy-server 0.4.0-incubating REST interface for interacting with Apache Spark.
mahout-client 0.13.0 Library for machine learning.
mxnet 1.0.0 A flexible, scalable, and efficient library for deep learning.
mysql-server 5.5.54+ MySQL database server.
oozie-client 4.3.0 Oozie command-line client.
oozie-server 4.3.0 Service for accepting Oozie workflow requests.
phoenix-library 4.13.0-HBase-1.4 The Phoenix libraries for server and client.
phoenix-query-server 4.13.0-HBase-1.4 A lightweight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API.
presto-coordinator 0.194 Service for accepting queries and managing query execution among presto-workers.
presto-worker 0.194 Service for executing pieces of a query.
pig-client 0.17.0 Pig command-line client.
r 3.4.1 The R Project for Statistical Computing
spark-client 2.3.0 Spark command-line clients.
spark-history-server 2.3.0 Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn 2.3.0 In-memory execution engine for YARN.
spark-yarn-slave 2.3.0 Apache Spark libraries needed by YARN slaves.
sqoop-client 1.4.6 Apache Sqoop command-line client.
tez-on-yarn 0.8.4 The Tez YARN application and libraries.
webserver 2.4.25+ Apache HTTP server.
zeppelin-server 0.7.3 Web-based notebook that enables interactive data analytics.
zookeeper-server 3.4.10 Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client 3.4.10 ZooKeeper command line client.
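
To see which components in a table like the one above carry Amazon EMR modifications, the rows can be filtered on the -amzn- marker. This is a sketch that assumes the table has been saved as plain text with the component name in the first whitespace-separated column and the version in the second. The function name list_amzn_components is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "name version" for every row whose version column
# contains the -amzn- marker. The header row is ignored automatically because
# its second column ("Version") does not contain the marker.
list_amzn_components() {
  local table_file="$1"
  awk '$2 ~ /-amzn-/ { print $1, $2 }' "$table_file"
}

# Example usage:
#   list_amzn_components ./emr-5.13.0-components.txt
```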

5.13.0 Configuration Classifications

Configuration classifications allow you to customize applications when you create a cluster. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.

emr-5.13.0 Classifications

Classifications Description

capacity-scheduler

Change values in Hadoop's capacity-scheduler.xml file.

core-site

Change values in Hadoop's core-site.xml file.

emrfs-site

Change EMRFS settings.

flink-conf

Change flink-conf.yaml settings.

flink-log4j

Change Flink log4j.properties settings.

flink-log4j-yarn-session

Change Flink log4j-yarn-session.properties settings.

flink-log4j-cli

Change Flink log4j-cli.properties settings.

hadoop-env

Change values in the Hadoop environment for all Hadoop components.

hadoop-log4j

Change values in Hadoop's log4j.properties file.

hadoop-ssl-server

Change Hadoop SSL server configuration.

hadoop-ssl-client

Change Hadoop SSL client configuration.

hbase

Amazon EMR-curated settings for Apache HBase.

hbase-env

Change values in HBase's environment.

hbase-log4j

Change values in HBase's hbase-log4j.properties file.

hbase-metrics

Change values in HBase's hadoop-metrics2-hbase.properties file.

hbase-policy

Change values in HBase's hbase-policy.xml file.

hbase-site

Change values in HBase's hbase-site.xml file.

hdfs-encryption-zones

Configure HDFS encryption zones.

hdfs-site

Change values in HDFS's hdfs-site.xml.

hcatalog-env

Change values in HCatalog's environment.

hcatalog-server-jndi

Change values in HCatalog's jndi.properties.

hcatalog-server-proto-hive-site

Change values in HCatalog's proto-hive-site.xml.

hcatalog-webhcat-env

Change values in HCatalog WebHCat's environment.

hcatalog-webhcat-log4j2

Change values in HCatalog WebHCat's log4j2.properties.

hcatalog-webhcat-site

Change values in HCatalog WebHCat's webhcat-site.xml file.

hive-beeline-log4j2

Change values in Hive's beeline-log4j2.properties file.

hive-parquet-logging

Change values in Hive's parquet-logging.properties file.

hive-env

Change values in the Hive environment.

hive-exec-log4j2

Change values in Hive's hive-exec-log4j2.properties file.

hive-llap-daemon-log4j2

Change values in Hive's llap-daemon-log4j2.properties file.

hive-log4j2

Change values in Hive's hive-log4j2.properties file.

hive-site

Change values in Hive's hive-site.xml file.

hiveserver2-site

Change values in HiveServer2's hiveserver2-site.xml file.

hue-ini

Change values in Hue's ini file.

httpfs-env

Change values in the HTTPFS environment.

httpfs-site

Change values in Hadoop's httpfs-site.xml file.

hadoop-kms-acls

Change values in Hadoop's kms-acls.xml file.

hadoop-kms-env

Change values in the Hadoop KMS environment.

hadoop-kms-log4j

Change values in Hadoop's kms-log4j.properties file.

hadoop-kms-site

Change values in Hadoop's kms-site.xml file.

livy-conf

Change values in Livy's livy.conf file.

livy-env

Change values in the Livy environment.

livy-log4j

Change Livy log4j.properties settings.

mapred-env

Change values in the MapReduce application's environment.

mapred-site

Change values in the MapReduce application's mapred-site.xml file.

oozie-env

Change values in Oozie's environment.

oozie-log4j

Change values in Oozie's oozie-log4j.properties file.

oozie-site

Change values in Oozie's oozie-site.xml file.

phoenix-hbase-metrics

Change values in Phoenix's hadoop-metrics2-hbase.properties file.

phoenix-hbase-site

Change values in Phoenix's hbase-site.xml file.

phoenix-log4j

Change values in Phoenix's log4j.properties file.

phoenix-metrics

Change values in Phoenix's hadoop-metrics2-phoenix.properties file.

pig-env

Change values in the Pig environment.

pig-properties

Change values in Pig's pig.properties file.

pig-log4j

Change values in Pig's log4j.properties file.

presto-log

Change values in Presto's log.properties file.

presto-config

Change values in Presto's config.properties file.

presto-env

Change values in Presto's presto-env.sh file.

presto-node

Change values in Presto's node.properties file.

presto-connector-blackhole

Change values in Presto's blackhole.properties file.

presto-connector-cassandra

Change values in Presto's cassandra.properties file.

presto-connector-hive

Change values in Presto's hive.properties file.

presto-connector-jmx

Change values in Presto's jmx.properties file.

presto-connector-kafka

Change values in Presto's kafka.properties file.

presto-connector-localfile

Change values in Presto's localfile.properties file.

presto-connector-mongodb

Change values in Presto's mongodb.properties file.

presto-connector-mysql

Change values in Presto's mysql.properties file.

presto-connector-postgresql

Change values in Presto's postgresql.properties file.

presto-connector-raptor

Change values in Presto's raptor.properties file.

presto-connector-redis

Change values in Presto's redis.properties file.

presto-connector-redshift

Change values in Presto's redshift.properties file.

presto-connector-tpch

Change values in Presto's tpch.properties file.

spark

Amazon EMR-curated settings for Apache Spark.

spark-defaults

Change values in Spark's spark-defaults.conf file.

spark-env

Change values in the Spark environment.

spark-hive-site

Change values in Spark's hive-site.xml file

spark-log4j

Change values in Spark's log4j.properties file.

spark-metrics

Change values in Spark's metrics.properties file.

sqoop-env

Change values in Sqoop's environment.

sqoop-oraoop-site

Change values in Sqoop OraOop's oraoop-site.xml file.

sqoop-site

Change values in Sqoop's sqoop-site.xml file.

tez-site

Change values in Tez's tez-site.xml file.

yarn-env

Change values in the YARN environment.

yarn-site

Change values in YARN's yarn-site.xml file.

zeppelin-env

Change values in the Zeppelin environment.

zookeeper-config

Change values in ZooKeeper's zoo.cfg file.

zookeeper-log4j

Change values in ZooKeeper's log4j.properties file.
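
The set of classifications differs between releases. For example, jupyter-hub-conf is listed for emr-5.14.0 but not for emr-5.13.0. Before reusing a configuration with a different release, it can help to check each name against that release's list. The following sketch assumes the classification names for a release have been saved one per line in a text file. The function name classification_supported and the file layout are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the given classification name appears as a
# whole line in the list file. -x requires a full-line match and -F disables
# regular-expression interpretation of the name.
classification_supported() {
  local name="$1" list_file="$2"
  grep -qxF -- "$name" "$list_file"
}

# Example usage:
#   classification_supported jupyter-hub-conf ./emr-5.13.0-classifications.txt ||
#     echo "jupyter-hub-conf is not available in this release"
```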

5.12.x

The 5.12 series contains multiple releases: 5.12.1 (latest) and 5.12.0.

Amazon EMR Release 5.12.1

Release 5.12.1 Application Versions

The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, Livy, Mahout, MXNet, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, Tez, Zeppelin, and ZooKeeper.

The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.

For a comprehensive history of application versions for each release of Amazon EMR, see Application Versions in Amazon EMR 5.x Releases (PNG).

Release 5.12.1 Release Notes

The following release notes include information for Amazon EMR release version 5.12.1. Changes are relative to 5.12.0.

Initial release date: March 29, 2018

Changes, Enhancements, and Resolved Issues

  • Updated the Amazon Linux kernel of the default Amazon Linux AMI for Amazon EMR to address potential vulnerabilities.

Release 5.12.1 Component Versions

The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.

Some components need changes from community versions for Amazon EMR. These components have a version label in the form CommunityVersion-amzn-EmrVersion. For example, if a big-data community component named myapp-component of version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-3.

Component Version Description
aws-sagemaker-spark-sdk 1.0.1 Amazon SageMaker Spark SDK
emr-ddb 4.5.0 Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies 2.4.0 Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis 3.4.0 Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp 2.9.0 Distributed copy application optimized for Amazon S3.
emrfs 2.21.0 Amazon S3 connector for Hadoop ecosystem applications.
flink-client 1.4.0 Apache Flink command line client scripts and applications.
ganglia-monitor 3.7.2 Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector 3.7.2 Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web 3.7.1 Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client 2.8.3-amzn-0 Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode 2.8.3-amzn-0 HDFS node-level service for storing blocks.
hadoop-hdfs-library 2.8.3-amzn-0 HDFS command-line client and library
hadoop-hdfs-namenode 2.8.3-amzn-0 HDFS service for tracking file names and block locations.
hadoop-httpfs-server 2.8.3-amzn-0 HTTP endpoint for HDFS operations.
hadoop-kms-server 2.8.3-amzn-0 Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred 2.8.3-amzn-0 MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager 2.8.3-amzn-0 YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager 2.8.3-amzn-0 YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server 2.8.3-amzn-0 Service for retrieving current and historical information for YARN applications.
hbase-hmaster 1.4.0 Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server 1.4.0 Service for serving one or more HBase regions.
hbase-client 1.4.0 HBase command-line client.
hbase-rest-server 1.4.0 Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server 1.4.0 Service providing a Thrift endpoint to HBase.
hcatalog-client 2.3.2-amzn-1 The 'hcat' command-line client for manipulating hcatalog-server.
hcatalog-server 2.3.2-amzn-1 Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server 2.3.2-amzn-1 HTTP endpoint providing a REST interface to HCatalog.
hive-client 2.3.2-amzn-1 Hive command-line client.
hive-hbase 2.3.2-amzn-1 Hive-HBase client.
hive-metastore-server 2.3.2-amzn-1 Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server2 2.3.2-amzn-1 Service for accepting Hive queries as web requests.
hue-server 4.1.0 Web application for analyzing data using Hadoop ecosystem applications.
livy-server 0.4.0-incubating REST interface for interacting with Apache Spark.
mahout-client 0.13.0 Library for machine learning.
mxnet 1.0.0 A flexible, scalable, and efficient library for deep learning.
mysql-server 5.5.54+ MySQL database server.
oozie-client 4.3.0 Oozie command-line client.
oozie-server 4.3.0 Service for accepting Oozie workflow requests.
phoenix-library 4.13.0-HBase-1.4 The Phoenix libraries for server and client.
phoenix-query-server 4.13.0-HBase-1.4 A lightweight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API.
presto-coordinator 0.188 Service for accepting queries and managing query execution among presto-workers.
presto-worker 0.188 Service for executing pieces of a query.
pig-client 0.17.0 Pig command-line client.
spark-client 2.2.1 Spark command-line clients.
spark-history-server 2.2.1 Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn 2.2.1 In-memory execution engine for YARN.
spark-yarn-slave 2.2.1 Apache Spark libraries needed by YARN slaves.
sqoop-client 1.4.6 Apache Sqoop command-line client.
tez-on-yarn 0.8.4 The Tez YARN application and libraries.
webserver 2.4.25+ Apache HTTP server.
zeppelin-server 0.7.3 Web-based notebook that enables interactive data analytics.
zookeeper-server 3.4.10 Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client 3.4.10 ZooKeeper command line client.
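
Comparing this table with the table for another release shows which components changed version, for example Presto (0.188 here, 0.194 in 5.13.0) and Spark (2.2.1 here, 2.3.0 in 5.13.0). The following sketch automates that comparison. It assumes both tables have been saved as plain text with the component name and version in the first two whitespace-separated columns. The function name changed_components is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print components present in both tables whose version
# differs, as "name old-version -> new-version".
# $1 = table for the older release, $2 = table for the newer release.
# Note: the NR == FNR idiom assumes the older table is not empty.
changed_components() {
  awk 'NR == FNR { old[$1] = $2; next }
       ($1 in old) && old[$1] != $2 { print $1, old[$1], "->", $2 }' "$1" "$2"
}

# Example usage:
#   changed_components ./emr-5.12.1-components.txt ./emr-5.13.0-components.txt
```

Components that appear in only one of the two tables are not reported by this sketch.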

Release 5.12.1 Configuration Classifications

Configuration classifications allow you to customize applications when you create a cluster. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.

emr-5.12.1 Classifications

Classifications Description

capacity-scheduler

Change values in Hadoop's capacity-scheduler.xml file.

core-site

Change values in Hadoop's core-site.xml file.

emrfs-site

Change EMRFS settings.

flink-conf

Change flink-conf.yaml settings.

flink-log4j

Change Flink log4j.properties settings.

flink-log4j-yarn-session

Change Flink log4j-yarn-session.properties settings.

flink-log4j-cli

Change Flink log4j-cli.properties settings.

hadoop-env

Change values in the Hadoop environment for all Hadoop components.

hadoop-log4j

Change values in Hadoop's log4j.properties file.

hadoop-ssl-server

Change Hadoop SSL server configuration.

hadoop-ssl-client

Change Hadoop SSL client configuration.

hbase

Amazon EMR-curated settings for Apache HBase.

hbase-env

Change values in HBase's environment.

hbase-log4j

Change values in HBase's hbase-log4j.properties file.

hbase-metrics

Change values in HBase's hadoop-metrics2-hbase.properties file.

hbase-policy

Change values in HBase's hbase-policy.xml file.

hbase-site

Change values in HBase's hbase-site.xml file.

hdfs-encryption-zones

Configure HDFS encryption zones.

hdfs-site

Change values in HDFS's hdfs-site.xml.

hcatalog-env

Change values in HCatalog's environment.

hcatalog-server-jndi

Change values in HCatalog's jndi.properties.

hcatalog-server-proto-hive-site

Change values in HCatalog's proto-hive-site.xml.

hcatalog-webhcat-env

Change values in HCatalog WebHCat's environment.

hcatalog-webhcat-log4j2

Change values in HCatalog WebHCat's log4j2.properties.

hcatalog-webhcat-site

Change values in HCatalog WebHCat's webhcat-site.xml file.

hive-beeline-log4j2

Change values in Hive's beeline-log4j2.properties file.

hive-parquet-logging

Change values in Hive's parquet-logging.properties file.

hive-env

Change values in the Hive environment.

hive-exec-log4j2

Change values in Hive's hive-exec-log4j2.properties file.

hive-llap-daemon-log4j2

Change values in Hive's llap-daemon-log4j2.properties file.

hive-log4j2

Change values in Hive's hive-log4j2.properties file.

hive-site

Change values in Hive's hive-site.xml file.

hiveserver2-site

Change values in HiveServer2's hiveserver2-site.xml file.

hue-ini

Change values in Hue's ini file.

httpfs-env

Change values in the HTTPFS environment.

httpfs-site

Change values in Hadoop's httpfs-site.xml file.

hadoop-kms-acls

Change values in Hadoop's kms-acls.xml file.

hadoop-kms-env

Change values in the Hadoop KMS environment.

hadoop-kms-log4j

Change values in Hadoop's kms-log4j.properties file.

hadoop-kms-site

Change values in Hadoop's kms-site.xml file.

livy-conf

Change values in Livy's livy.conf file.

livy-env

Change values in the Livy environment.

livy-log4j

Change Livy log4j.properties settings.

mapred-env

Change values in the MapReduce application's environment.

mapred-site

Change values in the MapReduce application's mapred-site.xml file.

oozie-env

Change values in Oozie's environment.

oozie-log4j

Change values in Oozie's oozie-log4j.properties file.

oozie-site

Change values in Oozie's oozie-site.xml file.

phoenix-hbase-metrics

Change values in Phoenix's hadoop-metrics2-hbase.properties file.

phoenix-hbase-site

Change values in Phoenix's hbase-site.xml file.

phoenix-log4j

Change values in Phoenix's log4j.properties file.

phoenix-metrics

Change values in Phoenix's hadoop-metrics2-phoenix.properties file.

pig-env

Change values in the Pig environment.

pig-properties

Change values in Pig's pig.properties file.

pig-log4j

Change values in Pig's log4j.properties file.

presto-log

Change values in Presto's log.properties file.

presto-config

Change values in Presto's config.properties file.

presto-env

Change values in Presto's presto-env.sh file.

presto-node

Change values in Presto's node.properties file.

presto-connector-blackhole

Change values in Presto's blackhole.properties file.

presto-connector-cassandra

Change values in Presto's cassandra.properties file.

presto-connector-hive

Change values in Presto's hive.properties file.

presto-connector-jmx

Change values in Presto's jmx.properties file.

presto-connector-kafka

Change values in Presto's kafka.properties file.

presto-connector-localfile

Change values in Presto's localfile.properties file.

presto-connector-mongodb

Change values in Presto's mongodb.properties file.

presto-connector-mysql

Change values in Presto's mysql.properties file.

presto-connector-postgresql

Change values in Presto's postgresql.properties file.

presto-connector-raptor

Change values in Presto's raptor.properties file.

presto-connector-redis

Change values in Presto's redis.properties file.

presto-connector-redshift

Change values in Presto's redshift.properties file.

presto-connector-tpch

Change values in Presto's tpch.properties file.

spark

Amazon EMR-curated settings for Apache Spark.

spark-defaults

Change values in Spark's spark-defaults.conf file.

spark-env

Change values in the Spark environment.

spark-hive-site

Change values in Spark's hive-site.xml file.

spark-log4j

Change values in Spark's log4j.properties file.

spark-metrics

Change values in Spark's metrics.properties file.

sqoop-env

Change values in Sqoop's environment.

sqoop-oraoop-site

Change values in Sqoop OraOop's oraoop-site.xml file.

sqoop-site

Change values in Sqoop's sqoop-site.xml file.

tez-site

Change values in Tez's tez-site.xml file.

yarn-env

Change values in the YARN environment.

yarn-site

Change values in YARN's yarn-site.xml file.

zeppelin-env

Change values in the Zeppelin environment.

zookeeper-config

Change values in ZooKeeper's zoo.cfg file.

zookeeper-log4j

Change values in ZooKeeper's log4j.properties file.

Amazon EMR Release 5.12.0

Release 5.12.0 Application Versions

The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, Livy, Mahout, MXNet, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, Tez, Zeppelin, and ZooKeeper.

The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.

For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:

Release 5.12.0 Release Notes

The following release notes include information for the Amazon EMR release version 5.12.0. Changes are relative to 5.11.1.

Upgrades

Changes, Enhancements, and Resolved Issues

  • Hadoop

    • The yarn.resourcemanager.decommissioning.timeout property has changed to yarn.resourcemanager.nodemanager-graceful-decommission-timeout-secs. You can use this property to customize cluster scale-down. For more information, see Cluster Scale-Down in the Amazon EMR Management Guide.

    • The Hadoop CLI added the -d option to the cp (copy) command, which specifies direct copy. You can use this to avoid creating an intermediary .COPYING file, which makes copying data between Amazon S3 locations faster. For more information, see HADOOP-12384.

  • Pig

    • Added the pig-env configuration classification, which simplifies the configuration of Pig environment properties. For more information, see Configuring Applications.

  • Presto

    • Added the presto-connector-redshift configuration classification, which you can use to configure values in the Presto redshift.properties configuration file. For more information, see Redshift Connector in the Presto documentation and Configuring Applications.

    • Presto support for EMRFS has been added and is the default configuration. Earlier Amazon EMR release versions used PrestoS3FileSystem, which was the only option. For more information, see EMRFS and PrestoS3FileSystem Configuration.

      Note

      A configuration issue can cause Presto errors when querying underlying data in Amazon S3 with Amazon EMR release version 5.12.0. This is because Presto fails to pick up configuration classification values from emrfs-site.xml. As a workaround, create an emrfs subdirectory under /usr/lib/presto/plugin/hive-hadoop2/, create a symlink in /usr/lib/presto/plugin/hive-hadoop2/emrfs to the existing /usr/share/aws/emr/emrfs/conf/emrfs-site.xml file, and then restart the presto-server process (sudo presto-server stop followed by sudo presto-server start).

  • Spark
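The following Python sketch illustrates the Presto workaround described in the note above. It assumes that the plugin directory is the absolute path /usr/lib/presto/plugin/hive-hadoop2 and that the script runs with sufficient privileges on a node where Presto is installed. The function name link_emrfs_site is hypothetical and is not part of Amazon EMR or Presto.

```python
from pathlib import Path

# Paths from the note above; the leading slash on PLUGIN_DIR is an assumption.
PLUGIN_DIR = Path("/usr/lib/presto/plugin/hive-hadoop2")
EMRFS_SITE = Path("/usr/share/aws/emr/emrfs/conf/emrfs-site.xml")


def link_emrfs_site(plugin_dir=PLUGIN_DIR, emrfs_site=EMRFS_SITE):
    """Create <plugin_dir>/emrfs and symlink emrfs-site.xml into it.

    Hypothetical helper: returns the path of the symlink. Calling it again
    when the link already exists leaves the link unchanged.
    """
    emrfs_dir = Path(plugin_dir) / "emrfs"
    emrfs_dir.mkdir(parents=True, exist_ok=True)
    link = emrfs_dir / Path(emrfs_site).name
    if not (link.is_symlink() or link.exists()):
        link.symlink_to(emrfs_site)
    return link
```

Calling link_emrfs_site() with no arguments uses the paths from the note. After the link exists, restart the service with sudo presto-server stop followed by sudo presto-server start, as the note describes.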

Known Issues

  • MXNet does not include OpenCV libraries.

  • SparkR is not available for clusters created using a custom AMI because R is not installed by default on cluster nodes.

Release 5.12.0 Component Versions

Component Version Description
aws-sagemaker-spark-sdk 1.0.1 Amazon SageMaker Spark SDK
emr-ddb 4.5.0 Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies 2.4.0 Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis 3.4.0 Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp 2.9.0 Distributed copy application optimized for Amazon S3.
emrfs 2.21.0 Amazon S3 connector for Hadoop ecosystem applications.
flink-client 1.4.0 Apache Flink command line client scripts and applications.
ganglia-monitor 3.7.2 Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector 3.7.2 Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web 3.7.1 Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client 2.8.3-amzn-0 Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode 2.8.3-amzn-0 HDFS node-level service for storing blocks.
hadoop-hdfs-library 2.8.3-amzn-0 HDFS command-line client and library.
hadoop-hdfs-namenode 2.8.3-amzn-0 HDFS service for tracking file names and block locations.
hadoop-httpfs-server 2.8.3-amzn-0 HTTP endpoint for HDFS operations.
hadoop-kms-server 2.8.3-amzn-0 Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred 2.8.3-amzn-0 MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager 2.8.3-amzn-0 YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager 2.8.3-amzn-0 YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server 2.8.3-amzn-0 Service for retrieving current and historical information for YARN applications.
hbase-hmaster 1.4.0 Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server 1.4.0 Service for serving one or more HBase regions.
hbase-client 1.4.0 HBase command-line client.
hbase-rest-server 1.4.0 Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server 1.4.0 Service providing a Thrift endpoint to HBase.
hcatalog-client 2.3.2-amzn-1 The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server 2.3.2-amzn-1 Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server 2.3.2-amzn-1 HTTP endpoint providing a REST interface to HCatalog.
hive-client 2.3.2-amzn-1 Hive command line client.
hive-hbase 2.3.2-amzn-1 Hive-HBase client.
hive-metastore-server 2.3.2-amzn-1 Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server2 2.3.2-amzn-1 Service for accepting Hive queries as web requests.
hue-server 4.1.0 Web application for analyzing data using Hadoop ecosystem applications.
livy-server 0.4.0-incubating REST interface for interacting with Apache Spark.
mahout-client 0.13.0 Library for machine learning.
mxnet 1.0.0 A flexible, scalable, and efficient library for deep learning.
mysql-server 5.5.54+ MySQL database server.
oozie-client 4.3.0 Oozie command-line client.
oozie-server 4.3.0 Service for accepting Oozie workflow requests.
phoenix-library 4.13.0-HBase-1.4 The Phoenix libraries for server and client.
phoenix-query-server 4.13.0-HBase-1.4 A lightweight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API.
presto-coordinator 0.188 Service for accepting queries and managing query execution among presto-workers.
presto-worker 0.188 Service for executing pieces of a query.
pig-client 0.17.0 Pig command-line client.
spark-client 2.2.1 Spark command-line clients.
spark-history-server 2.2.1 Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn 2.2.1 In-memory execution engine for YARN.
spark-yarn-slave 2.2.1 Apache Spark libraries needed by YARN slaves.
sqoop-client 1.4.6 Apache Sqoop command-line client.
tez-on-yarn 0.8.4 The Tez YARN application and libraries.
webserver 2.4.25+ Apache HTTP server.
zeppelin-server 0.7.3 Web-based notebook that enables interactive data analytics.
zookeeper-server 3.4.10 Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client 3.4.10 ZooKeeper command line client.

Release 5.12.0 Configuration Classifications

Configuration classifications allow you to customize applications when you create a cluster. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.
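As an illustration of what a classification looks like when you supply it at cluster creation, the following Python sketch builds a configuration list for the yarn-site classification and prints it as JSON. It uses the graceful decommission property named in the 5.12.0 release notes. The helper name classification and the value 3600 are assumptions chosen for the example, not recommendations.

```python
import json


def classification(name, properties):
    """Build one configuration object in the Classification/Properties shape."""
    return {"Classification": name, "Properties": dict(properties)}


configurations = [
    classification(
        "yarn-site",
        # 3600 seconds is an illustrative value only.
        {"yarn.resourcemanager.nodemanager-graceful-decommission-timeout-secs": "3600"},
    ),
]

# Emit JSON that can be saved to a file and supplied when you create a cluster.
print(json.dumps(configurations, indent=2))
```

Property values are written as strings. One way to use the output is to save it to a file and reference that file with the --configurations option of the aws emr create-cluster command.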

emr-5.12.0 Classifications

Classifications Description

capacity-scheduler

Change values in Hadoop's capacity-scheduler.xml file.

core-site

Change values in Hadoop's core-site.xml file.

emrfs-site

Change EMRFS settings.

flink-conf

Change flink-conf.yaml settings.

flink-log4j

Change Flink log4j.properties settings.

flink-log4j-yarn-session

Change Flink log4j-yarn-session.properties settings.

flink-log4j-cli

Change Flink log4j-cli.properties settings.

hadoop-env

Change values in the Hadoop environment for all Hadoop components.

hadoop-log4j

Change values in Hadoop's log4j.properties file.

hadoop-ssl-server

Change Hadoop SSL server configuration.

hadoop-ssl-client

Change Hadoop SSL client configuration.

hbase

Amazon EMR-curated settings for Apache HBase.

hbase-env

Change values in HBase's environment.

hbase-log4j

Change values in HBase's hbase-log4j.properties file.

hbase-metrics

Change values in HBase's hadoop-metrics2-hbase.properties file.

hbase-policy

Change values in HBase's hbase-policy.xml file.

hbase-site

Change values in HBase's hbase-site.xml file.

hdfs-encryption-zones

Configure HDFS encryption zones.

hdfs-site

Change values in HDFS's hdfs-site.xml.

hcatalog-env

Change values in HCatalog's environment.

hcatalog-server-jndi

Change values in HCatalog's jndi.properties.

hcatalog-server-proto-hive-site

Change values in HCatalog's proto-hive-site.xml.

hcatalog-webhcat-env

Change values in HCatalog WebHCat's environment.

hcatalog-webhcat-log4j2

Change values in HCatalog WebHCat's log4j2.properties.

hcatalog-webhcat-site

Change values in HCatalog WebHCat's webhcat-site.xml file.

hive-beeline-log4j2

Change values in Hive's beeline-log4j2.properties file.

hive-parquet-logging

Change values in Hive's parquet-logging.properties file.

hive-env

Change values in the Hive environment.

hive-exec-log4j2

Change values in Hive's hive-exec-log4j2.properties file.

hive-llap-daemon-log4j2

Change values in Hive's llap-daemon-log4j2.properties file.

hive-log4j2

Change values in Hive's hive-log4j2.properties file.

hive-site

Change values in Hive's hive-site.xml file.

hiveserver2-site

Change values in HiveServer2's hiveserver2-site.xml file.

hue-ini

Change values in Hue's ini file.

httpfs-env

Change values in the HTTPFS environment.

httpfs-site

Change values in Hadoop's httpfs-site.xml file.

hadoop-kms-acls

Change values in Hadoop's kms-acls.xml file.

hadoop-kms-env

Change values in the Hadoop KMS environment.

hadoop-kms-log4j

Change values in Hadoop's kms-log4j.properties file.

hadoop-kms-site

Change values in Hadoop's kms-site.xml file.

livy-conf

Change values in Livy's livy.conf file.

livy-env

Change values in the Livy environment.

livy-log4j

Change Livy log4j.properties settings.

mapred-env

Change values in the MapReduce application's environment.

mapred-site

Change values in the MapReduce application's mapred-site.xml file.

oozie-env

Change values in Oozie's environment.

oozie-log4j

Change values in Oozie's oozie-log4j.properties file.

oozie-site

Change values in Oozie's oozie-site.xml file.

phoenix-hbase-metrics

Change values in Phoenix's hadoop-metrics2-hbase.properties file.

phoenix-hbase-site

Change values in Phoenix's hbase-site.xml file.

phoenix-log4j

Change values in Phoenix's log4j.properties file.

phoenix-metrics

Change values in Phoenix's hadoop-metrics2-phoenix.properties file.

pig-env

Change values in the Pig environment.

pig-properties

Change values in Pig's pig.properties file.

pig-log4j

Change values in Pig's log4j.properties file.

presto-log

Change values in Presto's log.properties file.

presto-config

Change values in Presto's config.properties file.

presto-env

Change values in Presto's presto-env.sh file.

presto-node

Change values in Presto's node.properties file.

presto-connector-blackhole

Change values in Presto's blackhole.properties file.

presto-connector-cassandra

Change values in Presto's cassandra.properties file.

presto-connector-hive

Change values in Presto's hive.properties file.

presto-connector-jmx

Change values in Presto's jmx.properties file.

presto-connector-kafka

Change values in Presto's kafka.properties file.

presto-connector-localfile

Change values in Presto's localfile.properties file.

presto-connector-mongodb

Change values in Presto's mongodb.properties file.

presto-connector-mysql

Change values in Presto's mysql.properties file.

presto-connector-postgresql

Change values in Presto's postgresql.properties file.

presto-connector-raptor

Change values in Presto's raptor.properties file.

presto-connector-redis

Change values in Presto's redis.properties file.

presto-connector-redshift

Change values in Presto's redshift.properties file.

presto-connector-tpch

Change values in Presto's tpch.properties file.

spark

Amazon EMR-curated settings for Apache Spark.

spark-defaults

Change values in Spark's spark-defaults.conf file.

spark-env

Change values in the Spark environment.

spark-hive-site

Change values in Spark's hive-site.xml file.

spark-log4j

Change values in Spark's log4j.properties file.

spark-metrics

Change values in Spark's metrics.properties file.

sqoop-env

Change values in Sqoop's environment.

sqoop-oraoop-site

Change values in Sqoop OraOop's oraoop-site.xml file.

sqoop-site

Change values in Sqoop's sqoop-site.xml file.

tez-site

Change values in Tez's tez-site.xml file.

yarn-env

Change values in the YARN environment.

yarn-site

Change values in YARN's yarn-site.xml file.

zeppelin-env

Change values in the Zeppelin environment.

zookeeper-config

Change values in ZooKeeper's zoo.cfg file.

zookeeper-log4j

Change values in ZooKeeper's log4j.properties file.

5.11.x

There are multiple releases within the 5.11 series: 5.11.1 and 5.11.0.

Amazon EMR Release 5.11.1

Release 5.11.1 Application Versions

The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, Livy, Mahout, MXNet, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, Tez, Zeppelin, and ZooKeeper.

The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.

For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:

Release 5.11.1 Release Notes

The following release notes include information for the Amazon EMR 5.11.1 release. Changes are relative to the Amazon EMR 5.8.0 release.

Initial release date: January 22, 2018

Changes, Enhancements, and Resolved Issues

Release 5.11.1 Component Versions

The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.

Some components need changes from community versions for Amazon EMR. These components have a version label in the form CommunityVersion-amzn-EmrVersion. For example, if a big-data community component named myapp-component of version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-3.
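The version label form described above can be split into its parts with a short parser. The following Python sketch is one way to do this. The names ComponentVersion and parse_component_version are hypothetical, and labels without an -amzn- suffix (for example, 1.3.1 or 4.11.0-HBase-1.3) are treated as unmodified community versions.

```python
import re
from typing import NamedTuple, Optional


class ComponentVersion(NamedTuple):
    community_version: str
    emr_revision: Optional[int]  # None when the label has no -amzn- suffix


# Matches labels in the form CommunityVersion-amzn-EmrVersion.
_AMZN_LABEL = re.compile(r"^(?P<community>.+)-amzn-(?P<revision>\d+)$")


def parse_component_version(label):
    """Split a component version label into community version and EMR revision."""
    match = _AMZN_LABEL.match(label)
    if match is None:
        return ComponentVersion(label, None)
    return ComponentVersion(match.group("community"), int(match.group("revision")))
```

For example, parse_component_version("2.7.3-amzn-6") yields the community version 2.7.3 and the revision 6, while parse_component_version("1.3.1") yields 1.3.1 with no revision.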

Component Version Description
aws-sagemaker-spark-sdk 1.0 Amazon SageMaker Spark SDK
emr-ddb 4.5.0 Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies 2.4.0 Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis 3.4.0 Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp 2.8.0 Distributed copy application optimized for Amazon S3.
emrfs 2.20.0 Amazon S3 connector for Hadoop ecosystem applications.
flink-client 1.3.2 Apache Flink command line client scripts and applications.
ganglia-monitor 3.7.2 Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector 3.7.2 Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web 3.7.1 Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client 2.7.3-amzn-6 Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode 2.7.3-amzn-6 HDFS node-level service for storing blocks.
hadoop-hdfs-library 2.7.3-amzn-6 HDFS command-line client and library.
hadoop-hdfs-namenode 2.7.3-amzn-6 HDFS service for tracking file names and block locations.
hadoop-httpfs-server 2.7.3-amzn-6 HTTP endpoint for HDFS operations.
hadoop-kms-server 2.7.3-amzn-6 Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred 2.7.3-amzn-6 MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager 2.7.3-amzn-6 YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager 2.7.3-amzn-6 YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server 2.7.3-amzn-6 Service for retrieving current and historical information for YARN applications.
hbase-hmaster 1.3.1 Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server 1.3.1 Service for serving one or more HBase regions.
hbase-client 1.3.1 HBase command-line client.
hbase-rest-server 1.3.1 Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server 1.3.1 Service providing a Thrift endpoint to HBase.
hcatalog-client 2.3.2-amzn-0 The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server 2.3.2-amzn-0 Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server 2.3.2-amzn-0 HTTP endpoint providing a REST interface to HCatalog.
hive-client 2.3.2-amzn-0 Hive command line client.
hive-hbase 2.3.2-amzn-0 Hive-HBase client.
hive-metastore-server 2.3.2-amzn-0 Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server2 2.3.2-amzn-0 Service for accepting Hive queries as web requests.
hue-server 4.0.1 Web application for analyzing data using Hadoop ecosystem applications.
livy-server 0.4.0-incubating REST interface for interacting with Apache Spark.
mahout-client 0.13.0 Library for machine learning.
mxnet 0.12.0 A flexible, scalable, and efficient library for deep learning.
mysql-server 5.5.54+ MySQL database server.
oozie-client 4.3.0 Oozie command-line client.
oozie-server 4.3.0 Service for accepting Oozie workflow requests.
phoenix-library 4.11.0-HBase-1.3 The Phoenix libraries for server and client.
phoenix-query-server 4.11.0-HBase-1.3 A lightweight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API.
presto-coordinator 0.187 Service for accepting queries and managing query execution among presto-workers.
presto-worker 0.187 Service for executing pieces of a query.
pig-client 0.17.0 Pig command-line client.
spark-client 2.2.1 Spark command-line clients.
spark-history-server 2.2.1 Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn 2.2.1 In-memory execution engine for YARN.
spark-yarn-slave 2.2.1 Apache Spark libraries needed by YARN slaves.
sqoop-client 1.4.6 Apache Sqoop command-line client.
tez-on-yarn 0.8.4 The Tez YARN application and libraries.
webserver 2.4.25+ Apache HTTP server.
zeppelin-server 0.7.3 Web-based notebook that enables interactive data analytics.
zookeeper-server 3.4.10 Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client 3.4.10 ZooKeeper command line client.

Release 5.11.1 Configuration Classifications

Configuration classifications allow you to customize applications when you create a cluster. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.

emr-5.11.1 Classifications

Classifications Description

capacity-scheduler

Change values in Hadoop's capacity-scheduler.xml file.

core-site

Change values in Hadoop's core-site.xml file.

emrfs-site

Change EMRFS settings.

flink-conf

Change flink-conf.yaml settings.

flink-log4j

Change Flink log4j.properties settings.

flink-log4j-yarn-session

Change Flink log4j-yarn-session.properties settings.

flink-log4j-cli

Change Flink log4j-cli.properties settings.

hadoop-env

Change values in the Hadoop environment for all Hadoop components.

hadoop-log4j

Change values in Hadoop's log4j.properties file.

hadoop-ssl-server

Change Hadoop SSL server configuration.

hadoop-ssl-client

Change Hadoop SSL client configuration.

hbase

Amazon EMR-curated settings for Apache HBase.

hbase-env

Change values in HBase's environment.

hbase-log4j

Change values in HBase's hbase-log4j.properties file.

hbase-metrics

Change values in HBase's hadoop-metrics2-hbase.properties file.

hbase-policy

Change values in HBase's hbase-policy.xml file.

hbase-site

Change values in HBase's hbase-site.xml file.

hdfs-encryption-zones

Configure HDFS encryption zones.

hdfs-site

Change values in HDFS's hdfs-site.xml.

hcatalog-env

Change values in HCatalog's environment.

hcatalog-server-jndi

Change values in HCatalog's jndi.properties.

hcatalog-server-proto-hive-site

Change values in HCatalog's proto-hive-site.xml.

hcatalog-webhcat-env

Change values in HCatalog WebHCat's environment.

hcatalog-webhcat-log4j2

Change values in HCatalog WebHCat's log4j2.properties.

hcatalog-webhcat-site

Change values in HCatalog WebHCat's webhcat-site.xml file.

hive-beeline-log4j2

Change values in Hive's beeline-log4j2.properties file.

hive-parquet-logging

Change values in Hive's parquet-logging.properties file.

hive-env

Change values in the Hive environment.

hive-exec-log4j2

Change values in Hive's hive-exec-log4j2.properties file.

hive-llap-daemon-log4j2

Change values in Hive's llap-daemon-log4j2.properties file.

hive-log4j2

Change values in Hive's hive-log4j2.properties file.

hive-site

Change values in Hive's hive-site.xml file.

hiveserver2-site

Change values in HiveServer2's hiveserver2-site.xml file.

hue-ini

Change values in Hue's ini file.

httpfs-env

Change values in the HTTPFS environment.

httpfs-site

Change values in Hadoop's httpfs-site.xml file.

hadoop-kms-acls

Change values in Hadoop's kms-acls.xml file.

hadoop-kms-env

Change values in the Hadoop KMS environment.

hadoop-kms-log4j

Change values in Hadoop's kms-log4j.properties file.

hadoop-kms-site

Change values in Hadoop's kms-site.xml file.

livy-conf

Change values in Livy's livy.conf file.

livy-env

Change values in the Livy environment.

livy-log4j

Change Livy log4j.properties settings.

mapred-env

Change values in the MapReduce application's environment.

mapred-site

Change values in the MapReduce application's mapred-site.xml file.

oozie-env

Change values in Oozie's environment.

oozie-log4j

Change values in Oozie's oozie-log4j.properties file.

oozie-site

Change values in Oozie's oozie-site.xml file.

phoenix-hbase-metrics

Change values in Phoenix's hadoop-metrics2-hbase.properties file.

phoenix-hbase-site

Change values in Phoenix's hbase-site.xml file.

phoenix-log4j

Change values in Phoenix's log4j.properties file.

phoenix-metrics

Change values in Phoenix's hadoop-metrics2-phoenix.properties file.

pig-properties

Change values in Pig's pig.properties file.

pig-log4j

Change values in Pig's log4j.properties file.

presto-log

Change values in Presto's log.properties file.

presto-config

Change values in Presto's config.properties file.

presto-env

Change values in Presto's presto-env.sh file.

presto-node

Change values in Presto's node.properties file.

presto-connector-blackhole

Change values in Presto's blackhole.properties file.

presto-connector-cassandra

Change values in Presto's cassandra.properties file.

presto-connector-hive

Change values in Presto's hive.properties file.

presto-connector-jmx

Change values in Presto's jmx.properties file.

presto-connector-kafka

Change values in Presto's kafka.properties file.

presto-connector-localfile

Change values in Presto's localfile.properties file.

presto-connector-mongodb

Change values in Presto's mongodb.properties file.

presto-connector-mysql

Change values in Presto's mysql.properties file.

presto-connector-postgresql

Change values in Presto's postgresql.properties file.

presto-connector-raptor

Change values in Presto's raptor.properties file.

presto-connector-redis

Change values in Presto's redis.properties file.

presto-connector-tpch

Change values in Presto's tpch.properties file.

spark

Amazon EMR-curated settings for Apache Spark.

spark-defaults

Change values in Spark's spark-defaults.conf file.

spark-env

Change values in the Spark environment.

spark-hive-site

Change values in Spark's hive-site.xml file.

spark-log4j

Change values in Spark's log4j.properties file.

spark-metrics

Change values in Spark's metrics.properties file.

sqoop-env

Change values in Sqoop's environment.

sqoop-oraoop-site

Change values in Sqoop OraOop's oraoop-site.xml file.

sqoop-site

Change values in Sqoop's sqoop-site.xml file.

tez-site

Change values in Tez's tez-site.xml file.

yarn-env

Change values in the YARN environment.

yarn-site

Change values in YARN's yarn-site.xml file.

zeppelin-env

Change values in the Zeppelin environment.

zookeeper-config

Change values in ZooKeeper's zoo.cfg file.

zookeeper-log4j

Change values in ZooKeeper's log4j.properties file.
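The emr-5.11.1 list above does not include pig-env or presto-connector-redshift, which the 5.12.0 release notes describe as added in that release. The following Python sketch flags those two classifications when a configuration list targets a release earlier than 5.12.0. The function names are hypothetical, and the check covers only the difference between these two lists, not every release.

```python
# Classifications added in 5.12.0 according to its release notes.
ADDED_IN_5_12_0 = frozenset({"pig-env", "presto-connector-redshift"})


def release_tuple(release_label):
    """Turn a release label such as 'emr-5.11.1' into (5, 11, 1)."""
    version = release_label[len("emr-"):] if release_label.startswith("emr-") else release_label
    return tuple(int(part) for part in version.split("."))


def unsupported_classifications(release_label, configurations):
    """Return requested classification names that require 5.12.0 or later."""
    if release_tuple(release_label) >= (5, 12, 0):
        return []
    requested = {config["Classification"] for config in configurations}
    return sorted(requested & ADDED_IN_5_12_0)
```

A check like this could run before you submit a cluster request, so that a configuration written for 5.12.0 is not reused unchanged with emr-5.11.1.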

Amazon EMR Release 5.11.0

Release 5.11.0 Application Versions

The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, Livy, Mahout, MXNet, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, Tez, Zeppelin, and ZooKeeper.

The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.

For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:

Release 5.11.0 Release Notes

The following release notes include information for the Amazon EMR release version 5.11.0. Changes are relative to 5.10.0.

Upgrades

  • Hive 2.3.2

  • Spark 2.2.1

  • AWS SDK for Java 1.11.238

New Features

  • Spark

    • Added the spark.decommissioning.timeout.threshold setting, which improves Spark decommissioning behavior when using Spot Instances. For more information, see Configuring Node Decommissioning Behavior.

    • Added the aws-sagemaker-spark-sdk component to Spark, which installs Amazon SageMaker Spark and associated dependencies for Spark integration with Amazon SageMaker. You can use Amazon SageMaker Spark to construct Spark machine learning (ML) pipelines using Amazon SageMaker stages. For more information, see the SageMaker Spark Readme on GitHub and Using Apache Spark with Amazon SageMaker in the Amazon SageMaker Developer Guide.

Known Issues

  • MXNet does not include OpenCV libraries.

  • Hive 2.3.2 sets hive.compute.stats.using.query=true by default. This causes queries to get data from existing statistics rather than directly from data, which can return stale results. For example, if you have a table with hive.compute.stats.using.query=true and upload new files to the table LOCATION, running a SELECT COUNT(*) query on the table returns the count from the statistics, rather than picking up the added rows.

    As a workaround, use the ANALYZE TABLE command to gather new statistics, or set hive.compute.stats.using.query=false. For more information, see Statistics in Hive in the Apache Hive documentation.
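Both workarounds above can be sketched in a few lines of Python. The helper analyze_statement is hypothetical and builds a table-level ANALYZE TABLE statement. A partitioned table may need a PARTITION clause, which this sketch does not add. The hive-site object shows the alternative of turning the setting off through a configuration classification.

```python
def analyze_statement(table):
    """Return HiveQL that gathers new table-level statistics for `table`."""
    return f"ANALYZE TABLE {table} COMPUTE STATISTICS"


# Alternative: stop answering queries from statistics by overriding hive-site.
HIVE_SITE_OVERRIDE = {
    "Classification": "hive-site",
    "Properties": {"hive.compute.stats.using.query": "false"},
}
```

For a table named sales (a placeholder name), analyze_statement("sales") produces a statement that you can run after uploading new files to the table location.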

Release 5.11.0 Component Versions

Component Version Description
aws-sagemaker-spark-sdk 1.0 Amazon SageMaker Spark SDK
emr-ddb 4.5.0 Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies 2.4.0 Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis 3.4.0 Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp 2.8.0 Distributed copy application optimized for Amazon S3.
emrfs 2.20.0 Amazon S3 connector for Hadoop ecosystem applications.
flink-client 1.3.2 Apache Flink command line client scripts and applications.
ganglia-monitor 3.7.2 Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector 3.7.2 Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web 3.7.1 Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client 2.7.3-amzn-6 Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode 2.7.3-amzn-6 HDFS node-level service for storing blocks.
hadoop-hdfs-library 2.7.3-amzn-6 HDFS command-line client and library
hadoop-hdfs-namenode 2.7.3-amzn-6 HDFS service for tracking file names and block locations.
hadoop-httpfs-server 2.7.3-amzn-6 HTTP endpoint for HDFS operations.
hadoop-kms-server 2.7.3-amzn-6 Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred 2.7.3-amzn-6 MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager 2.7.3-amzn-6 YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager 2.7.3-amzn-6 YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server 2.7.3-amzn-6 Service for retrieving current and historical information for YARN applications.
hbase-hmaster 1.3.1 Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server 1.3.1 Service for serving one or more HBase regions.
hbase-client 1.3.1 HBase command-line client.
hbase-rest-server 1.3.1 Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server 1.3.1 Service providing a Thrift endpoint to HBase.
hcatalog-client 2.3.2-amzn-0 The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server 2.3.2-amzn-0 Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server 2.3.2-amzn-0 HTTP endpoint providing a REST interface to HCatalog.
hive-client 2.3.2-amzn-0 Hive command line client.
hive-hbase 2.3.2-amzn-0 Hive-hbase client.
hive-metastore-server 2.3.2-amzn-0 Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server2 2.3.2-amzn-0 Service for accepting Hive queries as web requests.
hue-server 4.0.1 Web application for analyzing data using Hadoop ecosystem applications
livy-server 0.4.0-incubating REST interface for interacting with Apache Spark
mahout-client 0.13.0 Library for machine learning.
mxnet 0.12.0 A flexible, scalable, and efficient library for deep learning.
mysql-server 5.5.54+ MySQL database server.
oozie-client 4.3.0 Oozie command-line client.
oozie-server 4.3.0 Service for accepting Oozie workflow requests.
phoenix-library 4.11.0-HBase-1.3 The Phoenix libraries for server and client.
phoenix-query-server 4.11.0-HBase-1.3 A lightweight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API.
presto-coordinator 0.187 Service for accepting queries and managing query execution among presto-workers.
presto-worker 0.187 Service for executing pieces of a query.
pig-client 0.17.0 Pig command-line client.
spark-client 2.2.1 Spark command-line clients.
spark-history-server 2.2.1 Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn 2.2.1 In-memory execution engine for YARN.
spark-yarn-slave 2.2.1 Apache Spark libraries needed by YARN slaves.
sqoop-client 1.4.6 Apache Sqoop command-line client.
tez-on-yarn 0.8.4 The Tez YARN application and libraries.
webserver 2.4.25+ Apache HTTP server.
zeppelin-server 0.7.3 Web-based notebook that enables interactive data analytics.
zookeeper-server 3.4.10 Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client 3.4.10 ZooKeeper command line client.

Release 5.11.0 Configuration Classifications

Configuration classifications allow you to customize applications when you create a cluster. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.

emr-5.11.0 Classifications

Classifications Description

capacity-scheduler

Change values in Hadoop's capacity-scheduler.xml file.

core-site

Change values in Hadoop's core-site.xml file.

emrfs-site

Change EMRFS settings.

flink-conf

Change flink-conf.yaml settings.

flink-log4j

Change Flink log4j.properties settings.

flink-log4j-yarn-session

Change Flink log4j-yarn-session.properties settings.

flink-log4j-cli

Change Flink log4j-cli.properties settings.

hadoop-env

Change values in the Hadoop environment for all Hadoop components.

hadoop-log4j

Change values in Hadoop's log4j.properties file.

hadoop-ssl-server

Change Hadoop SSL server configuration.

hadoop-ssl-client

Change Hadoop SSL client configuration.

hbase

Amazon EMR-curated settings for Apache HBase.

hbase-env

Change values in HBase's environment.

hbase-log4j

Change values in HBase's hbase-log4j.properties file.

hbase-metrics

Change values in HBase's hadoop-metrics2-hbase.properties file.

hbase-policy

Change values in HBase's hbase-policy.xml file.

hbase-site

Change values in HBase's hbase-site.xml file.

hdfs-encryption-zones

Configure HDFS encryption zones.

hdfs-site

Change values in HDFS's hdfs-site.xml.

hcatalog-env

Change values in HCatalog's environment.

hcatalog-server-jndi

Change values in HCatalog's jndi.properties.

hcatalog-server-proto-hive-site

Change values in HCatalog's proto-hive-site.xml.

hcatalog-webhcat-env

Change values in HCatalog WebHCat's environment.

hcatalog-webhcat-log4j2

Change values in HCatalog WebHCat's log4j2.properties.

hcatalog-webhcat-site

Change values in HCatalog WebHCat's webhcat-site.xml file.

hive-beeline-log4j2

Change values in Hive's beeline-log4j2.properties file.

hive-parquet-logging

Change values in Hive's parquet-logging.properties file.

hive-env

Change values in the Hive environment.

hive-exec-log4j2

Change values in Hive's hive-exec-log4j2.properties file.

hive-llap-daemon-log4j2

Change values in Hive's llap-daemon-log4j2.properties file.

hive-log4j2

Change values in Hive's hive-log4j2.properties file.

hive-site

Change values in Hive's hive-site.xml file

hiveserver2-site

Change values in Hive Server2's hiveserver2-site.xml file

hue-ini

Change values in Hue's ini file

httpfs-env

Change values in the HTTPFS environment.

httpfs-site

Change values in Hadoop's httpfs-site.xml file.

hadoop-kms-acls

Change values in Hadoop's kms-acls.xml file.

hadoop-kms-env

Change values in the Hadoop KMS environment.

hadoop-kms-log4j

Change values in Hadoop's kms-log4j.properties file.

hadoop-kms-site

Change values in Hadoop's kms-site.xml file.

livy-conf

Change values in Livy's livy.conf file.

livy-env

Change values in the Livy environment.

livy-log4j

Change Livy log4j.properties settings.

mapred-env

Change values in the MapReduce application's environment.

mapred-site

Change values in the MapReduce application's mapred-site.xml file.

oozie-env

Change values in Oozie's environment.

oozie-log4j

Change values in Oozie's oozie-log4j.properties file.

oozie-site

Change values in Oozie's oozie-site.xml file.

phoenix-hbase-metrics

Change values in Phoenix's hadoop-metrics2-hbase.properties file.

phoenix-hbase-site

Change values in Phoenix's hbase-site.xml file.

phoenix-log4j

Change values in Phoenix's log4j.properties file.

phoenix-metrics

Change values in Phoenix's hadoop-metrics2-phoenix.properties file.

pig-properties

Change values in Pig's pig.properties file.

pig-log4j

Change values in Pig's log4j.properties file.

presto-log

Change values in Presto's log.properties file.

presto-config

Change values in Presto's config.properties file.

presto-env

Change values in Presto's presto-env.sh file.

presto-node

Change values in Presto's node.properties file.

presto-connector-blackhole

Change values in Presto's blackhole.properties file.

presto-connector-cassandra

Change values in Presto's cassandra.properties file.

presto-connector-hive

Change values in Presto's hive.properties file.

presto-connector-jmx

Change values in Presto's jmx.properties file.

presto-connector-kafka

Change values in Presto's kafka.properties file.

presto-connector-localfile

Change values in Presto's localfile.properties file.

presto-connector-mongodb

Change values in Presto's mongodb.properties file.

presto-connector-mysql

Change values in Presto's mysql.properties file.

presto-connector-postgresql

Change values in Presto's postgresql.properties file.

presto-connector-raptor

Change values in Presto's raptor.properties file.

presto-connector-redis

Change values in Presto's redis.properties file.

presto-connector-tpch

Change values in Presto's tpch.properties file.

spark

Amazon EMR-curated settings for Apache Spark.

spark-defaults

Change values in Spark's spark-defaults.conf file.

spark-env

Change values in the Spark environment.

spark-hive-site

Change values in Spark's hive-site.xml file

spark-log4j

Change values in Spark's log4j.properties file.

spark-metrics

Change values in Spark's metrics.properties file.

sqoop-env

Change values in Sqoop's environment.

sqoop-oraoop-site

Change values in Sqoop OraOop's oraoop-site.xml file.

sqoop-site

Change values in Sqoop's sqoop-site.xml file.

tez-site

Change values in Tez's tez-site.xml file.

yarn-env

Change values in the YARN environment.

yarn-site

Change values in YARN's yarn-site.xml file.

zeppelin-env

Change values in the Zeppelin environment.

zookeeper-config

Change values in ZooKeeper's zoo.cfg file.

zookeeper-log4j

Change values in ZooKeeper's log4j.properties file.

5.10.0

Release 5.10.0 Application Versions

The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, Livy, Mahout, MXNet, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, Tez, Zeppelin, and ZooKeeper.

The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.

For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:

Release 5.10.0 Release Notes

The following release notes include information for the Amazon EMR version 5.10.0 release. Changes are relative to the Amazon EMR 5.9.0 release.

Upgrades

  • AWS SDK for Java 1.11.221

  • Hive 2.3.1

  • Presto 0.187

New Features

Changes, Enhancements, and Resolved Issues

  • Presto

  • Spark

    • Backported SPARK-20640, which makes the RPC timeout and the number of retries for shuffle registration configurable using the spark.shuffle.registration.timeout and spark.shuffle.registration.maxAttempts properties.

    • Backported SPARK-21549, which corrects an error that occurs when writing to non-HDFS locations using a custom OutputFormat.

  • Backported HADOOP-13270

  • The NumPy, SciPy, and Matplotlib libraries have been removed from the base Amazon EMR AMI. If your application requires these libraries, they are available in the application repository, so you can use a bootstrap action to install them on all nodes using yum install.

  • The Amazon EMR base AMI no longer has application RPM packages included, so the RPM packages are no longer present on cluster nodes. Custom AMIs and the Amazon EMR base AMI now reference the RPM package repository in Amazon S3.

  • Because of the introduction of per-second billing in Amazon EC2, the default Scale down behavior is now Terminate at task completion rather than Terminate at instance hour. For more information, see Configure Cluster Scale-Down.
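The two Spark shuffle registration properties named in the list above can be set per job. The following Python snippet is a minimal sketch that builds the corresponding spark-submit --conf arguments. The function name, the application name my_job.py, and the values 10000 and 5 are illustrative assumptions, not recommended settings. The timeout is assumed to be expressed in milliseconds, as in the Apache Spark configuration documentation.

```python
def shuffle_registration_args(timeout_ms, max_attempts):
    """Build spark-submit --conf arguments for shuffle registration tuning.

    The function name is hypothetical. timeout_ms is assumed to be in
    milliseconds.
    """
    if timeout_ms <= 0 or max_attempts <= 0:
        raise ValueError("timeout_ms and max_attempts must be positive")
    settings = {
        "spark.shuffle.registration.timeout": timeout_ms,
        "spark.shuffle.registration.maxAttempts": max_attempts,
    }
    args = []
    for key, value in settings.items():
        # Each property becomes its own "--conf key=value" pair.
        args.extend(["--conf", "{}={}".format(key, value)])
    return args


if __name__ == "__main__":
    # my_job.py is a placeholder application; the values are illustrative.
    command = ["spark-submit"] + shuffle_registration_args(10000, 5) + ["my_job.py"]
    print(" ".join(command))
```

The same two keys can instead be placed in the spark-defaults configuration classification if the settings should apply to every job on the cluster.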

Known Issues

  • MXNet does not include OpenCV libraries.

  • Hive 2.3.1 sets hive.compute.stats.using.query=true by default. This causes queries to read results from existing statistics rather than directly from the data, so results can be stale. For example, if you have a table with hive.compute.stats.using.query=true and upload new files to the table LOCATION, running a SELECT COUNT(*) query on the table returns the count from the statistics and does not pick up the added rows.

    As a workaround, use the ANALYZE TABLE command to gather new statistics, or set hive.compute.stats.using.query=false. For more information, see Statistics in Hive in the Apache Hive documentation.
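Both workarounds above can also be applied within a single Hive session. The following Python snippet is a minimal sketch that composes the corresponding HiveQL statements as strings. The function name and the table name web_logs are hypothetical, and the ANALYZE TABLE form shown assumes a non-partitioned table (partitioned tables need a PARTITION clause).

```python
def stats_workaround_statements(table_name, refresh_stats=True):
    """Return HiveQL statements for one of the two workarounds.

    table_name is a hypothetical table. No escaping is applied, so pass
    only trusted identifiers.
    """
    if refresh_stats:
        # Recompute table statistics so COUNT(*) reflects newly added files.
        return ["ANALYZE TABLE {} COMPUTE STATISTICS;".format(table_name)]
    # Otherwise, stop answering queries from statistics in this session.
    return ["SET hive.compute.stats.using.query=false;"]


if __name__ == "__main__":
    for statement in stats_workaround_statements("web_logs"):
        print(statement)
```

Refreshing statistics keeps the faster statistics-based answers, while turning the property off trades that speed for results that always reflect the files currently in the table location.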

Release 5.10.0 Component Versions

The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.

Some components need changes from community versions for Amazon EMR. These components have a version label in the form CommunityVersion-amzn-EmrVersion. For example, if a big-data community component named myapp-component of version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-3.
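The version label format described above can be split mechanically. The following Python snippet is a minimal sketch, assuming that any label without an -amzn-N suffix is an unmodified community version. The names ComponentVersion and parse_component_version are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class ComponentVersion(NamedTuple):
    community_version: str
    emr_revision: Optional[int]  # None when there is no -amzn-N suffix


# Matches labels of the form CommunityVersion-amzn-EmrVersion.
_AMZN_LABEL = re.compile(r"^(?P<community>.+)-amzn-(?P<revision>\d+)$")


def parse_component_version(label):
    """Split a component version label into community version and revision."""
    match = _AMZN_LABEL.match(label)
    if match is None:
        # No Amazon EMR suffix: treat the whole label as the community version.
        return ComponentVersion(label, None)
    return ComponentVersion(match.group("community"), int(match.group("revision")))


if __name__ == "__main__":
    for label in ("2.7.3-amzn-5", "2.3.1-amzn-0", "1.3.1"):
        print(label, parse_component_version(label))
```

A parser like this can help when comparing component tables across releases, for example to spot that hadoop-client moved from 2.7.3-amzn-4 in 5.9.0 to 2.7.3-amzn-5 in 5.10.0 without a community version change.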

Component Version Description
emr-ddb 4.5.0 Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies 2.4.0 Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis 3.4.0 Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp 2.7.0 Distributed copy application optimized for Amazon S3.
emrfs 2.20.0 Amazon S3 connector for Hadoop ecosystem applications.
flink-client 1.3.2 Apache Flink command line client scripts and applications.
ganglia-monitor 3.7.2 Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector 3.7.2 Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web 3.7.1 Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client 2.7.3-amzn-5 Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode 2.7.3-amzn-5 HDFS node-level service for storing blocks.
hadoop-hdfs-library 2.7.3-amzn-5 HDFS command-line client and library
hadoop-hdfs-namenode 2.7.3-amzn-5 HDFS service for tracking file names and block locations.
hadoop-httpfs-server 2.7.3-amzn-5 HTTP endpoint for HDFS operations.
hadoop-kms-server 2.7.3-amzn-5 Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred 2.7.3-amzn-5 MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager 2.7.3-amzn-5 YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager 2.7.3-amzn-5 YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server 2.7.3-amzn-5 Service for retrieving current and historical information for YARN applications.
hbase-hmaster 1.3.1 Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server 1.3.1 Service for serving one or more HBase regions.
hbase-client 1.3.1 HBase command-line client.
hbase-rest-server 1.3.1 Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server 1.3.1 Service providing a Thrift endpoint to HBase.
hcatalog-client 2.3.1-amzn-0 The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server 2.3.1-amzn-0 Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server 2.3.1-amzn-0 HTTP endpoint providing a REST interface to HCatalog.
hive-client 2.3.1-amzn-0 Hive command line client.
hive-hbase 2.3.1-amzn-0 Hive-hbase client.
hive-metastore-server 2.3.1-amzn-0 Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server2 2.3.1-amzn-0 Service for accepting Hive queries as web requests.
hue-server 4.0.1 Web application for analyzing data using Hadoop ecosystem applications
livy-server 0.4.0-incubating REST interface for interacting with Apache Spark
mahout-client 0.13.0 Library for machine learning.
mxnet 0.12.0 A flexible, scalable, and efficient library for deep learning.
mysql-server 5.5.54+ MySQL database server.
oozie-client 4.3.0 Oozie command-line client.
oozie-server 4.3.0 Service for accepting Oozie workflow requests.
phoenix-library 4.11.0-HBase-1.3 The Phoenix libraries for server and client.
phoenix-query-server 4.11.0-HBase-1.3 A lightweight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API.
presto-coordinator 0.187 Service for accepting queries and managing query execution among presto-workers.
presto-worker 0.187 Service for executing pieces of a query.
pig-client 0.17.0 Pig command-line client.
spark-client 2.2.0 Spark command-line clients.
spark-history-server 2.2.0 Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn 2.2.0 In-memory execution engine for YARN.
spark-yarn-slave 2.2.0 Apache Spark libraries needed by YARN slaves.
sqoop-client 1.4.6 Apache Sqoop command-line client.
tez-on-yarn 0.8.4 The Tez YARN application and libraries.
webserver 2.4.25+ Apache HTTP server.
zeppelin-server 0.7.3 Web-based notebook that enables interactive data analytics.
zookeeper-server 3.4.10 Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client 3.4.10 ZooKeeper command line client.

Release 5.10.0 Configuration Classifications

Configuration classifications allow you to customize applications when you create a cluster. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.

emr-5.10.0 Classifications

Classifications Description

capacity-scheduler

Change values in Hadoop's capacity-scheduler.xml file.

core-site

Change values in Hadoop's core-site.xml file.

emrfs-site

Change EMRFS settings.

flink-conf

Change flink-conf.yaml settings.

flink-log4j

Change Flink log4j.properties settings.

flink-log4j-yarn-session

Change Flink log4j-yarn-session.properties settings.

flink-log4j-cli

Change Flink log4j-cli.properties settings.

hadoop-env

Change values in the Hadoop environment for all Hadoop components.

hadoop-log4j

Change values in Hadoop's log4j.properties file.

hadoop-ssl-server

Change Hadoop SSL server configuration.

hadoop-ssl-client

Change Hadoop SSL client configuration.

hbase

Amazon EMR-curated settings for Apache HBase.

hbase-env

Change values in HBase's environment.

hbase-log4j

Change values in HBase's hbase-log4j.properties file.

hbase-metrics

Change values in HBase's hadoop-metrics2-hbase.properties file.

hbase-policy

Change values in HBase's hbase-policy.xml file.

hbase-site

Change values in HBase's hbase-site.xml file.

hdfs-encryption-zones

Configure HDFS encryption zones.

hdfs-site

Change values in HDFS's hdfs-site.xml.

hcatalog-env

Change values in HCatalog's environment.

hcatalog-server-jndi

Change values in HCatalog's jndi.properties.

hcatalog-server-proto-hive-site

Change values in HCatalog's proto-hive-site.xml.

hcatalog-webhcat-env

Change values in HCatalog WebHCat's environment.

hcatalog-webhcat-log4j2

Change values in HCatalog WebHCat's log4j2.properties.

hcatalog-webhcat-site

Change values in HCatalog WebHCat's webhcat-site.xml file.

hive-beeline-log4j2

Change values in Hive's beeline-log4j2.properties file.

hive-parquet-logging

Change values in Hive's parquet-logging.properties file.

hive-env

Change values in the Hive environment.

hive-exec-log4j2

Change values in Hive's hive-exec-log4j2.properties file.

hive-llap-daemon-log4j2

Change values in Hive's llap-daemon-log4j2.properties file.

hive-log4j2

Change values in Hive's hive-log4j2.properties file.

hive-site

Change values in Hive's hive-site.xml file

hiveserver2-site

Change values in Hive Server2's hiveserver2-site.xml file

hue-ini

Change values in Hue's ini file

httpfs-env

Change values in the HTTPFS environment.

httpfs-site

Change values in Hadoop's httpfs-site.xml file.

hadoop-kms-acls

Change values in Hadoop's kms-acls.xml file.

hadoop-kms-env

Change values in the Hadoop KMS environment.

hadoop-kms-log4j

Change values in Hadoop's kms-log4j.properties file.

hadoop-kms-site

Change values in Hadoop's kms-site.xml file.

livy-conf

Change values in Livy's livy.conf file.

livy-env

Change values in the Livy environment.

livy-log4j

Change Livy log4j.properties settings.

mapred-env

Change values in the MapReduce application's environment.

mapred-site

Change values in the MapReduce application's mapred-site.xml file.

oozie-env

Change values in Oozie's environment.

oozie-log4j

Change values in Oozie's oozie-log4j.properties file.

oozie-site

Change values in Oozie's oozie-site.xml file.

phoenix-hbase-metrics

Change values in Phoenix's hadoop-metrics2-hbase.properties file.

phoenix-hbase-site

Change values in Phoenix's hbase-site.xml file.

phoenix-log4j

Change values in Phoenix's log4j.properties file.

phoenix-metrics

Change values in Phoenix's hadoop-metrics2-phoenix.properties file.

pig-properties

Change values in Pig's pig.properties file.

pig-log4j

Change values in Pig's log4j.properties file.

presto-log

Change values in Presto's log.properties file.

presto-config

Change values in Presto's config.properties file.

presto-env

Change values in Presto's presto-env.sh file.

presto-node

Change values in Presto's node.properties file.

presto-connector-blackhole

Change values in Presto's blackhole.properties file.

presto-connector-cassandra

Change values in Presto's cassandra.properties file.

presto-connector-hive

Change values in Presto's hive.properties file.

presto-connector-jmx

Change values in Presto's jmx.properties file.

presto-connector-kafka

Change values in Presto's kafka.properties file.

presto-connector-localfile

Change values in Presto's localfile.properties file.

presto-connector-mongodb

Change values in Presto's mongodb.properties file.

presto-connector-mysql

Change values in Presto's mysql.properties file.

presto-connector-postgresql

Change values in Presto's postgresql.properties file.

presto-connector-raptor

Change values in Presto's raptor.properties file.

presto-connector-redis

Change values in Presto's redis.properties file.

presto-connector-tpch

Change values in Presto's tpch.properties file.

spark

Amazon EMR-curated settings for Apache Spark.

spark-defaults

Change values in Spark's spark-defaults.conf file.

spark-env

Change values in the Spark environment.

spark-hive-site

Change values in Spark's hive-site.xml file

spark-log4j

Change values in Spark's log4j.properties file.

spark-metrics

Change values in Spark's metrics.properties file.

sqoop-env

Change values in Sqoop's environment.

sqoop-oraoop-site

Change values in Sqoop OraOop's oraoop-site.xml file.

sqoop-site

Change values in Sqoop's sqoop-site.xml file.

tez-site

Change values in Tez's tez-site.xml file.

yarn-env

Change values in the YARN environment.

yarn-site

Change values in YARN's yarn-site.xml file.

zeppelin-env

Change values in the Zeppelin environment.

zookeeper-config

Change values in ZooKeeper's zoo.cfg file.

zookeeper-log4j

Change values in ZooKeeper's log4j.properties file.

5.9.0

Release 5.9.0 Application Versions

The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, Livy, Mahout, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, Tez, Zeppelin, and ZooKeeper.

The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.

For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:

Release 5.9.0 Release Notes

The following release notes include information for the Amazon EMR version 5.9.0 release. Changes are relative to the Amazon EMR 5.8.0 release.

Release date: October 5, 2017

Latest feature update: October 12, 2017

Upgrades

  • AWS SDK for Java version 1.11.183

  • Flink 1.3.2

  • Hue 4.0.1

  • Pig 0.17.0

  • Presto 0.184

New Features

  • Added Livy support (version 0.4.0-incubating). For more information, see Apache Livy.

  • Added support for Hue Notebook for Spark.

  • Added support for i3-series Amazon EC2 instances (October 12, 2017).

Changes, Enhancements, and Resolved Issues

  • Spark

    • Added a new set of features that help Spark handle node termination more gracefully when it results from a manual resize or an automatic scaling policy request. For more information, see Configuring Node Decommissioning Behavior.

    • SSL is used instead of 3DES for in-transit encryption for the block transfer service, which enhances performance when using Amazon EC2 instance types with AES-NI.

    • Backported SPARK-21494.

  • Zeppelin

  • HBase

    • Added patch HBASE-18533, which allows additional values for HBase BucketCache configuration using the hbase-site configuration classification.

  • Hue

    • Added AWS Glue Data Catalog support for the Hive query editor in Hue.

    • By default, superusers in Hue can access all files that Amazon EMR IAM roles are allowed to access. Newly created users do not automatically have permissions to access the Amazon S3 filebrowser and must have the filebrowser.s3_access permissions enabled for their group.

  • Resolved an issue that caused underlying JSON data created using AWS Glue Data Catalog to be inaccessible.

Known Issues

  • Cluster launch fails when all applications are installed and the default Amazon EBS root volume size is not changed. As a workaround, use the aws emr create-cluster command from the AWS CLI and specify a larger value for the --ebs-root-volume-size parameter.

  • Hive 2.3.0 sets hive.compute.stats.using.query=true by default. This causes queries to read results from existing statistics rather than directly from the data, so results can be stale. For example, if you have a table with hive.compute.stats.using.query=true and upload new files to the table LOCATION, running a SELECT COUNT(*) query on the table returns the count from the statistics and does not pick up the added rows.

    As a workaround, use the ANALYZE TABLE command to gather new statistics, or set hive.compute.stats.using.query=false. For more information, see Statistics in Hive in the Apache Hive documentation.
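For the cluster launch failure listed first among the known issues above, the workaround is a single extra option on the AWS CLI command. The following Python snippet is a minimal sketch that assembles such a command as an argument list. The function name, the 20 GiB size, the m4.large instance type, the instance count, and the application selection are placeholder assumptions, not recommendations.

```python
import shlex


def create_cluster_command(root_volume_gib, applications):
    """Build an `aws emr create-cluster` argument list with a custom root volume.

    The function name is hypothetical. The instance type and instance count
    below are placeholders to adjust for your own cluster.
    """
    if root_volume_gib <= 0:
        raise ValueError("root_volume_gib must be positive")
    command = ["aws", "emr", "create-cluster", "--release-label", "emr-5.9.0"]
    # Applications are passed as Name=<application> entries.
    command.append("--applications")
    command.extend("Name={}".format(name) for name in applications)
    command.extend(["--ebs-root-volume-size", str(root_volume_gib)])
    command.extend(["--instance-type", "m4.large", "--instance-count", "3"])
    command.append("--use-default-roles")
    return command


if __name__ == "__main__":
    # 20 GiB is an illustrative size, not a documented minimum.
    args = create_cluster_command(20, ["Hive", "Spark", "Hue"])
    print(" ".join(shlex.quote(arg) for arg in args))
```

Building the command as a list keeps each argument separate, so it can be printed for review or passed to a process launcher without additional shell quoting.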

Release 5.9.0 Component Versions

The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.

Some components need changes from community versions for Amazon EMR. These components have a version label in the form CommunityVersion-amzn-EmrVersion. For example, if a big-data community component named myapp-component of version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-3.

Component Version Description
emr-ddb 4.4.0 Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies 2.4.0 Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis 3.4.0 Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp 2.7.0 Distributed copy application optimized for Amazon S3.
emrfs 2.19.0 Amazon S3 connector for Hadoop ecosystem applications.
flink-client 1.3.2 Apache Flink command line client scripts and applications.
ganglia-monitor 3.7.2 Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector 3.7.2 Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web 3.7.1 Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client 2.7.3-amzn-4 Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode 2.7.3-amzn-4 HDFS node-level service for storing blocks.
hadoop-hdfs-library 2.7.3-amzn-4 HDFS command-line client and library
hadoop-hdfs-namenode 2.7.3-amzn-4 HDFS service for tracking file names and block locations.
hadoop-httpfs-server 2.7.3-amzn-4 HTTP endpoint for HDFS operations.
hadoop-kms-server 2.7.3-amzn-4 Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred 2.7.3-amzn-4 MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager 2.7.3-amzn-4 YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager 2.7.3-amzn-4 YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server 2.7.3-amzn-4 Service for retrieving current and historical information for YARN applications.
hbase-hmaster 1.3.1 Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server 1.3.1 Service for serving one or more HBase regions.
hbase-client 1.3.1 HBase command-line client.
hbase-rest-server 1.3.1 Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server 1.3.1 Service providing a Thrift endpoint to HBase.
hcatalog-client 2.3.0-amzn-0 The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server 2.3.0-amzn-0 Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server 2.3.0-amzn-0 HTTP endpoint providing a REST interface to HCatalog.
hive-client 2.3.0-amzn-0 Hive command line client.
hive-hbase 2.3.0-amzn-0 Hive-hbase client.
hive-metastore-server 2.3.0-amzn-0 Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server2 2.3.0-amzn-0 Service for accepting Hive queries as web requests.
hue-server 4.0.1 Web application for analyzing data using Hadoop ecosystem applications
livy-server 0.4.0-incubating REST interface for interacting with Apache Spark
mahout-client 0.13.0 Library for machine learning.
mysql-server 5.5.54+ MySQL database server.
oozie-client 4.3.0 Oozie command-line client.
oozie-server 4.3.0 Service for accepting Oozie workflow requests.
phoenix-library 4.11.0-HBase-1.3 The Phoenix libraries for server and client.
phoenix-query-server 4.11.0-HBase-1.3 A lightweight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API.
presto-coordinator 0.184 Service for accepting queries and managing query execution among presto-workers.
presto-worker 0.184 Service for executing pieces of a query.
pig-client 0.17.0 Pig command-line client.
spark-client 2.2.0 Spark command-line clients.
spark-history-server 2.2.0 Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn 2.2.0 In-memory execution engine for YARN.
spark-yarn-slave 2.2.0 Apache Spark libraries needed by YARN slaves.
sqoop-client 1.4.6 Apache Sqoop command-line client.
tez-on-yarn 0.8.4 The Tez YARN application and libraries.
webserver 2.4.25+ Apache HTTP server.
zeppelin-server 0.7.2 Web-based notebook that enables interactive data analytics.
zookeeper-server 3.4.10 Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client 3.4.10 ZooKeeper command line client.

Release 5.9.0 Configuration Classifications

Configuration classifications allow you to customize applications when you create a cluster. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.

emr-5.9.0 Classifications

Classifications Description

capacity-scheduler

Change values in Hadoop's capacity-scheduler.xml file.

core-site

Change values in Hadoop's core-site.xml file.

emrfs-site

Change EMRFS settings.

flink-conf

Change flink-conf.yaml settings.

flink-log4j

Change Flink log4j.properties settings.

flink-log4j-yarn-session

Change Flink log4j-yarn-session.properties settings.

flink-log4j-cli

Change Flink log4j-cli.properties settings.

hadoop-env

Change values in the Hadoop environment for all Hadoop components.

hadoop-log4j

Change values in Hadoop's log4j.properties file.

hadoop-ssl-server

Change Hadoop SSL server configuration.

hadoop-ssl-client

Change Hadoop SSL client configuration.

hbase

Amazon EMR-curated settings for Apache HBase.

hbase-env

Change values in HBase's environment.

hbase-log4j

Change values in HBase's hbase-log4j.properties file.

hbase-metrics

Change values in HBase's hadoop-metrics2-hbase.properties file.

hbase-policy

Change values in HBase's hbase-policy.xml file.

hbase-site

Change values in HBase's hbase-site.xml file.

hdfs-encryption-zones

Configure HDFS encryption zones.

hdfs-site

Change values in HDFS's hdfs-site.xml.

hcatalog-env

Change values in HCatalog's environment.

hcatalog-server-jndi

Change values in HCatalog's jndi.properties.

hcatalog-server-proto-hive-site

Change values in HCatalog's proto-hive-site.xml.

hcatalog-webhcat-env

Change values in HCatalog WebHCat's environment.

hcatalog-webhcat-log4j2

Change values in HCatalog WebHCat's log4j2.properties.

hcatalog-webhcat-site

Change values in HCatalog WebHCat's webhcat-site.xml file.

hive-beeline-log4j2

Change values in Hive's beeline-log4j2.properties file.

hive-parquet-logging

Change values in Hive's parquet-logging.properties file.

hive-env

Change values in the Hive environment.

hive-exec-log4j2

Change values in Hive's hive-exec-log4j2.properties file.

hive-llap-daemon-log4j2

Change values in Hive's llap-daemon-log4j2.properties file.

hive-log4j2

Change values in Hive's hive-log4j2.properties file.

hive-site

Change values in Hive's hive-site.xml file.

hiveserver2-site

Change values in Hive Server2's hiveserver2-site.xml file.

hue-ini

Change values in Hue's ini file.

httpfs-env

Change values in the HTTPFS environment.

httpfs-site

Change values in Hadoop's httpfs-site.xml file.

hadoop-kms-acls

Change values in Hadoop's kms-acls.xml file.

hadoop-kms-env

Change values in the Hadoop KMS environment.

hadoop-kms-log4j

Change values in Hadoop's kms-log4j.properties file.

hadoop-kms-site

Change values in Hadoop's kms-site.xml file.

livy-conf

Change values in Livy's livy.conf file.

livy-env

Change values in the Livy environment.

livy-log4j

Change Livy log4j.properties settings.

mapred-env

Change values in the MapReduce application's environment.

mapred-site

Change values in the MapReduce application's mapred-site.xml file.

oozie-env

Change values in Oozie's environment.

oozie-log4j

Change values in Oozie's oozie-log4j.properties file.

oozie-site

Change values in Oozie's oozie-site.xml file.

phoenix-hbase-metrics

Change values in Phoenix's hadoop-metrics2-hbase.properties file.

phoenix-hbase-site

Change values in Phoenix's hbase-site.xml file.

phoenix-log4j

Change values in Phoenix's log4j.properties file.

phoenix-metrics

Change values in Phoenix's hadoop-metrics2-phoenix.properties file.

pig-properties

Change values in Pig's pig.properties file.

pig-log4j

Change values in Pig's log4j.properties file.

presto-log

Change values in Presto's log.properties file.

presto-config

Change values in Presto's config.properties file.

presto-env

Change values in Presto's presto-env.sh file.

presto-node

Change values in Presto's node.properties file.

presto-connector-blackhole

Change values in Presto's blackhole.properties file.

presto-connector-cassandra

Change values in Presto's cassandra.properties file.

presto-connector-hive

Change values in Presto's hive.properties file.

presto-connector-jmx

Change values in Presto's jmx.properties file.

presto-connector-kafka

Change values in Presto's kafka.properties file.

presto-connector-localfile

Change values in Presto's localfile.properties file.

presto-connector-mongodb

Change values in Presto's mongodb.properties file.

presto-connector-mysql

Change values in Presto's mysql.properties file.

presto-connector-postgresql

Change values in Presto's postgresql.properties file.

presto-connector-raptor

Change values in Presto's raptor.properties file.

presto-connector-redis

Change values in Presto's redis.properties file.

presto-connector-tpch

Change values in Presto's tpch.properties file.

spark

Amazon EMR-curated settings for Apache Spark.

spark-defaults

Change values in Spark's spark-defaults.conf file.

spark-env

Change values in the Spark environment.

spark-hive-site

Change values in Spark's hive-site.xml file.

spark-log4j

Change values in Spark's log4j.properties file.

spark-metrics

Change values in Spark's metrics.properties file.

sqoop-env

Change values in Sqoop's environment.

sqoop-oraoop-site

Change values in Sqoop OraOop's oraoop-site.xml file.

sqoop-site

Change values in Sqoop's sqoop-site.xml file.

tez-site

Change values in Tez's tez-site.xml file.

yarn-env

Change values in the YARN environment.

yarn-site

Change values in YARN's yarn-site.xml file.

zeppelin-env

Change values in the Zeppelin environment.

zookeeper-config

Change values in ZooKeeper's zoo.cfg file.

zookeeper-log4j

Change values in ZooKeeper's log4j.properties file.

5.8.x

There are multiple releases within the 5.8 series: 5.8.2, 5.8.1, and 5.8.0. Each is described in its own section below.

Amazon EMR Release 5.8.2

Release 5.8.2 Application Versions

The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, Mahout, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, Tez, Zeppelin, and ZooKeeper.

The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.

For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:

Release 5.8.2 Release Notes

The following release notes include information for Amazon EMR release version 5.8.2. Changes are relative to 5.8.1.

Initial release date: March 29, 2018

Changes, Enhancements, and Resolved Issues

  • Updated the Amazon Linux kernel of the default Amazon Linux AMI for Amazon EMR to address potential vulnerabilities.

Release 5.8.2 Component Versions

The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.

Some components need changes from community versions for Amazon EMR. These components have a version label in the form CommunityVersion-amzn-EmrVersion. For example, if a big-data community component named myapp-component of version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-3.
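The version label format described above can be split mechanically. The following is a minimal sketch in Python, assuming that any label without an -amzn- suffix is an unmodified community version. The names ComponentVersion and parse_component_version are hypothetical and are not part of any Amazon EMR tooling.

```python
import re
from typing import NamedTuple, Optional


class ComponentVersion(NamedTuple):
    community_version: str
    emr_revision: Optional[int]  # None when there is no -amzn- suffix


# Matches labels of the form CommunityVersion-amzn-EmrVersion.
_AMZN_LABEL = re.compile(r"^(?P<community>.+)-amzn-(?P<revision>\d+)$")


def parse_component_version(label: str) -> ComponentVersion:
    """Split a component version label into its community version and
    the numeric suffix that follows -amzn-, if present."""
    match = _AMZN_LABEL.match(label)
    if match is None:
        return ComponentVersion(label, None)
    return ComponentVersion(match.group("community"), int(match.group("revision")))


print(parse_component_version("2.2-amzn-3"))
print(parse_component_version("2.7.3-amzn-3"))
print(parse_component_version("1.3.1"))
```

Labels such as 4.11.0-HBase-1.3 or 5.5.54+ do not match the pattern, so this sketch returns them unchanged as the community version.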

Component Version Description
emr-ddb 4.4.0 Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies 2.4.0 Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis 3.4.0 Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp 2.6.0 Distributed copy application optimized for Amazon S3.
emrfs 2.18.0 Amazon S3 connector for Hadoop ecosystem applications.
flink-client 1.3.1 Apache Flink command line client scripts and applications.
ganglia-monitor 3.7.2 Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector 3.7.2 Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web 3.7.1 Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client 2.7.3-amzn-3 Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode 2.7.3-amzn-3 HDFS node-level service for storing blocks.
hadoop-hdfs-library 2.7.3-amzn-3 HDFS command-line client and library.
hadoop-hdfs-namenode 2.7.3-amzn-3 HDFS service for tracking file names and block locations.
hadoop-httpfs-server 2.7.3-amzn-3 HTTP endpoint for HDFS operations.
hadoop-kms-server 2.7.3-amzn-3 Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred 2.7.3-amzn-3 MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager 2.7.3-amzn-3 YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager 2.7.3-amzn-3 YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server 2.7.3-amzn-3 Service for retrieving current and historical information for YARN applications.
hbase-hmaster 1.3.1 Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server 1.3.1 Service for serving one or more HBase regions.
hbase-client 1.3.1 HBase command-line client.
hbase-rest-server 1.3.1 Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server 1.3.1 Service providing a Thrift endpoint to HBase.
hcatalog-client 2.3.0-amzn-0 The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server 2.3.0-amzn-0 Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server 2.3.0-amzn-0 HTTP endpoint providing a REST interface to HCatalog.
hive-client 2.3.0-amzn-0 Hive command line client.
hive-hbase 2.3.0-amzn-0 Hive-hbase client.
hive-metastore-server 2.3.0-amzn-0 Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server2 2.3.0-amzn-0 Service for accepting Hive queries as web requests.
hue-server 3.12.0 Web application for analyzing data using Hadoop ecosystem applications.
mahout-client 0.13.0 Library for machine learning.
mysql-server 5.5.54+ MySQL database server.
oozie-client 4.3.0 Oozie command-line client.
oozie-server 4.3.0 Service for accepting Oozie workflow requests.
phoenix-library 4.11.0-HBase-1.3 The Phoenix libraries for server and client.
phoenix-query-server 4.11.0-HBase-1.3 A lightweight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API.
presto-coordinator 0.170 Service for accepting queries and managing query execution among presto-workers.
presto-worker 0.170 Service for executing pieces of a query.
pig-client 0.16.0-amzn-1 Pig command-line client.
spark-client 2.2.0 Spark command-line clients.
spark-history-server 2.2.0 Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn 2.2.0 In-memory execution engine for YARN.
spark-yarn-slave 2.2.0 Apache Spark libraries needed by YARN slaves.
sqoop-client 1.4.6 Apache Sqoop command-line client.
tez-on-yarn 0.8.4 The Tez YARN application and libraries.
webserver 2.4.25+ Apache HTTP server.
zeppelin-server 0.7.2 Web-based notebook that enables interactive data analytics.
zookeeper-server 3.4.10 Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client 3.4.10 ZooKeeper command line client.

Release 5.8.2 Configuration Classifications

Configuration classifications allow you to customize applications when you create a cluster. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.

emr-5.8.2 Classifications

Classifications Description

capacity-scheduler

Change values in Hadoop's capacity-scheduler.xml file.

core-site

Change values in Hadoop's core-site.xml file.

emrfs-site

Change EMRFS settings.

flink-conf

Change flink-conf.yaml settings.

flink-log4j

Change Flink log4j.properties settings.

flink-log4j-yarn-session

Change Flink log4j-yarn-session.properties settings.

flink-log4j-cli

Change Flink log4j-cli.properties settings.

hadoop-env

Change values in the Hadoop environment for all Hadoop components.

hadoop-log4j

Change values in Hadoop's log4j.properties file.

hadoop-ssl-server

Change Hadoop SSL server configuration.

hadoop-ssl-client

Change Hadoop SSL client configuration.

hbase

Amazon EMR-curated settings for Apache HBase.

hbase-env

Change values in HBase's environment.

hbase-log4j

Change values in HBase's hbase-log4j.properties file.

hbase-metrics

Change values in HBase's hadoop-metrics2-hbase.properties file.

hbase-policy

Change values in HBase's hbase-policy.xml file.

hbase-site

Change values in HBase's hbase-site.xml file.

hdfs-encryption-zones

Configure HDFS encryption zones.

hdfs-site

Change values in HDFS's hdfs-site.xml.

hcatalog-env

Change values in HCatalog's environment.

hcatalog-server-jndi

Change values in HCatalog's jndi.properties.

hcatalog-server-proto-hive-site

Change values in HCatalog's proto-hive-site.xml.

hcatalog-webhcat-env

Change values in HCatalog WebHCat's environment.

hcatalog-webhcat-log4j2

Change values in HCatalog WebHCat's log4j2.properties.

hcatalog-webhcat-site

Change values in HCatalog WebHCat's webhcat-site.xml file.

hive-beeline-log4j2

Change values in Hive's beeline-log4j2.properties file.

hive-parquet-logging

Change values in Hive's parquet-logging.properties file.

hive-env

Change values in the Hive environment.

hive-exec-log4j2

Change values in Hive's hive-exec-log4j2.properties file.

hive-llap-daemon-log4j2

Change values in Hive's llap-daemon-log4j2.properties file.

hive-log4j2

Change values in Hive's hive-log4j2.properties file.

hive-site

Change values in Hive's hive-site.xml file.

hiveserver2-site

Change values in Hive Server2's hiveserver2-site.xml file.

hue-ini

Change values in Hue's ini file.

httpfs-env

Change values in the HTTPFS environment.

httpfs-site

Change values in Hadoop's httpfs-site.xml file.

hadoop-kms-acls

Change values in Hadoop's kms-acls.xml file.

hadoop-kms-env

Change values in the Hadoop KMS environment.

hadoop-kms-log4j

Change values in Hadoop's kms-log4j.properties file.

hadoop-kms-site

Change values in Hadoop's kms-site.xml file.

mapred-env

Change values in the MapReduce application's environment.

mapred-site

Change values in the MapReduce application's mapred-site.xml file.

oozie-env

Change values in Oozie's environment.

oozie-log4j

Change values in Oozie's oozie-log4j.properties file.

oozie-site

Change values in Oozie's oozie-site.xml file.

phoenix-hbase-metrics

Change values in Phoenix's hadoop-metrics2-hbase.properties file.

phoenix-hbase-site

Change values in Phoenix's hbase-site.xml file.

phoenix-log4j

Change values in Phoenix's log4j.properties file.

phoenix-metrics

Change values in Phoenix's hadoop-metrics2-phoenix.properties file.

pig-properties

Change values in Pig's pig.properties file.

pig-log4j

Change values in Pig's log4j.properties file.

presto-log

Change values in Presto's log.properties file.

presto-config

Change values in Presto's config.properties file.

presto-env

Change values in Presto's presto-env.sh file.

presto-node

Change values in Presto's node.properties file.

presto-connector-blackhole

Change values in Presto's blackhole.properties file.

presto-connector-cassandra

Change values in Presto's cassandra.properties file.

presto-connector-hive

Change values in Presto's hive.properties file.

presto-connector-jmx

Change values in Presto's jmx.properties file.

presto-connector-kafka

Change values in Presto's kafka.properties file.

presto-connector-localfile

Change values in Presto's localfile.properties file.

presto-connector-mongodb

Change values in Presto's mongodb.properties file.

presto-connector-mysql

Change values in Presto's mysql.properties file.

presto-connector-postgresql

Change values in Presto's postgresql.properties file.

presto-connector-raptor

Change values in Presto's raptor.properties file.

presto-connector-redis

Change values in Presto's redis.properties file.

presto-connector-tpch

Change values in Presto's tpch.properties file.

spark

Amazon EMR-curated settings for Apache Spark.

spark-defaults

Change values in Spark's spark-defaults.conf file.

spark-env

Change values in the Spark environment.

spark-hive-site

Change values in Spark's hive-site.xml file.

spark-log4j

Change values in Spark's log4j.properties file.

spark-metrics

Change values in Spark's metrics.properties file.

sqoop-env

Change values in Sqoop's environment.

sqoop-oraoop-site

Change values in Sqoop OraOop's oraoop-site.xml file.

sqoop-site

Change values in Sqoop's sqoop-site.xml file.

tez-site

Change values in Tez's tez-site.xml file.

yarn-env

Change values in the YARN environment.

yarn-site

Change values in YARN's yarn-site.xml file.

zeppelin-env

Change values in the Zeppelin environment.

zookeeper-config

Change values in ZooKeeper's zoo.cfg file.

zookeeper-log4j

Change values in ZooKeeper's log4j.properties file.

Amazon EMR Release 5.8.1

Release 5.8.1 Application Versions

The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, Mahout, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, Tez, Zeppelin, and ZooKeeper.

The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.

For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:

Release 5.8.1 Release Notes

The following release notes include information for the Amazon EMR 5.8.1 release. Changes are relative to the Amazon EMR 5.8.0 release.

Initial release date: January 22, 2018

Changes, Enhancements, and Resolved Issues

Release 5.8.1 Component Versions

The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.

Some components need changes from community versions for Amazon EMR. These components have a version label in the form CommunityVersion-amzn-EmrVersion. For example, if a big-data community component named myapp-component of version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-3.

Component Version Description
emr-ddb 4.4.0 Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies 2.4.0 Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis 3.4.0 Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp 2.6.0 Distributed copy application optimized for Amazon S3.
emrfs 2.18.0 Amazon S3 connector for Hadoop ecosystem applications.
flink-client 1.3.1 Apache Flink command line client scripts and applications.
ganglia-monitor 3.7.2 Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector 3.7.2 Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web 3.7.1 Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client 2.7.3-amzn-3 Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode 2.7.3-amzn-3 HDFS node-level service for storing blocks.
hadoop-hdfs-library 2.7.3-amzn-3 HDFS command-line client and library.
hadoop-hdfs-namenode 2.7.3-amzn-3 HDFS service for tracking file names and block locations.
hadoop-httpfs-server 2.7.3-amzn-3 HTTP endpoint for HDFS operations.
hadoop-kms-server 2.7.3-amzn-3 Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred 2.7.3-amzn-3 MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager 2.7.3-amzn-3 YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager 2.7.3-amzn-3 YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server 2.7.3-amzn-3 Service for retrieving current and historical information for YARN applications.
hbase-hmaster 1.3.1 Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server 1.3.1 Service for serving one or more HBase regions.
hbase-client 1.3.1 HBase command-line client.
hbase-rest-server 1.3.1 Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server 1.3.1 Service providing a Thrift endpoint to HBase.
hcatalog-client 2.3.0-amzn-0 The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server 2.3.0-amzn-0 Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server 2.3.0-amzn-0 HTTP endpoint providing a REST interface to HCatalog.
hive-client 2.3.0-amzn-0 Hive command line client.
hive-hbase 2.3.0-amzn-0 Hive-hbase client.
hive-metastore-server 2.3.0-amzn-0 Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server2 2.3.0-amzn-0 Service for accepting Hive queries as web requests.
hue-server 3.12.0 Web application for analyzing data using Hadoop ecosystem applications.
mahout-client 0.13.0 Library for machine learning.
mysql-server 5.5.54+ MySQL database server.
oozie-client 4.3.0 Oozie command-line client.
oozie-server 4.3.0 Service for accepting Oozie workflow requests.
phoenix-library 4.11.0-HBase-1.3 The Phoenix libraries for server and client.
phoenix-query-server 4.11.0-HBase-1.3 A lightweight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API.
presto-coordinator 0.170 Service for accepting queries and managing query execution among presto-workers.
presto-worker 0.170 Service for executing pieces of a query.
pig-client 0.16.0-amzn-1 Pig command-line client.
spark-client 2.2.0 Spark command-line clients.
spark-history-server 2.2.0 Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn 2.2.0 In-memory execution engine for YARN.
spark-yarn-slave 2.2.0 Apache Spark libraries needed by YARN slaves.
sqoop-client 1.4.6 Apache Sqoop command-line client.
tez-on-yarn 0.8.4 The Tez YARN application and libraries.
webserver 2.4.25+ Apache HTTP server.
zeppelin-server 0.7.2 Web-based notebook that enables interactive data analytics.
zookeeper-server 3.4.10 Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client 3.4.10 ZooKeeper command line client.

Release 5.8.1 Configuration Classifications

Configuration classifications allow you to customize applications when you create a cluster. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.

emr-5.8.1 Classifications

Classifications Description

capacity-scheduler

Change values in Hadoop's capacity-scheduler.xml file.

core-site

Change values in Hadoop's core-site.xml file.

emrfs-site

Change EMRFS settings.

flink-conf

Change flink-conf.yaml settings.

flink-log4j

Change Flink log4j.properties settings.

flink-log4j-yarn-session

Change Flink log4j-yarn-session.properties settings.

flink-log4j-cli

Change Flink log4j-cli.properties settings.

hadoop-env

Change values in the Hadoop environment for all Hadoop components.

hadoop-log4j

Change values in Hadoop's log4j.properties file.

hadoop-ssl-server

Change Hadoop SSL server configuration.

hadoop-ssl-client

Change Hadoop SSL client configuration.

hbase

Amazon EMR-curated settings for Apache HBase.

hbase-env

Change values in HBase's environment.

hbase-log4j

Change values in HBase's hbase-log4j.properties file.

hbase-metrics

Change values in HBase's hadoop-metrics2-hbase.properties file.

hbase-policy

Change values in HBase's hbase-policy.xml file.

hbase-site

Change values in HBase's hbase-site.xml file.

hdfs-encryption-zones

Configure HDFS encryption zones.

hdfs-site

Change values in HDFS's hdfs-site.xml.

hcatalog-env

Change values in HCatalog's environment.

hcatalog-server-jndi

Change values in HCatalog's jndi.properties.

hcatalog-server-proto-hive-site

Change values in HCatalog's proto-hive-site.xml.

hcatalog-webhcat-env

Change values in HCatalog WebHCat's environment.

hcatalog-webhcat-log4j2

Change values in HCatalog WebHCat's log4j2.properties.

hcatalog-webhcat-site

Change values in HCatalog WebHCat's webhcat-site.xml file.

hive-beeline-log4j2

Change values in Hive's beeline-log4j2.properties file.

hive-parquet-logging

Change values in Hive's parquet-logging.properties file.

hive-env

Change values in the Hive environment.

hive-exec-log4j2

Change values in Hive's hive-exec-log4j2.properties file.

hive-llap-daemon-log4j2

Change values in Hive's llap-daemon-log4j2.properties file.

hive-log4j2

Change values in Hive's hive-log4j2.properties file.

hive-site

Change values in Hive's hive-site.xml file.

hiveserver2-site

Change values in Hive Server2's hiveserver2-site.xml file.

hue-ini

Change values in Hue's ini file.

httpfs-env

Change values in the HTTPFS environment.

httpfs-site

Change values in Hadoop's httpfs-site.xml file.

hadoop-kms-acls

Change values in Hadoop's kms-acls.xml file.

hadoop-kms-env

Change values in the Hadoop KMS environment.

hadoop-kms-log4j

Change values in Hadoop's kms-log4j.properties file.

hadoop-kms-site

Change values in Hadoop's kms-site.xml file.

mapred-env

Change values in the MapReduce application's environment.

mapred-site

Change values in the MapReduce application's mapred-site.xml file.

oozie-env

Change values in Oozie's environment.

oozie-log4j

Change values in Oozie's oozie-log4j.properties file.

oozie-site

Change values in Oozie's oozie-site.xml file.

phoenix-hbase-metrics

Change values in Phoenix's hadoop-metrics2-hbase.properties file.

phoenix-hbase-site

Change values in Phoenix's hbase-site.xml file.

phoenix-log4j

Change values in Phoenix's log4j.properties file.

phoenix-metrics

Change values in Phoenix's hadoop-metrics2-phoenix.properties file.

pig-properties

Change values in Pig's pig.properties file.

pig-log4j

Change values in Pig's log4j.properties file.

presto-log

Change values in Presto's log.properties file.

presto-config

Change values in Presto's config.properties file.

presto-env

Change values in Presto's presto-env.sh file.

presto-node

Change values in Presto's node.properties file.

presto-connector-blackhole

Change values in Presto's blackhole.properties file.

presto-connector-cassandra

Change values in Presto's cassandra.properties file.

presto-connector-hive

Change values in Presto's hive.properties file.

presto-connector-jmx

Change values in Presto's jmx.properties file.

presto-connector-kafka

Change values in Presto's kafka.properties file.

presto-connector-localfile

Change values in Presto's localfile.properties file.

presto-connector-mongodb

Change values in Presto's mongodb.properties file.

presto-connector-mysql

Change values in Presto's mysql.properties file.

presto-connector-postgresql

Change values in Presto's postgresql.properties file.

presto-connector-raptor

Change values in Presto's raptor.properties file.

presto-connector-redis

Change values in Presto's redis.properties file.

presto-connector-tpch

Change values in Presto's tpch.properties file.

spark

Amazon EMR-curated settings for Apache Spark.

spark-defaults

Change values in Spark's spark-defaults.conf file.

spark-env

Change values in the Spark environment.

spark-hive-site

Change values in Spark's hive-site.xml file.

spark-log4j

Change values in Spark's log4j.properties file.

spark-metrics

Change values in Spark's metrics.properties file.

sqoop-env

Change values in Sqoop's environment.

sqoop-oraoop-site

Change values in Sqoop OraOop's oraoop-site.xml file.

sqoop-site

Change values in Sqoop's sqoop-site.xml file.

tez-site

Change values in Tez's tez-site.xml file.

yarn-env

Change values in the YARN environment.

yarn-site

Change values in YARN's yarn-site.xml file.

zeppelin-env

Change values in the Zeppelin environment.

zookeeper-config

Change values in ZooKeeper's zoo.cfg file.

zookeeper-log4j

Change values in ZooKeeper's log4j.properties file.

Amazon EMR Release 5.8.0

Release 5.8.0 Application Versions

The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, Mahout, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, Tez, Zeppelin, and ZooKeeper.

The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.

For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:

Release 5.8.0 Release Notes

The following release notes include information for the Amazon EMR version 5.8.0 release. Changes are relative to the Amazon EMR 5.7.0 release.

Initial release date: August 10, 2017

Latest feature update: September 25, 2017

Upgrades

  • AWS SDK 1.11.160

  • Flink 1.3.1

  • Hive 2.3.0. For more information, see Release Notes on the Apache Hive site.

  • Spark 2.2.0. For more information, see Release Notes on the Apache Spark site.

New Features

  • Added support for viewing application history (September 25, 2017). For more information, see Viewing Application History in the Amazon EMR Management Guide.

Changes, Enhancements, and Resolved Issues

Known Issues

  • Cluster launch fails when all applications are installed and the default Amazon EBS root volume size is not changed. As a workaround, use the aws emr create-cluster command from the AWS CLI and specify a larger --ebs-root-volume-size parameter.

  • Hive 2.3.0 sets hive.compute.stats.using.query=true by default. This causes queries to get data from existing statistics rather than directly from data, which can return stale results. For example, if you have a table with hive.compute.stats.using.query=true and upload new files to the table LOCATION, running a SELECT COUNT(*) query on the table returns the count from the statistics, rather than picking up the added rows.

    As a workaround, use the ANALYZE TABLE command to gather new statistics, or set hive.compute.stats.using.query=false. For more information, see Statistics in Hive in the Apache Hive documentation.

  • Spark: There is a file handler leak issue with the apppusher daemon, which can appear for a long-running Spark job after several hours or days. To fix the issue, connect to the master node and run sudo /etc/init.d/apppusher stop. This stops the apppusher daemon, which Amazon EMR restarts automatically.

  • Application history

    • Historical data for dead Spark executors is not available.

    • Application history is not available for clusters that use a security configuration to enable in-flight encryption.
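The first two workarounds above can be combined when launching a cluster. The following is a minimal sketch in Python that only assembles and prints an aws emr create-cluster command line and does not run it. The root volume size, instance type, instance count, and application list are illustrative assumptions, and applying hive.compute.stats.using.query=false through the hive-site classification is one possible way to set that property, not the only one.

```python
import json
import shlex

# Assumed, illustrative values; adjust for your account and workload.
ROOT_VOLUME_GIB = 20        # hypothetical size, assumed larger than the default
INSTANCE_TYPE = "m4.large"  # hypothetical instance type

# Assumption: the Hive property is applied through the hive-site classification.
hive_site = [{
    "Classification": "hive-site",
    "Properties": {"hive.compute.stats.using.query": "false"},
}]

command = [
    "aws", "emr", "create-cluster",
    "--release-label", "emr-5.8.0",
    "--applications", "Name=Hive", "Name=Spark",
    "--instance-type", INSTANCE_TYPE,
    "--instance-count", "3",
    "--use-default-roles",
    "--ebs-root-volume-size", str(ROOT_VOLUME_GIB),
    "--configurations", json.dumps(hive_site),
]

# Print a shell-quoted command line for review; nothing is executed here.
print(" ".join(shlex.quote(part) for part in command))
```

Printing the command instead of running it lets you review the arguments before creating any billable resources.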

Release 5.8.0 Component Versions

Component Version Description
emr-ddb 4.4.0 Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies 2.4.0 Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis 3.4.0 Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp 2.6.0 Distributed copy application optimized for Amazon S3.
emrfs 2.18.0 Amazon S3 connector for Hadoop ecosystem applications.
flink-client 1.3.1 Apache Flink command line client scripts and applications.
ganglia-monitor 3.7.2 Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector 3.7.2 Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web 3.7.1 Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client 2.7.3-amzn-3 Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode 2.7.3-amzn-3 HDFS node-level service for storing blocks.
hadoop-hdfs-library 2.7.3-amzn-3 HDFS command-line client and library.
hadoop-hdfs-namenode 2.7.3-amzn-3 HDFS service for tracking file names and block locations.
hadoop-httpfs-server 2.7.3-amzn-3 HTTP endpoint for HDFS operations.
hadoop-kms-server 2.7.3-amzn-3 Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred 2.7.3-amzn-3 MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager 2.7.3-amzn-3 YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager 2.7.3-amzn-3 YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server 2.7.3-amzn-3 Service for retrieving current and historical information for YARN applications.
hbase-hmaster 1.3.1 Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server 1.3.1 Service for serving one or more HBase regions.
hbase-client 1.3.1 HBase command-line client.
hbase-rest-server 1.3.1 Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server 1.3.1 Service providing a Thrift endpoint to HBase.
hcatalog-client 2.3.0-amzn-0 The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server 2.3.0-amzn-0 Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server 2.3.0-amzn-0 HTTP endpoint providing a REST interface to HCatalog.
hive-client 2.3.0-amzn-0 Hive command line client.
hive-hbase 2.3.0-amzn-0 Hive-hbase client.
hive-metastore-server 2.3.0-amzn-0 Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server2 2.3.0-amzn-0 Service for accepting Hive queries as web requests.
hue-server 3.12.0 Web application for analyzing data using Hadoop ecosystem applications.
mahout-client 0.13.0 Library for machine learning.
mysql-server 5.5.54+ MySQL database server.
oozie-client 4.3.0 Oozie command-line client.
oozie-server 4.3.0 Service for accepting Oozie workflow requests.
phoenix-library 4.11.0-HBase-1.3 The Phoenix libraries for server and client.
phoenix-query-server 4.11.0-HBase-1.3 A lightweight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API.
presto-coordinator 0.170 Service for accepting queries and managing query execution among presto-workers.
presto-worker 0.170 Service for executing pieces of a query.
pig-client 0.16.0-amzn-1 Pig command-line client.
spark-client 2.2.0 Spark command-line clients.
spark-history-server 2.2.0 Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn 2.2.0 In-memory execution engine for YARN.
spark-yarn-slave 2.2.0 Apache Spark libraries needed by YARN slaves.
sqoop-client 1.4.6 Apache Sqoop command-line client.
tez-on-yarn 0.8.4 The Tez YARN application and libraries.
webserver 2.4.25+ Apache HTTP server.
zeppelin-server 0.7.2 Web-based notebook that enables interactive data analytics.
zookeeper-server 3.4.10 Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client 3.4.10 ZooKeeper command line client.

Release 5.8.0 Configuration Classifications

Configuration classifications allow you to customize applications when you create a cluster. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.
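
As an illustration, the Hive statistics workaround listed under Known Issues for this release can be expressed as a hive-site classification. The sketch below builds the configuration list in Python and prints it as JSON. The Classification and Properties keys follow the JSON layout that Amazon EMR accepts for configurations; the helper name make_classification is hypothetical.

```python
import json


def make_classification(name, properties):
    """Return one configuration object for the given classification name."""
    return {"Classification": name, "Properties": dict(properties)}


configurations = [
    # Read query results from the data instead of from stored statistics.
    make_classification("hive-site", {"hive.compute.stats.using.query": "false"}),
]

configurations_json = json.dumps(configurations, indent=2)
print(configurations_json)
```

The printed JSON can be saved to a file and supplied when you create the cluster, for example through the --configurations option of aws emr create-cluster.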

emr-5.8.0 Classifications

Classifications Description

capacity-scheduler

Change values in Hadoop's capacity-scheduler.xml file.

core-site

Change values in Hadoop's core-site.xml file.

emrfs-site

Change EMRFS settings.

flink-conf

Change flink-conf.yaml settings.

flink-log4j

Change Flink log4j.properties settings.

flink-log4j-yarn-session

Change Flink log4j-yarn-session.properties settings.

flink-log4j-cli

Change Flink log4j-cli.properties settings.

hadoop-env

Change values in the Hadoop environment for all Hadoop components.

hadoop-log4j

Change values in Hadoop's log4j.properties file.

hadoop-ssl-server

Change Hadoop SSL server configuration.

hadoop-ssl-client

Change Hadoop SSL client configuration.

hbase

Amazon EMR-curated settings for Apache HBase.

hbase-env

Change values in HBase's environment.

hbase-log4j

Change values in HBase's hbase-log4j.properties file.

hbase-metrics

Change values in HBase's hadoop-metrics2-hbase.properties file.

hbase-policy

Change values in HBase's hbase-policy.xml file.

hbase-site

Change values in HBase's hbase-site.xml file.

hdfs-encryption-zones

Configure HDFS encryption zones.

hdfs-site

Change values in HDFS's hdfs-site.xml.

hcatalog-env

Change values in HCatalog's environment.

hcatalog-server-jndi

Change values in HCatalog's jndi.properties.

hcatalog-server-proto-hive-site

Change values in HCatalog's proto-hive-site.xml.

hcatalog-webhcat-env

Change values in HCatalog WebHCat's environment.

hcatalog-webhcat-log4j2

Change values in HCatalog WebHCat's log4j2.properties.

hcatalog-webhcat-site

Change values in HCatalog WebHCat's webhcat-site.xml file.

hive-beeline-log4j2

Change values in Hive's beeline-log4j2.properties file.

hive-parquet-logging

Change values in Hive's parquet-logging.properties file.

hive-env

Change values in the Hive environment.

hive-exec-log4j2

Change values in Hive's hive-exec-log4j2.properties file.

hive-llap-daemon-log4j2

Change values in Hive's llap-daemon-log4j2.properties file.

hive-log4j2

Change values in Hive's hive-log4j2.properties file.

hive-site

Change values in Hive's hive-site.xml file.

hiveserver2-site

Change values in Hive Server2's hiveserver2-site.xml file.

hue-ini

Change values in Hue's ini file.

httpfs-env

Change values in the HTTPFS environment.

httpfs-site

Change values in Hadoop's httpfs-site.xml file.

hadoop-kms-acls

Change values in Hadoop's kms-acls.xml file.

hadoop-kms-env

Change values in the Hadoop KMS environment.

hadoop-kms-log4j

Change values in Hadoop's kms-log4j.properties file.

hadoop-kms-site

Change values in Hadoop's kms-site.xml file.

mapred-env

Change values in the MapReduce application's environment.

mapred-site

Change values in the MapReduce application's mapred-site.xml file.

oozie-env

Change values in Oozie's environment.

oozie-log4j

Change values in Oozie's oozie-log4j.properties file.

oozie-site

Change values in Oozie's oozie-site.xml file.

phoenix-hbase-metrics

Change values in Phoenix's hadoop-metrics2-hbase.properties file.

phoenix-hbase-site

Change values in Phoenix's hbase-site.xml file.

phoenix-log4j

Change values in Phoenix's log4j.properties file.

phoenix-metrics

Change values in Phoenix's hadoop-metrics2-phoenix.properties file.

pig-properties

Change values in Pig's pig.properties file.

pig-log4j

Change values in Pig's log4j.properties file.

presto-log

Change values in Presto's log.properties file.

presto-config

Change values in Presto's config.properties file.

presto-env

Change values in Presto's presto-env.sh file.

presto-node

Change values in Presto's node.properties file.

presto-connector-blackhole

Change values in Presto's blackhole.properties file.

presto-connector-cassandra

Change values in Presto's cassandra.properties file.

presto-connector-hive

Change values in Presto's hive.properties file.

presto-connector-jmx

Change values in Presto's jmx.properties file.

presto-connector-kafka

Change values in Presto's kafka.properties file.

presto-connector-localfile

Change values in Presto's localfile.properties file.

presto-connector-mongodb

Change values in Presto's mongodb.properties file.

presto-connector-mysql

Change values in Presto's mysql.properties file.

presto-connector-postgresql

Change values in Presto's postgresql.properties file.

presto-connector-raptor

Change values in Presto's raptor.properties file.

presto-connector-redis

Change values in Presto's redis.properties file.

presto-connector-tpch

Change values in Presto's tpch.properties file.

spark

Amazon EMR-curated settings for Apache Spark.

spark-defaults

Change values in Spark's spark-defaults.conf file.

spark-env

Change values in the Spark environment.

spark-hive-site

Change values in Spark's hive-site.xml file.

spark-log4j

Change values in Spark's log4j.properties file.

spark-metrics

Change values in Spark's metrics.properties file.

sqoop-env

Change values in Sqoop's environment.

sqoop-oraoop-site

Change values in Sqoop OraOop's oraoop-site.xml file.

sqoop-site

Change values in Sqoop's sqoop-site.xml file.

tez-site

Change values in Tez's tez-site.xml file.

yarn-env

Change values in the YARN environment.

yarn-site

Change values in YARN's yarn-site.xml file.

zeppelin-env

Change values in the Zeppelin environment.

zookeeper-config

Change values in ZooKeeper's zoo.cfg file.

zookeeper-log4j

Change values in ZooKeeper's log4j.properties file.

5.7.0

Release 5.7.0 Application Versions

The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, Mahout, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, Tez, Zeppelin, and ZooKeeper.

The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.

For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:

Release 5.7.0 Release Notes

The following release notes include information for the Amazon EMR 5.7.0 release. Changes are relative to the Amazon EMR 5.6.0 release.

Release date: July 13, 2017

Upgrades

  • Flink 1.3.0

  • Phoenix 4.11.0

  • Zeppelin 0.7.2

New Features

  • Added the ability to specify a custom Amazon Linux AMI when you create a cluster. For more information, see Using a Custom AMI.

Changes, Enhancements, and Resolved Issues

  • HBase

  • Presto: Added the ability to configure node.properties.

  • YARN: Added the ability to configure container-log4j.properties.

  • Sqoop: Backported SQOOP-2880, which introduces an argument that allows you to set the Sqoop temporary directory.

Release 5.7.0 Component Versions

The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.

Some components need changes from community versions for Amazon EMR. These components have a version label in the form CommunityVersion-amzn-EmrVersion. For example, if a big-data community component named myapp-component of version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-3.
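
The version label convention described above can be split into its two parts with a short parser. This is a sketch, not an official tool: the function name and the regular expression are assumptions based only on the CommunityVersion-amzn-EmrVersion form, and labels without an -amzn- suffix are treated as unmodified community versions.

```python
import re

# Matches labels such as "2.7.3-amzn-2": community version, then the Amazon EMR revision.
_AMZN_LABEL = re.compile(r"^(?P<community>.+)-amzn-(?P<revision>\d+)$")


def parse_component_version(label):
    """Return (community_version, emr_revision); emr_revision is None without -amzn-."""
    match = _AMZN_LABEL.match(label)
    if match is None:
        return label, None
    return match.group("community"), int(match.group("revision"))


for label in ("2.2-amzn-3", "2.7.3-amzn-2", "1.3.1", "4.11.0-HBase-1.3"):
    print(label, "->", parse_component_version(label))
```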

Component Version Description
emr-ddb 4.3.0 Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies 2.3.0 Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis 3.3.0 Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp 2.5.0 Distributed copy application optimized for Amazon S3.
emrfs 2.18.0 Amazon S3 connector for Hadoop ecosystem applications.
flink-client 1.3.0 Apache Flink command line client scripts and applications.
ganglia-monitor 3.7.2 Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector 3.7.2 Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web 3.7.1 Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client 2.7.3-amzn-2 Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode 2.7.3-amzn-2 HDFS node-level service for storing blocks.
hadoop-hdfs-library 2.7.3-amzn-2 HDFS command-line client and library.
hadoop-hdfs-namenode 2.7.3-amzn-2 HDFS service for tracking file names and block locations.
hadoop-httpfs-server 2.7.3-amzn-2 HTTP endpoint for HDFS operations.
hadoop-kms-server 2.7.3-amzn-2 Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred 2.7.3-amzn-2 MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager 2.7.3-amzn-2 YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager 2.7.3-amzn-2 YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server 2.7.3-amzn-2 Service for retrieving current and historical information for YARN applications.
hbase-hmaster 1.3.1 Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server 1.3.1 Service for serving one or more HBase regions.
hbase-client 1.3.1 HBase command-line client.
hbase-rest-server 1.3.1 Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server 1.3.1 Service providing a Thrift endpoint to HBase.
hcatalog-client 2.1.1-amzn-0 The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server 2.1.1-amzn-0 Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server 2.1.1-amzn-0 HTTP endpoint providing a REST interface to HCatalog.
hive-client 2.1.1-amzn-0 Hive command line client.
hive-hbase 2.1.1-amzn-0 Hive-hbase client.
hive-metastore-server 2.1.1-amzn-0 Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server2 2.1.1-amzn-0 Service for accepting Hive queries as web requests.
hue-server 3.12.0 Web application for analyzing data using Hadoop ecosystem applications.
mahout-client 0.13.0 Library for machine learning.
mysql-server 5.5.54+ MySQL database server.
oozie-client 4.3.0 Oozie command-line client.
oozie-server 4.3.0 Service for accepting Oozie workflow requests.
phoenix-library 4.11.0-HBase-1.3 The Phoenix libraries for server and client.
phoenix-query-server 4.11.0-HBase-1.3 A lightweight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API.
presto-coordinator 0.170 Service for accepting queries and managing query execution among presto-workers.
presto-worker 0.170 Service for executing pieces of a query.
pig-client 0.16.0-amzn-0 Pig command-line client.
spark-client 2.1.1 Spark command-line clients.
spark-history-server 2.1.1 Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn 2.1.1 In-memory execution engine for YARN.
spark-yarn-slave 2.1.1 Apache Spark libraries needed by YARN slaves.
sqoop-client 1.4.6 Apache Sqoop command-line client.
tez-on-yarn 0.8.4 The Tez YARN application and libraries.
webserver 2.4.25+ Apache HTTP server.
zeppelin-server 0.7.2 Web-based notebook that enables interactive data analytics.
zookeeper-server 3.4.10 Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client 3.4.10 ZooKeeper command line client.
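
Each row in the component tables has the form name, version, description, so the tables for two releases can be compared programmatically. The sketch below uses a handful of rows copied from the 5.7.0 and 5.8.0 tables on this page; the function names are hypothetical, and the parsing assumes that component names and versions contain no spaces.

```python
def parse_component_row(row):
    """Split a row into (name, version, description) on the first two spaces."""
    name, version, description = row.split(" ", 2)
    return name, version, description


# Sample rows copied from the component tables for each release.
ROWS_5_7_0 = [
    "emr-ddb 4.3.0 Amazon DynamoDB connector for Hadoop ecosystem applications.",
    "flink-client 1.3.0 Apache Flink command line client scripts and applications.",
    "hive-client 2.1.1-amzn-0 Hive command line client.",
    "spark-client 2.1.1 Spark command-line clients.",
    "sqoop-client 1.4.6 Apache Sqoop command-line client.",
]
ROWS_5_8_0 = [
    "emr-ddb 4.4.0 Amazon DynamoDB connector for Hadoop ecosystem applications.",
    "flink-client 1.3.1 Apache Flink command line client scripts and applications.",
    "hive-client 2.3.0-amzn-0 Hive command line client.",
    "spark-client 2.2.0 Spark command-line clients.",
    "sqoop-client 1.4.6 Apache Sqoop command-line client.",
]


def versions_by_component(rows):
    """Map each component name to its version string."""
    return {name: version for name, version, _ in map(parse_component_row, rows)}


def changed_components(old_rows, new_rows):
    """Return {name: (old_version, new_version)} for components whose version differs."""
    old = versions_by_component(old_rows)
    new = versions_by_component(new_rows)
    return {
        name: (old[name], new[name])
        for name in old
        if name in new and old[name] != new[name]
    }


changes = changed_components(ROWS_5_7_0, ROWS_5_8_0)
for name, (old_version, new_version) in sorted(changes.items()):
    print(f"{name}: {old_version} -> {new_version}")
```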

Release 5.7.0 Configuration Classifications

Configuration classifications allow you to customize applications when you create a cluster. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.

emr-5.7.0 Classifications

Classifications Description

capacity-scheduler

Change values in Hadoop's capacity-scheduler.xml file.

core-site

Change values in Hadoop's core-site.xml file.

emrfs-site

Change EMRFS settings.

flink-conf

Change flink-conf.yaml settings.

flink-log4j

Change Flink log4j.properties settings.

flink-log4j-yarn-session

Change Flink log4j-yarn-session.properties settings.

flink-log4j-cli

Change Flink log4j-cli.properties settings.

hadoop-env

Change values in the Hadoop environment for all Hadoop components.

hadoop-log4j

Change values in Hadoop's log4j.properties file.