Amazon EMR 5.x Release Versions
Each tab below lists application versions, release notes, component versions, and configuration classifications available in each Amazon EMR 5.x release version.
For a comprehensive diagram of application versions in every release, see Application Versions in Amazon EMR 5.x Releases (PNG).
When you launch a cluster, you can choose from multiple release versions of Amazon EMR. This allows you to test and use application versions that fit your compatibility requirements. You specify the release version using the release label. Release labels are in the form emr-x.x.x. For example, emr-5.21.0.
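As a minimal sketch of the release label format described above, the following Python snippet parses labels of the form emr-x.x.x into comparable tuples. The names ReleaseLabel and parse_release_label are hypothetical helpers introduced for illustration; they are not part of any AWS SDK.

```python
import re
from typing import NamedTuple


class ReleaseLabel(NamedTuple):
    """Numeric parts of a release label such as emr-5.21.0 (hypothetical helper type)."""
    major: int
    minor: int
    patch: int


# Matches labels of the form emr-x.x.x, where each x is a decimal number.
_LABEL_RE = re.compile(r"^emr-(\d+)\.(\d+)\.(\d+)$")


def parse_release_label(label: str) -> ReleaseLabel:
    """Split a release label into integers so that labels compare numerically."""
    match = _LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"not an emr-x.x.x release label: {label!r}")
    return ReleaseLabel(*(int(part) for part in match.groups()))


# Pick the newest label from a list; tuple comparison avoids string-ordering pitfalls.
labels = ["emr-5.19.0", "emr-5.21.0", "emr-5.20.0"]
newest = max(labels, key=parse_release_label)
print(newest)  # prints "emr-5.21.0"
```

Comparing parsed tuples rather than raw strings matters once minor versions reach two digits: as strings, "emr-5.9.0" would sort after "emr-5.21.0".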
New Amazon EMR release versions are made available in different regions over a period of several days, beginning with the first region on the initial release date. The latest release version may not be available in your region during this period.
- 5.21.0
Amazon EMR Release 5.21.0
The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, JupyterHub, Livy, Mahout, MXNet, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, TensorFlow, Tez, Zeppelin, and ZooKeeper.
The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.
For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:
The following release notes include information for Amazon EMR release version 5.21.0. Changes are relative to 5.20.0.
Initial release date: February 18, 2019
Upgrades
- Flink 1.7.0
- Presto 0.215
- AWS SDK for Java 1.11.479
Changes, Enhancements, and Resolved Issues
- Zeppelin
  - Backported ZEPPELIN-3878.
The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.
Some components need changes from community versions for Amazon EMR. These components have a version label in the form CommunityVersion-amzn-EmrVersion. For example, if a big-data community component named myapp-component of version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-3.
| Component | Version | Description |
| --- | --- | --- |
| aws-sagemaker-spark-sdk | 1.2.1 | Amazon SageMaker Spark SDK |
| emr-ddb | 4.7.0 | Amazon DynamoDB connector for Hadoop ecosystem applications. |
| emr-goodies | 2.5.1 | Extra convenience libraries for the Hadoop ecosystem. |
| emr-kinesis | 3.4.0 | Amazon Kinesis connector for Hadoop ecosystem applications. |
| emr-s3-dist-cp | 2.11.0 | Distributed copy application optimized for Amazon S3. |
| emr-s3-select | 1.2.0 | EMR S3Select Connector |
| emrfs | 2.30.0 | Amazon S3 connector for Hadoop ecosystem applications. |
| flink-client | 1.7.0 | Apache Flink command line client scripts and applications. |
| ganglia-monitor | 3.7.2 | Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent. |
| ganglia-metadata-collector | 3.7.2 | Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents. |
| ganglia-web | 3.7.1 | Web application for viewing metrics collected by the Ganglia metadata collector. |
| hadoop-client | 2.8.5-amzn-1 | Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'. |
| hadoop-hdfs-datanode | 2.8.5-amzn-1 | HDFS node-level service for storing blocks. |
| hadoop-hdfs-library | 2.8.5-amzn-1 | HDFS command-line client and library |
| hadoop-hdfs-namenode | 2.8.5-amzn-1 | HDFS service for tracking file names and block locations. |
| hadoop-httpfs-server | 2.8.5-amzn-1 | HTTP endpoint for HDFS operations. |
| hadoop-kms-server | 2.8.5-amzn-1 | Cryptographic key management server based on Hadoop's KeyProvider API. |
| hadoop-mapred | 2.8.5-amzn-1 | MapReduce execution engine libraries for running a MapReduce application. |
| hadoop-yarn-nodemanager | 2.8.5-amzn-1 | YARN service for managing containers on an individual node. |
| hadoop-yarn-resourcemanager | 2.8.5-amzn-1 | YARN service for allocating and managing cluster resources and distributed applications. |
| hadoop-yarn-timeline-server | 2.8.5-amzn-1 | Service for retrieving current and historical information for YARN applications. |
| hbase-hmaster | 1.4.8 | Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands. |
| hbase-region-server | 1.4.8 | Service for serving one or more HBase regions. |
| hbase-client | 1.4.8 | HBase command-line client. |
| hbase-rest-server | 1.4.8 | Service providing a RESTful HTTP endpoint for HBase. |
| hbase-thrift-server | 1.4.8 | Service providing a Thrift endpoint to HBase. |
| hcatalog-client | 2.3.4-amzn-0 | The 'hcat' command line client for manipulating hcatalog-server. |
| hcatalog-server | 2.3.4-amzn-0 | Service providing HCatalog, a table and storage management layer for distributed applications. |
| hcatalog-webhcat-server | 2.3.4-amzn-0 | HTTP endpoint providing a REST interface to HCatalog. |
| hive-client | 2.3.4-amzn-0 | Hive command line client. |
| hive-hbase | 2.3.4-amzn-0 | Hive-hbase client. |
| hive-metastore-server | 2.3.4-amzn-0 | Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations. |
| hive-server2 | 2.3.4-amzn-0 | Service for accepting Hive queries as web requests. |
| hue-server | 4.3.0 | Web application for analyzing data using Hadoop ecosystem applications |
| jupyterhub | 0.9.4 | Multi-user server for Jupyter notebooks |
| livy-server | 0.5.0-incubating | REST interface for interacting with Apache Spark |
| nginx | 1.12.1 | nginx [engine x] is an HTTP and reverse proxy server |
| mahout-client | 0.13.0 | Library for machine learning. |
| mxnet | 1.3.1 | A flexible, scalable, and efficient library for deep learning. |
| mysql-server | 5.5.54+ | MySQL database server. |
| nvidia-cuda | 9.2.88 | Nvidia drivers and Cuda toolkit |
| oozie-client | 5.0.0 | Oozie command-line client. |
| oozie-server | 5.0.0 | Service for accepting Oozie workflow requests. |
| opencv | 3.4.0 | Open Source Computer Vision Library. |
| phoenix-library | 4.14.0-HBase-1.4 | The phoenix libraries for server and client |
| phoenix-query-server | 4.14.0-HBase-1.4 | A light weight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API |
| presto-coordinator | 0.215 | Service for accepting queries and managing query execution among presto-workers. |
| presto-worker | 0.215 | Service for executing pieces of a query. |
| pig-client | 0.17.0 | Pig command-line client. |
| r | 3.4.1 | The R Project for Statistical Computing |
| spark-client | 2.4.0 | Spark command-line clients. |
| spark-history-server | 2.4.0 | Web UI for viewing logged events for the lifetime of a completed Spark application. |
| spark-on-yarn | 2.4.0 | In-memory execution engine for YARN. |
| spark-yarn-slave | 2.4.0 | Apache Spark libraries needed by YARN slaves. |
| sqoop-client | 1.4.7 | Apache Sqoop command-line client. |
| tensorflow | 1.12.0 | TensorFlow open source software library for high performance numerical computation. |
| tez-on-yarn | 0.9.1 | The tez YARN application and libraries. |
| webserver | 2.4.25+ | Apache HTTP server. |
| zeppelin-server | 0.8.0 | Web-based notebook that enables interactive data analytics. |
| zookeeper-server | 3.4.13 | Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. |
| zookeeper-client | 3.4.13 | ZooKeeper command line client. |
5.21.0 Configuration Classifications
Configuration classifications allow you to customize applications when you create a cluster. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.
emr-5.21.0 Classifications
Classifications Description
capacity-scheduler
Change values in Hadoop's capacity-scheduler.xml file.
container-log4j
Change values in Hadoop YARN's container-log4j.properties file.
core-site
Change values in Hadoop's core-site.xml file.
emrfs-site
Change EMRFS settings.
flink-conf
Change flink-conf.yaml settings.
flink-log4j
Change Flink log4j.properties settings.
flink-log4j-yarn-session
Change Flink log4j-yarn-session.properties settings.
flink-log4j-cli
Change Flink log4j-cli.properties settings.
hadoop-env
Change values in the Hadoop environment for all Hadoop components.
hadoop-log4j
Change values in Hadoop's log4j.properties file.
hadoop-ssl-server
Change Hadoop SSL server configuration.
hadoop-ssl-client
Change Hadoop SSL client configuration.
hbase
Amazon EMR-curated settings for Apache HBase.
hbase-env
Change values in HBase's environment.
hbase-log4j
Change values in HBase's hbase-log4j.properties file.
hbase-metrics
Change values in HBase's hadoop-metrics2-hbase.properties file.
hbase-policy
Change values in HBase's hbase-policy.xml file.
hbase-site
Change values in HBase's hbase-site.xml file.
hdfs-encryption-zones
Configure HDFS encryption zones.
hdfs-site
Change values in HDFS's hdfs-site.xml.
hcatalog-env
Change values in HCatalog's environment.
hcatalog-server-jndi
Change values in HCatalog's jndi.properties.
hcatalog-server-proto-hive-site
Change values in HCatalog's proto-hive-site.xml.
hcatalog-webhcat-env
Change values in HCatalog WebHCat's environment.
hcatalog-webhcat-log4j2
Change values in HCatalog WebHCat's log4j2.properties.
hcatalog-webhcat-site
Change values in HCatalog WebHCat's webhcat-site.xml file.
hive-beeline-log4j2
Change values in Hive's beeline-log4j2.properties file.
hive-parquet-logging
Change values in Hive's parquet-logging.properties file.
hive-env
Change values in the Hive environment.
hive-exec-log4j2
Change values in Hive's hive-exec-log4j2.properties file.
hive-llap-daemon-log4j2
Change values in Hive's llap-daemon-log4j2.properties file.
hive-log4j2
Change values in Hive's hive-log4j2.properties file.
hive-site
Change values in Hive's hive-site.xml file
hiveserver2-site
Change values in Hive Server2's hiveserver2-site.xml file
hue-ini
Change values in Hue's ini file
httpfs-env
Change values in the HTTPFS environment.
httpfs-site
Change values in Hadoop's httpfs-site.xml file.
hadoop-kms-acls
Change values in Hadoop's kms-acls.xml file.
hadoop-kms-env
Change values in the Hadoop KMS environment.
hadoop-kms-log4j
Change values in Hadoop's kms-log4j.properties file.
hadoop-kms-site
Change values in Hadoop's kms-site.xml file.
jupyter-notebook-conf
Change values in Jupyter Notebook's jupyter_notebook_config.py file.
jupyter-hub-conf
Change values in JupyterHub's jupyterhub_config.py file.
jupyter-s3-conf
Configure Jupyter Notebook S3 persistence.
jupyter-sparkmagic-conf
Change values in Sparkmagic's config.json file.
livy-conf
Change values in Livy's livy.conf file.
livy-env
Change values in the Livy environment.
livy-log4j
Change Livy log4j.properties settings.
mapred-env
Change values in the MapReduce application's environment.
mapred-site
Change values in the MapReduce application's mapred-site.xml file.
oozie-env
Change values in Oozie's environment.
oozie-log4j
Change values in Oozie's oozie-log4j.properties file.
oozie-site
Change values in Oozie's oozie-site.xml file.
phoenix-hbase-metrics
Change values in Phoenix's hadoop-metrics2-hbase.properties file.
phoenix-hbase-site
Change values in Phoenix's hbase-site.xml file.
phoenix-log4j
Change values in Phoenix's log4j.properties file.
phoenix-metrics
Change values in Phoenix's hadoop-metrics2-phoenix.properties file.
pig-env
Change values in the Pig environment.
pig-properties
Change values in Pig's pig.properties file.
pig-log4j
Change values in Pig's log4j.properties file.
presto-log
Change values in Presto's log.properties file.
presto-config
Change values in Presto's config.properties file.
presto-password-authenticator
Change values in Presto's password-authenticator.properties file.
presto-env
Change values in Presto's presto-env.sh file.
presto-node
Change values in Presto's node.properties file.
presto-connector-blackhole
Change values in Presto's blackhole.properties file.
presto-connector-cassandra
Change values in Presto's cassandra.properties file.
presto-connector-hive
Change values in Presto's hive.properties file.
presto-connector-jmx
Change values in Presto's jmx.properties file.
presto-connector-kafka
Change values in Presto's kafka.properties file.
presto-connector-localfile
Change values in Presto's localfile.properties file.
presto-connector-memory
Change values in Presto's memory.properties file.
presto-connector-mongodb
Change values in Presto's mongodb.properties file.
presto-connector-mysql
Change values in Presto's mysql.properties file.
presto-connector-postgresql
Change values in Presto's postgresql.properties file.
presto-connector-raptor
Change values in Presto's raptor.properties file.
presto-connector-redis
Change values in Presto's redis.properties file.
presto-connector-redshift
Change values in Presto's redshift.properties file.
presto-connector-tpch
Change values in Presto's tpch.properties file.
presto-connector-tpcds
Change values in Presto's tpcds.properties file.
spark
Amazon EMR-curated settings for Apache Spark.
spark-defaults
Change values in Spark's spark-defaults.conf file.
spark-env
Change values in the Spark environment.
spark-hive-site
Change values in Spark's hive-site.xml file
spark-log4j
Change values in Spark's log4j.properties file.
spark-metrics
Change values in Spark's metrics.properties file.
sqoop-env
Change values in Sqoop's environment.
sqoop-oraoop-site
Change values in Sqoop OraOop's oraoop-site.xml file.
sqoop-site
Change values in Sqoop's sqoop-site.xml file.
tez-site
Change values in Tez's tez-site.xml file.
yarn-env
Change values in the YARN environment.
yarn-site
Change values in YARN's yarn-site.xml file.
zeppelin-env
Change values in the Zeppelin environment.
zookeeper-config
Change values in ZooKeeper's zoo.cfg file.
zookeeper-log4j
Change values in ZooKeeper's log4j.properties file.
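The classifications listed above are typically supplied as a list of objects, each naming a classification and the properties to override. The following is a minimal sketch under stated assumptions: the Classification/Properties object shape follows the Amazon EMR configuration format but is not spelled out on this page, make_configuration is a hypothetical helper, KNOWN_CLASSIFICATIONS is only a small subset of the table above, and the two property values are illustrative rather than recommended settings.

```python
import json

# Small, non-exhaustive subset of the emr-5.21.0 classification names listed above.
KNOWN_CLASSIFICATIONS = {"core-site", "hive-site", "spark-defaults", "yarn-site"}


def make_configuration(classification, properties):
    """Build one configuration object (hypothetical helper, not an AWS API)."""
    if classification not in KNOWN_CLASSIFICATIONS:
        raise ValueError(f"unknown classification: {classification!r}")
    # Property values are written as strings in the resulting JSON.
    return {
        "Classification": classification,
        "Properties": {key: str(value) for key, value in properties.items()},
    }


# Illustrative overrides only; choose properties that match your workload.
configurations = [
    make_configuration("hive-site", {"hive.exec.dynamic.partition.mode": "nonstrict"}),
    make_configuration("spark-defaults", {"spark.executor.memory": "2g"}),
]

print(json.dumps(configurations, indent=2))
```

Checking the classification name up front catches typos before a cluster launch is attempted, because each release accepts only the classifications listed for it.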
- 5.20.0
Amazon EMR Release 5.20.0
The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, JupyterHub, Livy, Mahout, MXNet, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, TensorFlow, Tez, Zeppelin, and ZooKeeper.
The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.
For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:
The following release notes include information for Amazon EMR release version 5.20.0. Changes are relative to 5.19.0.
Initial release date: December 18, 2018
Last updated date: January 22, 2019
Upgrades
- Flink 1.6.2
- HBase 1.4.8
- Hive 2.3.4
- Hue 4.3.0
- MXNet 1.3.1
- Presto 0.214
- Spark 2.4.0
- TensorFlow 1.12.0
- Tez 0.9.1
- AWS SDK for Java 1.11.461
New Features
- (January 22, 2019) Kerberos in Amazon EMR has been improved to support authenticating principals from an external KDC. This centralizes principal management because multiple clusters can share a single, external KDC. In addition, the external KDC can have a cross-realm trust with an Active Directory domain. This allows all clusters to authenticate principals from Active Directory. For more information, see Use Kerberos Authentication in the Amazon EMR Management Guide.
Changes, Enhancements, and Resolved Issues
- Default Amazon Linux AMI for Amazon EMR
  - The Python 3 package was upgraded from Python 3.4 to 3.6.
- The EMRFS S3-optimized committer
  - The EMRFS S3-optimized committer is now enabled by default, which improves write performance. For more information, see Using the EMRFS S3-optimized Committer.
- Hive
  - Backported HIVE-16686.
- Glue with Spark and Hive
  - In EMR 5.20.0 or later, parallel partition pruning is enabled automatically for Spark and Hive when AWS Glue Data Catalog is used as the metastore. This change significantly reduces query planning time by executing multiple requests in parallel to retrieve partitions. The total number of segments that can be executed concurrently ranges between 1 and 10. The default value is 5, which is the recommended setting. You can change it by specifying the property aws.glue.partition.num.segments in the hive-site configuration classification. If throttling occurs, you can turn off the feature by changing the value to 1. For more information, see AWS Glue Segment Structure.
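The aws.glue.partition.num.segments setting described above can be expressed as a hive-site configuration object. The sketch below assumes the Classification/Properties object shape used for Amazon EMR configurations; glue_segments_configuration is a hypothetical helper name, and the range check simply mirrors the 1 to 10 limit stated in the release note.

```python
import json

# Limits and default taken from the release note above.
MIN_SEGMENTS = 1
MAX_SEGMENTS = 10
DEFAULT_SEGMENTS = 5


def glue_segments_configuration(num_segments=DEFAULT_SEGMENTS):
    """Build a hive-site override for parallel partition pruning (hypothetical helper)."""
    if not MIN_SEGMENTS <= num_segments <= MAX_SEGMENTS:
        raise ValueError(
            f"num_segments must be between {MIN_SEGMENTS} and {MAX_SEGMENTS}"
        )
    return {
        "Classification": "hive-site",
        "Properties": {"aws.glue.partition.num.segments": str(num_segments)},
    }


# A value of 1 turns the feature off, for example when throttling occurs.
print(json.dumps([glue_segments_configuration(1)], indent=2))
```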
The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.
Some components need changes from community versions for Amazon EMR. These components have a version label in the form CommunityVersion-amzn-EmrVersion. For example, if a big-data community component named myapp-component of version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-3.
| Component | Version | Description |
| --- | --- | --- |
| aws-sagemaker-spark-sdk | 1.2.1 | Amazon SageMaker Spark SDK |
| emr-ddb | 4.7.0 | Amazon DynamoDB connector for Hadoop ecosystem applications. |
| emr-goodies | 2.5.1 | Extra convenience libraries for the Hadoop ecosystem. |
| emr-kinesis | 3.4.0 | Amazon Kinesis connector for Hadoop ecosystem applications. |
| emr-s3-dist-cp | 2.10.0 | Distributed copy application optimized for Amazon S3. |
| emr-s3-select | 1.2.0 | EMR S3Select Connector |
| emrfs | 2.29.0 | Amazon S3 connector for Hadoop ecosystem applications. |
| flink-client | 1.6.2 | Apache Flink command line client scripts and applications. |
| ganglia-monitor | 3.7.2 | Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent. |
| ganglia-metadata-collector | 3.7.2 | Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents. |
| ganglia-web | 3.7.1 | Web application for viewing metrics collected by the Ganglia metadata collector. |
| hadoop-client | 2.8.5-amzn-1 | Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'. |
| hadoop-hdfs-datanode | 2.8.5-amzn-1 | HDFS node-level service for storing blocks. |
| hadoop-hdfs-library | 2.8.5-amzn-1 | HDFS command-line client and library |
| hadoop-hdfs-namenode | 2.8.5-amzn-1 | HDFS service for tracking file names and block locations. |
| hadoop-httpfs-server | 2.8.5-amzn-1 | HTTP endpoint for HDFS operations. |
| hadoop-kms-server | 2.8.5-amzn-1 | Cryptographic key management server based on Hadoop's KeyProvider API. |
| hadoop-mapred | 2.8.5-amzn-1 | MapReduce execution engine libraries for running a MapReduce application. |
| hadoop-yarn-nodemanager | 2.8.5-amzn-1 | YARN service for managing containers on an individual node. |
| hadoop-yarn-resourcemanager | 2.8.5-amzn-1 | YARN service for allocating and managing cluster resources and distributed applications. |
| hadoop-yarn-timeline-server | 2.8.5-amzn-1 | Service for retrieving current and historical information for YARN applications. |
| hbase-hmaster | 1.4.8 | Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands. |
| hbase-region-server | 1.4.8 | Service for serving one or more HBase regions. |
| hbase-client | 1.4.8 | HBase command-line client. |
| hbase-rest-server | 1.4.8 | Service providing a RESTful HTTP endpoint for HBase. |
| hbase-thrift-server | 1.4.8 | Service providing a Thrift endpoint to HBase. |
| hcatalog-client | 2.3.4-amzn-0 | The 'hcat' command line client for manipulating hcatalog-server. |
| hcatalog-server | 2.3.4-amzn-0 | Service providing HCatalog, a table and storage management layer for distributed applications. |
| hcatalog-webhcat-server | 2.3.4-amzn-0 | HTTP endpoint providing a REST interface to HCatalog. |
| hive-client | 2.3.4-amzn-0 | Hive command line client. |
| hive-hbase | 2.3.4-amzn-0 | Hive-hbase client. |
| hive-metastore-server | 2.3.4-amzn-0 | Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations. |
| hive-server2 | 2.3.4-amzn-0 | Service for accepting Hive queries as web requests. |
| hue-server | 4.3.0 | Web application for analyzing data using Hadoop ecosystem applications |
| jupyterhub | 0.9.4 | Multi-user server for Jupyter notebooks |
| livy-server | 0.5.0-incubating | REST interface for interacting with Apache Spark |
| nginx | 1.12.1 | nginx [engine x] is an HTTP and reverse proxy server |
| mahout-client | 0.13.0 | Library for machine learning. |
| mxnet | 1.3.1 | A flexible, scalable, and efficient library for deep learning. |
| mysql-server | 5.5.54+ | MySQL database server. |
| nvidia-cuda | 9.2.88 | Nvidia drivers and Cuda toolkit |
| oozie-client | 5.0.0 | Oozie command-line client. |
| oozie-server | 5.0.0 | Service for accepting Oozie workflow requests. |
| opencv | 3.4.0 | Open Source Computer Vision Library. |
| phoenix-library | 4.14.0-HBase-1.4 | The phoenix libraries for server and client |
| phoenix-query-server | 4.14.0-HBase-1.4 | A light weight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API |
| presto-coordinator | 0.214 | Service for accepting queries and managing query execution among presto-workers. |
| presto-worker | 0.214 | Service for executing pieces of a query. |
| pig-client | 0.17.0 | Pig command-line client. |
| r | 3.4.1 | The R Project for Statistical Computing |
| spark-client | 2.4.0 | Spark command-line clients. |
| spark-history-server | 2.4.0 | Web UI for viewing logged events for the lifetime of a completed Spark application. |
| spark-on-yarn | 2.4.0 | In-memory execution engine for YARN. |
| spark-yarn-slave | 2.4.0 | Apache Spark libraries needed by YARN slaves. |
| sqoop-client | 1.4.7 | Apache Sqoop command-line client. |
| tensorflow | 1.12.0 | TensorFlow open source software library for high performance numerical computation. |
| tez-on-yarn | 0.9.1 | The tez YARN application and libraries. |
| webserver | 2.4.25+ | Apache HTTP server. |
| zeppelin-server | 0.8.0 | Web-based notebook that enables interactive data analytics. |
| zookeeper-server | 3.4.13 | Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. |
| zookeeper-client | 3.4.13 | ZooKeeper command line client. |
5.20.0 Configuration Classifications
Configuration classifications allow you to customize applications when you create a cluster. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.
emr-5.20.0 Classifications
Classifications Description
capacity-scheduler
Change values in Hadoop's capacity-scheduler.xml file.
container-log4j
Change values in Hadoop YARN's container-log4j.properties file.
core-site
Change values in Hadoop's core-site.xml file.
emrfs-site
Change EMRFS settings.
flink-conf
Change flink-conf.yaml settings.
flink-log4j
Change Flink log4j.properties settings.
flink-log4j-yarn-session
Change Flink log4j-yarn-session.properties settings.
flink-log4j-cli
Change Flink log4j-cli.properties settings.
hadoop-env
Change values in the Hadoop environment for all Hadoop components.
hadoop-log4j
Change values in Hadoop's log4j.properties file.
hadoop-ssl-server
Change Hadoop SSL server configuration.
hadoop-ssl-client
Change Hadoop SSL client configuration.
hbase
Amazon EMR-curated settings for Apache HBase.
hbase-env
Change values in HBase's environment.
hbase-log4j
Change values in HBase's hbase-log4j.properties file.
hbase-metrics
Change values in HBase's hadoop-metrics2-hbase.properties file.
hbase-policy
Change values in HBase's hbase-policy.xml file.
hbase-site
Change values in HBase's hbase-site.xml file.
hdfs-encryption-zones
Configure HDFS encryption zones.
hdfs-site
Change values in HDFS's hdfs-site.xml.
hcatalog-env
Change values in HCatalog's environment.
hcatalog-server-jndi
Change values in HCatalog's jndi.properties.
hcatalog-server-proto-hive-site
Change values in HCatalog's proto-hive-site.xml.
hcatalog-webhcat-env
Change values in HCatalog WebHCat's environment.
hcatalog-webhcat-log4j2
Change values in HCatalog WebHCat's log4j2.properties.
hcatalog-webhcat-site
Change values in HCatalog WebHCat's webhcat-site.xml file.
hive-beeline-log4j2
Change values in Hive's beeline-log4j2.properties file.
hive-parquet-logging
Change values in Hive's parquet-logging.properties file.
hive-env
Change values in the Hive environment.
hive-exec-log4j2
Change values in Hive's hive-exec-log4j2.properties file.
hive-llap-daemon-log4j2
Change values in Hive's llap-daemon-log4j2.properties file.
hive-log4j2
Change values in Hive's hive-log4j2.properties file.
hive-site
Change values in Hive's hive-site.xml file
hiveserver2-site
Change values in Hive Server2's hiveserver2-site.xml file
hue-ini
Change values in Hue's ini file
httpfs-env
Change values in the HTTPFS environment.
httpfs-site
Change values in Hadoop's httpfs-site.xml file.
hadoop-kms-acls
Change values in Hadoop's kms-acls.xml file.
hadoop-kms-env
Change values in the Hadoop KMS environment.
hadoop-kms-log4j
Change values in Hadoop's kms-log4j.properties file.
hadoop-kms-site
Change values in Hadoop's kms-site.xml file.
jupyter-notebook-conf
Change values in Jupyter Notebook's jupyter_notebook_config.py file.
jupyter-hub-conf
Change values in JupyterHub's jupyterhub_config.py file.
jupyter-s3-conf
Configure Jupyter Notebook S3 persistence.
jupyter-sparkmagic-conf
Change values in Sparkmagic's config.json file.
livy-conf
Change values in Livy's livy.conf file.
livy-env
Change values in the Livy environment.
livy-log4j
Change Livy log4j.properties settings.
mapred-env
Change values in the MapReduce application's environment.
mapred-site
Change values in the MapReduce application's mapred-site.xml file.
oozie-env
Change values in Oozie's environment.
oozie-log4j
Change values in Oozie's oozie-log4j.properties file.
oozie-site
Change values in Oozie's oozie-site.xml file.
phoenix-hbase-metrics
Change values in Phoenix's hadoop-metrics2-hbase.properties file.
phoenix-hbase-site
Change values in Phoenix's hbase-site.xml file.
phoenix-log4j
Change values in Phoenix's log4j.properties file.
phoenix-metrics
Change values in Phoenix's hadoop-metrics2-phoenix.properties file.
pig-env
Change values in the Pig environment.
pig-properties
Change values in Pig's pig.properties file.
pig-log4j
Change values in Pig's log4j.properties file.
presto-log
Change values in Presto's log.properties file.
presto-config
Change values in Presto's config.properties file.
presto-password-authenticator
Change values in Presto's password-authenticator.properties file.
presto-env
Change values in Presto's presto-env.sh file.
presto-node
Change values in Presto's node.properties file.
presto-connector-blackhole
Change values in Presto's blackhole.properties file.
presto-connector-cassandra
Change values in Presto's cassandra.properties file.
presto-connector-hive
Change values in Presto's hive.properties file.
presto-connector-jmx
Change values in Presto's jmx.properties file.
presto-connector-kafka
Change values in Presto's kafka.properties file.
presto-connector-localfile
Change values in Presto's localfile.properties file.
presto-connector-memory
Change values in Presto's memory.properties file.
presto-connector-mongodb
Change values in Presto's mongodb.properties file.
presto-connector-mysql
Change values in Presto's mysql.properties file.
presto-connector-postgresql
Change values in Presto's postgresql.properties file.
presto-connector-raptor
Change values in Presto's raptor.properties file.
presto-connector-redis
Change values in Presto's redis.properties file.
presto-connector-redshift
Change values in Presto's redshift.properties file.
presto-connector-tpch
Change values in Presto's tpch.properties file.
presto-connector-tpcds
Change values in Presto's tpcds.properties file.
spark
Amazon EMR-curated settings for Apache Spark.
spark-defaults
Change values in Spark's spark-defaults.conf file.
spark-env
Change values in the Spark environment.
spark-hive-site
Change values in Spark's hive-site.xml file
spark-log4j
Change values in Spark's log4j.properties file.
spark-metrics
Change values in Spark's metrics.properties file.
sqoop-env
Change values in Sqoop's environment.
sqoop-oraoop-site
Change values in Sqoop OraOop's oraoop-site.xml file.
sqoop-site
Change values in Sqoop's sqoop-site.xml file.
tez-site
Change values in Tez's tez-site.xml file.
yarn-env
Change values in the YARN environment.
yarn-site
Change values in YARN's yarn-site.xml file.
zeppelin-env
Change values in the Zeppelin environment.
zookeeper-config
Change values in ZooKeeper's zoo.cfg file.
zookeeper-log4j
Change values in ZooKeeper's log4j.properties file.
- 5.19.0
Amazon EMR Release 5.19.0
The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, JupyterHub, Livy, Mahout, MXNet, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, TensorFlow, Tez, Zeppelin, and ZooKeeper.
The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.
For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:
The following release notes include information for Amazon EMR release version 5.19.0. Changes are relative to 5.18.0.
Initial release date: November 7, 2018
Last updated date: November 19, 2018
Upgrades
- Hadoop 2.8.5
- Flink 1.6.1
- JupyterHub 0.9.4
- MXNet 1.3.0
- Presto 0.212
- TensorFlow 1.11.0
- ZooKeeper 3.4.13
- AWS SDK for Java 1.11.433
New Features
- (Nov. 19, 2018) EMR Notebooks is a managed environment based on Jupyter Notebook. It supports Spark magic kernels for PySpark, Spark SQL, Spark R, and Scala. EMR Notebooks can be used with clusters created using Amazon EMR release version 5.18.0 and later. For more information, see Using EMR Notebooks in the Amazon EMR Management Guide.
- The EMRFS S3-optimized committer is available when writing Parquet files using Spark and EMRFS. This committer improves write performance. For more information, see Using the EMRFS S3-optimized Committer.
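The release notes on this page say the EMRFS S3-optimized committer is available in 5.19.0 and enabled by default from 5.20.0. The following sketch shows one way to decide whether an explicit override is needed for a given release label. It rests on assumptions not stated on this page: that the committer is switched on through the spark-defaults classification, and that the property is spark.sql.parquet.fs.optimized.committer.optimization-enabled (taken from the committer documentation; verify it for your release). The helper name committer_configurations is hypothetical.

```python
# Assumed property name; confirm against the EMRFS S3-optimized committer documentation.
COMMITTER_PROPERTY = "spark.sql.parquet.fs.optimized.committer.optimization-enabled"


def parse_label(label):
    """Turn a label such as emr-5.19.0 into a tuple of integers."""
    prefix = "emr-"
    if not label.startswith(prefix):
        raise ValueError(f"not a release label: {label!r}")
    return tuple(int(part) for part in label[len(prefix):].split("."))


def committer_configurations(release_label):
    """Return the overrides needed to use the committer (hypothetical helper)."""
    version = parse_label(release_label)
    if version < (5, 19, 0):
        # Assumption: the committer is not offered before it appears in the 5.19.0 notes.
        raise ValueError("the committer is listed as a new feature in emr-5.19.0")
    if version >= (5, 20, 0):
        # Enabled by default from 5.20.0, so no override is required.
        return []
    return [
        {
            "Classification": "spark-defaults",
            "Properties": {COMMITTER_PROPERTY: "true"},
        }
    ]


print(committer_configurations("emr-5.19.0"))
print(committer_configurations("emr-5.21.0"))  # prints "[]"
```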
Changes, Enhancements, and Resolved Issues
- YARN
  - Modified the logic that limits the application master process to running on core nodes. This functionality now uses the YARN node labels feature and properties in the yarn-site and capacity-scheduler configuration classifications. For information, see https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-instances-guidelines.html#emr-plan-spot-YARN.
- Default Amazon Linux AMI for Amazon EMR
  - ruby18, php56, and gcc48 are no longer installed by default. These can be installed if desired using yum.
  - The aws-java-sdk ruby gem is no longer installed by default. It can be installed using gem install aws-java-sdk, if desired. Specific components can also be installed. For example, gem install aws-java-sdk-s3.
Known Issues
- EMR Notebooks: In some circumstances, with multiple notebook editors open, the notebook editor may appear unable to connect to the cluster. If this happens, clear browser cookies and then reopen notebook editors.
The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.
Some components need changes from community versions for Amazon EMR. These components have a version label in the form CommunityVersion-amzn-EmrVersion. For example, if a big-data community component named myapp-component of version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-3.
| Component | Version | Description |
| --- | --- | --- |
| aws-sagemaker-spark-sdk | 1.2.0 | Amazon SageMaker Spark SDK |
| emr-ddb | 4.7.0 | Amazon DynamoDB connector for Hadoop ecosystem applications. |
| emr-goodies | 2.5.1 | Extra convenience libraries for the Hadoop ecosystem. |
| emr-kinesis | 3.4.0 | Amazon Kinesis connector for Hadoop ecosystem applications. |
| emr-s3-dist-cp | 2.10.0 | Distributed copy application optimized for Amazon S3. |
| emr-s3-select | 1.1.0 | EMR S3Select Connector |
| emrfs | 2.28.0 | Amazon S3 connector for Hadoop ecosystem applications. |
| flink-client | 1.6.1 | Apache Flink command line client scripts and applications. |
| ganglia-monitor | 3.7.2 | Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent. |
| ganglia-metadata-collector | 3.7.2 | Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents. |
| ganglia-web | 3.7.1 | Web application for viewing metrics collected by the Ganglia metadata collector. |
| hadoop-client | 2.8.5-amzn-0 | Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'. |
| hadoop-hdfs-datanode | 2.8.5-amzn-0 | HDFS node-level service for storing blocks. |
| hadoop-hdfs-library | 2.8.5-amzn-0 | HDFS command-line client and library |
| hadoop-hdfs-namenode | 2.8.5-amzn-0 | HDFS service for tracking file names and block locations. |
| hadoop-httpfs-server | 2.8.5-amzn-0 | HTTP endpoint for HDFS operations. |
| hadoop-kms-server | 2.8.5-amzn-0 | Cryptographic key management server based on Hadoop's KeyProvider API. |
| hadoop-mapred | 2.8.5-amzn-0 | MapReduce execution engine libraries for running a MapReduce application. |
| hadoop-yarn-nodemanager | 2.8.5-amzn-0 | YARN service for managing containers on an individual node. |
| hadoop-yarn-resourcemanager | 2.8.5-amzn-0 | YARN service for allocating and managing cluster resources and distributed applications. |
| hadoop-yarn-timeline-server | 2.8.5-amzn-0 | Service for retrieving current and historical information for YARN applications. |
| hbase-hmaster | 1.4.7 | Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands. |
| hbase-region-server | 1.4.7 | Service for serving one or more HBase regions. |
| hbase-client | 1.4.7 | HBase command-line client. |
| hbase-rest-server | 1.4.7 | Service providing a RESTful HTTP endpoint for HBase. |
| hbase-thrift-server | 1.4.7 | Service providing a Thrift endpoint to HBase. |
| hcatalog-client | 2.3.3-amzn-2 | The 'hcat' command line client for manipulating hcatalog-server. |
| hcatalog-server | 2.3.3-amzn-2 | Service providing HCatalog, a table and storage management layer for distributed applications. |
| hcatalog-webhcat-server | 2.3.3-amzn-2 | HTTP endpoint providing a REST interface to HCatalog. |
| hive-client | 2.3.3-amzn-2 | Hive command line client. |
| hive-hbase | 2.3.3-amzn-2 | Hive-hbase client. |
| hive-metastore-server | 2.3.3-amzn-2 | Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations. |
| hive-server2 | 2.3.3-amzn-2 | Service for accepting Hive queries as web requests. |
| hue-server | 4.2.0 | Web application for analyzing data using Hadoop ecosystem applications |
| jupyterhub | 0.9.4 | Multi-user server for Jupyter notebooks |
| livy-server | 0.5.0-incubating | REST interface for interacting with Apache Spark |
| nginx | 1.12.1 | nginx [engine x] is an HTTP and reverse proxy server |
| mahout-client | 0.13.0 | Library for machine learning. |
| mxnet | 1.3.0 | A flexible, scalable, and efficient library for deep learning. |
| mysql-server | 5.5.54+ | MySQL database server. |
| nvidia-cuda | 9.2.88 | Nvidia drivers and Cuda toolkit |
| oozie-client | 5.0.0 | Oozie command-line client. |
| oozie-server | 5.0.0 | Service for accepting Oozie workflow requests. |
| opencv | 3.4.0 | Open Source Computer Vision Library. |
| phoenix-library | 4.14.0-HBase-1.4 | The phoenix libraries for server and client |
| phoenix-query-server | 4.14.0-HBase-1.4 | A light weight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API |
| presto-coordinator | 0.212 | Service for accepting queries and managing query execution among presto-workers. |
| presto-worker | 0.212 | Service for executing pieces of a query. |
| pig-client | 0.17.0 | Pig command-line client. |
| r | 3.4.1 | The R Project for Statistical Computing |
| spark-client | 2.3.2 | Spark command-line clients. |
| spark-history-server | 2.3.2 | Web UI for viewing logged events for the lifetime of a completed Spark application. |
| spark-on-yarn | 2.3.2 | In-memory execution engine for YARN. |
| spark-yarn-slave | 2.3.2 | Apache Spark libraries needed by YARN slaves. |
| sqoop-client | 1.4.7 | Apache Sqoop command-line client. |
| tensorflow | 1.11.0 | TensorFlow open source software library for high performance numerical computation. |
| tez-on-yarn | 0.8.4 | The tez YARN application and libraries. |
| webserver | 2.4.25+ | Apache HTTP server. |
| zeppelin-server | 0.8.0 | Web-based notebook that enables interactive data analytics. |
| zookeeper-server | 3.4.13 | Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. |
| zookeeper-client | 3.4.13 | ZooKeeper command line client. |
5.19.0 Configuration Classifications
Configuration classifications allow you to customize applications when you create a cluster. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.
emr-5.19.0 Classifications
Classifications Description
capacity-scheduler
Change values in Hadoop's capacity-scheduler.xml file.
container-log4j
Change values in Hadoop YARN's container-log4j.properties file.
core-site
Change values in Hadoop's core-site.xml file.
emrfs-site
Change EMRFS settings.
flink-conf
Change flink-conf.yaml settings.
flink-log4j
Change Flink log4j.properties settings.
flink-log4j-yarn-session
Change Flink log4j-yarn-session.properties settings.
flink-log4j-cli
Change Flink log4j-cli.properties settings.
hadoop-env
Change values in the Hadoop environment for all Hadoop components.
hadoop-log4j
Change values in Hadoop's log4j.properties file.
hadoop-ssl-server
Change Hadoop SSL server configuration.
hadoop-ssl-client
Change Hadoop SSL client configuration.
hbase
Amazon EMR-curated settings for Apache HBase.
hbase-env
Change values in HBase's environment.
hbase-log4j
Change values in HBase's hbase-log4j.properties file.
hbase-metrics
Change values in HBase's hadoop-metrics2-hbase.properties file.
hbase-policy
Change values in HBase's hbase-policy.xml file.
hbase-site
Change values in HBase's hbase-site.xml file.
hdfs-encryption-zones
Configure HDFS encryption zones.
hdfs-site
Change values in HDFS's hdfs-site.xml.
hcatalog-env
Change values in HCatalog's environment.
hcatalog-server-jndi
Change values in HCatalog's jndi.properties.
hcatalog-server-proto-hive-site
Change values in HCatalog's proto-hive-site.xml.
hcatalog-webhcat-env
Change values in HCatalog WebHCat's environment.
hcatalog-webhcat-log4j2
Change values in HCatalog WebHCat's log4j2.properties.
hcatalog-webhcat-site
Change values in HCatalog WebHCat's webhcat-site.xml file.
hive-beeline-log4j2
Change values in Hive's beeline-log4j2.properties file.
hive-parquet-logging
Change values in Hive's parquet-logging.properties file.
hive-env
Change values in the Hive environment.
hive-exec-log4j2
Change values in Hive's hive-exec-log4j2.properties file.
hive-llap-daemon-log4j2
Change values in Hive's llap-daemon-log4j2.properties file.
hive-log4j2
Change values in Hive's hive-log4j2.properties file.
hive-site
Change values in Hive's hive-site.xml file
hiveserver2-site
Change values in Hive Server2's hiveserver2-site.xml file
hue-ini
Change values in Hue's ini file
httpfs-env
Change values in the HTTPFS environment.
httpfs-site
Change values in Hadoop's httpfs-site.xml file.
hadoop-kms-acls
Change values in Hadoop's kms-acls.xml file.
hadoop-kms-env
Change values in the Hadoop KMS environment.
hadoop-kms-log4j
Change values in Hadoop's kms-log4j.properties file.
hadoop-kms-site
Change values in Hadoop's kms-site.xml file.
jupyter-notebook-conf
Change values in Jupyter Notebook's jupyter_notebook_config.py file.
jupyter-hub-conf
Change values in JupyterHub's jupyterhub_config.py file.
jupyter-s3-conf
Configure Jupyter Notebook S3 persistence.
jupyter-sparkmagic-conf
Change values in Sparkmagic's config.json file.
livy-conf
Change values in Livy's livy.conf file.
livy-env
Change values in the Livy environment.
livy-log4j
Change Livy log4j.properties settings.
mapred-env
Change values in the MapReduce application's environment.
mapred-site
Change values in the MapReduce application's mapred-site.xml file.
oozie-env
Change values in Oozie's environment.
oozie-log4j
Change values in Oozie's oozie-log4j.properties file.
oozie-site
Change values in Oozie's oozie-site.xml file.
phoenix-hbase-metrics
Change values in Phoenix's hadoop-metrics2-hbase.properties file.
phoenix-hbase-site
Change values in Phoenix's hbase-site.xml file.
phoenix-log4j
Change values in Phoenix's log4j.properties file.
phoenix-metrics
Change values in Phoenix's hadoop-metrics2-phoenix.properties file.
pig-env
Change values in the Pig environment.
pig-properties
Change values in Pig's pig.properties file.
pig-log4j
Change values in Pig's log4j.properties file.
presto-log
Change values in Presto's log.properties file.
presto-config
Change values in Presto's config.properties file.
presto-password-authenticator
Change values in Presto's password-authenticator.properties file.
presto-env
Change values in Presto's presto-env.sh file.
presto-node
Change values in Presto's node.properties file.
presto-connector-blackhole
Change values in Presto's blackhole.properties file.
presto-connector-cassandra
Change values in Presto's cassandra.properties file.
presto-connector-hive
Change values in Presto's hive.properties file.
presto-connector-jmx
Change values in Presto's jmx.properties file.
presto-connector-kafka
Change values in Presto's kafka.properties file.
presto-connector-localfile
Change values in Presto's localfile.properties file.
presto-connector-memory
Change values in Presto's memory.properties file.
presto-connector-mongodb
Change values in Presto's mongodb.properties file.
presto-connector-mysql
Change values in Presto's mysql.properties file.
presto-connector-postgresql
Change values in Presto's postgresql.properties file.
presto-connector-raptor
Change values in Presto's raptor.properties file.
presto-connector-redis
Change values in Presto's redis.properties file.
presto-connector-redshift
Change values in Presto's redshift.properties file.
presto-connector-tpch
Change values in Presto's tpch.properties file.
presto-connector-tpcds
Change values in Presto's tpcds.properties file.
spark
Amazon EMR-curated settings for Apache Spark.
spark-defaults
Change values in Spark's spark-defaults.conf file.
spark-env
Change values in the Spark environment.
spark-hive-site
Change values in Spark's hive-site.xml file
spark-log4j
Change values in Spark's log4j.properties file.
spark-metrics
Change values in Spark's metrics.properties file.
sqoop-env
Change values in Sqoop's environment.
sqoop-oraoop-site
Change values in Sqoop OraOop's oraoop-site.xml file.
sqoop-site
Change values in Sqoop's sqoop-site.xml file.
tez-site
Change values in Tez's tez-site.xml file.
yarn-env
Change values in the YARN environment.
yarn-site
Change values in YARN's yarn-site.xml file.
zeppelin-env
Change values in the Zeppelin environment.
zookeeper-config
Change values in ZooKeeper's zoo.cfg file.
zookeeper-log4j
Change values in ZooKeeper's log4j.properties file.
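As an illustration of how the classifications above are used, the following minimal sketch builds a configuration list in the JSON shape that Amazon EMR accepts at cluster creation: a list of objects, each with a Classification name and a Properties map. The helper name make_configuration and the two property settings are illustrative assumptions, not recommended values.

```python
import json


def make_configuration(classification, properties):
    """Return one configuration object for the given classification.

    make_configuration is a hypothetical helper used only for this sketch.
    """
    # Property values are written as strings in the configuration JSON.
    return {
        "Classification": classification,
        "Properties": {key: str(value) for key, value in properties.items()},
    }


# Example settings only; the property names and values are assumptions.
configurations = [
    make_configuration("spark-defaults", {"spark.executor.memory": "2G"}),
    make_configuration("hive-site", {"hive.exec.parallel": "true"}),
]

# Serialize so the list can be saved to a file or passed inline.
configurations_json = json.dumps(configurations, indent=2)
print(configurations_json)
```

The resulting JSON could, for example, be saved to a file and supplied through the --configurations option of the aws emr create-cluster command.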
- 5.18.0
Amazon EMR Release 5.18.0
The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, JupyterHub, Livy, Mahout, MXNet, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, TensorFlow, Tez, Zeppelin, and ZooKeeper.
The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.
For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:
The following release notes include information for Amazon EMR release version 5.18.0. Changes are relative to 5.17.0.
Initial release date: October 24, 2018
Upgrades
- Flink 1.6.0
- HBase 1.4.7
- Presto 0.210
- Spark 2.3.2
- Zeppelin 0.8.0
New Features
- Beginning with Amazon EMR 5.18.0, you can use the Amazon EMR artifact repository to build your job code against the exact versions of libraries and dependencies that are available with specific Amazon EMR release versions. For more information, see Checking Dependencies Using the Amazon EMR Artifact Repository.
Changes, Enhancements, and Resolved Issues
- Hive
  - Added support for S3 Select. For more information, see Using S3 Select with Hive to Improve Performance.
- Presto
  - Added support for S3 Select Pushdown. For more information, see Using S3 Select Pushdown with Presto to Improve Performance.
- Spark
  - The default log4j configuration for Spark has been changed to roll container logs hourly for Spark streaming jobs. This helps prevent the deletion of logs for long-running Spark streaming jobs.
The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.
Some components need changes from community versions for Amazon EMR. These components have a version label in the form CommunityVersion-amzn-EmrVersion. For example, if a big-data community component named myapp-component of version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-3.
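The version-label convention described above can be sketched as a small parser. This is a minimal illustration that assumes the Amazon EMR revision is always a trailing "-amzn-" followed by an integer; the names ComponentVersion and parse_component_version are hypothetical and not part of any Amazon EMR tooling.

```python
import re
from typing import NamedTuple, Optional


class ComponentVersion(NamedTuple):
    community_version: str
    emr_version: Optional[int]  # None when the label has no -amzn- suffix


# Assumption: the EMR revision is a trailing "-amzn-<integer>".
_AMZN_SUFFIX = re.compile(r"^(?P<community>.+)-amzn-(?P<emr>\d+)$")


def parse_component_version(label: str) -> ComponentVersion:
    """Split a component version label into community and EMR parts."""
    match = _AMZN_SUFFIX.match(label)
    if match is None:
        # Unmodified community version, for example "1.4.7".
        return ComponentVersion(label, None)
    return ComponentVersion(match.group("community"), int(match.group("emr")))


print(parse_component_version("2.2-amzn-3"))
print(parse_component_version("2.8.4-amzn-1"))
print(parse_component_version("4.14.0-HBase-1.4"))
```

Labels without the suffix, such as 4.14.0-HBase-1.4, are returned unchanged with no EMR revision.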
Component | Version | Description
aws-sagemaker-spark-sdk | 1.1.3 | Amazon SageMaker Spark SDK
emr-ddb | 4.6.0 | Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies | 2.5.0 | Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis | 3.4.0 | Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp | 2.10.0 | Distributed copy application optimized for Amazon S3.
emr-s3-select | 1.1.0 | EMR S3Select Connector
emrfs | 2.27.0 | Amazon S3 connector for Hadoop ecosystem applications.
flink-client | 1.6.0 | Apache Flink command line client scripts and applications.
ganglia-monitor | 3.7.2 | Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector | 3.7.2 | Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web | 3.7.1 | Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client | 2.8.4-amzn-1 | Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode | 2.8.4-amzn-1 | HDFS node-level service for storing blocks.
hadoop-hdfs-library | 2.8.4-amzn-1 | HDFS command-line client and library
hadoop-hdfs-namenode | 2.8.4-amzn-1 | HDFS service for tracking file names and block locations.
hadoop-httpfs-server | 2.8.4-amzn-1 | HTTP endpoint for HDFS operations.
hadoop-kms-server | 2.8.4-amzn-1 | Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred | 2.8.4-amzn-1 | MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager | 2.8.4-amzn-1 | YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager | 2.8.4-amzn-1 | YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server | 2.8.4-amzn-1 | Service for retrieving current and historical information for YARN applications.
hbase-hmaster | 1.4.7 | Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server | 1.4.7 | Service for serving one or more HBase regions.
hbase-client | 1.4.7 | HBase command-line client.
hbase-rest-server | 1.4.7 | Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server | 1.4.7 | Service providing a Thrift endpoint to HBase.
hcatalog-client | 2.3.3-amzn-2 | The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server | 2.3.3-amzn-2 | Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server | 2.3.3-amzn-2 | HTTP endpoint providing a REST interface to HCatalog.
hive-client | 2.3.3-amzn-2 | Hive command line client.
hive-hbase | 2.3.3-amzn-2 | Hive-hbase client.
hive-metastore-server | 2.3.3-amzn-2 | Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server2 | 2.3.3-amzn-2 | Service for accepting Hive queries as web requests.
hue-server | 4.2.0 | Web application for analyzing data using Hadoop ecosystem applications
jupyterhub | 0.8.1 | Multi-user server for Jupyter notebooks
livy-server | 0.5.0-incubating | REST interface for interacting with Apache Spark
nginx | 1.12.1 | nginx [engine x] is an HTTP and reverse proxy server
mahout-client | 0.13.0 | Library for machine learning.
mxnet | 1.2.0 | A flexible, scalable, and efficient library for deep learning.
mysql-server | 5.5.54+ | MySQL database server.
nvidia-cuda | 9.2.88 | Nvidia drivers and Cuda toolkit
oozie-client | 5.0.0 | Oozie command-line client.
oozie-server | 5.0.0 | Service for accepting Oozie workflow requests.
opencv | 3.4.0 | Open Source Computer Vision Library.
phoenix-library | 4.14.0-HBase-1.4 | The phoenix libraries for server and client
phoenix-query-server | 4.14.0-HBase-1.4 | A light weight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API
presto-coordinator | 0.210 | Service for accepting queries and managing query execution among presto-workers.
presto-worker | 0.210 | Service for executing pieces of a query.
pig-client | 0.17.0 | Pig command-line client.
r | 3.4.1 | The R Project for Statistical Computing
spark-client | 2.3.2 | Spark command-line clients.
spark-history-server | 2.3.2 | Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn | 2.3.2 | In-memory execution engine for YARN.
spark-yarn-slave | 2.3.2 | Apache Spark libraries needed by YARN slaves.
sqoop-client | 1.4.7 | Apache Sqoop command-line client.
tensorflow | 1.9.0 | TensorFlow open source software library for high performance numerical computation.
tez-on-yarn | 0.8.4 | The tez YARN application and libraries.
webserver | 2.4.25+ | Apache HTTP server.
zeppelin-server | 0.8.0 | Web-based notebook that enables interactive data analytics.
zookeeper-server | 3.4.12 | Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client | 3.4.12 | ZooKeeper command line client.
5.18.0 Configuration Classifications
Configuration classifications allow you to customize applications when you create a cluster. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.
emr-5.18.0 Classifications
Classifications Description
capacity-scheduler
Change values in Hadoop's capacity-scheduler.xml file.
container-log4j
Change values in Hadoop YARN's container-log4j.properties file.
core-site
Change values in Hadoop's core-site.xml file.
emrfs-site
Change EMRFS settings.
flink-conf
Change flink-conf.yaml settings.
flink-log4j
Change Flink log4j.properties settings.
flink-log4j-yarn-session
Change Flink log4j-yarn-session.properties settings.
flink-log4j-cli
Change Flink log4j-cli.properties settings.
hadoop-env
Change values in the Hadoop environment for all Hadoop components.
hadoop-log4j
Change values in Hadoop's log4j.properties file.
hadoop-ssl-server
Change Hadoop SSL server configuration.
hadoop-ssl-client
Change Hadoop SSL client configuration.
hbase
Amazon EMR-curated settings for Apache HBase.
hbase-env
Change values in HBase's environment.
hbase-log4j
Change values in HBase's hbase-log4j.properties file.
hbase-metrics
Change values in HBase's hadoop-metrics2-hbase.properties file.
hbase-policy
Change values in HBase's hbase-policy.xml file.
hbase-site
Change values in HBase's hbase-site.xml file.
hdfs-encryption-zones
Configure HDFS encryption zones.
hdfs-site
Change values in HDFS's hdfs-site.xml.
hcatalog-env
Change values in HCatalog's environment.
hcatalog-server-jndi
Change values in HCatalog's jndi.properties.
hcatalog-server-proto-hive-site
Change values in HCatalog's proto-hive-site.xml.
hcatalog-webhcat-env
Change values in HCatalog WebHCat's environment.
hcatalog-webhcat-log4j2
Change values in HCatalog WebHCat's log4j2.properties.
hcatalog-webhcat-site
Change values in HCatalog WebHCat's webhcat-site.xml file.
hive-beeline-log4j2
Change values in Hive's beeline-log4j2.properties file.
hive-parquet-logging
Change values in Hive's parquet-logging.properties file.
hive-env
Change values in the Hive environment.
hive-exec-log4j2
Change values in Hive's hive-exec-log4j2.properties file.
hive-llap-daemon-log4j2
Change values in Hive's llap-daemon-log4j2.properties file.
hive-log4j2
Change values in Hive's hive-log4j2.properties file.
hive-site
Change values in Hive's hive-site.xml file
hiveserver2-site
Change values in Hive Server2's hiveserver2-site.xml file
hue-ini
Change values in Hue's ini file
httpfs-env
Change values in the HTTPFS environment.
httpfs-site
Change values in Hadoop's httpfs-site.xml file.
hadoop-kms-acls
Change values in Hadoop's kms-acls.xml file.
hadoop-kms-env
Change values in the Hadoop KMS environment.
hadoop-kms-log4j
Change values in Hadoop's kms-log4j.properties file.
hadoop-kms-site
Change values in Hadoop's kms-site.xml file.
jupyter-notebook-conf
Change values in Jupyter Notebook's jupyter_notebook_config.py file.
jupyter-hub-conf
Change values in JupyterHub's jupyterhub_config.py file.
jupyter-s3-conf
Configure Jupyter Notebook S3 persistence.
jupyter-sparkmagic-conf
Change values in Sparkmagic's config.json file.
livy-conf
Change values in Livy's livy.conf file.
livy-env
Change values in the Livy environment.
livy-log4j
Change Livy log4j.properties settings.
mapred-env
Change values in the MapReduce application's environment.
mapred-site
Change values in the MapReduce application's mapred-site.xml file.
oozie-env
Change values in Oozie's environment.
oozie-log4j
Change values in Oozie's oozie-log4j.properties file.
oozie-site
Change values in Oozie's oozie-site.xml file.
phoenix-hbase-metrics
Change values in Phoenix's hadoop-metrics2-hbase.properties file.
phoenix-hbase-site
Change values in Phoenix's hbase-site.xml file.
phoenix-log4j
Change values in Phoenix's log4j.properties file.
phoenix-metrics
Change values in Phoenix's hadoop-metrics2-phoenix.properties file.
pig-env
Change values in the Pig environment.
pig-properties
Change values in Pig's pig.properties file.
pig-log4j
Change values in Pig's log4j.properties file.
presto-log
Change values in Presto's log.properties file.
presto-config
Change values in Presto's config.properties file.
presto-password-authenticator
Change values in Presto's password-authenticator.properties file.
presto-env
Change values in Presto's presto-env.sh file.
presto-node
Change values in Presto's node.properties file.
presto-connector-blackhole
Change values in Presto's blackhole.properties file.
presto-connector-cassandra
Change values in Presto's cassandra.properties file.
presto-connector-hive
Change values in Presto's hive.properties file.
presto-connector-jmx
Change values in Presto's jmx.properties file.
presto-connector-kafka
Change values in Presto's kafka.properties file.
presto-connector-localfile
Change values in Presto's localfile.properties file.
presto-connector-mongodb
Change values in Presto's mongodb.properties file.
presto-connector-mysql
Change values in Presto's mysql.properties file.
presto-connector-postgresql
Change values in Presto's postgresql.properties file.
presto-connector-raptor
Change values in Presto's raptor.properties file.
presto-connector-redis
Change values in Presto's redis.properties file.
presto-connector-redshift
Change values in Presto's redshift.properties file.
presto-connector-tpch
Change values in Presto's tpch.properties file.
spark
Amazon EMR-curated settings for Apache Spark.
spark-defaults
Change values in Spark's spark-defaults.conf file.
spark-env
Change values in the Spark environment.
spark-hive-site
Change values in Spark's hive-site.xml file
spark-log4j
Change values in Spark's log4j.properties file.
spark-metrics
Change values in Spark's metrics.properties file.
sqoop-env
Change values in Sqoop's environment.
sqoop-oraoop-site
Change values in Sqoop OraOop's oraoop-site.xml file.
sqoop-site
Change values in Sqoop's sqoop-site.xml file.
tez-site
Change values in Tez's tez-site.xml file.
yarn-env
Change values in the YARN environment.
yarn-site
Change values in YARN's yarn-site.xml file.
zeppelin-env
Change values in the Zeppelin environment.
zookeeper-config
Change values in ZooKeeper's zoo.cfg file.
zookeeper-log4j
Change values in ZooKeeper's log4j.properties file.
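Classification lists differ between releases. Comparing the list above with the emr-5.19.0 list earlier on this page, presto-connector-memory and presto-connector-tpcds appear only in 5.19.0. The sketch below shows one way to check a set of requested classifications against a release before creating a cluster. It covers only the Presto connector classifications, transcribed by hand from the two lists, and the function name unsupported_classifications is a hypothetical helper.

```python
# Presto connector names shared by the emr-5.18.0 and emr-5.19.0 lists.
_COMMON = {
    "blackhole", "cassandra", "hive", "jmx", "kafka", "localfile", "mongodb",
    "mysql", "postgresql", "raptor", "redis", "redshift", "tpch",
}

# Assumption: only these two releases and only Presto connectors are modeled.
PRESTO_CONNECTORS = {
    "emr-5.18.0": {f"presto-connector-{name}" for name in _COMMON},
    "emr-5.19.0": {
        f"presto-connector-{name}" for name in _COMMON | {"memory", "tpcds"}
    },
}


def unsupported_classifications(release_label, requested):
    """Return requested Presto connector classifications missing from a release."""
    supported = PRESTO_CONNECTORS[release_label]
    return sorted(set(requested) - supported)


requested = [
    "presto-connector-hive",
    "presto-connector-memory",
    "presto-connector-tpcds",
]
print(unsupported_classifications("emr-5.18.0", requested))
print(unsupported_classifications("emr-5.19.0", requested))
```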
- 5.17.0
Amazon EMR Release 5.17.0
The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, JupyterHub, Livy, Mahout, MXNet, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, TensorFlow, Tez, Zeppelin, and ZooKeeper.
The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.
For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:
The following release notes include information for Amazon EMR release version 5.17.0. Changes are relative to 5.16.0.
Initial release date: August 30, 2018
Upgrades
- Flink 1.5.2
- HBase 1.4.6
- Presto 0.206
New Features
- Added support for TensorFlow. For more information, see TensorFlow.
Changes, Enhancements, and Resolved Issues
- JupyterHub
  - Added support for notebook persistence in Amazon S3. For more information, see Configuring Persistence for Notebooks in Amazon S3.
- Spark
  - Added support for S3 Select. For more information, see Using S3 Select with Spark to Improve Query Performance.
Known Issues
- When you create a kerberized cluster with Livy installed, Livy fails with an error that simple authentication is not enabled. Rebooting the Livy server resolves the issue. As a workaround, add a step during cluster creation that runs sudo restart livy-server on the master node.
- If you use a custom Amazon Linux AMI based on an Amazon Linux AMI with a creation date of 2018-08-11, the Oozie server fails to start. If you use Oozie, create a custom AMI based on an Amazon Linux AMI ID with a different creation date. You can use the following AWS CLI command to return a list of Image IDs for all HVM Amazon Linux AMIs with a 2018.03 version, along with the release date, so that you can choose an appropriate Amazon Linux AMI as your base. Replace MyRegion with your region identifier, such as us-west-2.
  aws ec2 --region MyRegion describe-images --owner amazon --query 'Images[?Name!=`null`]|[?starts_with(Name, `amzn-ami-hvm-2018.03`) == `true`].[CreationDate,ImageId,Name]' --output text | sort -rk1
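The same selection can be sketched in Python. The filter below mirrors the CLI query (named images whose name starts with amzn-ami-hvm-2018.03, newest first) and, as an added assumption not present in the CLI command, also drops images created on 2018-08-11. The function names and the sample records, including the image IDs, are made up for illustration. fetch_amazon_images uses the boto3 describe_images call and is defined but not invoked here.

```python
def select_base_amis(images, name_prefix="amzn-ami-hvm-2018.03",
                     excluded_date="2018-08-11"):
    """Return (CreationDate, ImageId, Name) rows, newest first."""
    rows = [
        (image["CreationDate"], image["ImageId"], image["Name"])
        for image in images
        # Skip images with no name, as the CLI query does.
        if image.get("Name") and image["Name"].startswith(name_prefix)
    ]
    # Assumption: exclude the creation date named in the known issue.
    rows = [row for row in rows if not row[0].startswith(excluded_date)]
    return sorted(rows, reverse=True)


def fetch_amazon_images(region):
    """Fetch Amazon-owned images; requires boto3 and AWS credentials."""
    import boto3  # third-party, imported lazily so the filter works without it

    ec2 = boto3.client("ec2", region_name=region)
    return ec2.describe_images(Owners=["amazon"])["Images"]


# Made-up sample records in the shape returned by describe_images.
sample = [
    {"CreationDate": "2018-08-11T02:30:11.000Z",
     "ImageId": "ami-00000000000000001",
     "Name": "amzn-ami-hvm-2018.03.0.20180811-x86_64-gp2"},
    {"CreationDate": "2018-06-22T22:26:53.000Z",
     "ImageId": "ami-00000000000000002",
     "Name": "amzn-ami-hvm-2018.03.0.20180622-x86_64-gp2"},
    {"CreationDate": "2018-11-28T21:08:11.000Z",
     "ImageId": "ami-00000000000000003",
     "Name": "amzn-ami-hvm-2018.03.0.20181116-x86_64-gp2"},
    {"CreationDate": "2018-01-15T19:13:58.000Z",
     "ImageId": "ami-00000000000000004",
     "Name": "amzn-ami-hvm-2017.09.1.20180115-x86_64-gp2"},
    {"CreationDate": "2018-09-01T00:00:00.000Z",
     "ImageId": "ami-00000000000000005"},
]

for row in select_base_amis(sample):
    print("\t".join(row))
```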
The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.
Some components need changes from community versions for Amazon EMR. These components have a version label in the form CommunityVersion-amzn-EmrVersion. For example, if a big-data community component named myapp-component of version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-3.
Component | Version | Description
aws-sagemaker-spark-sdk | 1.1.3 | Amazon SageMaker Spark SDK
emr-ddb | 4.6.0 | Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies | 2.5.0 | Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis | 3.4.0 | Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp | 2.10.0 | Distributed copy application optimized for Amazon S3.
emr-s3-select | 1.0.0 | EMR S3Select Connector
emrfs | 2.26.0 | Amazon S3 connector for Hadoop ecosystem applications.
flink-client | 1.5.2 | Apache Flink command line client scripts and applications.
ganglia-monitor | 3.7.2 | Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector | 3.7.2 | Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web | 3.7.1 | Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client | 2.8.4-amzn-1 | Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode | 2.8.4-amzn-1 | HDFS node-level service for storing blocks.
hadoop-hdfs-library | 2.8.4-amzn-1 | HDFS command-line client and library
hadoop-hdfs-namenode | 2.8.4-amzn-1 | HDFS service for tracking file names and block locations.
hadoop-httpfs-server | 2.8.4-amzn-1 | HTTP endpoint for HDFS operations.
hadoop-kms-server | 2.8.4-amzn-1 | Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred | 2.8.4-amzn-1 | MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager | 2.8.4-amzn-1 | YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager | 2.8.4-amzn-1 | YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server | 2.8.4-amzn-1 | Service for retrieving current and historical information for YARN applications.
hbase-hmaster | 1.4.6 | Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server | 1.4.6 | Service for serving one or more HBase regions.
hbase-client | 1.4.6 | HBase command-line client.
hbase-rest-server | 1.4.6 | Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server | 1.4.6 | Service providing a Thrift endpoint to HBase.
hcatalog-client | 2.3.3-amzn-1 | The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server | 2.3.3-amzn-1 | Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server | 2.3.3-amzn-1 | HTTP endpoint providing a REST interface to HCatalog.
hive-client | 2.3.3-amzn-1 | Hive command line client.
hive-hbase | 2.3.3-amzn-1 | Hive-hbase client.
hive-metastore-server | 2.3.3-amzn-1 | Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server2 | 2.3.3-amzn-1 | Service for accepting Hive queries as web requests.
hue-server | 4.2.0 | Web application for analyzing data using Hadoop ecosystem applications
jupyterhub | 0.8.1 | Multi-user server for Jupyter notebooks
livy-server | 0.5.0-incubating | REST interface for interacting with Apache Spark
mahout-client | 0.13.0 | Library for machine learning.
mxnet | 1.2.0 | A flexible, scalable, and efficient library for deep learning.
mysql-server | 5.5.54+ | MySQL database server.
nvidia-cuda | 9.2.88 | Nvidia drivers and Cuda toolkit
oozie-client | 5.0.0 | Oozie command-line client.
oozie-server | 5.0.0 | Service for accepting Oozie workflow requests.
opencv | 3.4.0 | Open Source Computer Vision Library.
phoenix-library | 4.14.0-HBase-1.4 | The phoenix libraries for server and client
phoenix-query-server | 4.14.0-HBase-1.4 | A light weight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API
presto-coordinator | 0.206 | Service for accepting queries and managing query execution among presto-workers.
presto-worker | 0.206 | Service for executing pieces of a query.
pig-client | 0.17.0 | Pig command-line client.
r | 3.4.1 | The R Project for Statistical Computing
spark-client | 2.3.1 | Spark command-line clients.
spark-history-server | 2.3.1 | Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn | 2.3.1 | In-memory execution engine for YARN.
spark-yarn-slave | 2.3.1 | Apache Spark libraries needed by YARN slaves.
sqoop-client | 1.4.7 | Apache Sqoop command-line client.
tensorflow | 1.9.0 | TensorFlow open source software library for high performance numerical computation.
tez-on-yarn | 0.8.4 | The tez YARN application and libraries.
webserver | 2.4.25+ | Apache HTTP server.
zeppelin-server | 0.7.3 | Web-based notebook that enables interactive data analytics.
zookeeper-server | 3.4.12 | Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client | 3.4.12 | ZooKeeper command line client.
5.17.0 Configuration Classifications
Configuration classifications allow you to customize applications when you create a cluster. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.
emr-5.17.0 Classifications
Classifications Description
capacity-scheduler
Change values in Hadoop's capacity-scheduler.xml file.
container-log4j
Change values in Hadoop YARN's container-log4j.properties file.
core-site
Change values in Hadoop's core-site.xml file.
emrfs-site
Change EMRFS settings.
flink-conf
Change flink-conf.yaml settings.
flink-log4j
Change Flink log4j.properties settings.
flink-log4j-yarn-session
Change Flink log4j-yarn-session.properties settings.
flink-log4j-cli
Change Flink log4j-cli.properties settings.
hadoop-env
Change values in the Hadoop environment for all Hadoop components.
hadoop-log4j
Change values in Hadoop's log4j.properties file.
hadoop-ssl-server
Change Hadoop SSL server configuration.
hadoop-ssl-client
Change Hadoop SSL client configuration.
hbase
Amazon EMR-curated settings for Apache HBase.
hbase-env
Change values in HBase's environment.
hbase-log4j
Change values in HBase's hbase-log4j.properties file.
hbase-metrics
Change values in HBase's hadoop-metrics2-hbase.properties file.
hbase-policy
Change values in HBase's hbase-policy.xml file.
hbase-site
Change values in HBase's hbase-site.xml file.
hdfs-encryption-zones
Configure HDFS encryption zones.
hdfs-site
Change values in HDFS's hdfs-site.xml.
hcatalog-env
Change values in HCatalog's environment.
hcatalog-server-jndi
Change values in HCatalog's jndi.properties.
hcatalog-server-proto-hive-site
Change values in HCatalog's proto-hive-site.xml.
hcatalog-webhcat-env
Change values in HCatalog WebHCat's environment.
hcatalog-webhcat-log4j2
Change values in HCatalog WebHCat's log4j2.properties.
hcatalog-webhcat-site
Change values in HCatalog WebHCat's webhcat-site.xml file.
hive-beeline-log4j2
Change values in Hive's beeline-log4j2.properties file.
hive-parquet-logging
Change values in Hive's parquet-logging.properties file.
hive-env
Change values in the Hive environment.
hive-exec-log4j2
Change values in Hive's hive-exec-log4j2.properties file.
hive-llap-daemon-log4j2
Change values in Hive's llap-daemon-log4j2.properties file.
hive-log4j2
Change values in Hive's hive-log4j2.properties file.
hive-site
Change values in Hive's hive-site.xml file.
hiveserver2-site
Change values in HiveServer2's hiveserver2-site.xml file.
hue-ini
Change values in Hue's ini file.
httpfs-env
Change values in the HTTPFS environment.
httpfs-site
Change values in Hadoop's httpfs-site.xml file.
hadoop-kms-acls
Change values in Hadoop's kms-acls.xml file.
hadoop-kms-env
Change values in the Hadoop KMS environment.
hadoop-kms-log4j
Change values in Hadoop's kms-log4j.properties file.
hadoop-kms-site
Change values in Hadoop's kms-site.xml file.
jupyter-notebook-conf
Change values in Jupyter Notebook's jupyter_notebook_config.py file.
jupyter-hub-conf
Change values in JupyterHub's jupyterhub_config.py file.
jupyter-s3-conf
Configure Jupyter Notebook S3 persistence.
jupyter-sparkmagic-conf
Change values in Sparkmagic's config.json file.
livy-conf
Change values in Livy's livy.conf file.
livy-env
Change values in the Livy environment.
livy-log4j
Change Livy log4j.properties settings.
mapred-env
Change values in the MapReduce application's environment.
mapred-site
Change values in the MapReduce application's mapred-site.xml file.
oozie-env
Change values in Oozie's environment.
oozie-log4j
Change values in Oozie's oozie-log4j.properties file.
oozie-site
Change values in Oozie's oozie-site.xml file.
phoenix-hbase-metrics
Change values in Phoenix's hadoop-metrics2-hbase.properties file.
phoenix-hbase-site
Change values in Phoenix's hbase-site.xml file.
phoenix-log4j
Change values in Phoenix's log4j.properties file.
phoenix-metrics
Change values in Phoenix's hadoop-metrics2-phoenix.properties file.
pig-env
Change values in the Pig environment.
pig-properties
Change values in Pig's pig.properties file.
pig-log4j
Change values in Pig's log4j.properties file.
presto-log
Change values in Presto's log.properties file.
presto-config
Change values in Presto's config.properties file.
presto-password-authenticator
Change values in Presto's password-authenticator.properties file.
presto-env
Change values in Presto's presto-env.sh file.
presto-node
Change values in Presto's node.properties file.
presto-connector-blackhole
Change values in Presto's blackhole.properties file.
presto-connector-cassandra
Change values in Presto's cassandra.properties file.
presto-connector-hive
Change values in Presto's hive.properties file.
presto-connector-jmx
Change values in Presto's jmx.properties file.
presto-connector-kafka
Change values in Presto's kafka.properties file.
presto-connector-localfile
Change values in Presto's localfile.properties file.
presto-connector-mongodb
Change values in Presto's mongodb.properties file.
presto-connector-mysql
Change values in Presto's mysql.properties file.
presto-connector-postgresql
Change values in Presto's postgresql.properties file.
presto-connector-raptor
Change values in Presto's raptor.properties file.
presto-connector-redis
Change values in Presto's redis.properties file.
presto-connector-redshift
Change values in Presto's redshift.properties file.
presto-connector-tpch
Change values in Presto's tpch.properties file.
spark
Amazon EMR-curated settings for Apache Spark.
spark-defaults
Change values in Spark's spark-defaults.conf file.
spark-env
Change values in the Spark environment.
spark-hive-site
Change values in Spark's hive-site.xml file.
spark-log4j
Change values in Spark's log4j.properties file.
spark-metrics
Change values in Spark's metrics.properties file.
sqoop-env
Change values in Sqoop's environment.
sqoop-oraoop-site
Change values in Sqoop OraOop's oraoop-site.xml file.
sqoop-site
Change values in Sqoop's sqoop-site.xml file.
tez-site
Change values in Tez's tez-site.xml file.
yarn-env
Change values in the YARN environment.
yarn-site
Change values in YARN's yarn-site.xml file.
zeppelin-env
Change values in the Zeppelin environment.
zookeeper-config
Change values in ZooKeeper's zoo.cfg file.
zookeeper-log4j
Change values in ZooKeeper's log4j.properties file.
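As a minimal sketch of how classifications from the table above are typically expressed, the snippet below builds a configuration list in which each entry pairs a `Classification` name with a `Properties` map. The helper name `make_classification` is hypothetical, and the property names and values shown (`hive.exec.dynamic.partition.mode`, `spark.executor.memory`) are illustrative assumptions rather than settings taken from this page.

```python
import json


def make_classification(name, properties):
    """Return one configuration object for a single classification."""
    # Property values are written as strings in the configuration JSON.
    return {
        "Classification": name,
        "Properties": {key: str(value) for key, value in properties.items()},
    }


# Illustrative values only; choose properties that suit your workload.
configurations = [
    make_classification("hive-site", {"hive.exec.dynamic.partition.mode": "nonstrict"}),
    make_classification("spark-defaults", {"spark.executor.memory": "4g"}),
]

# Serialize to JSON, for example to save as a file used at cluster creation.
configurations_json = json.dumps(configurations, indent=2)
print(configurations_json)
```

The resulting JSON could be saved to a file and supplied when the cluster is created, for example through the `--configurations` option of `aws emr create-cluster`.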
- 5.16.0
Amazon EMR Release 5.16.0
The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, JupyterHub, Livy, Mahout, MXNet, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, Tez, Zeppelin, and ZooKeeper.
The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.
For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:
The following release notes include information for Amazon EMR release version 5.16.0. Changes are relative to 5.15.0.
Initial release date: July 19, 2018
Upgrades
- Hadoop 2.8.4
- Flink 1.5.0
- Livy 0.5.0
- MXNet 1.2.0
- Phoenix 4.14.0
- Presto 0.203
- Spark 2.3.1
- AWS SDK for Java 1.11.336
- CUDA 9.2
- Redshift JDBC Driver 1.2.15.1025
Changes, Enhancements, and Resolved Issues
- HBase
  - Backported HBASE-20723.
- Presto
  - Configuration changes to support LDAP authentication. For more information, see Using LDAP Authentication for Presto on Amazon EMR.
- Spark
  - Apache Spark version 2.3.1, available beginning with Amazon EMR release version 5.16.0, addresses CVE-2018-8024 and CVE-2018-1334. We recommend that you migrate earlier versions of Spark to Spark version 2.3.1 or later.
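The Presto LDAP change noted above pairs with the presto-password-authenticator classification, which appears in the classification table for this release. The following is a hedged sketch of what such a configuration object might look like. The LDAP host and bind pattern are hypothetical placeholders, the property names are the ones Presto uses in its password-authenticator.properties file, and a working setup will likely need further settings described in the linked LDAP topic.

```python
import json

# Hypothetical LDAP endpoint and bind pattern; replace with your directory's values.
LDAP_URL = "ldaps://ldap.example.com:636"
USER_BIND_PATTERN = "uid=${USER},ou=people,dc=example,dc=com"

presto_ldap_configuration = [
    {
        # Classification name taken from the emr-5.16.0 classification table.
        "Classification": "presto-password-authenticator",
        "Properties": {
            "password-authenticator.name": "ldap",
            "ldap.url": LDAP_URL,
            "ldap.user-bind-pattern": USER_BIND_PATTERN,
        },
    }
]

print(json.dumps(presto_ldap_configuration, indent=2))
```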
Known Issues
- This release version does not support the c1.medium or m1.small instance types. Clusters using either of these instance types fail to start. As a workaround, specify a different instance type or use a different release version.
- When you create a Kerberized cluster with Livy installed, Livy fails with an error that simple authentication is not enabled. Rebooting the Livy server resolves the issue. As a workaround, add a step during cluster creation that runs sudo restart livy-server on the master node.
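The Livy workaround above can be sketched as a step definition. This is a minimal example, assuming the step is submitted through an API that accepts the `Name` / `ActionOnFailure` / `HadoopJarStep` structure (for instance the `Steps` parameter of boto3's `run_job_flow`) and that `command-runner.jar` is used to run the shell command. The step name and the choice of `CONTINUE` are illustrative assumptions.

```python
import json


def livy_restart_step():
    """Build a step definition that restarts Livy on the master node."""
    return {
        "Name": "Restart Livy server",  # hypothetical step name
        "ActionOnFailure": "CONTINUE",  # assumption: do not stop the cluster if the restart fails
        "HadoopJarStep": {
            # Assumption: command-runner.jar runs the shell command as a step.
            "Jar": "command-runner.jar",
            "Args": ["bash", "-c", "sudo restart livy-server"],
        },
    }


step = livy_restart_step()
print(json.dumps(step, indent=2))
```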
The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.

Some components need changes from community versions for Amazon EMR. These components have a version label in the form CommunityVersion-amzn-EmrVersion. For example, if a big-data community component named myapp-component of version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-3.

| Component | Version | Description |
| --- | --- | --- |
| aws-sagemaker-spark-sdk | 1.1.0 | Amazon SageMaker Spark SDK |
| emr-ddb | 4.6.0 | Amazon DynamoDB connector for Hadoop ecosystem applications. |
| emr-goodies | 2.4.0 | Extra convenience libraries for the Hadoop ecosystem. |
| emr-kinesis | 3.4.0 | Amazon Kinesis connector for Hadoop ecosystem applications. |
| emr-s3-dist-cp | 2.10.0 | Distributed copy application optimized for Amazon S3. |
| emrfs | 2.25.0 | Amazon S3 connector for Hadoop ecosystem applications. |
| flink-client | 1.5.0 | Apache Flink command line client scripts and applications. |
| ganglia-monitor | 3.7.2 | Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent. |
| ganglia-metadata-collector | 3.7.2 | Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents. |
| ganglia-web | 3.7.1 | Web application for viewing metrics collected by the Ganglia metadata collector. |
| hadoop-client | 2.8.4-amzn-0 | Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'. |
| hadoop-hdfs-datanode | 2.8.4-amzn-0 | HDFS node-level service for storing blocks. |
| hadoop-hdfs-library | 2.8.4-amzn-0 | HDFS command-line client and library |
| hadoop-hdfs-namenode | 2.8.4-amzn-0 | HDFS service for tracking file names and block locations. |
| hadoop-httpfs-server | 2.8.4-amzn-0 | HTTP endpoint for HDFS operations. |
| hadoop-kms-server | 2.8.4-amzn-0 | Cryptographic key management server based on Hadoop's KeyProvider API. |
| hadoop-mapred | 2.8.4-amzn-0 | MapReduce execution engine libraries for running a MapReduce application. |
| hadoop-yarn-nodemanager | 2.8.4-amzn-0 | YARN service for managing containers on an individual node. |
| hadoop-yarn-resourcemanager | 2.8.4-amzn-0 | YARN service for allocating and managing cluster resources and distributed applications. |
| hadoop-yarn-timeline-server | 2.8.4-amzn-0 | Service for retrieving current and historical information for YARN applications. |
| hbase-hmaster | 1.4.4 | Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands. |
| hbase-region-server | 1.4.4 | Service for serving one or more HBase regions. |
| hbase-client | 1.4.4 | HBase command-line client. |
| hbase-rest-server | 1.4.4 | Service providing a RESTful HTTP endpoint for HBase. |
| hbase-thrift-server | 1.4.4 | Service providing a Thrift endpoint to HBase. |
| hcatalog-client | 2.3.3-amzn-1 | The 'hcat' command line client for manipulating hcatalog-server. |
| hcatalog-server | 2.3.3-amzn-1 | Service providing HCatalog, a table and storage management layer for distributed applications. |
| hcatalog-webhcat-server | 2.3.3-amzn-1 | HTTP endpoint providing a REST interface to HCatalog. |
| hive-client | 2.3.3-amzn-1 | Hive command line client. |
| hive-hbase | 2.3.3-amzn-1 | Hive-hbase client. |
| hive-metastore-server | 2.3.3-amzn-1 | Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations. |
| hive-server2 | 2.3.3-amzn-1 | Service for accepting Hive queries as web requests. |
| hue-server | 4.2.0 | Web application for analyzing data using Hadoop ecosystem applications |
| jupyterhub | 0.8.1 | Multi-user server for Jupyter notebooks |
| livy-server | 0.5.0-incubating | REST interface for interacting with Apache Spark |
| mahout-client | 0.13.0 | Library for machine learning. |
| mxnet | 1.2.0 | A flexible, scalable, and efficient library for deep learning. |
| mysql-server | 5.5.54+ | MySQL database server. |
| nvidia-cuda | 9.2.88 | Nvidia drivers and Cuda toolkit |
| oozie-client | 5.0.0 | Oozie command-line client. |
| oozie-server | 5.0.0 | Service for accepting Oozie workflow requests. |
| opencv | 3.4.0 | Open Source Computer Vision Library. |
| phoenix-library | 4.14.0-HBase-1.4 | The phoenix libraries for server and client |
| phoenix-query-server | 4.14.0-HBase-1.4 | A light weight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API |
| presto-coordinator | 0.203 | Service for accepting queries and managing query execution among presto-workers. |
| presto-worker | 0.203 | Service for executing pieces of a query. |
| pig-client | 0.17.0 | Pig command-line client. |
| r | 3.4.1 | The R Project for Statistical Computing |
| spark-client | 2.3.1 | Spark command-line clients. |
| spark-history-server | 2.3.1 | Web UI for viewing logged events for the lifetime of a completed Spark application. |
| spark-on-yarn | 2.3.1 | In-memory execution engine for YARN. |
| spark-yarn-slave | 2.3.1 | Apache Spark libraries needed by YARN slaves. |
| sqoop-client | 1.4.7 | Apache Sqoop command-line client. |
| tez-on-yarn | 0.8.4 | The tez YARN application and libraries. |
| webserver | 2.4.25+ | Apache HTTP server. |
| zeppelin-server | 0.7.3 | Web-based notebook that enables interactive data analytics. |
| zookeeper-server | 3.4.12 | Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. |
| zookeeper-client | 3.4.12 | ZooKeeper command line client. |

5.16.0 Configuration Classifications
Configuration classifications allow you to customize applications when you create a cluster. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.
emr-5.16.0 Classifications
Classifications Description
capacity-scheduler
Change values in Hadoop's capacity-scheduler.xml file.
container-log4j
Change values in Hadoop YARN's container-log4j.properties file.
core-site
Change values in Hadoop's core-site.xml file.
emrfs-site
Change EMRFS settings.
flink-conf
Change flink-conf.yaml settings.
flink-log4j
Change Flink log4j.properties settings.
flink-log4j-yarn-session
Change Flink log4j-yarn-session.properties settings.
flink-log4j-cli
Change Flink log4j-cli.properties settings.
hadoop-env
Change values in the Hadoop environment for all Hadoop components.
hadoop-log4j
Change values in Hadoop's log4j.properties file.
hadoop-ssl-server
Change the Hadoop SSL server configuration.
hadoop-ssl-client
Change the Hadoop SSL client configuration.
hbase
Amazon EMR-curated settings for Apache HBase.
hbase-env
Change values in HBase's environment.
hbase-log4j
Change values in HBase's hbase-log4j.properties file.
hbase-metrics
Change values in HBase's hadoop-metrics2-hbase.properties file.
hbase-policy
Change values in HBase's hbase-policy.xml file.
hbase-site
Change values in HBase's hbase-site.xml file.
hdfs-encryption-zones
Configure HDFS encryption zones.
hdfs-site
Change values in HDFS's hdfs-site.xml.
hcatalog-env
Change values in HCatalog's environment.
hcatalog-server-jndi
Change values in HCatalog's jndi.properties.
hcatalog-server-proto-hive-site
Change values in HCatalog's proto-hive-site.xml.
hcatalog-webhcat-env
Change values in HCatalog WebHCat's environment.
hcatalog-webhcat-log4j2
Change values in HCatalog WebHCat's log4j2.properties.
hcatalog-webhcat-site
Change values in HCatalog WebHCat's webhcat-site.xml file.
hive-beeline-log4j2
Change values in Hive's beeline-log4j2.properties file.
hive-parquet-logging
Change values in Hive's parquet-logging.properties file.
hive-env
Change values in the Hive environment.
hive-exec-log4j2
Change values in Hive's hive-exec-log4j2.properties file.
hive-llap-daemon-log4j2
Change values in Hive's llap-daemon-log4j2.properties file.
hive-log4j2
Change values in Hive's hive-log4j2.properties file.
hive-site
Change values in Hive's hive-site.xml file.
hiveserver2-site
Change values in HiveServer2's hiveserver2-site.xml file.
hue-ini
Change values in Hue's ini file.
httpfs-env
Change values in the HTTPFS environment.
httpfs-site
Change values in Hadoop's httpfs-site.xml file.
hadoop-kms-acls
Change values in Hadoop's kms-acls.xml file.
hadoop-kms-env
Change values in the Hadoop KMS environment.
hadoop-kms-log4j
Change values in Hadoop's kms-log4j.properties file.
hadoop-kms-site
Change values in Hadoop's kms-site.xml file.
jupyter-notebook-conf
Change values in Jupyter Notebook's jupyter_notebook_config.py file.
jupyter-hub-conf
Change values in JupyterHub's jupyterhub_config.py file.
jupyter-sparkmagic-conf
Change values in Sparkmagic's config.json file.
livy-conf
Change values in Livy's livy.conf file.
livy-env
Change values in the Livy environment.
livy-log4j
Change Livy log4j.properties settings.
mapred-env
Change values in the MapReduce application's environment.
mapred-site
Change values in the MapReduce application's mapred-site.xml file.
oozie-env
Change values in Oozie's environment.
oozie-log4j
Change values in Oozie's oozie-log4j.properties file.
oozie-site
Change values in Oozie's oozie-site.xml file.
phoenix-hbase-metrics
Change values in Phoenix's hadoop-metrics2-hbase.properties file.
phoenix-hbase-site
Change values in Phoenix's hbase-site.xml file.
phoenix-log4j
Change values in Phoenix's log4j.properties file.
phoenix-metrics
Change values in Phoenix's hadoop-metrics2-phoenix.properties file.
pig-env
Change values in the Pig environment.
pig-properties
Change values in Pig's pig.properties file.
pig-log4j
Change values in Pig's log4j.properties file.
presto-log
Change values in Presto's log.properties file.
presto-config
Change values in Presto's config.properties file.
presto-password-authenticator
Change values in Presto's password-authenticator.properties file.
presto-env
Change values in Presto's presto-env.sh file.
presto-node
Change values in Presto's node.properties file.
presto-connector-blackhole
Change values in Presto's blackhole.properties file.
presto-connector-cassandra
Change values in Presto's cassandra.properties file.
presto-connector-hive
Change values in Presto's hive.properties file.
presto-connector-jmx
Change values in Presto's jmx.properties file.
presto-connector-kafka
Change values in Presto's kafka.properties file.
presto-connector-localfile
Change values in Presto's localfile.properties file.
presto-connector-mongodb
Change values in Presto's mongodb.properties file.
presto-connector-mysql
Change values in Presto's mysql.properties file.
presto-connector-postgresql
Change values in Presto's postgresql.properties file.
presto-connector-raptor
Change values in Presto's raptor.properties file.
presto-connector-redis
Change values in Presto's redis.properties file.
presto-connector-redshift
Change values in Presto's redshift.properties file.
presto-connector-tpch
Change values in Presto's tpch.properties file.
spark
Amazon EMR-curated settings for Apache Spark.
spark-defaults
Change values in Spark's spark-defaults.conf file.
spark-env
Change values in the Spark environment.
spark-hive-site
Change values in Spark's hive-site.xml file.
spark-log4j
Change values in Spark's log4j.properties file.
spark-metrics
Change values in Spark's metrics.properties file.
sqoop-env
Change values in Sqoop's environment.
sqoop-oraoop-site
Change values in Sqoop OraOop's oraoop-site.xml file.
sqoop-site
Change values in Sqoop's sqoop-site.xml file.
tez-site
Change values in Tez's tez-site.xml file.
yarn-env
Change values in the YARN environment.
yarn-site
Change values in YARN's yarn-site.xml file.
zeppelin-env
Change values in the Zeppelin environment.
zookeeper-config
Change values in ZooKeeper's zoo.cfg file.
zookeeper-log4j
Change values in ZooKeeper's log4j.properties file.
- 5.15.0
Amazon EMR Release 5.15.0
The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, JupyterHub, Livy, Mahout, MXNet, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, Tez, Zeppelin, and ZooKeeper.
The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.
For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:
The following release notes include information for Amazon EMR release version 5.15.0. Changes are relative to 5.14.0.
Initial release date: June 21, 2018
Upgrades
- Upgraded HBase to 1.4.4
- Upgraded Hive to 2.3.3
- Upgraded Hue to 4.2.0
- Upgraded Oozie to 5.0.0
- Upgraded ZooKeeper to 3.4.12
- Upgraded AWS SDK to 1.11.333
Changes, Enhancements, and Resolved Issues
- Hive
  - Backported HIVE-18069.
- Hue
  - Updated Hue to correctly authenticate with Livy when Kerberos is enabled. Livy is now supported when using Kerberos with Amazon EMR.
- JupyterHub
  - Updated JupyterHub so that Amazon EMR installs LDAP client libraries by default.
  - Fixed an error in the script that generates self-signed certificates. For more information about the issue, see Release Notes.
Known Issues
- This release version does not support the c1.medium or m1.small instance types. Clusters using either of these instance types fail to start. As a workaround, specify a different instance type or use a different release version.
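Because clusters that use an unsupported instance type fail to start, it can help to check the requested types before launching. The sketch below is one possible pre-flight check. The function name `unsupported_types` and the lookup table are hypothetical, and the table holds only the restrictions stated in the known issues for releases 5.15.0 and 5.16.0.

```python
# Instance types that the known issues on this page list as unsupported,
# keyed by release label. Other releases are not covered by this sketch.
UNSUPPORTED_INSTANCE_TYPES = {
    "emr-5.15.0": {"c1.medium", "m1.small"},
    "emr-5.16.0": {"c1.medium", "m1.small"},
}


def unsupported_types(release_label, instance_types):
    """Return the requested instance types that the release does not support."""
    blocked = UNSUPPORTED_INSTANCE_TYPES.get(release_label, set())
    return sorted(set(instance_types) & blocked)


problems = unsupported_types("emr-5.15.0", ["m4.large", "m1.small"])
print(problems)  # ['m1.small']
```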
The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.

Some components need changes from community versions for Amazon EMR. These components have a version label in the form CommunityVersion-amzn-EmrVersion. For example, if a big-data community component named myapp-component of version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-3.

| Component | Version | Description |
| --- | --- | --- |
| aws-sagemaker-spark-sdk | 1.0.1 | Amazon SageMaker Spark SDK |
| emr-ddb | 4.5.0 | Amazon DynamoDB connector for Hadoop ecosystem applications. |
| emr-goodies | 2.4.0 | Extra convenience libraries for the Hadoop ecosystem. |
| emr-kinesis | 3.4.0 | Amazon Kinesis connector for Hadoop ecosystem applications. |
| emr-s3-dist-cp | 2.10.0 | Distributed copy application optimized for Amazon S3. |
| emrfs | 2.24.0 | Amazon S3 connector for Hadoop ecosystem applications. |
| flink-client | 1.4.2 | Apache Flink command line client scripts and applications. |
| ganglia-monitor | 3.7.2 | Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent. |
| ganglia-metadata-collector | 3.7.2 | Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents. |
| ganglia-web | 3.7.1 | Web application for viewing metrics collected by the Ganglia metadata collector. |
| hadoop-client | 2.8.3-amzn-1 | Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'. |
| hadoop-hdfs-datanode | 2.8.3-amzn-1 | HDFS node-level service for storing blocks. |
| hadoop-hdfs-library | 2.8.3-amzn-1 | HDFS command-line client and library |
| hadoop-hdfs-namenode | 2.8.3-amzn-1 | HDFS service for tracking file names and block locations. |
| hadoop-httpfs-server | 2.8.3-amzn-1 | HTTP endpoint for HDFS operations. |
| hadoop-kms-server | 2.8.3-amzn-1 | Cryptographic key management server based on Hadoop's KeyProvider API. |
| hadoop-mapred | 2.8.3-amzn-1 | MapReduce execution engine libraries for running a MapReduce application. |
| hadoop-yarn-nodemanager | 2.8.3-amzn-1 | YARN service for managing containers on an individual node. |
| hadoop-yarn-resourcemanager | 2.8.3-amzn-1 | YARN service for allocating and managing cluster resources and distributed applications. |
| hadoop-yarn-timeline-server | 2.8.3-amzn-1 | Service for retrieving current and historical information for YARN applications. |
| hbase-hmaster | 1.4.4 | Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands. |
| hbase-region-server | 1.4.4 | Service for serving one or more HBase regions. |
| hbase-client | 1.4.4 | HBase command-line client. |
| hbase-rest-server | 1.4.4 | Service providing a RESTful HTTP endpoint for HBase. |
| hbase-thrift-server | 1.4.4 | Service providing a Thrift endpoint to HBase. |
| hcatalog-client | 2.3.3-amzn-0 | The 'hcat' command line client for manipulating hcatalog-server. |
| hcatalog-server | 2.3.3-amzn-0 | Service providing HCatalog, a table and storage management layer for distributed applications. |
| hcatalog-webhcat-server | 2.3.3-amzn-0 | HTTP endpoint providing a REST interface to HCatalog. |
| hive-client | 2.3.3-amzn-0 | Hive command line client. |
| hive-hbase | 2.3.3-amzn-0 | Hive-hbase client. |
| hive-metastore-server | 2.3.3-amzn-0 | Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations. |
| hive-server2 | 2.3.3-amzn-0 | Service for accepting Hive queries as web requests. |
| hue-server | 4.2.0 | Web application for analyzing data using Hadoop ecosystem applications |
| jupyterhub | 0.8.1 | Multi-user server for Jupyter notebooks |
| livy-server | 0.4.0-incubating | REST interface for interacting with Apache Spark |
| mahout-client | 0.13.0 | Library for machine learning. |
| mxnet | 1.1.0 | A flexible, scalable, and efficient library for deep learning. |
| mysql-server | 5.5.54+ | MySQL database server. |
| nvidia-cuda | 9.1.85 | Nvidia drivers and Cuda toolkit |
| oozie-client | 5.0.0 | Oozie command-line client. |
| oozie-server | 5.0.0 | Service for accepting Oozie workflow requests. |
| opencv | 3.4.0 | Open Source Computer Vision Library. |
| phoenix-library | 4.13.0-HBase-1.4 | The phoenix libraries for server and client |
| phoenix-query-server | 4.13.0-HBase-1.4 | A light weight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API |
| presto-coordinator | 0.194 | Service for accepting queries and managing query execution among presto-workers. |
| presto-worker | 0.194 | Service for executing pieces of a query. |
| pig-client | 0.17.0 | Pig command-line client. |
| r | 3.4.1 | The R Project for Statistical Computing |
| spark-client | 2.3.0 | Spark command-line clients. |
| spark-history-server | 2.3.0 | Web UI for viewing logged events for the lifetime of a completed Spark application. |
| spark-on-yarn | 2.3.0 | In-memory execution engine for YARN. |
| spark-yarn-slave | 2.3.0 | Apache Spark libraries needed by YARN slaves. |
| sqoop-client | 1.4.7 | Apache Sqoop command-line client. |
| tez-on-yarn | 0.8.4 | The tez YARN application and libraries. |
| webserver | 2.4.25+ | Apache HTTP server. |
| zeppelin-server | 0.7.3 | Web-based notebook that enables interactive data analytics. |
| zookeeper-server | 3.4.12 | Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. |
| zookeeper-client | 3.4.12 | ZooKeeper command line client. |

5.15.0 Configuration Classifications
Configuration classifications allow you to customize applications when you create a cluster. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.
emr-5.15.0 Classifications
Classifications Description
capacity-scheduler
Change values in Hadoop's capacity-scheduler.xml file.
container-log4j
Change values in Hadoop YARN's container-log4j.properties file.
core-site
Change values in Hadoop's core-site.xml file.
emrfs-site
Change EMRFS settings.
flink-conf
Change flink-conf.yaml settings.
flink-log4j
Change Flink log4j.properties settings.
flink-log4j-yarn-session
Change Flink log4j-yarn-session.properties settings.
flink-log4j-cli
Change Flink log4j-cli.properties settings.
hadoop-env
Change values in the Hadoop environment for all Hadoop components.
hadoop-log4j
Change values in Hadoop's log4j.properties file.
hadoop-ssl-server
Change the Hadoop SSL server configuration.
hadoop-ssl-client
Change the Hadoop SSL client configuration.
hbase
Amazon EMR-curated settings for Apache HBase.
hbase-env
Change values in HBase's environment.
hbase-log4j
Change values in HBase's hbase-log4j.properties file.
hbase-metrics
Change values in HBase's hadoop-metrics2-hbase.properties file.
hbase-policy
Change values in HBase's hbase-policy.xml file.
hbase-site
Change values in HBase's hbase-site.xml file.
hdfs-encryption-zones
Configure HDFS encryption zones.
hdfs-site
Change values in HDFS's hdfs-site.xml.
hcatalog-env
Change values in HCatalog's environment.
hcatalog-server-jndi
Change values in HCatalog's jndi.properties.
hcatalog-server-proto-hive-site
Change values in HCatalog's proto-hive-site.xml.
hcatalog-webhcat-env
Change values in HCatalog WebHCat's environment.
hcatalog-webhcat-log4j2
Change values in HCatalog WebHCat's log4j2.properties.
hcatalog-webhcat-site
Change values in HCatalog WebHCat's webhcat-site.xml file.
hive-beeline-log4j2
Change values in Hive's beeline-log4j2.properties file.
hive-parquet-logging
Change values in Hive's parquet-logging.properties file.
hive-env
Change values in the Hive environment.
hive-exec-log4j2
Change values in Hive's hive-exec-log4j2.properties file.
hive-llap-daemon-log4j2
Change values in Hive's llap-daemon-log4j2.properties file.
hive-log4j2
Change values in Hive's hive-log4j2.properties file.
hive-site
Change values in Hive's hive-site.xml file.
hiveserver2-site
Change values in HiveServer2's hiveserver2-site.xml file.
hue-ini
Change values in Hue's ini file.
httpfs-env
Change values in the HTTPFS environment.
httpfs-site
Change values in Hadoop's httpfs-site.xml file.
hadoop-kms-acls
Change values in Hadoop's kms-acls.xml file.
hadoop-kms-env
Change values in the Hadoop KMS environment.
hadoop-kms-log4j
Change values in Hadoop's kms-log4j.properties file.
hadoop-kms-site
Change values in Hadoop's kms-site.xml file.
jupyter-notebook-conf
Change values in Jupyter Notebook's jupyter_notebook_config.py file.
jupyter-hub-conf
Change values in JupyterHub's jupyterhub_config.py file.
jupyter-sparkmagic-conf
Change values in Sparkmagic's config.json file.
livy-conf
Change values in Livy's livy.conf file.
livy-env
Change values in the Livy environment.
livy-log4j
Change Livy log4j.properties settings.
mapred-env
Change values in the MapReduce application's environment.
mapred-site
Change values in the MapReduce application's mapred-site.xml file.
oozie-env
Change values in Oozie's environment.
oozie-log4j
Change values in Oozie's oozie-log4j.properties file.
oozie-site
Change values in Oozie's oozie-site.xml file.
phoenix-hbase-metrics
Change values in Phoenix's hadoop-metrics2-hbase.properties file.
phoenix-hbase-site
Change values in Phoenix's hbase-site.xml file.
phoenix-log4j
Change values in Phoenix's log4j.properties file.
phoenix-metrics
Change values in Phoenix's hadoop-metrics2-phoenix.properties file.
pig-env
Change values in the Pig environment.
pig-properties
Change values in Pig's pig.properties file.
pig-log4j
Change values in Pig's log4j.properties file.
presto-log
Change values in Presto's log.properties file.
presto-config
Change values in Presto's config.properties file.
presto-env
Change values in Presto's presto-env.sh file.
presto-node
Change values in Presto's node.properties file.
presto-connector-blackhole
Change values in Presto's blackhole.properties file.
presto-connector-cassandra
Change values in Presto's cassandra.properties file.
presto-connector-hive
Change values in Presto's hive.properties file.
presto-connector-jmx
Change values in Presto's jmx.properties file.
presto-connector-kafka
Change values in Presto's kafka.properties file.
presto-connector-localfile
Change values in Presto's localfile.properties file.
presto-connector-mongodb
Change values in Presto's mongodb.properties file.
presto-connector-mysql
Change values in Presto's mysql.properties file.
presto-connector-postgresql
Change values in Presto's postgresql.properties file.
presto-connector-raptor
Change values in Presto's raptor.properties file.
presto-connector-redis
Change values in Presto's redis.properties file.
presto-connector-redshift
Change values in Presto's redshift.properties file.
presto-connector-tpch
Change values in Presto's tpch.properties file.
spark
Amazon EMR-curated settings for Apache Spark.
spark-defaults
Change values in Spark's spark-defaults.conf file.
spark-env
Change values in the Spark environment.
spark-hive-site
Change values in Spark's hive-site.xml file.
spark-log4j
Change values in Spark's log4j.properties file.
spark-metrics
Change values in Spark's metrics.properties file.
sqoop-env
Change values in Sqoop's environment.
sqoop-oraoop-site
Change values in Sqoop OraOop's oraoop-site.xml file.
sqoop-site
Change values in Sqoop's sqoop-site.xml file.
tez-site
Change values in Tez's tez-site.xml file.
yarn-env
Change values in the YARN environment.
yarn-site
Change values in YARN's yarn-site.xml file.
zeppelin-env
Change values in the Zeppelin environment.
zookeeper-config
Change values in ZooKeeper's zoo.cfg file.
zookeeper-log4j
Change values in ZooKeeper's log4j.properties file.
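The classification tables differ slightly between releases: presto-password-authenticator is listed for 5.16.0 and 5.17.0 but not for 5.15.0, and jupyter-s3-conf is listed for 5.17.0 only. A rough sketch of a check that flags classifications a chosen release does not list follows. The function names are hypothetical, the lookup records only the differences visible among these three tables, and it assumes a classification stays available once introduced.

```python
# Earliest release, among the tables on this page, that lists each classification.
# Assumption: only differences between 5.15.0, 5.16.0, and 5.17.0 are recorded.
FIRST_LISTED_IN = {
    "presto-password-authenticator": (5, 16, 0),
    "jupyter-s3-conf": (5, 17, 0),
}


def parse_release(release_label):
    """Convert a label such as 'emr-5.15.0' into a comparable tuple (5, 15, 0)."""
    prefix, _, version = release_label.partition("-")
    if prefix != "emr" or not version:
        raise ValueError(f"unexpected release label: {release_label!r}")
    return tuple(int(part) for part in version.split("."))


def unavailable_classifications(release_label, classifications):
    """Return the classifications that the given release does not list."""
    release = parse_release(release_label)
    return [
        name
        for name in classifications
        if name in FIRST_LISTED_IN and release < FIRST_LISTED_IN[name]
    ]


missing = unavailable_classifications(
    "emr-5.15.0", ["hive-site", "presto-password-authenticator"]
)
print(missing)  # ['presto-password-authenticator']
```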
- 5.14.0
Amazon EMR Release 5.14.0
The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, JupyterHub, Livy, Mahout, MXNet, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, Tez, Zeppelin, and ZooKeeper.
The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.
For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:
The following release notes include information for Amazon EMR release version 5.14.0. Changes are relative to 5.13.0.
Initial release date: June 4, 2018
Upgrades
- Upgraded Apache Flink to 1.4.2
- Upgraded Apache MXNet to 1.1.0
- Upgraded Apache Sqoop to 1.4.7
New Features
- Added JupyterHub support. For more information, see JupyterHub.
Changes, Enhancements, and Resolved Issues
- EMRFS
  - The userAgent string in requests to Amazon S3 has been updated to contain the user and group information of the invoking principal. This can be used with AWS CloudTrail logs for more comprehensive request tracking.
- HBase
  - Included HBASE-20447, which addresses an issue that could cause cache problems, especially with split regions.
- MXNet
  - Added OpenCV libraries.
- Spark
  - When Spark writes Parquet files to an Amazon S3 location using EMRFS, the FileOutputCommitter algorithm has been updated to use version 2 instead of version 1. This reduces the number of renames, which improves application performance. This change does not affect:
    - Applications other than Spark.
    - Applications that write to other file systems, such as HDFS (which still use version 1 of FileOutputCommitter).
    - Applications that use other output formats, such as text or csv, that already use EMRFS direct write.
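If you want to state the FileOutputCommitter algorithm version explicitly rather than rely on the release default, one option is a spark-defaults classification. The following is a minimal sketch, not an official recipe. It assumes the standard Hadoop property mapreduce.fileoutputcommitter.algorithm.version, passed to Spark through the spark.hadoop. prefix. The helper name committer_classification is hypothetical, and these release notes do not confirm whether an explicit setting overrides the EMR behavior for Parquet writes through EMRFS.

```python
import json

# Assumption: the standard Hadoop committer property, exposed to Spark
# through the "spark.hadoop." prefix. Not taken from these release notes.
COMMITTER_KEY = "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version"


def committer_classification(version: int) -> list:
    """Build a spark-defaults classification that pins the committer version.

    Hypothetical helper for illustration only.
    """
    if version not in (1, 2):
        raise ValueError("FileOutputCommitter algorithm version must be 1 or 2")
    return [
        {
            "Classification": "spark-defaults",
            # Classification property values are strings.
            "Properties": {COMMITTER_KEY: str(version)},
        }
    ]


if __name__ == "__main__":
    # Print the JSON document that could be supplied when creating a cluster.
    print(json.dumps(committer_classification(2), indent=2))
```

The printed JSON has the same shape as any other configuration classification, so it can be combined with other entries in a single list.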
Known Issues
- JupyterHub
  - Using configuration classifications to set up JupyterHub and individual Jupyter notebooks when you create a cluster is not supported. Edit the jupyterhub_config.py file and the jupyter_notebook_config.py file for each user manually. For more information, see Configuring JupyterHub.
  - JupyterHub fails to start on clusters within a private subnet, failing with the message Error: ENOENT: no such file or directory, open '/etc/jupyter/conf/server.crt'. This is caused by an error in the script that generates self-signed certificates. Use the following workaround to generate self-signed certificates. All commands are executed while connected to the master node.
    - Copy the certificate generation script from the container to the master node:
      sudo docker cp jupyterhub:/tmp/gen_self_signed_cert.sh ./
    - Use a text editor to edit line 23, changing the public hostname to the local hostname as shown below:
      local hostname=$(curl -s $EC2_METADATA_SERVICE_URI/local-hostname)
    - Run the script to generate self-signed certificates:
      sudo bash ./gen_self_signed_cert.sh
    - Move the certificate files that the script generates to the /etc/jupyter/conf/ directory:
      sudo mv /tmp/server.crt /tmp/server.key /etc/jupyter/conf/
      You can tail the jupyter.log file to verify that JupyterHub restarted and is returning a 200 response code. For example:
      tail -f /var/log/jupyter/jupyter.log
      This should return a response similar to the following:
      # [I 2018-06-14 18:56:51.356 JupyterHub app:1581] JupyterHub is now running at https://:9443/
      # 19:01:51.359 - info: [ConfigProxy] 200 GET /api/routes
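The final verification step of the JupyterHub workaround can also be scripted instead of watching the log by eye. The following is a minimal sketch, assuming the log contains lines like the sample response shown in the workaround. The function name jupyterhub_is_healthy and the two regular expressions are illustrative, not part of JupyterHub or Amazon EMR.

```python
import re

# Patterns modeled on the sample log output in the workaround (assumption:
# real log lines keep the same wording).
RUNNING = re.compile(r"JupyterHub is now running at https://")
ROUTES_OK = re.compile(r"\[ConfigProxy\] 200 GET /api/routes")


def jupyterhub_is_healthy(log_text: str) -> bool:
    """Return True if the log shows a start message and, at or after the
    most recent start message, a 200 response for /api/routes."""
    lines = log_text.splitlines()
    start = None
    for index, line in enumerate(lines):
        if RUNNING.search(line):
            start = index  # keep the most recent start message
    if start is None:
        return False
    return any(ROUTES_OK.search(line) for line in lines[start:])


if __name__ == "__main__":
    # Path taken from the tail command in the workaround.
    with open("/var/log/jupyter/jupyter.log", encoding="utf-8") as handle:
        print(jupyterhub_is_healthy(handle.read()))
```

Only log text from the most recent start message onward is considered, so a 200 response logged before a failed restart does not count as healthy.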
The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.
Some components need changes from community versions for Amazon EMR. These components have a version label in the form CommunityVersion-amzn-EmrVersion. For example, if a big-data community component named myapp-component of version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-3.
Component | Version | Description
aws-sagemaker-spark-sdk | 1.0.1 | Amazon SageMaker Spark SDK
emr-ddb | 4.5.0 | Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies | 2.4.0 | Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis | 3.4.0 | Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp | 2.10.0 | Distributed copy application optimized for Amazon S3.
emrfs | 2.23.0 | Amazon S3 connector for Hadoop ecosystem applications.
flink-client | 1.4.2 | Apache Flink command line client scripts and applications.
ganglia-monitor | 3.7.2 | Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector | 3.7.2 | Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web | 3.7.1 | Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client | 2.8.3-amzn-1 | Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode | 2.8.3-amzn-1 | HDFS node-level service for storing blocks.
hadoop-hdfs-library | 2.8.3-amzn-1 | HDFS command-line client and library
hadoop-hdfs-namenode | 2.8.3-amzn-1 | HDFS service for tracking file names and block locations.
hadoop-httpfs-server | 2.8.3-amzn-1 | HTTP endpoint for HDFS operations.
hadoop-kms-server | 2.8.3-amzn-1 | Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred | 2.8.3-amzn-1 | MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager | 2.8.3-amzn-1 | YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager | 2.8.3-amzn-1 | YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server | 2.8.3-amzn-1 | Service for retrieving current and historical information for YARN applications.
hbase-hmaster | 1.4.2 | Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server | 1.4.2 | Service for serving one or more HBase regions.
hbase-client | 1.4.2 | HBase command-line client.
hbase-rest-server | 1.4.2 | Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server | 1.4.2 | Service providing a Thrift endpoint to HBase.
hcatalog-client | 2.3.2-amzn-2 | The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server | 2.3.2-amzn-2 | Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server | 2.3.2-amzn-2 | HTTP endpoint providing a REST interface to HCatalog.
hive-client | 2.3.2-amzn-2 | Hive command line client.
hive-hbase | 2.3.2-amzn-2 | Hive-hbase client.
hive-metastore-server | 2.3.2-amzn-2 | Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server2 | 2.3.2-amzn-2 | Service for accepting Hive queries as web requests.
hue-server | 4.1.0 | Web application for analyzing data using Hadoop ecosystem applications
jupyterhub | 0.8.1 | Multi-user server for Jupyter notebooks
livy-server | 0.4.0-incubating | REST interface for interacting with Apache Spark
mahout-client | 0.13.0 | Library for machine learning.
mxnet | 1.1.0 | A flexible, scalable, and efficient library for deep learning.
mysql-server | 5.5.54+ | MySQL database server.
nvidia-cuda | 9.1.85 | Nvidia drivers and Cuda toolkit
oozie-client | 4.3.0 | Oozie command-line client.
oozie-server | 4.3.0 | Service for accepting Oozie workflow requests.
opencv | 3.4.0 | Open Source Computer Vision Library.
phoenix-library | 4.13.0-HBase-1.4 | The phoenix libraries for server and client
phoenix-query-server | 4.13.0-HBase-1.4 | A light weight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API
presto-coordinator | 0.194 | Service for accepting queries and managing query execution among presto-workers.
presto-worker | 0.194 | Service for executing pieces of a query.
pig-client | 0.17.0 | Pig command-line client.
r | 3.4.1 | The R Project for Statistical Computing
spark-client | 2.3.0 | Spark command-line clients.
spark-history-server | 2.3.0 | Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn | 2.3.0 | In-memory execution engine for YARN.
spark-yarn-slave | 2.3.0 | Apache Spark libraries needed by YARN slaves.
sqoop-client | 1.4.7 | Apache Sqoop command-line client.
tez-on-yarn | 0.8.4 | The tez YARN application and libraries.
webserver | 2.4.25+ | Apache HTTP server.
zeppelin-server | 0.7.3 | Web-based notebook that enables interactive data analytics.
zookeeper-server | 3.4.10 | Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client | 3.4.10 | ZooKeeper command line client.
5.14.0 Configuration Classifications
Configuration classifications allow you to customize applications when you create a cluster. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.
emr-5.14.0 Classifications
Classifications Description
capacity-scheduler
Change values in Hadoop's capacity-scheduler.xml file.
container-log4j
Change values in Hadoop YARN's container-log4j.properties file.
core-site
Change values in Hadoop's core-site.xml file.
emrfs-site
Change EMRFS settings.
flink-conf
Change flink-conf.yaml settings.
flink-log4j
Change Flink log4j.properties settings.
flink-log4j-yarn-session
Change Flink log4j-yarn-session.properties settings.
flink-log4j-cli
Change Flink log4j-cli.properties settings.
hadoop-env
Change values in the Hadoop environment for all Hadoop components.
hadoop-log4j
Change values in Hadoop's log4j.properties file.
hadoop-ssl-server
Change the Hadoop SSL server configuration.
hadoop-ssl-client
Change the Hadoop SSL client configuration.
hbase
Amazon EMR-curated settings for Apache HBase.
hbase-env
Change values in HBase's environment.
hbase-log4j
Change values in HBase's hbase-log4j.properties file.
hbase-metrics
Change values in HBase's hadoop-metrics2-hbase.properties file.
hbase-policy
Change values in HBase's hbase-policy.xml file.
hbase-site
Change values in HBase's hbase-site.xml file.
hdfs-encryption-zones
Configure HDFS encryption zones.
hdfs-site
Change values in HDFS's hdfs-site.xml.
hcatalog-env
Change values in HCatalog's environment.
hcatalog-server-jndi
Change values in HCatalog's jndi.properties.
hcatalog-server-proto-hive-site
Change values in HCatalog's proto-hive-site.xml.
hcatalog-webhcat-env
Change values in HCatalog WebHCat's environment.
hcatalog-webhcat-log4j2
Change values in HCatalog WebHCat's log4j2.properties.
hcatalog-webhcat-site
Change values in HCatalog WebHCat's webhcat-site.xml file.
hive-beeline-log4j2
Change values in Hive's beeline-log4j2.properties file.
hive-parquet-logging
Change values in Hive's parquet-logging.properties file.
hive-env
Change values in the Hive environment.
hive-exec-log4j2
Change values in Hive's hive-exec-log4j2.properties file.
hive-llap-daemon-log4j2
Change values in Hive's llap-daemon-log4j2.properties file.
hive-log4j2
Change values in Hive's hive-log4j2.properties file.
hive-site
Change values in Hive's hive-site.xml file.
hiveserver2-site
Change values in Hive Server2's hiveserver2-site.xml file.
hue-ini
Change values in Hue's ini file.
httpfs-env
Change values in the HTTPFS environment.
httpfs-site
Change values in Hadoop's httpfs-site.xml file.
hadoop-kms-acls
Change values in Hadoop's kms-acls.xml file.
hadoop-kms-env
Change values in the Hadoop KMS environment.
hadoop-kms-log4j
Change values in Hadoop's kms-log4j.properties file.
hadoop-kms-site
Change values in Hadoop's kms-site.xml file.
jupyter-notebook-conf
Change values in Jupyter Notebook's jupyter_notebook_config.py file.
jupyter-hub-conf
Change values in JupyterHub's jupyterhub_config.py file.
jupyter-sparkmagic-conf
Change values in Sparkmagic's config.json file.
livy-conf
Change values in Livy's livy.conf file.
livy-env
Change values in the Livy environment.
livy-log4j
Change Livy log4j.properties settings.
mapred-env
Change values in the MapReduce application's environment.
mapred-site
Change values in the MapReduce application's mapred-site.xml file.
oozie-env
Change values in Oozie's environment.
oozie-log4j
Change values in Oozie's oozie-log4j.properties file.
oozie-site
Change values in Oozie's oozie-site.xml file.
phoenix-hbase-metrics
Change values in Phoenix's hadoop-metrics2-hbase.properties file.
phoenix-hbase-site
Change values in Phoenix's hbase-site.xml file.
phoenix-log4j
Change values in Phoenix's log4j.properties file.
phoenix-metrics
Change values in Phoenix's hadoop-metrics2-phoenix.properties file.
pig-env
Change values in the Pig environment.
pig-properties
Change values in Pig's pig.properties file.
pig-log4j
Change values in Pig's log4j.properties file.
presto-log
Change values in Presto's log.properties file.
presto-config
Change values in Presto's config.properties file.
presto-env
Change values in Presto's presto-env.sh file.
presto-node
Change values in Presto's node.properties file.
presto-connector-blackhole
Change values in Presto's blackhole.properties file.
presto-connector-cassandra
Change values in Presto's cassandra.properties file.
presto-connector-hive
Change values in Presto's hive.properties file.
presto-connector-jmx
Change values in Presto's jmx.properties file.
presto-connector-kafka
Change values in Presto's kafka.properties file.
presto-connector-localfile
Change values in Presto's localfile.properties file.
presto-connector-mongodb
Change values in Presto's mongodb.properties file.
presto-connector-mysql
Change values in Presto's mysql.properties file.
presto-connector-postgresql
Change values in Presto's postgresql.properties file.
presto-connector-raptor
Change values in Presto's raptor.properties file.
presto-connector-redis
Change values in Presto's redis.properties file.
presto-connector-redshift
Change values in Presto's redshift.properties file.
presto-connector-tpch
Change values in Presto's tpch.properties file.
spark
Amazon EMR-curated settings for Apache Spark.
spark-defaults
Change values in Spark's spark-defaults.conf file.
spark-env
Change values in the Spark environment.
spark-hive-site
Change values in Spark's hive-site.xml file.
spark-log4j
Change values in Spark's log4j.properties file.
spark-metrics
Change values in Spark's metrics.properties file.
sqoop-env
Change values in Sqoop's environment.
sqoop-oraoop-site
Change values in Sqoop OraOop's oraoop-site.xml file.
sqoop-site
Change values in Sqoop's sqoop-site.xml file.
tez-site
Change values in Tez's tez-site.xml file.
yarn-env
Change values in the YARN environment.
yarn-site
Change values in YARN's yarn-site.xml file.
zeppelin-env
Change values in the Zeppelin environment.
zookeeper-config
Change values in ZooKeeper's zoo.cfg file.
zookeeper-log4j
Change values in ZooKeeper's log4j.properties file.
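The classifications listed above are supplied as a list of configuration objects when you create a cluster. The following is a minimal sketch of building such a list, assuming the JSON shape of Classification, Properties, and nested Configurations keys. The helper names site_classification and env_classification are hypothetical, and the property values are illustrative only, not recommendations.

```python
import json


def site_classification(name: str, properties: dict) -> dict:
    """Classification that maps onto a configuration file such as yarn-site.xml."""
    return {"Classification": name, "Properties": dict(properties)}


def env_classification(name: str, exports: dict) -> dict:
    """Environment classification: variables go in a nested 'export' block."""
    return {
        "Classification": name,
        "Properties": {},
        "Configurations": [
            {"Classification": "export", "Properties": dict(exports)}
        ],
    }


# Illustrative values only (assumptions, not tuned settings).
configurations = [
    site_classification(
        "yarn-site", {"yarn.log-aggregation.retain-seconds": "86400"}
    ),
    env_classification("hadoop-env", {"HADOOP_DATANODE_HEAPSIZE": "2048"}),
]

if __name__ == "__main__":
    # Write this output to a file and reference it at cluster creation.
    print(json.dumps(configurations, indent=2))
```

Classifications ending in -env take their variables in the nested export block, while file-backed classifications such as yarn-site take key-value pairs directly in Properties.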
-
- 5.13.0
Amazon EMR Release 5.13.0
The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, Livy, Mahout, MXNet, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, Tez, Zeppelin, and ZooKeeper.
The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.
For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:
The following release notes include information for Amazon EMR release version 5.13.0. Changes are relative to 5.12.0.
Upgrades
- Upgraded Spark to 2.3.0
- Upgraded HBase to 1.4.2
- Upgraded Presto to 0.194
- Upgraded AWS Java SDK to 1.11.297
Changes, Enhancements, and Resolved Issues
- Hive
  - Backported HIVE-15436. Enhanced Hive APIs to return only views.
Known Issues
- MXNet does not currently have OpenCV libraries.
The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.
Some components need changes from community versions for Amazon EMR. These components have a version label in the form CommunityVersion-amzn-EmrVersion. For example, if a big-data community component named myapp-component of version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-3.
Component | Version | Description
aws-sagemaker-spark-sdk | 1.0.1 | Amazon SageMaker Spark SDK
emr-ddb | 4.5.0 | Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies | 2.4.0 | Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis | 3.4.0 | Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp | 2.10.0 | Distributed copy application optimized for Amazon S3.
emrfs | 2.22.0 | Amazon S3 connector for Hadoop ecosystem applications.
flink-client | 1.4.0 | Apache Flink command line client scripts and applications.
ganglia-monitor | 3.7.2 | Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector | 3.7.2 | Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web | 3.7.1 | Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client | 2.8.3-amzn-0 | Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode | 2.8.3-amzn-0 | HDFS node-level service for storing blocks.
hadoop-hdfs-library | 2.8.3-amzn-0 | HDFS command-line client and library
hadoop-hdfs-namenode | 2.8.3-amzn-0 | HDFS service for tracking file names and block locations.
hadoop-httpfs-server | 2.8.3-amzn-0 | HTTP endpoint for HDFS operations.
hadoop-kms-server | 2.8.3-amzn-0 | Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred | 2.8.3-amzn-0 | MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager | 2.8.3-amzn-0 | YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager | 2.8.3-amzn-0 | YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server | 2.8.3-amzn-0 | Service for retrieving current and historical information for YARN applications.
hbase-hmaster | 1.4.2 | Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server | 1.4.2 | Service for serving one or more HBase regions.
hbase-client | 1.4.2 | HBase command-line client.
hbase-rest-server | 1.4.2 | Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server | 1.4.2 | Service providing a Thrift endpoint to HBase.
hcatalog-client | 2.3.2-amzn-2 | The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server | 2.3.2-amzn-2 | Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server | 2.3.2-amzn-2 | HTTP endpoint providing a REST interface to HCatalog.
hive-client | 2.3.2-amzn-2 | Hive command line client.
hive-hbase | 2.3.2-amzn-2 | Hive-hbase client.
hive-metastore-server | 2.3.2-amzn-2 | Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server2 | 2.3.2-amzn-2 | Service for accepting Hive queries as web requests.
hue-server | 4.1.0 | Web application for analyzing data using Hadoop ecosystem applications
livy-server | 0.4.0-incubating | REST interface for interacting with Apache Spark
mahout-client | 0.13.0 | Library for machine learning.
mxnet | 1.0.0 | A flexible, scalable, and efficient library for deep learning.
mysql-server | 5.5.54+ | MySQL database server.
nvidia-cuda | 9.1.85 | Nvidia drivers and Cuda toolkit
oozie-client | 4.3.0 | Oozie command-line client.
oozie-server | 4.3.0 | Service for accepting Oozie workflow requests.
phoenix-library | 4.13.0-HBase-1.4 | The phoenix libraries for server and client
phoenix-query-server | 4.13.0-HBase-1.4 | A light weight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API
presto-coordinator | 0.194 | Service for accepting queries and managing query execution among presto-workers.
presto-worker | 0.194 | Service for executing pieces of a query.
pig-client | 0.17.0 | Pig command-line client.
r | 3.4.1 | The R Project for Statistical Computing
spark-client | 2.3.0 | Spark command-line clients.
spark-history-server | 2.3.0 | Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn | 2.3.0 | In-memory execution engine for YARN.
spark-yarn-slave | 2.3.0 | Apache Spark libraries needed by YARN slaves.
sqoop-client | 1.4.6 | Apache Sqoop command-line client.
tez-on-yarn | 0.8.4 | The tez YARN application and libraries.
webserver | 2.4.25+ | Apache HTTP server.
zeppelin-server | 0.7.3 | Web-based notebook that enables interactive data analytics.
zookeeper-server | 3.4.10 | Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client | 3.4.10 | ZooKeeper command line client.
5.13.0 Configuration Classifications
Configuration classifications allow you to customize applications when you create a cluster. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.
emr-5.13.0 Classifications
Classifications Description
capacity-scheduler
Change values in Hadoop's capacity-scheduler.xml file.
core-site
Change values in Hadoop's core-site.xml file.
emrfs-site
Change EMRFS settings.
flink-conf
Change flink-conf.yaml settings.
flink-log4j
Change Flink log4j.properties settings.
flink-log4j-yarn-session
Change Flink log4j-yarn-session.properties settings.
flink-log4j-cli
Change Flink log4j-cli.properties settings.
hadoop-env
Change values in the Hadoop environment for all Hadoop components.
hadoop-log4j
Change values in Hadoop's log4j.properties file.
hadoop-ssl-server
Change the Hadoop SSL server configuration.
hadoop-ssl-client
Change the Hadoop SSL client configuration.
hbase
Amazon EMR-curated settings for Apache HBase.
hbase-env
Change values in HBase's environment.
hbase-log4j
Change values in HBase's hbase-log4j.properties file.
hbase-metrics
Change values in HBase's hadoop-metrics2-hbase.properties file.
hbase-policy
Change values in HBase's hbase-policy.xml file.
hbase-site
Change values in HBase's hbase-site.xml file.
hdfs-encryption-zones
Configure HDFS encryption zones.
hdfs-site
Change values in HDFS's hdfs-site.xml.
hcatalog-env
Change values in HCatalog's environment.
hcatalog-server-jndi
Change values in HCatalog's jndi.properties.
hcatalog-server-proto-hive-site
Change values in HCatalog's proto-hive-site.xml.
hcatalog-webhcat-env
Change values in HCatalog WebHCat's environment.
hcatalog-webhcat-log4j2
Change values in HCatalog WebHCat's log4j2.properties.
hcatalog-webhcat-site
Change values in HCatalog WebHCat's webhcat-site.xml file.
hive-beeline-log4j2
Change values in Hive's beeline-log4j2.properties file.
hive-parquet-logging
Change values in Hive's parquet-logging.properties file.
hive-env
Change values in the Hive environment.
hive-exec-log4j2
Change values in Hive's hive-exec-log4j2.properties file.
hive-llap-daemon-log4j2
Change values in Hive's llap-daemon-log4j2.properties file.
hive-log4j2
Change values in Hive's hive-log4j2.properties file.
hive-site
Change values in Hive's hive-site.xml file.
hiveserver2-site
Change values in Hive Server2's hiveserver2-site.xml file.
hue-ini
Change values in Hue's ini file.
httpfs-env
Change values in the HTTPFS environment.
httpfs-site
Change values in Hadoop's httpfs-site.xml file.
hadoop-kms-acls
Change values in Hadoop's kms-acls.xml file.
hadoop-kms-env
Change values in the Hadoop KMS environment.
hadoop-kms-log4j
Change values in Hadoop's kms-log4j.properties file.
hadoop-kms-site
Change values in Hadoop's kms-site.xml file.
livy-conf
Change values in Livy's livy.conf file.
livy-env
Change values in the Livy environment.
livy-log4j
Change Livy log4j.properties settings.
mapred-env
Change values in the MapReduce application's environment.
mapred-site
Change values in the MapReduce application's mapred-site.xml file.
oozie-env
Change values in Oozie's environment.
oozie-log4j
Change values in Oozie's oozie-log4j.properties file.
oozie-site
Change values in Oozie's oozie-site.xml file.
phoenix-hbase-metrics
Change values in Phoenix's hadoop-metrics2-hbase.properties file.
phoenix-hbase-site
Change values in Phoenix's hbase-site.xml file.
phoenix-log4j
Change values in Phoenix's log4j.properties file.
phoenix-metrics
Change values in Phoenix's hadoop-metrics2-phoenix.properties file.
pig-env
Change values in the Pig environment.
pig-properties
Change values in Pig's pig.properties file.
pig-log4j
Change values in Pig's log4j.properties file.
presto-log
Change values in Presto's log.properties file.
presto-config
Change values in Presto's config.properties file.
presto-env
Change values in Presto's presto-env.sh file.
presto-node
Change values in Presto's node.properties file.
presto-connector-blackhole
Change values in Presto's blackhole.properties file.
presto-connector-cassandra
Change values in Presto's cassandra.properties file.
presto-connector-hive
Change values in Presto's hive.properties file.
presto-connector-jmx
Change values in Presto's jmx.properties file.
presto-connector-kafka
Change values in Presto's kafka.properties file.
presto-connector-localfile
Change values in Presto's localfile.properties file.
presto-connector-mongodb
Change values in Presto's mongodb.properties file.
presto-connector-mysql
Change values in Presto's mysql.properties file.
presto-connector-postgresql
Change values in Presto's postgresql.properties file.
presto-connector-raptor
Change values in Presto's raptor.properties file.
presto-connector-redis
Change values in Presto's redis.properties file.
presto-connector-redshift
Change values in Presto's redshift.properties file.
presto-connector-tpch
Change values in Presto's tpch.properties file.
spark
Amazon EMR-curated settings for Apache Spark.
spark-defaults
Change values in Spark's spark-defaults.conf file.
spark-env
Change values in the Spark environment.
spark-hive-site
Change values in Spark's hive-site.xml file.
spark-log4j
Change values in Spark's log4j.properties file.
spark-metrics
Change values in Spark's metrics.properties file.
sqoop-env
Change values in Sqoop's environment.
sqoop-oraoop-site
Change values in Sqoop OraOop's oraoop-site.xml file.
sqoop-site
Change values in Sqoop's sqoop-site.xml file.
tez-site
Change values in Tez's tez-site.xml file.
yarn-env
Change values in the YARN environment.
yarn-site
Change values in YARN's yarn-site.xml file.
zeppelin-env
Change values in the Zeppelin environment.
zookeeper-config
Change values in ZooKeeper's zoo.cfg file.
zookeeper-log4j
Change values in ZooKeeper's log4j.properties file.
-
- 5.12.x
There are multiple releases within the 5.12 series. Choose a link below to see information for a specific release within this tab.
Release 5.12.2 Application Versions
The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, Livy, Mahout, MXNet, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, Tez, Zeppelin, and ZooKeeper.
The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.
For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:
The following release notes include information for Amazon EMR release version 5.12.2. Changes are relative to 5.12.1.
Initial release date: August 29, 2018
Changes, Enhancements, and Resolved Issues
- This release addresses a potential security vulnerability.
Release 5.12.2 Component Versions
The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.
Some components need changes from community versions for Amazon EMR. These components have a version label in the form CommunityVersion-amzn-EmrVersion. For example, if a big-data community component named myapp-component of version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-3.
Component | Version | Description
aws-sagemaker-spark-sdk | 1.0.1 | Amazon SageMaker Spark SDK
emr-ddb | 4.5.0 | Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies | 2.4.0 | Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis | 3.4.0 | Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp | 2.9.0 | Distributed copy application optimized for Amazon S3.
emrfs | 2.21.0 | Amazon S3 connector for Hadoop ecosystem applications.
flink-client | 1.4.0 | Apache Flink command line client scripts and applications.
ganglia-monitor | 3.7.2 | Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector | 3.7.2 | Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web | 3.7.1 | Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client | 2.8.3-amzn-0 | Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode | 2.8.3-amzn-0 | HDFS node-level service for storing blocks.
hadoop-hdfs-library | 2.8.3-amzn-0 | HDFS command-line client and library
hadoop-hdfs-namenode | 2.8.3-amzn-0 | HDFS service for tracking file names and block locations.
hadoop-httpfs-server | 2.8.3-amzn-0 | HTTP endpoint for HDFS operations.
hadoop-kms-server | 2.8.3-amzn-0 | Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred | 2.8.3-amzn-0 | MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager | 2.8.3-amzn-0 | YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager | 2.8.3-amzn-0 | YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server | 2.8.3-amzn-0 | Service for retrieving current and historical information for YARN applications.
hbase-hmaster | 1.4.0 | Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server | 1.4.0 | Service for serving one or more HBase regions.
hbase-client | 1.4.0 | HBase command-line client.
hbase-rest-server | 1.4.0 | Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server | 1.4.0 | Service providing a Thrift endpoint to HBase.
hcatalog-client | 2.3.2-amzn-1 | The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server | 2.3.2-amzn-1 | Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server | 2.3.2-amzn-1 | HTTP endpoint providing a REST interface to HCatalog.
hive-client | 2.3.2-amzn-1 | Hive command line client.
hive-hbase | 2.3.2-amzn-1 | Hive-hbase client.
hive-metastore-server | 2.3.2-amzn-1 | Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server2 | 2.3.2-amzn-1 | Service for accepting Hive queries as web requests.
hue-server | 4.1.0 | Web application for analyzing data using Hadoop ecosystem applications
livy-server | 0.4.0-incubating | REST interface for interacting with Apache Spark
mahout-client | 0.13.0 | Library for machine learning.
mxnet | 1.0.0 | A flexible, scalable, and efficient library for deep learning.
mysql-server | 5.5.54+ | MySQL database server.
nvidia-cuda | 9.1.85 | Nvidia drivers and Cuda toolkit
oozie-client | 4.3.0 | Oozie command-line client.
oozie-server | 4.3.0 | Service for accepting Oozie workflow requests.
phoenix-library | 4.13.0-HBase-1.4 | The phoenix libraries for server and client
phoenix-query-server | 4.13.0-HBase-1.4 | A light weight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API
presto-coordinator | 0.188 | Service for accepting queries and managing query execution among presto-workers.
presto-worker | 0.188 | Service for executing pieces of a query.
pig-client | 0.17.0 | Pig command-line client.
spark-client | 2.2.1 | Spark command-line clients.
spark-history-server | 2.2.1 | Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn | 2.2.1 | In-memory execution engine for YARN.
spark-yarn-slave | 2.2.1 | Apache Spark libraries needed by YARN slaves.
sqoop-client | 1.4.6 | Apache Sqoop command-line client.
tez-on-yarn | 0.8.4 | The tez YARN application and libraries.
webserver | 2.4.25+ | Apache HTTP server.
zeppelin-server | 0.7.3 | Web-based notebook that enables interactive data analytics.
zookeeper-server | 3.4.10 | Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client | 3.4.10 | ZooKeeper command line client.
Release 5.12.2 Configuration Classifications
Configuration classifications allow you to customize applications when you create a cluster. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.
emr-5.12.2 Classifications
Classifications Description
capacity-scheduler
Change values in Hadoop's capacity-scheduler.xml file.
core-site
Change values in Hadoop's core-site.xml file.
emrfs-site
Change EMRFS settings.
flink-conf
Change flink-conf.yaml settings.
flink-log4j
Change Flink log4j.properties settings.
flink-log4j-yarn-session
Change Flink log4j-yarn-session.properties settings.
flink-log4j-cli
Change Flink log4j-cli.properties settings.
hadoop-env
Change values in the Hadoop environment for all Hadoop components.
hadoop-log4j
Change values in Hadoop's log4j.properties file.
hadoop-ssl-server
Change Hadoop SSL server configuration.
hadoop-ssl-client
Change Hadoop SSL client configuration.
hbase
Amazon EMR-curated settings for Apache HBase.
hbase-env
Change values in HBase's environment.
hbase-log4j
Change values in HBase's hbase-log4j.properties file.
hbase-metrics
Change values in HBase's hadoop-metrics2-hbase.properties file.
hbase-policy
Change values in HBase's hbase-policy.xml file.
hbase-site
Change values in HBase's hbase-site.xml file.
hdfs-encryption-zones
Configure HDFS encryption zones.
hdfs-site
Change values in HDFS's hdfs-site.xml.
hcatalog-env
Change values in HCatalog's environment.
hcatalog-server-jndi
Change values in HCatalog's jndi.properties.
hcatalog-server-proto-hive-site
Change values in HCatalog's proto-hive-site.xml.
hcatalog-webhcat-env
Change values in HCatalog WebHCat's environment.
hcatalog-webhcat-log4j2
Change values in HCatalog WebHCat's log4j2.properties.
hcatalog-webhcat-site
Change values in HCatalog WebHCat's webhcat-site.xml file.
hive-beeline-log4j2
Change values in Hive's beeline-log4j2.properties file.
hive-parquet-logging
Change values in Hive's parquet-logging.properties file.
hive-env
Change values in the Hive environment.
hive-exec-log4j2
Change values in Hive's hive-exec-log4j2.properties file.
hive-llap-daemon-log4j2
Change values in Hive's llap-daemon-log4j2.properties file.
hive-log4j2
Change values in Hive's hive-log4j2.properties file.
hive-site
Change values in Hive's hive-site.xml file.
hiveserver2-site
Change values in Hive Server2's hiveserver2-site.xml file.
hue-ini
Change values in Hue's ini file.
httpfs-env
Change values in the HTTPFS environment.
httpfs-site
Change values in Hadoop's httpfs-site.xml file.
hadoop-kms-acls
Change values in Hadoop's kms-acls.xml file.
hadoop-kms-env
Change values in the Hadoop KMS environment.
hadoop-kms-log4j
Change values in Hadoop's kms-log4j.properties file.
hadoop-kms-site
Change values in Hadoop's kms-site.xml file.
livy-conf
Change values in Livy's livy.conf file.
livy-env
Change values in the Livy environment.
livy-log4j
Change Livy log4j.properties settings.
mapred-env
Change values in the MapReduce application's environment.
mapred-site
Change values in the MapReduce application's mapred-site.xml file.
oozie-env
Change values in Oozie's environment.
oozie-log4j
Change values in Oozie's oozie-log4j.properties file.
oozie-site
Change values in Oozie's oozie-site.xml file.
phoenix-hbase-metrics
Change values in Phoenix's hadoop-metrics2-hbase.properties file.
phoenix-hbase-site
Change values in Phoenix's hbase-site.xml file.
phoenix-log4j
Change values in Phoenix's log4j.properties file.
phoenix-metrics
Change values in Phoenix's hadoop-metrics2-phoenix.properties file.
pig-env
Change values in the Pig environment.
pig-properties
Change values in Pig's pig.properties file.
pig-log4j
Change values in Pig's log4j.properties file.
presto-log
Change values in Presto's log.properties file.
presto-config
Change values in Presto's config.properties file.
presto-env
Change values in Presto's presto-env.sh file.
presto-node
Change values in Presto's node.properties file.
presto-connector-blackhole
Change values in Presto's blackhole.properties file.
presto-connector-cassandra
Change values in Presto's cassandra.properties file.
presto-connector-hive
Change values in Presto's hive.properties file.
presto-connector-jmx
Change values in Presto's jmx.properties file.
presto-connector-kafka
Change values in Presto's kafka.properties file.
presto-connector-localfile
Change values in Presto's localfile.properties file.
presto-connector-mongodb
Change values in Presto's mongodb.properties file.
presto-connector-mysql
Change values in Presto's mysql.properties file.
presto-connector-postgresql
Change values in Presto's postgresql.properties file.
presto-connector-raptor
Change values in Presto's raptor.properties file.
presto-connector-redis
Change values in Presto's redis.properties file.
presto-connector-redshift
Change values in Presto's redshift.properties file.
presto-connector-tpch
Change values in Presto's tpch.properties file.
spark
Amazon EMR-curated settings for Apache Spark.
spark-defaults
Change values in Spark's spark-defaults.conf file.
spark-env
Change values in the Spark environment.
spark-hive-site
Change values in Spark's hive-site.xml file.
spark-log4j
Change values in Spark's log4j.properties file.
spark-metrics
Change values in Spark's metrics.properties file.
sqoop-env
Change values in Sqoop's environment.
sqoop-oraoop-site
Change values in Sqoop OraOop's oraoop-site.xml file.
sqoop-site
Change values in Sqoop's sqoop-site.xml file.
tez-site
Change values in Tez's tez-site.xml file.
yarn-env
Change values in the YARN environment.
yarn-site
Change values in YARN's yarn-site.xml file.
zeppelin-env
Change values in the Zeppelin environment.
zookeeper-config
Change values in ZooKeeper's zoo.cfg file.
zookeeper-log4j
Change values in ZooKeeper's log4j.properties file.
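The classifications listed above are supplied as a list of configuration objects, each naming a classification and the properties to set in it. The following is a minimal sketch of building such a list in Python. The `classification` helper is hypothetical, and the two property names and values are illustrative choices that do not come from this page.

```python
import json


def classification(name, properties, nested=None):
    """Build one configuration object for a single classification.

    `name` is a classification from the table above (for example
    "hive-site"); `properties` maps setting names to string values.
    This helper is illustrative, not part of any AWS library.
    """
    config = {"Classification": name, "Properties": dict(properties)}
    if nested:
        # Some classifications accept nested configuration objects.
        config["Configurations"] = list(nested)
    return config


# Illustrative settings only; choose values that suit your workload.
configurations = [
    classification("hive-site", {"hive.exec.parallel": "true"}),
    classification("spark-defaults", {"spark.executor.memory": "2g"}),
]

configurations_json = json.dumps(configurations, indent=2)
print(configurations_json)
```

The resulting JSON can be saved to a file and passed when the cluster is created, for example through the `--configurations` option of `aws emr create-cluster`.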
Release 5.12.1 Application Versions
The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, Livy, Mahout, MXNet, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, Tez, Zeppelin, and ZooKeeper.
The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.
For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:
The following release notes include information for Amazon EMR release version 5.12.1. Changes are relative to 5.12.0.
Initial release date: March 29, 2018
Changes, Enhancements, and Resolved Issues
- Updated the Amazon Linux kernel of the default Amazon Linux AMI for Amazon EMR to address potential vulnerabilities.
Release 5.12.1 Component Versions
The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.

Some components need changes from community versions for Amazon EMR. These components have a version label in the form CommunityVersion-amzn-EmrVersion. For example, if a big-data community component named myapp-component of version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-3.

| Component | Version | Description |
| --- | --- | --- |
| aws-sagemaker-spark-sdk | 1.0.1 | Amazon SageMaker Spark SDK |
| emr-ddb | 4.5.0 | Amazon DynamoDB connector for Hadoop ecosystem applications. |
| emr-goodies | 2.4.0 | Extra convenience libraries for the Hadoop ecosystem. |
| emr-kinesis | 3.4.0 | Amazon Kinesis connector for Hadoop ecosystem applications. |
| emr-s3-dist-cp | 2.9.0 | Distributed copy application optimized for Amazon S3. |
| emrfs | 2.21.0 | Amazon S3 connector for Hadoop ecosystem applications. |
| flink-client | 1.4.0 | Apache Flink command line client scripts and applications. |
| ganglia-monitor | 3.7.2 | Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent. |
| ganglia-metadata-collector | 3.7.2 | Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents. |
| ganglia-web | 3.7.1 | Web application for viewing metrics collected by the Ganglia metadata collector. |
| hadoop-client | 2.8.3-amzn-0 | Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'. |
| hadoop-hdfs-datanode | 2.8.3-amzn-0 | HDFS node-level service for storing blocks. |
| hadoop-hdfs-library | 2.8.3-amzn-0 | HDFS command-line client and library |
| hadoop-hdfs-namenode | 2.8.3-amzn-0 | HDFS service for tracking file names and block locations. |
| hadoop-httpfs-server | 2.8.3-amzn-0 | HTTP endpoint for HDFS operations. |
| hadoop-kms-server | 2.8.3-amzn-0 | Cryptographic key management server based on Hadoop's KeyProvider API. |
| hadoop-mapred | 2.8.3-amzn-0 | MapReduce execution engine libraries for running a MapReduce application. |
| hadoop-yarn-nodemanager | 2.8.3-amzn-0 | YARN service for managing containers on an individual node. |
| hadoop-yarn-resourcemanager | 2.8.3-amzn-0 | YARN service for allocating and managing cluster resources and distributed applications. |
| hadoop-yarn-timeline-server | 2.8.3-amzn-0 | Service for retrieving current and historical information for YARN applications. |
| hbase-hmaster | 1.4.0 | Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands. |
| hbase-region-server | 1.4.0 | Service for serving one or more HBase regions. |
| hbase-client | 1.4.0 | HBase command-line client. |
| hbase-rest-server | 1.4.0 | Service providing a RESTful HTTP endpoint for HBase. |
| hbase-thrift-server | 1.4.0 | Service providing a Thrift endpoint to HBase. |
| hcatalog-client | 2.3.2-amzn-1 | The 'hcat' command line client for manipulating hcatalog-server. |
| hcatalog-server | 2.3.2-amzn-1 | Service providing HCatalog, a table and storage management layer for distributed applications. |
| hcatalog-webhcat-server | 2.3.2-amzn-1 | HTTP endpoint providing a REST interface to HCatalog. |
| hive-client | 2.3.2-amzn-1 | Hive command line client. |
| hive-hbase | 2.3.2-amzn-1 | Hive-hbase client. |
| hive-metastore-server | 2.3.2-amzn-1 | Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations. |
| hive-server2 | 2.3.2-amzn-1 | Service for accepting Hive queries as web requests. |
| hue-server | 4.1.0 | Web application for analyzing data using Hadoop ecosystem applications |
| livy-server | 0.4.0-incubating | REST interface for interacting with Apache Spark |
| mahout-client | 0.13.0 | Library for machine learning. |
| mxnet | 1.0.0 | A flexible, scalable, and efficient library for deep learning. |
| mysql-server | 5.5.54+ | MySQL database server. |
| nvidia-cuda | 9.1.85 | Nvidia drivers and Cuda toolkit |
| oozie-client | 4.3.0 | Oozie command-line client. |
| oozie-server | 4.3.0 | Service for accepting Oozie workflow requests. |
| phoenix-library | 4.13.0-HBase-1.4 | The phoenix libraries for server and client |
| phoenix-query-server | 4.13.0-HBase-1.4 | A light weight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API |
| presto-coordinator | 0.188 | Service for accepting queries and managing query execution among presto-workers. |
| presto-worker | 0.188 | Service for executing pieces of a query. |
| pig-client | 0.17.0 | Pig command-line client. |
| spark-client | 2.2.1 | Spark command-line clients. |
| spark-history-server | 2.2.1 | Web UI for viewing logged events for the lifetime of a completed Spark application. |
| spark-on-yarn | 2.2.1 | In-memory execution engine for YARN. |
| spark-yarn-slave | 2.2.1 | Apache Spark libraries needed by YARN slaves. |
| sqoop-client | 1.4.6 | Apache Sqoop command-line client. |
| tez-on-yarn | 0.8.4 | The tez YARN application and libraries. |
| webserver | 2.4.25+ | Apache HTTP server. |
| zeppelin-server | 0.7.3 | Web-based notebook that enables interactive data analytics. |
| zookeeper-server | 3.4.10 | Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. |
| zookeeper-client | 3.4.10 | ZooKeeper command line client. |

Release 5.12.1 Configuration Classifications
Configuration classifications allow you to customize applications when you create a cluster. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.
emr-5.12.1 Classifications
Classifications Description
capacity-scheduler
Change values in Hadoop's capacity-scheduler.xml file.
core-site
Change values in Hadoop's core-site.xml file.
emrfs-site
Change EMRFS settings.
flink-conf
Change flink-conf.yaml settings.
flink-log4j
Change Flink log4j.properties settings.
flink-log4j-yarn-session
Change Flink log4j-yarn-session.properties settings.
flink-log4j-cli
Change Flink log4j-cli.properties settings.
hadoop-env
Change values in the Hadoop environment for all Hadoop components.
hadoop-log4j
Change values in Hadoop's log4j.properties file.
hadoop-ssl-server
Change Hadoop SSL server configuration.
hadoop-ssl-client
Change Hadoop SSL client configuration.
hbase
Amazon EMR-curated settings for Apache HBase.
hbase-env
Change values in HBase's environment.
hbase-log4j
Change values in HBase's hbase-log4j.properties file.
hbase-metrics
Change values in HBase's hadoop-metrics2-hbase.properties file.
hbase-policy
Change values in HBase's hbase-policy.xml file.
hbase-site
Change values in HBase's hbase-site.xml file.
hdfs-encryption-zones
Configure HDFS encryption zones.
hdfs-site
Change values in HDFS's hdfs-site.xml.
hcatalog-env
Change values in HCatalog's environment.
hcatalog-server-jndi
Change values in HCatalog's jndi.properties.
hcatalog-server-proto-hive-site
Change values in HCatalog's proto-hive-site.xml.
hcatalog-webhcat-env
Change values in HCatalog WebHCat's environment.
hcatalog-webhcat-log4j2
Change values in HCatalog WebHCat's log4j2.properties.
hcatalog-webhcat-site
Change values in HCatalog WebHCat's webhcat-site.xml file.
hive-beeline-log4j2
Change values in Hive's beeline-log4j2.properties file.
hive-parquet-logging
Change values in Hive's parquet-logging.properties file.
hive-env
Change values in the Hive environment.
hive-exec-log4j2
Change values in Hive's hive-exec-log4j2.properties file.
hive-llap-daemon-log4j2
Change values in Hive's llap-daemon-log4j2.properties file.
hive-log4j2
Change values in Hive's hive-log4j2.properties file.
hive-site
Change values in Hive's hive-site.xml file.
hiveserver2-site
Change values in Hive Server2's hiveserver2-site.xml file.
hue-ini
Change values in Hue's ini file.
httpfs-env
Change values in the HTTPFS environment.
httpfs-site
Change values in Hadoop's httpfs-site.xml file.
hadoop-kms-acls
Change values in Hadoop's kms-acls.xml file.
hadoop-kms-env
Change values in the Hadoop KMS environment.
hadoop-kms-log4j
Change values in Hadoop's kms-log4j.properties file.
hadoop-kms-site
Change values in Hadoop's kms-site.xml file.
livy-conf
Change values in Livy's livy.conf file.
livy-env
Change values in the Livy environment.
livy-log4j
Change Livy log4j.properties settings.
mapred-env
Change values in the MapReduce application's environment.
mapred-site
Change values in the MapReduce application's mapred-site.xml file.
oozie-env
Change values in Oozie's environment.
oozie-log4j
Change values in Oozie's oozie-log4j.properties file.
oozie-site
Change values in Oozie's oozie-site.xml file.
phoenix-hbase-metrics
Change values in Phoenix's hadoop-metrics2-hbase.properties file.
phoenix-hbase-site
Change values in Phoenix's hbase-site.xml file.
phoenix-log4j
Change values in Phoenix's log4j.properties file.
phoenix-metrics
Change values in Phoenix's hadoop-metrics2-phoenix.properties file.
pig-env
Change values in the Pig environment.
pig-properties
Change values in Pig's pig.properties file.
pig-log4j
Change values in Pig's log4j.properties file.
presto-log
Change values in Presto's log.properties file.
presto-config
Change values in Presto's config.properties file.
presto-env
Change values in Presto's presto-env.sh file.
presto-node
Change values in Presto's node.properties file.
presto-connector-blackhole
Change values in Presto's blackhole.properties file.
presto-connector-cassandra
Change values in Presto's cassandra.properties file.
presto-connector-hive
Change values in Presto's hive.properties file.
presto-connector-jmx
Change values in Presto's jmx.properties file.
presto-connector-kafka
Change values in Presto's kafka.properties file.
presto-connector-localfile
Change values in Presto's localfile.properties file.
presto-connector-mongodb
Change values in Presto's mongodb.properties file.
presto-connector-mysql
Change values in Presto's mysql.properties file.
presto-connector-postgresql
Change values in Presto's postgresql.properties file.
presto-connector-raptor
Change values in Presto's raptor.properties file.
presto-connector-redis
Change values in Presto's redis.properties file.
presto-connector-redshift
Change values in Presto's redshift.properties file.
presto-connector-tpch
Change values in Presto's tpch.properties file.
spark
Amazon EMR-curated settings for Apache Spark.
spark-defaults
Change values in Spark's spark-defaults.conf file.
spark-env
Change values in the Spark environment.
spark-hive-site
Change values in Spark's hive-site.xml file.
spark-log4j
Change values in Spark's log4j.properties file.
spark-metrics
Change values in Spark's metrics.properties file.
sqoop-env
Change values in Sqoop's environment.
sqoop-oraoop-site
Change values in Sqoop OraOop's oraoop-site.xml file.
sqoop-site
Change values in Sqoop's sqoop-site.xml file.
tez-site
Change values in Tez's tez-site.xml file.
yarn-env
Change values in the YARN environment.
yarn-site
Change values in YARN's yarn-site.xml file.
zeppelin-env
Change values in the Zeppelin environment.
zookeeper-config
Change values in ZooKeeper's zoo.cfg file.
zookeeper-log4j
Change values in ZooKeeper's log4j.properties file.
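Component versions in this release, such as 2.8.3-amzn-0 for hadoop-client, follow the CommunityVersion-amzn-EmrVersion form described in the component versions section. As a rough sketch, a label can be split into its two parts as shown below. The `ComponentVersion` type and `parse_version_label` function are hypothetical names introduced for illustration, and the sketch assumes that the EmrVersion part is always a non-negative integer.

```python
import re
from typing import NamedTuple, Optional


class ComponentVersion(NamedTuple):
    community_version: str
    # None when the label carries no "-amzn-" suffix (unmodified component).
    emr_version: Optional[int]


# Assumption: the Amazon EMR suffix is "-amzn-" followed by digits at the end.
_AMZN_LABEL = re.compile(r"^(?P<community>.+)-amzn-(?P<emr>\d+)$")


def parse_version_label(label: str) -> ComponentVersion:
    """Split a component version label into community and EMR parts."""
    match = _AMZN_LABEL.match(label)
    if match is None:
        # Labels such as "1.4.0" or "4.13.0-HBase-1.4" are returned unchanged.
        return ComponentVersion(label, None)
    return ComponentVersion(match.group("community"), int(match.group("emr")))


for label in ("2.8.3-amzn-0", "2.3.2-amzn-1", "1.4.0", "4.13.0-HBase-1.4"):
    print(label, parse_version_label(label))
```

Returning `None` for the EMR part keeps unmodified community versions distinguishable from a first Amazon EMR modification labeled `-amzn-0`.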
Release 5.12.0 Application Versions
The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, Livy, Mahout, MXNet, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, Tez, Zeppelin, and ZooKeeper.
The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.
For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:
The following release notes include information for the Amazon EMR release version 5.12.0. Changes are relative to 5.11.1.
Upgrades
- AWS SDK for Java 1.11.238 ⇒ 1.11.267. For more information, see the AWS SDK for Java Change Log on GitHub.
- Hadoop 2.7.3 ⇒ 2.8.3. For more information, see Apache Hadoop Releases.
- Flink 1.3.2 ⇒ 1.4.0. For more information, see the Apache Flink 1.4.0 Release Announcement.
- HBase 1.3.1 ⇒ 1.4.0. For more information, see the HBase Release Announcement.
- Hue 4.0.1 ⇒ 4.1.0. For more information, see the Release Notes.
- MXNet 0.12.0 ⇒ 1.0.0. For more information, see the MXNet Change Log on GitHub.
- Presto 0.187 ⇒ 0.188. For more information, see the Release Notes.
Changes, Enhancements, and Resolved Issues
- Hadoop
  - The yarn.resourcemanager.decommissioning.timeout property has changed to yarn.resourcemanager.nodemanager-graceful-decommission-timeout-secs. You can use this property to customize cluster scale-down. For more information, see Cluster Scale-Down in the Amazon EMR Management Guide.
  - The Hadoop CLI added the -d option to the cp (copy) command, which specifies direct copy. You can use this to avoid creating an intermediary .COPYING file, which makes copying data between Amazon S3 locations faster. For more information, see HADOOP-12384.
- Pig
  - Added the pig-env configuration classification, which simplifies the configuration of Pig environment properties. For more information, see Configuring Applications.
- Presto
  - Added the presto-connector-redshift configuration classification, which you can use to configure values in the Presto redshift.properties configuration file. For more information, see Redshift Connector in Presto documentation, and Configuring Applications.
  - Presto support for EMRFS has been added and is the default configuration. Earlier Amazon EMR release versions used PrestoS3FileSystem, which was the only option. For more information, see EMRFS and PrestoS3FileSystem Configuration.
    Note
    A configuration issue can cause Presto errors when querying underlying data in Amazon S3 with Amazon EMR release version 5.12.0. This is because Presto fails to pick up configuration classification values from emrfs-site.xml. As a workaround, create an emrfs subdirectory under /usr/lib/presto/plugin/hive-hadoop2/, create a symlink in /usr/lib/presto/plugin/hive-hadoop2/emrfs to the existing /usr/share/aws/emr/emrfs/conf/emrfs-site.xml file, and then restart the presto-server process (sudo presto-server stop followed by sudo presto-server start).
- Spark
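The Presto workaround in the note above (create an emrfs subdirectory, then symlink emrfs-site.xml into it) could be scripted roughly as follows. This is a sketch under assumptions: the `link_emrfs_site` helper is hypothetical, the plugin directory is assumed to be the absolute path /usr/lib/presto/plugin/hive-hadoop2, and the demonstration at the bottom runs against a temporary directory rather than a real cluster node. Restarting presto-server is left as a manual step.

```python
import os
import tempfile

# Paths from the note above; the leading slash on the plugin directory
# is an assumption.
PRESTO_PLUGIN_DIR = "/usr/lib/presto/plugin/hive-hadoop2"
EMRFS_SITE = "/usr/share/aws/emr/emrfs/conf/emrfs-site.xml"


def link_emrfs_site(plugin_dir, emrfs_site):
    """Create <plugin_dir>/emrfs and symlink the emrfs-site.xml file into it.

    Returns the path of the symlink. Safe to call more than once.
    """
    emrfs_dir = os.path.join(plugin_dir, "emrfs")
    os.makedirs(emrfs_dir, exist_ok=True)
    link_path = os.path.join(emrfs_dir, os.path.basename(emrfs_site))
    if not os.path.islink(link_path):
        os.symlink(emrfs_site, link_path)
    return link_path


# Demonstration in a throwaway directory, so nothing on the host changes.
demo_root = tempfile.mkdtemp()
demo_conf = os.path.join(demo_root, "conf", "emrfs-site.xml")
os.makedirs(os.path.dirname(demo_conf))
with open(demo_conf, "w") as handle:
    handle.write("<configuration/>\n")

demo_link = link_emrfs_site(os.path.join(demo_root, "hive-hadoop2"), demo_conf)
print(demo_link)
```

On a cluster node, the equivalent call would be `link_emrfs_site(PRESTO_PLUGIN_DIR, EMRFS_SITE)` run with sufficient privileges, followed by the `sudo presto-server stop` and `sudo presto-server start` commands from the note.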
Known Issues
- MXNet does not include OpenCV libraries.
- SparkR is not available for clusters created using a custom AMI because R is not installed by default on cluster nodes.
Release 5.12.0 Component Versions
| Component | Version | Description |
| --- | --- | --- |
| aws-sagemaker-spark-sdk | 1.0.1 | Amazon SageMaker Spark SDK |
| emr-ddb | 4.5.0 | Amazon DynamoDB connector for Hadoop ecosystem applications. |
| emr-goodies | 2.4.0 | Extra convenience libraries for the Hadoop ecosystem. |
| emr-kinesis | 3.4.0 | Amazon Kinesis connector for Hadoop ecosystem applications. |
| emr-s3-dist-cp | 2.9.0 | Distributed copy application optimized for Amazon S3. |
| emrfs | 2.21.0 | Amazon S3 connector for Hadoop ecosystem applications. |
| flink-client | 1.4.0 | Apache Flink command line client scripts and applications. |
| ganglia-monitor | 3.7.2 | Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent. |
| ganglia-metadata-collector | 3.7.2 | Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents. |
| ganglia-web | 3.7.1 | Web application for viewing metrics collected by the Ganglia metadata collector. |
| hadoop-client | 2.8.3-amzn-0 | Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'. |
| hadoop-hdfs-datanode | 2.8.3-amzn-0 | HDFS node-level service for storing blocks. |
| hadoop-hdfs-library | 2.8.3-amzn-0 | HDFS command-line client and library |
| hadoop-hdfs-namenode | 2.8.3-amzn-0 | HDFS service for tracking file names and block locations. |
| hadoop-httpfs-server | 2.8.3-amzn-0 | HTTP endpoint for HDFS operations. |
| hadoop-kms-server | 2.8.3-amzn-0 | Cryptographic key management server based on Hadoop's KeyProvider API. |
| hadoop-mapred | 2.8.3-amzn-0 | MapReduce execution engine libraries for running a MapReduce application. |
| hadoop-yarn-nodemanager | 2.8.3-amzn-0 | YARN service for managing containers on an individual node. |
| hadoop-yarn-resourcemanager | 2.8.3-amzn-0 | YARN service for allocating and managing cluster resources and distributed applications. |
| hadoop-yarn-timeline-server | 2.8.3-amzn-0 | Service for retrieving current and historical information for YARN applications. |
| hbase-hmaster | 1.4.0 | Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands. |
| hbase-region-server | 1.4.0 | Service for serving one or more HBase regions. |
| hbase-client | 1.4.0 | HBase command-line client. |
| hbase-rest-server | 1.4.0 | Service providing a RESTful HTTP endpoint for HBase. |
| hbase-thrift-server | 1.4.0 | Service providing a Thrift endpoint to HBase. |
| hcatalog-client | 2.3.2-amzn-1 | The 'hcat' command line client for manipulating hcatalog-server. |
| hcatalog-server | 2.3.2-amzn-1 | Service providing HCatalog, a table and storage management layer for distributed applications. |
| hcatalog-webhcat-server | 2.3.2-amzn-1 | HTTP endpoint providing a REST interface to HCatalog. |
| hive-client | 2.3.2-amzn-1 | Hive command line client. |
| hive-hbase | 2.3.2-amzn-1 | Hive-hbase client. |
| hive-metastore-server | 2.3.2-amzn-1 | Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations. |
| hive-server2 | 2.3.2-amzn-1 | Service for accepting Hive queries as web requests. |
| hue-server | 4.1.0 | Web application for analyzing data using Hadoop ecosystem applications |
| livy-server | 0.4.0-incubating | REST interface for interacting with Apache Spark |
| mahout-client | 0.13.0 | Library for machine learning. |
| mxnet | 1.0.0 | A flexible, scalable, and efficient library for deep learning. |
| mysql-server | 5.5.54+ | MySQL database server. |
| nvidia-cuda | 9.1.85 | Nvidia drivers and Cuda toolkit |
| oozie-client | 4.3.0 | Oozie command-line client. |
| oozie-server | 4.3.0 | Service for accepting Oozie workflow requests. |
| phoenix-library | 4.13.0-HBase-1.4 | The phoenix libraries for server and client |
| phoenix-query-server | 4.13.0-HBase-1.4 | A light weight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API |
| presto-coordinator | 0.188 | Service for accepting queries and managing query execution among presto-workers. |
| presto-worker | 0.188 | Service for executing pieces of a query. |
| pig-client | 0.17.0 | Pig command-line client. |
| spark-client | 2.2.1 | Spark command-line clients. |
| spark-history-server | 2.2.1 | Web UI for viewing logged events for the lifetime of a completed Spark application. |
| spark-on-yarn | 2.2.1 | In-memory execution engine for YARN. |
| spark-yarn-slave | 2.2.1 | Apache Spark libraries needed by YARN slaves. |
| sqoop-client | 1.4.6 | Apache Sqoop command-line client. |
| tez-on-yarn | 0.8.4 | The tez YARN application and libraries. |
| webserver | 2.4.25+ | Apache HTTP server. |
| zeppelin-server | 0.7.3 | Web-based notebook that enables interactive data analytics. |
| zookeeper-server | 3.4.10 | Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. |
| zookeeper-client | 3.4.10 | ZooKeeper command line client. |

Release 5.12.0 Configuration Classifications
Configuration classifications allow you to customize applications when you create a cluster. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.
emr-5.12.0 Classifications
Classifications Description
capacity-scheduler
Change values in Hadoop's capacity-scheduler.xml file.
core-site
Change values in Hadoop's core-site.xml file.
emrfs-site
Change EMRFS settings.
flink-conf
Change flink-conf.yaml settings.
flink-log4j
Change Flink log4j.properties settings.
flink-log4j-yarn-session
Change Flink log4j-yarn-session.properties settings.
flink-log4j-cli
Change Flink log4j-cli.properties settings.
hadoop-env
Change values in the Hadoop environment for all Hadoop components.
hadoop-log4j
Change values in Hadoop's log4j.properties file.
hadoop-ssl-server
Change Hadoop SSL server configuration.
hadoop-ssl-client
Change Hadoop SSL client configuration.
hbase
Amazon EMR-curated settings for Apache HBase.
hbase-env
Change values in HBase's environment.
hbase-log4j
Change values in HBase's hbase-log4j.properties file.
hbase-metrics
Change values in HBase's hadoop-metrics2-hbase.properties file.
hbase-policy
Change values in HBase's hbase-policy.xml file.
hbase-site
Change values in HBase's hbase-site.xml file.
hdfs-encryption-zones
Configure HDFS encryption zones.
hdfs-site
Change values in HDFS's hdfs-site.xml.
hcatalog-env
Change values in HCatalog's environment.
hcatalog-server-jndi
Change values in HCatalog's jndi.properties.
hcatalog-server-proto-hive-site
Change values in HCatalog's proto-hive-site.xml.
hcatalog-webhcat-env
Change values in HCatalog WebHCat's environment.
hcatalog-webhcat-log4j2
Change values in HCatalog WebHCat's log4j2.properties.
hcatalog-webhcat-site
Change values in HCatalog WebHCat's webhcat-site.xml file.
hive-beeline-log4j2
Change values in Hive's beeline-log4j2.properties file.
hive-parquet-logging
Change values in Hive's parquet-logging.properties file.
hive-env
Change values in the Hive environment.
hive-exec-log4j2
Change values in Hive's hive-exec-log4j2.properties file.
hive-llap-daemon-log4j2
Change values in Hive's llap-daemon-log4j2.properties file.
hive-log4j2
Change values in Hive's hive-log4j2.properties file.
hive-site
Change values in Hive's hive-site.xml file.
hiveserver2-site
Change values in Hive Server2's hiveserver2-site.xml file.
hue-ini
Change values in Hue's ini file.
httpfs-env
Change values in the HTTPFS environment.
httpfs-site
Change values in Hadoop's httpfs-site.xml file.
hadoop-kms-acls
Change values in Hadoop's kms-acls.xml file.
hadoop-kms-env
Change values in the Hadoop KMS environment.
hadoop-kms-log4j
Change values in Hadoop's kms-log4j.properties file.
hadoop-kms-site
Change values in Hadoop's kms-site.xml file.
livy-conf
Change values in Livy's livy.conf file.
livy-env
Change values in the Livy environment.
livy-log4j
Change Livy log4j.properties settings.
mapred-env
Change values in the MapReduce application's environment.
mapred-site
Change values in the MapReduce application's mapred-site.xml file.
oozie-env
Change values in Oozie's environment.
oozie-log4j
Change values in Oozie's oozie-log4j.properties file.
oozie-site
Change values in Oozie's oozie-site.xml file.
phoenix-hbase-metrics
Change values in Phoenix's hadoop-metrics2-hbase.properties file.
phoenix-hbase-site
Change values in Phoenix's hbase-site.xml file.
phoenix-log4j
Change values in Phoenix's log4j.properties file.
phoenix-metrics
Change values in Phoenix's hadoop-metrics2-phoenix.properties file.
pig-env
Change values in the Pig environment.
pig-properties
Change values in Pig's pig.properties file.
pig-log4j
Change values in Pig's log4j.properties file.
presto-log
Change values in Presto's log.properties file.
presto-config
Change values in Presto's config.properties file.
presto-env
Change values in Presto's presto-env.sh file.
presto-node
Change values in Presto's node.properties file.
presto-connector-blackhole
Change values in Presto's blackhole.properties file.
presto-connector-cassandra
Change values in Presto's cassandra.properties file.
presto-connector-hive
Change values in Presto's hive.properties file.
presto-connector-jmx
Change values in Presto's jmx.properties file.
presto-connector-kafka
Change values in Presto's kafka.properties file.
presto-connector-localfile
Change values in Presto's localfile.properties file.
presto-connector-mongodb
Change values in Presto's mongodb.properties file.
presto-connector-mysql
Change values in Presto's mysql.properties file.
presto-connector-postgresql
Change values in Presto's postgresql.properties file.
presto-connector-raptor
Change values in Presto's raptor.properties file.
presto-connector-redis
Change values in Presto's redis.properties file.
presto-connector-redshift
Change values in Presto's redshift.properties file.
presto-connector-tpch
Change values in Presto's tpch.properties file.
spark
Amazon EMR-curated settings for Apache Spark.
spark-defaults
Change values in Spark's spark-defaults.conf file.
spark-env
Change values in the Spark environment.
spark-hive-site
Change values in Spark's hive-site.xml file.
spark-log4j
Change values in Spark's log4j.properties file.
spark-metrics
Change values in Spark's metrics.properties file.
sqoop-env
Change values in Sqoop's environment.
sqoop-oraoop-site
Change values in Sqoop OraOop's oraoop-site.xml file.
sqoop-site
Change values in Sqoop's sqoop-site.xml file.
tez-site
Change values in Tez's tez-site.xml file.
yarn-env
Change values in the YARN environment.
yarn-site
Change values in YARN's yarn-site.xml file.
zeppelin-env
Change values in the Zeppelin environment.
zookeeper-config
Change values in ZooKeeper's zoo.cfg file.
zookeeper-log4j
Change values in ZooKeeper's log4j.properties file.
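As a minimal sketch of how classifications such as spark-defaults and yarn-site from the table above are typically supplied, the snippet below builds a list of configuration objects and serializes it to JSON. Each object pairs a "Classification" name with a "Properties" map of string values. The helper name make_configuration and the property values shown are illustrative assumptions, not recommended settings.

```python
import json


def make_configuration(classification, properties):
    """Return one configuration object for a single classification.

    `make_configuration` is a hypothetical helper used only in this sketch.
    Property values are expected to be strings already.
    """
    return {"Classification": classification, "Properties": dict(properties)}


# Illustrative values only; choose settings that suit your own workload.
configurations = [
    make_configuration("spark-defaults", {"spark.executor.memory": "2g"}),
    make_configuration("yarn-site", {"yarn.nodemanager.vmem-check-enabled": "false"}),
]

# Serialize to JSON so the list can be saved to a file or passed along
# when the cluster is created.
configurations_json = json.dumps(configurations, indent=2)
print(configurations_json)
```

Keeping each classification in its own object means the settings for one application can be changed without touching the others.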
- 5.11.x
There are multiple releases within the 5.11 series. Choose a link below to see information for a specific release within this tab.
Release 5.11.2 Application Versions
The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, Livy, Mahout, MXNet, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, Tez, Zeppelin, and ZooKeeper.
The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.
For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:
The following release notes include information for Amazon EMR release version 5.11.2. Changes are relative to 5.11.1.
Initial release date: August 29, 2018
Changes, Enhancements, and Resolved Issues
-
This release addresses a potential security vulnerability.
Release 5.11.2 Component Versions
The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.
Some components need changes from community versions for Amazon EMR. These components have a version label in the form CommunityVersion-amzn-EmrVersion. For example, if a big-data community component named myapp-component of version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-3.
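The label form CommunityVersion-amzn-EmrVersion can be split mechanically. The following is a minimal sketch, assuming that labels without an -amzn- suffix denote unmodified community versions; the names ComponentVersion and parse_component_version are hypothetical and exist only for this illustration.

```python
import re
from typing import NamedTuple, Optional


class ComponentVersion(NamedTuple):
    community_version: str
    emr_version: Optional[int]  # None when the label has no -amzn- suffix


# Matches labels of the form CommunityVersion-amzn-EmrVersion.
_AMZN_LABEL = re.compile(r"^(?P<community>.+)-amzn-(?P<emr>\d+)$")


def parse_component_version(label: str) -> ComponentVersion:
    """Split a component version label into its community and EMR parts."""
    match = _AMZN_LABEL.match(label)
    if match is None:
        # Assumption: no suffix means the community version is used as-is.
        return ComponentVersion(label, None)
    return ComponentVersion(match.group("community"), int(match.group("emr")))


print(parse_component_version("2.2-amzn-3"))
print(parse_component_version("2.7.3-amzn-6"))
print(parse_component_version("1.3.1"))
```

Labels with other suffixes, such as 0.4.0-incubating, are returned unchanged with no EMR version, because they do not match the -amzn- pattern.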
Component | Version | Description
aws-sagemaker-spark-sdk | 1.0 | Amazon SageMaker Spark SDK
emr-ddb | 4.5.0 | Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies | 2.4.0 | Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis | 3.4.0 | Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp | 2.8.0 | Distributed copy application optimized for Amazon S3.
emrfs | 2.20.0 | Amazon S3 connector for Hadoop ecosystem applications.
flink-client | 1.3.2 | Apache Flink command line client scripts and applications.
ganglia-monitor | 3.7.2 | Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector | 3.7.2 | Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web | 3.7.1 | Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client | 2.7.3-amzn-6 | Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode | 2.7.3-amzn-6 | HDFS node-level service for storing blocks.
hadoop-hdfs-library | 2.7.3-amzn-6 | HDFS command-line client and library
hadoop-hdfs-namenode | 2.7.3-amzn-6 | HDFS service for tracking file names and block locations.
hadoop-httpfs-server | 2.7.3-amzn-6 | HTTP endpoint for HDFS operations.
hadoop-kms-server | 2.7.3-amzn-6 | Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred | 2.7.3-amzn-6 | MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager | 2.7.3-amzn-6 | YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager | 2.7.3-amzn-6 | YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server | 2.7.3-amzn-6 | Service for retrieving current and historical information for YARN applications.
hbase-hmaster | 1.3.1 | Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server | 1.3.1 | Service for serving one or more HBase regions.
hbase-client | 1.3.1 | HBase command-line client.
hbase-rest-server | 1.3.1 | Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server | 1.3.1 | Service providing a Thrift endpoint to HBase.
hcatalog-client | 2.3.2-amzn-0 | The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server | 2.3.2-amzn-0 | Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server | 2.3.2-amzn-0 | HTTP endpoint providing a REST interface to HCatalog.
hive-client | 2.3.2-amzn-0 | Hive command line client.
hive-hbase | 2.3.2-amzn-0 | Hive-hbase client.
hive-metastore-server | 2.3.2-amzn-0 | Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server2 | 2.3.2-amzn-0 | Service for accepting Hive queries as web requests.
hue-server | 4.0.1 | Web application for analyzing data using Hadoop ecosystem applications
livy-server | 0.4.0-incubating | REST interface for interacting with Apache Spark
mahout-client | 0.13.0 | Library for machine learning.
mxnet | 0.12.0 | A flexible, scalable, and efficient library for deep learning.
mysql-server | 5.5.54+ | MySQL database server.
nvidia-cuda | 9.0.176 | Nvidia drivers and Cuda toolkit
oozie-client | 4.3.0 | Oozie command-line client.
oozie-server | 4.3.0 | Service for accepting Oozie workflow requests.
phoenix-library | 4.11.0-HBase-1.3 | The phoenix libraries for server and client
phoenix-query-server | 4.11.0-HBase-1.3 | A light weight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API
presto-coordinator | 0.187 | Service for accepting queries and managing query execution among presto-workers.
presto-worker | 0.187 | Service for executing pieces of a query.
pig-client | 0.17.0 | Pig command-line client.
spark-client | 2.2.1 | Spark command-line clients.
spark-history-server | 2.2.1 | Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn | 2.2.1 | In-memory execution engine for YARN.
spark-yarn-slave | 2.2.1 | Apache Spark libraries needed by YARN slaves.
sqoop-client | 1.4.6 | Apache Sqoop command-line client.
tez-on-yarn | 0.8.4 | The tez YARN application and libraries.
webserver | 2.4.25+ | Apache HTTP server.
zeppelin-server | 0.7.3 | Web-based notebook that enables interactive data analytics.
zookeeper-server | 3.4.10 | Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client | 3.4.10 | ZooKeeper command line client.
Release 5.11.2 Configuration Classifications
Configuration classifications allow you to customize applications when you create a cluster. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.
emr-5.11.2 Classifications
Classifications Description
capacity-scheduler
Change values in Hadoop's capacity-scheduler.xml file.
core-site
Change values in Hadoop's core-site.xml file.
emrfs-site
Change EMRFS settings.
flink-conf
Change flink-conf.yaml settings.
flink-log4j
Change Flink log4j.properties settings.
flink-log4j-yarn-session
Change Flink log4j-yarn-session.properties settings.
flink-log4j-cli
Change Flink log4j-cli.properties settings.
hadoop-env
Change values in the Hadoop environment for all Hadoop components.
hadoop-log4j
Change values in Hadoop's log4j.properties file.
hadoop-ssl-server
Change the Hadoop SSL server configuration.
hadoop-ssl-client
Change the Hadoop SSL client configuration.
hbase
Amazon EMR-curated settings for Apache HBase.
hbase-env
Change values in HBase's environment.
hbase-log4j
Change values in HBase's hbase-log4j.properties file.
hbase-metrics
Change values in HBase's hadoop-metrics2-hbase.properties file.
hbase-policy
Change values in HBase's hbase-policy.xml file.
hbase-site
Change values in HBase's hbase-site.xml file.
hdfs-encryption-zones
Configure HDFS encryption zones.
hdfs-site
Change values in HDFS's hdfs-site.xml.
hcatalog-env
Change values in HCatalog's environment.
hcatalog-server-jndi
Change values in HCatalog's jndi.properties.
hcatalog-server-proto-hive-site
Change values in HCatalog's proto-hive-site.xml.
hcatalog-webhcat-env
Change values in HCatalog WebHCat's environment.
hcatalog-webhcat-log4j2
Change values in HCatalog WebHCat's log4j2.properties.
hcatalog-webhcat-site
Change values in HCatalog WebHCat's webhcat-site.xml file.
hive-beeline-log4j2
Change values in Hive's beeline-log4j2.properties file.
hive-parquet-logging
Change values in Hive's parquet-logging.properties file.
hive-env
Change values in the Hive environment.
hive-exec-log4j2
Change values in Hive's hive-exec-log4j2.properties file.
hive-llap-daemon-log4j2
Change values in Hive's llap-daemon-log4j2.properties file.
hive-log4j2
Change values in Hive's hive-log4j2.properties file.
hive-site
Change values in Hive's hive-site.xml file.
hiveserver2-site
Change values in HiveServer2's hiveserver2-site.xml file.
hue-ini
Change values in Hue's ini file.
httpfs-env
Change values in the HTTPFS environment.
httpfs-site
Change values in Hadoop's httpfs-site.xml file.
hadoop-kms-acls
Change values in Hadoop's kms-acls.xml file.
hadoop-kms-env
Change values in the Hadoop KMS environment.
hadoop-kms-log4j
Change values in Hadoop's kms-log4j.properties file.
hadoop-kms-site
Change values in Hadoop's kms-site.xml file.
livy-conf
Change values in Livy's livy.conf file.
livy-env
Change values in the Livy environment.
livy-log4j
Change Livy log4j.properties settings.
mapred-env
Change values in the MapReduce application's environment.
mapred-site
Change values in the MapReduce application's mapred-site.xml file.
oozie-env
Change values in Oozie's environment.
oozie-log4j
Change values in Oozie's oozie-log4j.properties file.
oozie-site
Change values in Oozie's oozie-site.xml file.
phoenix-hbase-metrics
Change values in Phoenix's hadoop-metrics2-hbase.properties file.
phoenix-hbase-site
Change values in Phoenix's hbase-site.xml file.
phoenix-log4j
Change values in Phoenix's log4j.properties file.
phoenix-metrics
Change values in Phoenix's hadoop-metrics2-phoenix.properties file.
pig-properties
Change values in Pig's pig.properties file.
pig-log4j
Change values in Pig's log4j.properties file.
presto-log
Change values in Presto's log.properties file.
presto-config
Change values in Presto's config.properties file.
presto-env
Change values in Presto's presto-env.sh file.
presto-node
Change values in Presto's node.properties file.
presto-connector-blackhole
Change values in Presto's blackhole.properties file.
presto-connector-cassandra
Change values in Presto's cassandra.properties file.
presto-connector-hive
Change values in Presto's hive.properties file.
presto-connector-jmx
Change values in Presto's jmx.properties file.
presto-connector-kafka
Change values in Presto's kafka.properties file.
presto-connector-localfile
Change values in Presto's localfile.properties file.
presto-connector-mongodb
Change values in Presto's mongodb.properties file.
presto-connector-mysql
Change values in Presto's mysql.properties file.
presto-connector-postgresql
Change values in Presto's postgresql.properties file.
presto-connector-raptor
Change values in Presto's raptor.properties file.
presto-connector-redis
Change values in Presto's redis.properties file.
presto-connector-tpch
Change values in Presto's tpch.properties file.
spark
Amazon EMR-curated settings for Apache Spark.
spark-defaults
Change values in Spark's spark-defaults.conf file.
spark-env
Change values in the Spark environment.
spark-hive-site
Change values in Spark's hive-site.xml file.
spark-log4j
Change values in Spark's log4j.properties file.
spark-metrics
Change values in Spark's metrics.properties file.
sqoop-env
Change values in Sqoop's environment.
sqoop-oraoop-site
Change values in Sqoop OraOop's oraoop-site.xml file.
sqoop-site
Change values in Sqoop's sqoop-site.xml file.
tez-site
Change values in Tez's tez-site.xml file.
yarn-env
Change values in the YARN environment.
yarn-site
Change values in YARN's yarn-site.xml file.
zeppelin-env
Change values in the Zeppelin environment.
zookeeper-config
Change values in ZooKeeper's zoo.cfg file.
zookeeper-log4j
Change values in ZooKeeper's log4j.properties file.
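Because the set of classifications differs between releases, it can help to check a requested configuration against the names listed for the target release before launching a cluster. The sketch below is one way to do that, under stated assumptions: SUPPORTED_EMR_5_11_2 holds only a small subset of the emr-5.11.2 table above, the function name unsupported_classifications is hypothetical, and only top-level classification names are checked. The name presto-connector-redshift is used as the failing case because it does not appear in the emr-5.11.2 table.

```python
# A small subset of the emr-5.11.2 classifications listed above; extend it
# with the full table before relying on the check.
SUPPORTED_EMR_5_11_2 = frozenset({
    "core-site",
    "hive-site",
    "presto-connector-hive",
    "spark-defaults",
    "yarn-site",
})


def unsupported_classifications(configurations, supported):
    """Return the sorted classification names that are not in `supported`.

    Only top-level "Classification" values are inspected in this sketch.
    """
    requested = {config["Classification"] for config in configurations}
    return sorted(requested - supported)


requested_configurations = [
    {"Classification": "spark-defaults",
     "Properties": {"spark.executor.memory": "2g"}},
    {"Classification": "presto-connector-redshift", "Properties": {}},
]

print(unsupported_classifications(requested_configurations, SUPPORTED_EMR_5_11_2))
# prints ['presto-connector-redshift']
```

An empty result means every requested classification name was found in the supported set.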
Release 5.11.1 Application Versions
The following applications are supported in this release: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, Livy, Mahout, MXNet, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, Tez, Zeppelin, and ZooKeeper.
The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.
For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:
The following release notes include information for the Amazon EMR 5.11.1 release. Changes are relative to the Amazon EMR 5.8.0 release.
Initial release date: January 22, 2018
Changes, Enhancements, and Resolved Issues
-
Updated the Amazon Linux kernel of the default Amazon Linux AMI for Amazon EMR to address vulnerabilities associated with speculative execution (CVE-2017-5715, CVE-2017-5753, and CVE-2017-5754). For more information, see https://aws.amazon.com/security/security-bulletins/AWS-2018-013/.
Release 5.11.1 Component Versions
The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.
Some components need changes from community versions for Amazon EMR. These components have a version label in the form CommunityVersion-amzn-EmrVersion. For example, if a big-data community component named myapp-component of version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-3.
Component | Version | Description
aws-sagemaker-spark-sdk | 1.0 | Amazon SageMaker Spark SDK
emr-ddb | 4.5.0 | Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies | 2.4.0 | Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis | 3.4.0 | Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp | 2.8.0 | Distributed copy application optimized for Amazon S3.
emrfs | 2.20.0 | Amazon S3 connector for Hadoop ecosystem applications.
flink-client | 1.3.2 | Apache Flink command line client scripts and applications.
ganglia-monitor | 3.7.2 | Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector | 3.7.2 | Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web | 3.7.1 | Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client | 2.7.3-amzn-6 | Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode | 2.7.3-amzn-6 | HDFS node-level service for storing blocks.
hadoop-hdfs-library | 2.7.3-amzn-6 | HDFS command-line client and library
hadoop-hdfs-namenode | 2.7.3-amzn-6 | HDFS service for tracking file names and block locations.
hadoop-httpfs-server | 2.7.3-amzn-6 | HTTP endpoint for HDFS operations.
hadoop-kms-server | 2.7.3-amzn-6 | Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred | 2.7.3-amzn-6 | MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager | 2.7.3-amzn-6 | YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager | 2.7.3-amzn-6 | YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server | 2.7.3-amzn-6 | Service for retrieving current and historical information for YARN applications.
hbase-hmaster | 1.3.1 | Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server | 1.3.1 | Service for serving one or more HBase regions.
hbase-client | 1.3.1 | HBase command-line client.
hbase-rest-server | 1.3.1 | Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server | 1.3.1 | Service providing a Thrift endpoint to HBase.
hcatalog-client | 2.3.2-amzn-0 | The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server | 2.3.2-amzn-0 | Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server | 2.3.2-amzn-0 | HTTP endpoint providing a REST interface to HCatalog.
hive-client | 2.3.2-amzn-0 | Hive command line client.
hive-hbase | 2.3.2-amzn-0 | Hive-hbase client.
hive-metastore-server | 2.3.2-amzn-0 | Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server2 | 2.3.2-amzn-0 | Service for accepting Hive queries as web requests.
hue-server | 4.0.1 | Web application for analyzing data using Hadoop ecosystem applications
livy-server | 0.4.0-incubating | REST interface for interacting with Apache Spark
mahout-client | 0.13.0 | Library for machine learning.
mxnet 0.12. -