Details for Amazon EMR 4.x Release Versions
Each tab below lists application versions, release notes, component versions, and configuration classifications available in each Amazon EMR 4.x release version.
For a comprehensive diagram of application versions in every release, see Application Versions for 4.x Series Amazon EMR Releases (PNG).
For application-specific differences between Amazon EMR 4.x release versions and versions beginning with Amazon EMR 5.0.0, see Differences in Amazon EMR 4.x Release Versions.
- 4.9.x
There are multiple releases within the 4.9 series. Choose a link below to see information for a specific release within this tab.
4.9.6 | 4.9.5 | 4.9.4 | 4.9.3 | 4.9.2 | 4.9.1
Important: In this release version, Amazon EMR uses AWS Signature Version 4 exclusively to authenticate requests to Amazon S3. For more information, see What's New.
Release 4.9.5 Application Versions
The following applications are supported in this release: Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, Mahout, Oozie-Sandbox, Phoenix, Pig, Presto-Sandbox, Spark, Sqoop-Sandbox, Tez, Zeppelin-Sandbox, and ZooKeeper-Sandbox. The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.
For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:
The following release notes include information for Amazon EMR release version 4.9.5. Changes are relative to 4.9.4.
Initial release date: August 29, 2018
Changes, Enhancements, and Resolved Issues
- HBase
  - This release addresses a potential security vulnerability.
Release 4.9.5 Component Versions
The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.
Some components in Amazon EMR differ from community versions. These components have a version label in the form CommunityVersion-amzn-EmrVersion. The EmrVersion starts at 0. For example, if an open source community component named myapp-component with version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-2.
Component | Version | Description
emr-ddb | 4.3.0 | Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies | 2.2.0 | Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis | 3.3.0 | Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp | 2.4.0 | Distributed copy application optimized for Amazon S3.
emrfs | 2.17.0 | Amazon S3 connector for Hadoop ecosystem applications.
ganglia-monitor | 3.7.2 | Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector | 3.7.2 | Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web | 3.7.1 | Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client | 2.7.3-amzn-2 | Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode | 2.7.3-amzn-2 | HDFS node-level service for storing blocks.
hadoop-hdfs-library | 2.7.3-amzn-2 | HDFS command-line client and library.
hadoop-hdfs-namenode | 2.7.3-amzn-2 | HDFS service for tracking file names and block locations.
hadoop-httpfs-server | 2.7.3-amzn-2 | HTTP endpoint for HDFS operations.
hadoop-kms-server | 2.7.3-amzn-2 | Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred | 2.7.3-amzn-2 | MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager | 2.7.3-amzn-2 | YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager | 2.7.3-amzn-2 | YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server | 2.7.3-amzn-2 | Service for retrieving current and historical information for YARN applications.
hbase-hmaster | 1.2.2 | Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server | 1.2.2 | Service for serving one or more HBase regions.
hbase-client | 1.2.2 | HBase command-line client.
hbase-rest-server | 1.2.2 | Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server | 1.2.2 | Service providing a Thrift endpoint to HBase.
hcatalog-client | 1.0.0-amzn-9 | The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server | 1.0.0-amzn-9 | Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server | 1.0.0-amzn-9 | HTTP endpoint providing a REST interface to HCatalog.
hive-client | 1.0.0-amzn-9 | Hive command line client.
hive-metastore-server | 1.0.0-amzn-9 | Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server | 1.0.0-amzn-9 | Service for accepting Hive queries as web requests.
hue-server | 3.7.1-amzn-7 | Web application for analyzing data using Hadoop ecosystem applications.
mahout-client | 0.12.2 | Library for machine learning.
mysql-server | 5.5.54+ | MySQL database server.
oozie-client | 4.2.0 | Oozie command-line client.
oozie-server | 4.2.0 | Service for accepting Oozie workflow requests.
phoenix-library | 4.7.0-HBase-1.2 | The phoenix libraries for server and client.
phoenix-query-server | 4.7.0-HBase-1.2 | A lightweight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API.
presto-coordinator | 0.157.1 | Service for accepting queries and managing query execution among presto-workers.
presto-worker | 0.157.1 | Service for executing pieces of a query.
pig-client | 0.14.0-amzn-0 | Pig command-line client.
spark-client | 1.6.3 | Spark command-line clients.
spark-history-server | 1.6.3 | Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn | 1.6.3 | In-memory execution engine for YARN.
spark-yarn-slave | 1.6.3 | Apache Spark libraries needed by YARN slaves.
sqoop-client | 1.4.6 | Apache Sqoop command-line client.
tez-on-yarn | 0.8.4 | The tez YARN application and libraries.
webserver | 2.4.25+ | Apache HTTP server.
zeppelin-server | 0.6.1 | Web-based notebook that enables interactive data analytics.
zookeeper-server | 3.4.9 | Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client | 3.4.9 | ZooKeeper command line client.
Release 4.9.5 Configuration Classifications
Configuration classifications allow you to customize applications. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.
emr-4.9.5 Classifications
Classifications
Description
capacity-scheduler
Change values in Hadoop's capacity-scheduler.xml file.
core-site
Change values in Hadoop's core-site.xml file.
emrfs-site
Change EMRFS settings.
hadoop-env
Change values in the Hadoop environment for all Hadoop components.
hadoop-log4j
Change values in Hadoop's log4j.properties file.
hadoop-ssl-server
Change Hadoop SSL server configuration.
hadoop-ssl-client
Change Hadoop SSL client configuration.
hbase-env
Change values in HBase's environment.
hbase-log4j
Change values in HBase's hbase-log4j.properties file.
hbase-metrics
Change values in HBase's hadoop-metrics2-hbase.properties file.
hbase-policy
Change values in HBase's hbase-policy.xml file.
hbase-site
Change values in HBase's hbase-site.xml file.
hdfs-encryption-zones
Configure HDFS encryption zones.
hdfs-site
Change values in HDFS's hdfs-site.xml.
hcatalog-env
Change values in HCatalog's environment.
hcatalog-server-jndi
Change values in HCatalog's jndi.properties.
hcatalog-server-proto-hive-site
Change values in HCatalog's proto-hive-site.xml.
hcatalog-webhcat-env
Change values in HCatalog WebHCat's environment.
hcatalog-webhcat-log4j
Change values in HCatalog WebHCat's log4j.properties.
hcatalog-webhcat-site
Change values in HCatalog WebHCat's webhcat-site.xml file.
hive-env
Change values in the Hive environment.
hive-exec-log4j
Change values in Hive's hive-exec-log4j.properties file.
hive-log4j
Change values in Hive's hive-log4j.properties file.
hive-site
Change values in Hive's hive-site.xml file.
hiveserver2-site
Change values in Hive Server2's hiveserver2-site.xml file.
hue-ini
Change values in Hue's ini file.
httpfs-env
Change values in the HTTPFS environment.
httpfs-site
Change values in Hadoop's httpfs-site.xml file.
hadoop-kms-acls
Change values in Hadoop's kms-acls.xml file.
hadoop-kms-env
Change values in the Hadoop KMS environment.
hadoop-kms-log4j
Change values in Hadoop's kms-log4j.properties file.
hadoop-kms-site
Change values in Hadoop's kms-site.xml file.
mapred-env
Change values in the MapReduce application's environment.
mapred-site
Change values in the MapReduce application's mapred-site.xml file.
oozie-env
Change values in Oozie's environment.
oozie-log4j
Change values in Oozie's oozie-log4j.properties file.
oozie-site
Change values in Oozie's oozie-site.xml file.
phoenix-hbase-metrics
Change values in Phoenix's hadoop-metrics2-hbase.properties file.
phoenix-hbase-site
Change values in Phoenix's hbase-site.xml file.
phoenix-log4j
Change values in Phoenix's log4j.properties file.
phoenix-metrics
Change values in Phoenix's hadoop-metrics2-phoenix.properties file.
pig-properties
Change values in Pig's pig.properties file.
pig-log4j
Change values in Pig's log4j.properties file.
presto-log
Change values in Presto's log.properties file.
presto-config
Change values in Presto's config.properties file.
presto-connector-blackhole
Change values in Presto's blackhole.properties file.
presto-connector-cassandra
Change values in Presto's cassandra.properties file.
presto-connector-hive
Change values in Presto's hive.properties file.
presto-connector-jmx
Change values in Presto's jmx.properties file.
presto-connector-kafka
Change values in Presto's kafka.properties file.
presto-connector-localfile
Change values in Presto's localfile.properties file.
presto-connector-mongodb
Change values in Presto's mongodb.properties file.
presto-connector-mysql
Change values in Presto's mysql.properties file.
presto-connector-postgresql
Change values in Presto's postgresql.properties file.
presto-connector-raptor
Change values in Presto's raptor.properties file.
presto-connector-redis
Change values in Presto's redis.properties file.
presto-connector-tpch
Change values in Presto's tpch.properties file.
spark
Amazon EMR-curated settings for Apache Spark.
spark-defaults
Change values in Spark's spark-defaults.conf file.
spark-env
Change values in the Spark environment.
spark-log4j
Change values in Spark's log4j.properties file.
spark-metrics
Change values in Spark's metrics.properties file.
sqoop-env
Change values in Sqoop's environment.
sqoop-oraoop-site
Change values in Sqoop OraOop's oraoop-site.xml file.
sqoop-site
Change values in Sqoop's sqoop-site.xml file.
tez-site
Change values in Tez's tez-site.xml file.
yarn-env
Change values in the YARN environment.
yarn-site
Change values in YARN's yarn-site.xml file.
zeppelin-env
Change values in the Zeppelin environment.
zookeeper-config
Change values in ZooKeeper's zoo.cfg file.
zookeeper-log4j
Change values in ZooKeeper's log4j.properties file.
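As a rough illustration of how the classifications listed above are typically supplied, the following sketch builds a configuration list as JSON, using objects with `Classification` and `Properties` keys. The helper name `make_configuration` and the property values are hypothetical and chosen only for illustration; see Configuring Applications for the authoritative format.

```python
import json


def make_configuration(classification, properties):
    """Return one configuration object for a classification such as 'hive-site'."""
    # Property values are emitted as strings, so convert them defensively.
    return {
        "Classification": classification,
        "Properties": {key: str(value) for key, value in properties.items()},
    }


# Illustrative values only; they are assumptions, not recommendations.
configurations = [
    make_configuration("hive-site", {"hive.exec.parallel": "true"}),
    make_configuration("spark-defaults", {"spark.executor.memory": "2g"}),
]

configurations_json = json.dumps(configurations, indent=2)
print(configurations_json)
```

The resulting JSON could be saved to a file and supplied when the cluster is created, for example through the AWS CLI `--configurations` option.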
Release 4.9.4 Application Versions
The following applications are supported in this release: Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, Mahout, Oozie-Sandbox, Phoenix, Pig, Presto-Sandbox, Spark, Sqoop-Sandbox, Tez, Zeppelin-Sandbox, and ZooKeeper-Sandbox. The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.
For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:
The following release notes include information for Amazon EMR release version 4.9.4. Changes are relative to 4.9.3.
Initial release date: March 29, 2018
Changes, Enhancements, and Resolved Issues
- Updated the Amazon Linux kernel of the default Amazon Linux AMI for Amazon EMR to address potential vulnerabilities.
Release 4.9.4 Component Versions
The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.
Some components in Amazon EMR differ from community versions. These components have a version label in the form CommunityVersion-amzn-EmrVersion. The EmrVersion starts at 0. For example, if an open source community component named myapp-component with version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-2.
Component | Version | Description
emr-ddb | 4.3.0 | Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies | 2.2.0 | Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis | 3.3.0 | Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp | 2.4.0 | Distributed copy application optimized for Amazon S3.
emrfs | 2.17.0 | Amazon S3 connector for Hadoop ecosystem applications.
ganglia-monitor | 3.7.2 | Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector | 3.7.2 | Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web | 3.7.1 | Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client | 2.7.3-amzn-2 | Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode | 2.7.3-amzn-2 | HDFS node-level service for storing blocks.
hadoop-hdfs-library | 2.7.3-amzn-2 | HDFS command-line client and library.
hadoop-hdfs-namenode | 2.7.3-amzn-2 | HDFS service for tracking file names and block locations.
hadoop-httpfs-server | 2.7.3-amzn-2 | HTTP endpoint for HDFS operations.
hadoop-kms-server | 2.7.3-amzn-2 | Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred | 2.7.3-amzn-2 | MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager | 2.7.3-amzn-2 | YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager | 2.7.3-amzn-2 | YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server | 2.7.3-amzn-2 | Service for retrieving current and historical information for YARN applications.
hbase-hmaster | 1.2.2 | Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server | 1.2.2 | Service for serving one or more HBase regions.
hbase-client | 1.2.2 | HBase command-line client.
hbase-rest-server | 1.2.2 | Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server | 1.2.2 | Service providing a Thrift endpoint to HBase.
hcatalog-client | 1.0.0-amzn-9 | The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server | 1.0.0-amzn-9 | Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server | 1.0.0-amzn-9 | HTTP endpoint providing a REST interface to HCatalog.
hive-client | 1.0.0-amzn-9 | Hive command line client.
hive-metastore-server | 1.0.0-amzn-9 | Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server | 1.0.0-amzn-9 | Service for accepting Hive queries as web requests.
hue-server | 3.7.1-amzn-7 | Web application for analyzing data using Hadoop ecosystem applications.
mahout-client | 0.12.2 | Library for machine learning.
mysql-server | 5.5.54+ | MySQL database server.
oozie-client | 4.2.0 | Oozie command-line client.
oozie-server | 4.2.0 | Service for accepting Oozie workflow requests.
phoenix-library | 4.7.0-HBase-1.2 | The phoenix libraries for server and client.
phoenix-query-server | 4.7.0-HBase-1.2 | A lightweight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API.
presto-coordinator | 0.157.1 | Service for accepting queries and managing query execution among presto-workers.
presto-worker | 0.157.1 | Service for executing pieces of a query.
pig-client | 0.14.0-amzn-0 | Pig command-line client.
spark-client | 1.6.3 | Spark command-line clients.
spark-history-server | 1.6.3 | Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn | 1.6.3 | In-memory execution engine for YARN.
spark-yarn-slave | 1.6.3 | Apache Spark libraries needed by YARN slaves.
sqoop-client | 1.4.6 | Apache Sqoop command-line client.
tez-on-yarn | 0.8.4 | The tez YARN application and libraries.
webserver | 2.4.25+ | Apache HTTP server.
zeppelin-server | 0.6.1 | Web-based notebook that enables interactive data analytics.
zookeeper-server | 3.4.9 | Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client | 3.4.9 | ZooKeeper command line client.
Release 4.9.4 Configuration Classifications
Configuration classifications allow you to customize applications. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.
emr-4.9.4 Classifications
Classifications
Description
capacity-scheduler
Change values in Hadoop's capacity-scheduler.xml file.
core-site
Change values in Hadoop's core-site.xml file.
emrfs-site
Change EMRFS settings.
hadoop-env
Change values in the Hadoop environment for all Hadoop components.
hadoop-log4j
Change values in Hadoop's log4j.properties file.
hadoop-ssl-server
Change Hadoop SSL server configuration.
hadoop-ssl-client
Change Hadoop SSL client configuration.
hbase-env
Change values in HBase's environment.
hbase-log4j
Change values in HBase's hbase-log4j.properties file.
hbase-metrics
Change values in HBase's hadoop-metrics2-hbase.properties file.
hbase-policy
Change values in HBase's hbase-policy.xml file.
hbase-site
Change values in HBase's hbase-site.xml file.
hdfs-encryption-zones
Configure HDFS encryption zones.
hdfs-site
Change values in HDFS's hdfs-site.xml.
hcatalog-env
Change values in HCatalog's environment.
hcatalog-server-jndi
Change values in HCatalog's jndi.properties.
hcatalog-server-proto-hive-site
Change values in HCatalog's proto-hive-site.xml.
hcatalog-webhcat-env
Change values in HCatalog WebHCat's environment.
hcatalog-webhcat-log4j
Change values in HCatalog WebHCat's log4j.properties.
hcatalog-webhcat-site
Change values in HCatalog WebHCat's webhcat-site.xml file.
hive-env
Change values in the Hive environment.
hive-exec-log4j
Change values in Hive's hive-exec-log4j.properties file.
hive-log4j
Change values in Hive's hive-log4j.properties file.
hive-site
Change values in Hive's hive-site.xml file.
hiveserver2-site
Change values in Hive Server2's hiveserver2-site.xml file.
hue-ini
Change values in Hue's ini file.
httpfs-env
Change values in the HTTPFS environment.
httpfs-site
Change values in Hadoop's httpfs-site.xml file.
hadoop-kms-acls
Change values in Hadoop's kms-acls.xml file.
hadoop-kms-env
Change values in the Hadoop KMS environment.
hadoop-kms-log4j
Change values in Hadoop's kms-log4j.properties file.
hadoop-kms-site
Change values in Hadoop's kms-site.xml file.
mapred-env
Change values in the MapReduce application's environment.
mapred-site
Change values in the MapReduce application's mapred-site.xml file.
oozie-env
Change values in Oozie's environment.
oozie-log4j
Change values in Oozie's oozie-log4j.properties file.
oozie-site
Change values in Oozie's oozie-site.xml file.
phoenix-hbase-metrics
Change values in Phoenix's hadoop-metrics2-hbase.properties file.
phoenix-hbase-site
Change values in Phoenix's hbase-site.xml file.
phoenix-log4j
Change values in Phoenix's log4j.properties file.
phoenix-metrics
Change values in Phoenix's hadoop-metrics2-phoenix.properties file.
pig-properties
Change values in Pig's pig.properties file.
pig-log4j
Change values in Pig's log4j.properties file.
presto-log
Change values in Presto's log.properties file.
presto-config
Change values in Presto's config.properties file.
presto-connector-blackhole
Change values in Presto's blackhole.properties file.
presto-connector-cassandra
Change values in Presto's cassandra.properties file.
presto-connector-hive
Change values in Presto's hive.properties file.
presto-connector-jmx
Change values in Presto's jmx.properties file.
presto-connector-kafka
Change values in Presto's kafka.properties file.
presto-connector-localfile
Change values in Presto's localfile.properties file.
presto-connector-mongodb
Change values in Presto's mongodb.properties file.
presto-connector-mysql
Change values in Presto's mysql.properties file.
presto-connector-postgresql
Change values in Presto's postgresql.properties file.
presto-connector-raptor
Change values in Presto's raptor.properties file.
presto-connector-redis
Change values in Presto's redis.properties file.
presto-connector-tpch
Change values in Presto's tpch.properties file.
spark
Amazon EMR-curated settings for Apache Spark.
spark-defaults
Change values in Spark's spark-defaults.conf file.
spark-env
Change values in the Spark environment.
spark-log4j
Change values in Spark's log4j.properties file.
spark-metrics
Change values in Spark's metrics.properties file.
sqoop-env
Change values in Sqoop's environment.
sqoop-oraoop-site
Change values in Sqoop OraOop's oraoop-site.xml file.
sqoop-site
Change values in Sqoop's sqoop-site.xml file.
tez-site
Change values in Tez's tez-site.xml file.
yarn-env
Change values in the YARN environment.
yarn-site
Change values in YARN's yarn-site.xml file.
zeppelin-env
Change values in the Zeppelin environment.
zookeeper-config
Change values in ZooKeeper's zoo.cfg file.
zookeeper-log4j
Change values in ZooKeeper's log4j.properties file.
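The component tables in this section use version labels of the form CommunityVersion-amzn-EmrVersion for components that differ from community versions. The sketch below shows one possible way to split such a label into its two parts. The function name `parse_version_label` and the regular expression are assumptions made for illustration and are not part of Amazon EMR.

```python
import re

# Matches labels such as "2.7.3-amzn-2"; plain community versions do not match.
AMZN_LABEL = re.compile(r"^(?P<community>.+)-amzn-(?P<emr>\d+)$")


def parse_version_label(label):
    """Split a component version label into (community_version, emr_version).

    emr_version is None when the label carries no -amzn- suffix.
    """
    match = AMZN_LABEL.match(label)
    if match is None:
        return label, None
    return match.group("community"), int(match.group("emr"))


print(parse_version_label("2.7.3-amzn-2"))  # ('2.7.3', 2)
print(parse_version_label("1.2.2"))         # ('1.2.2', None)
```

Because EmrVersion starts at 0, a parsed value of 2 corresponds to the third Amazon EMR modification of that community version.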
Release 4.9.3 Application Versions
The following applications are supported in this release: Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, Mahout, Oozie-Sandbox, Phoenix, Pig, Presto-Sandbox, Spark, Sqoop-Sandbox, Tez, Zeppelin-Sandbox, and ZooKeeper-Sandbox. The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.
For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:
The following release notes include information for the Amazon EMR 4.9.3 release. Changes are relative to the Amazon EMR 4.9.2 release.
Initial release date: January 22, 2018
Changes, Enhancements, and Resolved Issues
- Updated the Amazon Linux kernel of the default Amazon Linux AMI for Amazon EMR to address vulnerabilities associated with speculative execution (CVE-2017-5715, CVE-2017-5753, and CVE-2017-5754). For more information, see https://aws.amazon.com/security/security-bulletins/AWS-2018-013/.
Release 4.9.3 Component Versions
The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.
Some components in Amazon EMR differ from community versions. These components have a version label in the form CommunityVersion-amzn-EmrVersion. The EmrVersion starts at 0. For example, if an open source community component named myapp-component with version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-2.
Component | Version | Description
emr-ddb | 4.3.0 | Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies | 2.2.0 | Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis | 3.3.0 | Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp | 2.4.0 | Distributed copy application optimized for Amazon S3.
emrfs | 2.17.0 | Amazon S3 connector for Hadoop ecosystem applications.
ganglia-monitor | 3.7.2 | Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector | 3.7.2 | Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web | 3.7.1 | Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client | 2.7.3-amzn-2 | Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode | 2.7.3-amzn-2 | HDFS node-level service for storing blocks.
hadoop-hdfs-library | 2.7.3-amzn-2 | HDFS command-line client and library.
hadoop-hdfs-namenode | 2.7.3-amzn-2 | HDFS service for tracking file names and block locations.
hadoop-httpfs-server | 2.7.3-amzn-2 | HTTP endpoint for HDFS operations.
hadoop-kms-server | 2.7.3-amzn-2 | Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred | 2.7.3-amzn-2 | MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager | 2.7.3-amzn-2 | YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager | 2.7.3-amzn-2 | YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server | 2.7.3-amzn-2 | Service for retrieving current and historical information for YARN applications.
hbase-hmaster | 1.2.2 | Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server | 1.2.2 | Service for serving one or more HBase regions.
hbase-client | 1.2.2 | HBase command-line client.
hbase-rest-server | 1.2.2 | Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server | 1.2.2 | Service providing a Thrift endpoint to HBase.
hcatalog-client | 1.0.0-amzn-9 | The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server | 1.0.0-amzn-9 | Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server | 1.0.0-amzn-9 | HTTP endpoint providing a REST interface to HCatalog.
hive-client | 1.0.0-amzn-9 | Hive command line client.
hive-metastore-server | 1.0.0-amzn-9 | Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server | 1.0.0-amzn-9 | Service for accepting Hive queries as web requests.
hue-server | 3.7.1-amzn-7 | Web application for analyzing data using Hadoop ecosystem applications.
mahout-client | 0.12.2 | Library for machine learning.
mysql-server | 5.5.54+ | MySQL database server.
oozie-client | 4.2.0 | Oozie command-line client.
oozie-server | 4.2.0 | Service for accepting Oozie workflow requests.
phoenix-library | 4.7.0-HBase-1.2 | The phoenix libraries for server and client.
phoenix-query-server | 4.7.0-HBase-1.2 | A lightweight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API.
presto-coordinator | 0.157.1 | Service for accepting queries and managing query execution among presto-workers.
presto-worker | 0.157.1 | Service for executing pieces of a query.
pig-client | 0.14.0-amzn-0 | Pig command-line client.
spark-client | 1.6.3 | Spark command-line clients.
spark-history-server | 1.6.3 | Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn | 1.6.3 | In-memory execution engine for YARN.
spark-yarn-slave | 1.6.3 | Apache Spark libraries needed by YARN slaves.
sqoop-client | 1.4.6 | Apache Sqoop command-line client.
tez-on-yarn | 0.8.4 | The tez YARN application and libraries.
webserver | 2.4.25+ | Apache HTTP server.
zeppelin-server | 0.6.1 | Web-based notebook that enables interactive data analytics.
zookeeper-server | 3.4.9 | Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client | 3.4.9 | ZooKeeper command line client.
Release 4.9.3 Configuration Classifications
Configuration classifications allow you to customize applications. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.
emr-4.9.3 Classifications
Classifications
Description
capacity-scheduler
Change values in Hadoop's capacity-scheduler.xml file.
core-site
Change values in Hadoop's core-site.xml file.
emrfs-site
Change EMRFS settings.
hadoop-env
Change values in the Hadoop environment for all Hadoop components.
hadoop-log4j
Change values in Hadoop's log4j.properties file.
hadoop-ssl-server
Change Hadoop SSL server configuration.
hadoop-ssl-client
Change Hadoop SSL client configuration.
hbase-env
Change values in HBase's environment.
hbase-log4j
Change values in HBase's hbase-log4j.properties file.
hbase-metrics
Change values in HBase's hadoop-metrics2-hbase.properties file.
hbase-policy
Change values in HBase's hbase-policy.xml file.
hbase-site
Change values in HBase's hbase-site.xml file.
hdfs-encryption-zones
Configure HDFS encryption zones.
hdfs-site
Change values in HDFS's hdfs-site.xml.
hcatalog-env
Change values in HCatalog's environment.
hcatalog-server-jndi
Change values in HCatalog's jndi.properties.
hcatalog-server-proto-hive-site
Change values in HCatalog's proto-hive-site.xml.
hcatalog-webhcat-env
Change values in HCatalog WebHCat's environment.
hcatalog-webhcat-log4j
Change values in HCatalog WebHCat's log4j.properties.
hcatalog-webhcat-site
Change values in HCatalog WebHCat's webhcat-site.xml file.
hive-env
Change values in the Hive environment.
hive-exec-log4j
Change values in Hive's hive-exec-log4j.properties file.
hive-log4j
Change values in Hive's hive-log4j.properties file.
hive-site
Change values in Hive's hive-site.xml file.
hiveserver2-site
Change values in Hive Server2's hiveserver2-site.xml file.
hue-ini
Change values in Hue's ini file.
httpfs-env
Change values in the HTTPFS environment.
httpfs-site
Change values in Hadoop's httpfs-site.xml file.
hadoop-kms-acls
Change values in Hadoop's kms-acls.xml file.
hadoop-kms-env
Change values in the Hadoop KMS environment.
hadoop-kms-log4j
Change values in Hadoop's kms-log4j.properties file.
hadoop-kms-site
Change values in Hadoop's kms-site.xml file.
mapred-env
Change values in the MapReduce application's environment.
mapred-site
Change values in the MapReduce application's mapred-site.xml file.
oozie-env
Change values in Oozie's environment.
oozie-log4j
Change values in Oozie's oozie-log4j.properties file.
oozie-site
Change values in Oozie's oozie-site.xml file.
phoenix-hbase-metrics
Change values in Phoenix's hadoop-metrics2-hbase.properties file.
phoenix-hbase-site
Change values in Phoenix's hbase-site.xml file.
phoenix-log4j
Change values in Phoenix's log4j.properties file.
phoenix-metrics
Change values in Phoenix's hadoop-metrics2-phoenix.properties file.
pig-properties
Change values in Pig's pig.properties file.
pig-log4j
Change values in Pig's log4j.properties file.
presto-log
Change values in Presto's log.properties file.
presto-config
Change values in Presto's config.properties file.
presto-connector-blackhole
Change values in Presto's blackhole.properties file.
presto-connector-cassandra
Change values in Presto's cassandra.properties file.
presto-connector-hive
Change values in Presto's hive.properties file.
presto-connector-jmx
Change values in Presto's jmx.properties file.
presto-connector-kafka
Change values in Presto's kafka.properties file.
presto-connector-localfile
Change values in Presto's localfile.properties file.
presto-connector-mongodb
Change values in Presto's mongodb.properties file.
presto-connector-mysql
Change values in Presto's mysql.properties file.
presto-connector-postgresql
Change values in Presto's postgresql.properties file.
presto-connector-raptor
Change values in Presto's raptor.properties file.
presto-connector-redis
Change values in Presto's redis.properties file.
presto-connector-tpch
Change values in Presto's tpch.properties file.
spark
Amazon EMR-curated settings for Apache Spark.
spark-defaults
Change values in Spark's spark-defaults.conf file.
spark-env
Change values in the Spark environment.
spark-log4j
Change values in Spark's log4j.properties file.
spark-metrics
Change values in Spark's metrics.properties file.
sqoop-env
Change values in Sqoop's environment.
sqoop-oraoop-site
Change values in Sqoop OraOop's oraoop-site.xml file.
sqoop-site
Change values in Sqoop's sqoop-site.xml file.
tez-site
Change values in Tez's tez-site.xml file.
yarn-env
Change values in the YARN environment.
yarn-site
Change values in YARN's yarn-site.xml file.
zeppelin-env
Change values in the Zeppelin environment.
zookeeper-config
Change values in ZooKeeper's zoo.cfg file.
zookeeper-log4j
Change values in ZooKeeper's log4j.properties file.
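As a rough illustration of how the classifications listed above are used, the sketch below builds a configuration object in the JSON shape that Amazon EMR accepts when a cluster is created: a list of objects, each with a Classification name and a Properties map. The helper name classification and the property names and values shown (hive.exec.parallel, spark.executor.memory) are illustrative assumptions, not settings recommended by this page.

```python
import json


def classification(name, properties):
    """Return one configuration object for the given classification name."""
    return {"Classification": name, "Properties": dict(properties)}


# Illustrative values only; choose settings appropriate for your workload.
configurations = [
    classification("hive-site", {"hive.exec.parallel": "true"}),
    classification("spark-defaults", {"spark.executor.memory": "2g"}),
]

# Serialize to JSON, for example to save as a file and supply at cluster creation.
configurations_json = json.dumps(configurations, indent=2)
print(configurations_json)
```

Each Classification value should be one of the names in the list for the release you launch, because the set of supported classifications can differ between releases.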
Release 4.9.2 Application Versions
The following applications are supported in this release: Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, Mahout, Oozie-Sandbox, Phoenix, Pig, Presto-Sandbox, Spark, Sqoop-Sandbox, Tez, Zeppelin-Sandbox, and ZooKeeper-Sandbox. The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.
For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:
The following release notes include information for the Amazon EMR 4.9.2 release. Changes are relative to the Amazon EMR 4.9.1 release.
Release date: July 13, 2017
Minor changes, bug fixes, and enhancements were made in this release.
Release 4.9.2 Component Versions
The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.
Some components in Amazon EMR differ from community versions. These components have a version label in the form CommunityVersion-amzn-EmrVersion. The EmrVersion starts at 0. For example, if an open source community component named myapp-component with version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-2.
Component Version Description
emr-ddb 4.3.0 Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies 2.2.0 Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis 3.3.0 Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp 2.4.0 Distributed copy application optimized for Amazon S3.
emrfs 2.17.0 Amazon S3 connector for Hadoop ecosystem applications.
ganglia-monitor 3.7.2 Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector 3.7.2 Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web 3.7.1 Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client 2.7.3-amzn-2 Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode 2.7.3-amzn-2 HDFS node-level service for storing blocks.
hadoop-hdfs-library 2.7.3-amzn-2 HDFS command-line client and library.
hadoop-hdfs-namenode 2.7.3-amzn-2 HDFS service for tracking file names and block locations.
hadoop-httpfs-server 2.7.3-amzn-2 HTTP endpoint for HDFS operations.
hadoop-kms-server 2.7.3-amzn-2 Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred 2.7.3-amzn-2 MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager 2.7.3-amzn-2 YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager 2.7.3-amzn-2 YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server 2.7.3-amzn-2 Service for retrieving current and historical information for YARN applications.
hbase-hmaster 1.2.2 Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server 1.2.2 Service for serving one or more HBase regions.
hbase-client 1.2.2 HBase command-line client.
hbase-rest-server 1.2.2 Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server 1.2.2 Service providing a Thrift endpoint to HBase.
hcatalog-client 1.0.0-amzn-9 The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server 1.0.0-amzn-9 Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server 1.0.0-amzn-9 HTTP endpoint providing a REST interface to HCatalog.
hive-client 1.0.0-amzn-9 Hive command line client.
hive-metastore-server 1.0.0-amzn-9 Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server 1.0.0-amzn-9 Service for accepting Hive queries as web requests.
hue-server 3.7.1-amzn-7 Web application for analyzing data using Hadoop ecosystem applications.
mahout-client 0.12.2 Library for machine learning.
mysql-server 5.5.54+ MySQL database server.
oozie-client 4.2.0 Oozie command-line client.
oozie-server 4.2.0 Service for accepting Oozie workflow requests.
phoenix-library 4.7.0-HBase-1.2 The Phoenix libraries for server and client.
phoenix-query-server 4.7.0-HBase-1.2 A lightweight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API.
presto-coordinator 0.157.1 Service for accepting queries and managing query execution among presto-workers.
presto-worker 0.157.1 Service for executing pieces of a query.
pig-client 0.14.0-amzn-0 Pig command-line client.
spark-client 1.6.3 Spark command-line clients.
spark-history-server 1.6.3 Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn 1.6.3 In-memory execution engine for YARN.
spark-yarn-slave 1.6.3 Apache Spark libraries needed by YARN slaves.
sqoop-client 1.4.6 Apache Sqoop command-line client.
tez-on-yarn 0.8.4 The Tez YARN application and libraries.
webserver 2.4.25+ Apache HTTP server.
zeppelin-server 0.6.1 Web-based notebook that enables interactive data analytics.
zookeeper-server 3.4.9 Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client 3.4.9 ZooKeeper command line client.
Release 4.9.2 Configuration Classifications
Configuration classifications allow you to customize applications. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.
emr-4.9.2 Classifications
Classifications Description
capacity-scheduler
Change values in Hadoop's capacity-scheduler.xml file.
core-site
Change values in Hadoop's core-site.xml file.
emrfs-site
Change EMRFS settings.
hadoop-env
Change values in the Hadoop environment for all Hadoop components.
hadoop-log4j
Change values in Hadoop's log4j.properties file.
hadoop-ssl-server
Change the Hadoop SSL server configuration.
hadoop-ssl-client
Change the Hadoop SSL client configuration.
hbase-env
Change values in HBase's environment.
hbase-log4j
Change values in HBase's hbase-log4j.properties file.
hbase-metrics
Change values in HBase's hadoop-metrics2-hbase.properties file.
hbase-policy
Change values in HBase's hbase-policy.xml file.
hbase-site
Change values in HBase's hbase-site.xml file.
hdfs-encryption-zones
Configure HDFS encryption zones.
hdfs-site
Change values in HDFS's hdfs-site.xml.
hcatalog-env
Change values in HCatalog's environment.
hcatalog-server-jndi
Change values in HCatalog's jndi.properties.
hcatalog-server-proto-hive-site
Change values in HCatalog's proto-hive-site.xml.
hcatalog-webhcat-env
Change values in HCatalog WebHCat's environment.
hcatalog-webhcat-log4j
Change values in HCatalog WebHCat's log4j.properties.
hcatalog-webhcat-site
Change values in HCatalog WebHCat's webhcat-site.xml file.
hive-env
Change values in the Hive environment.
hive-exec-log4j
Change values in Hive's hive-exec-log4j.properties file.
hive-log4j
Change values in Hive's hive-log4j.properties file.
hive-site
Change values in Hive's hive-site.xml file.
hiveserver2-site
Change values in Hive Server2's hiveserver2-site.xml file.
hue-ini
Change values in Hue's ini file.
httpfs-env
Change values in the HTTPFS environment.
httpfs-site
Change values in Hadoop's httpfs-site.xml file.
hadoop-kms-acls
Change values in Hadoop's kms-acls.xml file.
hadoop-kms-env
Change values in the Hadoop KMS environment.
hadoop-kms-log4j
Change values in Hadoop's kms-log4j.properties file.
hadoop-kms-site
Change values in Hadoop's kms-site.xml file.
mapred-env
Change values in the MapReduce application's environment.
mapred-site
Change values in the MapReduce application's mapred-site.xml file.
oozie-env
Change values in Oozie's environment.
oozie-log4j
Change values in Oozie's oozie-log4j.properties file.
oozie-site
Change values in Oozie's oozie-site.xml file.
phoenix-hbase-metrics
Change values in Phoenix's hadoop-metrics2-hbase.properties file.
phoenix-hbase-site
Change values in Phoenix's hbase-site.xml file.
phoenix-log4j
Change values in Phoenix's log4j.properties file.
phoenix-metrics
Change values in Phoenix's hadoop-metrics2-phoenix.properties file.
pig-properties
Change values in Pig's pig.properties file.
pig-log4j
Change values in Pig's log4j.properties file.
presto-log
Change values in Presto's log.properties file.
presto-config
Change values in Presto's config.properties file.
presto-connector-blackhole
Change values in Presto's blackhole.properties file.
presto-connector-cassandra
Change values in Presto's cassandra.properties file.
presto-connector-hive
Change values in Presto's hive.properties file.
presto-connector-jmx
Change values in Presto's jmx.properties file.
presto-connector-kafka
Change values in Presto's kafka.properties file.
presto-connector-localfile
Change values in Presto's localfile.properties file.
presto-connector-mongodb
Change values in Presto's mongodb.properties file.
presto-connector-mysql
Change values in Presto's mysql.properties file.
presto-connector-postgresql
Change values in Presto's postgresql.properties file.
presto-connector-raptor
Change values in Presto's raptor.properties file.
presto-connector-redis
Change values in Presto's redis.properties file.
presto-connector-tpch
Change values in Presto's tpch.properties file.
spark
Amazon EMR-curated settings for Apache Spark.
spark-defaults
Change values in Spark's spark-defaults.conf file.
spark-env
Change values in the Spark environment.
spark-log4j
Change values in Spark's log4j.properties file.
spark-metrics
Change values in Spark's metrics.properties file.
sqoop-env
Change values in Sqoop's environment.
sqoop-oraoop-site
Change values in Sqoop OraOop's oraoop-site.xml file.
sqoop-site
Change values in Sqoop's sqoop-site.xml file.
tez-site
Change values in Tez's tez-site.xml file.
yarn-env
Change values in the YARN environment.
yarn-site
Change values in YARN's yarn-site.xml file.
zeppelin-env
Change values in the Zeppelin environment.
zookeeper-config
Change values in ZooKeeper's zoo.cfg file.
zookeeper-log4j
Change values in ZooKeeper's log4j.properties file.
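Classifications ending in -env, such as hadoop-env or spark-env in the list above, change environment values rather than a properties or XML file. The following is a minimal sketch, assuming the nested export form described in Configuring Applications, in which environment variables are placed in a child configuration whose classification is export. The helper name env_classification and the variable name and value shown are illustrative assumptions.

```python
import json


def env_classification(name, variables):
    """Wrap environment variables in a nested 'export' configuration."""
    return {
        "Classification": name,
        "Properties": {},  # left empty; the variables live in the nested 'export' block
        "Configurations": [
            {"Classification": "export", "Properties": dict(variables)}
        ],
    }


# Illustrative variable and value only.
hadoop_env = env_classification("hadoop-env", {"HADOOP_DATANODE_HEAPSIZE": "2048"})
print(json.dumps([hadoop_env], indent=2))
```

Keeping the variables in the nested block separates them from file-backed classifications such as hadoop-log4j, whose settings go directly in Properties.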
Release 4.9.1 Application Versions
The following applications are supported in this release: Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, Mahout, Oozie-Sandbox, Phoenix, Pig, Presto-Sandbox, Spark, Sqoop-Sandbox, Tez, Zeppelin-Sandbox, and ZooKeeper-Sandbox. The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.
For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:
The following release notes include information for the Amazon EMR 4.9.1 release. Changes are relative to the Amazon EMR 4.8.4 release.
Release date: April 10, 2017
Known Issues Resolved from the Previous Releases
- Backports of HIVE-9976 and HIVE-10106.
- Fixed an issue in YARN where a large number of nodes (greater than 2,000) and containers (greater than 5,000) would cause an out-of-memory error, for example: "Exception in thread main java.lang.OutOfMemoryError".
Changes and Enhancements
- Amazon EMR releases are now based on Amazon Linux 2017.03. For more information, see https://aws.amazon.com/amazon-linux-ami/2017.03-release-notes/.
- Removed Python 2.6 from the Amazon EMR base Linux image. You can install Python 2.6 manually if necessary.
Release 4.9.1 Component Versions
The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.
Some components in Amazon EMR differ from community versions. These components have a version label in the form CommunityVersion-amzn-EmrVersion. The EmrVersion starts at 0. For example, if an open source community component named myapp-component with version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-2.
Component Version Description
emr-ddb 4.2.0 Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies 2.2.0 Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis 3.3.0 Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp 2.4.0 Distributed copy application optimized for Amazon S3.
emrfs 2.15.0 Amazon S3 connector for Hadoop ecosystem applications.
ganglia-monitor 3.7.2 Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector 3.7.2 Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web 3.7.1 Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client 2.7.3-amzn-2 Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode 2.7.3-amzn-2 HDFS node-level service for storing blocks.
hadoop-hdfs-library 2.7.3-amzn-2 HDFS command-line client and library.
hadoop-hdfs-namenode 2.7.3-amzn-2 HDFS service for tracking file names and block locations.
hadoop-httpfs-server 2.7.3-amzn-2 HTTP endpoint for HDFS operations.
hadoop-kms-server 2.7.3-amzn-2 Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred 2.7.3-amzn-2 MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager 2.7.3-amzn-2 YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager 2.7.3-amzn-2 YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server 2.7.3-amzn-2 Service for retrieving current and historical information for YARN applications.
hbase-hmaster 1.2.2 Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server 1.2.2 Service for serving one or more HBase regions.
hbase-client 1.2.2 HBase command-line client.
hbase-rest-server 1.2.2 Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server 1.2.2 Service providing a Thrift endpoint to HBase.
hcatalog-client 1.0.0-amzn-9 The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server 1.0.0-amzn-9 Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server 1.0.0-amzn-9 HTTP endpoint providing a REST interface to HCatalog.
hive-client 1.0.0-amzn-9 Hive command line client.
hive-metastore-server 1.0.0-amzn-9 Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server 1.0.0-amzn-9 Service for accepting Hive queries as web requests.
hue-server 3.7.1-amzn-7 Web application for analyzing data using Hadoop ecosystem applications.
mahout-client 0.12.2 Library for machine learning.
mysql-server 5.5.54+ MySQL database server.
oozie-client 4.2.0 Oozie command-line client.
oozie-server 4.2.0 Service for accepting Oozie workflow requests.
phoenix-library 4.7.0-HBase-1.2 The Phoenix libraries for server and client.
phoenix-query-server 4.7.0-HBase-1.2 A lightweight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API.
presto-coordinator 0.157.1 Service for accepting queries and managing query execution among presto-workers.
presto-worker 0.157.1 Service for executing pieces of a query.
pig-client 0.14.0-amzn-0 Pig command-line client.
spark-client 1.6.3 Spark command-line clients.
spark-history-server 1.6.3 Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn 1.6.3 In-memory execution engine for YARN.
spark-yarn-slave 1.6.3 Apache Spark libraries needed by YARN slaves.
sqoop-client 1.4.6 Apache Sqoop command-line client.
tez-on-yarn 0.8.4 The Tez YARN application and libraries.
webserver 2.4.25+ Apache HTTP server.
zeppelin-server 0.6.1 Web-based notebook that enables interactive data analytics.
zookeeper-server 3.4.9 Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client 3.4.9 ZooKeeper command line client.
Release 4.9.1 Configuration Classifications
Configuration classifications allow you to customize applications. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.
emr-4.9.1 Classifications
Classifications Description
capacity-scheduler
Change values in Hadoop's capacity-scheduler.xml file.
core-site
Change values in Hadoop's core-site.xml file.
emrfs-site
Change EMRFS settings.
hadoop-env
Change values in the Hadoop environment for all Hadoop components.
hadoop-log4j
Change values in Hadoop's log4j.properties file.
hadoop-ssl-server
Change the Hadoop SSL server configuration.
hadoop-ssl-client
Change the Hadoop SSL client configuration.
hbase-env
Change values in HBase's environment.
hbase-log4j
Change values in HBase's hbase-log4j.properties file.
hbase-metrics
Change values in HBase's hadoop-metrics2-hbase.properties file.
hbase-policy
Change values in HBase's hbase-policy.xml file.
hbase-site
Change values in HBase's hbase-site.xml file.
hdfs-encryption-zones
Configure HDFS encryption zones.
hdfs-site
Change values in HDFS's hdfs-site.xml.
hcatalog-env
Change values in HCatalog's environment.
hcatalog-server-jndi
Change values in HCatalog's jndi.properties.
hcatalog-server-proto-hive-site
Change values in HCatalog's proto-hive-site.xml.
hcatalog-webhcat-env
Change values in HCatalog WebHCat's environment.
hcatalog-webhcat-log4j
Change values in HCatalog WebHCat's log4j.properties.
hcatalog-webhcat-site
Change values in HCatalog WebHCat's webhcat-site.xml file.
hive-env
Change values in the Hive environment.
hive-exec-log4j
Change values in Hive's hive-exec-log4j.properties file.
hive-log4j
Change values in Hive's hive-log4j.properties file.
hive-site
Change values in Hive's hive-site.xml file.
hiveserver2-site
Change values in Hive Server2's hiveserver2-site.xml file.
hue-ini
Change values in Hue's ini file.
httpfs-env
Change values in the HTTPFS environment.
httpfs-site
Change values in Hadoop's httpfs-site.xml file.
hadoop-kms-acls
Change values in Hadoop's kms-acls.xml file.
hadoop-kms-env
Change values in the Hadoop KMS environment.
hadoop-kms-log4j
Change values in Hadoop's kms-log4j.properties file.
hadoop-kms-site
Change values in Hadoop's kms-site.xml file.
mapred-env
Change values in the MapReduce application's environment.
mapred-site
Change values in the MapReduce application's mapred-site.xml file.
oozie-env
Change values in Oozie's environment.
oozie-log4j
Change values in Oozie's oozie-log4j.properties file.
oozie-site
Change values in Oozie's oozie-site.xml file.
phoenix-hbase-metrics
Change values in Phoenix's hadoop-metrics2-hbase.properties file.
phoenix-hbase-site
Change values in Phoenix's hbase-site.xml file.
phoenix-log4j
Change values in Phoenix's log4j.properties file.
phoenix-metrics
Change values in Phoenix's hadoop-metrics2-phoenix.properties file.
pig-properties
Change values in Pig's pig.properties file.
pig-log4j
Change values in Pig's log4j.properties file.
presto-log
Change values in Presto's log.properties file.
presto-config
Change values in Presto's config.properties file.
presto-connector-blackhole
Change values in Presto's blackhole.properties file.
presto-connector-cassandra
Change values in Presto's cassandra.properties file.
presto-connector-hive
Change values in Presto's hive.properties file.
presto-connector-jmx
Change values in Presto's jmx.properties file.
presto-connector-kafka
Change values in Presto's kafka.properties file.
presto-connector-localfile
Change values in Presto's localfile.properties file.
presto-connector-mongodb
Change values in Presto's mongodb.properties file.
presto-connector-mysql
Change values in Presto's mysql.properties file.
presto-connector-postgresql
Change values in Presto's postgresql.properties file.
presto-connector-raptor
Change values in Presto's raptor.properties file.
presto-connector-redis
Change values in Presto's redis.properties file.
presto-connector-tpch
Change values in Presto's tpch.properties file.
spark
Amazon EMR-curated settings for Apache Spark.
spark-defaults
Change values in Spark's spark-defaults.conf file.
spark-env
Change values in the Spark environment.
spark-log4j
Change values in Spark's log4j.properties file.
spark-metrics
Change values in Spark's metrics.properties file.
sqoop-env
Change values in Sqoop's environment.
sqoop-oraoop-site
Change values in Sqoop OraOop's oraoop-site.xml file.
sqoop-site
Change values in Sqoop's sqoop-site.xml file.
tez-site
Change values in Tez's tez-site.xml file.
yarn-env
Change values in the YARN environment.
yarn-site
Change values in YARN's yarn-site.xml file.
zeppelin-env
Change values in the Zeppelin environment.
zookeeper-config
Change values in ZooKeeper's zoo.cfg file.
zookeeper-log4j
Change values in ZooKeeper's log4j.properties file.
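The component version labels in the Component Versions tables above, such as 2.7.3-amzn-2 or 1.0.0-amzn-9, follow the CommunityVersion-amzn-EmrVersion form. As a minimal sketch of how such a label can be split into its two parts, the following helper treats a label without the -amzn- suffix (for example 1.2.2) as an unmodified community version. The names ComponentVersion and parse_version_label are hypothetical and are not part of any Amazon EMR tool.

```python
import re
from typing import NamedTuple, Optional


class ComponentVersion(NamedTuple):
    community_version: str
    emr_version: Optional[int]  # None when the label has no -amzn- suffix


# CommunityVersion, then the literal "-amzn-", then a numeric EmrVersion.
_LABEL = re.compile(r"^(?P<community>.+?)-amzn-(?P<emr>\d+)$")


def parse_version_label(label: str) -> ComponentVersion:
    """Split a component version label into community and EMR parts."""
    match = _LABEL.match(label)
    if match is None:
        return ComponentVersion(label, None)
    return ComponentVersion(match.group("community"), int(match.group("emr")))


for label in ("2.7.3-amzn-2", "1.0.0-amzn-9", "1.2.2"):
    print(label, "->", parse_version_label(label))
```

Because EmrVersion starts at 0, a parsed emr_version of 2 corresponds to the third modification of that community version.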
- 4.8.x
There are multiple releases within the 4.8 series. Choose a link below to see information for a specific release within this tab.
4.8.5 | 4.8.4 | 4.8.3 | 4.8.2 | 4.8.0
Important: In this release version, Amazon EMR uses AWS Signature Version 4 exclusively to authenticate requests to Amazon S3. For more information, see What's New.
Release 4.8.4 Application Versions
The following applications are supported in this release: Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, Mahout, Oozie-Sandbox, Phoenix, Pig, Presto-Sandbox, Spark, Sqoop-Sandbox, Tez, Zeppelin-Sandbox, and ZooKeeper-Sandbox. The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.
For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:
The following release notes include information for the Amazon EMR 4.8.4 release. Changes are relative to the Amazon EMR 4.8.3 release.
Release date: February 7, 2017
Minor changes, bug fixes, and enhancements were made in this release.
Release 4.8.4 Component Versions
The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.
Some components in Amazon EMR differ from community versions. These components have a version label in the form CommunityVersion-amzn-EmrVersion. The EmrVersion starts at 0. For example, if an open source community component named myapp-component with version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-2.
Component Version Description
emr-ddb 4.2.0 Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies 2.2.0 Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis 3.2.0 Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp 2.4.0 Distributed copy application optimized for Amazon S3.
emrfs 2.14.0 Amazon S3 connector for Hadoop ecosystem applications.
ganglia-monitor 3.7.2 Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector 3.7.2 Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web 3.7.1 Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client 2.7.3-amzn-1 Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode 2.7.3-amzn-1 HDFS node-level service for storing blocks.
hadoop-hdfs-library 2.7.3-amzn-1 HDFS command-line client and library.
hadoop-hdfs-namenode 2.7.3-amzn-1 HDFS service for tracking file names and block locations.
hadoop-httpfs-server 2.7.3-amzn-1 HTTP endpoint for HDFS operations.
hadoop-kms-server 2.7.3-amzn-1 Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred 2.7.3-amzn-1 MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager 2.7.3-amzn-1 YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager 2.7.3-amzn-1 YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server 2.7.3-amzn-1 Service for retrieving current and historical information for YARN applications.
hbase-hmaster 1.2.2 Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server 1.2.2 Service for serving one or more HBase regions.
hbase-client 1.2.2 HBase command-line client.
hbase-rest-server 1.2.2 Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server 1.2.2 Service providing a Thrift endpoint to HBase.
hcatalog-client 1.0.0-amzn-8 The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server 1.0.0-amzn-8 Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server 1.0.0-amzn-8 HTTP endpoint providing a REST interface to HCatalog.
hive-client 1.0.0-amzn-8 Hive command line client.
hive-metastore-server 1.0.0-amzn-8 Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server 1.0.0-amzn-8 Service for accepting Hive queries as web requests.
hue-server 3.7.1-amzn-7 Web application for analyzing data using Hadoop ecosystem applications.
mahout-client 0.12.2 Library for machine learning.
mysql-server 5.5.54+ MySQL database server.
oozie-client 4.2.0 Oozie command-line client.
oozie-server 4.2.0 Service for accepting Oozie workflow requests.
phoenix-library 4.7.0-HBase-1.2 The Phoenix libraries for server and client.
phoenix-query-server 4.7.0-HBase-1.2 A lightweight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API.
presto-coordinator 0.157.1 Service for accepting queries and managing query execution among presto-workers.
presto-worker 0.157.1 Service for executing pieces of a query.
pig-client 0.14.0-amzn-0 Pig command-line client.
spark-client 1.6.3 Spark command-line clients.
spark-history-server 1.6.3 Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn 1.6.3 In-memory execution engine for YARN.
spark-yarn-slave 1.6.3 Apache Spark libraries needed by YARN slaves.
sqoop-client 1.4.6 Apache Sqoop command-line client.
tez-on-yarn 0.8.4 The Tez YARN application and libraries.
webserver 2.4.25+ Apache HTTP server.
zeppelin-server 0.6.1 Web-based notebook that enables interactive data analytics.
zookeeper-server 3.4.9 Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client 3.4.9 ZooKeeper command line client.
Release 4.8.4 Configuration Classifications
Configuration classifications allow you to customize applications. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.
emr-4.8.4 Classifications
Classifications Description
capacity-scheduler
Change values in Hadoop's capacity-scheduler.xml file.
core-site
Change values in Hadoop's core-site.xml file.
emrfs-site
Change EMRFS settings.
hadoop-env
Change values in the Hadoop environment for all Hadoop components.
hadoop-log4j
Change values in Hadoop's log4j.properties file.
hadoop-ssl-server
Change the Hadoop SSL server configuration.
hadoop-ssl-client
Change the Hadoop SSL client configuration.
hbase-env
Change values in HBase's environment.
hbase-log4j
Change values in HBase's hbase-log4j.properties file.
hbase-metrics
Change values in HBase's hadoop-metrics2-hbase.properties file.
hbase-policy
Change values in HBase's hbase-policy.xml file.
hbase-site
Change values in HBase's hbase-site.xml file.
hdfs-encryption-zones
Configure HDFS encryption zones.
hdfs-site
Change values in HDFS's hdfs-site.xml.
hcatalog-env
Change values in HCatalog's environment.
hcatalog-server-jndi
Change values in HCatalog's jndi.properties.
hcatalog-server-proto-hive-site
Change values in HCatalog's proto-hive-site.xml.
hcatalog-webhcat-env
Change values in HCatalog WebHCat's environment.
hcatalog-webhcat-log4j
Change values in HCatalog WebHCat's log4j.properties.
hcatalog-webhcat-site
Change values in HCatalog WebHCat's webhcat-site.xml file.
hive-env
Change values in the Hive environment.
hive-exec-log4j
Change values in Hive's hive-exec-log4j.properties file.
hive-log4j
Change values in Hive's hive-log4j.properties file.
hive-site
Change values in Hive's hive-site.xml file.
hiveserver2-site
Change values in Hive Server2's hiveserver2-site.xml file.
hue-ini
Change values in Hue's ini file.
httpfs-env
Change values in the HTTPFS environment.
httpfs-site
Change values in Hadoop's httpfs-site.xml file.
hadoop-kms-acls
Change values in Hadoop's kms-acls.xml file.
hadoop-kms-env
Change values in the Hadoop KMS environment.
hadoop-kms-log4j
Change values in Hadoop's kms-log4j.properties file.
hadoop-kms-site
Change values in Hadoop's kms-site.xml file.
mapred-env
Change values in the MapReduce application's environment.
mapred-site
Change values in the MapReduce application's mapred-site.xml file.
oozie-env
Change values in Oozie's environment.
oozie-log4j
Change values in Oozie's oozie-log4j.properties file.
oozie-site
Change values in Oozie's oozie-site.xml file.
phoenix-hbase-metrics
Change values in Phoenix's hadoop-metrics2-hbase.properties file.
phoenix-hbase-site
Change values in Phoenix's hbase-site.xml file.
phoenix-log4j
Change values in Phoenix's log4j.properties file.
phoenix-metrics
Change values in Phoenix's hadoop-metrics2-phoenix.properties file.
pig-properties
Change values in Pig's pig.properties file.
pig-log4j
Change values in Pig's log4j.properties file.
presto-log
Change values in Presto's log.properties file.
presto-config
Change values in Presto's config.properties file.
presto-connector-blackhole
Change values in Presto's blackhole.properties file.
presto-connector-cassandra
Change values in Presto's cassandra.properties file.
presto-connector-hive
Change values in Presto's hive.properties file.
presto-connector-jmx
Change values in Presto's jmx.properties file.
presto-connector-kafka
Change values in Presto's kafka.properties file.
presto-connector-localfile
Change values in Presto's localfile.properties file.
presto-connector-mongodb
Change values in Presto's mongodb.properties file.
presto-connector-mysql
Change values in Presto's mysql.properties file.
presto-connector-postgresql
Change values in Presto's postgresql.properties file.
presto-connector-raptor
Change values in Presto's raptor.properties file.
presto-connector-redis
Change values in Presto's redis.properties file.
presto-connector-tpch
Change values in Presto's tpch.properties file.
spark
Amazon EMR-curated settings for Apache Spark.
spark-defaults
Change values in Spark's spark-defaults.conf file.
spark-env
Change values in the Spark environment.
spark-log4j
Change values in Spark's log4j.properties file.
spark-metrics
Change values in Spark's metrics.properties file.
sqoop-env
Change values in Sqoop's environment.
sqoop-oraoop-site
Change values in Sqoop OraOop's oraoop-site.xml file.
sqoop-site
Change values in Sqoop's sqoop-site.xml file.
tez-site
Change values in Tez's tez-site.xml file.
yarn-env
Change values in the YARN environment.
yarn-site
Change values in YARN's yarn-site.xml file.
zeppelin-env
Change values in the Zeppelin environment.
zookeeper-config
Change values in ZooKeeper's zoo.cfg file.
zookeeper-log4j
Change values in ZooKeeper's log4j.properties file.
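As a rough illustration of how the classifications listed above are used, the sketch below builds a list of configuration objects and serializes it to JSON. The `Classification` and `Properties` keys follow the configuration object format used by Amazon EMR; the helper name `make_classification` and the specific property values are assumptions chosen for illustration, not recommended settings.

```python
import json


def make_classification(name, properties):
    """Return one configuration object for the named classification.

    `name` should be one of the classifications listed for the release
    (for example "yarn-site"); `properties` maps setting names to string values.
    """
    return {"Classification": name, "Properties": dict(properties)}


# Illustrative values only; adjust to the settings your cluster needs.
configurations = [
    make_classification("yarn-site", {"yarn.nodemanager.vmem-check-enabled": "false"}),
    make_classification("spark", {"maximizeResourceAllocation": "true"}),
]

# JSON text that could be saved to a file such as configurations.json.
configurations_json = json.dumps(configurations, indent=2)
```

The resulting JSON can be saved to a file and supplied when a cluster is created, for example through the `--configurations` option of `aws emr create-cluster`.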
Release 4.8.3 Application Versions
The following applications are supported in this release: Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, Mahout, Oozie-Sandbox, Phoenix, Pig, Presto-Sandbox, Spark, Sqoop-Sandbox, Tez, Zeppelin-Sandbox, and ZooKeeper-Sandbox. The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.
For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:
The following release notes include information for the Amazon EMR 4.8.3 release. Changes are relative to the Amazon EMR 4.8.2 release.
Release date: December 29, 2016
Upgrades
- Upgraded to Presto 0.157.1. For more information, see Presto Release Notes in the Presto documentation.
- Upgraded to Spark 1.6.3. For more information, see Spark Release Notes in the Apache Spark documentation.
- Upgraded to ZooKeeper 3.4.9. For more information, see ZooKeeper Release Notes in the Apache ZooKeeper documentation.
Changes and Enhancements
- Added support for the Amazon EC2 m4.16xlarge instance type in Amazon EMR version 4.8.3 and later, excluding 5.0.0, 5.0.3, and 5.2.0.
- Amazon EMR releases are now based on Amazon Linux 2016.09. For more information, see https://aws.amazon.com/amazon-linux-ami/2016.09-release-notes/.
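The m4.16xlarge availability rule above (4.8.3 and later, with three excluded 5.x releases) can be expressed as a small check. This is a minimal sketch that assumes "later" means a higher release number when compared component by component; the function names are hypothetical and the check is not an Amazon EMR API.

```python
# Releases named as exceptions in the release note above.
EXCLUDED_RELEASES = {(5, 0, 0), (5, 0, 3), (5, 2, 0)}
MINIMUM_RELEASE = (4, 8, 3)


def parse_release(label):
    """Turn a label such as 'emr-4.8.3' or '4.8.3' into the tuple (4, 8, 3)."""
    text = label[len("emr-"):] if label.startswith("emr-") else label
    return tuple(int(part) for part in text.split("."))


def supports_m4_16xlarge(label):
    """Return True if the release note's rule allows m4.16xlarge for this release."""
    version = parse_release(label)
    return version >= MINIMUM_RELEASE and version not in EXCLUDED_RELEASES
```

For example, `supports_m4_16xlarge("emr-4.8.2")` is false because the release predates 4.8.3, and `supports_m4_16xlarge("emr-5.0.3")` is false because that release is explicitly excluded.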
Known Issues Resolved from the Previous Releases
- Fixed an issue in Hadoop where the ReplicationMonitor thread could get stuck for a long time because of a race between replication and deletion of the same file in a large cluster.
- Fixed an issue where ControlledJob#toString failed with a null pointer exception (NPE) when job status was not successfully updated.
Release 4.8.3 Component Versions
The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.
Some components in Amazon EMR differ from community versions. These components have a version label in the form CommunityVersion-amzn-EmrVersion. The EmrVersion starts at 0. For example, if an open source community component named myapp-component with version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-2.
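The version label form described above can be split mechanically. The following is a minimal sketch, assuming that a label without an `-amzn-` suffix denotes an unmodified community version; the names `VersionLabel`, `parse_version_label`, and `times_modified` are hypothetical helpers, not part of any Amazon EMR tooling.

```python
import re
from typing import NamedTuple, Optional


class VersionLabel(NamedTuple):
    community_version: str
    emr_version: Optional[int]  # None when the label has no -amzn- suffix


# Matches labels such as "2.7.3-amzn-1": community version, then the EMR counter.
_LABEL_PATTERN = re.compile(r"^(?P<community>.+?)-amzn-(?P<emr>\d+)$")


def parse_version_label(label: str) -> VersionLabel:
    """Split a component version label into its community and EMR parts."""
    match = _LABEL_PATTERN.match(label)
    if match is None:
        return VersionLabel(label, None)
    return VersionLabel(match.group("community"), int(match.group("emr")))


def times_modified(label: str) -> int:
    """Number of EMR modifications implied by the label (EmrVersion starts at 0)."""
    parsed = parse_version_label(label)
    return 0 if parsed.emr_version is None else parsed.emr_version + 1
```

With the example from the text, `parse_version_label("2.2-amzn-2")` yields community version `2.2` and EMR version `2`, and `times_modified` returns 3. A label such as `4.7.0-HBase-1.2` has no `-amzn-` suffix and is returned unchanged.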
Component Version Description
emr-ddb 4.2.0 Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies 2.2.0 Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis 3.2.0 Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp 2.4.0 Distributed copy application optimized for Amazon S3.
emrfs 2.13.0 Amazon S3 connector for Hadoop ecosystem applications.
ganglia-monitor 3.7.2 Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector 3.7.2 Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web 3.7.1 Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client 2.7.3-amzn-1 Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode 2.7.3-amzn-1 HDFS node-level service for storing blocks.
hadoop-hdfs-library 2.7.3-amzn-1 HDFS command-line client and library.
hadoop-hdfs-namenode 2.7.3-amzn-1 HDFS service for tracking file names and block locations.
hadoop-httpfs-server 2.7.3-amzn-1 HTTP endpoint for HDFS operations.
hadoop-kms-server 2.7.3-amzn-1 Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred 2.7.3-amzn-1 MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager 2.7.3-amzn-1 YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager 2.7.3-amzn-1 YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server 2.7.3-amzn-1 Service for retrieving current and historical information for YARN applications.
hbase-hmaster 1.2.2 Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server 1.2.2 Service for serving one or more HBase regions.
hbase-client 1.2.2 HBase command-line client.
hbase-rest-server 1.2.2 Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server 1.2.2 Service providing a Thrift endpoint to HBase.
hcatalog-client 1.0.0-amzn-8 The 'hcat' command-line client for manipulating hcatalog-server.
hcatalog-server 1.0.0-amzn-8 Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server 1.0.0-amzn-8 HTTP endpoint providing a REST interface to HCatalog.
hive-client 1.0.0-amzn-8 Hive command-line client.
hive-metastore-server 1.0.0-amzn-8 Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server 1.0.0-amzn-8 Service for accepting Hive queries as web requests.
hue-server 3.7.1-amzn-7 Web application for analyzing data using Hadoop ecosystem applications.
mahout-client 0.12.2 Library for machine learning.
mysql-server 5.5.52 MySQL database server.
oozie-client 4.2.0 Oozie command-line client.
oozie-server 4.2.0 Service for accepting Oozie workflow requests.
phoenix-library 4.7.0-HBase-1.2 The Phoenix libraries for server and client.
phoenix-query-server 4.7.0-HBase-1.2 A lightweight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API.
presto-coordinator 0.157.1 Service for accepting queries and managing query execution among presto-workers.
presto-worker 0.157.1 Service for executing pieces of a query.
pig-client 0.14.0-amzn-0 Pig command-line client.
spark-client 1.6.3 Spark command-line clients.
spark-history-server 1.6.3 Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn 1.6.3 In-memory execution engine for YARN.
spark-yarn-slave 1.6.3 Apache Spark libraries needed by YARN slaves.
sqoop-client 1.4.6 Apache Sqoop command-line client.
tez-on-yarn 0.8.4 The Tez YARN application and libraries.
webserver 2.4.23 Apache HTTP server.
zeppelin-server 0.6.1 Web-based notebook that enables interactive data analytics.
zookeeper-server 3.4.9 Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client 3.4.9 ZooKeeper command-line client.
Release 4.8.3 Configuration Classifications
Configuration classifications allow you to customize applications. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.
emr-4.8.3 Classifications
Classifications Description
capacity-scheduler
Change values in Hadoop's capacity-scheduler.xml file.
core-site
Change values in Hadoop's core-site.xml file.
emrfs-site
Change EMRFS settings.
hadoop-env
Change values in the Hadoop environment for all Hadoop components.
hadoop-log4j
Change values in Hadoop's log4j.properties file.
hadoop-ssl-server
Change Hadoop SSL server configuration.
hadoop-ssl-client
Change Hadoop SSL client configuration.
hbase-env
Change values in HBase's environment.
hbase-log4j
Change values in HBase's hbase-log4j.properties file.
hbase-metrics
Change values in HBase's hadoop-metrics2-hbase.properties file.
hbase-policy
Change values in HBase's hbase-policy.xml file.
hbase-site
Change values in HBase's hbase-site.xml file.
hdfs-encryption-zones
Configure HDFS encryption zones.
hdfs-site
Change values in HDFS's hdfs-site.xml.
hcatalog-env
Change values in HCatalog's environment.
hcatalog-server-jndi
Change values in HCatalog's jndi.properties.
hcatalog-server-proto-hive-site
Change values in HCatalog's proto-hive-site.xml.
hcatalog-webhcat-env
Change values in HCatalog WebHCat's environment.
hcatalog-webhcat-log4j
Change values in HCatalog WebHCat's log4j.properties.
hcatalog-webhcat-site
Change values in HCatalog WebHCat's webhcat-site.xml file.
hive-env
Change values in the Hive environment.
hive-exec-log4j
Change values in Hive's hive-exec-log4j.properties file.
hive-log4j
Change values in Hive's hive-log4j.properties file.
hive-site
Change values in Hive's hive-site.xml file.
hiveserver2-site
Change values in HiveServer2's hiveserver2-site.xml file.
hue-ini
Change values in Hue's ini file.
httpfs-env
Change values in the HTTPFS environment.
httpfs-site
Change values in Hadoop's httpfs-site.xml file.
hadoop-kms-acls
Change values in Hadoop's kms-acls.xml file.
hadoop-kms-env
Change values in the Hadoop KMS environment.
hadoop-kms-log4j
Change values in Hadoop's kms-log4j.properties file.
hadoop-kms-site
Change values in Hadoop's kms-site.xml file.
mapred-env
Change values in the MapReduce application's environment.
mapred-site
Change values in the MapReduce application's mapred-site.xml file.
oozie-env
Change values in Oozie's environment.
oozie-log4j
Change values in Oozie's oozie-log4j.properties file.
oozie-site
Change values in Oozie's oozie-site.xml file.
phoenix-hbase-metrics
Change values in Phoenix's hadoop-metrics2-hbase.properties file.
phoenix-hbase-site
Change values in Phoenix's hbase-site.xml file.
phoenix-log4j
Change values in Phoenix's log4j.properties file.
phoenix-metrics
Change values in Phoenix's hadoop-metrics2-phoenix.properties file.
pig-properties
Change values in Pig's pig.properties file.
pig-log4j
Change values in Pig's log4j.properties file.
presto-log
Change values in Presto's log.properties file.
presto-config
Change values in Presto's config.properties file.
presto-connector-blackhole
Change values in Presto's blackhole.properties file.
presto-connector-cassandra
Change values in Presto's cassandra.properties file.
presto-connector-hive
Change values in Presto's hive.properties file.
presto-connector-jmx
Change values in Presto's jmx.properties file.
presto-connector-kafka
Change values in Presto's kafka.properties file.
presto-connector-localfile
Change values in Presto's localfile.properties file.
presto-connector-mongodb
Change values in Presto's mongodb.properties file.
presto-connector-mysql
Change values in Presto's mysql.properties file.
presto-connector-postgresql
Change values in Presto's postgresql.properties file.
presto-connector-raptor
Change values in Presto's raptor.properties file.
presto-connector-redis
Change values in Presto's redis.properties file.
presto-connector-tpch
Change values in Presto's tpch.properties file.
spark
Amazon EMR-curated settings for Apache Spark.
spark-defaults
Change values in Spark's spark-defaults.conf file.
spark-env
Change values in the Spark environment.
spark-log4j
Change values in Spark's log4j.properties file.
spark-metrics
Change values in Spark's metrics.properties file.
sqoop-env
Change values in Sqoop's environment.
sqoop-oraoop-site
Change values in Sqoop OraOop's oraoop-site.xml file.
sqoop-site
Change values in Sqoop's sqoop-site.xml file.
tez-site
Change values in Tez's tez-site.xml file.
yarn-env
Change values in the YARN environment.
yarn-site
Change values in YARN's yarn-site.xml file.
zeppelin-env
Change values in the Zeppelin environment.
zookeeper-config
Change values in ZooKeeper's zoo.cfg file.
zookeeper-log4j
Change values in ZooKeeper's log4j.properties file.
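Classifications ending in -env, such as hadoop-env or spark-env in the list above, change an application's environment rather than a properties file. In the configuration object format these are typically expressed with a nested `export` classification. The sketch below shows that nesting; the helper name `make_env_classification` and the heap size value are assumptions for illustration only.

```python
import json


def make_env_classification(name, exports):
    """Build a configuration object that sets environment variables.

    The outer object names the -env classification; the nested "export"
    classification carries the variables to export.
    """
    return {
        "Classification": name,
        "Properties": {},
        "Configurations": [
            {"Classification": "export", "Properties": dict(exports)}
        ],
    }


# Illustrative value only, not a tuning recommendation.
hadoop_env = make_env_classification(
    "hadoop-env", {"HADOOP_DATANODE_HEAPSIZE": "2048"}
)
hadoop_env_json = json.dumps([hadoop_env], indent=2)
```

The outer `Properties` map stays empty here because the settings live in the nested `export` object.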
Release 4.8.2 Application Versions
The following applications are supported in this release: Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, Mahout, Oozie-Sandbox, Phoenix, Pig, Presto-Sandbox, Spark, Sqoop-Sandbox, Tez, Zeppelin-Sandbox, and ZooKeeper-Sandbox. The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.
For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:
The following release notes include information for the Amazon EMR 4.8.2 release. Changes are relative to the Amazon EMR 4.8.0 release.
Release date: October 24, 2016
Upgrades
- Upgraded to Hadoop 2.7.3.
- Upgraded to Presto 0.152.3, which includes support for the Presto web interface. You can access the Presto web interface on the Presto coordinator using port 8889. For more information about the Presto web interface, see Web Interface in the Presto documentation.
- Amazon EMR releases are now based on Amazon Linux 2016.09. For more information, see https://aws.amazon.com/amazon-linux-ami/2016.09-release-notes/.
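The Presto web interface note above gives only the port (8889) on the Presto coordinator. As a small sketch, the snippet below builds the address and checks whether the port accepts connections. The host name is a placeholder you would replace with your coordinator's address, and plain HTTP is an assumption, since the release note does not state the scheme.

```python
import socket

PRESTO_WEB_PORT = 8889  # port given in the release note above


def presto_web_url(coordinator_host: str) -> str:
    """Build the web interface address for a coordinator host (HTTP is assumed)."""
    return f"http://{coordinator_host}:{PRESTO_WEB_PORT}/"


def is_port_open(host: str, port: int = PRESTO_WEB_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `is_port_open` returns false from your workstation, the port may simply not be reachable from outside the cluster's network.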
Release 4.8.2 Component Versions
The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.
Some components in Amazon EMR differ from community versions. These components have a version label in the form CommunityVersion-amzn-EmrVersion. The EmrVersion starts at 0. For example, if an open source community component named myapp-component with version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-2.
Component Version Description
emr-ddb 4.1.0 Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies 2.1.0 Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis 3.2.0 Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp 2.4.0 Distributed copy application optimized for Amazon S3.
emrfs 2.10.0 Amazon S3 connector for Hadoop ecosystem applications.
ganglia-monitor 3.7.2 Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector 3.7.2 Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web 3.7.1 Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client 2.7.3-amzn-0 Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode 2.7.3-amzn-0 HDFS node-level service for storing blocks.
hadoop-hdfs-library 2.7.3-amzn-0 HDFS command-line client and library.
hadoop-hdfs-namenode 2.7.3-amzn-0 HDFS service for tracking file names and block locations.
hadoop-httpfs-server 2.7.3-amzn-0 HTTP endpoint for HDFS operations.
hadoop-kms-server 2.7.3-amzn-0 Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred 2.7.3-amzn-0 MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager 2.7.3-amzn-0 YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager 2.7.3-amzn-0 YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server 2.7.3-amzn-0 Service for retrieving current and historical information for YARN applications.
hbase-hmaster 1.2.2 Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server 1.2.2 Service for serving one or more HBase regions.
hbase-client 1.2.2 HBase command-line client.
hbase-rest-server 1.2.2 Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server 1.2.2 Service providing a Thrift endpoint to HBase.
hcatalog-client 1.0.0-amzn-7 The 'hcat' command-line client for manipulating hcatalog-server.
hcatalog-server 1.0.0-amzn-7 Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server 1.0.0-amzn-7 HTTP endpoint providing a REST interface to HCatalog.
hive-client 1.0.0-amzn-7 Hive command-line client.
hive-metastore-server 1.0.0-amzn-7 Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server 1.0.0-amzn-7 Service for accepting Hive queries as web requests.
hue-server 3.7.1-amzn-7 Web application for analyzing data using Hadoop ecosystem applications.
mahout-client 0.12.2 Library for machine learning.
mysql-server 5.5.52 MySQL database server.
oozie-client 4.2.0 Oozie command-line client.
oozie-server 4.2.0 Service for accepting Oozie workflow requests.
phoenix-library 4.7.0-HBase-1.2 The Phoenix libraries for server and client.
phoenix-query-server 4.7.0-HBase-1.2 A lightweight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API.
presto-coordinator 0.152.3 Service for accepting queries and managing query execution among presto-workers.
presto-worker 0.152.3 Service for executing pieces of a query.
pig-client 0.14.0-amzn-0 Pig command-line client.
spark-client 1.6.2 Spark command-line clients.
spark-history-server 1.6.2 Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn 1.6.2 In-memory execution engine for YARN.
spark-yarn-slave 1.6.2 Apache Spark libraries needed by YARN slaves.
sqoop-client 1.4.6 Apache Sqoop command-line client.
tez-on-yarn 0.8.4 The Tez YARN application and libraries.
webserver 2.4.23 Apache HTTP server.
zeppelin-server 0.6.1 Web-based notebook that enables interactive data analytics.
zookeeper-server 3.4.8 Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client 3.4.8 ZooKeeper command-line client.
Release 4.8.2 Configuration Classifications
Configuration classifications allow you to customize applications. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.
emr-4.8.2 Classifications
Classifications Description
capacity-scheduler
Change values in Hadoop's capacity-scheduler.xml file.
core-site
Change values in Hadoop's core-site.xml file.
emrfs-site
Change EMRFS settings.
hadoop-env
Change values in the Hadoop environment for all Hadoop components.
hadoop-log4j
Change values in Hadoop's log4j.properties file.
hadoop-ssl-server
Change Hadoop SSL server configuration.
hadoop-ssl-client
Change Hadoop SSL client configuration.
hbase-env
Change values in HBase's environment.
hbase-log4j
Change values in HBase's hbase-log4j.properties file.
hbase-metrics
Change values in HBase's hadoop-metrics2-hbase.properties file.
hbase-policy
Change values in HBase's hbase-policy.xml file.
hbase-site
Change values in HBase's hbase-site.xml file.
hdfs-encryption-zones
Configure HDFS encryption zones.
hdfs-site
Change values in HDFS's hdfs-site.xml.
hcatalog-env
Change values in HCatalog's environment.
hcatalog-server-jndi
Change values in HCatalog's jndi.properties.
hcatalog-server-proto-hive-site
Change values in HCatalog's proto-hive-site.xml.
hcatalog-webhcat-env
Change values in HCatalog WebHCat's environment.
hcatalog-webhcat-log4j
Change values in HCatalog WebHCat's log4j.properties.
hcatalog-webhcat-site
Change values in HCatalog WebHCat's webhcat-site.xml file.
hive-env
Change values in the Hive environment.
hive-exec-log4j
Change values in Hive's hive-exec-log4j.properties file.
hive-log4j
Change values in Hive's hive-log4j.properties file.
hive-site
Change values in Hive's hive-site.xml file.
hiveserver2-site
Change values in HiveServer2's hiveserver2-site.xml file.
hue-ini
Change values in Hue's ini file.
httpfs-env
Change values in the HTTPFS environment.
httpfs-site
Change values in Hadoop's httpfs-site.xml file.
hadoop-kms-acls
Change values in Hadoop's kms-acls.xml file.
hadoop-kms-env
Change values in the Hadoop KMS environment.
hadoop-kms-log4j
Change values in Hadoop's kms-log4j.properties file.
hadoop-kms-site
Change values in Hadoop's kms-site.xml file.
mapred-env
Change values in the MapReduce application's environment.
mapred-site
Change values in the MapReduce application's mapred-site.xml file.
oozie-env
Change values in Oozie's environment.
oozie-log4j
Change values in Oozie's oozie-log4j.properties file.
oozie-site
Change values in Oozie's oozie-site.xml file.
phoenix-hbase-metrics
Change values in Phoenix's hadoop-metrics2-hbase.properties file.
phoenix-hbase-site
Change values in Phoenix's hbase-site.xml file.
phoenix-log4j
Change values in Phoenix's log4j.properties file.
phoenix-metrics
Change values in Phoenix's hadoop-metrics2-phoenix.properties file.
pig-properties
Change values in Pig's pig.properties file.
pig-log4j
Change values in Pig's log4j.properties file.
presto-log
Change values in Presto's log.properties file.
presto-config
Change values in Presto's config.properties file.
presto-connector-blackhole
Change values in Presto's blackhole.properties file.
presto-connector-cassandra
Change values in Presto's cassandra.properties file.
presto-connector-hive
Change values in Presto's hive.properties file.
presto-connector-jmx
Change values in Presto's jmx.properties file.
presto-connector-kafka
Change values in Presto's kafka.properties file.
presto-connector-localfile
Change values in Presto's localfile.properties file.
presto-connector-mongodb
Change values in Presto's mongodb.properties file.
presto-connector-mysql
Change values in Presto's mysql.properties file.
presto-connector-postgresql
Change values in Presto's postgresql.properties file.
presto-connector-raptor
Change values in Presto's raptor.properties file.
presto-connector-redis
Change values in Presto's redis.properties file.
presto-connector-tpch
Change values in Presto's tpch.properties file.
spark
Amazon EMR-curated settings for Apache Spark.
spark-defaults
Change values in Spark's spark-defaults.conf file.
spark-env
Change values in the Spark environment.
spark-log4j
Change values in Spark's log4j.properties file.
spark-metrics
Change values in Spark's metrics.properties file.
sqoop-env
Change values in Sqoop's environment.
sqoop-oraoop-site
Change values in Sqoop OraOop's oraoop-site.xml file.
sqoop-site
Change values in Sqoop's sqoop-site.xml file.
tez-site
Change values in Tez's tez-site.xml file.
yarn-env
Change values in the YARN environment.
yarn-site
Change values in YARN's yarn-site.xml file.
zeppelin-env
Change values in the Zeppelin environment.
zookeeper-config
Change values in ZooKeeper's zoo.cfg file.
zookeeper-log4j
Change values in ZooKeeper's log4j.properties file.
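Many of the classifications above correspond to a Hadoop-style XML file such as hive-site.xml, which holds name/value pairs under a `configuration` root element. The sketch below renders a property map in that layout to show how a classification's properties line up with the file. It is an illustration of the file format only; it does not describe how Amazon EMR itself writes these files, and `render_site_xml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def render_site_xml(properties):
    """Render a name/value map as Hadoop-style <configuration> XML text."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


# Illustrative property only.
site_xml = render_site_xml({"yarn.nodemanager.vmem-check-enabled": "false"})
```

Each key in the map becomes one `property` element with a `name` and a `value` child.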
Release 4.8.0 Application Versions
The following applications are supported in this release: Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, Mahout, Oozie-Sandbox, Phoenix, Pig, Presto-Sandbox, Spark, Sqoop-Sandbox, Tez, Zeppelin-Sandbox, and ZooKeeper-Sandbox. The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.
For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:
The following release notes include information for the Amazon EMR 4.8.0 release. Changes are relative to the Amazon EMR 4.7.2 release.
Release date: September 7, 2016
Upgrades
- Upgraded to HBase 1.2.2.
- Upgraded to Presto-Sandbox 0.151.
- Upgraded to Tez 0.8.4.
- Upgraded to Zeppelin-Sandbox 0.6.1.
Changes and Enhancements
- Fixed an issue in YARN where the ApplicationMaster would attempt to clean up containers that no longer exist because their instances have been terminated.
- Corrected the hive-server2 URL for Hive2 actions in the Oozie examples.
- Added support for additional Presto catalogs.
- Backported patches: HIVE-8948, HIVE-12679, HIVE-13405, PHOENIX-3116, HADOOP-12689.
- Added support for security configurations, which allow you to create and apply encryption options more easily. For more information, see Data Encryption.
Release 4.8.0 Component Versions
The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.
Some components in Amazon EMR differ from community versions. These components have a version label in the form CommunityVersion-amzn-EmrVersion. The EmrVersion starts at 0. For example, if an open source community component named myapp-component with version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-2.
Component Version Description
emr-ddb 3.2.0 Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies 2.1.0 Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis 3.2.0 Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp 2.4.0 Distributed copy application optimized for Amazon S3.
emrfs 2.9.0 Amazon S3 connector for Hadoop ecosystem applications.
ganglia-monitor 3.7.2 Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector 3.7.2 Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web 3.7.1 Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client 2.7.2-amzn-4 Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode 2.7.2-amzn-4 HDFS node-level service for storing blocks.
hadoop-hdfs-library 2.7.2-amzn-4 HDFS command-line client and library.
hadoop-hdfs-namenode 2.7.2-amzn-4 HDFS service for tracking file names and block locations.
hadoop-httpfs-server 2.7.2-amzn-4 HTTP endpoint for HDFS operations.
hadoop-kms-server 2.7.2-amzn-4 Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred 2.7.2-amzn-4 MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager 2.7.2-amzn-4 YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager 2.7.2-amzn-4 YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server 2.7.2-amzn-4 Service for retrieving current and historical information for YARN applications.
hbase-hmaster 1.2.2 Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server 1.2.2 Service for serving one or more HBase regions.
hbase-client 1.2.2 HBase command-line client.
hbase-rest-server 1.2.2 Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server 1.2.2 Service providing a Thrift endpoint to HBase.
hcatalog-client 1.0.0-amzn-7 The 'hcat' command-line client for manipulating hcatalog-server.
hcatalog-server 1.0.0-amzn-7 Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server 1.0.0-amzn-7 HTTP endpoint providing a REST interface to HCatalog.
hive-client 1.0.0-amzn-7 Hive command-line client.
hive-metastore-server 1.0.0-amzn-7 Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server 1.0.0-amzn-7 Service for accepting Hive queries as web requests.
hue-server 3.7.1-amzn-7 Web application for analyzing data using Hadoop ecosystem applications.
mahout-client 0.12.2 Library for machine learning.
mysql-server 5.5.51 MySQL database server.
oozie-client 4.2.0 Oozie command-line client.
oozie-server 4.2.0 Service for accepting Oozie workflow requests.
phoenix-library 4.7.0-HBase-1.2 The Phoenix libraries for server and client.
phoenix-query-server 4.7.0-HBase-1.2 A lightweight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API.
presto-coordinator 0.151 Service for accepting queries and managing query execution among presto-workers.
presto-worker 0.151 Service for executing pieces of a query.
pig-client 0.14.0-amzn-0 Pig command-line client.
spark-client 1.6.2 Spark command-line clients.
spark-history-server 1.6.2 Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn 1.6.2 In-memory execution engine for YARN.
spark-yarn-slave 1.6.2 Apache Spark libraries needed by YARN slaves.
sqoop-client 1.4.6 Apache Sqoop command-line client.
tez-on-yarn 0.8.4 The Tez YARN application and libraries.
webserver 2.4.23 Apache HTTP server.
zeppelin-server 0.6.1 Web-based notebook that enables interactive data analytics.
zookeeper-server 3.4.8 Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client 3.4.8 ZooKeeper command-line client.
Release 4.8.0 Configuration Classifications
Configuration classifications allow you to customize applications. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.
emr-4.8.0 Classifications
Classifications Description
capacity-scheduler
Change values in Hadoop's capacity-scheduler.xml file.
core-site
Change values in Hadoop's core-site.xml file.
emrfs-site
Change EMRFS settings.
hadoop-env
Change values in the Hadoop environment for all Hadoop components.
hadoop-log4j
Change values in Hadoop's log4j.properties file.
hadoop-ssl-server
Change Hadoop SSL server configuration.
hadoop-ssl-client
Change Hadoop SSL client configuration.
hbase-env
Change values in HBase's environment.
hbase-log4j
Change values in HBase's hbase-log4j.properties file.
hbase-metrics
Change values in HBase's hadoop-metrics2-hbase.properties file.
hbase-policy
Change values in HBase's hbase-policy.xml file.
hbase-site
Change values in HBase's hbase-site.xml file.
hdfs-encryption-zones
Configure HDFS encryption zones.
hdfs-site
Change values in HDFS's hdfs-site.xml.
hcatalog-env
Change values in HCatalog's environment.
hcatalog-server-jndi
Change values in HCatalog's jndi.properties.
hcatalog-server-proto-hive-site
Change values in HCatalog's proto-hive-site.xml.
hcatalog-webhcat-env
Change values in HCatalog WebHCat's environment.
hcatalog-webhcat-log4j
Change values in HCatalog WebHCat's log4j.properties.
hcatalog-webhcat-site
Change values in HCatalog WebHCat's webhcat-site.xml file.
hive-env
Change values in the Hive environment.
hive-exec-log4j
Change values in Hive's hive-exec-log4j.properties file.
hive-log4j
Change values in Hive's hive-log4j.properties file.
hive-site
Change values in Hive's hive-site.xml file.
hiveserver2-site
Change values in Hive Server2's hiveserver2-site.xml file.
hue-ini
Change values in Hue's ini file.
httpfs-env
Change values in the HTTPFS environment.
httpfs-site
Change values in Hadoop's httpfs-site.xml file.
hadoop-kms-acls
Change values in Hadoop's kms-acls.xml file.
hadoop-kms-env
Change values in the Hadoop KMS environment.
hadoop-kms-log4j
Change values in Hadoop's kms-log4j.properties file.
hadoop-kms-site
Change values in Hadoop's kms-site.xml file.
mapred-env
Change values in the MapReduce application's environment.
mapred-site
Change values in the MapReduce application's mapred-site.xml file.
oozie-env
Change values in Oozie's environment.
oozie-log4j
Change values in Oozie's oozie-log4j.properties file.
oozie-site
Change values in Oozie's oozie-site.xml file.
phoenix-hbase-metrics
Change values in Phoenix's hadoop-metrics2-hbase.properties file.
phoenix-hbase-site
Change values in Phoenix's hbase-site.xml file.
phoenix-log4j
Change values in Phoenix's log4j.properties file.
phoenix-metrics
Change values in Phoenix's hadoop-metrics2-phoenix.properties file.
pig-properties
Change values in Pig's pig.properties file.
pig-log4j
Change values in Pig's log4j.properties file.
presto-log
Change values in Presto's log.properties file.
presto-config
Change values in Presto's config.properties file.
presto-connector-blackhole
Change values in Presto's blackhole.properties file.
presto-connector-cassandra
Change values in Presto's cassandra.properties file.
presto-connector-hive
Change values in Presto's hive.properties file.
presto-connector-jmx
Change values in Presto's jmx.properties file.
presto-connector-kafka
Change values in Presto's kafka.properties file.
presto-connector-localfile
Change values in Presto's localfile.properties file.
presto-connector-mongodb
Change values in Presto's mongodb.properties file.
presto-connector-mysql
Change values in Presto's mysql.properties file.
presto-connector-postgresql
Change values in Presto's postgresql.properties file.
presto-connector-raptor
Change values in Presto's raptor.properties file.
presto-connector-redis
Change values in Presto's redis.properties file.
presto-connector-tpch
Change values in Presto's tpch.properties file.
spark
Amazon EMR-curated settings for Apache Spark.
spark-defaults
Change values in Spark's spark-defaults.conf file.
spark-env
Change values in the Spark environment.
spark-log4j
Change values in Spark's log4j.properties file.
spark-metrics
Change values in Spark's metrics.properties file.
sqoop-env
Change values in Sqoop's environment.
sqoop-oraoop-site
Change values in Sqoop OraOop's oraoop-site.xml file.
sqoop-site
Change values in Sqoop's sqoop-site.xml file.
tez-site
Change values in Tez's tez-site.xml file.
yarn-env
Change values in the YARN environment.
yarn-site
Change values in YARN's yarn-site.xml file.
zeppelin-env
Change values in the Zeppelin environment.
zookeeper-config
Change values in ZooKeeper's zoo.cfg file.
zookeeper-log4j
Change values in ZooKeeper's log4j.properties file.
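The correspondence between a classification and its configuration XML file can be sketched as follows. This is only an illustration of the idea, not how Amazon EMR applies settings internally: the helper name `properties_to_hadoop_xml` is hypothetical, and `hive.exec.parallel` is used purely as a sample Hive property. The output follows the usual Hadoop-style `<configuration>`/`<property>` layout that files such as hive-site.xml use.

```python
import xml.etree.ElementTree as ET


def properties_to_hadoop_xml(properties):
    """Render a flat name/value mapping as Hadoop-style configuration XML.

    Hypothetical helper for illustration; Amazon EMR performs the real
    merge of classification properties into the application's file.
    """
    root = ET.Element("configuration")
    for name, value in sorted(properties.items()):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# A classification object names the target (hive-site corresponds to
# hive-site.xml) and carries the properties to change.
classification = {
    "Classification": "hive-site",
    "Properties": {"hive.exec.parallel": "true"},  # sample property only
}

xml_text = properties_to_hadoop_xml(classification["Properties"])
print(xml_text)
```

Classifications whose description mentions an environment or a .properties file do not map to XML, so this sketch applies only to the `*-site` style entries in the list above.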
-
- 4.7.x
-
There are multiple releases within the 4.7 series. Choose a link below to see information for a specific release within this tab.
Important In this release version, Amazon EMR uses AWS Signature Version 4 exclusively to authenticate requests to Amazon S3. For more information, see What's New.
Release 4.7.2 Application Versions
The following applications are supported in this release: Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, Mahout, Oozie-Sandbox, Phoenix, Pig, Presto-Sandbox, Spark, Sqoop-Sandbox, Tez, Zeppelin-Sandbox, and ZooKeeper-Sandbox. The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.
For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:
The following release notes include information for Amazon EMR 4.7.2.
Release date: July 15, 2016
Features
-
Upgraded to Mahout 0.12.2
-
Upgraded to Presto 0.148
-
Upgraded to Spark 1.6.2
-
You can now create an AWSCredentialsProvider for use with EMRFS using a URI as a parameter. For more information, see Create an AWSCredentialsProvider for EMRFS.
-
EMRFS now allows users to configure a custom DynamoDB endpoint for their Consistent View metadata using the fs.s3.consistent.dynamodb.endpoint property in emrfs-site.xml.
-
Added a script in /usr/bin called spark-example, which wraps /usr/lib/spark/spark/bin/run-example so you can run examples directly. For instance, to run the SparkPi example that comes with the Spark distribution, you can run spark-example SparkPi 100 from the command line or using command-runner.jar as a step in the API.
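The spark-example feature in the list above can be submitted as a step. The following is a minimal sketch of what such a step definition might look like, assuming the step structure used by the Amazon EMR API (Name, ActionOnFailure, HadoopJarStep). The function name `spark_example_step`, the step name, and the choice of `CONTINUE` are illustrative assumptions, and the sketch only builds the definition without calling any AWS service.

```python
import json


def spark_example_step(example, *example_args):
    """Build a step definition that runs spark-example via command-runner.jar.

    Hypothetical helper: the step name and ActionOnFailure value are
    illustrative choices, not values taken from the release notes.
    """
    return {
        "Name": "spark-example " + example,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            # Same arguments as typing `spark-example SparkPi 100` in a shell.
            "Args": ["spark-example", example, *example_args],
        },
    }


step = spark_example_step("SparkPi", "100")
print(json.dumps([step], indent=2))
```

The printed JSON list is the kind of document that can be saved to a file and passed to whichever client you use to add steps to a running cluster.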
Known Issues Resolved from Previous Releases
-
Fixed an issue where Oozie did not have spark-assembly.jar in the correct location when Spark was also installed, which resulted in failure to launch Spark applications with Oozie.
-
Fixed an issue with Spark Log4j-based logging in YARN containers.
Release 4.7.2 Component Versions
The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.
Some components in Amazon EMR differ from community versions. These components have a version label in the form CommunityVersion-amzn-EmrVersion. The EmrVersion starts at 0. For example, if an open source community component named myapp-component with version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-2.
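As a rough sketch of the CommunityVersion-amzn-EmrVersion convention, the following splits a version label into its two parts. The names `VersionLabel` and `parse_version_label` are hypothetical, and treating a label without an `-amzn-` suffix as an unmodified community version is an interpretation of the text above rather than a documented rule.

```python
import re
from typing import NamedTuple, Optional


class VersionLabel(NamedTuple):
    community_version: str
    # None when the label carries no -amzn- suffix (assumed to mean the
    # component is shipped as the community version).
    emr_version: Optional[int]


_LABEL = re.compile(r"^(?P<community>.+?)-amzn-(?P<emr>\d+)$")


def parse_version_label(label):
    """Split a component version label into community and EMR parts."""
    match = _LABEL.match(label)
    if match is None:
        return VersionLabel(label, None)
    return VersionLabel(match.group("community"), int(match.group("emr")))


print(parse_version_label("2.2-amzn-2"))       # the myapp-component example
print(parse_version_label("2.7.2-amzn-3"))     # a Hadoop label in this release
print(parse_version_label("4.7.0-HBase-1.2"))  # no -amzn- suffix
```

Because EmrVersion starts at 0, a parsed value of 2 corresponds to the third modification, which matches the myapp-component example.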
Component | Version | Description
emr-ddb | 3.2.0 | Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies | 2.1.0 | Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis | 3.2.0 | Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp | 2.4.0 | Distributed copy application optimized for Amazon S3.
emrfs | 2.8.0 | Amazon S3 connector for Hadoop ecosystem applications.
ganglia-monitor | 3.7.2 | Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector | 3.7.2 | Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web | 3.7.1 | Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client | 2.7.2-amzn-3 | Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode | 2.7.2-amzn-3 | HDFS node-level service for storing blocks.
hadoop-hdfs-library | 2.7.2-amzn-3 | HDFS command-line client and library
hadoop-hdfs-namenode | 2.7.2-amzn-3 | HDFS service for tracking file names and block locations.
hadoop-httpfs-server | 2.7.2-amzn-3 | HTTP endpoint for HDFS operations.
hadoop-kms-server | 2.7.2-amzn-3 | Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred | 2.7.2-amzn-3 | MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager | 2.7.2-amzn-3 | YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager | 2.7.2-amzn-3 | YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server | 2.7.2-amzn-3 | Service for retrieving current and historical information for YARN applications.
hbase-hmaster | 1.2.1 | Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server | 1.2.1 | Service for serving one or more HBase regions.
hbase-client | 1.2.1 | HBase command-line client.
hbase-rest-server | 1.2.1 | Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server | 1.2.1 | Service providing a Thrift endpoint to HBase.
hcatalog-client | 1.0.0-amzn-6 | The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server | 1.0.0-amzn-6 | Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server | 1.0.0-amzn-6 | HTTP endpoint providing a REST interface to HCatalog.
hive-client | 1.0.0-amzn-6 | Hive command line client.
hive-metastore-server | 1.0.0-amzn-6 | Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server | 1.0.0-amzn-6 | Service for accepting Hive queries as web requests.
hue-server | 3.7.1-amzn-7 | Web application for analyzing data using Hadoop ecosystem applications
mahout-client | 0.12.2 | Library for machine learning.
mysql-server | 5.5.46 | MySQL database server.
oozie-client | 4.2.0 | Oozie command-line client.
oozie-server | 4.2.0 | Service for accepting Oozie workflow requests.
phoenix-library | 4.7.0-HBase-1.2 | The phoenix libraries for server and client
phoenix-query-server | 4.7.0-HBase-1.2 | A light weight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API
presto-coordinator | 0.148 | Service for accepting queries and managing query execution among presto-workers.
presto-worker | 0.148 | Service for executing pieces of a query.
pig-client | 0.14.0-amzn-0 | Pig command-line client.
spark-client | 1.6.2 | Spark command-line clients.
spark-history-server | 1.6.2 | Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn | 1.6.2 | In-memory execution engine for YARN.
spark-yarn-slave | 1.6.2 | Apache Spark libraries needed by YARN slaves.
sqoop-client | 1.4.6 | Apache Sqoop command-line client.
tez-on-yarn | 0.8.3 | The tez YARN application and libraries.
webserver | 2.4.23 | Apache HTTP server.
zeppelin-server | 0.5.6-incubating | Web-based notebook that enables interactive data analytics.
zookeeper-server | 3.4.8 | Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client | 3.4.8 | ZooKeeper command line client.
Release 4.7.2 Configuration Classifications
Configuration classifications allow you to customize applications. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.
emr-4.7.2 Classifications
Classifications Description
capacity-scheduler
Change values in Hadoop's capacity-scheduler.xml file.
core-site
Change values in Hadoop's core-site.xml file.
emrfs-site
Change EMRFS settings.
hadoop-env
Change values in the Hadoop environment for all Hadoop components.
hadoop-log4j
Change values in Hadoop's log4j.properties file.
hadoop-ssl-server
Change Hadoop SSL server configuration.
hadoop-ssl-client
Change Hadoop SSL client configuration.
hbase-env
Change values in HBase's environment.
hbase-log4j
Change values in HBase's hbase-log4j.properties file.
hbase-metrics
Change values in HBase's hadoop-metrics2-hbase.properties file.
hbase-policy
Change values in HBase's hbase-policy.xml file.
hbase-site
Change values in HBase's hbase-site.xml file.
hdfs-encryption-zones
Configure HDFS encryption zones.
hdfs-site
Change values in HDFS's hdfs-site.xml.
hcatalog-env
Change values in HCatalog's environment.
hcatalog-server-jndi
Change values in HCatalog's jndi.properties.
hcatalog-server-proto-hive-site
Change values in HCatalog's proto-hive-site.xml.
hcatalog-webhcat-env
Change values in HCatalog WebHCat's environment.
hcatalog-webhcat-log4j
Change values in HCatalog WebHCat's log4j.properties.
hcatalog-webhcat-site
Change values in HCatalog WebHCat's webhcat-site.xml file.
hive-env
Change values in the Hive environment.
hive-exec-log4j
Change values in Hive's hive-exec-log4j.properties file.
hive-log4j
Change values in Hive's hive-log4j.properties file.
hive-site
Change values in Hive's hive-site.xml file.
hue-ini
Change values in Hue's ini file.
httpfs-env
Change values in the HTTPFS environment.
httpfs-site
Change values in Hadoop's httpfs-site.xml file.
hadoop-kms-acls
Change values in Hadoop's kms-acls.xml file.
hadoop-kms-env
Change values in the Hadoop KMS environment.
hadoop-kms-log4j
Change values in Hadoop's kms-log4j.properties file.
hadoop-kms-site
Change values in Hadoop's kms-site.xml file.
mapred-env
Change values in the MapReduce application's environment.
mapred-site
Change values in the MapReduce application's mapred-site.xml file.
oozie-env
Change values in Oozie's environment.
oozie-log4j
Change values in Oozie's oozie-log4j.properties file.
oozie-site
Change values in Oozie's oozie-site.xml file.
phoenix-hbase-metrics
Change values in Phoenix's hadoop-metrics2-hbase.properties file.
phoenix-hbase-site
Change values in Phoenix's hbase-site.xml file.
phoenix-log4j
Change values in Phoenix's log4j.properties file.
phoenix-metrics
Change values in Phoenix's hadoop-metrics2-phoenix.properties file.
pig-properties
Change values in Pig's pig.properties file.
pig-log4j
Change values in Pig's log4j.properties file.
presto-log
Change values in Presto's log.properties file.
presto-config
Change values in Presto's config.properties file.
presto-connector-hive
Change values in Presto's hive.properties file.
spark
Amazon EMR-curated settings for Apache Spark.
spark-defaults
Change values in Spark's spark-defaults.conf file.
spark-env
Change values in the Spark environment.
spark-log4j
Change values in Spark's log4j.properties file.
spark-metrics
Change values in Spark's metrics.properties file.
sqoop-env
Change values in Sqoop's environment.
sqoop-oraoop-site
Change values in Sqoop OraOop's oraoop-site.xml file.
sqoop-site
Change values in Sqoop's sqoop-site.xml file.
tez-site
Change values in Tez's tez-site.xml file.
yarn-env
Change values in the YARN environment.
yarn-site
Change values in YARN's yarn-site.xml file.
zeppelin-env
Change values in the Zeppelin environment.
zookeeper-config
Change values in ZooKeeper's zoo.cfg file.
zookeeper-log4j
Change values in ZooKeeper's log4j.properties file.
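As one concrete use of the classifications listed above, the sketch below builds an emrfs-site configuration object that sets the fs.s3.consistent.dynamodb.endpoint property introduced in this release. The function name and the endpoint value `dynamodb.example.internal` are placeholders, not real values, and the list-of-objects shape is assumed to follow the format described in Configuring Applications.

```python
import json


def emrfs_endpoint_configuration(endpoint):
    """Return a configuration list that points EMRFS Consistent View
    metadata at a custom DynamoDB endpoint.

    Hypothetical helper; the caller supplies the endpoint.
    """
    if not endpoint:
        raise ValueError("endpoint must be a non-empty string")
    return [
        {
            "Classification": "emrfs-site",
            "Properties": {"fs.s3.consistent.dynamodb.endpoint": endpoint},
        }
    ]


# Placeholder endpoint for illustration only.
configurations = emrfs_endpoint_configuration("dynamodb.example.internal")
print(json.dumps(configurations, indent=2))
```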
Release 4.7.1 Application Versions
The following applications are supported in this release: Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, Mahout, Oozie-Sandbox, Phoenix, Pig, Presto-Sandbox, Spark, Sqoop-Sandbox, Tez, Zeppelin-Sandbox, and ZooKeeper-Sandbox. The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.
For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:
The following release notes include information for Amazon EMR 4.7.1.
Release date: June 10, 2016
Known Issues Resolved from Previous Releases
-
Fixed an issue that extended the startup time of clusters launched in a VPC with private subnets. The bug only impacted clusters launched with the Amazon EMR 4.7.0 release.
-
Fixed an issue that improperly handled listing of files in Amazon EMR for clusters launched with the Amazon EMR 4.7.0 release.
Release 4.7.1 Component Versions
The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.
Some components in Amazon EMR differ from community versions. These components have a version label in the form CommunityVersion-amzn-EmrVersion. The EmrVersion starts at 0. For example, if an open source community component named myapp-component with version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-2.
Component | Version | Description
emr-ddb | 3.1.0 | Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies | 2.0.0 | Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis | 3.2.0 | Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp | 2.4.0 | Distributed copy application optimized for Amazon S3.
emrfs | 2.7.1 | Amazon S3 connector for Hadoop ecosystem applications.
ganglia-monitor | 3.7.2 | Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector | 3.7.2 | Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web | 3.7.1 | Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client | 2.7.2-amzn-2 | Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode | 2.7.2-amzn-2 | HDFS node-level service for storing blocks.
hadoop-hdfs-library | 2.7.2-amzn-2 | HDFS command-line client and library
hadoop-hdfs-namenode | 2.7.2-amzn-2 | HDFS service for tracking file names and block locations.
hadoop-httpfs-server | 2.7.2-amzn-2 | HTTP endpoint for HDFS operations.
hadoop-kms-server | 2.7.2-amzn-2 | Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred | 2.7.2-amzn-2 | MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager | 2.7.2-amzn-2 | YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager | 2.7.2-amzn-2 | YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server | 2.7.2-amzn-2 | Service for retrieving current and historical information for YARN applications.
hbase-hmaster | 1.2.1 | Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server | 1.2.1 | Service for serving one or more HBase regions.
hbase-client | 1.2.1 | HBase command-line client.
hbase-rest-server | 1.2.1 | Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server | 1.2.1 | Service providing a Thrift endpoint to HBase.
hcatalog-client | 1.0.0-amzn-5 | The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server | 1.0.0-amzn-5 | Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server | 1.0.0-amzn-5 | HTTP endpoint providing a REST interface to HCatalog.
hive-client | 1.0.0-amzn-5 | Hive command line client.
hive-metastore-server | 1.0.0-amzn-5 | Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server | 1.0.0-amzn-5 | Service for accepting Hive queries as web requests.
hue-server | 3.7.1-amzn-7 | Web application for analyzing data using Hadoop ecosystem applications
mahout-client | 0.12.0 | Library for machine learning.
mysql-server | 5.5.46 | MySQL database server.
oozie-client | 4.2.0 | Oozie command-line client.
oozie-server | 4.2.0 | Service for accepting Oozie workflow requests.
phoenix-library | 4.7.0-HBase-1.2 | The phoenix libraries for server and client
phoenix-query-server | 4.7.0-HBase-1.2 | A light weight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API
presto-coordinator | 0.147 | Service for accepting queries and managing query execution among presto-workers.
presto-worker | 0.147 | Service for executing pieces of a query.
pig-client | 0.14.0-amzn-0 | Pig command-line client.
spark-client | 1.6.1 | Spark command-line clients.
spark-history-server | 1.6.1 | Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn | 1.6.1 | In-memory execution engine for YARN.
spark-yarn-slave | 1.6.1 | Apache Spark libraries needed by YARN slaves.
sqoop-client | 1.4.6 | Apache Sqoop command-line client.
tez-on-yarn | 0.8.3 | The tez YARN application and libraries.
webserver | 2.4.18 | Apache HTTP server.
zeppelin-server | 0.5.6-incubating | Web-based notebook that enables interactive data analytics.
zookeeper-server | 3.4.8 | Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client | 3.4.8 | ZooKeeper command line client.
Release 4.7.1 Configuration Classifications
Configuration classifications allow you to customize applications. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.
emr-4.7.1 Classifications
Classifications Description
capacity-scheduler
Change values in Hadoop's capacity-scheduler.xml file.
core-site
Change values in Hadoop's core-site.xml file.
emrfs-site
Change EMRFS settings.
hadoop-env
Change values in the Hadoop environment for all Hadoop components.
hadoop-log4j
Change values in Hadoop's log4j.properties file.
hbase-env
Change values in HBase's environment.
hbase-log4j
Change values in HBase's hbase-log4j.properties file.
hbase-metrics
Change values in HBase's hadoop-metrics2-hbase.properties file.
hbase-policy
Change values in HBase's hbase-policy.xml file.
hbase-site
Change values in HBase's hbase-site.xml file.
hdfs-encryption-zones
Configure HDFS encryption zones.
hdfs-site
Change values in HDFS's hdfs-site.xml.
hcatalog-env
Change values in HCatalog's environment.
hcatalog-server-jndi
Change values in HCatalog's jndi.properties.
hcatalog-server-proto-hive-site
Change values in HCatalog's proto-hive-site.xml.
hcatalog-webhcat-env
Change values in HCatalog WebHCat's environment.
hcatalog-webhcat-log4j
Change values in HCatalog WebHCat's log4j.properties.
hcatalog-webhcat-site
Change values in HCatalog WebHCat's webhcat-site.xml file.
hive-env
Change values in the Hive environment.
hive-exec-log4j
Change values in Hive's hive-exec-log4j.properties file.
hive-log4j
Change values in Hive's hive-log4j.properties file.
hive-site
Change values in Hive's hive-site.xml file.
hue-ini
Change values in Hue's ini file.
httpfs-env
Change values in the HTTPFS environment.
httpfs-site
Change values in Hadoop's httpfs-site.xml file.
hadoop-kms-acls
Change values in Hadoop's kms-acls.xml file.
hadoop-kms-env
Change values in the Hadoop KMS environment.
hadoop-kms-log4j
Change values in Hadoop's kms-log4j.properties file.
hadoop-kms-site
Change values in Hadoop's kms-site.xml file.
mapred-env
Change values in the MapReduce application's environment.
mapred-site
Change values in the MapReduce application's mapred-site.xml file.
oozie-env
Change values in Oozie's environment.
oozie-log4j
Change values in Oozie's oozie-log4j.properties file.
oozie-site
Change values in Oozie's oozie-site.xml file.
phoenix-hbase-metrics
Change values in Phoenix's hadoop-metrics2-hbase.properties file.
phoenix-hbase-site
Change values in Phoenix's hbase-site.xml file.
phoenix-log4j
Change values in Phoenix's log4j.properties file.
phoenix-metrics
Change values in Phoenix's hadoop-metrics2-phoenix.properties file.
pig-properties
Change values in Pig's pig.properties file.
pig-log4j
Change values in Pig's log4j.properties file.
presto-log
Change values in Presto's log.properties file.
presto-config
Change values in Presto's config.properties file.
presto-connector-hive
Change values in Presto's hive.properties file.
spark
Amazon EMR-curated settings for Apache Spark.
spark-defaults
Change values in Spark's spark-defaults.conf file.
spark-env
Change values in the Spark environment.
spark-log4j
Change values in Spark's log4j.properties file.
spark-metrics
Change values in Spark's metrics.properties file.
sqoop-env
Change values in Sqoop's environment.
sqoop-oraoop-site
Change values in Sqoop OraOop's oraoop-site.xml file.
sqoop-site
Change values in Sqoop's sqoop-site.xml file.
tez-site
Change values in Tez's tez-site.xml file.
yarn-env
Change values in the YARN environment.
yarn-site
Change values in YARN's yarn-site.xml file.
zeppelin-env
Change values in the Zeppelin environment.
zookeeper-config
Change values in ZooKeeper's zoo.cfg file.
zookeeper-log4j
Change values in ZooKeeper's log4j.properties file.
Release 4.7.0 Application Versions
The following applications are supported in this release: Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, Mahout, Oozie-Sandbox, Phoenix, Pig, Presto-Sandbox, Spark, Sqoop-Sandbox, Tez, Zeppelin-Sandbox, and ZooKeeper-Sandbox. The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.
For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:
Important Amazon EMR 4.7.0 is deprecated. Use Amazon EMR 4.7.1 or later instead.
Release date: June 2, 2016
Features
-
Added Apache Phoenix 4.7.0
-
Added Apache Tez 0.8.3
-
Upgraded to HBase 1.2.1
-
Upgraded to Mahout 0.12.0
-
Upgraded to Presto 0.147
-
Upgraded the AWS SDK for Java to 1.10.75
-
The final flag was removed from the mapreduce.cluster.local.dir property in mapred-site.xml to allow users to run Pig in local mode.
-
Amazon Redshift JDBC Drivers Available on Cluster
Amazon Redshift JDBC drivers are now included at /usr/share/aws/redshift/jdbc. /usr/share/aws/redshift/jdbc/RedshiftJDBC41.jar is the JDBC 4.1-compatible Amazon Redshift driver and /usr/share/aws/redshift/jdbc/RedshiftJDBC4.jar is the JDBC 4.0-compatible Amazon Redshift driver. For more information, see Configure a JDBC Connection in the Amazon Redshift Cluster Management Guide.
-
Java 8
Except for Presto, OpenJDK 1.7 is the default JDK used for all applications. However, both OpenJDK 1.7 and 1.8 are installed. For information about how to set JAVA_HOME for applications, see Configuring Applications to Use Java 8.
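For the Java 8 item in the list above, one possible way to express a JAVA_HOME override is through the environment classifications (for example hadoop-env and spark-env). The sketch below assumes the nested `export` classification convention and an install path of `/usr/lib/jvm/java-1.8.0`. Neither is stated in these release notes, so treat both as assumptions and check them against Configuring Applications to Use Java 8. The helper name is hypothetical.

```python
import json

# Assumed OpenJDK 1.8 location; verify on the cluster before relying on it.
JAVA8_HOME = "/usr/lib/jvm/java-1.8.0"


def java_home_configuration(classifications, java_home=JAVA8_HOME):
    """Build configuration objects that export JAVA_HOME for each
    environment classification named in `classifications`."""
    return [
        {
            "Classification": name,
            "Properties": {},
            # Assumed convention: environment variables are set through a
            # nested "export" classification.
            "Configurations": [
                {
                    "Classification": "export",
                    "Properties": {"JAVA_HOME": java_home},
                }
            ],
        }
        for name in classifications
    ]


configurations = java_home_configuration(["hadoop-env", "spark-env"])
print(json.dumps(configurations, indent=2))
```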
Known Issues Resolved from Previous Releases
-
Fixed a kernel issue that significantly affected performance on Throughput Optimized HDD (st1) EBS volumes for Amazon EMR in emr-4.6.0.
-
Fixed an issue where a cluster would fail if any HDFS encryption zone were specified without choosing Hadoop as an application.
-
Changed the default HDFS write policy from RoundRobin to AvailableSpaceVolumeChoosingPolicy. Some volumes were not properly utilized with the RoundRobin configuration, which resulted in failed core nodes and an unreliable HDFS.
-
Fixed an issue with the EMRFS CLI, which would cause an exception when creating the default DynamoDB metadata table for consistent views.
-
Fixed a deadlock issue in EMRFS that potentially occurred during multipart rename and copy operations.
-
Fixed an issue with EMRFS that caused the CopyPart size default to be 5 MB. The default is now properly set at 128 MB.
-
Fixed an issue with the Zeppelin upstart configuration that potentially prevented you from stopping the service.
-
Fixed an issue with Spark and Zeppelin, which prevented you from using the s3a:// URI scheme because /usr/lib/hadoop/hadoop-aws.jar was not properly loaded in their respective classpath.
-
Backported HUE-2484.
-
Backported a commit from Hue 3.9.0 (no JIRA exists) to fix an issue with the HBase browser sample.
-
Backported HIVE-9073.
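The HDFS write policy change noted in the list above can be illustrated with a toy model. This is not Hadoop's implementation: the "most free space" rule below is a deliberately simplified stand-in for AvailableSpaceVolumeChoosingPolicy, and the function names, volume sizes, and block size are all made up for illustration. It only shows why cycling through volumes of unequal size can fill the small one while larger ones still have room.

```python
def place_round_robin(free_space, block_size, writes):
    """Cycle through volumes in order, skipping any that cannot fit the block."""
    free = list(free_space)
    next_index = 0
    for _ in range(writes):
        for _attempt in range(len(free)):
            index = next_index
            next_index = (next_index + 1) % len(free)
            if free[index] >= block_size:
                free[index] -= block_size
                break
        else:
            raise RuntimeError("no volume has room for the block")
    return free


def place_most_free(free_space, block_size, writes):
    """Simplified space-aware rule: always write to the volume with the
    most free space (ties go to the lowest index)."""
    free = list(free_space)
    for _ in range(writes):
        index = max(range(len(free)), key=lambda i: free[i])
        if free[index] < block_size:
            raise RuntimeError("no volume has room for the block")
        free[index] -= block_size
    return free


volumes = [100, 100, 10]  # free units on three hypothetical volumes
round_robin_result = place_round_robin(volumes, 1, 30)
most_free_result = place_most_free(volumes, 1, 30)
print(round_robin_result)  # [90, 90, 0]: the small volume is full
print(most_free_result)    # [85, 85, 10]: the small volume is untouched
```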
Release 4.7.0 Component Versions
The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.
Some components in Amazon EMR differ from community versions. These components have a version label in the form CommunityVersion-amzn-EmrVersion. The EmrVersion starts at 0. For example, if an open source community component named myapp-component with version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-2.
Component | Version | Description
emr-ddb | 3.1.0 | Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies | 2.0.0 | Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis | 3.2.0 | Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp | 2.4.0 | Distributed copy application optimized for Amazon S3.
emrfs | 2.7.1 | Amazon S3 connector for Hadoop ecosystem applications.
ganglia-monitor | 3.7.2 | Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector | 3.7.2 | Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web | 3.7.1 | Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client | 2.7.2-amzn-2 | Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode | 2.7.2-amzn-2 | HDFS node-level service for storing blocks.
hadoop-hdfs-library | 2.7.2-amzn-2 | HDFS command-line client and library
hadoop-hdfs-namenode | 2.7.2-amzn-2 | HDFS service for tracking file names and block locations.
hadoop-httpfs-server | 2.7.2-amzn-2 | HTTP endpoint for HDFS operations.
hadoop-kms-server | 2.7.2-amzn-2 | Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred | 2.7.2-amzn-2 | MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager | 2.7.2-amzn-2 | YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager | 2.7.2-amzn-2 | YARN service for allocating and managing cluster resources and distributed applications.
hadoop-yarn-timeline-server | 2.7.2-amzn-2 | Service for retrieving current and historical information for YARN applications.
hbase-hmaster | 1.2.1 | Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server | 1.2.1 | Service for serving one or more HBase regions.
hbase-client | 1.2.1 | HBase command-line client.
hbase-rest-server | 1.2.1 | Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server | 1.2.1 | Service providing a Thrift endpoint to HBase.
hcatalog-client | 1.0.0-amzn-5 | The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server | 1.0.0-amzn-5 | Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server | 1.0.0-amzn-5 | HTTP endpoint providing a REST interface to HCatalog.
hive-client | 1.0.0-amzn-5 | Hive command line client.
hive-metastore-server | 1.0.0-amzn-5 | Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server | 1.0.0-amzn-5 | Service for accepting Hive queries as web requests.
hue-server | 3.7.1-amzn-7 | Web application for analyzing data using Hadoop ecosystem applications
mahout-client | 0.12.0 | Library for machine learning.
mysql-server | 5.5.46 | MySQL database server.
oozie-client | 4.2.0 | Oozie command-line client.
oozie-server | 4.2.0 | Service for accepting Oozie workflow requests.
phoenix-library | 4.7.0-HBase-1.2 | The phoenix libraries for server and client
phoenix-query-server | 4.7.0-HBase-1.2 | A light weight server providing JDBC access as well as Protocol Buffers and JSON format access to the Avatica API
presto-coordinator | 0.147 | Service for accepting queries and managing query execution among presto-workers.
presto-worker | 0.147 | Service for executing pieces of a query.
pig-client | 0.14.0-amzn-0 | Pig command-line client.
spark-client | 1.6.1 | Spark command-line clients.
spark-history-server | 1.6.1 | Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn | 1.6.1 | In-memory execution engine for YARN.
spark-yarn-slave | 1.6.1 | Apache Spark libraries needed by YARN slaves.
sqoop-client | 1.4.6 | Apache Sqoop command-line client.
tez-on-yarn | 0.8.3 | The tez YARN application and libraries.
webserver | 2.4.18 | Apache HTTP server.
zeppelin-server | 0.5.6-incubating | Web-based notebook that enables interactive data analytics.
zookeeper-server | 3.4.8 | Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client | 3.4.8 | ZooKeeper command line client.
Release 4.7.0 Configuration Classifications
Configuration classifications allow you to customize applications. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.
emr-4.7.0 Classifications
Classifications Description
capacity-scheduler
Change values in Hadoop's capacity-scheduler.xml file.
core-site
Change values in Hadoop's core-site.xml file.
emrfs-site
Change EMRFS settings.
hadoop-env
Change values in the Hadoop environment for all Hadoop components.
hadoop-log4j
Change values in Hadoop's log4j.properties file.
hbase-env
Change values in HBase's environment.
hbase-log4j
Change values in HBase's hbase-log4j.properties file.
hbase-metrics
Change values in HBase's hadoop-metrics2-hbase.properties file.
hbase-policy
Change values in HBase's hbase-policy.xml file.
hbase-site
Change values in HBase's hbase-site.xml file.
hdfs-encryption-zones
Configure HDFS encryption zones.
hdfs-site
Change values in HDFS's hdfs-site.xml.
hcatalog-env
Change values in HCatalog's environment.
hcatalog-server-jndi
Change values in HCatalog's jndi.properties.
hcatalog-server-proto-hive-site
Change values in HCatalog's proto-hive-site.xml.
hcatalog-webhcat-env
Change values in HCatalog WebHCat's environment.
hcatalog-webhcat-log4j
Change values in HCatalog WebHCat's log4j.properties.
hcatalog-webhcat-site
Change values in HCatalog WebHCat's webhcat-site.xml file.
hive-env
Change values in the Hive environment.
hive-exec-log4j
Change values in Hive's hive-exec-log4j.properties file.
hive-log4j
Change values in Hive's hive-log4j.properties file.
hive-site
Change values in Hive's hive-site.xml file.
hue-ini
Change values in Hue's ini file.
httpfs-env
Change values in the HTTPFS environment.
httpfs-site
Change values in Hadoop's httpfs-site.xml file.
hadoop-kms-acls
Change values in Hadoop's kms-acls.xml file.
hadoop-kms-env
Change values in the Hadoop KMS environment.
hadoop-kms-log4j
Change values in Hadoop's kms-log4j.properties file.
hadoop-kms-site
Change values in Hadoop's kms-site.xml file.
mapred-env
Change values in the MapReduce application's environment.
mapred-site
Change values in the MapReduce application's mapred-site.xml file.
oozie-env
Change values in Oozie's environment.
oozie-log4j
Change values in Oozie's oozie-log4j.properties file.
oozie-site
Change values in Oozie's oozie-site.xml file.
phoenix-hbase-metrics
Change values in Phoenix's hadoop-metrics2-hbase.properties file.
phoenix-hbase-site
Change values in Phoenix's hbase-site.xml file.
phoenix-log4j
Change values in Phoenix's log4j.properties file.
phoenix-metrics
Change values in Phoenix's hadoop-metrics2-phoenix.properties file.
pig-properties
Change values in Pig's pig.properties file.
pig-log4j
Change values in Pig's log4j.properties file.
presto-log
Change values in Presto's log.properties file.
presto-config
Change values in Presto's config.properties file.
presto-connector-hive
Change values in Presto's hive.properties file.
spark
Amazon EMR-curated settings for Apache Spark.
spark-defaults
Change values in Spark's spark-defaults.conf file.
spark-env
Change values in the Spark environment.
spark-log4j
Change values in Spark's log4j.properties file.
spark-metrics
Change values in Spark's metrics.properties file.
sqoop-env
Change values in Sqoop's environment.
sqoop-oraoop-site
Change values in Sqoop OraOop's oraoop-site.xml file.
sqoop-site
Change values in Sqoop's sqoop-site.xml file.
tez-site
Change values in Tez's tez-site.xml file.
yarn-env
Change values in the YARN environment.
yarn-site
Change values in YARN's yarn-site.xml file.
zeppelin-env
Change values in the Zeppelin environment.
zookeeper-config
Change values in ZooKeeper's zoo.cfg file.
zookeeper-log4j
Change values in ZooKeeper's log4j.properties file.
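Because each classification name in the list above must be spelled exactly, a small pre-flight check can catch typos before a cluster request is submitted. The following is a minimal sketch, not part of Amazon EMR: the function name `unknown_classifications` and the sample `hive.exec.parallel` property are illustrative assumptions, and the name set is simply copied from the emr-4.7.0 list above.

```python
# Classification names copied from the emr-4.7.0 list above.
EMR_4_7_0_CLASSIFICATIONS = frozenset("""
capacity-scheduler core-site emrfs-site hadoop-env hadoop-log4j
hbase-env hbase-log4j hbase-metrics hbase-policy hbase-site
hdfs-encryption-zones hdfs-site
hcatalog-env hcatalog-server-jndi hcatalog-server-proto-hive-site
hcatalog-webhcat-env hcatalog-webhcat-log4j hcatalog-webhcat-site
hive-env hive-exec-log4j hive-log4j hive-site hue-ini
httpfs-env httpfs-site
hadoop-kms-acls hadoop-kms-env hadoop-kms-log4j hadoop-kms-site
mapred-env mapred-site oozie-env oozie-log4j oozie-site
phoenix-hbase-metrics phoenix-hbase-site phoenix-log4j phoenix-metrics
pig-properties pig-log4j
presto-log presto-config presto-connector-hive
spark spark-defaults spark-env spark-log4j spark-metrics
sqoop-env sqoop-oraoop-site sqoop-site
tez-site yarn-env yarn-site
zeppelin-env zookeeper-config zookeeper-log4j
""".split())


def unknown_classifications(configurations):
    """Return top-level classification names that are not in the emr-4.7.0 list.

    Hypothetical helper for illustration; it does not inspect nested
    "Configurations" entries.
    """
    return [
        entry["Classification"]
        for entry in configurations
        if entry["Classification"] not in EMR_4_7_0_CLASSIFICATIONS
    ]


# Example input: one valid classification and one deliberate typo.
configurations = [
    {"Classification": "hive-site", "Properties": {"hive.exec.parallel": "true"}},
    {"Classification": "hbase-sight", "Properties": {}},
]

print(unknown_classifications(configurations))  # prints ['hbase-sight']
```

The set applies only to emr-4.7.0; other releases in this series list different classifications (for example, tez-site and the phoenix classifications do not appear in the emr-4.6.0 list).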
-
- 4.6.0
-
Amazon EMR Release 4.6.0
Release 4.6.0 Application Versions
The following applications are supported in this release: Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, Mahout, Oozie-Sandbox, Pig, Presto-Sandbox, Spark, Sqoop-Sandbox, Zeppelin-Sandbox, and ZooKeeper-Sandbox. The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.
For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:
The following release notes include information for the Amazon EMR 4.6.0 release.
-
Added HBase 1.2.0
-
Added ZooKeeper-Sandbox 3.4.8
-
Upgraded to Presto-Sandbox 0.143
-
Amazon EMR releases are now based on Amazon Linux 2016.03.0. For more information, see https://aws.amazon.com/amazon-linux-ami/2016.03-release-notes/.
-
Issue Affecting Throughput Optimized HDD (st1) EBS Volume Types
An issue in Linux kernel versions 4.2 and above significantly affects performance on Throughput Optimized HDD (st1) EBS volumes for EMR. This release (emr-4.6.0) uses kernel version 4.4.5 and is therefore affected. We recommend not using emr-4.6.0 if you want to use st1 EBS volumes. You can use emr-4.5.0 or prior Amazon EMR releases with st1 without impact. We will provide the fix in future releases.
-
Python Defaults
Python 3.4 is now installed by default, but Python 2.7 remains the system default. You can configure Python 3.4 as the system default using a bootstrap action. To change the Python version used by PySpark, use the configuration API to set the PYSPARK_PYTHON export to /usr/bin/python3.4 in the spark-env classification.
-
Java 8
Except for Presto, OpenJDK 1.7 is the default JDK used for all applications. However, both OpenJDK 1.7 and 1.8 are installed. For information about how to set JAVA_HOME for applications, see Configuring Applications to Use Java 8.
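The PYSPARK_PYTHON setting described under Python Defaults can be sketched as a configuration object. This is a minimal example that assumes environment classifications such as spark-env take their variables through a nested "export" classification; the function name `pyspark_python_configuration` is hypothetical and exists only for illustration.

```python
import json


def pyspark_python_configuration(python_path="/usr/bin/python3.4"):
    """Build a configuration list that sets PYSPARK_PYTHON in spark-env.

    Assumption: spark-env variables are supplied through a nested
    "export" classification, as with other *-env classifications.
    """
    return [
        {
            "Classification": "spark-env",
            "Configurations": [
                {
                    "Classification": "export",
                    "Properties": {"PYSPARK_PYTHON": python_path},
                }
            ],
        }
    ]


config = pyspark_python_configuration()
# Serialize to JSON so it can be saved to a file and supplied when
# creating a cluster.
print(json.dumps(config, indent=2))
```

This changes only the interpreter that PySpark uses. Python 2.7 remains the system default unless it is changed separately, for example with a bootstrap action.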
Known Issues Resolved from Previous Releases
-
Fixed an issue where application provisioning would sometimes randomly fail due to a generated password.
-
Previously, mysqld was installed on all nodes. Now, it is installed only on the master instance, and only if the chosen application includes mysql-server as a component. Currently, the following applications include the mysql-server component: HCatalog, Hive, Hue, Presto-Sandbox, and Sqoop-Sandbox.
-
Changed yarn.scheduler.maximum-allocation-vcores to 80 from the default of 32. This fixes an issue introduced in emr-4.4.0 that mainly occurs with Spark when using the maximizeResourceAllocation option in a cluster whose core instance type has YARN vcores set higher than 32. The affected instance types are c4.8xlarge, cc2.8xlarge, hs1.8xlarge, i2.8xlarge, m2.4xlarge, r3.8xlarge, d2.8xlarge, and m4.10xlarge.
-
s3-dist-cp now uses EMRFS for all Amazon S3 nominations and no longer stages to a temporary HDFS directory.
-
Fixed an issue with exception handling for client-side encryption multipart uploads.
-
Added an option to allow users to change the Amazon S3 storage class. By default this setting is STANDARD. The emrfs-site configuration classification setting is fs.s3.storageClass and the possible values are STANDARD, STANDARD_IA, and REDUCED_REDUNDANCY. For more information about storage classes, see Storage Classes in the Amazon Simple Storage Service Developer Guide.
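The storage class option in the last item above can be sketched as an emrfs-site configuration object. This is a minimal example under stated assumptions: the helper name `emrfs_storage_class_configuration` is hypothetical, and the allowed values are the three listed in the release note.

```python
import json

# Values taken from the release note above.
ALLOWED_STORAGE_CLASSES = ("STANDARD", "STANDARD_IA", "REDUCED_REDUNDANCY")


def emrfs_storage_class_configuration(storage_class="STANDARD"):
    """Build an emrfs-site configuration that sets fs.s3.storageClass.

    Raises ValueError for values not listed in the release note.
    """
    if storage_class not in ALLOWED_STORAGE_CLASSES:
        raise ValueError(
            "storage_class must be one of %s, got %r"
            % (", ".join(ALLOWED_STORAGE_CLASSES), storage_class)
        )
    return [
        {
            "Classification": "emrfs-site",
            "Properties": {"fs.s3.storageClass": storage_class},
        }
    ]


storage_config = emrfs_storage_class_configuration("STANDARD_IA")
print(json.dumps(storage_config, indent=2))
```

Validating the value locally is a design choice for this sketch; it rejects a misspelled storage class before the configuration is submitted.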
Release 4.6.0 Component Versions
The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.
Some components in Amazon EMR differ from community versions. These components have a version label in the form CommunityVersion-amzn-EmrVersion. The EmrVersion starts at 0. For example, if an open source community component named myapp-component with version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-2.
Component Version Description
emr-ddb 3.0.0 Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies 2.0.0 Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis 3.1.0 Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp 2.3.0 Distributed copy application optimized for Amazon S3.
emrfs 2.6.0 Amazon S3 connector for Hadoop ecosystem applications.
ganglia-monitor 3.7.2 Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector 3.7.2 Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web 3.7.1 Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client 2.7.2-amzn-1 Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode 2.7.2-amzn-1 HDFS node-level service for storing blocks.
hadoop-hdfs-library 2.7.2-amzn-1 HDFS command-line client and library
hadoop-hdfs-namenode 2.7.2-amzn-1 HDFS service for tracking file names and block locations.
hadoop-httpfs-server 2.7.2-amzn-1 HTTP endpoint for HDFS operations.
hadoop-kms-server 2.7.2-amzn-1 Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred 2.7.2-amzn-1 MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager 2.7.2-amzn-1 YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager 2.7.2-amzn-1 YARN service for allocating and managing cluster resources and distributed applications.
hbase-hmaster 1.2.0 Service for an HBase cluster responsible for coordination of Regions and execution of administrative commands.
hbase-region-server 1.2.0 Service for serving one or more HBase regions.
hbase-client 1.2.0 HBase command-line client.
hbase-rest-server 1.2.0 Service providing a RESTful HTTP endpoint for HBase.
hbase-thrift-server 1.2.0 Service providing a Thrift endpoint to HBase.
hcatalog-client 1.0.0-amzn-4 The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server 1.0.0-amzn-4 Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server 1.0.0-amzn-4 HTTP endpoint providing a REST interface to HCatalog.
hive-client 1.0.0-amzn-4 Hive command line client.
hive-metastore-server 1.0.0-amzn-4 Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server 1.0.0-amzn-4 Service for accepting Hive queries as web requests.
hue-server 3.7.1-amzn-6 Web application for analyzing data using Hadoop ecosystem applications
mahout-client 0.11.1 Library for machine learning.
mysql-server 5.5 MySQL database server.
oozie-client 4.2.0 Oozie command-line client.
oozie-server 4.2.0 Service for accepting Oozie workflow requests.
presto-coordinator 0.143 Service for accepting queries and managing query execution among presto-workers.
presto-worker 0.143 Service for executing pieces of a query.
pig-client 0.14.0-amzn-0 Pig command-line client.
spark-client 1.6.1 Spark command-line clients.
spark-history-server 1.6.1 Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn 1.6.1 In-memory execution engine for YARN.
spark-yarn-slave 1.6.1 Apache Spark libraries needed by YARN slaves.
sqoop-client 1.4.6 Apache Sqoop command-line client.
webserver 2.4 Apache HTTP server.
zeppelin-server 0.5.6-incubating Web-based notebook that enables interactive data analytics.
zookeeper-server 3.4.8 Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
zookeeper-client 3.4.8 ZooKeeper command line client.
Release 4.6.0 Configuration Classifications
Configuration classifications allow you to customize applications. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.
emr-4.6.0 Classifications
Classifications Description
capacity-scheduler
Change values in Hadoop's capacity-scheduler.xml file.
core-site
Change values in Hadoop's core-site.xml file.
emrfs-site
Change EMRFS settings.
hadoop-env
Change values in the Hadoop environment for all Hadoop components.
hadoop-log4j
Change values in Hadoop's log4j.properties file.
hbase-env
Change values in HBase's environment.
hbase-log4j
Change values in HBase's hbase-log4j.properties file.
hbase-metrics
Change values in HBase's hadoop-metrics2-hbase.properties file.
hbase-policy
Change values in HBase's hbase-policy.xml file.
hbase-site
Change values in HBase's hbase-site.xml file.
hdfs-encryption-zones
Configure HDFS encryption zones.
hdfs-site
Change values in HDFS's hdfs-site.xml.
hcatalog-env
Change values in HCatalog's environment.
hcatalog-server-jndi
Change values in HCatalog's jndi.properties.
hcatalog-server-proto-hive-site
Change values in HCatalog's proto-hive-site.xml.
hcatalog-webhcat-env
Change values in HCatalog WebHCat's environment.
hcatalog-webhcat-log4j
Change values in HCatalog WebHCat's log4j.properties.
hcatalog-webhcat-site
Change values in HCatalog WebHCat's webhcat-site.xml file.
hive-env
Change values in the Hive environment.
hive-exec-log4j
Change values in Hive's hive-exec-log4j.properties file.
hive-log4j
Change values in Hive's hive-log4j.properties file.
hive-site
Change values in Hive's hive-site.xml file.
hue-ini
Change values in Hue's ini file.
httpfs-env
Change values in the HTTPFS environment.
httpfs-site
Change values in Hadoop's httpfs-site.xml file.
hadoop-kms-acls
Change values in Hadoop's kms-acls.xml file.
hadoop-kms-env
Change values in the Hadoop KMS environment.
hadoop-kms-log4j
Change values in Hadoop's kms-log4j.properties file.
hadoop-kms-site
Change values in Hadoop's kms-site.xml file.
mapred-env
Change values in the MapReduce application's environment.
mapred-site
Change values in the MapReduce application's mapred-site.xml file.
oozie-env
Change values in Oozie's environment.
oozie-log4j
Change values in Oozie's oozie-log4j.properties file.
oozie-site
Change values in Oozie's oozie-site.xml file.
pig-properties
Change values in Pig's pig.properties file.
pig-log4j
Change values in Pig's log4j.properties file.
presto-log
Change values in Presto's log.properties file.
presto-config
Change values in Presto's config.properties file.
presto-connector-hive
Change values in Presto's hive.properties file.
spark
Amazon EMR-curated settings for Apache Spark.
spark-defaults
Change values in Spark's spark-defaults.conf file.
spark-env
Change values in the Spark environment.
spark-log4j
Change values in Spark's log4j.properties file.
spark-metrics
Change values in Spark's metrics.properties file.
sqoop-env
Change values in Sqoop's environment.
sqoop-oraoop-site
Change values in Sqoop OraOop's oraoop-site.xml file.
sqoop-site
Change values in Sqoop's sqoop-site.xml file.
yarn-env
Change values in the YARN environment.
yarn-site
Change values in YARN's yarn-site.xml file.
zeppelin-env
Change values in the Zeppelin environment.
zookeeper-config
Change values in ZooKeeper's zoo.cfg file.
zookeeper-log4j
Change values in ZooKeeper's log4j.properties file.
-
- 4.5.0
-
Amazon EMR Release 4.5.0
Release 4.5.0 Application Versions
The following applications are supported in this release: Ganglia, Hadoop, HCatalog, Hive, Hue, Mahout, Oozie-Sandbox, Pig, Presto-Sandbox, Spark, Sqoop-Sandbox, and Zeppelin-Sandbox. The diagram below depicts the application versions available in this release of Amazon EMR and the application versions in the preceding four Amazon EMR releases.
For a comprehensive history of application versions for each release of Amazon EMR, see the following diagrams:
The following release notes include information for the Amazon EMR 4.5.0 release.
Release date: April 4, 2016
Features
-
Upgraded to Spark 1.6.1
-
Upgraded to Hadoop 2.7.2
-
Upgraded to Presto 0.140
-
Added AWS KMS support for Amazon S3 server-side encryption.
Known Issues Resolved from Previous Releases
-
Fixed an issue where MySQL and Apache servers would not start after a node was rebooted.
-
Fixed an issue where IMPORT did not work correctly with non-partitioned tables stored in Amazon S3.
-
Fixed an issue with Presto where it requires the staging directory to be /mnt/tmp rather than /tmp when writing to Hive tables.
Release 4.5.0 Component Versions
The components that Amazon EMR installs with this release are listed below. Some are installed as part of big-data application packages. Others are unique to Amazon EMR and installed for system processes and features. These typically start with emr or aws. Big-data application packages in the most recent Amazon EMR release are usually the latest version found in the community. We make community releases available in Amazon EMR as quickly as possible.
Some components in Amazon EMR differ from community versions. These components have a version label in the form CommunityVersion-amzn-EmrVersion. The EmrVersion starts at 0. For example, if an open source community component named myapp-component with version 2.2 has been modified three times for inclusion in different Amazon EMR release versions, its release version is listed as 2.2-amzn-2.
Component Version Description
emr-ddb 3.0.0 Amazon DynamoDB connector for Hadoop ecosystem applications.
emr-goodies 2.0.0 Extra convenience libraries for the Hadoop ecosystem.
emr-kinesis 3.1.0 Amazon Kinesis connector for Hadoop ecosystem applications.
emr-s3-dist-cp 2.2.0 Distributed copy application optimized for Amazon S3.
emrfs 2.5.0 Amazon S3 connector for Hadoop ecosystem applications.
ganglia-monitor 3.7.2 Embedded Ganglia agent for Hadoop ecosystem applications along with the Ganglia monitoring agent.
ganglia-metadata-collector 3.7.2 Ganglia metadata collector for aggregating metrics from Ganglia monitoring agents.
ganglia-web 3.7.1 Web application for viewing metrics collected by the Ganglia metadata collector.
hadoop-client 2.7.2-amzn-0 Hadoop command-line clients such as 'hdfs', 'hadoop', or 'yarn'.
hadoop-hdfs-datanode 2.7.2-amzn-0 HDFS node-level service for storing blocks.
hadoop-hdfs-library 2.7.2-amzn-0 HDFS command-line client and library
hadoop-hdfs-namenode 2.7.2-amzn-0 HDFS service for tracking file names and block locations.
hadoop-httpfs-server 2.7.2-amzn-0 HTTP endpoint for HDFS operations.
hadoop-kms-server 2.7.2-amzn-0 Cryptographic key management server based on Hadoop's KeyProvider API.
hadoop-mapred 2.7.2-amzn-0 MapReduce execution engine libraries for running a MapReduce application.
hadoop-yarn-nodemanager 2.7.2-amzn-0 YARN service for managing containers on an individual node.
hadoop-yarn-resourcemanager 2.7.2-amzn-0 YARN service for allocating and managing cluster resources and distributed applications.
hcatalog-client 1.0.0-amzn-4 The 'hcat' command line client for manipulating hcatalog-server.
hcatalog-server 1.0.0-amzn-4 Service providing HCatalog, a table and storage management layer for distributed applications.
hcatalog-webhcat-server 1.0.0-amzn-4 HTTP endpoint providing a REST interface to HCatalog.
hive-client 1.0.0-amzn-4 Hive command line client.
hive-metastore-server 1.0.0-amzn-4 Service for accessing the Hive metastore, a semantic repository storing metadata for SQL on Hadoop operations.
hive-server 1.0.0-amzn-4 Service for accepting Hive queries as web requests.
hue-server 3.7.1-amzn-5 Web application for analyzing data using Hadoop ecosystem applications
mahout-client 0.11.1 Library for machine learning.
mysql-server 5.5 MySQL database server.
oozie-client 4.2.0 Oozie command-line client.
oozie-server 4.2.0 Service for accepting Oozie workflow requests.
presto-coordinator 0.140 Service for accepting queries and managing query execution among presto-workers.
presto-worker 0.140 Service for executing pieces of a query.
pig-client 0.14.0-amzn-0 Pig command-line client.
spark-client 1.6.1 Spark command-line clients.
spark-history-server 1.6.1 Web UI for viewing logged events for the lifetime of a completed Spark application.
spark-on-yarn 1.6.1 In-memory execution engine for YARN.
spark-yarn-slave 1.6.1 Apache Spark libraries needed by YARN slaves.
sqoop-client 1.4.6 Apache Sqoop command-line client.
webserver 2.4 Apache HTTP server.
zeppelin-server 0.5.6-incubating Web-based notebook that enables interactive data analytics.
Release 4.5.0 Configuration Classifications
Configuration classifications allow you to customize applications. These often correspond to a configuration XML file for the application, such as hive-site.xml. For more information, see Configuring Applications.
emr-4.5.0 Classifications
Classifications Description
capacity-scheduler
Change values in Hadoop's capacity-scheduler.xml file.
core-site
Change values in Hadoop's core-site.xml file.
emrfs-site
Change EMRFS settings.
hadoop-env
Change values in the Hadoop environment for all Hadoop components.
hadoop-log4j
Change values in Hadoop's log4j.properties file.
hdfs-encryption-zones
Configure HDFS encryption zones.
hdfs-site
Change values in HDFS's hdfs-site.xml.
hcatalog-env
Change values in HCatalog's environment.
hcatalog-server-jndi
Change values in HCatalog's jndi.properties.
hcatalog-server-proto-hive-site
Change values in HCatalog's proto-hive-site.xml.
hcatalog-webhcat-env
Change values in HCatalog WebHCat's environment.
hcatalog-webhcat-log4j
Change values in HCatalog WebHCat's log4j.properties.
hcatalog-webhcat-site
Change values in HCatalog WebHCat's webhcat-site.xml file.
hive-env
Change values in the Hive environment.
hive-exec-log4j
Change values in Hive's hive-exec-log4j.properties file.
-