Amazon Elastic MapReduce
Developer Guide

AMI Versions Supported in Amazon EMR Versions 2.x and 3.x

This documentation is for AMI versions 2.x and 3.x of Amazon EMR. For information about Amazon EMR releases 4.0.0 and above, see the Amazon EMR Release Guide. For information about managing the Amazon EMR service in 4.x releases, see the Amazon EMR Management Guide.

Amazon EMR supports the 3.x AMI versions listed in the following tables. The 2.x release series is now deprecated. When you create a cluster with the AWS CLI or the console, you must specify the AMI version to use.

Important

We recommend that you migrate to an Amazon EMR 4.x release. For more information, see the Amazon EMR Release Guide.

Hadoop 2 AMI Versions

AMI Version | Includes | Notes | Release Date
3.11.0

This Amazon EMR AMI version provides the following bug fixes and changes:

  • Fixed a bug that prevented MapReduce JobHistory logs from being pushed to Amazon S3.

  • Fixed bugs that prevented YARN container logs from being pushed to Amazon S3.

4 January 2016
3.10.0

This Amazon EMR AMI version provides the following bug fixes and changes:

  • Fixed a regression of YARN-90 that caused issues with log aggregation to Amazon S3.

  • Patched YARN-3241 (YARN-3241.002.patch).

  • Set HBase logging level to WARN.

2 October 2015
3.9.0

This Amazon EMR AMI version provides the following major bug fixes:

  • Fixed an issue where multiple service daemon processes resulted in job failures.

  • Fixed an issue where orphaned daemon processes were not killed before a new one started.

  • Fixed an issue that resulted in missing job history logs.

  • Fixed an issue in EMRFS client-side encryption where deleting without an instruction file present caused an uncaught exception and failure.

  • Provided a bootstrap action that fixes excessive logging encountered in the HBase shell. For more information, see HBase Shell Excessive Debug Logging.

19 August 2015
3.3.0

This Amazon EMR AMI version provides the following features and bug fixes:

Major Feature Updates

  • Amazon EMR now supports Hue, an open-source user interface for Hadoop that makes it easier to interact with your cluster. With Hue, you can develop and run Hive queries and Pig scripts, manage files in HDFS, and manage tables. Hue also enables you to browse and use files in Amazon S3. For more information, see Configure Hue to View, Query, or Manipulate Data.

  • Includes Oozie as part of the Hue release.

6 November 2014

Legacy AMIs

The following AMIs are deprecated. Although you may be able to launch clusters with these AMIs, they may have defects or older software we no longer support. However, we have kept the documentation here for your reference.

Deprecated Hadoop 2 AMI Versions

AMI Version | Includes | Notes | Release Date
3.8.0

This Amazon EMR AMI version provides the following:

Major Feature Release

Major Bug Fixes

  • Fixed a bug in the Amazon EMR-DynamoDB connector that caused zero map tasks to be generated when the master and core instance types were different.

  • Fixed an issue in Hive that deleted the target Amazon S3 folder of an EXPLAIN INSERT OVERWRITE TABLE operation.

  • Includes the following patches: HIVE-8746, HIVE-8566, HIVE-8162, PIG-4496.

Other Issues

  • Added the srcPrefixesFile option to S3DistCp. For more information, see S3DistCp Options.

  • The Amazon EMR-DynamoDB connector now allows customers to load a custom AWS credentials provider for use with the connector.

  • Removed the unused property skips3scratch from hive-default.xml.

10 June 2015
3.7.0

This Amazon EMR AMI version provides the following feature release:

Major Bug Fixes

  • /var/log has moved to /mnt/var/log. Because /var/log is now a symbolic link to /mnt/var/log, any files written to /var/log are stored in /mnt/var/log.

  • Fixes an issue where some components prevented certain scripts in /etc/init.d from working, which caused problems with the yum installer.

  • Hue no longer starts the HBase Thrift server when installed.

  • Python's base and package installations now use pip instead of easy_install.

  • The default Ruby now matches the version in the latest Amazon Linux release. Previously, Amazon EMR used an older version of Ruby. The older interpreters are still available, but the default now reflects the latest version of Amazon Linux.
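Because /var/log is a symbolic link to /mnt/var/log on these AMIs, a script that compares or records log paths may want to resolve the link first. The following Python sketch illustrates this under stated assumptions: the helper names resolve_log_dir and demo are hypothetical, and the demo recreates the link layout inside a scratch directory instead of touching the real /var/log.

```python
from __future__ import annotations

import os
import tempfile


def resolve_log_dir(path: str) -> str:
    """Return the real directory that `path` points to, following symlinks.

    On a node running AMI 3.7.0 or later, resolve_log_dir("/var/log") is
    expected to return "/mnt/var/log".
    """
    return os.path.realpath(path)


def demo() -> tuple[str, str]:
    """Rebuild a var/log -> mnt/var/log link in a scratch directory."""
    # realpath() guards against platforms where the temp dir is itself a link.
    root = os.path.realpath(tempfile.mkdtemp())
    target = os.path.join(root, "mnt", "var", "log")
    link = os.path.join(root, "var", "log")
    os.makedirs(target)
    os.makedirs(os.path.dirname(link))
    os.symlink(target, link)

    # A file written through the link is stored in the target directory.
    with open(os.path.join(link, "example.log"), "w") as fh:
        fh.write("hello\n")
    return resolve_log_dir(link), os.path.join(target, "example.log")
```

Calling `demo()` returns the resolved link and the path where the file actually landed; the first is the directory of the second.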

Other Issues

  • Amazon EMR now fully honors the DHCP options configured for the cluster's VPC. Some commands that previously returned a fully qualified domain name now return only the Amazon EC2 name of the host, and scripts that expect the fully qualified name will fail. For more information, see Errors That Result in START_FAILED.
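Scripts affected by this change can avoid assuming that a host name is always fully qualified. The Python sketch below shows one defensive approach under stated assumptions: the helper names qualify and local_fqdn are hypothetical, a name is treated as fully qualified simply because it contains a dot, and you must supply the domain to append yourself (for example, ec2.internal, the default internal domain in US East (N. Virginia)).

```python
from __future__ import annotations

import socket


def qualify(name: str, domain: str) -> str:
    """Append `domain` to `name` unless `name` already looks fully qualified.

    Simplification: any name containing a dot is treated as fully
    qualified; this is not a full DNS check.
    """
    name = name.rstrip(".")  # drop a trailing root dot, if present
    if "." in name:
        return name
    return f"{name}.{domain.strip('.')}"


def local_fqdn(default_domain: str) -> str:
    """Best-effort FQDN for this host that tolerates a short host name."""
    return qualify(socket.getfqdn(), default_domain)
```

For example, `qualify("ip-10-0-0-1", "ec2.internal")` yields `ip-10-0-0-1.ec2.internal`, while an already qualified name is returned unchanged.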

21 April 2015
3.6.0

This Amazon EMR AMI version provides the following feature release:

Major Feature Release

24 March 2015
3.5.0

This Amazon EMR AMI version provides the following features and bug fixes:

Major Bug Fixes

  • Fixes a bug that prevented S3DistCp from using temporary credentials (such as AWS STS tokens).

  • Fixes a bug that caused an Amazon S3 multipart upload to hang on an individual part upload.

  • Fixes performance issues with the Hive-DynamoDB connector.

  • Fixes an issue in the Hue redirection middleware that caused redirects to SSL endpoints to fail.

10 March 2015
3.4.0

This Amazon EMR AMI version provides the following features and bug fixes:

Major Feature Updates

  • Updated Hue version from 3.6 to 3.7.1.

  • Added support for EBS-backed HVM instances in all regions.

  • Added a General Purpose EBS root volume for all EBS-backed HVM instance types in all regions except US East (N. Virginia), US West (N. California), and Asia Pacific (Tokyo), where instances use Standard EBS volumes.

  • Modified the Amazon Kinesis connector so that customers must create the DynamoDB table it uses in the same region as their EMR cluster.

Major Bug Fixes

  • Minor performance-related bug fixes.

  • Includes this patch: HIVE-7323.

26 February 2015
3.3.2

This Amazon EMR AMI version provides the following features and bug fixes:

Major Feature Updates

  • Fixes performance issues in the DynamoDB connector.

Major Bug Fixes

  • Fixes an issue in Hue encountered when copying files larger than 64 MB from HDFS to Amazon S3.

  • Fixes an issue in the Hue S3 Browser which incorrectly handled non-ASCII key names in the EU (Frankfurt) and China (Beijing) regions.

  • Fixes an issue encountered in Hue when using a remote Hive Metastore database that has existing Hue sample tables.

  • Fixes an issue encountered in Hue when saving results from a multiple-statement Hive query.

  • Fixes a mismatch between the installed default Python and pip.

  • Includes this patch: HIVE-7426.

4 February 2015
3.3.1

This Amazon EMR AMI version provides the following features and bug fixes:

Major Bug Fixes

  • Fixes an issue where installing Impala caused certain Hive commands to fail.

  • Fixed multiple issues in the Hue Amazon S3 browser when used in the EU (Frankfurt) region.

  • User home directories are now created if not found. This fixes an issue where reusing an external Hue database caused some services, notably Oozie, to fail.

  • Fixed a security issue in the Hue Pig editor.

  • Fixed the redirect_whitelist Hue configuration option, which lets you whitelist the domains to which Hue can redirect.

  • Fixed an issue with loading saved queries in the Hive Editor when browsing Hue in Firefox.

  • Removed the permission buttons in the Hue Amazon S3 browser because those operations are not supported.

  • Fixed an issue that caused the configure-hadoop script to lose file permissions when used.
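In Hue, redirect_whitelist is set in the [desktop] section of hue.ini as a comma-separated list of regular expressions that a redirect URL must match. The Python sketch below only illustrates that matching rule; it is not Hue's implementation, the helper names are hypothetical, and the example patterns and the domain www.example.com are assumptions.

```python
from __future__ import annotations

import re


def parse_whitelist(value: str) -> list[re.Pattern[str]]:
    """Split a comma-separated redirect_whitelist value into compiled regexes.

    Simplification: patterns that themselves contain commas are not supported.
    """
    return [re.compile(p.strip()) for p in value.split(",") if p.strip()]


def is_redirect_allowed(url: str, patterns: list[re.Pattern[str]]) -> bool:
    """Allow a redirect only if `url` matches at least one whitelist pattern."""
    return any(p.match(url) for p in patterns)


# Hypothetical value: relative paths plus one external domain.
WHITELIST = parse_whitelist(r"^\/.*$,^https:\/\/www\.example\.com\/.*$")
```

Anchoring each pattern with `^` and `$` keeps a pattern from matching only part of an attacker-controlled URL.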

20 November 2014
3.3.0

This Amazon EMR AMI version provides the following features and bug fixes:

Major Feature Updates

  • Amazon EMR now supports Hue, an open-source user interface for Hadoop that makes it easier to interact with your cluster. With Hue, you can develop and run Hive queries and Pig scripts, manage files in HDFS, and manage tables. Hue also enables you to browse and use files in Amazon S3. For more information, see Configure Hue to View, Query, or Manipulate Data.

  • Includes Oozie as part of the Hue release.

6 November 2014
3.2.3

This Amazon EMR AMI version provides the following features and bug fixes:

Major Feature Updates

  • EMRFS now supports sending Amazon S3 eventual consistency notifications to Amazon SQS and publishing eventual consistency metrics to CloudWatch.

  • Performance optimizations for Amazon S3 multipart uploads with EMRFS.

  • The configure-hadoop bootstrap action now supports configuring log levels for different Hadoop daemons. You can now configure separate appenders for each daemon, and the following new identifiers are provided: HADOOP, MAPRED, JHS, and YARN.

Other Feature Updates

  • At least 3 hours of log files are kept uncompressed on disk for better debugging. Older logs are removed, except for active application logs, which remain uncompressed on disk unless log4j rotates them.

  • After compression, log files whose raw size is less than 500 MB are uploaded to Amazon S3 with special headers that allow browsing. If the raw size is greater than 500 MB, you are prompted to download the file instead.

  • Compressed log files are now kept in a temporary directory that mirrors the path of the original log. For example, if the log is in /mnt/var/log/hadoop, the compressed log is stored in /mnt/tmp/mnt/var/log/hadoop until the log retention period expires.

  • Compressed log files larger than 4 GB are not uploaded to Amazon S3.

  • A temporary directory cleaner is now included that cleans up temporary files in /mnt/tmp.
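The log-handling rules above can be summarized in a few lines of code. The following Python sketch is a model of the documented behavior, not the actual Amazon EMR log pusher: the helper names are hypothetical, treating the 500 MB and 4 GB thresholds as binary units (MiB, GiB) is an assumption, and the notes do not say what happens at exactly 500 MB.

```python
from __future__ import annotations

import posixpath

MiB = 1024 * 1024
GiB = 1024 * MiB

# Thresholds from the release notes; binary units are an assumption.
BROWSABLE_LIMIT = 500 * MiB  # raw size below which S3 browsing headers are set
UPLOAD_LIMIT = 4 * GiB       # compressed size above which the log is not uploaded
STAGING_ROOT = "/mnt/tmp"


def staging_path(log_dir: str) -> str:
    """Mirror the original log directory under /mnt/tmp."""
    return posixpath.join(STAGING_ROOT, log_dir.lstrip("/"))


def upload_action(raw_size: int, compressed_size: int) -> str:
    """Classify what happens to a compressed log file, per the notes above."""
    if compressed_size > UPLOAD_LIMIT:
        return "not uploaded"
    if raw_size < BROWSABLE_LIMIT:
        return "uploaded, browsable"
    return "uploaded, download prompt"
```

For example, `staging_path("/mnt/var/log/hadoop")` gives `/mnt/tmp/mnt/var/log/hadoop`, matching the path in the notes.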

Major Bug Fixes

  • Fixes an issue where the Hive server does not start after installation.

  • Fixes an issue where the Hive web interface does not function properly.

  • Fixes an issue where the HBase restore procedure corrupts the source cluster if it is still running.

  • Adds support for reuse of file statuses in the Pig Zebra format split calculation logic to improve performance.

  • Adds multithreaded creation of Pig Zebra format indexes during MapReduce job closure to improve performance.

  • Adds support for the legacy location of piggybank.jar: /home/hadoop/lib/pig/.

  • Fixes an issue where Pig does not use automatic parallelism if the source file system is Amazon S3.

  • Fixes an issue where S3DistCp ignores all -D CLI parameters.

  • Includes the following patches: YARN-2008, YARN-1857, YARN-1198, YARN-1680, MAPREDUCE-6111, HDFS-7005, HIVE-7147, HIVE-8137, HIVE-4629, HIVE-6245.

31 October 2014
3.2.1

This Amazon EMR AMI version provides the following fixes:

Major Bug Fixes

Other Bug Fixes

  • Fixes a port forwarding issue encountered with Hive.

  • Fixes an issue encountered with HiveMetaStoreChecker.

  • Includes a fix for HIVE-7085.

16 September 2014
3.2.0

This Amazon EMR AMI version provides the following features:

Major Feature Updates

  • Added Apache Hive 0.13.1. For more information, see the Hive version documentation and http://hive.apache.org/downloads.html.

  • Changed the connector for Amazon Kinesis to accept a flag, kinesis.iteration.timeout.ignore.failure, that allows a job to continue checkpointing even after it reaches the timeout value.

Major Bug Fixes

3 September 2014
3.1.4

This Amazon EMR AMI version provides the following features and bug fixes:

Major Feature Updates

  • EMRFS now supports sending Amazon S3 eventual consistency notifications to Amazon SQS and publishing eventual consistency metrics to CloudWatch.

  • Performance optimizations for Amazon S3 multipart uploads with EMRFS.

  • The configure-hadoop bootstrap action now supports configuring log levels for different Hadoop daemons. You can now configure separate appenders for each daemon, and the following new identifiers are provided: HADOOP, MAPRED, JHS, and YARN.

Other Feature Updates

  • At least 3 hours of log files are kept uncompressed on disk for better debugging. Older logs are removed, except for active application logs, which remain uncompressed on disk unless log4j rotates them.

  • After compression, log files whose raw size is less than 500 MB are uploaded to Amazon S3 with special headers that allow browsing. If the raw size is greater than 500 MB, you are prompted to download the file instead.

  • Compressed log files are now kept in a temporary directory that mirrors the path of the original log. For example, if the log is in /mnt/var/log/hadoop, the compressed log is stored in /mnt/tmp/mnt/var/log/hadoop until the log retention period expires.

  • Compressed log files larger than 4 GB are not uploaded to Amazon S3.

  • A temporary directory cleaner is now included that cleans up temporary files in /mnt/tmp.

Major Bug Fixes

  • Fixes an issue where Pig does not use automatic parallelism if the source file system is Amazon S3.

  • Fixes an issue where the HBase restore procedure corrupts the source cluster if it is still running.

  • Fixes an issue where S3DistCp ignores all -D CLI parameters.

  • Includes the following patches: YARN-2008, YARN-1857, YARN-1198, YARN-1680, MAPREDUCE-6111, HDFS-7005, HIVE-2777.

31 October 2014
3.1.2

This Amazon EMR AMI version provides the following bug fixes:

Major Bug Fixes

16 September 2014
3.1.1

This Amazon EMR AMI version provides the following bug fixes and enhancements:

Major Feature Updates

Major Bug Fixes

  • Fixed a bug that placed Application Master services on instances in the Task group, which may have resulted in the undesired termination of certain Application Master daemons.

  • Fixed a bug that prevented clusters from moving or copying files larger than 5 GB.

  • Fixed a bug that prevented users from launching Hive in local mode.

  • Fixed a bug that prevented NodeManager from using all available mountpoints, which resulted in issues using ephemeral drives on certain instances.

  • Fixed an issue in ResourceManager, which prevented users from accessing user interfaces on localhost.

  • Fixed a bug in JobHistory that may have prevented storage of JobHistory logs in Amazon S3 buckets.

  • Included the following Hadoop patches: HDFS-6701, HDFS-6460, HADOOP-10456, HDFS-6268, MAPREDUCE-5900.

  • Backports YARN-1864 to Hadoop 2.

  • Fixed a performance regression in Hive.

  • Hive is compiled against JDK 1.7.

  • Included the following Hive patches: HIVE-6938, HIVE-7429.

  • Fixed several HBase bugs.

  • The connector for Amazon Kinesis now supports all regions where Amazon Kinesis is available.

15 August 2014
3.1.0

This Amazon EMR AMI version provides the following features:

Major Feature Updates

Major Bug Fixes

  • Fixed an issue encountered when no log-uri value is specified at cluster creation.

  • Fixed version utility to accurately display Amazon Hadoop Distribution version.

  • Fixed Hadoop to accept HADOOP_NAMENODE_HEAPSIZE and HADOOP_DATANODE_HEAPSIZE memory setting values.

  • Replaced YARN_HEAPSIZE with YARN_RESOURCEMANAGER_HEAPSIZE, YARN_NODEMANAGER_HEAPSIZE, and YARN_PROXYSERVER_HEAPSIZE to allow more granular configuration. For more information, see Configuration of hadoop-user-env.sh.

  • Added the memory setting HADOOP_JOB_HISTORYSERVER_HEAPSIZE.

  • Fixed an issue encountered with hdfs -get when used with an Amazon S3 path.

  • Fixed an issue with the HTTPFS service for Hadoop.

  • Fixed an issue that caused job failures after a previous job was killed.

  • Other improvements and bug fixes.
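With the per-daemon heap variables introduced in AMI 3.1.0, a configuration tool can reject the old YARN_HEAPSIZE name. The Python sketch below renders export lines for hadoop-user-env.sh under stated assumptions: the helper render_user_env is hypothetical, the `export NAME=value` line format is assumed rather than taken from these notes, and the values are assumed to be in megabytes.

```python
from __future__ import annotations

# Heap variables named in the AMI 3.1.0 notes. YARN_HEAPSIZE is deliberately
# absent because 3.1.0 replaced it with the three per-daemon YARN_* names.
HEAP_VARS = {
    "HADOOP_NAMENODE_HEAPSIZE",
    "HADOOP_DATANODE_HEAPSIZE",
    "HADOOP_JOB_HISTORYSERVER_HEAPSIZE",
    "YARN_RESOURCEMANAGER_HEAPSIZE",
    "YARN_NODEMANAGER_HEAPSIZE",
    "YARN_PROXYSERVER_HEAPSIZE",
}


def render_user_env(heap_mb: dict[str, int]) -> str:
    """Render sorted export lines (values assumed to be in MB)."""
    unknown = set(heap_mb) - HEAP_VARS
    if unknown:
        raise ValueError(f"unsupported variables: {sorted(unknown)}")
    lines = [f"export {name}={mb}" for name, mb in sorted(heap_mb.items())]
    return "\n".join(lines) + "\n"
```

Validating names up front means a leftover YARN_HEAPSIZE setting fails loudly instead of being silently ignored.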

15 May 2014
3.0.4

This Amazon EMR AMI version provides the following features:

  • Adds a connector for Amazon Kinesis, which allows users to process streaming data using standard Hadoop and ecosystem tools within Amazon EMR clusters. For more information, see Analyze Amazon Kinesis Data.

  • Fixes an issue in the yarn-site.xml configuration file, which resulted in the JobHistory server not being fully configured.

  • Adds support for AWS SDK 1.7.0.

19 February 2014
3.0.3

This Amazon EMR AMI version provides the following features:

  • Adds support for AWS SDK 1.6.10.

  • Upgrades HttpClient to version 4.2 to be compatible with AWS SDK 1.6.10.

  • Fixes a problem related to orphaned Amazon EBS volumes.

  • Adds support for Hive 0.11.0.2.

  • Upgrades Protobuf to version 2.5.

    Note

    The upgrade to Protobuf 2.5 requires you to regenerate and recompile any of your Java code that was previously generated by the protoc tool.

11 February 2014
3.0.2

This Amazon EMR AMI version provides the following features:

  • Adds support for Impala 1.2.1 with Hadoop 2. For more information, see Impala.

  • Changes the uploadMultiParts function to use a retry policy.

12 December 2013
3.0.1

This Amazon EMR AMI version provides the following features:

  • Adds support for viewing Hadoop 2 task attempt logs in the EMR console.

  • Fixes an issue with R 3.0.1.

8 November 2013
3.0.0

This new major Amazon EMR AMI version provides the following features:

28 October 2013

Deprecated Hadoop 1 and Earlier AMIs

AMI Version | Description | Release Date
2.4.11

This Amazon EMR AMI version provides the following:

  • Minor performance-related bug fixes.

26 February 2015
2.4.10

This Amazon EMR AMI version provides the following bug fixes:

  • Fixes an issue that prevented logs from being handled properly because of corrupted files or improper permissions.

  • Fixes an issue that may have prevented a step from completing properly.

13 February 2015
2.4.9

This Amazon EMR AMI version provides the following bug fixes:

  • Includes the patch for bash issues: CVE-2014-6271 and CVE-2014-7169.

  • Backports Hadoop patch MAPREDUCE-5877.

  • Fixes an issue with JobTracker where a successful fetch from a local reducer prevents a bad TaskTracker node from being excluded.

31 October 2014
2.4.8

This Amazon EMR AMI version provides the following feature and bug fixes:

Major Feature Update

  • Allows unlimited steps over the lifetime of the cluster, with up to 256 ACTIVE or PENDING steps at a given time and display of up to 1,000 step records (including system steps). For more information, see Submit Work to a Cluster.
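The limit applies to steps that are open at the same time, not to the cluster's lifetime total, so completed steps do not count against it. The Python sketch below illustrates that distinction; the helper name is hypothetical, and the state labels ACTIVE and PENDING are the ones used in these notes, which may differ from the state names the Amazon EMR API returns.

```python
from __future__ import annotations

from collections.abc import Iterable

MAX_OPEN_STEPS = 256                # ACTIVE or PENDING steps allowed at once
OPEN_STATES = {"ACTIVE", "PENDING"}  # labels as used in these release notes


def remaining_step_capacity(states: Iterable[str]) -> int:
    """Number of additional steps that could be queued right now."""
    open_steps = sum(1 for state in states if state in OPEN_STATES)
    return max(0, MAX_OPEN_STEPS - open_steps)
```

A cluster that has already finished 500 steps and has 10 pending can still accept 246 more at that moment.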

Major Bug Fixes

  • Fixes an issue with hbase-user-env.sh that caused HBase to ignore any settings made by this script.

  • Fixes an issue with S3DistCp where using the .* regular expression caused an error.

16 September 2014
2.4.7

In addition to other enhancements and bug fixes, this version of Amazon EMR AMI corrects the following problems:

  • Fixes an issue with logs stored in /mnt/var/log that could consume all of the volume's disk space.

  • Fixes a deadlock issue encountered when adding steps.

30 July 2014
2.4.6

This Amazon EMR AMI version provides the following features:

  • Adds support for Cascading 2.5.

  • Adds support for new instance types.

  • Fixed a permissions issue with Hadoop.

  • Various other bug fixes and enhancements.

15 May 2014
2.4.5

This Amazon EMR AMI version provides the following features:

  • Adds support for HVM AMIs in US East (N. Virginia), US West (Oregon), US West (N. California), EU (Ireland), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), and South America (São Paulo) regions.

  • Adds support for AWS SDK 1.7.0.

  • Adds support for Python 2.7.

  • Adds support for Hive 0.11.0.2.

  • Upgrades Protobuf to version 2.5.

    Note

    The upgrade to Protobuf 2.5 requires you to regenerate and recompile any of your Java code that was previously generated by the protoc tool.

  • Updates Java to version 7u60 (early access release). For more information, go to JDK 7 Update 60 Early Access Release.

  • Updates Jetty to version 6.1.26.emr.1 that fixes Hadoop MapReduce issue MAPREDUCE-2980.

  • Fixes an issue encountered when no log-uri is specified at cluster creation.

  • Fixes version utility to accurately display Amazon Hadoop Distribution version.

  • Other improvements and bug fixes.

27 March 2014
2.4.3

This Amazon EMR AMI version provides the following features:

  • Adds support for Python 2.7.

  • Updates Jetty to version 6.1.26.emr.1 that fixes the Hadoop MapReduce issue MAPREDUCE-2980.

  • Updates Java to version 7u60 (early access release). For more information, go to JDK 7 Update 60 Early Access Release.

  • Adds support for Hive 0.11.0.2.

  • Upgrades Protobuf to version 2.5.

    Note

    The upgrade to Protobuf 2.5 requires you to regenerate and recompile any of your Java code that was previously generated by the protoc tool.

3 January 2014
2.4.2

Same as the previous AMI version, with the following additions:

  • Fixed a bug in host resolution that limited map-side local data optimization. Customers who use Fair Scheduler may observe a change in job execution due to the emphasis the system puts on data locality. The scheduler may now hold back tasks to run them locally.

  • Includes Hadoop 1.0.3, Java 1.7, Perl 5.10.1, Python 2.6.6, and R 2.11.

7 October 2013
2.4.1

Same as the previous AMI version, with the following additions:

  • Fixes a bug that causes the HBase shell not to work properly.

  • Fixes a bug that causes some clusters to fail with the error ‘concurrent modifications exception’.

  • Adds new logic in the instance controller to detect and reboot instances that have been blacklisted by Hadoop for an extended period of time.

  • Includes Hadoop 1.0.3, Java 1.7, Perl 5.10.1, Python 2.6.6, and R 2.11.

20 August 2013
2.4

Same as the previous AMI version, with the following additions:

  • Adds support for Java 7 with Hadoop and HBase. Other Amazon EMR features, such as Hive and Pig, continue to require Java 6.

  • Improved JobTracker detection and response time when reducers become stuck due to a problematic mapper.

  • Fixes a problem in which some Hadoop reducers were unable to fetch map output data because of a bad mapper, causing job delays.

  • Adds FetchStatusMap to keep track of all fetch errors and successes along with their timestamps.

  • Fixes a problem with "Text File Busy" errors when launching tasks. For more information, go to MAPREDUCE-2374.

1 August 2013
2.3.6

Same as 2.3.5, with the following additions:

  • Fixes a problem in the Debian sources.list and preferences files that caused certain bootstrap actions to fail, including Ganglia. Customers using AMI versions 2.0.0 to 2.3.5 may notice an additional bootstrap action in their list named EMR Debian Patch.

17 May 2013
2.3.5

Same as 2.3.3, with the following additions:

  • Fixes an S3DistCp bug that created invalid manifest file entries for certain URL-encoded file names.

  • Improves log pushing functionality and adds a 7-day retention policy for on-cluster log files. Log files not modified for 7 or more days are deleted from the cluster.

  • Adds a streaming configuration option for not emitting the mapper key. For more information, go to MAPREDUCE-1785.

  • Adds the --s3ServerSideEncryption option to the S3DistCp tool. For more information, see S3DistCp Options.
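The 7-day retention rule added in AMI 2.3.5 selects files by modification time. The following Python sketch models that selection only; the helper name expired_logs is hypothetical, it lists candidates rather than deleting anything, and it is not the mechanism Amazon EMR itself uses.

```python
from __future__ import annotations

import os
import time

RETENTION_SECONDS = 7 * 24 * 60 * 60  # 7 days


def expired_logs(log_dir: str, now: float | None = None) -> list[str]:
    """List files under `log_dir` not modified for 7 or more days."""
    now = time.time() if now is None else now
    expired = []
    for dirpath, _dirnames, filenames in os.walk(log_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if now - os.path.getmtime(path) >= RETENTION_SECONDS:
                expired.append(path)
    return sorted(expired)
```

Passing `now` explicitly makes the check deterministic, which is convenient when testing the rule against files with known timestamps.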

26 April 2013
2.3.4

Note

Because of an issue with AMI 2.3.4, this version is deprecated. We recommend that you use a different AMI version instead.

16 April 2013
2.3.3

Same as 2.3.2, with the following additions:

  • Improves the CloudWatch LiveTaskTracker metric to take into account expired Hadoop TaskTrackers, and includes minor improvements in Hadoop.

1 March 2013
2.3.2

Same as 2.3.1, with the following additions:

  • Fixes an issue which prevented customers from using the debugging feature in the Amazon EMR console.

7 February 2013
2.3.1

Same as 2.3.0, with the following additions:

  • Improves support for clusters running on hs1.8xlarge instances.

24 December 2012
2.3.0

Same as 2.2.4, with the following additions:

20 December 2012
2.2.4

Same as 2.2.3, with the following additions:

  • Improves error handling in the Snappy decompressor. For more information, go to HADOOP-8151.

  • Fixes an issue with MapFile.Reader reading LZO or Snappy compressed files. For more information, go to HADOOP-8423.

  • Updates the kernel to the AWS version of 3.2.30-49.59.

6 December 2012
2.2.3

Same as 2.2.1, with the following additions:

  • Improves HBase backup functionality.

  • Updates the AWS SDK for Java to version 1.3.23.

  • Resolves issues with the job tracker user interface.

  • Improves Amazon S3 file system handling in Hadoop.

  • Improves NameNode functionality in Hadoop.

30 November 2012
2.2.2

Note

Because of an issue with AMI 2.2.2, this version is deprecated. We recommend that you use a different AMI version instead.

23 November 2012
2.2.1

Same as 2.2.0, with the following additions:

  • Fixes an issue with HBase backup functionality.

  • Enables multipart upload by default for files larger than the Amazon S3 block size specified by fs.s3n.blockSize. For more information, see Configure Multipart Upload for Amazon S3.

30 August 2012
2.2.0

Same as 2.1.3, with the following additions:

  • Adds support for Hadoop 1.0.3.

  • No longer includes Hadoop 0.18 and Hadoop 0.20.205.

Operating system: Debian 6.0.5 (Squeeze)

Applications: Hadoop 1.0.3, Hive 0.8.1.3, Pig 0.9.2.2, HBase 0.92.0

Languages: Perl 5.10.1, PHP 5.3.3, Python 2.6.6, R 2.11.1, Ruby 1.8.7

File system: ext3 for root, xfs for ephemeral

6 August 2012
2.1.4

Same as 2.1.3, with the following additions:

30 August 2012
2.1.3

Same as 2.1.2, with the following additions:

  • Fixes issues in HBase.

6 August 2012
2.1.2

Same as 2.1.1, with the following additions:

  • Adds support for CloudWatch metrics when using MapR.

  • Improves the reliability of reporting metrics to CloudWatch.

6 August 2012
2.1.1

Same as 2.1.0, with the following additions:

  • Improves the reliability of log pushing.

  • Adds support for HBase in Amazon VPC.

  • Improves DNS retry functionality.

3 July 2012
2.1.0

Same as AMI 2.0.5, with the following additions:

  • Supports launching HBase clusters. For more information, see Store Data with HBase (EMR 3.x Releases).

  • Supports running MapR Edition M3 and Edition M5. For more information, see Using the MapR Distribution for Hadoop.

  • Enables HDFS append by default; dfs.support.append is set to true in hdfs/hdfs-default.xml. The default value in code is also set to true.

  • Fixes a race condition in instance controller.

  • Changes mapreduce.user.classpath.first to default to true. This configuration setting indicates whether to load classes first from your job's JAR file or from the Hadoop system lib directory. This change makes it easier for you to override classes in Hadoop.

  • Uses Debian 6.0.5 (Squeeze) as the operating system.

12 June 2012
2.0.5

Note

Because of an issue with AMI 2.0.5, this version is deprecated. We recommend that you use a different AMI version instead.

Same as AMI 2.0.4, with the following additions:

  • Improves Hadoop performance by reinitializing the recycled compressor object for mappers only if they are configured to use the GZip compression codec for output.

  • Adds a configuration variable to Hadoop called mapreduce.jobtracker.system.dir.permission that can be used to set permissions on the system directory. For more information, see Setting Permissions on the System Directory.

  • Changes InstanceController to use an embedded database rather than the MySQL instance running on the box. MySQL remains installed and running by default.

  • Improves the collectd configuration. For more information about collectd, go to http://collectd.org/.

  • Fixes a rare race condition in InstanceController.

  • Changes the default shell from dash to bash.

  • Uses Debian 6.0.4 (Squeeze) as the operating system.

19 April 2012
2.0.4

Same as AMI 2.0.3, with the following additions:

  • Changes the default for fs.s3n.blockSize to 33554432 (32 MiB).

  • Fixes a bug in reading zero-length files from Amazon S3.
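The two fs.s3n.blockSize defaults mentioned in these notes can be checked with simple arithmetic: 33554432 is 2^25 bytes, exactly 32 MiB, while the earlier AMI 2.0.2 default of 9223372036854775807 is 2^63 − 1, one byte less than 8 EiB, so under the old default a file effectively forms a single block. The Python sketch below assumes one split per block to estimate split counts; that is a simplification, because Hadoop's actual split size also depends on its minimum and maximum split-size settings.

```python
from __future__ import annotations

MiB = 2 ** 20
EiB = 2 ** 60

DEFAULT_BLOCK_2_0_4 = 33554432             # fs.s3n.blockSize default from AMI 2.0.4
DEFAULT_BLOCK_2_0_2 = 9223372036854775807  # fs.s3n.blockSize default from AMI 2.0.2


def approx_splits(file_size: int, block_size: int) -> int:
    """Rough split count, assuming one split per block (integer ceiling)."""
    return max(1, -(-file_size // block_size))
```

Under these assumptions, a 1 GiB file yields 32 splits with the 2.0.4 default but only 1 split with the 2.0.2 default.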

30 January 2012
2.0.3

Same as AMI 2.0.2, with the following additions:

  • Adds support for Amazon EMR metrics in CloudWatch.

  • Improves performance of seek operations in Amazon S3.

24 January 2012
2.0.2

Same as AMI 2.0.1, with the following additions:

  • Adds support for the Python API Dumbo. For more information about Dumbo, go to https://github.com/klbostee/dumbo/wiki/.

  • The AMI now runs the Network Time Protocol Daemon (NTPD) by default. For more information about NTPD, go to http://en.wikipedia.org/wiki/Ntpd.

  • Updates the Amazon Web Services SDK to version 1.2.16.

  • Improves the way Amazon S3 file system initialization checks for the existence of Amazon S3 buckets.

  • Adds support for configuring the Amazon S3 block size to facilitate splitting files in Amazon S3. You set the block size in the fs.s3n.blockSize parameter by using the configure-hadoop bootstrap action. The default value is 9223372036854775807 (8 EiB).

  • Adds a /dev/sd symlink for each /dev/xvd device. For example, /dev/xvdb now has a symlink pointing to it called /dev/sdb. Now you can use the same device names for AMI 1.0 and 2.0.
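The /dev/sd aliases follow a simple naming rule: the xvd prefix is replaced by sd and the drive letter is kept. The Python sketch below expresses that rule; the helper name sd_alias is hypothetical, and it only handles whole-disk names made of letters, since the notes do not say whether partition devices also receive aliases.

```python
from __future__ import annotations

import re

# Whole-disk xvd device names only, for example /dev/xvdb.
_XVD = re.compile(r"^/dev/xvd([a-z]+)$")


def sd_alias(xvd_device: str) -> str:
    """Return the /dev/sd* alias for a /dev/xvd* device name."""
    match = _XVD.match(xvd_device)
    if match is None:
        raise ValueError(f"not an xvd whole-disk device: {xvd_device}")
    return f"/dev/sd{match.group(1)}"
```

For example, `sd_alias("/dev/xvdb")` returns `/dev/sdb`, the name that scripts written for AMI 1.0 expect.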

17 January 2012
2.0.1

Same as AMI 2.0 except for the following bug fixes:

  • Task attempt logs are pushed to Amazon S3.

  • Fixed /mnt mounting on 32-bit AMIs.

  • Uses Debian 6.0.3 (Squeeze) as the operating system.

19 December 2011
2.0.0

Operating system: Debian 6.0.2 (Squeeze)

Applications: Hadoop 0.20.205, Hive 0.7.1, Pig 0.9.1

Languages: Perl 5.10.1, PHP 5.3.3, Python 2.6.6, R 2.11.1, Ruby 1.8.7

File system: ext3 for root, xfs for ephemeral

Note: Added support for the Snappy compression/decompression library.

11 December 2011
1.0.1

Same as AMI 1.0 except for the following change:

  • Updates sources.list to the new location of the Lenny distribution in archive.debian.org.

3 April 2012
1.0.0

Operating system: Debian 5.0 (Lenny)

Applications: Hadoop 0.20 and 0.18 (default); Hive 0.5, 0.7 (default), 0.7.1; Pig 0.3 (on Hadoop 0.18), 0.6 (on Hadoop 0.20)

Languages: Perl 5.10.0, PHP 5.2.6, Python 2.5.2, R 2.7.1, Ruby 1.8.7

File system: ext3 for root and ephemeral

Kernel: Red Hat

Note: This was the last AMI released before the CLI was updated to support AMI versioning. For backward compatibility, job flows launched with versions of the CLI downloaded before 11 December 2011 use this version.

26 April 2011