Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)

Document History

The following table describes the important changes to the documentation since the last release of Amazon Elastic MapReduce (Amazon EMR).

API version: 2009-03-31

Latest documentation update: March 27, 2014

Change

Description

Release Date
AMI 2.4.5

Amazon EMR supports AMI 2.4.5. For more information, see AMI Versions Supported in Amazon EMR.

March 27, 2014
Elastic Load Balancing Access Logs with Amazon EMR

Added a tutorial for processing access logs produced by Elastic Load Balancing. For more information, see Analyze Elastic Load Balancing Log Data.

March 6, 2014
AMI 3.0.4

Amazon EMR supports AMI 3.0.4 and a connector for Amazon Kinesis. For more information, see AMI Versions Supported in Amazon EMR.

February 20, 2014
AMI 3.0.3

Amazon EMR supports AMI 3.0.3. For more information, see AMI Versions Supported in Amazon EMR.

February 11, 2014
Hive 0.11.0.2

Amazon EMR supports Hive 0.11.0.2. For more information, see Supported Hive Versions.

February 11, 2014
Impala 1.2.1

Amazon EMR supports Impala 1.2.1 with Hadoop 2. For more information, see Analyze Data with Impala.

December 12, 2013
AMI 3.0.2

Amazon EMR supports AMI 3.0.2. For more information, see AMI Versions Supported in Amazon EMR.

December 12, 2013
Amazon EMR tags

Amazon EMR supports tagging on Amazon EMR clusters. For more information, see Tagging Amazon EMR Clusters.

December 5, 2013
CLI version 2013-12-02

Adds support for Amazon EMR tags. For more information, see Command Line Interface Releases.

December 5, 2013
AMI 3.0.1

Amazon EMR supports AMI 3.0.1. For more information, see AMI Versions Supported in Amazon EMR.

November 8, 2013
New Amazon EMR console

A new management console is available for Amazon EMR. The new console is much faster and has powerful new features, including:
  • Resizing a running cluster (that is, adding or removing instances)

  • Cloning the launch configurations for running or terminated clusters

  • Hadoop 2 support, including custom Amazon CloudWatch metrics

  • Targeting specific Availability Zones

  • Creating clusters with IAM roles

  • Submitting multiple steps (before and after cluster creation)

  • New console help portal with integrated documentation search

November 6, 2013
MapR 3.0.2

Amazon EMR supports MapR 3.0.2. For more information, see Using the MapR Distribution for Hadoop.

November 6, 2013
Hadoop 2.2.0

Amazon EMR supports Hadoop 2.2.0. For more information, see Hadoop 2.2.0 New Features.

October 29, 2013
AMI 3.0.0

Amazon EMR supports AMI 3.0.0. For more information, see AMI Versions Supported in Amazon EMR.

October 29, 2013
CLI version 2013-10-07

Maintenance update for the Amazon EMR CLI. For more information, see Command Line Interface Releases.

October 7, 2013
AMI 2.4.2

Amazon EMR supports AMI 2.4.2. For more information, see AMI Versions Supported in Amazon EMR.

October 7, 2013
AMI 2.4.1

Amazon EMR supports AMI 2.4.1. For more information, see AMI Versions Supported in Amazon EMR.

August 20, 2013
Hive 0.11.0.1

Amazon EMR supports Hive 0.11.0.1. For more information, see Supported Hive Versions.

August 2, 2013
Hive 0.11.0

Amazon EMR supports Hive 0.11.0. For more information, see Supported Hive Versions.

August 1, 2013
Pig 0.11.1.1

Amazon EMR supports Pig 0.11.1.1. For more information, see Supported Pig Versions.

August 1, 2013
AMI 2.4

Amazon EMR supports AMI 2.4. For more information, see AMI Versions Supported in Amazon EMR.

August 1, 2013
MapR 2.1.3

Amazon EMR supports MapR 2.1.3. For more information, see Using the MapR Distribution for Hadoop.

August 1, 2013
MapR M7 Edition

Amazon EMR supports MapR M7 Edition. For more information, see Using the MapR Distribution for Hadoop.

July 11, 2013
CLI version 2013-07-08

Maintenance update for the Amazon EMR CLI. For more information, see Command Line Interface Releases.

July 11, 2013
Pig 0.11.1

Amazon EMR supports Pig 0.11.1. Pig 0.11.1 adds support for JDK 7, Hadoop 2, and more. For more information, see Supported Pig Versions.

July 1, 2013
Hive 0.8.1.8

Amazon EMR supports Hive 0.8.1.8. For more information, see Supported Hive Versions.

June 18, 2013
AMI 2.3.6

Amazon EMR supports AMI 2.3.6. For more information, see AMI Versions Supported in Amazon EMR.

May 17, 2013
Hive 0.8.1.7

Amazon EMR supports Hive 0.8.1.7. For more information, see Supported Hive Versions.

May 2, 2013
Improved documentation organization, new table of contents, and new topics

Reorganized the documentation with a restructured table of contents and many new topics to improve ease of use and address customer feedback.

April 29, 2013
AMI 2.3.5

Amazon EMR supports AMI 2.3.5. For more information, see AMI Versions Supported in Amazon EMR.

April 26, 2013
M1 Medium Amazon EC2 Instances

Amazon EMR supports m1.medium instances. For more information, see Hadoop 2.2.0 Default Configuration.

April 18, 2013
MapR 2.1.2

Amazon Elastic MapReduce supports MapR 2.1.2. For more information, see Using the MapR Distribution for Hadoop.

April 18, 2013
AMI 2.3.4

Deprecated.

April 16, 2013
AWS GovCloud (US)

Adds support for AWS GovCloud (US). For more information, see AWS GovCloud (US).

April 9, 2013
Supported Product User Arguments

Improved support for launching job flows on third-party applications with a new --supported-product CLI option that accepts custom user arguments. For more information, see Launch an Amazon EMR cluster with MapR using the console.

March 19, 2013
Amazon VPC

Amazon Elastic MapReduce supports two platforms on which you can launch the EC2 instances of your job flow: EC2-Classic and EC2-VPC. For more information, see Amazon VPC.

March 11, 2013
AMI 2.3.3

Amazon Elastic MapReduce supports AMI 2.3.3. For more information, see AMI Versions Supported in Amazon EMR.

March 1, 2013
High I/O Instances

Amazon Elastic MapReduce supports hi1.4xlarge instances. For more information, see Hadoop 2.2.0 Default Configuration.

February 14, 2013
AMI 2.3.2

Amazon Elastic MapReduce supports AMI 2.3.2. For more information, see AMI Versions Supported in Amazon EMR.

February 7, 2013
New introduction and tutorial

Added sections that describe Amazon EMR and a tutorial that walks you through your first streaming cluster. For more information, see What is Amazon EMR? and Get Started: Count Words with Amazon EMR.

January 9, 2013
CLI Reference

Added CLI reference. For more information, see Command Line Interface Reference for Amazon EMR.

January 8, 2013
AMI 2.3.1

Amazon Elastic MapReduce supports AMI 2.3.1. For more information, see AMI Versions Supported in Amazon EMR.

December 24, 2012
High Storage Instances

Amazon Elastic MapReduce supports hs1.8xlarge instances. For more information, see Hadoop 2.2.0 Default Configuration.

December 20, 2012
IAM Roles

Amazon Elastic MapReduce supports IAM Roles. For more information, see Configure IAM Roles for Amazon EMR.

December 20, 2012
Hive 0.8.1.6

Amazon Elastic MapReduce supports Hive 0.8.1.6. For more information, see Supported Hive Versions.

December 20, 2012
AMI 2.3.0

Amazon Elastic MapReduce supports AMI 2.3.0. For more information, see AMI Versions Supported in Amazon EMR.

December 20, 2012
AMI 2.2.4

Amazon Elastic MapReduce supports AMI 2.2.4. For more information, see AMI Versions Supported in Amazon EMR.

December 6, 2012
AMI 2.2.3

Amazon Elastic MapReduce supports AMI 2.2.3. For more information, see AMI Versions Supported in Amazon EMR.

November 30, 2012
Hive 0.8.1.5

Amazon Elastic MapReduce supports Hive 0.8.1.5. For more information, see Analyze Data with Hive.

November 30, 2012
Asia Pacific (Sydney) Region

Adds support for Amazon EMR in the Asia Pacific (Sydney) Region.

November 12, 2012
Visible To All IAM Users

Added support for making a cluster visible to all IAM users in an AWS account. For more information, see Configure IAM User Permissions.

October 1, 2012
Hive 0.8.1.4

Updates the HBase client on Hive clusters to version 0.92.0 to match the version of HBase used on HBase clusters. This fixes issues that occurred when connecting to an HBase cluster from a Hive cluster.

September 17, 2012
AMI 2.2.1
  • Fixes an issue with HBase backup functionality.

  • Enables multipart upload by default for files larger than the Amazon S3 block size specified by fs.s3n.blockSize. For more information, see Configure Multipart Upload for Amazon S3.

August 30, 2012
AMI 2.1.4
August 30, 2012
Hadoop 1.0.3, AMI 2.2.0, Hive 0.8.1.3, Pig 0.9.2.2

Support for Hadoop 1.0.3. For more information, see Supported Hadoop Versions.

August 6, 2012
AMI 2.1.3

Fixes issues with HBase.

August 6, 2012
AMI 2.1.2

Support for Amazon CloudWatch metrics when using MapR.

August 6, 2012
AMI 2.1.1

Improves the reliability of log pushing, adds support for HBase in Amazon VPC, and improves DNS retry functionality.

July 9, 2012
Major-Minor AMI Versioning

Improves AMI versioning by adding support for major-minor releases. Now you can specify the major-minor version for the AMI and always have the latest patches applied. For more information, see Choose a Machine Image.

July 9, 2012
Hive 0.8.1.2

Fixes an issue with duplicate data in large clusters.

July 9, 2012
S3DistCp 1.0.5

Provides better support for specifying the version of S3DistCp to use.

June 27, 2012
Store Data with HBase

Amazon EMR supports HBase, an open source, non-relational, distributed database modeled after Google's BigTable. For more information, see Store Data with HBase.

June 12, 2012
Launch a Cluster on the MapR Distribution for Hadoop

Amazon EMR supports MapR, an open, enterprise-grade distribution that makes Hadoop easier and more dependable. For more information, see Using the MapR Distribution for Hadoop.

June 12, 2012
Connect to the Master Node in an Amazon EMR Cluster

Added information about how to connect to the master node using both SSH and a SOCKS proxy. For more information, see Connect to the Cluster.

June 12, 2012
Hive 0.8.1

Amazon Elastic MapReduce supports Hive 0.8.1. For more information, see Analyze Data with Hive.

May 30, 2012
HParser

Added information about running Informatica HParser on Amazon EMR. For more information, see Parse Data with HParser.

April 30, 2012
AMI 2.0.5

Enhancements to performance and other updates. For more information, see AMI Versions Supported in Amazon EMR.

April 19, 2012
Pig 0.9.2

Amazon Elastic MapReduce supports Pig 0.9.2. Pig 0.9.2 adds support for user-defined functions written in Python and other improvements. For more information, see Pig Version Details.

April 9, 2012
Pig versioning

Amazon Elastic MapReduce supports the ability to specify the Pig version when launching a cluster. For more information, see Process Data with Pig.

April 9, 2012
Hive 0.7.1.4

Amazon Elastic MapReduce supports Hive 0.7.1.4. For more information, see Analyze Data with Hive.

April 9, 2012
AMI 1.0.1

Updates sources.list to the new location of the Lenny distribution in archive.debian.org.

April 3, 2012
Hive 0.7.1.3

Support for new version of Hive, version 0.7.1.3. This version adds the dynamodb.retry.duration variable which you can use to configure the timeout duration for retrying Hive queries. This version of Hive also supports setting the DynamoDB endpoint from within the Hive command-line application.

March 13, 2012
Support for IAM in the console

Support for AWS Identity and Access Management (IAM) in the Amazon EMR console. Improvements for S3DistCp and support for Hive 0.7.1.2 are also included.

February 28, 2012
Support for CloudWatch Metrics

Support for monitoring cluster metrics and setting alarms on metrics.

January 31, 2012
Support for S3DistCp

Support for distributed copy using S3DistCp.

January 19, 2012
Support for DynamoDB

Support for exporting and querying data stored in DynamoDB.

January 18, 2012
AMI 2.0.2 and Hive 0.7.1.1

Support for Amazon EMR AMI 2.0.2 and Hive 0.7.1.1.

January 17, 2012
Cluster Compute Eight Extra Large (cc2.8xlarge)

Support for Cluster Compute Eight Extra Large (cc2.8xlarge) instances in clusters.

December 21, 2011
Hadoop 0.20.205

Support for Hadoop 0.20.205. For more information, see Supported Hadoop Versions.

December 11, 2011
Pig 0.9.1

Support for Pig 0.9.1. For more information, see Supported Pig Versions.

December 11, 2011
AMI versioning

You can now specify which version of the Amazon EMR AMI to use to launch your cluster. All EC2 instances in the cluster will be initialized with the AMI version that you specify. For more information, see Choose a Machine Image.

December 11, 2011
Amazon EMR clusters on Amazon VPC

You can now launch Amazon EMR clusters inside of your Amazon Virtual Private Cloud (Amazon VPC) for greater control over network configuration and access. For more information, see Select a Amazon VPC Subnet for the Cluster (Optional).

December 11, 2011
Spot Instances

Support for launching cluster instance groups as Spot Instances added. For more information, see Lower Costs with Spot Instances (Optional).

August 19, 2011
Hive 0.7.1

Support for Hive 0.7.1 added. For more information, see Supported Hive Versions.

July 25, 2011
Termination Protection

Support for a new Termination Protection feature. For more information, see Protect a Cluster from Termination.

April 14, 2011
Tagging

Support for Amazon EC2 tagging. For more information, see View Cluster Instances in Amazon EC2.

March 9, 2011
IAM Integration

Support for AWS Identity and Access Management. For more information, see Configure IAM User Permissions.

February 21, 2011
Elastic IP Support

Support for Elastic IP addresses. For more information, see Associate an Elastic IP Address with a Cluster.

February 21, 2011
Environment Configuration

Expanded sections on Environment Configuration and Performance Tuning. For more information, see Create Bootstrap Actions to Install Additional Software (Optional).

February 21, 2011
Distributed Cache

For more information about using DistributedCache to upload files and libraries, see Import files using Distributed Cache.

February 21, 2011
How to build modules using Amazon EMR

For more information, see Build Binaries Using Amazon EMR.

February 21, 2011
Comparison of cluster types

For more information, see Choose the Type of Cluster to Run.

February 21, 2011
Amazon S3 multipart upload

Support for Amazon S3 multipart upload through the AWS SDK for Java. For more information, see Configure Multipart Upload for Amazon S3.

January 6, 2010
Hive 0.7

Support for Hive 0.7 and for running Hive 0.5 and Hive 0.7 concurrently on the same cluster. Note: You need to update the Amazon EMR command line interface to resize running job flows and modify instance groups. For more information, see Analyze Data with Hive.

December 8, 2010
JDBC Drivers for Hive

Support for JDBC with Hive 0.5 and Hive 0.7. For more information, see Use the Hive JDBC Driver.

December 8, 2010
Support HPC

Support for cluster compute instances. For more information, see Virtual Server Configurations.

November 14, 2010
Bootstrap Actions

Expanded content and samples for bootstrap actions. For more information, see Create Bootstrap Actions to Install Additional Software (Optional).

November 14, 2010
Cascading clusters

Description of Cascading cluster support. For more information, see Launch a Cascading Cluster and Process Data with a Cascading Cluster.

November 14, 2010
Resize Running Cluster

Support for resizing a running cluster. The new node types, task and core, replace the slave node type. For more information, see What is Amazon EMR? and Resize a Running Cluster.

October 19, 2010
Appendix: Configuration Options

Expanded information on configuration options available in Amazon EMR. For more information, see Hadoop Configuration Reference.

October 19, 2010
Guide revision

This release features a reorganization of the Amazon Elastic MapReduce Developer Guide.

October 19, 2010