Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)

Document History

The following table describes the important changes to the documentation since the last release of Amazon Elastic MapReduce (Amazon EMR).

API version: 2009-03-31.

Latest documentation update: May 9, 2013.

Change

Description

Release Date
Hive 0.8.1.7

Amazon Elastic MapReduce supports Hive 0.8.1.7. For more information, go to Supported Hive Versions.

May 2, 2013
Improved documentation organization, new table of contents, and new topics

Reorganized the documentation with a restructured table of contents and many new topics to improve ease of use and address customer feedback.

April 29, 2013
AMI 2.3.5

Amazon Elastic MapReduce supports AMI 2.3.5. For more information, go to AMI Versions Supported in Amazon EMR.

April 26, 2013
M1 Medium Amazon EC2 Instances

Amazon Elastic MapReduce supports m1.medium instances. For more information, go to Hadoop Default Configuration (AMI 2.3).

April 18, 2013
MapR 2.1.2

Amazon Elastic MapReduce supports MapR 2.1.2. For more information, go to Using the MapR Distribution for Hadoop.

April 18, 2013
AMI 2.3.4

Deprecated

April 16, 2013
AWS GovCloud (US) Region

Adds support for the AWS GovCloud (US) Region. For more information, see AWS GovCloud (US).

April 9, 2013
Supported Product User Arguments

Improved support for launching job flows on third-party applications with a new --supported-product CLI parameter that accepts custom user arguments. For more information, see Launch an Amazon EMR Cluster with MapR.

March 19, 2013
Amazon Virtual Private Cloud

Amazon Elastic MapReduce supports two platforms on which you can launch the EC2 instances of your job flow: EC2-Classic and EC2-VPC. For more information, go to Amazon VPC.

March 11, 2013
AMI 2.3.3

Amazon Elastic MapReduce supports AMI 2.3.3. For more information, go to AMI Versions Supported in Amazon EMR.

March 1, 2013
High I/O Instances

Amazon Elastic MapReduce supports hi1.4xlarge instances. For more information, go to Hadoop Default Configuration (AMI 2.3).

February 14, 2013
AMI 2.3.2

Amazon Elastic MapReduce supports AMI 2.3.2. For more information, go to AMI Versions Supported in Amazon EMR.

February 7, 2013
New introduction and tutorial

Added sections that describe Amazon EMR and a tutorial that walks you through your first streaming cluster. For more information, see What is Amazon EMR? and Get Started: Count Words with Amazon EMR.

January 9, 2013
CLI Reference

Added a CLI reference. For more information, see Command Line Interface Reference for Amazon EMR.

January 8, 2013
AMI 2.3.1

Amazon Elastic MapReduce supports AMI 2.3.1. For more information, go to AMI Versions Supported in Amazon EMR.

December 24, 2012
High Storage Instances

Amazon Elastic MapReduce supports hs1.8xlarge instances. For more information, go to Hadoop Default Configuration (AMI 2.3).

December 20, 2012
IAM Roles

Amazon Elastic MapReduce supports IAM roles. For more information, go to Configure IAM Roles for Amazon EMR.

December 20, 2012
Hive 0.8.1.6

Amazon Elastic MapReduce supports Hive 0.8.1.6. For more information, go to Supported Hive Versions.

December 20, 2012
AMI 2.3.0

Amazon Elastic MapReduce supports AMI 2.3.0. For more information, go to AMI Versions Supported in Amazon EMR.

December 20, 2012
AMI 2.2.4

Amazon Elastic MapReduce supports AMI 2.2.4. For more information, go to AMI Versions Supported in Amazon EMR.

December 6, 2012
AMI 2.2.3

Amazon Elastic MapReduce supports AMI 2.2.3. For more information, go to AMI Versions Supported in Amazon EMR.

November 30, 2012
Hive 0.8.1.5

Amazon Elastic MapReduce supports Hive 0.8.1.5. For more information, go to Analyze Data with Hive.

November 30, 2012
Asia Pacific (Sydney) Region

Adds support for Amazon EMR in the Asia Pacific (Sydney) Region.

November 12, 2012
Visible To All IAM Users

Added support for making a cluster visible to all IAM users on an AWS account. For more information, see Configure IAM User Permissions.

October 1, 2012
Hive 0.8.1.4

Updates the HBase client on Hive clusters to version 0.92.0 to match the version of HBase used on HBase clusters. This fixes issues that occurred when connecting to an HBase cluster from a Hive cluster.

September 17, 2012
AMI 2.2.1
  • Fixes an issue with HBase backup functionality.

  • Enables multipart upload by default for files larger than the Amazon S3 block size specified by fs.s3n.blockSize. For more information, see Configure Multipart Upload for Amazon S3.

August 30, 2012
AMI 2.1.4
August 30, 2012
Hadoop 1.0.3, AMI 2.2.0, Hive 0.8.1.3, Pig 0.9.2.2

Support for Hadoop 1.0.3. For more information see Supported Hadoop Versions.

August 6, 2012
AMI 2.1.3

Fixes issues with HBase.

August 6, 2012
AMI 2.1.2

Support for Amazon CloudWatch metrics when using MapR.

August 6, 2012
AMI 2.1.1

Improves the reliability of log pushing, adds support for HBase in Amazon VPC, and improves DNS retry functionality.

July 9, 2012
Major-Minor AMI Versioning

Improves AMI versioning by adding support for major-minor releases. Now you can specify the major-minor version for the AMI and always have the latest patches applied. For more information, see Choose a Machine Image.

July 9, 2012
Hive 0.8.1.2

Fixes an issue with duplicate data in large clusters.

July 9, 2012
S3DistCp 1.0.5

Provides better support for specifying the version of S3DistCp to use.

June 27, 2012
Store Data with HBase

Amazon EMR supports HBase, an open source, non-relational, distributed database modeled after Google's BigTable. For more information, see Store Data with HBase.

June 12, 2012
Launch a Cluster on the MapR Distribution for Hadoop

Amazon EMR supports MapR, an open, enterprise-grade distribution that makes Hadoop easier and more dependable. For more information, see Using the MapR Distribution for Hadoop.

June 12, 2012
Connect to the Master Node in an Amazon EMR Cluster

Added information about how to connect to the master node using both SSH and a SOCKS proxy. For more information, see Connect to the Cluster.

June 12, 2012
Hive 0.8.1

Amazon Elastic MapReduce supports Hive 0.8.1. For more information, go to Analyze Data with Hive.

May 30, 2012
HParser

Added information about running Informatica HParser on Amazon EMR. For more information, see Parse Data with HParser.

April 30, 2012
AMI 2.0.5

Enhancements to performance and other updates. For details, see AMI Versions Supported in Amazon EMR.

April 19, 2012
Pig 0.9.2

Amazon Elastic MapReduce supports Pig 0.9.2. Pig 0.9.2 adds support for user-defined functions written in Python and other improvements. For more information, go to Pig Version Details.

April 9, 2012
Pig versioning

Amazon Elastic MapReduce supports the ability to specify the Pig version when launching a cluster. For more information, go to Process Data with Pig.

April 9, 2012
Hive 0.7.1.4

Amazon Elastic MapReduce supports Hive 0.7.1.4. For more information, go to Analyze Data with Hive.

April 9, 2012
AMI 1.0.1

Updates sources.list to the new location of the Lenny distribution in archive.debian.org.

April 3, 2012
Hive 0.7.1.3

Support for Hive 0.7.1.3, which adds the dynamodb.retry.duration variable for configuring the timeout duration for retrying Hive queries. This version of Hive also supports setting the Amazon DynamoDB endpoint from within the Hive command-line application.

March 13, 2012
Support for IAM in the console

Support for AWS Identity and Access Management (IAM) in the Amazon EMR console. Improvements for S3DistCp and support for Hive 0.7.1.2 are also included.

February 28, 2012
Support for CloudWatch Metrics

Support for monitoring cluster metrics and setting alarms on metrics.

January 31, 2012
Support for S3DistCp

Support for distributed copy using S3DistCp.

January 19, 2012
Support for Amazon DynamoDB

Support for exporting and querying data stored in Amazon DynamoDB.

January 18, 2012
AMI 2.0.2 and Hive 0.7.1.1

Support for Amazon EMR AMI 2.0.2 and Hive 0.7.1.1.

January 17, 2012
Cluster Compute Eight Extra Large (cc2.8xlarge)

Support for Cluster Compute Eight Extra Large (cc2.8xlarge) instances in clusters.

December 21, 2011
Hadoop 0.20.205

Support for Hadoop 0.20.205. For more information see Supported Hadoop Versions.

December 11, 2011
Pig 0.9.1

Support for Pig 0.9.1. For more information see Supported Pig Versions.

December 11, 2011
AMI versioning

You can now specify which version of the Amazon EMR AMI to use to launch your cluster. All EC2 instances in the cluster will be initialized with the AMI version that you specify. For more information see Choose a Machine Image.

December 11, 2011
Amazon EMR clusters on Amazon Virtual Private Cloud (Amazon VPC)

You can now launch Amazon EMR clusters inside your Amazon Virtual Private Cloud (Amazon VPC) for greater control over network configuration and access. For more information see Select an Amazon VPC Subnet for the Cluster (Optional).

December 11, 2011
Spot Instances

Support for launching cluster instance groups as Spot Instances added. For more information see Lower Costs with Spot Instances (Optional).

August 19, 2011
Hive 0.7.1

Support for Hive 0.7.1 added. For more information see Supported Hive Versions.

July 25, 2011
Termination Protection

Support for a new Termination Protection feature. For more information see Protect a Cluster from Termination.

April 14, 2011
Tagging

Support for Amazon EC2 tagging. For more information see View Cluster Instances in Amazon EC2.

March 9, 2011
IAM Integration

Support for AWS Identity and Access Management. For more information see Configure IAM User Permissions.

February 21, 2011
Elastic IP Support

Support for Elastic IP addresses. For more information see Associate an Elastic IP Address with a Cluster.

February 21, 2011
Environment Configuration

Expanded sections on Environment Configuration and Performance Tuning. For more information see Create Bootstrap Actions to Install Additional Software (Optional).

February 21, 2011
Distributed Cache

For more information about using DistributedCache to upload files and libraries, see Import files using Distributed Cache.

February 21, 2011
How to build modules using Amazon Elastic MapReduce (Amazon EMR)

For more information see Build Binaries Using Amazon EMR.

February 21, 2011
Comparison of cluster types

For more information see Choose the Type of Cluster to Run.

February 21, 2011
Amazon S3 multipart upload

Support of Amazon S3 multipart upload through the AWS Java SDK. For more information see Configure Multipart Upload for Amazon S3.

January 6, 2010
Hive 0.7

Support for Hive 0.7 and for running Hive 0.5 and Hive 0.7 concurrently on the same cluster. Note: You need to update the Elastic MapReduce Command Line Interface to resize running job flows and modify instance groups. For more information see Analyze Data with Hive.

December 8, 2010
JDBC Drivers for Hive

Support for JDBC with Hive 0.5 and Hive 0.7. For more information see Use the Hive JDBC Driver.

December 8, 2010
Support HPC

Support for Cluster Compute instances. For more information see Virtual Server Configurations.

November 14, 2010
Bootstrap Actions

Expanded content and samples for bootstrap actions. For more information see Create Bootstrap Actions to Install Additional Software (Optional).

November 14, 2010
Cascading clusters

Description of Cascading cluster support. For more information see Launch a Cascading Cluster and Process Data with a Cascading Cluster.

November 14, 2010
Resize Running Cluster

Support for resizing a running cluster. The new node types, task and core, replace the slave node type. For more information see What is Amazon EMR? and Resize a Running Cluster.

October 19, 2010
Appendix: Configuration Options

Expanded information on configuration options available in Amazon EMR. For more information, refer to Hadoop Configuration Reference.

October 19, 2010
Guide revision

This release features a reorganization of the Amazon EMR Developer Guide.

October 19, 2010