Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)

Supported Pig Versions

The Pig version you can add to your cluster depends on the version of the Amazon Elastic MapReduce (Amazon EMR) AMI and the version of Hadoop you are using. The table below shows which AMI versions and versions of Hadoop are compatible with the different versions of Pig. We recommend using the latest available version of Pig to take advantage of performance enhancements and new functionality. For more information about the Amazon EMR AMIs and AMI versioning, see Choose an Amazon Machine Image (AMI).

If you choose to install Pig on your cluster using the console or the AWS CLI, the AMI you specify determines the version of Pig installed. By default, Pig is installed on your cluster when you use the console, but you can remove it during cluster creation. Pig is also installed by default when you use the AWS CLI unless you use the --applications parameter to identify which applications you want on your cluster. The AWS CLI does not support Pig versioning.

When you use the API to install Pig, the default version is used unless you specify --pig-versions as an argument to the step that loads Pig onto the cluster during the call to RunJobFlow.
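As a rough illustration of the paragraph above, the following Python sketch builds the step definition that a RunJobFlow request might carry to install Pig. Only the --pig-versions argument comes from this guide. The helper name build_pig_install_step, the S3 bucket, the script-runner and pig-script paths, and the --base-path and --install-pig arguments are assumptions based on commonly published Amazon EMR examples, so verify them for your region before use. The sketch only constructs a dictionary and does not call any AWS service.

```python
# Hypothetical helper: builds a step definition for a RunJobFlow request.
# The bucket and script paths below are assumptions, not taken from this guide.
ASSUMED_BUCKET = "s3://elasticmapreduce"


def build_pig_install_step(pig_version=None, bucket=ASSUMED_BUCKET):
    """Return a step dictionary that installs Pig on the cluster.

    If pig_version is None, no --pig-versions argument is added and the
    default Pig version is used, as described in the text above.
    """
    args = [
        f"{bucket}/libs/pig/pig-script",  # assumed install script location
        "--base-path",
        f"{bucket}/libs/pig/",            # assumed base path
        "--install-pig",                  # assumed install flag
    ]
    if pig_version is not None:
        # --pig-versions is the argument named in this guide.
        args += ["--pig-versions", pig_version]
    return {
        "Name": "Install Pig",
        "ActionOnFailure": "TERMINATE_JOB_FLOW",
        "HadoopJarStep": {
            "Jar": f"{bucket}/libs/script-runner/script-runner.jar",  # assumed
            "Args": args,
        },
    }


step = build_pig_install_step("0.11.1.1")
print(step["HadoopJarStep"]["Args"])
```

The returned dictionary would be placed in the list of steps passed to RunJobFlow. Omitting the version argument leaves the choice of Pig version to the default.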

Pig Version: 0.12.0 (Release Notes, Documentation)
AMI Version: 3.1.0 and later
Configuration Parameters: --ami-version 3.1, --ami-version 3.2, --ami-version 3.3
Pig Version Details: Adds support for the following:

  • Streaming UDFs without JVM implementations

  • ASSERT and IN operators

  • CASE expression

  • AvroStorage as a Pig built-in function

  • ParquetLoader and ParquetStorer as built-in functions

  • BigInteger and BigDecimal types

Pig Version: 0.11.1.1 (Release Notes, Documentation)
AMI Version: 2.2 and later
Configuration Parameters: --pig-versions 0.11.1.1, --ami-version 2.2
Pig Version Details: Improves performance of LOAD command with PigStorage if input resides in Amazon S3.

Pig Version: 0.11.1 (Release Notes, Documentation)
AMI Version: 2.2 and later
Configuration Parameters: --pig-versions 0.11.1, --ami-version 2.2
Pig Version Details: Adds support for JDK 7, Hadoop 2, Groovy User Defined Functions, SchemaTuple optimization, new operators, and more. For more information, see Pig 0.11.1 Change Log.

Pig Version: 0.9.2.2 (Release Notes, Documentation)
AMI Version: 2.2 and later
Configuration Parameters: --pig-versions 0.9.2.2, --ami-version 2.2
Pig Version Details: Adds support for Hadoop 1.0.3.

Pig Version: 0.9.2.1 (Release Notes, Documentation)
AMI Version: 2.2 and later
Configuration Parameters: --pig-versions 0.9.2.1, --ami-version 2.2
Pig Version Details: Adds support for MapR. For more information, see Using the MapR Distribution for Hadoop.

Pig Version: 0.9.2 (Release Notes, Documentation)
AMI Version: 2.2 and later
Configuration Parameters: --pig-versions 0.9.2, --ami-version 2.2
Pig Version Details: Includes several performance improvements and bug fixes. For complete information about the changes for Pig 0.9.2, go to the Pig 0.9.2 Change Log.

Pig Version: 0.9.1 (Release Notes, Documentation)
AMI Version: 2.0
Configuration Parameters: --pig-versions 0.9.1, --ami-version 2.0

Pig Version: 0.6 (Release Notes)
AMI Version: 1.0
Configuration Parameters: --pig-versions 0.6, --ami-version 1.0

Pig Version: 0.3 (Release Notes)
AMI Version: 1.0
Configuration Parameters: --pig-versions 0.3, --ami-version 1.0
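
The compatibility table above can be encoded as a small lookup, which may help when a script needs to choose configuration parameters for a given Pig version. The following Python sketch copies its data from the table. The names PIG_VERSIONS and config_parameters are hypothetical and are not part of any Amazon EMR tool. For Pig 0.12.0 the table lists three --ami-version values, and the sketch returns all three exactly as listed.

```python
# Data copied from the compatibility table above.
# PIG_VERSIONS and config_parameters are hypothetical names used for illustration.
PIG_VERSIONS = {
    "0.12.0": {
        "ami": "3.1.0 and later",
        "params": ["--ami-version 3.1", "--ami-version 3.2", "--ami-version 3.3"],
    },
    "0.11.1.1": {"ami": "2.2 and later",
                 "params": ["--pig-versions 0.11.1.1", "--ami-version 2.2"]},
    "0.11.1": {"ami": "2.2 and later",
               "params": ["--pig-versions 0.11.1", "--ami-version 2.2"]},
    "0.9.2.2": {"ami": "2.2 and later",
                "params": ["--pig-versions 0.9.2.2", "--ami-version 2.2"]},
    "0.9.2.1": {"ami": "2.2 and later",
                "params": ["--pig-versions 0.9.2.1", "--ami-version 2.2"]},
    "0.9.2": {"ami": "2.2 and later",
              "params": ["--pig-versions 0.9.2", "--ami-version 2.2"]},
    "0.9.1": {"ami": "2.0",
              "params": ["--pig-versions 0.9.1", "--ami-version 2.0"]},
    "0.6": {"ami": "1.0", "params": ["--pig-versions 0.6", "--ami-version 1.0"]},
    "0.3": {"ami": "1.0", "params": ["--pig-versions 0.3", "--ami-version 1.0"]},
}


def config_parameters(pig_version):
    """Return the configuration parameters the table lists for a Pig version."""
    try:
        return list(PIG_VERSIONS[pig_version]["params"])
    except KeyError:
        raise ValueError(f"Pig version {pig_version!r} is not in the table") from None


print(config_parameters("0.9.2.2"))
```

Keeping the table in one dictionary means a new Pig release needs only one new entry, and an unsupported version fails early with a clear error.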

 

Pig Version Details

Amazon EMR supports certain Pig releases, some of which have additional Amazon EMR patches applied. You can configure which version of Pig runs on your Amazon EMR clusters. For more information, see Pig and Amazon EMR. The following sections describe the Pig versions available on Amazon EMR and the patches applied to each.

Pig Patches

This section describes the custom patches applied to Pig versions available with Amazon EMR.

Pig 0.11.1.1 Patches

The Amazon EMR version of Pig 0.11.1.1 is a maintenance release that improves the performance of the LOAD command with PigStorage when the input resides in Amazon S3.

Pig 0.11.1 Patches

The Amazon EMR version of Pig 0.11.1 contains all the updates provided by the Apache Software Foundation and the cumulative Amazon EMR patches from Pig version 0.9.2.2. However, there are no new Amazon EMR-specific patches in Pig 0.11.1.

Pig 0.9.2 Patches

Apache Pig 0.9.2 is a maintenance release of Pig. The Amazon EMR team has applied the following patches to the Amazon EMR version of Pig 0.9.2.

Patch: PIG-1429
Description: Add the Boolean data type to Pig as a first class data type. For more information, go to https://issues.apache.org/jira/browse/PIG-1429.
Status: Committed
Fixed in Apache Pig Version: 0.10

Patch: PIG-1824
Description: Support import modules in Jython UDF. For more information, go to https://issues.apache.org/jira/browse/PIG-1824.
Status: Committed
Fixed in Apache Pig Version: 0.10

Patch: PIG-2010
Description: Bundle registered JARs on the distributed cache. For more information, go to https://issues.apache.org/jira/browse/PIG-2010.
Status: Committed
Fixed in Apache Pig Version: 0.11

Patch: PIG-2456
Description: Add a ~/.pigbootup file where the user can specify default Pig statements. For more information, go to https://issues.apache.org/jira/browse/PIG-2456.
Status: Committed
Fixed in Apache Pig Version: 0.11

Patch: PIG-2623
Description: Support using Amazon S3 paths to register UDFs. For more information, go to https://issues.apache.org/jira/browse/PIG-2623.
Status: Committed
Fixed in Apache Pig Version: 0.10, 0.11

Pig 0.9.1 Patches

The Amazon EMR team has applied the following patches to the Amazon EMR version of Pig 0.9.1.

Patch: Support JAR files and Pig scripts in dfs
Description: Add support for running scripts and registering JAR files stored in HDFS, Amazon S3, or other distributed file systems. For more information, go to https://issues.apache.org/jira/browse/PIG-1505.
Status: Committed
Fixed in Apache Pig Version: 0.8.0

Patch: Support multiple file systems in Pig
Description: Add support for Pig scripts to read data from one file system and write it to another. For more information, go to https://issues.apache.org/jira/browse/PIG-1564.
Status: Not Committed
Fixed in Apache Pig Version: n/a

Patch: Add Piggybank datetime and string UDFs
Description: Add datetime and string UDFs to support custom Pig scripts. For more information, go to https://issues.apache.org/jira/browse/PIG-1565.
Status: Not Committed
Fixed in Apache Pig Version: n/a

Additional Pig Functions

The Amazon EMR development team has created additional Pig functions that simplify string manipulation and make it easier to format date-time information. These are available at http://aws.amazon.com/code/2730.