Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)

Supported Pig Versions

The Pig version you can run depends on the version of the Amazon Elastic MapReduce (Amazon EMR) AMI and the version of Hadoop you are using. The table below shows which AMI versions, and therefore which Hadoop versions, are compatible with each version of Pig. We recommend using the latest available version of Pig to take advantage of performance enhancements and new functionality. To select the configuration, use the --ami-version and --pig-versions parameters in the cluster creation call. For more information about the Amazon EMR AMIs and AMI versioning, see Choose a Machine Image.

Amazon EMR attempts to use the latest version of Pig if you do not specify a version number.

The Amazon EMR console does not support Pig versioning and always launches the latest version of Pig.

You can manually specify a Pig version (using the --pig-versions parameter) with the Amazon EMR CLI version 2013-07-08 or newer, available from http://aws.amazon.com/code/Elastic-MapReduce/2264.

When you call the API directly, Amazon EMR installs the default version of Pig unless you pass --pig-versions as an argument to the step that installs Pig on the cluster in your RunJobFlow call.

Pig Version | AMI Version | Configuration Parameters | Pig Version Details
0.11.1.1 | 2.2 and later | --pig-versions 0.11.1.1 --ami-version 2.2 | Improves performance of the LOAD command with PigStorage if input resides in Amazon S3.
0.11.1 | 2.2 and later | --pig-versions 0.11.1 --ami-version 2.2 | Adds support for JDK 7, Hadoop 2, Groovy User Defined Functions, SchemaTuple optimization, new operators, and more. For more information, see Pig 0.11.1 Change Log.
0.9.2.2 | 2.2 and later | --pig-versions 0.9.2.2 --ami-version 2.2 | Adds support for Hadoop 1.0.3.
0.9.2.1 | 2.2 and later | --pig-versions 0.9.2.1 --ami-version 2.2 | Adds support for MapR. For more information, see Using the MapR Distribution for Hadoop.
0.9.2 | 2.2 and later | --pig-versions 0.9.2 --ami-version 2.2 | Includes several performance improvements and bug fixes. For complete information about the changes for Pig 0.9.2, go to the Pig 0.9.2 Change Log.
0.9.1 | 2.0 | --pig-versions 0.9.1 --ami-version 2.0 | -
0.6 | 1.0 | --pig-versions 0.6 --ami-version 1.0 | -
0.3 | 1.0 | --pig-versions 0.3 --ami-version 1.0 | -

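The table above can also be encoded as a small lookup, for example to check a Pig version before launching a cluster. The following is a minimal Bash sketch; the function name min_ami_for_pig is hypothetical and not part of the Amazon EMR CLI, and it reflects only the rows listed in this table.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Amazon EMR CLI): prints the AMI version
# that the compatibility table lists for a given Pig version.
# Pig 0.9.2 and later are listed as "2.2 and later", so 2.2 is the minimum.
min_ami_for_pig() {
  case "$1" in
    0.11.1.1|0.11.1|0.9.2.2|0.9.2.1|0.9.2) printf '%s\n' "2.2" ;;
    0.9.1) printf '%s\n' "2.0" ;;
    0.6|0.3) printf '%s\n' "1.0" ;;
    *)
      printf 'unknown Pig version: %s\n' "$1" >&2
      return 1
      ;;
  esac
}

# Example: build the matching pair of configuration parameters.
pig_version="0.11.1"
if ami="$(min_ami_for_pig "$pig_version")"; then
  printf '%s\n' "--pig-versions $pig_version --ami-version $ami"
fi
```

Unknown versions write a message to standard error and return a nonzero status, so a launch script can stop before calling the CLI.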

To specify the Pig version when creating the cluster

  • Use the --pig-versions parameter. The following command-line example creates an interactive Pig cluster running Hadoop 1.0.3 and Pig 0.11.1. Replace instanceType with an EC2 instance type, such as m1.small.

    In the directory where you installed the Amazon EMR CLI, run the following from the command line. For more information, see the Command Line Interface Reference for Amazon EMR.

    • Linux, UNIX, and Mac OS X users:

      ./elastic-mapreduce --create --alive --name "Test Pig" \
      --ami-version 2.3.6 \
      --num-instances 5 --instance-type instanceType \
      --pig-interactive \
      --pig-versions 0.11.1
    • Windows users:

      ruby elastic-mapreduce --create --alive --name "Test Pig" --ami-version 2.3.6 --num-instances 5 --instance-type instanceType --pig-interactive --pig-versions 0.11.1
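If you launch clusters like this often, it can help to assemble the arguments in one place. The sketch below is a hypothetical Bash wrapper (launch_pig_cluster is not part of the CLI) that uses only the options shown in the command above. It assumes it is run from the directory where the Amazon EMR CLI is installed, prints the command by default, and runs it only when DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the Amazon EMR CLI command shown above.
# Usage: launch_pig_cluster PIG_VERSION AMI_VERSION INSTANCE_TYPE [NUM_INSTANCES]
# Prints the command unless DRY_RUN=0 is set in the environment.
launch_pig_cluster() {
  local pig_version="$1" ami_version="$2" instance_type="$3"
  local num_instances="${4:-5}"
  # An array keeps the quoted cluster name as a single argument.
  local cmd=(./elastic-mapreduce --create --alive --name "Test Pig"
    --ami-version "$ami_version"
    --num-instances "$num_instances" --instance-type "$instance_type"
    --pig-interactive
    --pig-versions "$pig_version")
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "${cmd[@]}"
  else
    # %q escapes the space in the cluster name so the line can be pasted back.
    printf '%q ' "${cmd[@]}"
    printf '\n'
  fi
}

launch_pig_cluster 0.11.1 2.3.6 m1.small
```

Passing latest as the first argument requests the latest Pig version instead of a specific one.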

To specify the latest Pig version when creating the cluster

  • Use the --pig-versions parameter with the latest keyword. The following command-line example creates an interactive Pig cluster running the latest version of Pig. Replace instanceType with an EC2 instance type, such as m1.small.

    In the directory where you installed the Amazon EMR CLI, run the following from the command line. For more information, see the Command Line Interface Reference for Amazon EMR.

    • Linux, UNIX, and Mac OS X users:

      ./elastic-mapreduce --create --alive --name "Test Latest Pig" \
      --ami-version 2.2 \
      --num-instances 5 --instance-type instanceType \
      --pig-interactive \
      --pig-versions latest
    • Windows users:

      ruby elastic-mapreduce --create --alive --name "Test Latest Pig" --ami-version 2.2 --num-instances 5 --instance-type instanceType --pig-interactive --pig-versions latest

To load multiple versions of Pig for a given cluster

  • Use the --pig-versions parameter and separate the version numbers with commas. The following command-line example creates an interactive Pig cluster running Hadoop 0.20.205 with both Pig 0.9.1 and Pig 0.9.2 installed. With this configuration, you can use either version of Pig on the cluster. Replace instanceType with an EC2 instance type, such as m1.small.

    In the directory where you installed the Amazon EMR CLI, run the following from the command line. For more information, see the Command Line Interface Reference for Amazon EMR.

    • Linux, UNIX, and Mac OS X users:

      ./elastic-mapreduce --create --alive --name "Test Pig" \
      --ami-version 2.0 \
      --num-instances 5 --instance-type instanceType \
      --pig-interactive \
      --pig-versions 0.9.1,0.9.2
    • Windows users:

      ruby elastic-mapreduce --create --alive --name "Test Pig" --ami-version 2.0 --num-instances 5 --instance-type instanceType --pig-interactive --pig-versions 0.9.1,0.9.2
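When several versions are loaded, the value passed to --pig-versions is a single comma-separated list, written without spaces in the example above. A minimal Bash helper (join_pig_versions is a hypothetical name) that builds that list from separate arguments:

```shell
#!/usr/bin/env bash
# Hypothetical helper: joins Pig version numbers with commas, the format
# used with --pig-versions to load several versions on one cluster.
join_pig_versions() {
  # "$*" joins the arguments using the first character of IFS.
  local IFS=,
  printf '%s\n' "$*"
}

versions="$(join_pig_versions 0.9.1 0.9.2)"
printf '%s\n' "--pig-versions $versions"   # prints: --pig-versions 0.9.1,0.9.2
```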

If you have multiple versions of Pig loaded on a cluster, running the pig command invokes the default version of Pig or, if the cluster creation call specified multiple --pig-versions parameters, the version loaded last. When you load multiple versions with the comma-separated syntax of a single --pig-versions parameter, the pig command invokes the default version.

To call a specific version of Pig

  • Append the version number to the pig command, for example pig-0.11.1 or pig-0.9.2. On an interactive Pig cluster, for instance, you would connect to the master node using SSH and then run a command like the following from the terminal:

    pig-0.9.2
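On the master node, a small wrapper can make scripts fail clearly when a requested version was not loaded on the cluster. This is a sketch that assumes, based on the naming shown above, that each loaded version is available as a pig-VERSION command on the PATH; run_pig_version and myscript.pig are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: runs pig-<VERSION> with the remaining arguments,
# or reports an error if that version is not on the PATH.
run_pig_version() {
  local version="$1"
  shift
  local bin="pig-${version}"
  if ! command -v "$bin" >/dev/null 2>&1; then
    printf 'Pig %s is not installed on this node (no %s on PATH)\n' \
      "$version" "$bin" >&2
    return 127
  fi
  "$bin" "$@"
}

# Example (on the master node): run a script with Pig 0.9.2.
# run_pig_version 0.9.2 -f myscript.pig
```

Returning 127 mirrors the status the shell itself uses for a command that cannot be found.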

Pig Version Details

Amazon EMR supports certain Pig releases that might have additional Amazon EMR patches applied. You can configure which version of Pig to run on Amazon EMR clusters. For more information about how to do this, see Process Data with Pig. The following sections describe the Pig versions available on Amazon EMR and the patches applied to them.

Pig Patches

This section describes the custom patches applied to Pig versions available with Amazon EMR.

Pig 0.11.1.1 Patches

The Amazon EMR version of Pig 0.11.1.1 is a maintenance release that improves the performance of the LOAD command with PigStorage when the input resides in Amazon S3.

Pig 0.11.1 Patches

The Amazon EMR version of Pig 0.11.1 contains all the updates provided by the Apache Software Foundation and the cumulative Amazon EMR patches from Pig version 0.9.2.2. However, there are no new Amazon EMR-specific patches in Pig 0.11.1.

Pig 0.9.2 Patches

Apache Pig 0.9.2 is a maintenance release of Pig. The Amazon EMR team has applied the following patches to the Amazon EMR version of Pig 0.9.2.

Patch | Description | Status | Fixed in Apache Pig Version
PIG-1429 | Add the Boolean data type to Pig as a first class data type. For more information, go to https://issues.apache.org/jira/browse/PIG-1429. | Committed | 0.10
PIG-1824 | Support import modules in Jython UDF. For more information, go to https://issues.apache.org/jira/browse/PIG-1824. | Committed | 0.10
PIG-2010 | Bundle registered JARs on the distributed cache. For more information, go to https://issues.apache.org/jira/browse/PIG-2010. | Committed | 0.11
PIG-2456 | Add a ~/.pigbootup file where the user can specify default Pig statements. For more information, go to https://issues.apache.org/jira/browse/PIG-2456. | Committed | 0.11
PIG-2623 | Support using Amazon S3 paths to register UDFs. For more information, go to https://issues.apache.org/jira/browse/PIG-2623. | Committed | 0.10, 0.11
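PIG-2456 lets you keep default Pig statements in ~/.pigbootup. The sketch below writes such a file from the shell; write_pigbootup is a hypothetical name, and the statement shown (set default_parallel 10;, which sets a default reducer count) is only an illustrative choice. Confirm that your Pig version includes this patch before relying on the file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes default Pig statements to a bootup file.
# With no argument it targets ~/.pigbootup, the file described in PIG-2456,
# and overwrites any existing content.
write_pigbootup() {
  local target="${1:-$HOME/.pigbootup}"
  # Quoted 'EOF' keeps the shell from expanding anything in the statements.
  cat > "$target" <<'EOF'
set default_parallel 10;
EOF
}

# Example: write_pigbootup              # writes ~/.pigbootup
# Example: write_pigbootup /tmp/bootup  # writes a different file for testing
```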

Pig 0.9.1 Patches

The Amazon EMR team has applied the following patches to the Amazon EMR version of Pig 0.9.1.

Patch | Description | Status | Fixed in Apache Pig Version
Support JAR files and Pig scripts in dfs | Add support for running scripts and registering JAR files stored in HDFS, Amazon S3, or other distributed file systems. For more information, go to https://issues.apache.org/jira/browse/PIG-1505. | Committed | 0.8.0
Support multiple file systems in Pig | Add support for Pig scripts to read data from one file system and write it to another. For more information, go to https://issues.apache.org/jira/browse/PIG-1564. | Not Committed | n/a
Add Piggybank datetime and string UDFs | Add datetime and string UDFs to support custom Pig scripts. For more information, go to https://issues.apache.org/jira/browse/PIG-1565. | Not Committed | n/a
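With the multiple-file-system patch (PIG-1564) in the Amazon EMR version of Pig 0.9.1, a single script can read from one file system and write to another. The following sketch writes such a script from the shell and leaves the paths to Pig parameter substitution; the function name, bucket, and paths are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes a Pig script that copies data between file systems.
# The quoted 'EOF' stops the shell from expanding $INPUT and $OUTPUT,
# so Pig substitutes them at run time from -param arguments.
write_copy_script() {
  local script="${1:-copy.pig}"
  cat > "$script" <<'EOF'
records = LOAD '$INPUT' USING PigStorage(',');
STORE records INTO '$OUTPUT' USING PigStorage(',');
EOF
}

# Example (on the master node): read from Amazon S3 and write to HDFS.
# write_copy_script copy.pig
# pig -param INPUT=s3://mybucket/input/ -param OUTPUT=hdfs:///user/hadoop/output/ -f copy.pig
```

Keeping the paths as parameters lets the same script be pointed at different source and destination file systems without editing it.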

Additional Pig Functions

The Amazon EMR development team has created additional Pig functions that simplify string manipulation and make it easier to format date-time information. These are available at http://aws.amazon.com/code/2730.