Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)

Supported Pig Versions

The Pig version you can add to your cluster depends on the version of the Amazon Elastic MapReduce (Amazon EMR) AMI and the version of Hadoop you are using. The table below shows which AMI versions and versions of Hadoop are compatible with the different versions of Pig. We recommend using the latest available version of Pig to take advantage of performance enhancements and new functionality. For more information about the Amazon EMR AMIs and AMI versioning, see Choose an Amazon Machine Image (AMI).

If you choose to install Pig on your cluster using the console or the AWS CLI, the AMI you specify determines the version of Pig installed. By default, Pig is installed on your cluster when you use the console, but you can remove it during cluster creation. Pig is also installed by default when you use the AWS CLI unless you use the --applications parameter to identify which applications you want on your cluster.

If you install Pig on your cluster using the Amazon EMR CLI, use the --pig-versions parameter to install a particular version of Pig, or pass the latest keyword to --pig-versions to install the latest version of Pig. The AWS CLI does not support the --pig-versions parameter.

When you use the API to install Pig, the default version is used unless you specify --pig-versions as an argument to the step that loads Pig onto the cluster during the call to RunJobFlow.
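As a rough illustration of the paragraph above, the following sketch prints a step definition of the kind that could be passed in the Steps list of a RunJobFlow request. The helper name pig_install_step_json is hypothetical, and the S3 paths for the script runner and the Pig installer script, as well as the --base-path and --install-pig arguments, are assumptions that this page does not state; verify them against your region and the API reference before use.

```shell
# Hypothetical helper: prints a JSON step that installs Pig.
# The S3 locations and the --base-path/--install-pig arguments are
# assumptions, not taken from this page.
pig_install_step_json() {
  version=${1:-latest}   # value passed to --pig-versions
  cat <<EOF
{
  "Name": "Install Pig",
  "ActionOnFailure": "TERMINATE_JOB_FLOW",
  "HadoopJarStep": {
    "Jar": "s3://elasticmapreduce/libs/script-runner/script-runner.jar",
    "Args": [
      "s3://elasticmapreduce/libs/pig/pig-script",
      "--base-path", "s3://elasticmapreduce/libs/pig/",
      "--install-pig",
      "--pig-versions", "$version"
    ]
  }
}
EOF
}

# Print a step that requests Pig 0.11.1.
pig_install_step_json 0.11.1
```

Calling the helper without an argument requests the latest keyword instead of a fixed version.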

| Pig Version | AMI Version | Configuration Parameters | Pig Version Details |
|---|---|---|---|
| 0.12 (Release Notes, Documentation) | 3.1.0 and later | --ami-version 3.1.0 | Adds support for the following: streaming UDFs without JVM implementations; ASSERT and IN operators; CASE expression; AvroStorage as a Pig built-in function; ParquetLoader and ParquetStorer as built-in functions; BigInteger and BigDecimal types. |
| 0.11.1.1 (Release Notes, Documentation) | 2.2 and later | --pig-versions 0.11.1.1 --ami-version 2.2 | Improves the performance of the LOAD command with PigStorage if the input resides in Amazon S3. |
| 0.11.1 (Release Notes, Documentation) | 2.2 and later | --pig-versions 0.11.1 --ami-version 2.2 | Adds support for JDK 7, Hadoop 2, Groovy User Defined Functions, SchemaTuple optimization, new operators, and more. For more information, see Pig 0.11.1 Change Log. |
| 0.9.2.2 (Release Notes, Documentation) | 2.2 and later | --pig-versions 0.9.2.2 --ami-version 2.2 | Adds support for Hadoop 1.0.3. |
| 0.9.2.1 (Release Notes, Documentation) | 2.2 and later | --pig-versions 0.9.2.1 --ami-version 2.2 | Adds support for MapR. For more information, see Using the MapR Distribution for Hadoop. |
| 0.9.2 (Release Notes, Documentation) | 2.2 and later | --pig-versions 0.9.2 --ami-version 2.2 | Includes several performance improvements and bug fixes. For complete information about the changes for Pig 0.9.2, go to the Pig 0.9.2 Change Log. |
| 0.9.1 (Release Notes, Documentation) | 2.0 | --pig-versions 0.9.1 --ami-version 2.0 | |
| 0.6 (Release Notes) | 1.0 | --pig-versions 0.6 --ami-version 1.0 | |
| 0.3 (Release Notes) | 1.0 | --pig-versions 0.3 --ami-version 1.0 | |
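The version compatibility listed above can be expressed as a small lookup, which may be handy in a launch script that needs to pick an --ami-version for a given Pig version. This is only a sketch: ami_for_pig is a hypothetical helper name, and it returns the AMI version exactly as the table lists it (for rows marked "and later", the lowest listed version).

```shell
# Hypothetical helper: prints the AMI version the table lists
# for a given Pig version; fails for versions not in the table.
ami_for_pig() {
  case "$1" in
    0.12)                                  echo "3.1.0" ;;
    0.11.1.1|0.11.1|0.9.2.2|0.9.2.1|0.9.2) echo "2.2" ;;
    0.9.1)                                 echo "2.0" ;;
    0.6|0.3)                               echo "1.0" ;;
    *) echo "unknown Pig version: $1" >&2; return 1 ;;
  esac
}

# Look up the AMI version listed for Pig 0.11.1.
ami_for_pig 0.11.1
```

A launch script could use the result as the argument to --ami-version, falling back to an error message when the Pig version is not in the table.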

 

The following examples demonstrate adding specific versions of Pig to a cluster using the Amazon EMR CLI. The AWS CLI does not support Pig versioning.

To add a specific Pig version to a cluster using the Amazon EMR CLI

Note

The Amazon EMR CLI is no longer under feature development. Customers are encouraged to use the Amazon EMR commands in the AWS CLI instead.

  • Use the --pig-versions parameter. The following command-line example creates an interactive Pig cluster running Hadoop 1.0.3 and Pig 0.11.1.

    In the directory where you installed the Amazon EMR CLI, type the following command. For more information, see the Command Line Interface Reference for Amazon EMR.

    • Linux, UNIX, and Mac OS X users:

      ./elastic-mapreduce --create --alive --name "Test Pig" \
      --ami-version 2.3.6 \
      --num-instances 5 --instance-type m1.large \
      --pig-interactive \
      --pig-versions 0.11.1
    • Windows users:

      ruby elastic-mapreduce --create --alive --name "Test Pig" --ami-version 2.3.6 --num-instances 5 --instance-type m1.large --pig-interactive --pig-versions 0.11.1

To add the latest version of Pig to a cluster using the Amazon EMR CLI

  • Use the --pig-versions parameter with the latest keyword. The following command-line example creates an interactive Pig cluster running the latest version of Pig.

    In the directory where you installed the Amazon EMR CLI, type the following command. For more information, see the Command Line Interface Reference for Amazon EMR.

    • Linux, UNIX, and Mac OS X users:

      ./elastic-mapreduce --create --alive --name "Test Latest Pig" \
      --ami-version 2.2 \
      --num-instances 5 --instance-type m1.large \
      --pig-interactive \
      --pig-versions latest
    • Windows users:

      ruby elastic-mapreduce --create --alive --name "Test Latest Pig" --ami-version 2.2 --num-instances 5 --instance-type m1.large --pig-interactive --pig-versions latest

To add multiple versions of Pig on a cluster using the Amazon EMR CLI

  • Use the --pig-versions parameter and separate the version numbers with commas. The following command-line example creates an interactive Pig cluster running Hadoop 0.20.205 with both Pig 0.9.1 and Pig 0.9.2. With this configuration, you can use either version of Pig on the cluster.

    In the directory where you installed the Amazon EMR CLI, type the following command. For more information, see the Command Line Interface Reference for Amazon EMR.

    • Linux, UNIX, and Mac OS X users:

      ./elastic-mapreduce --create --alive --name "Test Pig" \
      --ami-version 2.0 \
      --num-instances 5 --instance-type m1.large \
      --pig-interactive \
      --pig-versions 0.9.1,0.9.2
    • Windows users:

      ruby elastic-mapreduce --create --alive --name "Test Pig" --ami-version 2.0 --num-instances 5 --instance-type m1.large --pig-interactive --pig-versions 0.9.1,0.9.2
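The examples above pass --pig-versions either the latest keyword, a single version, or a comma-separated list. A launch script might check the shape of that value before calling the CLI. The following is a minimal sketch under that assumption; valid_pig_versions is a hypothetical helper, it checks only the syntax shown on this page, and it does not confirm that a version actually exists or that the CLI accepts no other forms.

```shell
# Hypothetical helper: succeeds if the value looks like "latest",
# a dotted version number, or a comma-separated list of them.
valid_pig_versions() {
  case "$1" in
    latest) return 0 ;;
    ''|*[!0-9.,]*) return 1 ;;         # empty, or characters other than digits, dots, commas
    ,*|*,|*,,*) return 1 ;;            # empty list entries
    .*|*.|*..*|*,.*|*.,*) return 1 ;;  # entries with a leading, trailing, or doubled dot
  esac
  return 0
}

# Check a comma-separated list before using it with --pig-versions.
if valid_pig_versions "0.9.1,0.9.2"; then
  echo "value looks well formed"
fi
```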

If multiple versions of Pig are loaded on a cluster, calling pig runs the default version of Pig or, if the cluster creation call specified multiple --pig-versions parameters, the version loaded last. When you use the comma-separated syntax with a single --pig-versions parameter to load multiple versions, calling pig runs the default version.

To run a specific version of Pig on a cluster

  • Add the version number to the command name, for example pig-0.11.1 or pig-0.9.2. In an interactive Pig cluster, for instance, use SSH to connect to the master node and then run a command like the following from the terminal.

    pig-0.9.2
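The naming convention above (pig- followed by the version number) can be wrapped in a short helper that also checks whether the versioned command is present on the node. This is a sketch only; pig_command is a hypothetical name, and the check relies on the standard command -v shell builtin rather than anything specific to Amazon EMR.

```shell
# Hypothetical helper: maps a Pig version to the versioned command name,
# or to plain "pig" when no version is given.
pig_command() {
  if [ -n "${1:-}" ]; then
    echo "pig-$1"
  else
    echo "pig"
  fi
}

# Report whether the versioned command is available on this node.
cmd=$(pig_command 0.9.2)
if command -v "$cmd" >/dev/null 2>&1; then
  echo "found $cmd"
else
  echo "$cmd is not installed on this node"
fi
```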
    	  		

Pig Version Details

Amazon EMR supports certain Pig releases that might have additional Amazon EMR patches applied. You can configure which version of Pig to run on Amazon Elastic MapReduce (Amazon EMR) clusters. For more information about how to do this, see Process Data with Pig. The following sections describe different Pig versions and the patches applied to the versions loaded on Amazon EMR.

Pig Patches

This section describes the custom patches applied to Pig versions available with Amazon EMR.

Pig 0.11.1.1 Patches

The Amazon EMR version of Pig 0.11.1.1 is a maintenance release that improves the performance of the LOAD command with PigStorage when the input resides in Amazon S3.

Pig 0.11.1 Patches

The Amazon EMR version of Pig 0.11.1 contains all the updates provided by the Apache Software Foundation and the cumulative Amazon EMR patches from Pig version 0.9.2.2. However, there are no new Amazon EMR-specific patches in Pig 0.11.1.

Pig 0.9.2 Patches

Apache Pig 0.9.2 is a maintenance release of Pig. The Amazon EMR team has applied the following patches to the Amazon EMR version of Pig 0.9.2.

| Patch | Description |
|---|---|
| PIG-1429 | Add the Boolean data type to Pig as a first class data type. For more information, go to https://issues.apache.org/jira/browse/PIG-1429. Status: Committed. Fixed in Apache Pig Version: 0.10 |
| PIG-1824 | Support import modules in Jython UDF. For more information, go to https://issues.apache.org/jira/browse/PIG-1824. Status: Committed. Fixed in Apache Pig Version: 0.10 |
| PIG-2010 | Bundle registered JARs on the distributed cache. For more information, go to https://issues.apache.org/jira/browse/PIG-2010. Status: Committed. Fixed in Apache Pig Version: 0.11 |
| PIG-2456 | Add a ~/.pigbootup file where the user can specify default Pig statements. For more information, go to https://issues.apache.org/jira/browse/PIG-2456. Status: Committed. Fixed in Apache Pig Version: 0.11 |
| PIG-2623 | Support using Amazon S3 paths to register UDFs. For more information, go to https://issues.apache.org/jira/browse/PIG-2623. Status: Committed. Fixed in Apache Pig Version: 0.10, 0.11 |
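To make two of these patches more concrete, the following sketch writes a bootup file of the kind PIG-2456 describes, containing a REGISTER statement that points at an Amazon S3 path as PIG-2623 allows. The helper name write_pigbootup, the bucket name, the JAR name, and the default_parallel value are all placeholders chosen for illustration; they are not taken from this guide.

```shell
# Hypothetical helper: writes default Pig statements to a bootup file.
# Defaults to ~/.pigbootup; pass another path to avoid overwriting it.
write_pigbootup() {
  target=${1:-"$HOME/.pigbootup"}
  # First statement sets a default; second registers a UDF JAR from
  # Amazon S3. The bucket and JAR names are placeholders.
  cat > "$target" <<'EOF'
SET default_parallel 10;
REGISTER s3://example-bucket/udfs/example-udfs.jar;
EOF
}

# Write to a scratch location and show the result.
example_file="${TMPDIR:-/tmp}/pigbootup.example"
write_pigbootup "$example_file"
cat "$example_file"
```

Writing to a scratch path first makes it possible to review the statements before copying the file to ~/.pigbootup on the master node.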

Pig 0.9.1 Patches

The Amazon EMR team has applied the following patches to the Amazon EMR version of Pig 0.9.1.

| Patch | Description |
|---|---|
| Support JAR files and Pig scripts in dfs | Add support for running scripts and registering JAR files stored in HDFS, Amazon S3, or other distributed file systems. For more information, go to https://issues.apache.org/jira/browse/PIG-1505. Status: Committed. Fixed in Apache Pig Version: 0.8.0 |
| Support multiple file systems in Pig | Add support for Pig scripts to read data from one file system and write it to another. For more information, go to https://issues.apache.org/jira/browse/PIG-1564. Status: Not Committed. Fixed in Apache Pig Version: n/a |
| Add Piggybank datetime and string UDFs | Add datetime and string UDFs to support custom Pig scripts. For more information, go to https://issues.apache.org/jira/browse/PIG-1565. Status: Not Committed. Fixed in Apache Pig Version: n/a |

Additional Pig Functions

The Amazon EMR development team has created additional Pig functions that simplify string manipulation and make it easier to format date-time information. These are available at http://aws.amazon.com/code/2730.