Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)

Process Data with Pig

Amazon Elastic MapReduce (Amazon EMR) supports Apache Pig, a platform you can use to analyze large data sets. For more information about Pig, go to http://pig.apache.org/. Amazon EMR supports several versions of Pig. The following sections describe how to configure Pig on Amazon EMR.

Pig is an open-source Apache library that runs on top of Hadoop. The library takes SQL-like commands written in a language called Pig Latin and converts those commands into MapReduce jobs. Pig lets you write database-style queries in this SQL-like syntax, so you do not have to write complex MapReduce algorithms in a lower-level programming language such as Java. Although you can execute one Pig Latin command at a time, it is far more common to write a script of Pig Latin commands that together accomplish a complete task. Amazon EMR can run these scripts after you upload them to Amazon S3.
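To give a sense of the work Pig saves you, the following is a minimal sketch in plain Python of the map, shuffle, and reduce phases that a simple group-and-count query corresponds to. It is an illustration only, not how Pig or Hadoop is implemented. The Pig Latin script in the comment, the bucket name, the schema, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical Pig Latin script (bucket name and schema are assumptions):
#   logs    = LOAD 's3://example-bucket/input' AS (user:chararray, url:chararray);
#   grouped = GROUP logs BY user;
#   counts  = FOREACH grouped GENERATE group, COUNT(logs);
#
# The functions below hand-write the same group-and-count logic as
# explicit map, shuffle, and reduce phases.


def map_phase(records):
    """Emit one (user, 1) pair for each input record."""
    for user, _url in records:
        yield user, 1


def shuffle(pairs):
    """Collect emitted values under their key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_user(records):
    """Run the three phases in order and return a {user: count} mapping."""
    return reduce_phase(shuffle(map_phase(records)))


SAMPLE = [("alice", "/a"), ("bob", "/b"), ("alice", "/c")]

if __name__ == "__main__":
    print(count_by_user(SAMPLE))
```

The three-line Pig Latin script in the comment expresses the same query declaratively. With Pig, the translation into map and reduce steps is done for you.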