Process Data with Pig

Amazon Elastic MapReduce (Amazon EMR) supports Apache Pig, a platform you can use to analyze large data sets. For more information about Pig, see the Apache Pig project documentation. Amazon EMR supports several versions of Pig. The following sections describe how to configure Pig on Amazon EMR.

Pig is an open-source Apache library that runs on top of Hadoop. The library takes SQL-like commands written in a language called Pig Latin and converts those commands into MapReduce jobs. Pig lets you write database-style queries using familiar SQL-like commands and syntax, so you do not have to write complex MapReduce algorithms in a lower-level language such as Java. Although you can execute one Pig Latin command at a time, it is far more common to write a script of Pig Latin commands that accomplishes a complete task. Amazon EMR can run these scripts after you upload them to Amazon S3.
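
To make the translation from a SQL-like query to MapReduce more concrete, the following is a minimal Python sketch of how a "group and count" query breaks down into map, shuffle, and reduce phases. It is an illustration only, not how Pig itself is implemented. The sample records, the relation names in the comment, and the function names map_phase, shuffle, and reduce_phase are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical sample input: (user, url) page-view records.
records = [
    ("alice", "/home"),
    ("bob", "/home"),
    ("alice", "/cart"),
]

# Roughly the Pig Latin being illustrated (relation names are made up):
#   views   = LOAD 'views' AS (user:chararray, url:chararray);
#   grouped = GROUP views BY user;
#   counts  = FOREACH grouped GENERATE group, COUNT(views);


def map_phase(rows):
    """Emit one (key, 1) pair per input row."""
    for user, _url in rows:
        yield user, 1


def shuffle(pairs):
    """Collect all values that share a key, as happens between map and reduce."""
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[key].append(value)
    return buckets


def reduce_phase(buckets):
    """Sum the collected values for each key."""
    return {key: sum(values) for key, values in buckets.items()}


counts = reduce_phase(shuffle(map_phase(records)))
print(counts)  # {'alice': 2, 'bob': 1}
```

The three lines of Pig Latin in the comment express the same intent as the three Python functions, which is the kind of hand-written MapReduce logic that Pig spares you from writing.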