Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)

Intermediate Compression (AMI 1.0)

Hadoop sends data between the mappers and reducers in its shuffle process. This network operation is a bottleneck for many clusters. To reduce this bottleneck, Amazon EMR enables intermediate data compression by default. Because it provides a reasonable amount of compression with only a small CPU impact, we use the LZO codec.

You can modify the default compression settings with a bootstrap action. For more information about using bootstrap actions, see Create Bootstrap Actions to Install Additional Software (Optional).

The following table presents the default values for the parameters that affect intermediate compression.

Parameter                              Value
mapred.compress.map.output             true
mapred.map.output.compression.codec    com.hadoop.compression.lzo.LzoCodec
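To confirm which values are in effect, you can read the two properties from a Hadoop configuration file. The following is a minimal sketch, not part of Amazon EMR: `get_hadoop_property` is a hypothetical helper, and it assumes the file uses the standard Hadoop `<property>` layout with each `<name>` and `<value>` element on a single line. The demo runs against a temporary sample file, because the location of the configuration file on a cluster node is not stated in this guide.

```shell
#!/bin/sh
# Sketch: read one property value from a Hadoop-style configuration XML file.
# get_hadoop_property is a hypothetical helper, not an Amazon EMR or Hadoop tool.
# Assumes each <name> and <value> element sits on a single line.
get_hadoop_property() {
  # $1 = property name, $2 = path to the configuration file
  awk -v name="$1" '
    index($0, "<name>" name "</name>") { found = 1 }
    found && match($0, /<value>[^<]*<\/value>/) {
      # Strip the 7-character <value> tag and the 8-character </value> tag.
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$2"
}

# Demo against a sample file that mirrors the default values listed above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<configuration>
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>
</configuration>
EOF
get_hadoop_property mapred.compress.map.output "$sample"
get_hadoop_property mapred.map.output.compression.codec "$sample"
rm -f "$sample"
```

To check a real cluster, pass the path of the node's MapReduce configuration file as the second argument instead of the sample file.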

To enable or disable compression using a bootstrap action

  • In the directory where you installed the Amazon EMR CLI, run the following from the command line. For more information, see the Command Line Interface Reference for Amazon EMR.

    • Linux, UNIX, and Mac OS X users:

      ./elastic-mapreduce --create --alive --name "Reducer speculative execution" \
      --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
      --bootstrap-name "Disable compression" \
      --args "mapred.compress.map.output=false" \
      --args "mapred.map.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec"

    • Windows users:

      ruby elastic-mapreduce --create --alive --name "Reducer speculative execution" --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop --bootstrap-name "Disable compression" --args "mapred.compress.map.output=false" --args "mapred.map.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec"
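
The commands above disable compression. The same options should also cover the enable case by passing `true` and the codec you want. The following sketch only prints the command so you can review it before running it. `build_compression_command`, the cluster name, and the bootstrap name are hypothetical and not part of the Amazon EMR CLI. The `--args` form is copied from the example above.

```shell
#!/bin/sh
# Sketch: print (do not run) an elastic-mapreduce command that sets the
# intermediate compression parameters through the configure-hadoop bootstrap action.
# build_compression_command is a hypothetical helper, not part of the Amazon EMR CLI.
build_compression_command() {
  enabled="$1"   # "true" or "false"
  codec="$2"     # fully qualified codec class name
  case "$enabled" in
    true|false) ;;
    *) echo "first argument must be true or false" >&2; return 1 ;;
  esac
  # The cluster name and bootstrap name below are illustrative placeholders.
  printf '%s ' './elastic-mapreduce --create --alive --name "Intermediate compression"'
  printf '%s ' '--bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop'
  printf '%s ' '--bootstrap-name "Set compression"'
  printf '%s ' "--args \"mapred.compress.map.output=$enabled\""
  printf '%s\n' "--args \"mapred.map.output.compression.codec=$codec\""
}

# Print the command that would restore the default LZO settings.
build_compression_command true com.hadoop.compression.lzo.LzoCodec
```

Printing the command rather than executing it keeps the sketch safe to run on a machine without the Amazon EMR CLI or AWS credentials.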