Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)
Did this page help you?  Yes | No |  Tell us about it...
« PreviousNext »
View the PDF for this guide.Go to the AWS Discussion Forum for this product.Go to the Kindle Store to download this guide in Kindle format.

Intermediate Compression (AMI 1.0)

Hadoop sends data between the mappers and reducers in its shuffle process. This network operation is a bottleneck for many clusters. To reduce this bottleneck, Amazon EMR enables intermediate data compression by default. Because it provides a reasonable amount of compression with only a small CPU impact, we use the LZO codec.

You can modify the default compression settings with a bootstrap action. For more information about using bootstrap actions, see (Optional) Create Bootstrap Actions to Install Additional Software.

The following table presents the default values for the parameters that affect intermediate compression.

ParameterValue
mapred.compress.map.output true
mapred.map.output.compression.codeccom.hadoop.compression.lzo.LzoCodec