Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)

Intermediate Compression (AMI 1.0)

During its shuffle process, Hadoop sends map output over the network from the mappers to the reducers. This network transfer is a bottleneck for many clusters. To reduce this bottleneck, Amazon EMR enables intermediate data compression by default. We use the LZO codec because it provides a reasonable amount of compression with only a small CPU impact.

You can modify the default compression settings with a bootstrap action. For more information about using bootstrap actions, see Create Bootstrap Actions to Install Additional Software (Optional).
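
As an illustration, the overrides you want can be turned into an argument list for a bootstrap action. The sketch below assumes the predefined configure-hadoop bootstrap action at the S3 path shown and its `-m key=value` form for mapred-site settings; neither is stated on this page, so verify both against the bootstrap actions documentation. The helper name `bootstrap_args` is hypothetical.

```python
# Assumption: path of the predefined configure-hadoop bootstrap action.
CONFIGURE_HADOOP = "s3://elasticmapreduce/bootstrap-actions/configure-hadoop"


def bootstrap_args(overrides):
    """Build bootstrap action arguments from a dict of setting overrides.

    Assumption: each mapred-site setting is passed as "-m", "key=value".
    """
    args = []
    for key, value in overrides.items():
        args += ["-m", f"{key}={value}"]
    return args


# Example: turn intermediate compression off for a cluster.
overrides = {"mapred.compress.map.output": "false"}
print(CONFIGURE_HADOOP)
print(",".join(bootstrap_args(overrides)))
```

The printed values are the bootstrap action path and a comma-separated argument string that you would supply when you create the cluster.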

The following table presents the default values for the parameters that affect intermediate compression.

Parameter                              Value
mapred.compress.map.output             true
mapred.map.output.compression.codec    com.hadoop.compression.lzo.LzoCodec
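
For reference, the two defaults above can be written in the standard Hadoop `<configuration>`/`<property>` layout used by files such as mapred-site.xml. The following is a minimal sketch that only renders the XML text; the names `DEFAULTS` and `to_configuration_xml` are hypothetical, and the sketch does not read or modify any file on a cluster.

```python
import xml.etree.ElementTree as ET

# Default intermediate-compression settings from the table above.
DEFAULTS = {
    "mapred.compress.map.output": "true",
    "mapred.map.output.compression.codec": "com.hadoop.compression.lzo.LzoCodec",
}


def to_configuration_xml(props):
    """Render a dict of settings as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


print(to_configuration_xml(DEFAULTS))
```

To express a different setting, change the dictionary values, for example by setting mapred.compress.map.output to false to represent compression being disabled.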