Hadoop sends data between the mappers and reducers in its shuffle process. This network operation is a bottleneck for many clusters. To reduce this bottleneck, Amazon Elastic MapReduce (Amazon EMR) enables compression of intermediate data (the map output) by default. Amazon EMR uses the LZO codec because it provides a reasonable amount of compression with only a small CPU impact.
You can modify the default compression settings with a bootstrap action. For more information about using bootstrap actions, refer to Create Bootstrap Actions to Install Additional Software (Optional).
The following table presents the default values for the parameters that affect intermediate compression.
| Parameter | Value |
|---|---|
| mapred.compress.map.output | true |
| mapred.map.output.compression.codec | com.hadoop.compression.lzo.LzoCodec |
Example: Enabling or disabling compression using a bootstrap action
In the directory where you installed the Amazon EMR CLI, run the following from the command line. For more information, see the Command Line Interface Reference for Amazon EMR.
Linux, UNIX, and Mac OS X users:
./elastic-mapreduce --create --alive --name "Reducer speculative execution" \
  --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
  --bootstrap-name "Disable compression" \
  --args "mapred.compress.map.output=false" \
  --args "mapred.map.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec"
Windows users:
ruby elastic-mapreduce --create --alive --name "Reducer speculative execution" --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop --bootstrap-name "Disable compression" --args "mapred.compress.map.output=false" --args "mapred.map.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec"
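The commands above turn compression off. As a rough sketch of the opposite case, the following shell snippet assembles the same two parameters from the table so that compression stays enabled but uses the gzip codec instead of LZO. The `compression_args` helper and the cluster name "Gzip map output" are hypothetical names introduced for this illustration and are not part of the Amazon EMR CLI. The snippet only prints the resulting command so that it can be reviewed before it is run, and whether gzip suits a given workload is an assumption to verify.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Amazon EMR CLI): prints the
# configure-hadoop arguments for the two intermediate-compression parameters.
compression_args() {
  enabled="$1"   # "true" or "false"
  codec="$2"     # fully qualified codec class name
  printf '%s %s\n' \
    "--args mapred.compress.map.output=${enabled}" \
    "--args mapred.map.output.compression.codec=${codec}"
}

# Print (do not run) a command that keeps compression enabled with gzip.
# The cluster name below is an arbitrary label chosen for this sketch.
echo ./elastic-mapreduce --create --alive --name '"Gzip map output"' \
  --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
  $(compression_args true org.apache.hadoop.io.compress.GzipCodec)
```

Printing the command first is a deliberate choice: it allows the argument list to be checked against the parameter table before a cluster is created.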