Hadoop sends data between the mappers and reducers in its shuffle process. This network operation is a bottleneck for many clusters. To reduce this bottleneck, Amazon EMR enables intermediate data compression by default. Because it provides a reasonable amount of compression with only a small CPU impact, we use the LZO codec.
You can modify the default compression settings with a bootstrap action. For more information about using bootstrap actions, see (Optional) Create Bootstrap Actions to Install Additional Software.
The following table presents the default values for the parameters that affect intermediate compression.