Use Amazon EMR cluster scaling to adjust for changing workloads - Amazon EMR

You can adjust the number of Amazon EC2 instances available to an Amazon EMR cluster automatically or manually in response to workloads that have varying demands. To use automatic scaling, you have two options. You can enable Amazon EMR managed scaling or create a custom automatic scaling policy. The following table describes the differences between the two options.

| | Amazon EMR managed scaling | Custom automatic scaling |
|---|---|---|
| Scaling policies and rules | No policy required. Amazon EMR manages the automatic scaling activity by continuously evaluating cluster metrics and making optimized scaling decisions. | You need to define and manage the automatic scaling policies and rules, such as the specific conditions that trigger scaling activities, evaluation periods, cooldown periods, etc. |
| Supported Amazon EMR releases | Amazon EMR version 5.30.0 and higher (except Amazon EMR version 6.0.0) | Amazon EMR version 4.0.0 and higher |
| Supported cluster composition | Instance groups or instance fleets | Instance groups only |
| Scaling limits configuration | Scaling limits are configured for the entire cluster. | Scaling limits can only be configured for each instance group. |
| Metrics evaluation frequency | Every 5 to 10 seconds. More frequent evaluation of metrics allows Amazon EMR to make more precise scaling decisions. | You can define the evaluation periods only in five-minute increments. |
| Supported applications | Only YARN applications are supported, such as Spark, Hadoop, Hive, Flink. Amazon EMR managed scaling does not support applications that are not based on YARN, such as Presto or HBase. | You can choose which applications are supported when defining the automatic scaling rules. |
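
To make the difference in scaling limits concrete, the following Python sketch builds the request payloads for the two options. The payload shapes follow the Amazon EMR `PutManagedScalingPolicy` and `PutAutoScalingPolicy` API operations as exposed by the AWS SDK for Python (boto3). The helper function names, the capacity values, the rule name, and the 15 percent threshold are assumptions chosen for illustration, not recommended settings.

```python
def managed_scaling_policy(min_units, max_units, unit_type="Instances"):
    """Cluster-wide limits for managed scaling. No rules are defined."""
    if not 0 < min_units <= max_units:
        raise ValueError("expected 0 < min_units <= max_units")
    return {
        "ComputeLimits": {
            # "Instances" or "VCPU" for instance groups,
            # "InstanceFleetUnits" for instance fleets.
            "UnitType": unit_type,
            "MinimumCapacityUnits": min_units,
            "MaximumCapacityUnits": max_units,
        }
    }


def custom_scaling_policy(min_capacity, max_capacity, period_seconds=300):
    """Per-instance-group limits plus one hypothetical scale-out rule."""
    # Evaluation periods are only available in five-minute increments.
    if period_seconds <= 0 or period_seconds % 300 != 0:
        raise ValueError("period_seconds must be a multiple of 300")
    return {
        "Constraints": {"MinCapacity": min_capacity, "MaxCapacity": max_capacity},
        "Rules": [
            {
                "Name": "ScaleOutOnLowMemory",  # illustrative name
                "Action": {
                    "SimpleScalingPolicyConfiguration": {
                        "AdjustmentType": "CHANGE_IN_CAPACITY",
                        "ScalingAdjustment": 1,
                        "CoolDown": 300,
                    }
                },
                "Trigger": {
                    "CloudWatchAlarmDefinition": {
                        "ComparisonOperator": "LESS_THAN",
                        "EvaluationPeriods": 1,
                        "MetricName": "YARNMemoryAvailablePercentage",
                        "Namespace": "AWS/ElasticMapReduce",
                        "Period": period_seconds,
                        "Statistic": "AVERAGE",
                        "Threshold": 15.0,  # assumed threshold
                        "Unit": "PERCENT",
                    }
                },
            }
        ],
    }


# With a boto3 client (emr = boto3.client("emr")), a cluster would use one of:
#   emr.put_managed_scaling_policy(
#       ClusterId=cluster_id,
#       ManagedScalingPolicy=managed_scaling_policy(2, 10))
#   emr.put_auto_scaling_policy(
#       ClusterId=cluster_id, InstanceGroupId=instance_group_id,
#       AutoScalingPolicy=custom_scaling_policy(2, 10))
```

Note that the managed scaling call takes only a cluster ID, while the custom policy call also needs an instance group ID, which mirrors the cluster-wide versus per-instance-group limits in the table.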

Considerations

  • An Amazon EMR cluster always comprises one or three primary nodes. Once you initially configure the cluster, you can only scale core and task nodes. You can't scale the number of primary nodes for the cluster.

  • For instance groups, reconfiguration operations and resize operations occur consecutively and not concurrently. If you initiate a reconfiguration while an instance group is resizing, the reconfiguration starts once the instance group completes the resize in progress. Conversely, if you initiate a resize operation while an instance group is reconfiguring, the resize starts once the instance group completes its reconfiguration.
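
The considerations above can be sketched as a small pre-check for a manual resize. This is a minimal illustration, assuming instance group descriptions shaped like the output of the EMR `ListInstanceGroups` operation and a request shaped like the input of `ModifyInstanceGroups`. The function name `plan_resize` and the sample group IDs are hypothetical.

```python
SCALABLE_TYPES = {"CORE", "TASK"}  # primary ("MASTER") groups cannot be scaled


def plan_resize(instance_groups, group_id, new_count):
    """Return (request, deferred) for a manual resize of one instance group.

    `deferred` is True when the group is reconfiguring, in which case the
    resize starts only after the reconfiguration in progress completes.
    """
    group = next((g for g in instance_groups if g["Id"] == group_id), None)
    if group is None:
        raise KeyError(f"unknown instance group: {group_id}")
    if group["InstanceGroupType"] not in SCALABLE_TYPES:
        raise ValueError("only core and task instance groups can be resized")
    deferred = group["Status"]["State"] == "RECONFIGURING"
    request = {
        "InstanceGroups": [
            {"InstanceGroupId": group_id, "InstanceCount": new_count}
        ]
    }
    return request, deferred


# Sample data with made-up IDs, in the shape ListInstanceGroups returns.
groups = [
    {"Id": "ig-PRIMARY", "InstanceGroupType": "MASTER",
     "Status": {"State": "RUNNING"}},
    {"Id": "ig-CORE", "InstanceGroupType": "CORE",
     "Status": {"State": "RECONFIGURING"}},
]
request, deferred = plan_resize(groups, "ig-CORE", 4)
# With boto3: emr.modify_instance_groups(ClusterId=cluster_id, **request)
```

The check is advisory only. Amazon EMR itself orders resize and reconfiguration operations, so the `deferred` flag simply tells the caller to expect a delay.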