Considerations for Running Multiple Steps in Parallel - Amazon EMR

  • When you select a step concurrency level for your cluster, consider whether the master node instance type meets the memory requirements of your workloads. The main step executor process for each step runs on the master node, so running multiple steps in parallel requires more memory and CPU on the master node than running one step at a time.

  • For more complex scheduling and resource management of concurrent steps, you can use YARN scheduling features such as FairScheduler or CapacityScheduler. For example, you can set queueMaxAppsDefault in FairScheduler to prevent more than a certain number of jobs from running at a time.

  • The step concurrency level is subject to the configuration of the resource manager. For example, if YARN is configured to run at most five applications in parallel, then only five YARN applications can run at a time, even if StepConcurrencyLevel is set to 10. For more information about configuring resource managers, see Configuring Applications in the Amazon EMR Release Guide.

  • You can use EMR automatic scaling to scale up and down based on the YARN resources to prevent resource contention. For more information, see Using Automatic Scaling in Amazon EMR in the Amazon EMR Management Guide.

  • When you decrease the step concurrency level, EMR allows any running steps to complete before it reduces the number of steps that run concurrently. If resources are exhausted because the cluster is running too many concurrent steps, we recommend manually canceling running steps to free up resources.
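
The interaction between StepConcurrencyLevel and a resource manager limit, as in the YARN example above, can be sketched as taking the smaller of the two values. This is a minimal illustration only: the function name `effective_parallelism` and the idea of a single numeric YARN application limit are assumptions made for the example, not part of any EMR or YARN API.

```python
def effective_parallelism(step_concurrency_level: int, yarn_app_limit: int) -> int:
    """Return how many YARN applications can actually run at once.

    Both arguments are illustrative: `yarn_app_limit` stands in for whatever
    limit the YARN scheduler configuration imposes on running applications.
    """
    if step_concurrency_level < 1 or yarn_app_limit < 1:
        raise ValueError("both limits must be at least 1")
    # The stricter of the two limits wins.
    return min(step_concurrency_level, yarn_app_limit)


# The case from the text: YARN allows 5, StepConcurrencyLevel is 10.
print(effective_parallelism(10, 5))  # prints 5
```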
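
As a sketch of the FairScheduler example above, the following builds a minimal Fair Scheduler allocation document that sets queueMaxAppsDefault. The helper name `build_allocations` is hypothetical, and how the resulting file is delivered to the cluster is not shown because it depends on your configuration. Note that in the Fair Scheduler, queueMaxAppsDefault is a default limit on running applications per queue, so the overall effect depends on how many queues are in use.

```python
import xml.etree.ElementTree as ET


def build_allocations(max_apps_per_queue: int) -> str:
    """Return a minimal Fair Scheduler allocation document as a string.

    `max_apps_per_queue` becomes the default running-application limit
    applied to each queue.
    """
    if max_apps_per_queue < 1:
        raise ValueError("max_apps_per_queue must be at least 1")
    root = ET.Element("allocations")
    ET.SubElement(root, "queueMaxAppsDefault").text = str(max_apps_per_queue)
    # encoding="unicode" returns a str rather than bytes.
    return ET.tostring(root, encoding="unicode")


print(build_allocations(5))
```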
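
Manually canceling running steps, as recommended in the last item above, could be scripted along the following lines. This is a sketch under stated assumptions: the `emr` argument is assumed to behave like a boto3 EMR client (only its `list_steps` and `cancel_steps` operations are used), the function name `cancel_running_steps` is hypothetical, and which steps to cancel is left to the caller.

```python
def cancel_running_steps(emr, cluster_id: str, step_ids: list) -> list:
    """Cancel those of `step_ids` that are currently RUNNING on the cluster.

    `emr` is assumed to be a boto3 EMR client or an object with the same
    list_steps / cancel_steps methods. Returns the step IDs that were
    submitted for cancellation.
    """
    # Only the first page of results is read here; a real script may need
    # to follow the pagination marker in the response.
    response = emr.list_steps(ClusterId=cluster_id, StepStates=["RUNNING"])
    running_ids = {step["Id"] for step in response["Steps"]}

    # Keep the caller's order, but skip steps that are no longer running.
    to_cancel = [step_id for step_id in step_ids if step_id in running_ids]
    if to_cancel:
        emr.cancel_steps(ClusterId=cluster_id, StepIds=to_cancel)
    return to_cancel
```

With boto3 installed and credentials configured, the client could be created with `boto3.client("emr")` and passed in along with your own cluster ID and step IDs. Cancellation requests are not guaranteed to succeed, so it is worth checking step status afterwards.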