Adding steps to a cluster using the AWS CLI - Amazon EMR

Adding steps to a cluster using the AWS CLI

The following procedures show how to use the AWS CLI to add steps to a cluster when you create it and to a cluster that is already running. In both cases, you pass the step definitions with the --steps option.

To add steps during cluster creation
  • Type the following command to create a cluster and add an Apache Spark step. Replace myKey with the name of your Amazon EC2 key pair.

    aws emr create-cluster --name "Test cluster" \
      --applications Name=Spark \
      --use-default-roles \
      --ec2-attributes KeyName=myKey \
      --instance-groups InstanceGroupType=PRIMARY,InstanceCount=1,InstanceType=m5.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=m5.xlarge \
      --steps '[{"Args":["spark-submit","--deploy-mode","cluster","--class","org.apache.spark.examples.SparkPi","/usr/lib/spark/examples/jars/spark-examples.jar","5"],"Type":"CUSTOM_JAR","ActionOnFailure":"CONTINUE","Jar":"command-runner.jar","Properties":"","Name":"Spark application"}]'

    Note

    The arguments in the Args list vary depending on the type of step.

    By default, the step concurrency level is 1, so steps run one at a time. You can set a different level with the StepConcurrencyLevel parameter (the --step-concurrency-level option in the AWS CLI) when you create a cluster.

    The output is a cluster identifier similar to the following.

    { "ClusterId": "j-2AXXXXXXGAPLF" }
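When the step JSON becomes hard to quote on a single line, the AWS CLI can also read a parameter value from a local file through the file:// prefix. The following is a minimal sketch, assuming a Bash shell; the file name steps.json and the DRY_RUN switch are illustrative and not part of the procedure above. It writes the same Spark step to a file, sets a step concurrency level of 2 at creation time with --step-concurrency-level, and by default only prints the command instead of calling AWS.

```shell
#!/usr/bin/env bash
# Sketch: create a cluster whose step definition is read from a local file.
# Assumptions: Bash; steps.json and DRY_RUN are illustrative names.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # 1 = print the command only; 0 = call AWS

# Same Spark step as in the procedure above, stored as a JSON file.
cat > steps.json <<'EOF'
[
  {
    "Name": "Spark application",
    "Type": "CUSTOM_JAR",
    "ActionOnFailure": "CONTINUE",
    "Jar": "command-runner.jar",
    "Args": ["spark-submit", "--deploy-mode", "cluster",
             "--class", "org.apache.spark.examples.SparkPi",
             "/usr/lib/spark/examples/jars/spark-examples.jar", "5"]
  }
]
EOF

# Build the command as an array so arguments containing spaces stay intact.
cmd=(aws emr create-cluster --name "Test cluster"
  --applications Name=Spark
  --use-default-roles
  --ec2-attributes KeyName=myKey
  --instance-groups
    InstanceGroupType=PRIMARY,InstanceCount=1,InstanceType=m5.xlarge
    InstanceGroupType=CORE,InstanceCount=2,InstanceType=m5.xlarge
  --step-concurrency-level 2
  --steps file://steps.json)

if [ "$DRY_RUN" = "1" ]; then
  # Print a shell-quoted copy of the command for review.
  printf '%q ' "${cmd[@]}"
  echo
else
  "${cmd[@]}"
fi
```

Keeping the step in a file also lets you review or version the definition separately from the command line.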
To add a step to a running cluster
  • Type the following command to add a step to a running cluster. Replace j-2AXXXXXXGAPLF with your own cluster ID.

    aws emr add-steps --cluster-id j-2AXXXXXXGAPLF \
      --steps '[{"Args":["spark-submit","--deploy-mode","cluster","--class","org.apache.spark.examples.SparkPi","/usr/lib/spark/examples/jars/spark-examples.jar","5"],"Type":"CUSTOM_JAR","ActionOnFailure":"CONTINUE","Jar":"command-runner.jar","Properties":"","Name":"Spark application"}]'

    The output is a step identifier similar to the following.

    { "StepIds": [ "s-Y9XXXXXXAPMD" ] }
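In a script, it can be useful to capture the step ID and wait for the step to finish. The following is a minimal Bash sketch, assuming a configured AWS CLI; CLUSTER_ID, DRY_RUN, and submit_step are hypothetical names. The global --query and --output text options reduce the add-steps response to the bare step ID, and the aws emr wait step-complete waiter polls the step until it completes. With DRY_RUN=1 (the default), the sketch uses the placeholder ID from the sample output instead of calling AWS.

```shell
#!/usr/bin/env bash
# Sketch: submit a step, capture its ID, and wait for it to finish.
# Assumptions: Bash; CLUSTER_ID, DRY_RUN, and submit_step are hypothetical names.
set -euo pipefail

CLUSTER_ID="${CLUSTER_ID:-j-2AXXXXXXGAPLF}"   # replace with your cluster ID
DRY_RUN="${DRY_RUN:-1}"                      # 1 = no AWS calls; 0 = call AWS

STEPS='[{"Args":["spark-submit","--deploy-mode","cluster","--class","org.apache.spark.examples.SparkPi","/usr/lib/spark/examples/jars/spark-examples.jar","5"],"Type":"CUSTOM_JAR","ActionOnFailure":"CONTINUE","Jar":"command-runner.jar","Name":"Spark application"}]'

submit_step() {
  if [ "$DRY_RUN" = "1" ]; then
    # Placeholder ID so the rest of the script can run offline.
    echo "s-Y9XXXXXXAPMD"
  else
    # StepIds[0] selects the first (and here only) ID from the response.
    aws emr add-steps --cluster-id "$CLUSTER_ID" --steps "$STEPS" \
      --query 'StepIds[0]' --output text
  fi
}

STEP_ID=$(submit_step)
echo "submitted $STEP_ID"

if [ "$DRY_RUN" != "1" ]; then
  # Returns a non-zero exit status if the step fails or the waiter gives up,
  # which stops the script because of set -e.
  aws emr wait step-complete --cluster-id "$CLUSTER_ID" --step-id "$STEP_ID"
  aws emr describe-step --cluster-id "$CLUSTER_ID" --step-id "$STEP_ID" \
    --query 'Step.Status.State' --output text
fi
```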
To modify the StepConcurrencyLevel in a running cluster
  1. In a running cluster, you can change the StepConcurrencyLevel with the ModifyCluster API (the modify-cluster command in the AWS CLI). For example, type the following command to increase the StepConcurrencyLevel to 10. Replace j-2AXXXXXXGAPLF with your cluster ID.

    aws emr modify-cluster --cluster-id j-2AXXXXXXGAPLF --step-concurrency-level 10
  2. The output is similar to the following.

    { "StepConcurrencyLevel": 10 }
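The modify-cluster call can also be wrapped with a local range check and a read-back of the new value. This is a minimal Bash sketch; the helper names valid_concurrency and set_concurrency and the DRY_RUN switch are hypothetical. The upper bound of 256 reflects the maximum StepConcurrencyLevel documented in the Amazon EMR API reference, so verify it against the current reference before relying on it. The describe-cluster command reports the current value in the Cluster.StepConcurrencyLevel field.

```shell
#!/usr/bin/env bash
# Sketch: change the step concurrency level with a local range check first.
# Assumptions: Bash; helper names and DRY_RUN are hypothetical; 1-256 range
# taken from the Amazon EMR API reference.
set -euo pipefail

CLUSTER_ID="${CLUSTER_ID:-j-2AXXXXXXGAPLF}"   # replace with your cluster ID
DRY_RUN="${DRY_RUN:-1}"                      # 1 = print the command only; 0 = call AWS

# Accept only plain integers from 1 to 256 (no leading zeros, at most 3 digits).
valid_concurrency() {
  local level="$1"
  [[ "$level" =~ ^[1-9][0-9]{0,2}$ ]] && (( level <= 256 ))
}

set_concurrency() {
  local level="$1"
  if ! valid_concurrency "$level"; then
    echo "invalid step concurrency level: $level" >&2
    return 1
  fi
  if [ "$DRY_RUN" = "1" ]; then
    echo "aws emr modify-cluster --cluster-id $CLUSTER_ID --step-concurrency-level $level"
  else
    aws emr modify-cluster --cluster-id "$CLUSTER_ID" --step-concurrency-level "$level"
    # Read the value back from the cluster description to confirm the change.
    aws emr describe-cluster --cluster-id "$CLUSTER_ID" \
      --query 'Cluster.StepConcurrencyLevel' --output text
  fi
}

set_concurrency 10
```

Checking the value locally gives a clearer error message than a rejected API call, but the service remains the final authority on which values it accepts.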

For more information on using Amazon EMR commands in the AWS CLI, see the AWS CLI Command Reference.