Adding Steps to a Cluster Using the AWS CLI
The following procedures demonstrate adding steps to a newly created cluster and to a running cluster using the AWS CLI. In both examples, the --steps option is used to add steps to the cluster.
To add steps during cluster creation
Type the following command to create a cluster and add an Apache Pig step. Replace myKey with the name of your Amazon EC2 key pair and replace mybucket with the name of your Amazon S3 bucket.
Linux, UNIX, and macOS
aws emr create-cluster --name "Test cluster" --ami-version 2.4 --applications Name=Hive Name=Pig \
--use-default-roles --ec2-attributes KeyName=myKey \
--instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m5.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=m5.xlarge \
--steps Type=PIG,Name="Pig Program",ActionOnFailure=CONTINUE,Args=[-f,s3://mybucket/scripts/pigscript.pig,-p,INPUT=s3://mybucket/inputdata/,-p,OUTPUT=s3://mybucket/outputdata/,$INPUT=s3://mybucket/inputdata/,$OUTPUT=s3://mybucket/outputdata/]
Windows
aws emr create-cluster --name "Test cluster" --ami-version 2.4 --applications Name=Hive Name=Pig --use-default-roles --ec2-attributes KeyName=myKey --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m5.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=m5.xlarge --steps Type=PIG,Name="Pig Program",ActionOnFailure=CONTINUE,Args=[-f,s3://mybucket/scripts/pigscript.pig,-p,INPUT=s3://mybucket/inputdata/,-p,OUTPUT=s3://mybucket/outputdata/,$INPUT=s3://mybucket/inputdata/,$OUTPUT=s3://mybucket/outputdata/]
Note: The list of arguments changes depending on the type of step.
By default, the step concurrency level is 1. You can set the step concurrency level by using the StepConcurrencyLevel parameter when you create a cluster.
The output is a cluster identifier similar to the following.
{ "ClusterId": "j-2AXXXXXXGAPLF" }
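Because the bucket name is repeated several times in the --steps value, it can help to assemble that value from a variable before running the command. The following is a minimal sketch, not part of the official procedure: BUCKET, PIG_ARGS, and STEP are hypothetical variable names, and the sketch includes only the -f and -p arguments from the command above. It prints the assembled value for review and does not call the AWS CLI.

```shell
#!/usr/bin/env bash
# Hypothetical placeholder: replace with the name of your Amazon S3 bucket.
BUCKET="mybucket"

# Build the comma-separated Args list for the Pig step piece by piece.
PIG_ARGS="-f,s3://${BUCKET}/scripts/pigscript.pig"
PIG_ARGS="${PIG_ARGS},-p,INPUT=s3://${BUCKET}/inputdata/"
PIG_ARGS="${PIG_ARGS},-p,OUTPUT=s3://${BUCKET}/outputdata/"

# Assemble the step in the same shorthand form used by the --steps option above.
STEP="Type=PIG,Name=Pig Program,ActionOnFailure=CONTINUE,Args=[${PIG_ARGS}]"

# Print the value so it can be checked before it is passed to the CLI.
printf '%s\n' "$STEP"
```

Under these assumptions, the value could then be passed as a single quoted argument, for example --steps "$STEP", to either create-cluster or add-steps.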
To add a step to a running cluster
Type the following command to add a step to a running cluster. Replace j-2AXXXXXXGAPLF with your cluster ID and replace mybucket with your Amazon S3 bucket name.
aws emr add-steps --cluster-id j-2AXXXXXXGAPLF --steps Type=PIG,Name="Pig Program",Args=[-f,s3://mybucket/scripts/pigscript.pig,-p,INPUT=s3://mybucket/inputdata/,-p,OUTPUT=s3://mybucket/outputdata/,$INPUT=s3://mybucket/inputdata/,$OUTPUT=s3://mybucket/outputdata/]
The output is a step identifier similar to the following.
{ "StepIds": [ "s-Y9XXXXXXAPMD" ] }
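The returned step ID can be used to look up the step later. As a rough sketch, assuming the describe-step subcommand and the global --query and --output options of the AWS CLI, the helper below prints only the state of one step (for example PENDING, RUNNING, or COMPLETED). The function name step_state is hypothetical and is not part of the AWS CLI.

```shell
#!/usr/bin/env bash
# step_state CLUSTER_ID STEP_ID
# Hypothetical helper: prints the current state of a single step as plain text.
step_state() {
  # $1 is the cluster ID, $2 is the step ID returned by add-steps.
  aws emr describe-step \
    --cluster-id "$1" \
    --step-id "$2" \
    --query 'Step.Status.State' \
    --output text
}

# Example call (requires configured AWS credentials and a real cluster):
# step_state j-2AXXXXXXGAPLF s-Y9XXXXXXAPMD
```

The example call is left commented out so that the block only defines the function; nothing is sent to AWS until you call it with your own IDs.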
To modify the StepConcurrencyLevel in a running cluster
In a running cluster, you can modify the StepConcurrencyLevel by using the ModifyCluster API. For example, type the following command to increase the StepConcurrencyLevel to 10. Replace j-2AXXXXXXGAPLF with your cluster ID.
aws emr modify-cluster --cluster-id j-2AXXXXXXGAPLF --step-concurrency-level 10
The output is similar to the following.
{ "StepConcurrencyLevel": 10 }
For more information on using Amazon EMR commands in the AWS CLI, see https://docs.aws.amazon.com/cli/latest/reference/emr.