Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)

Submit a Cascading Step

This section covers the basics of submitting a Cascading step.

Submit a Cascading Step Using the Console

This example describes how to use the Amazon EMR console to submit a Cascading step to a running cluster as a custom JAR file.

To submit a Cascading step using the console

  1. Open the Amazon EMR console at https://console.aws.amazon.com/elasticmapreduce/.

  2. In the Cluster List, click the name of your cluster.

  3. Scroll to the Steps section and expand it, then click Add step.

  4. In the Add Step dialog:

    • For Step type, choose Custom jar.

    • For Name, accept the default name (Custom jar) or type a new name.

    • For JAR location, type or browse to the location of your JAR file. The value must be in the form BucketName/path/JARName.

    • For Arguments, leave the field blank.

    • For Action on failure, accept the default option (Continue).

  5. Click Add. The step appears in the console with a status of Pending.

  6. The status of the step changes from Pending to Running to Completed as the step runs. To update the status, click the Refresh icon above the Actions column.
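For comparison, the console procedure above can be approximated from the AWS CLI with the add-steps subcommand. The following is a minimal sketch under stated assumptions, not part of the original procedure: the cluster ID j-XXXXXXXXXXXXX, the bucket mybucket, and the JAR path are placeholders, and the helper function build_add_steps_cmd is hypothetical. It only prints the command so that you can review it before running it.

```shell
# Sketch: build the CLI equivalent of the console "Add step" dialog.
# All values passed in below are placeholders (assumptions), not real resources.

build_add_steps_cmd() {
  local cluster_id="$1"          # ID of the running cluster, e.g. j-XXXXXXXXXXXXX
  local jar="$2"                 # S3 location of the Cascading JAR file
  local name="${3:-Custom jar}"  # defaults to the console's default step name

  # Type, Name, ActionOnFailure, and Jar mirror the fields in the Add Step dialog.
  # No Args key is given, which matches leaving the Arguments field blank.
  printf 'aws emr add-steps --cluster-id %s --steps Type=CUSTOM_JAR,Name="%s",ActionOnFailure=CONTINUE,Jar=%s\n' \
    "$cluster_id" "$name" "$jar"
}

# Print the command for review; run the printed line yourself to submit the step.
build_add_steps_cmd "j-XXXXXXXXXXXXX" "s3://mybucket/path/cascading-app.jar"
```

After you submit the step, a command such as aws emr list-steps --cluster-id j-XXXXXXXXXXXXX should report the same Pending, Running, and Completed progression that the console shows.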

Launch a Cluster and Submit a Cascading Step Using the AWS CLI

This example describes how to use the AWS CLI to create a cluster and submit a Cascading step. The Cascading SDK includes Cascading and Cascading-based tools such as Multitool and Load. For more information, go to http://www.cascading.org/sdk/.

To create a cluster and submit a Cascading step using the AWS CLI

  • Type the following command to launch your cluster and submit a Cascading step. Replace myKey with the name of your EC2 key pair, and replace pathtobootstrapscript, pathtojarfile, pathtoinputdata, and pathtooutputbucket with the corresponding locations in your Amazon S3 bucket.

    • Linux, UNIX, and Mac OS X users:

      aws emr create-cluster --name "Test cluster" --ami-version 2.4 --applications Name=Hive Name=Pig \
      --use-default-roles --ec2-attributes KeyName=myKey \
      --instance-type m3.xlarge --instance-count 3 \
      --bootstrap-actions Path=pathtobootstrapscript,Name="CascadingSDK" \
      --steps Type="CUSTOM_JAR",Name="Cascading Step",ActionOnFailure=CONTINUE,Jar=pathtojarfile,\
      Args=["-input","pathtoinputdata","-output","pathtooutputbucket","arg1","arg2"]

    • Windows users:

      aws emr create-cluster --name "Test cluster" --ami-version 2.4 --applications Name=Hive Name=Pig --use-default-roles --ec2-attributes KeyName=myKey --instance-type m3.xlarge --instance-count 3 --bootstrap-actions Path=pathtobootstrapscript,Name="CascadingSDK" --steps Type="CUSTOM_JAR",Name="Cascading Step",ActionOnFailure=CONTINUE,Jar=pathtojarfile,Args=["-input","pathtoinputdata","-output","pathtooutputbucket","arg1","arg2"]

    When you specify the instance count without using the --instance-groups parameter, a single master node is launched and the remaining instances are launched as core nodes. All nodes use the instance type specified in the command.

    Note

    If you have not previously created the default EMR service role and EC2 instance profile, type aws emr create-default-roles to create them before typing the create-cluster subcommand.

    For more information on using Amazon EMR commands in the AWS CLI, see http://docs.aws.amazon.com/cli/latest/reference/emr.
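The instance-count behavior described in this procedure (one master node, with the remaining instances as core nodes) can also be written out explicitly with the --instance-groups parameter. The sketch below is illustrative only: the helper instance_groups_args is hypothetical, and it assumes a single instance type for every node, as in the example command. It prints the --instance-groups argument that corresponds to a given instance count and instance type.

```shell
# Sketch: derive explicit instance groups from an instance count.
# instance_groups_args is a hypothetical helper, not an AWS CLI feature.

instance_groups_args() {
  local count="$1"   # total number of instances, as passed to --instance-count
  local type="$2"    # instance type, as passed to --instance-type

  if [ "$count" -lt 1 ]; then
    echo "instance count must be at least 1" >&2
    return 1
  fi

  # One instance is always the master node; the rest become core nodes.
  local core=$((count - 1))
  local args="InstanceGroupType=MASTER,InstanceCount=1,InstanceType=$type"
  if [ "$core" -gt 0 ]; then
    args="$args InstanceGroupType=CORE,InstanceCount=$core,InstanceType=$type"
  fi

  printf '%s %s\n' '--instance-groups' "$args"
}

# The example command uses --instance-type m3.xlarge --instance-count 3.
instance_groups_args 3 m3.xlarge
```

The printed argument can take the place of --instance-type and --instance-count in the create-cluster command when you want different instance types or counts for each group.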