Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)

Submit Hive Work

This section demonstrates submitting Hive work to an Amazon EMR cluster. You can submit Hive work to your cluster interactively, or you can submit work as a cluster step using the console, CLI, or API. You can submit steps when the cluster is launched, or you can submit steps to a running cluster. For more information, see Submit Work to a Cluster.

Before running the sample script used in this section, create an Amazon S3 output location for your data. The script output is saved to an Amazon S3 bucket you create. For more information about creating an output location, see Prepare an Output Location (Optional).
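If you have the AWS CLI installed, one way to create the bucket is with the s3 mb command; mybucket here is only a placeholder, and bucket names must be globally unique:

    aws s3 mb s3://mybucket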

Submit Hive Work Using the Amazon EMR Console

This example describes how to use the Amazon EMR console to submit a Hive step to a running cluster. The process for adding steps in the console is the same whether you submit them when the cluster is launched or add them to a cluster that is already running.

To submit a Hive step to a cluster using the console

  1. Open the Amazon EMR console at https://console.aws.amazon.com/elasticmapreduce/.

  2. In the Cluster List, click the name of your cluster.

  3. Scroll to the Steps section and expand it, then click Add step.

  4. In the Add Step dialog:

    • For Step type, choose Hive program.

    • For Name, accept the default name (Hive program) or type a new name.

    • For Script S3 location, type or browse to the location of your Hive script. For example: s3://[yourregion].elasticmapreduce.samples/cloudfront/code/Hive_CloudFront.q (replacing [yourregion], including the brackets, with your region; for example, s3://us-west-2.elasticmapreduce.samples/cloudfront/code/Hive_CloudFront.q).

    • For Input S3 location, type or browse to the location of your input data. For example: s3://[yourregion].elasticmapreduce.samples (replacing [yourregion], including the brackets, with your region; for example, s3://us-west-2.elasticmapreduce.samples).

    • For Output S3 location, type or browse to the name of your Amazon S3 output bucket. For more information about creating an output location, see Prepare an Output Location (Optional).

    • For Arguments, leave the field blank.

    • For Action on failure, accept the default option (Continue).

  5. Click Add. The step appears in the console with a status of Pending.

  6. The status of the step changes from Pending to Running to Completed as the step runs. To update the status, click the Refresh icon above the Actions column.
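You can also check step status from the command line. For example, the AWS CLI list-steps subcommand lists each step in the cluster along with its state; replace j-2AXXXXXXGAPLF with your cluster ID:

    aws emr list-steps --cluster-id j-2AXXXXXXGAPLF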

Submit Hive Work Using the AWS CLI or the Amazon EMR CLI

These examples describe how to use the AWS CLI or the Amazon EMR CLI to submit Hive work to a cluster. Using the CLI, you can submit steps when a cluster is launched, or you can submit steps to a long-running cluster.

To launch a cluster and submit a Hive step using the AWS CLI

To submit a Hive step when the cluster is launched, type the --steps parameter, indicate the step type using the Type argument, and provide the necessary argument string.

  • To launch a cluster and submit a Hive step, type the following command. Replace myKey with the name of your Amazon EC2 key pair, replace [yourregion] with the name of your region, and replace mybucket with the name of your Amazon S3 output location.

    • Linux, UNIX, and Mac OS X users:

      aws emr create-cluster --name "Test cluster" --ami-version 3.3 --applications Name=Hue Name=Hive Name=Pig \
      --use-default-roles --ec2-attributes KeyName=myKey \
      --instance-type m3.xlarge --instance-count 3 \
      --steps Type=Hive,Name="Hive Program",ActionOnFailure=CONTINUE,Args=[-f,s3://[yourregion].elasticmapreduce.samples/cloudfront/code/Hive_CloudFront.q,-d,INPUT=s3://[yourregion].elasticmapreduce.samples,-d,OUTPUT=s3://mybucket]
    • Windows users:

      aws emr create-cluster --name "Test cluster" --ami-version 3.3 --applications Name=Hue Name=Hive Name=Pig --use-default-roles --ec2-attributes KeyName=myKey --instance-type m3.xlarge --instance-count 3 --steps Type=Hive,Name="Hive Program",ActionOnFailure=CONTINUE,Args=[-f,s3://[yourregion].elasticmapreduce.samples/cloudfront/code/Hive_CloudFront.q,-d,INPUT=s3://[yourregion].elasticmapreduce.samples,-d,OUTPUT=s3://mybucket]

    When you specify the instance count without using the --instance-groups parameter, a single master node is launched, and the remaining instances are launched as core nodes. All nodes use the instance type specified in the command.
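    If you want to control the node mix yourself, you can use the --instance-groups parameter instead of --instance-type and --instance-count. The following sketch launches the same cluster and step with one master node and two core nodes; the instance counts and types are only illustrative:

      aws emr create-cluster --name "Test cluster" --ami-version 3.3 --applications Name=Hue Name=Hive Name=Pig \
      --use-default-roles --ec2-attributes KeyName=myKey \
      --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=m3.xlarge \
      --steps Type=Hive,Name="Hive Program",ActionOnFailure=CONTINUE,Args=[-f,s3://[yourregion].elasticmapreduce.samples/cloudfront/code/Hive_CloudFront.q,-d,INPUT=s3://[yourregion].elasticmapreduce.samples,-d,OUTPUT=s3://mybucket]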

    Note

    If you have not previously created the default Amazon EMR service role and EC2 instance profile, type aws emr create-default-roles to create them before typing the create-cluster subcommand.

    For more information about using Amazon EMR commands in the AWS CLI, see AWS Command Line Interface Reference.

To submit a Hive step to a running cluster using the AWS CLI

To add a Hive step to a running cluster, type the add-steps subcommand with the --steps parameter, indicate the step type using the Type argument, and provide the necessary argument string.

  • To submit a Hive step to a running cluster, type the following command. Replace j-2AXXXXXXGAPLF with the cluster ID, replace [yourregion] with the name of your region, and replace mybucket with the name of your Amazon S3 output location.

    • Linux, UNIX, and Mac OS X users:

      aws emr add-steps --cluster-id j-2AXXXXXXGAPLF \
      --steps Type=Hive,Name="Hive Program",ActionOnFailure=CONTINUE,Args=[-f,s3://[yourregion].elasticmapreduce.samples/cloudfront/code/Hive_CloudFront.q,-d,INPUT=s3://[yourregion].elasticmapreduce.samples,-d,OUTPUT=s3://mybucket]
    • Windows users:

      aws emr add-steps --cluster-id j-2AXXXXXXGAPLF --steps Type=Hive,Name="Hive Program",ActionOnFailure=CONTINUE,Args=[-f,s3://[yourregion].elasticmapreduce.samples/cloudfront/code/Hive_CloudFront.q,-d,INPUT=s3://[yourregion].elasticmapreduce.samples,-d,OUTPUT=s3://mybucket]
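
    The add-steps subcommand returns the ID of the new step. You can check its progress with the describe-step subcommand; the step ID shown here is only a placeholder:

      aws emr describe-step --cluster-id j-2AXXXXXXGAPLF --step-id s-XXXXXXXXXXXXX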

    For more information about using Amazon EMR commands in the AWS CLI, see AWS Command Line Interface Reference.

To launch a cluster and submit a Hive step using the Amazon EMR CLI

Note

The Amazon EMR CLI is no longer under feature development. Customers are encouraged to use the Amazon EMR commands in the AWS CLI instead.

  • In the directory where you installed the Amazon EMR CLI, type the following command. For more information, see the Command Line Interface Reference for Amazon EMR.

    • Linux, UNIX, and Mac OS X users:

      ./elastic-mapreduce --create --name "Test Hive" --ami-version 3.3 --hive-script \
      s3://elasticmapreduce/samples/hive-ads/libs/model-build.q \
      --args -d,LIBS=s3://elasticmapreduce/samples/hive-ads/libs,\
      -d,INPUT=s3://elasticmapreduce/samples/hive-ads/tables,\
      -d,OUTPUT=s3://mybucket/hive-ads/output/
    • Windows users:

      ruby elastic-mapreduce --create --name "Test Hive" --ami-version 3.3 --hive-script s3://elasticmapreduce/samples/hive-ads/libs/model-build.q --args -d,LIBS=s3://elasticmapreduce/samples/hive-ads/libs,-d,INPUT=s3://elasticmapreduce/samples/hive-ads/tables,-d,OUTPUT=s3://mybucket/hive-ads/output/

    The output looks similar to the following.

    Created cluster JobFlowID

    By default, this command launches a two-node cluster. Later, when your steps are running correctly on a small set of sample data, you can launch clusters with more nodes. You can specify the number of nodes and the instance type with the --num-instances and --instance-type parameters, respectively.
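
    For example, the following sketch runs the same script on a larger cluster; the node count and instance type shown are only illustrative:

      ./elastic-mapreduce --create --name "Test Hive" --ami-version 3.3 \
      --num-instances 5 --instance-type m3.xlarge --hive-script \
      s3://elasticmapreduce/samples/hive-ads/libs/model-build.q \
      --args -d,LIBS=s3://elasticmapreduce/samples/hive-ads/libs,\
      -d,INPUT=s3://elasticmapreduce/samples/hive-ads/tables,\
      -d,OUTPUT=s3://mybucket/hive-ads/output/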