Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)

Command Line Interface Options

The Amazon EMR command line interface (CLI) supports the following options.

-a ACCESS_ID

Sets the AWS access identifier.

--access-id ACCESS_ID

Sets the AWS access identifier.

--active

Modifies a command to apply only to clusters in the RUNNING, STARTING or WAITING states. Used with --list.

Usage: View Cluster Details
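For example, the following command lists only the clusters that are currently starting, running, or waiting:

```shell
# List only clusters in the RUNNING, STARTING, or WAITING states.
./elastic-mapreduce --list --active
```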

--add-instance-group INSTANCE_ROLE

Adds an instance group to an existing cluster. The only role supported for an added instance group is task.

Usage: Resize a Running Cluster, Change the Number of Spot Instances in a Cluster

--alive

Used with --create to launch a cluster that continues running even after completing all its steps. Interactive clusters require this option.

Usage: Add Steps to a Cluster
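As a sketch, the following command launches a long-running interactive cluster (the cluster name is illustrative):

```shell
# --alive keeps the cluster running after all steps complete,
# so you can connect to it and add steps later.
./elastic-mapreduce --create --alive --name "Interactive cluster"
```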

--all

Modifies a command to apply to all clusters, regardless of their status. Used with --list, it lists all the clusters created in the last two weeks.

--ami-version AMI_VERSION

Used with --create to specify the version of the AMI to use when launching the cluster. This setting also determines the version of Hadoop to install, because the --hadoop-version parameter is no longer supported.

Usage: Choose a Machine Image

--apps-path APPLICATION_PATH

Specifies the Amazon S3 path to the base of the Amazon EMR bucket to use, for example: s3://elasticmapreduce.

--arg ARG

Passes in a single argument value to a script or application running on the cluster.

Note

When used in a Hadoop streaming cluster, if you use the --arg options, they must immediately follow the --stream option.

Usage: Launch a Hive Cluster, Launch a Pig Cluster, Launch a Cascading Cluster, Add Steps to a Cluster, Create Bootstrap Actions to Install Additional Software (Optional)

--args ARG1,ARG2,ARG3,...

Passes in multiple arguments, separated by commas, to a script or application running on the cluster. This is shorthand for specifying multiple --arg options. Because --args does not support escaping, every comma character (,) is treated as a separator; to pass an argument that itself contains a comma, use the --arg option, which does not treat the comma as a separator. The argument string may be surrounded with double quotes. You can also use double quotes when passing arguments that contain whitespace characters.

Note

When used in a Hadoop streaming cluster, if you use the --args option, it must immediately follow the --stream option.

Usage: Launch a Hive Cluster, Launch a Pig Cluster, Launch a Cascading Cluster, Add Steps to a Cluster, Create Bootstrap Actions to Install Additional Software (Optional)
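To illustrate the difference (the cluster ID, bucket, and JAR names are hypothetical), the first step below passes four arguments in shorthand form; the second shows --arg, which is required when an argument itself contains a comma:

```shell
# Shorthand: arguments separated by commas.
./elastic-mapreduce -j j-ABABABABABABA --jar s3://mybucket/wordcount.jar \
  --args "-input,s3://mybucket/input,-output,s3://mybucket/output"

# Explicit form: use --arg when an argument contains a comma.
./elastic-mapreduce -j j-ABABABABABABA --jar s3://mybucket/wordcount.jar \
  --arg "-grep" --arg "a,b,c"
```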

--availability-zone AVAILABILITY_ZONE

The Availability Zone to launch the cluster in. For more information about Availability Zones supported by Amazon EMR, see Regions and Endpoints in the Amazon Web Services General Reference.

--backup-dir BACKUP_LOCATION

The directory where an HBase backup exists or should be created.

Usage: Back Up and Restore HBase

--backup-version VERSION_NUMBER

Specifies the version number of an existing HBase backup to restore.

Usage: Back Up and Restore HBase

--beta-path BETA_APPLICATION_PATH

Specifies the Amazon S3 path to the base of the Amazon EMR bucket to use, for example: s3://beta.elasticmapreduce.

--bid-price BID_PRICE

The bid price, in U.S. dollars, for a group of Spot Instances.

Usage: Launch Spot Instances in a Cluster
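For example, the following sketch (the cluster ID and bid price are illustrative) adds a group of Spot Instances to a running cluster:

```shell
# Add a task instance group of four Spot Instances at a $0.10/hour bid.
./elastic-mapreduce -j j-ABABABABABABA --add-instance-group task \
  --instance-type m1.large --instance-count 4 --bid-price 0.10
```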

--bootstrap-action LOCATION_OF_BOOTSTRAP_ACTION_SCRIPT

Used with --create to specify a bootstrap action to run when the cluster launches. The location of the bootstrap action script is typically a location in Amazon S3. You can add more than one bootstrap action to a cluster.

Usage: Create Bootstrap Actions to Install Additional Software (Optional)
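A minimal sketch, assuming a bootstrap script uploaded to your own Amazon S3 bucket (the bucket and script name are hypothetical):

```shell
# Run a custom script on every node before Hadoop starts.
./elastic-mapreduce --create --alive \
  --bootstrap-action s3://mybucket/bootstrap/install-tools.sh \
  --bootstrap-name "Install tools"
```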

--bootstrap-name BOOTSTRAP_NAME

Sets the name of the bootstrap action.

Usage: Create Bootstrap Actions to Install Additional Software (Optional)

-c CREDENTIALS_FILE

Specifies the credentials file that contains the AWS access identifier and the AWS private key to use when contacting the Amazon EMR web service.

For CLI access, you need an access key ID and secret access key. Use IAM user access keys instead of AWS root account access keys. IAM lets you securely control access to AWS services and resources in your AWS account. For more information about creating access keys, see How Do I Get Security Credentials? in the AWS General Reference.

--cache FILE_LOCATION#NAME_OF_FILE_IN_CACHE

Adds an individual file to the Distributed Cache.

Usage: Import files using Distributed Cache

--cache-archive LOCATION#NAME_OF_ARCHIVE

Adds an archive file to the Distributed Cache.

Usage: Import files using Distributed Cache

--consistent

Pauses all write operations to the HBase cluster during the backup process, to ensure a consistent backup.

Usage: Back Up and Restore HBase
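For example, the following sketch (the cluster ID and bucket are hypothetical) takes a one-time consistent backup:

```shell
# Pause writes to HBase while backing it up to Amazon S3.
./elastic-mapreduce -j j-ABABABABABABA --hbase-backup \
  --backup-dir s3://mybucket/backups/hbase --consistent
```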

--create

Launches a new cluster.

Usage: Launch a Streaming Cluster, Launch a Hive Cluster, Launch a Pig Cluster, Launch a Cascading Cluster, Launch an HBase Cluster on Amazon EMR

--created-after=DATETIME

Lists all clusters created after the specified time and date in XML date-time format.

--created-before=DATETIME

Lists all clusters created before the specified time and date in XML date-time format.

--credentials CREDENTIALS_FILE

Specifies the credentials file that contains the AWS access identifier and the AWS private key to use when contacting the Amazon EMR web service.

For CLI access, you need an access key ID and secret access key. Use IAM user access keys instead of AWS root account access keys. IAM lets you securely control access to AWS services and resources in your AWS account. For more information about creating access keys, see How Do I Get Security Credentials? in the AWS General Reference.

--eip ELASTIC_IP

Associates an Elastic IP address with the master node. If no Elastic IP address is specified, allocates a new one and associates it with the master node. For more information, see Associate an Elastic IP Address with a Cluster.

--enable-debugging

Used with --create to launch a cluster with debugging enabled.

Usage: Configure Logging and Debugging (Optional)

--endpoint ENDPOINT

Specifies the endpoint of the Amazon EMR web service to connect to.

--debug

Prints stack traces when exceptions occur.

--describe

Returns information about the specified cluster or clusters.

Usage: View Cluster Details

--disable-full-backups

Turns off scheduled full HBase backups by passing this flag into a call with --hbase-schedule-backup.

Usage: Back Up and Restore HBase

--disable-incremental-backups

Turns off scheduled incremental HBase backups by passing this flag into a call with --hbase-schedule-backup.

Usage: Back Up and Restore HBase

--full-backup-time-interval INTERVAL

An integer that specifies the number of time units to elapse between automated full backups of the HBase cluster.

Usage: Back Up and Restore HBase

--full-backup-time-unit TIME_UNIT

The unit of time to use with --full-backup-time-interval to specify how often automatically scheduled HBase backups should run. This can take any one of the following values: minutes, hours, days.

Usage: Back Up and Restore HBase
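Putting the scheduling options together, the following sketch (the cluster ID, bucket, and start time are illustrative) schedules a weekly full backup:

```shell
# Schedule a full HBase backup every 7 days, starting at the given time.
./elastic-mapreduce -j j-ABABABABABABA --hbase-schedule-backup \
  --full-backup-time-interval 7 --full-backup-time-unit days \
  --backup-dir s3://mybucket/backups/hbase \
  --start-time 2014-06-01T20:00Z
```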

--get SOURCE

Copies the specified file from the master node using scp.

-h

Displays help information from the CLI.

--hbase

Used to launch an HBase cluster.

Usage: Launch an HBase Cluster on Amazon EMR

--hbase-backup

Creates a one-time backup of HBase data to the location specified by --backup-dir.

Usage: Back Up and Restore HBase

--hbase-restore

Restores a backup from the location specified by --backup-dir and (optionally) the version specified by --backup-version.

Usage: Back Up and Restore HBase

--hbase-schedule-backup

Schedules an automated backup of HBase data.

Usage: Back Up and Restore HBase

--help

Displays help information from the CLI.

--hive-interactive

Used with --create to launch a cluster with Hive installed.

Usage: Interactive and Batch Hive Clusters

--hive-script HIVE_SCRIPT_LOCATION

The Hive script to run in the cluster.

Usage: Interactive and Batch Hive Clusters

--hive-site HIVE_SITE_LOCATION

Installs the configuration values in hive-site.xml in the specified location. The --hive-site parameter overrides only the values defined in hive-site.xml.

Usage: Create a Metastore Outside the Hadoop Cluster, Additional Features of Hive in Amazon EMR

--hive-versions HIVE_VERSIONS

The Hive version or versions to load. This can be a Hive version number or "latest" to load the latest version. When you specify more than one Hive version, separate the versions with a comma.

Usage: Supported Hive Versions
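For example, the following sketch (the cluster name is illustrative) launches an interactive cluster with the latest Hive version installed:

```shell
# Launch an interactive cluster running the latest supported Hive version.
./elastic-mapreduce --create --alive --name "Hive cluster" \
  --hive-interactive --hive-versions latest
```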

--impala-conf OPTIONS

Use with the --create and --impala-interactive options to provide command-line parameters for Impala to parse.

The parameters are key/value pairs in the format "key1=value1,key2=value2,…". For example, to set the Impala startup options IMPALA_BACKEND_PORT and IMPALA_MEM_LIMIT, use the following command:

./elastic-mapreduce --create --alive --instance-type m1.large --instance-count 3 --ami-version 3.0.2 --impala-interactive --impala-conf "IMPALA_BACKEND_PORT=22001,IMPALA_MEM_LIMIT=70%"

--impala-interactive

Use with the --create option to launch an Amazon EMR cluster with Impala installed.

Usage: Launch the Cluster

--impala-output PATH

Use with the --impala-script option to store Impala script output to an Amazon S3 bucket using the syntax --impala-output s3-path.

--impala-script PATH

Use with the --create option to add a step to a cluster to run an Impala query file stored in Amazon S3 using the syntax --impala-script s3-path. For example:

./elastic-mapreduce --create --alive --instance-type m1.large --instance-count 3 --ami-version 3.0.2 --impala-script s3://my-bucket/script-name.sql --impala-output s3://my-bucket/ --impala-conf "IMPALA_MEM_LIMIT=50%"

When using --impala-script with --create, the --impala-version and --impala-conf options will also function. It is acceptable, but unnecessary, to use --impala-interactive and --impala-script in the same command when creating a cluster. The effect is equivalent to using --impala-script alone.

Alternatively, you can add a step to an existing cluster, but you must already have installed Impala on the cluster. For example:

./elastic-mapreduce -j cluster-id --impala-script s3://my-bucket/script-name.sql --impala-output s3://my-bucket/

If you try to use --impala-script to add a step to a cluster where Impala is not installed, you will get an error message similar to Error: Impala is not installed.

--incremental-backup-time-interval TIME_INTERVAL

An integer that specifies the number of time units to elapse between automated incremental backups of the HBase cluster. Used with --hbase-schedule-backup, this parameter creates regularly scheduled incremental backups. If a full backup is scheduled at the same time as an incremental backup, only the full backup is created. Used with --incremental-backup-time-unit.

Usage: Back Up and Restore HBase

--incremental-backup-time-unit TIME_UNIT

The unit of time to use with --incremental-backup-time-interval to specify how often automatically scheduled incremental HBase backups should run. This can take any one of the following values: minutes, hours, days.

Usage: Back Up and Restore HBase

--info INFO

Specifies additional information during cluster creation.

--input LOCATION_OF_INPUT_DATA

Specifies the input location for the cluster.

Usage: Launch a Streaming Cluster

--instance-count INSTANCE_COUNT

Sets the count of nodes for an instance group.

Usage: Resize a Running Cluster, Change the Number of Spot Instances in a Cluster

--instance-group INSTANCE_GROUP_TYPE

Sets the instance group type. Valid types are MASTER, CORE, and TASK.

Usage: Resize a Running Cluster

--instance-type INSTANCE_TYPE

Sets the type of EC2 instance used to create nodes for an instance group.

Usage: Resize a Running Cluster, Launch Spot Instances in a Cluster

-j JOB_FLOW_IDENTIFIER

Specifies the cluster with the given cluster identifier.

Usage: View Cluster Details, Add Steps to a Cluster, Resize a Running Cluster, Change the Number of Spot Instances in a Cluster

--jar JAR_FILE_LOCATION

Specifies the location of a Java archive (JAR) file. Typically, the JAR file is stored in an Amazon S3 bucket.

Usage: Resize a Running Cluster, Distributed Copy Using S3DistCp

--jobconf KEY=VALUE

Specifies jobconf arguments to pass to a streaming cluster, for example mapred.task.timeout=800000.

--jobflow JOB_FLOW_IDENTIFIER

Specifies the cluster with the given cluster identifier.

Usage: View Cluster Details, Add Steps to a Cluster, Resize a Running Cluster, Change the Number of Spot Instances in a Cluster

--jobflow-role IAM_ROLE_NAME

Launches the EC2 instances of a cluster with the specified IAM role.

Usage: Configure IAM Roles for Amazon EMR

--json JSON_FILE

Adds a sequence of steps stored in the specified JSON file to the cluster.

--key-pair KEY_PAIR_PEM_FILE

The name of the Amazon EC2 key pair to set as the connection credential when you launch the cluster.

--key-pair-file FILE_PATH

The path to the local PEM file of the Amazon EC2 key pair to set as the connection credential when you launch the cluster.

--list

Lists clusters created in the last two days.

Usage: View Cluster Details

--logs

Displays the step logs for the step most recently executed.

--log-uri

Specifies the Amazon S3 bucket to receive log files. Used with --create.

Usage: View HBase Log Files

--main-class

Specifies the JAR file's main class. This parameter is not needed if your JAR file has a manifest.

Usage: Add Steps to a Cluster

--mapper LOCATION_OF_MAPPER_CODE

The name of a Hadoop built-in class or the location of a mapper script.

Usage: Launch a Streaming Cluster

--master-instance-type INSTANCE_TYPE

The type of EC2 instances to launch as the master nodes in the cluster.

Usage: Build Binaries Using Amazon EMR

--modify-instance-group INSTANCE_GROUP_ID

Modifies an existing instance group.

Usage: Resize a Running Cluster, Change the Number of Spot Instances in a Cluster
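For example, the following sketch changes the size of an existing instance group (the cluster ID j-ABABABABABABA and group ID ig-XXXXXXXXXXXX are placeholders):

```shell
# Change the instance count of an existing instance group.
./elastic-mapreduce -j j-ABABABABABABA \
  --modify-instance-group ig-XXXXXXXXXXXX --instance-count 8
```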

--name "JOB_FLOW_NAME"

Specifies a name for the cluster. This can only be set when the cluster is created.

Usage: Launch a Streaming Cluster, Launch a Hive Cluster, Launch a Pig Cluster, Launch a Cascading Cluster, Launch an HBase Cluster on Amazon EMR

--no-wait

Don't wait for the master node to start before executing scp or ssh, or before assigning an Elastic IP address.

--no-steps

Prevents the CLI from listing steps when listing clusters.

--num-instances NUMBER_OF_INSTANCES

Used with --create and --modify-instance-group to specify the number of EC2 instances in the cluster.

Usage: Launch a Streaming Cluster, Launch a Hive Cluster, Launch a Pig Cluster, Launch a Cascading Cluster, Launch an HBase Cluster on Amazon EMR, Change the Number of Spot Instances in a Cluster

--output LOCATION_OF_JOB_FLOW_OUTPUT

Specifies the output location for the cluster.

Usage: Launch a Streaming Cluster

-p PRIVATE_KEY

Specifies the AWS private key to use when contacting the Amazon EMR web service.

--param VARIABLE=VALUE ARGS

Substitutes the string VARIABLE with the string VALUE in the JSON file.

--pig-interactive

Used with --create to launch a cluster with Pig installed.

Usage: Launch a Pig Cluster

--pig-versions VERSION

Specifies the version or versions of Pig to install on the cluster. If specifying more than one version of Pig, separate the versions with commas.

Usage: Supported Pig Versions

--pig-script PIG_SCRIPT_LOCATION

The Pig script to run in the cluster.

Usage: Launch a Pig Cluster

--plain-output

Returns the cluster identifier from the create step as simple text.

--put SOURCE

Copies a file to the master node using scp.

--print-hive-version

Prints the version of Hive that is currently active on the cluster.

Usage: Supported Hive Versions

--private-key PRIVATE_KEY

Specifies the AWS private key to use when contacting the Amazon EMR web service.

--reducer REDUCER

The name of a Hadoop built-in class or the location of a reducer script.

Usage: Launch a Streaming Cluster

--region REGION

Specifies the region in which to launch the cluster.

Usage: Choose an AWS Region

--resize-jobflow

Adds a step to resize the cluster.

--scp FILE_TO_COPY

Copies a file from your local directory to the master node of the cluster.

Usage: Add Steps to a Cluster

--script SCRIPT_LOCATION

Specifies the location of a script. Typically, the script is stored in an Amazon S3 bucket.

--set-termination-protection TERMINATION_PROTECTION_STATE

Enables or disables termination protection on the specified cluster or clusters. To enable termination protection, set this value to true. To disable termination protection, set this value to false.

Usage: Protect a Cluster from Termination
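For example (the cluster ID is a placeholder):

```shell
# Protect a cluster from accidental termination, then later remove
# the protection so the cluster can be terminated.
./elastic-mapreduce --set-termination-protection true -j j-ABABABABABABA
./elastic-mapreduce --set-termination-protection false -j j-ABABABABABABA
```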

--set-visible-to-all-users BOOLEAN

Makes the instances in an existing cluster visible to all IAM users of the AWS account that launched the cluster.

Usage: Configure IAM User Permissions

--slave-instance-type

The type of EC2 instances to launch as the slave nodes in the cluster.

--socks

Uses SSH to create a tunnel to the master node of the specified cluster. You can then use this as a SOCKS proxy to view web interfaces hosted on the master node.

Usage: Open an SSH Tunnel to the Master Node, Configure FoxyProxy to View Websites Hosted on the Master Node

--ssh COMMAND

Uses SSH to connect to the master node of the specified cluster and, optionally, run a command. This option requires that you have an SSH client, such as OpenSSH, installed on your desktop.

Usage: Connect to the Master Node Using SSH
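A sketch, assuming the cluster was launched with an Amazon EC2 key pair and the CLI can find the matching PEM file (for example, via --key-pair-file); the cluster ID is a placeholder:

```shell
# Open an interactive SSH session on the master node.
./elastic-mapreduce -j j-ABABABABABABA --ssh

# Or run a single command on the master node and return.
./elastic-mapreduce -j j-ABABABABABABA --ssh "hadoop fs -ls /"
```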

--start-time START_TIME

Specifies the time that an HBase backup schedule should start, in ISO date-time format. If this is not set, the first backup begins immediately. You can use this option to ensure that your first data load process has completed before the initial backup runs, or to have the backup occur at a specific time each day.

Usage: Back Up and Restore HBase

--state JOB_FLOW_STATE

Specifies the state of the cluster. The cluster state will be one of the following values: STARTING, RUNNING, WAITING, TERMINATED.

Usage: View Cluster Details

--step-name

Specifies a name for a cluster step.

--step-action

Specifies the action the cluster should take when the step finishes. This can be one of CANCEL_AND_WAIT, TERMINATE_JOB_FLOW, or CONTINUE.

--stream

Used with --create and --arg to launch a streaming cluster.

Note

The --arg option must immediately follow the --stream option.

Usage: Launch a Streaming Cluster
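The streaming options above combine into a complete launch command; the bucket names and mapper script are illustrative, and aggregate is a Hadoop built-in reducer:

```shell
# Launch a streaming cluster with a Python mapper and a built-in reducer.
./elastic-mapreduce --create --stream \
  --input s3://mybucket/input \
  --output s3://mybucket/output \
  --mapper s3://mybucket/mapper.py \
  --reducer aggregate
```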

--subnet SUBNET_IDENTIFIER

Launches a cluster in an Amazon VPC subnet.

Usage: Select an Amazon VPC Subnet for the Cluster (Optional)

--supported-product PRODUCT

Installs third-party software on an Amazon EMR cluster; for example, a third-party distribution of Hadoop. It accepts optional arguments for the third-party software to read and act on, and is used with --create to launch a cluster that can use the specified third-party applications. Versions 2013-03-19 and newer of the Amazon EMR CLI accept optional arguments using the --args parameter.

--tag

Manages tags associated with Amazon EMR resources.

Usage: Tagging Amazon EMR Clusters

--terminate

Terminates the specified cluster or clusters.

Usage: Terminate a Cluster

--to DESTINATION

Specifies the destination location when copying files to and from the master node using scp.
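For example (the paths and cluster ID are illustrative), --to sets the destination for both --put and --get:

```shell
# Copy a local script to the master node, then fetch a result file back.
./elastic-mapreduce -j j-ABABABABABABA --put myscript.sh --to /home/hadoop
./elastic-mapreduce -j j-ABABABABABABA --get /home/hadoop/results.txt --to .
```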

--trace

Traces commands made to the web service.

--unarrest-instance-group INSTANCE_ROLE

Unarrests an instance group of the cluster.

-v

Turns on verbose logging of program interaction.

--verbose

Turns on verbose logging of program interaction.

--version

Displays the version of the CLI.

Usage: Command Line Interface Releases

--visible-to-all-users

Makes a cluster visible to all IAM users. Used with --create.

Usage: Configure IAM User Permissions

--wait-for-steps

Causes the cluster to wait until a step has completed.

Usage: Add Steps to a Cluster

--with-termination-protection

Used with --create to launch the cluster with termination protection enabled.

Usage: Protect a Cluster from Termination