Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)

JSON Configuration Files

When Amazon EMR creates a Hadoop cluster, each node contains a pair of JSON files that hold configuration information about the node and the currently running cluster. These files are in the /mnt/var/lib/info directory and are accessible to scripts running on the node.

Node Settings

Settings for an Amazon EMR cluster node are contained in the instance.json file.

The following table describes the contents of the instance.json file.

Parameter    Description
isMaster

Indicates that this node is the master node.

Type: Boolean

isRunningNameNode

Indicates that this node is running the Hadoop name node daemon.

Type: Boolean

isRunningDataNode

Indicates that this node is running the Hadoop data node daemon.

Type: Boolean

isRunningJobTracker

Indicates that this node is running the Hadoop job tracker daemon.

Type: Boolean

isRunningTaskTracker

Indicates that this node is running the Hadoop task tracker daemon.

Type: Boolean

Hadoop 2.2.0 adds the following parameters to the instance.json file.

Parameter    Description
isRunningResourceManager

Indicates that this node is running the Hadoop resource manager daemon.

Type: Boolean

isRunningNodeManager

Indicates that this node is running the Hadoop node manager daemon.

Type: Boolean

The following example shows the contents of an instance.json file:

{
     "instanceGroupId":"Instance_Group_ID",
            "isMaster": Boolean,
   "isRunningNameNode": Boolean,
   "isRunningDataNode": Boolean,
 "isRunningJobTracker": Boolean,
"isRunningTaskTracker": Boolean
}
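
A script running on the node can read these flags directly. The following is a minimal Python sketch, assuming the file is valid JSON at the path given above. The helper names load_instance_info and running_daemons are hypothetical and are not part of Amazon EMR.

```python
import json

# Location given in this guide; passed as a parameter so the helpers
# can also be pointed at a copy of the file.
INSTANCE_JSON = "/mnt/var/lib/info/instance.json"


def load_instance_info(path=INSTANCE_JSON):
    """Parse instance.json and return its contents as a dict."""
    with open(path) as f:
        return json.load(f)


def running_daemons(info):
    """Return the sorted daemon names whose isRunning* flag is true."""
    prefix = "isRunning"
    return sorted(
        key[len(prefix):]
        for key, value in info.items()
        if key.startswith(prefix) and value is True
    )
```

On a node, calling running_daemons(load_instance_info()) lists the daemons the file reports as running, and load_instance_info().get("isMaster") tells the script whether it is on the master node.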

To identify settings in a JSON file using a bootstrap action

This procedure demonstrates how to run the echo command to display the string Running on master node, on the master node only, by evaluating the JSON file parameter instance.isMaster.

  • In the directory where you installed the Amazon EMR CLI, run the following from the command line. For more information, see the Command Line Interface Reference for Amazon EMR.

    • Linux, UNIX, and Mac OS X users:

      ./elastic-mapreduce --create --alive --name "RunIf" \
      --bootstrap-action s3://elasticmapreduce/bootstrap-actions/run-if \
      --bootstrap-name "Run only on master" \
      --args "instance.isMaster=true,echo,'Running on master node'"
    • Windows users:

      ruby elastic-mapreduce --create --alive --name "RunIf" --bootstrap-action s3://elasticmapreduce/bootstrap-actions/run-if --bootstrap-name "Run only on master" --args "instance.isMaster=true,echo,'Running on master node'"
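
To make the conditional logic concrete, the following Python sketch shows one way a script on the node could evaluate a condition such as instance.isMaster=true and run a command only when it holds. This is not the implementation of the run-if bootstrap action. It assumes that the part before the first dot selects the JSON file (instance maps to instance.json), that the rest names a top-level key, and that the right-hand side is a JSON literal. The function names condition_holds and run_if are hypothetical.

```python
import json
import subprocess

INFO_DIR = "/mnt/var/lib/info"  # directory named in this guide


def condition_holds(condition, info_dir=INFO_DIR):
    """Evaluate a condition of the form '<file>.<key>=<json literal>'.

    Assumption: '<file>' is the JSON file name without its extension,
    so 'instance.isMaster=true' reads isMaster from instance.json.
    """
    target, expected = condition.split("=", 1)
    file_stem, key = target.split(".", 1)
    with open(f"{info_dir}/{file_stem}.json") as f:
        info = json.load(f)
    return info.get(key) == json.loads(expected)


def run_if(condition, command, info_dir=INFO_DIR):
    """Run command (a list of arguments) only when the condition holds.

    Returns the command's standard output, or None if it was skipped.
    """
    if not condition_holds(condition, info_dir):
        return None
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout
```

For example, run_if("instance.isMaster=true", ["echo", "Running on master node"]) runs echo on the master node and returns None on every other node.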

Cluster Configuration

Information about the currently running cluster is contained in the job-flow.json file.

The following table describes the contents of the job-flow.json file.

Parameter    Description
jobFlowId

Contains the ID for the cluster.

Type: String

jobFlowCreationInstant

Contains the time that the cluster was created.

Type: Long

instanceCount

Contains the number of nodes in an instance group.

Type: Integer

masterInstanceId

Contains the ID for the master node.

Type: String

masterPrivateDnsName

Contains the private DNS name of the master node.

Type: String

masterInstanceType

Contains the EC2 instance type of the master node.

Type: String

slaveInstanceType

Contains the EC2 instance type of the slave nodes.

Type: String

hadoopVersion

Contains the version of Hadoop running on the cluster.

Type: String

instanceGroups

A list of objects specifying each instance group in the cluster. Each object contains the following fields:

instanceGroupId: the unique identifier for this instance group.

Type: String

instanceGroupName: the user-defined name of the instance group.

Type: String

instanceRole: one of Master, Core, or Task.

Type: String

instanceType: the Amazon EC2 type of the node, such as "m1.small".

Type: String

requestedInstanceCount: the target number of nodes for this instance group.

Type: Long

The following example shows the contents of a job-flow.json file.

{
             "jobFlowId":"JobFlowID",
"jobFlowCreationInstant": CreationInstant,
         "instanceCount": Count,
      "masterInstanceId":"MasterInstanceID",
  "masterPrivateDnsName":"Name",
    "masterInstanceType":"Amazon_EC2_Instance_Type",
     "slaveInstanceType":"Amazon_EC2_Instance_Type",
         "hadoopVersion":"Version",
        "instanceGroups":
            [
                {
                 "instanceGroupId":"InstanceGroupID",
               "instanceGroupName":"Name",
                    "instanceRole":"Master",
                      "marketType":"Type",
                    "instanceType":"AmazonEC2InstanceType",
          "requestedInstanceCount": Count
                },
                {
                 "instanceGroupId":"InstanceGroupID",
               "instanceGroupName":"Name",
                    "instanceRole":"Core",
                      "marketType":"Type",
                    "instanceType":"AmazonEC2InstanceType",
          "requestedInstanceCount": Count
                },
                {
                 "instanceGroupId":"InstanceGroupID",
               "instanceGroupName":"Name",
                    "instanceRole":"Task",
                      "marketType":"Type",
                    "instanceType":"AmazonEC2InstanceType",
          "requestedInstanceCount": Count
                }
           ]
}
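
As an illustration of how a script on the node might use this file, the following Python sketch totals the requested node count for each instance role. It assumes the file is valid JSON with the field names shown in the example above. The function name requested_nodes_by_role is hypothetical.

```python
import json
from collections import defaultdict

# Location given in this guide; passed as a parameter so the helper
# can also be pointed at a copy of the file.
JOB_FLOW_JSON = "/mnt/var/lib/info/job-flow.json"


def requested_nodes_by_role(path=JOB_FLOW_JSON):
    """Sum requestedInstanceCount per instanceRole across all instance groups."""
    with open(path) as f:
        job_flow = json.load(f)
    totals = defaultdict(int)
    for group in job_flow.get("instanceGroups", []):
        totals[group["instanceRole"]] += group["requestedInstanceCount"]
    return dict(totals)
```

The result is a dictionary keyed by role (Master, Core, or Task). A script could use it, for instance, to size a workload to the number of core nodes requested.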