Manage an Amazon EMR Job - AWS Step Functions

Manage an Amazon EMR Job

This sample project demonstrates Amazon EMR and AWS Step Functions integration.

It shows how to create an Amazon EMR cluster, add multiple steps and run them, and then terminate the cluster.

Important

Amazon EMR does not have a free pricing tier. Running the sample project will incur costs. You can find pricing information on the Amazon EMR pricing page. The availability of Amazon EMR service integration is subject to the availability of Amazon EMR APIs. Because of this, this sample project might not work correctly in some AWS Regions. See the Amazon EMR documentation for limitations in special Regions.

Create the State Machine and Provision Resources

  1. Open the Step Functions console and choose Create a state machine.

  2. Choose Sample Projects, and then choose Manage an EMR Job.

    The state machine Code and Visual Workflow are displayed.

    
          Container task notification workflow.
  3. Choose Next.

    The Deploy resources page is displayed, listing the resources that will be created. For this sample project the resources include an Amazon S3 Bucket.

  4. Choose Deploy Resources.

    Note

    It can take up to 10 minutes for these resources and related AWS Identity and Access Management (IAM) permissions to be created. While the Deploy resources page is displayed, you can open the Stack ID link to see which resources are being provisioned.

Start a New Execution

  1. On the New execution page, enter an execution name (optional), and then choose Start Execution.

  2. (Optional) To help identify your execution, you can specify an ID for it in the Enter an execution name box. If you don't enter an ID, Step Functions generates a unique ID automatically.

    Note

    Step Functions allows you to create state machine, execution, and activity names that contain non-ASCII characters. These non-ASCII names don't work with Amazon CloudWatch. To ensure that you can track CloudWatch metrics, choose a name that uses only ASCII characters.

  3. (Optional) You can go to the newly created state machine on the Step Functions Dashboard, and then choose New execution.

  4. When an execution is complete, you can select states on the Visual workflow and browse the Input and Output under Step details.

Example State Machine Code

The state machine in this sample project integrates with Amazon EMR by passing parameters directly to those resources. Browse through this example state machine to see how Step Functions uses a state machine to call the Amazon EMR task synchronously, waits for the task to succeed or fail, and terminates the cluster.

For more information about how AWS Step Functions can control other AWS services, see Service Integrations with AWS Step Functions .

{ "Comment": "An example of the Amazon States Language for running jobs on Amazon EMR", "StartAt": "Create an EMR cluster", "States": { "Create an EMR cluster": { "Type": "Task", "Resource": "arn:<PARTITION>:states:::elasticmapreduce:createCluster.sync", "Parameters": { "Name": "ExampleCluster", "VisibleToAllUsers": true, "ReleaseLabel": "emr-5.26.0", "Applications": [ { "Name": "Hive" } ], "ServiceRole": "<EMR_SERVICE_ROLE>", "JobFlowRole": "<EMR_EC2_INSTANCE_PROFILE>", "LogUri": "s3://<EMR_LOG_S3_BUCKET>/logs/", "Instances": { "KeepJobFlowAliveWhenNoSteps": true, "InstanceFleets": [ { "Name": "MyMasterFleet", "InstanceFleetType": "MASTER", "TargetOnDemandCapacity": 1, "InstanceTypeConfigs": [ { "InstanceType": "m5.xlarge" } ] }, { "Name": "MyCoreFleet", "InstanceFleetType": "CORE", "TargetOnDemandCapacity": 1, "InstanceTypeConfigs": [ { "InstanceType": "m5.xlarge" } ] } ] } }, "ResultPath": "$.cluster", "Next": "Run first step" }, "Run first step": { "Type": "Task", "Resource": "arn:<PARTITION>:states:::elasticmapreduce:addStep.sync", "Parameters": { "ClusterId.$": "$.cluster.ClusterId", "Step": { "Name": "My first EMR step", "ActionOnFailure": "CONTINUE", "HadoopJarStep": { "Jar": "command-runner.jar", "Args": ["<COMMAND_ARGUMENTS>"] } } }, "Retry" : [ { "ErrorEquals": [ "States.ALL" ], "IntervalSeconds": 1, "MaxAttempts": 3, "BackoffRate": 2.0 } ], "ResultPath": "$.firstStep", "Next": "Run second step" }, "Run second step": { "Type": "Task", "Resource": "arn:<PARTITION>:states:::elasticmapreduce:addStep.sync", "Parameters": { "ClusterId.$": "$.cluster.ClusterId", "Step": { "Name": "My second EMR step", "ActionOnFailure": "CONTINUE", "HadoopJarStep": { "Jar": "command-runner.jar", "Args": ["<COMMAND_ARGUMENTS>"] } } }, "Retry" : [ { "ErrorEquals": [ "States.ALL" ], "IntervalSeconds": 1, "MaxAttempts": 3, "BackoffRate": 2.0 } ], "ResultPath": "$.secondStep", "Next": "Terminate Cluster" }, "Terminate Cluster": { "Type": "Task", "Resource": "arn:<PARTITION>:states:::elasticmapreduce:terminateCluster", "Parameters": { "ClusterId.$": "$.cluster.ClusterId" }, "End": true } } }

IAM Example

This example AWS Identity and Access Management (IAM) policy generated by the sample project includes the least privilege necessary to execute the state machine and related resources. It's a best practice to include only those permissions that are necessary in your IAM policies.

{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "elasticmapreduce:RunJobFlow", "elasticmapreduce:DescribeCluster", "elasticmapreduce:TerminateJobFlows" ], "Resource": "*" }, { "Effect": "Allow", "Action": "iam:PassRole", "Resource": [ "arn:aws:iam::123456789012:role/StepFunctionsSample-EMRJobManagement-EMRServiceRole-ANPAJ2UCCR6DPCEXAMPLE", "arn:aws:iam::123456789012:role/StepFunctionsSample-EMRJobManagementWJALRXUTNFEMI-ANPAJ2UCCR6DPCEXAMPLE-EMREc2InstanceProfile-1ANPAJ2UCCR6DPCEXAMPLE" ] }, { "Effect": "Allow", "Action": [ "events:PutTargets", "events:PutRule", "events:DescribeRule" ], "Resource": [ "arn:aws:events:sa-east-1:123456789012:rule/StepFunctionsGetEventForEMRRunJobFlowRule" ] } ] }

The following policy ensures that addStep has sufficient permissions.

{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "elasticmapreduce:AddJobFlowSteps", "elasticmapreduce:DescribeStep", "elasticmapreduce:CancelSteps" ], "Resource": "arn:aws:elasticmapreduce:*:*:cluster/*" }, { "Effect": "Allow", "Action": [ "events:PutTargets", "events:PutRule", "events:DescribeRule" ], "Resource": [ "arn:aws:events:sa-east-1:123456789012:rule/StepFunctionsGetEventForEMRAddJobFlowStepsRule" ] } ] } }

For information about how to configure IAM when using Step Functions with other AWS services, see IAM Policies for Integrated Services.