Process a CSV file from Amazon S3 using a Distributed Map - AWS Step Functions

This sample project demonstrates how to use the Distributed Map state to iterate over the 10,000 rows of a CSV file generated by a Lambda function. The CSV file contains shipping information for customer orders and is stored in an Amazon S3 bucket. The Distributed Map processes the rows in batches of 10 for data analysis.
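
To make the batching arithmetic concrete, the following minimal sketch splits 10,000 rows into batches of 10. Step Functions performs this batching itself; the helper `batch_rows` and the `order_id` column are hypothetical and only illustrate how many batches result.

```python
from typing import Iterator, List, Sequence


def batch_rows(rows: Sequence[dict], batch_size: int = 10) -> Iterator[List[dict]]:
    """Yield consecutive slices of `rows`, each holding at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(rows), batch_size):
        yield list(rows[start:start + batch_size])


# 10,000 placeholder rows stand in for the generated CSV file (hypothetical column).
rows = [{"order_id": str(i)} for i in range(10_000)]
batches = list(batch_rows(rows))

print(len(batches))     # 1000
print(len(batches[0]))  # 10
```

With these numbers, the workload amounts to 1,000 batches of 10 rows each.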

The Distributed Map contains a Lambda function that detects any delayed orders. It also contains an Inline Map that processes the delayed orders in a batch and returns them in an array. For each delayed order, the Inline Map sends a message to an Amazon SQS queue. Finally, the sample project stores the Map Run results in another Amazon S3 bucket in your AWS account.
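
The detection step might look roughly like the sketch below. It is not the sample project's actual function: the column names (`customer_id`, `order_id`, `promised_date`, `delivered_date`) and the rule "delivered later than promised" are assumptions made for illustration. It also assumes that each batch arrives under an `Items` key, which is how batched items are passed to a child workflow, and that the function returns the delayed orders as a list.

```python
from datetime import date
from typing import Any, Dict, List


def lambda_handler(event: Dict[str, Any], context: Any) -> List[Dict[str, str]]:
    """Return the orders in one batch that were delivered later than promised."""
    delayed = []
    for row in event.get("Items", []):
        # CSV values arrive as strings, so parse the ISO dates before comparing.
        promised = date.fromisoformat(row["promised_date"])    # hypothetical column
        delivered = date.fromisoformat(row["delivered_date"])  # hypothetical column
        if delivered > promised:
            delayed.append({"customer_id": row["customer_id"], "order_id": row["order_id"]})
    return delayed


# A two-row batch with made-up values; only the second order is late.
sample_batch = {
    "Items": [
        {"customer_id": "c-1", "order_id": "o-1",
         "promised_date": "2024-01-10", "delivered_date": "2024-01-09"},
        {"customer_id": "c-2", "order_id": "o-2",
         "promised_date": "2024-01-10", "delivered_date": "2024-01-12"},
    ]
}
result = lambda_handler(sample_batch, None)
print(result)  # [{'customer_id': 'c-2', 'order_id': 'o-2'}]
```

Returning only the customer and order IDs mirrors what the SQS messages in this sample are described as containing.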

With Distributed Map, you can run up to 10,000 parallel child workflow executions at a time. In this sample project, the maximum concurrency of the Distributed Map is set to 1000, which limits it to 1000 parallel child workflow executions.
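
The settings described above map onto fields of a Map state in the Amazon States Language. The sketch below builds such a state as a Python dictionary and prints it as JSON. It is a simplified illustration, not the sample project's definition: the bucket names, object key, state name, function ARN placeholder, and the `STANDARD` execution type are assumptions, and the Inline Map and SQS steps are omitted.

```python
import json

distributed_map_state = {
    "Type": "Map",
    # Caps the number of child workflow executions that run in parallel.
    "MaxConcurrency": 1000,
    # Reads the items from a CSV object in S3, using the first row as the header.
    "ItemReader": {
        "Resource": "arn:aws:states:::s3:getObject",
        "ReaderConfig": {"InputType": "CSV", "CSVHeaderLocation": "FIRST_ROW"},
        "Parameters": {"Bucket": "amzn-s3-demo-input-bucket", "Key": "orders.csv"},
    },
    # Passes 10 rows to each child workflow execution.
    "ItemBatcher": {"MaxItemsPerBatch": 10},
    "ItemProcessor": {
        "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": "STANDARD"},
        "StartAt": "DetectDelayedOrders",
        "States": {
            "DetectDelayedOrders": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {
                    "FunctionName": "<delayed-order-function-arn>",  # placeholder
                    "Payload.$": "$",
                },
                "End": True,
            }
        },
    },
    # Exports the Map Run results to a second bucket.
    "ResultWriter": {
        "Resource": "arn:aws:states:::s3:putObject",
        "Parameters": {"Bucket": "amzn-s3-demo-output-bucket", "Prefix": "results"},
    },
    "End": True,
}

print(json.dumps(distributed_map_state, indent=2))
```

To compare against the deployed definition, open the state machine in Code mode in the console.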

This sample project creates the state machine and the supporting AWS resources, and configures the related IAM permissions. Explore this sample project to learn about using the Distributed Map for orchestrating large-scale, parallel workloads, or use it as a starting point for your own projects.

AWS CloudFormation template and additional resources

You use a CloudFormation template to deploy this sample project. This template creates the following resources in your AWS account:

  • A Step Functions state machine.

  • An execution role for the state machine. This role grants the permissions that your state machine needs to access other AWS services and resources, such as permission to invoke the Lambda functions.

  • A Lambda function named CSVGeneratorFunction that generates a CSV file containing the customer order details.

  • An execution role for the CSV generator Lambda function. This role grants the function permission to access other AWS services.

  • An Amazon S3 input bucket to store the generated CSV file.

  • A delayed order detection Lambda function that analyzes the CSV file data and detects any delayed orders.

  • An execution role for the delayed order Lambda function. This role grants the function permission to access other AWS services.

  • An Amazon S3 output bucket to store the analysis results of the customer orders.

  • An Amazon SQS queue to which Step Functions sends messages for every delayed order. These messages contain the IDs of the customers and their orders.

  • A CloudWatch log group that stores information related to the state machine’s execution history.
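
As a rough idea of what a generator such as CSVGeneratorFunction could produce, the sketch below builds a 10,000-row CSV document in memory. The column names, date range, and delay distribution are all hypothetical, and the upload to the S3 input bucket is left out so that the example runs without AWS access.

```python
import csv
import io
import random
from datetime import date, timedelta

# Hypothetical columns; the real sample may use different ones.
FIELDS = ["customer_id", "order_id", "promised_date", "delivered_date"]


def generate_orders_csv(row_count: int = 10_000, seed: int = 0) -> str:
    """Build a CSV document with a header row followed by `row_count` order rows."""
    rng = random.Random(seed)  # seeded so repeated runs produce the same file
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    first_day = date(2024, 1, 1)  # arbitrary start date for the made-up data
    for i in range(row_count):
        promised = first_day + timedelta(days=rng.randint(0, 30))
        # An offset above zero means the order arrived after the promised date.
        delivered = promised + timedelta(days=rng.randint(-2, 3))
        writer.writerow({
            "customer_id": f"c-{rng.randint(1, 500)}",
            "order_id": f"o-{i}",
            "promised_date": promised.isoformat(),
            "delivered_date": delivered.isoformat(),
        })
    return buffer.getvalue()


csv_text = generate_orders_csv()
print(len(csv_text.splitlines()))  # 10001: one header row plus 10,000 order rows
```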

Important

Standard charges apply for each service.

Step 1: Create the state machine and provision resources

  1. Open the Step Functions console and choose Create state machine.

  2. Type Distributed Map to process a CSV file in S3 in the search box, and then choose Distributed Map to process a CSV file in S3 from the search results that are returned.

  3. Choose Next to continue.

  4. Step Functions lists the AWS services used in the sample project you selected. It also shows a workflow graph for the sample project. Deploy this project to your AWS account or use it as a starting point for building your own projects. Based on how you want to proceed, choose Run a demo or Build on it.

    For information about the resources that will be created for this sample project, see AWS CloudFormation template and additional resources.

    Figure: Workflow graph of the Distributed Map to process a CSV file in S3 sample project.

  5. Choose Use template to continue with your selection.

  6. Do one of the following:

    • If you selected Build on it, Step Functions creates the workflow prototype, but does not deploy the resources in the workflow definition, so you can continue building out your workflow prototype.

      In Workflow Studio's Design mode, you can add more states to your workflow prototype. Alternatively, you can switch to Code mode and use the integrated code editor to edit the Amazon States Language (ASL) definition of your state machine from the Step Functions console.

      Important

      You may need to update the placeholder Amazon Resource Name (ARN) for the resources used in the sample project before you can run your workflow.

    • If you selected Run a demo, Step Functions creates a read-only project which uses an AWS CloudFormation template to deploy the AWS resources in that template to your AWS account. You can view the state machine definition by choosing the Code mode.

      Choose Deploy and run to deploy the project and create the resources.

      Deployment can take up to 10 minutes while the resources and IAM permissions are created. While your resources are being deployed, you can open the AWS CloudFormation Stack ID link to see which resources are being provisioned.

      After all the resources have been created, you should see the project on the State machines page in the console.

      Important

      Standard charges may apply for each service used in the CloudFormation template.
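
For the Build on it path, it can help to list every ARN in the workflow definition before running it, so that placeholder values are easy to spot. The helper below is a hypothetical review aid, not part of Step Functions. It walks a definition loaded as a Python dictionary and reports each string that starts with `arn:`. The placeholder format in the example definition is an assumption.

```python
from typing import Any, Iterator, Tuple


def find_arns(node: Any, path: str = "$") -> Iterator[Tuple[str, str]]:
    """Yield (path, value) for every string in `node` that starts with 'arn:'."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_arns(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_arns(value, f"{path}[{index}]")
    elif isinstance(node, str) and node.startswith("arn:"):
        yield path, node


# A tiny made-up definition with one placeholder function ARN.
definition = {
    "StartAt": "DetectDelayedOrders",
    "States": {
        "DetectDelayedOrders": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {
                "FunctionName": "arn:aws:lambda:REGION:ACCOUNT_ID:function:FUNCTION_NAME",
                "Payload.$": "$",
            },
            "End": True,
        }
    },
}

found = list(find_arns(definition))
for path, arn in found:
    print(path, "->", arn)
```

Any reported value that still contains placeholder text, such as `REGION` or `ACCOUNT_ID` in this example, needs to be replaced with the ARN of a real resource.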

Step 2: Run the state machine

After all the resources are provisioned and deployed, you can run the state machine.
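
If you prefer scripting over the console, an execution can also be started through the Step Functions `StartExecution` API. The following minimal sketch wraps that call. The function name `start_sample_execution` is hypothetical, and the client is passed in as a parameter so the example stays free of AWS dependencies. In practice the client would typically be `boto3.client("stepfunctions")`, assuming boto3 is installed and credentials are configured.

```python
import json
from typing import Any, Dict, Optional


def start_sample_execution(
    sfn_client: Any,
    state_machine_arn: str,
    execution_input: Optional[Dict[str, Any]] = None,
) -> str:
    """Start one execution of the state machine and return the execution ARN."""
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        # The API expects the input as a JSON string; an empty object means no input.
        input=json.dumps(execution_input or {}),
    )
    return response["executionArn"]


# Usage, assuming boto3 and valid credentials (not executed here):
#   import boto3
#   arn = start_sample_execution(boto3.client("stepfunctions"), "<your-state-machine-arn>")
```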

  1. On the State machines page, choose your sample project.

  2. On the sample project page, choose Start execution.

  3. In the Start execution dialog box, do the following:

    1. (Optional) Enter input values in JSON format to run your sample project.

      If you chose Run a demo, you don't need to provide any execution input.

      Note

      If the demo project you deployed contains prepopulated execution input data, use that input to run the state machine.

    2. Choose Start execution.

    3. (Optional) The Step Functions console directs you to a page that's titled with your execution ID. This page is known as the Execution Details page. On this page, you can review the execution results as the execution progresses or after it's complete.

      After the execution is complete, choose individual states in the Graph view, and then choose the tabs in the Step details pane to view each state's details, including its input, output, and definition.

    4. (Optional) Review the execution results exported to the Amazon S3 bucket. These results include data such as the execution input and output, ARN, and execution status. For more information, see ResultWriter (Map).
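
Once the exported result files have been downloaded, a short script can summarize them. The sketch below assumes each exported record is a JSON object with a `Status` field and an `Output` field holding a JSON-encoded string. Treat these field names as assumptions and check them against your own result files. The sample records are made up.

```python
import json
from collections import Counter
from typing import Any, Dict, List, Tuple


def summarize_results(records: List[Dict[str, Any]]) -> Tuple[Counter, List[Any]]:
    """Count records per status and collect the decoded outputs of succeeded ones."""
    status_counts = Counter(record.get("Status", "UNKNOWN") for record in records)
    outputs = []
    for record in records:
        if record.get("Status") == "SUCCEEDED" and record.get("Output"):
            # The output is assumed to be stored as a JSON string, so decode it.
            outputs.append(json.loads(record["Output"]))
    return status_counts, outputs


# Made-up records standing in for the contents of an exported result file.
sample_records = [
    {"ExecutionArn": "arn:example:1", "Status": "SUCCEEDED",
     "Output": json.dumps([{"customer_id": "c-2", "order_id": "o-2"}])},
    {"ExecutionArn": "arn:example:2", "Status": "FAILED"},
]
counts, outputs = summarize_results(sample_records)
print(dict(counts))  # {'SUCCEEDED': 1, 'FAILED': 1}
print(outputs)       # [[{'customer_id': 'c-2', 'order_id': 'o-2'}]]
```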