Orchestrating large-scale parallel workloads in your state machines
With Step Functions, you can orchestrate large-scale parallel workloads to perform tasks, such as on-demand processing of semi-structured data. These parallel workloads let you concurrently process large-scale data sources stored in Amazon S3. For example, you might process a single JSON or CSV file that contains large amounts of data. Or you might process a large set of Amazon S3 objects.
To set up a large-scale parallel workload in your workflows, include a Map state in Distributed mode. The Map state processes items in a dataset concurrently. In Distributed mode, the Map state allows high-concurrency processing by running each iteration over the dataset as a child workflow execution. You can specify the number of child workflow executions that can run in parallel. If you don't specify a limit, Step Functions runs up to 10,000 child workflow executions in parallel. For more information about the Map state and its Distributed mode, see Map state and Using Map state in Distributed mode.
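As a rough illustration of the setup described above, the following Python sketch builds an Amazon States Language definition for a Map state in Distributed mode that reads items from a CSV file in Amazon S3. The helper name `distributed_map_state`, the bucket and key values, the `ProcessItem` placeholder state, and the concurrency value of 1000 are assumptions made for this example; they do not come from this topic.

```python
import json


def distributed_map_state(bucket, key, max_concurrency=None):
    """Build a Map state definition in Distributed mode (illustrative sketch).

    bucket and key are hypothetical placeholders for a CSV file in Amazon S3.
    """
    state = {
        "Type": "Map",
        # Read the items to process from a CSV object stored in Amazon S3.
        "ItemReader": {
            "Resource": "arn:aws:states:::s3:getObject",
            "ReaderConfig": {
                "InputType": "CSV",
                "CSVHeaderLocation": "FIRST_ROW",
            },
            "Parameters": {"Bucket": bucket, "Key": key},
        },
        # Each item is processed in its own child workflow execution.
        "ItemProcessor": {
            "ProcessorConfig": {
                "Mode": "DISTRIBUTED",
                "ExecutionType": "STANDARD",
            },
            "StartAt": "ProcessItem",
            # Placeholder step; a real workflow would do work here.
            "States": {"ProcessItem": {"Type": "Pass", "End": True}},
        },
        "End": True,
    }
    # Omit MaxConcurrency to let Step Functions apply its default limit.
    if max_concurrency is not None:
        state["MaxConcurrency"] = max_concurrency
    return state


if __name__ == "__main__":
    definition = distributed_map_state("amzn-s3-demo-bucket", "data.csv", 1000)
    print(json.dumps(definition, indent=2))
```

The printed JSON can be placed under the `States` key of a state machine definition. Leaving out `max_concurrency` produces a state without the `MaxConcurrency` field, so the default described above applies.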
When you don't specify Distributed mode, the Map state runs in the default Inline mode, which supports up to 40 concurrent iterations. For more information about the two Map state modes, see Map state processing modes.
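The difference between the two modes can be sketched as a small helper that picks a `ProcessorConfig` based on how many iterations need to run at once. This is a simplification for illustration only: the function name `processor_config_for` is hypothetical, and a real decision would also weigh factors such as dataset size and where the input is stored.

```python
# Concurrency limit of Inline mode, as stated in this topic.
INLINE_MAX_CONCURRENCY = 40


def processor_config_for(required_concurrency):
    """Return a ProcessorConfig for a Map state (illustrative sketch).

    Chooses Inline mode when the requested concurrency fits within the
    Inline limit, and Distributed mode otherwise.
    """
    if required_concurrency < 1:
        raise ValueError("required_concurrency must be at least 1")
    if required_concurrency <= INLINE_MAX_CONCURRENCY:
        return {"Mode": "INLINE"}
    # Distributed mode runs iterations as child workflow executions.
    return {"Mode": "DISTRIBUTED", "ExecutionType": "STANDARD"}


if __name__ == "__main__":
    print(processor_config_for(10))
    print(processor_config_for(500))
```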
The following illustration explains how you can set up large-scale parallel workloads in your workflows.

Tip
To learn more about using the Distributed Map state, try the following tutorials and workshop:
- Iterate over items in a batch inside child workflow executions
- Large-Scale Parallelization with Distributed Map in Module 14 - Data Processing of The AWS Step Functions Workshop
Key terms used in this topic
- Distributed mode
  A processing mode of the Map state. In this mode, each iteration of the Map state runs as a child workflow execution that enables high concurrency. Each child workflow execution has its own execution history, which is separate from the parent workflow's execution history. This mode supports reading input from large-scale Amazon S3 data sources.
- Distributed Map state
  A Map state set to Distributed processing mode.
- Map workflow
  A set of steps that a Map state runs.
- Child workflow execution
  An iteration of the Distributed Map state. A child workflow execution has its own execution history, which is separate from the parent workflow's execution history.
- Map Run
  When you run a Map state in Distributed mode, Step Functions creates a Map Run resource. A Map Run refers to a set of child workflow executions that a Distributed Map state starts, and the runtime settings that control these executions. Step Functions assigns an Amazon Resource Name (ARN) to your Map Run. You can examine a Map Run in the Step Functions console. You can also invoke the DescribeMapRun API action. A Map Run also emits metrics to CloudWatch. For more information, see Examining Map Run.