Use timeouts to avoid stuck executions - AWS Step Functions

Use timeouts to avoid stuck executions

By default, the Amazon States Language doesn't specify timeouts for state machine definitions. Without an explicit timeout, Step Functions often relies solely on a response from an activity worker to know that a task is complete. If something goes wrong and the TimeoutSeconds field isn't specified for an Activity or Task state, an execution is stuck waiting for a response that will never come.

To avoid this situation, specify a reasonable timeout when you create a Task in your state machine. For example:

"ActivityState": { "Type": "Task", "Resource": "arn:aws:states:us-east-1:123456789012:activity:HelloWorld", "TimeoutSeconds": 300, "Next": "NextState" }

If you use a callback with a task token (.waitForTaskToken), we recommend that you use heartbeats and add the HeartbeatSeconds field in your Task state definition. You can set HeartbeatSeconds to be less than the task timeout so if your workflow fails with a heartbeat error then you know it's because of the task failure instead of the task taking a long time to complete.

{ "StartAt": "Push to SQS", "States": { "Push to SQS": { "Type": "Task", "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken", "HeartbeatSeconds": 600, "Parameters": { "MessageBody": { "myTaskToken.$": "$$.Task.Token" }, "QueueUrl": "https://sqs.us-east-1.amazonaws.com/123456789012/push-based-queue" }, "ResultPath": "$.SQS", "End": true } } }

For more information, see Task in the Amazon States Language documentation.

Note

You can set a timeout for your state machine using the TimeoutSeconds field in your Amazon States Language definition. For more information, see State machine structure.