Amazon Simple Workflow Service
Developer Guide (API Version 2012-01-25)

Amazon SWF Metrics for CloudWatch

Amazon SWF provides metrics for CloudWatch that you can use to track your workflows and activities and to set alarms on threshold values that you choose. You can view these metrics in the AWS Management Console. For more information, see Viewing Amazon SWF Metrics for CloudWatch using the AWS Management Console.

Reporting Units for Amazon SWF Metrics

Metrics that Report a Time Interval

Some of the Amazon SWF metrics for CloudWatch report time intervals, which are always measured in milliseconds. For these metrics, the CloudWatch unit is reported as Time. They generally correspond to the stages of your workflow execution for which you can set workflow and activity timeouts, and they have similar names.

For example, the DecisionTaskStartToCloseTime metric measures the time it took for the decision task to complete after it began executing, which is the same time period for which you can set a DecisionTaskStartToCloseTimeout value.

For a diagram of each of these workflow stages and to learn when they occur over the workflow and activity lifecycles, see Amazon SWF Timeout Types.
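Because these metrics are reported in milliseconds while Amazon SWF timeouts are set in seconds, comparing the two takes a unit conversion. The following Python sketch computes how much headroom remains before a timeout. It assumes you have already retrieved a statistic such as the Maximum of DecisionTaskStartToCloseTime, and that the timeout is given the way Amazon SWF accepts it: a string number of seconds, or "NONE" for no limit. The function name timeout_headroom_ms and the sample values are hypothetical.

```python
from typing import Optional


def timeout_headroom_ms(metric_ms: float, timeout_seconds: str) -> Optional[float]:
    """Return the milliseconds left between a measured interval and its timeout.

    metric_ms: a time-interval statistic (for example, the Maximum of
        DecisionTaskStartToCloseTime), which CloudWatch reports in milliseconds.
    timeout_seconds: the matching Amazon SWF timeout, as a string number of
        seconds (for example, "30"), or "NONE" when no timeout is set.
    """
    if timeout_seconds == "NONE":
        return None  # no timeout configured, so there is no limit to approach
    timeout_ms = int(timeout_seconds) * 1000  # convert seconds to milliseconds
    return timeout_ms - metric_ms


# Hypothetical values: the slowest decision task took 24,500 ms,
# and DecisionTaskStartToCloseTimeout is set to 30 seconds.
print(timeout_headroom_ms(24_500, "30"))  # 5500
```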

Metrics that Report a Count

Some of the Amazon SWF metrics for CloudWatch report results as a count. For example, WorkflowsCanceled records a result as either one or zero, indicating whether or not the workflow was canceled. A value of zero doesn't mean that the metric wasn't reported, only that the condition the metric describes didn't occur.

Some of the Amazon SWF metrics that are reported as a Count in CloudWatch actually represent a count per second. For instance, ProvisionedRefillRate is reported as a Count, but it represents a rate: the number of requests allowed per second.

For count metrics that record each result as zero or one, the minimum and maximum are always either zero or one, while the average is a value from zero to one.
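To make the zero-or-one behavior concrete, the following Python sketch computes each statistic over a hypothetical set of WorkflowsCanceled data points, where each point is 1 if the workflow was canceled and 0 if it was not. The metrics table lists Sum as the valid statistic for this metric; the other statistics are shown only to illustrate the ranges described above.

```python
import statistics

# Hypothetical WorkflowsCanceled data points for one period:
# 1 means the workflow was canceled, 0 means it was not.
samples = [0, 1, 0, 0]

summary = {
    "Sum": sum(samples),                  # number of canceled workflows: 1
    "Minimum": min(samples),              # always 0 or 1
    "Maximum": max(samples),              # always 0 or 1
    "Average": statistics.mean(samples),  # fraction canceled: 0.25
}
print(summary)
```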

API and Decision Event Metrics

You can monitor both API and decision events in CloudWatch to gain insight into your usage and capacity. For more information, see deciders in the How Amazon SWF Works section, and the Decision topic in the Amazon Simple Workflow Service API Reference.

You can also use these metrics to set alarms that notify you when you are approaching your Amazon SWF throttling limits. See Amazon SWF throttling limits for a description of these limits and their default settings. These limits are designed to prevent incorrect workflows from consuming excessive system resources. To request an increase to your limits, see Requesting a Limit Increase.
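As an illustration of monitoring API events, the following Python sketch builds the keyword arguments for the CloudWatch GetMetricStatistics operation (get_metric_statistics in boto3) to retrieve ThrottledEvents for one API. It assumes that Amazon SWF metrics are published in the AWS/SWF namespace and that the APIName dimension value is the name of the SWF action, such as StartWorkflowExecution; the function name throttled_events_query is hypothetical. The request is only built, not sent, so the sketch runs without AWS credentials.

```python
from datetime import datetime, timedelta, timezone


def throttled_events_query(api_name: str, hours: int = 1) -> dict:
    """Build GetMetricStatistics arguments for ThrottledEvents on one API.

    Assumptions for this sketch: the namespace is "AWS/SWF", and the APIName
    dimension value is the SWF action name (for example, "StartWorkflowExecution").
    """
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/SWF",
        "MetricName": "ThrottledEvents",
        "Dimensions": [{"Name": "APIName", "Value": api_name}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,          # aggregate into 5-minute buckets (in seconds)
        "Statistics": ["Sum"],  # Sum is the valid statistic for ThrottledEvents
    }


params = throttled_events_query("StartWorkflowExecution")
# With boto3 installed and AWS credentials configured, you could send the request:
#   import boto3
#   boto3.client("cloudwatch").get_metric_statistics(**params)
print(params["MetricName"], params["Dimensions"])
```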

As a best practice, configure CloudWatch alarms at around 60% of your API or decision event capacity. This gives you time to either adjust your workflow or request a service limit increase before Amazon SWF begins throttling your requests. Depending on how bursty your calls are, you can configure different alarms to notify you when you are approaching your service limits:

  • If your traffic has significant spikes, set an alarm at 60% of your ProvisionedBucketSize limit.

  • If your calls have a relatively steady rate, set an alarm at 60% of your ProvisionedRefillRate limit for your related API and decision events.
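The threshold guidance above reduces to simple arithmetic. The following Python sketch picks the limit that matches your traffic pattern and returns 60% of it. The limit values and the function name alarm_threshold are hypothetical; in practice you would read the limits from the ProvisionedBucketSize and ProvisionedRefillRate metrics and use the result as the threshold of a CloudWatch alarm (for example, the Threshold parameter of put_metric_alarm in boto3).

```python
def alarm_threshold(bucket_size: float, refill_rate: float,
                    spiky_traffic: bool, fraction: float = 0.6) -> float:
    """Return the alarm threshold as a fraction of the relevant SWF limit.

    bucket_size: the ProvisionedBucketSize value (the limit to use for spiky traffic).
    refill_rate: the ProvisionedRefillRate value (the limit to use for steady traffic).
    spiky_traffic: True if traffic has significant spikes, False if it is steady.
    fraction: share of the limit at which to alarm; 0.6 follows the 60% guideline.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be greater than 0 and at most 1")
    limit = bucket_size if spiky_traffic else refill_rate
    return limit * fraction


# Hypothetical limits for one API: bucket size 1000, refill rate 200 per second.
print(alarm_threshold(1000, 200, spiky_traffic=True))   # 600.0
print(alarm_threshold(1000, 200, spiky_traffic=False))  # 120.0
```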

Amazon SWF Metrics

The following metrics are available for Amazon SWF:

Metric

Description

DecisionTaskScheduleToStartTime

The time interval, in milliseconds, between the time that the decision task was scheduled and when it was picked up by a worker and started.

CloudWatch Units: Time

Dimensions: Domain, WorkflowTypeName, WorkflowTypeVersion

Valid statistics: Average, Minimum, Maximum

DecisionTaskStartToCloseTime

The time interval, in milliseconds, between the time that the decision task was started and when it closed.

CloudWatch Units: Time

Dimensions: Domain, WorkflowTypeName, WorkflowTypeVersion

Valid statistics: Average, Minimum, Maximum

DecisionTasksCompleted

The count of decision tasks that have been completed.

CloudWatch Units: Count

Dimensions: Domain, WorkflowTypeName, WorkflowTypeVersion

Valid statistics: Sum

StartedDecisionTasksTimedOutOnClose

The count of decision tasks that started but timed out on closing.

CloudWatch Units: Count

Dimensions: Domain, WorkflowTypeName, WorkflowTypeVersion

Valid statistics: Sum

WorkflowStartToCloseTime

The time interval, in milliseconds, between the time that the workflow started and when it closed.

CloudWatch Units: Time

Dimensions: Domain, WorkflowTypeName, WorkflowTypeVersion

Valid statistics: Average, Minimum, Maximum

WorkflowsCanceled

The count of workflows that were canceled.

CloudWatch Units: Count

Dimensions: Domain, WorkflowTypeName, WorkflowTypeVersion

Valid statistics: Sum

WorkflowsCompleted

The count of workflows that completed.

CloudWatch Units: Count

Dimensions: Domain, WorkflowTypeName, WorkflowTypeVersion

Valid statistics: Sum

WorkflowsContinuedAsNew

The count of workflows that continued as new.

CloudWatch Units: Count

Dimensions: Domain, WorkflowTypeName, WorkflowTypeVersion

Valid statistics: Sum

WorkflowsFailed

The count of workflows that failed.

CloudWatch Units: Count

Dimensions: Domain, WorkflowTypeName, WorkflowTypeVersion

Valid statistics: Sum

WorkflowsTerminated

The count of workflows that were terminated.

CloudWatch Units: Count

Dimensions: Domain, WorkflowTypeName, WorkflowTypeVersion

Valid statistics: Sum

WorkflowsTimedOut

The count of workflows that timed out, for any reason.

CloudWatch Units: Count

Dimensions: Domain, WorkflowTypeName, WorkflowTypeVersion

Valid statistics: Sum

ActivityTaskScheduleToCloseTime

The time interval, in milliseconds, between the time when the activity was scheduled and when it closed.

CloudWatch Units: Time

Dimensions: Domain, ActivityTypeName, ActivityTypeVersion

Valid statistics: Average, Minimum, Maximum

ActivityTaskScheduleToStartTime

The time interval, in milliseconds, between the time when the activity task was scheduled and when it started.

CloudWatch Units: Time

Dimensions: Domain, ActivityTypeName, ActivityTypeVersion

Valid statistics: Average, Minimum, Maximum

ActivityTaskStartToCloseTime

The time interval, in milliseconds, between the time when the activity task started and when it closed.

CloudWatch Units: Time

Dimensions: Domain, ActivityTypeName, ActivityTypeVersion

Valid statistics: Average, Minimum, Maximum

ActivityTasksCanceled

The count of activity tasks that were canceled.

CloudWatch Units: Count

Dimensions: Domain, ActivityTypeName, ActivityTypeVersion

Valid statistics: Sum

ActivityTasksCompleted

The count of activity tasks that completed.

CloudWatch Units: Count

Dimensions: Domain, ActivityTypeName, ActivityTypeVersion

Valid statistics: Sum

ActivityTasksFailed

The count of activity tasks that failed.

CloudWatch Units: Count

Dimensions: Domain, ActivityTypeName, ActivityTypeVersion

Valid statistics: Sum

ScheduledActivityTasksTimedOutOnClose

The count of activity tasks that were scheduled but timed out on close.

CloudWatch Units: Count

Dimensions: Domain, ActivityTypeName, ActivityTypeVersion

Valid statistics: Sum

ScheduledActivityTasksTimedOutOnStart

The count of activity tasks that were scheduled but timed out on start.

CloudWatch Units: Count

Dimensions: Domain, ActivityTypeName, ActivityTypeVersion

Valid statistics: Sum

StartedActivityTasksTimedOutOnClose

The count of activity tasks that were started but timed out on close.

CloudWatch Units: Count

Dimensions: Domain, ActivityTypeName, ActivityTypeVersion

Valid statistics: Sum

StartedActivityTasksTimedOutOnHeartbeat

The count of activity tasks that were started but timed out due to a heartbeat timeout.

CloudWatch Units: Count

Dimensions: Domain, ActivityTypeName, ActivityTypeVersion

Valid statistics: Sum

ThrottledEvents

The count of requests that have been throttled.

CloudWatch Units: Count

Dimensions: APIName, DecisionName

Valid statistics: Sum

ProvisionedBucketSize

The count of available requests per second.

CloudWatch Units: Count

Dimensions: APIName, DecisionName

Valid statistics: Minimum

ConsumedCapacity

The count of requests per second.

CloudWatch Units: Count

Dimensions: APIName, DecisionName

Valid statistics: Sum

ProvisionedRefillRate

The count of requests per second that are allowed into the bucket.

CloudWatch Units: Count

Dimensions: APIName, DecisionName

Valid statistics: Minimum
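Because each metric accepts only the statistics listed in the table above, a client that requests these metrics programmatically can validate its requests first. The following Python sketch transcribes the table into a lookup and rejects unlisted combinations. The names VALID_STATISTICS and check_statistic are hypothetical, and the check is a client-side convenience, not part of any AWS API.

```python
# Valid statistics per Amazon SWF metric, transcribed from the metrics table.
TIME_STATS = {"Average", "Minimum", "Maximum"}

VALID_STATISTICS = {
    **{m: TIME_STATS for m in (
        "DecisionTaskScheduleToStartTime", "DecisionTaskStartToCloseTime",
        "WorkflowStartToCloseTime", "ActivityTaskScheduleToCloseTime",
        "ActivityTaskScheduleToStartTime", "ActivityTaskStartToCloseTime")},
    **{m: {"Sum"} for m in (
        "DecisionTasksCompleted", "StartedDecisionTasksTimedOutOnClose",
        "WorkflowsCanceled", "WorkflowsCompleted", "WorkflowsContinuedAsNew",
        "WorkflowsFailed", "WorkflowsTerminated", "WorkflowsTimedOut",
        "ActivityTasksCanceled", "ActivityTasksCompleted", "ActivityTasksFailed",
        "ScheduledActivityTasksTimedOutOnClose",
        "ScheduledActivityTasksTimedOutOnStart",
        "StartedActivityTasksTimedOutOnClose",
        "StartedActivityTasksTimedOutOnHeartbeat",
        "ThrottledEvents", "ConsumedCapacity")},
    "ProvisionedBucketSize": {"Minimum"},
    "ProvisionedRefillRate": {"Minimum"},
}


def check_statistic(metric: str, statistic: str) -> None:
    """Raise ValueError if the statistic is not listed as valid for the metric."""
    valid = VALID_STATISTICS.get(metric)
    if valid is None:
        raise ValueError(f"unknown Amazon SWF metric: {metric}")
    if statistic not in valid:
        raise ValueError(
            f"{statistic} is not valid for {metric}; use one of {sorted(valid)}")


check_statistic("WorkflowsFailed", "Sum")                   # accepted
check_statistic("ActivityTaskStartToCloseTime", "Maximum")  # accepted
```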

The following dimensions are available for Amazon SWF metrics:

Dimension

Description

Domain

Filters data to the Amazon SWF domain that the workflow or activity is running in.

ActivityTypeName

Filters data to the name of the activity type.

ActivityTypeVersion

Filters data to the version of the activity type.

WorkflowTypeName

Filters data to the name of the workflow type for this workflow execution.

WorkflowTypeVersion

Filters data to the version of the workflow type for this workflow execution.

APIName

Filters data to the specified API name.

DecisionName

Filters data to the specified decision name.
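The CloudWatch API expects dimensions as a list of Name/Value pairs. The following Python sketch builds those lists for the three dimension groups used in the metrics table. The function names and example values are hypothetical, and the sketch assumes that throttling and capacity metrics are filtered by either APIName or DecisionName, not both together.

```python
from typing import Dict, List


def to_dimensions(**values: str) -> List[Dict[str, str]]:
    """Convert keyword arguments into CloudWatch's [{"Name": ..., "Value": ...}] form."""
    # Keyword arguments keep their order, so the output order matches the call.
    return [{"Name": name, "Value": value} for name, value in values.items()]


def workflow_dimensions(domain: str, type_name: str, type_version: str) -> List[Dict[str, str]]:
    """Dimensions for workflow and decision task metrics."""
    return to_dimensions(Domain=domain, WorkflowTypeName=type_name,
                         WorkflowTypeVersion=type_version)


def activity_dimensions(domain: str, type_name: str, type_version: str) -> List[Dict[str, str]]:
    """Dimensions for activity task metrics."""
    return to_dimensions(Domain=domain, ActivityTypeName=type_name,
                         ActivityTypeVersion=type_version)


def capacity_dimensions(api_name: str = "", decision_name: str = "") -> List[Dict[str, str]]:
    """Dimensions for ThrottledEvents and the capacity metrics (assumed: one name only)."""
    if bool(api_name) == bool(decision_name):
        raise ValueError("pass exactly one of api_name or decision_name")
    if api_name:
        return to_dimensions(APIName=api_name)
    return to_dimensions(DecisionName=decision_name)


# Hypothetical domain and workflow type.
print(workflow_dimensions("ExampleDomain", "OrderWorkflow", "1.0"))
```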