Troubleshooting Systems Manager Automation

Use the following information to help you troubleshoot problems with AWS Systems Manager Automation, a capability of AWS Systems Manager. This topic includes specific tasks to resolve issues based on Automation error messages.

Common Automation errors

This section includes information about common Automation errors.

VPC not defined 400

By default, when Automation runs either the AWS-UpdateLinuxAmi runbook or the AWS-UpdateWindowsAmi runbook, the system creates a temporary instance in the default VPC (172.31.0.0/16). If you deleted the default VPC, you receive the following error:

VPC not defined 400

To solve this problem, you must specify a value for the SubnetId input parameter.
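As an illustration, the following minimal sketch builds the request for starting AWS-UpdateLinuxAmi with an explicit subnet. The helper name build_update_ami_request and the AMI and subnet IDs are hypothetical placeholders. The sketch assumes the runbook's AMI parameter is named SourceAmiId, so verify the parameter names against the runbook before use.

```python
def build_update_ami_request(source_ami_id, subnet_id,
                             document_name="AWS-UpdateLinuxAmi"):
    """Build keyword arguments for a StartAutomationExecution request."""
    if not subnet_id.startswith("subnet-"):
        raise ValueError(f"not a subnet ID: {subnet_id!r}")
    return {
        "DocumentName": document_name,
        # Automation parameters are a map of name -> list of strings.
        "Parameters": {
            "SourceAmiId": [source_ami_id],  # assumed parameter name
            "SubnetId": [subnet_id],         # avoids relying on a default VPC
        },
    }


# Placeholder IDs for illustration only.
request = build_update_ami_request(
    "ami-0123456789abcdef0", "subnet-0123456789abcdef0"
)
print(request)
```

If you use boto3, the resulting dictionary can be passed as keyword arguments to the SSM client's start_automation_execution method.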

Automation execution failed to start

An automation can fail with an access denied error or an invalid assume role error if you haven't properly configured AWS Identity and Access Management (IAM) roles and policies for Automation.

Access denied

The following examples describe situations when an automation failed to start with an access denied error.

Access Denied to Systems Manager API

Error message: User: user arn isn't authorized to perform: ssm:StartAutomationExecution on resource: document arn (Service: AWSSimpleSystemsManagement; Status Code: 400; Error Code: AccessDeniedException; Request ID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)

  • Possible cause 1: The user attempting to start the automation doesn't have permission to invoke the StartAutomationExecution API. To resolve this issue, attach the required IAM policy to the user that was used to start the automation.

  • Possible cause 2: The user attempting to start the automation has permission to invoke the StartAutomationExecution API but doesn't have permission to invoke the API by using the specific runbook. To resolve this issue, attach an IAM policy that allows ssm:StartAutomationExecution on that runbook to the user that was used to start the automation.
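For reference, a policy that addresses either cause might look like the following sketch. The function name start_automation_policy and the example ARN are hypothetical. In practice, use the resource ARN reported in the error message rather than constructing one by hand.

```python
import json


def start_automation_policy(runbook_arn):
    """Return an IAM policy document allowing StartAutomationExecution
    on a single runbook, identified by the ARN from the error message."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "ssm:StartAutomationExecution",
                "Resource": runbook_arn,
            }
        ],
    }


# Placeholder ARN for illustration only.
policy = start_automation_policy(
    "arn:aws:ssm:us-east-1:111122223333:document/ExampleRunbook"
)
print(json.dumps(policy, indent=2))
```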

Access Denied Because of Missing PassRole Permissions

Error message: User: user arn isn't authorized to perform: iam:PassRole on resource: automation assume role arn (Service: AWSSimpleSystemsManagement; Status Code: 400; Error Code: AccessDeniedException; Request ID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)

The user attempting to start the automation doesn't have PassRole permission for the assume role. To resolve this issue, attach the iam:PassRole policy to the role of the user attempting to start the automation. For more information, see Task 2: Attach the iam:PassRole policy to your Automation role.
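A minimal sketch of such a policy follows. The function name pass_role_policy and the role ARN are hypothetical placeholders. Substitute the assume role ARN shown in the error message.

```python
import json


def pass_role_policy(assume_role_arn):
    """Return an IAM policy document that lets the caller pass the
    Automation assume role to the service."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": assume_role_arn,
            }
        ],
    }


# Placeholder ARN for illustration only.
pass_role = pass_role_policy(
    "arn:aws:iam::111122223333:role/ExampleAutomationRole"
)
print(json.dumps(pass_role, indent=2))
```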

Invalid assume role

When you run an automation, an assume role is either provided in the runbook or passed as a parameter value for the runbook. Different types of errors can occur if the assume role isn't specified or configured properly.

Malformed Assume Role

Error message: The format of the supplied assume role ARN isn't valid.

The assume role is improperly formatted. To resolve this issue, verify that a valid assume role is specified in your runbook or as a runtime parameter when starting the automation.
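A quick local check can catch obvious formatting mistakes before you start the automation. The pattern below is an approximation of the IAM role ARN shape written for this sketch. It isn't the validation that Automation itself performs, and the name is_well_formed_role_arn is hypothetical.

```python
import re

# Approximate shape: arn:<partition>:iam::<12-digit account>:role/<path/name>
ROLE_ARN_PATTERN = re.compile(r"arn:aws[a-z-]*:iam::\d{12}:role/[\w+=,.@/-]+")


def is_well_formed_role_arn(value):
    """Return True if value looks like an IAM role ARN."""
    return ROLE_ARN_PATTERN.fullmatch(value) is not None


for candidate in (
    "arn:aws:iam::111122223333:role/ExampleAutomationRole",  # well formed
    "arn:aws:iam:111122223333:role/ExampleAutomationRole",   # missing a colon
    "ExampleAutomationRole",                                 # name, not an ARN
):
    print(candidate, is_well_formed_role_arn(candidate))
```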

Assume Role Can't Be Assumed

Error message: The defined assume role is unable to be assumed. (Service: AWSSimpleSystemsManagement; Status Code: 400; Error Code: InvalidAutomationExecutionParametersException; Request ID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)

  • Possible cause 1: The assume role doesn't exist. To resolve this issue, create the role. For more information, see Setting up Automation. Specific details for creating this role are described in the following topic, Task 1: Create a service role for Automation.

  • Possible cause 2: The assume role doesn't have a trust relationship with the Systems Manager service. To resolve this issue, create the trust relationship. For more information, see I Can't Assume A Role in the IAM User Guide.
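To illustrate the second cause, the sketch below shows a trust policy that allows the Systems Manager service principal (ssm.amazonaws.com) to assume a role, along with a helper that checks an existing trust policy for that relationship. The function names are hypothetical, and the check is deliberately simple: it ignores conditions and other statement details.

```python
SSM_PRINCIPAL = "ssm.amazonaws.com"


def automation_trust_policy():
    """Return a trust policy that lets Systems Manager assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": SSM_PRINCIPAL},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def _as_list(value):
    """Policy fields may hold a single string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def trusts_systems_manager(trust_policy):
    """Return True if any Allow statement grants sts:AssumeRole to SSM."""
    for statement in trust_policy.get("Statement", []):
        principal = statement.get("Principal")
        if statement.get("Effect") != "Allow" or not isinstance(principal, dict):
            continue
        actions = _as_list(statement.get("Action", []))
        services = _as_list(principal.get("Service", []))
        if "sts:AssumeRole" in actions and SSM_PRINCIPAL in services:
            return True
    return False


print(trusts_systems_manager(automation_trust_policy()))
```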

Execution started, but status is failed

Action-specific failures

Runbooks contain steps, and the steps run in order. Each step invokes one or more AWS service APIs. The APIs determine the inputs, behavior, and outputs of the step. An error can cause a step to fail at several points, and the failure message indicates when and where the error occurred.

To see a failure message in the Amazon Elastic Compute Cloud (Amazon EC2) console, choose the View Outputs link of the failed step. To see a failure message from the AWS CLI, call get-automation-execution and look for the FailureMessage attribute in a failed StepExecution.
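If you process the get-automation-execution output programmatically, a helper along these lines can pull out the failure messages. It assumes the response has been parsed into a dictionary with the AutomationExecution and StepExecutions structure. The function name and the sample data are hypothetical.

```python
def failed_step_messages(response):
    """Return (step name, failure message) pairs for steps that failed
    or timed out in a parsed get-automation-execution response."""
    execution = response.get("AutomationExecution", {})
    return [
        (step.get("StepName"), step.get("FailureMessage"))
        for step in execution.get("StepExecutions", [])
        if step.get("StepStatus") in ("Failed", "TimedOut")
    ]


# Hypothetical, abbreviated response for illustration only.
sample_response = {
    "AutomationExecution": {
        "AutomationExecutionStatus": "Failed",
        "StepExecutions": [
            {"StepName": "launchInstance", "Action": "aws:runInstances",
             "StepStatus": "Failed",
             "FailureMessage": "The image id '[ami id]' doesn't exist"},
            {"StepName": "updateOS", "Action": "aws:runCommand",
             "StepStatus": "Pending"},
        ],
    }
}

for name, message in failed_step_messages(sample_response):
    print(f"{name}: {message}")
```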

In the following examples, a step associated with the aws:runInstances action failed. Each example explores a different type of error.

Missing Image

Error message: Automation Step Execution fails when it's launching the instance(s). Get Exception from RunInstances API of ec2 Service. Exception Message from RunInstances API: [The image id '[ami id]' doesn't exist (Service: AmazonEC2; Status Code: 400; Error Code: InvalidAMIID.NotFound; Request ID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)]. Please refer to Automation Service Troubleshooting Guide for more diagnosis details.

The aws:runInstances action received input for an ImageId that doesn't exist. To resolve this problem, update the runbook or parameter values with the correct AMI ID.

Assume Role Policy Doesn't Have Sufficient Permissions

Error message: Automation Step Execution fails when it's launching the instance(s). Get Exception from RunInstances API of ec2 Service. Exception Message from RunInstances API: [You aren't authorized to perform this operation. Encoded authorization failure message: xxxxxxx (Service: AmazonEC2; Status Code: 403; Error Code: UnauthorizedOperation; Request ID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)]. Please refer to Automation Service Troubleshooting Guide for more diagnosis details.

The assume role doesn't have sufficient permission to invoke the RunInstances API on EC2 instances. To resolve this problem, attach an IAM policy to the assume role that has permission to invoke the RunInstances API. For more information, see Method 2: Use IAM to configure roles for Automation.

Unexpected State

Error message: Step fails when it's verifying launched instance(s) are ready to be used. Instance i-xxxxxxxxx entered unexpected state: shutting-down. Please refer to Automation Service Troubleshooting Guide for more diagnosis details.

  • Possible cause 1: There is a problem with the instance or the Amazon EC2 service. To resolve this problem, log in to the instance or review the instance system log to understand why the instance started shutting down.

  • Possible cause 2: The user data script specified for the aws:runInstances action has a problem or incorrect syntax. Verify the syntax of the user data script. Also, verify that the user data script doesn't shut down the instance or invoke other scripts that shut down the instance.
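As a rough aid for the second cause, the following sketch decodes a user data script and flags lines that mention common shutdown commands. It assumes the user data is supplied Base64-encoded. The command list is a heuristic chosen for this example, so a clean result doesn't prove that the script never stops the instance.

```python
import base64
import re

# Heuristic list of commands that stop an instance; extend as needed.
SHUTDOWN_PATTERN = re.compile(
    r"\b(shutdown|poweroff|halt|Stop-Computer)\b", re.IGNORECASE
)


def find_shutdown_lines(user_data_b64):
    """Return (line number, text) pairs for lines that look like they
    shut down the instance."""
    script = base64.b64decode(user_data_b64).decode("utf-8", errors="replace")
    return [
        (number, line.strip())
        for number, line in enumerate(script.splitlines(), start=1)
        if SHUTDOWN_PATTERN.search(line)
    ]


# Hypothetical user data for illustration only.
encoded = base64.b64encode(
    b"#!/bin/bash\nyum update -y\nshutdown -h now\n"
).decode("ascii")
print(find_shutdown_lines(encoded))
```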

Action-Specific Failures Reference

When a step fails, the failure message might indicate which service was being invoked when the failure occurred. The following table lists the services invoked by each action. The table also provides links to information about each service.

| Action | AWS services invoked by this action | For information about this service | Troubleshooting content |
|---|---|---|---|
| aws:runInstances | Amazon EC2 | Amazon EC2 User Guide for Linux Instances | Troubleshooting EC2 Instances |
| aws:changeInstanceState | Amazon EC2 | Amazon EC2 User Guide for Linux Instances | Troubleshooting EC2 Instances |
| aws:runCommand | Systems Manager | AWS Systems Manager Run Command | Troubleshooting Systems Manager Run Command |
| aws:createImage | Amazon EC2 | Amazon Machine Images | |
| aws:createStack | AWS CloudFormation | AWS CloudFormation User Guide | Troubleshooting AWS CloudFormation |
| aws:deleteStack | AWS CloudFormation | AWS CloudFormation User Guide | Troubleshooting AWS CloudFormation |
| aws:deleteImage | Amazon EC2 | Amazon Machine Images | |
| aws:copyImage | Amazon EC2 | Amazon Machine Images | |
| aws:createTags | Amazon EC2, Systems Manager | EC2 Resource and Tags | |
| aws:invokeLambdaFunction | AWS Lambda | AWS Lambda Developer Guide | Troubleshooting Lambda |
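Many of the failure messages in this topic end with a parenthesized detail block that names the service, status code, error code, and request ID. The sketch below extracts those fields so that you can match the service name against the reference above. The pattern is inferred from the example messages in this topic rather than from a documented format, and the function name is hypothetical.

```python
import re

# Matches "(Service: X; Status Code: N; Error Code: Y; Request ID: Z)".
DETAIL_PATTERN = re.compile(
    r"\(Service: (?P<service>[^;]+); Status Code: (?P<status>\d+); "
    r"Error Code: (?P<code>[^;]+); Request ID: (?P<request_id>[^)]+)\)"
)


def parse_failure_detail(message):
    """Return the service, status, error code, and request ID found in a
    failure message, or None if the message has no detail block."""
    match = DETAIL_PATTERN.search(message)
    if match is None:
        return None
    detail = match.groupdict()
    detail["status"] = int(detail["status"])
    return detail


example = (
    "Exception Message from RunInstances API: [You aren't authorized to "
    "perform this operation. (Service: AmazonEC2; Status Code: 403; "
    "Error Code: UnauthorizedOperation; "
    "Request ID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)]."
)
print(parse_failure_detail(example))
```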

Automation service internal error

Error message: Internal Server Error. Please refer to Automation Service Troubleshooting Guide for more diagnosis details.

A problem with the Automation service is preventing the specified runbook from running correctly. To resolve this issue, contact AWS Support. Provide the execution ID and customer ID, if available.

Execution started, but timed out

Error message: Step timed out while step is verifying launched instance(s) are ready to be used. Please refer to Automation Service Troubleshooting Guide for more diagnosis details.

A step that uses the aws:runInstances action timed out. This can happen if the step action takes longer to run than the value specified for timeoutSeconds in the step. To resolve this issue, specify a longer value for the timeoutSeconds parameter in the aws:runInstances step. If that doesn't solve the problem, investigate why the step takes longer to run than expected.
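The following sketch shows one way to raise timeoutSeconds for every step that uses a given action in a runbook that has been loaded into a dictionary. It assumes the runbook lists its steps under mainSteps. The function name, step name, and sample runbook are hypothetical.

```python
import copy


def extend_step_timeouts(runbook, action, timeout_seconds):
    """Return a copy of the runbook in which every step using `action`
    has a timeoutSeconds of at least `timeout_seconds`."""
    updated = copy.deepcopy(runbook)
    for step in updated.get("mainSteps", []):
        if step.get("action") == action:
            current = step.get("timeoutSeconds", 0)
            step["timeoutSeconds"] = max(current, timeout_seconds)
    return updated


# Hypothetical, abbreviated runbook for illustration only.
runbook = {
    "schemaVersion": "0.3",
    "mainSteps": [
        {"name": "launchInstance", "action": "aws:runInstances",
         "timeoutSeconds": 1200, "inputs": {"ImageId": "{{ SourceAmiId }}"}},
        {"name": "updateOS", "action": "aws:runCommand",
         "timeoutSeconds": 600, "inputs": {}},
    ],
}

longer = extend_step_timeouts(runbook, "aws:runInstances", 3600)
print(longer["mainSteps"][0]["timeoutSeconds"])
```

The helper returns a modified copy and leaves the original dictionary unchanged, which makes it easy to compare the two before you update the runbook.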