Use Systems Manager SSM documents with AWS FIS
AWS FIS supports custom fault types through the AWS Systems Manager SSM Agent and the AWS FIS action aws:ssm:send-command. Pre-configured Systems Manager SSM documents (SSM documents) that you can use to create common fault injection actions are available as public AWS documents whose names begin with the AWSFIS- prefix.
SSM Agent is Amazon software that can be installed and configured on Amazon EC2 instances, on-premises servers, or virtual machines (VMs). This makes it possible for Systems Manager to manage these resources. The agent processes requests from Systems Manager and then runs them as specified in the request. You can use your own SSM document to inject custom faults, or reference one of the public Amazon-owned documents.
Requirements
For actions that require SSM Agent to run the action on the target, you must ensure the following:
-
The agent is installed on the target. SSM Agent is installed by default on some Amazon Machine Images (AMIs). Otherwise, you can install the SSM Agent on your instances. For more information, see Manually install SSM Agent for EC2 instances in the AWS Systems Manager User Guide.
-
Systems Manager has permission to perform actions on your instances. You grant access using an IAM instance profile. For more information, see Create an IAM instance profile for Systems Manager and Attach an IAM instance profile to an EC2 instance in the AWS Systems Manager User Guide.
Use the aws:ssm:send-command action
An SSM document defines the actions that Systems Manager performs on your managed instances. Systems Manager includes a number of pre-configured documents, or you can create your own. For more information about creating your own SSM document, see Creating Systems Manager documents in the AWS Systems Manager User Guide. For more information about SSM documents in general, see AWS Systems Manager documents in the AWS Systems Manager User Guide.
AWS FIS provides pre-configured SSM documents. You can view the pre-configured SSM documents under Documents in the AWS Systems Manager console: https://console.aws.amazon.com/systems-manager/documents
To use an SSM document in your AWS FIS experiments, you can use the aws:ssm:send-command action. This action fetches and runs the specified SSM document on your target instances.
When you use the aws:ssm:send-command action in your experiment template, you must specify additional parameters for the action, including the following:
-
documentArn – Required. The Amazon Resource Name (ARN) of the SSM document.
-
documentParameters – Conditional. The required and optional parameters that the SSM document accepts. The format is a JSON object with keys that are strings and values that are either strings or arrays of strings.
-
documentVersion – Optional. The version of the SSM document to run.
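As a rough illustration, the following Python sketch assembles one aws:ssm:send-command action as it might appear in an experiment template. Several details are assumptions rather than statements from this page: the surrounding template layout (an actions map whose entries have actionId, parameters, and targets), passing documentParameters as a JSON-encoded string, the duration parameter with an ISO 8601 value such as PT2M, and the placeholder names runCpuStress and testInstances and the us-east-1 Region in the ARN.

```python
import json


def build_send_command_action(document_arn, target_name, duration,
                              document_parameters=None, document_version=None):
    """Return one entry for the "actions" map of an experiment template.

    Assumption: every action parameter is passed as a string, so the
    documentParameters object is serialized with json.dumps.
    """
    parameters = {"documentArn": document_arn, "duration": duration}
    if document_parameters is not None:
        for key, value in document_parameters.items():
            # Keys must be strings; values must be strings or lists of strings.
            valid_value = isinstance(value, str) or (
                isinstance(value, list) and all(isinstance(v, str) for v in value))
            if not isinstance(key, str) or not valid_value:
                raise ValueError(f"invalid document parameter: {key!r}")
        parameters["documentParameters"] = json.dumps(document_parameters)
    if document_version is not None:
        parameters["documentVersion"] = document_version
    return {
        "actionId": "aws:ssm:send-command",
        "parameters": parameters,
        "targets": {"Instances": target_name},
    }


action = build_send_command_action(
    "arn:aws:ssm:us-east-1::document/AWSFIS-Run-CPU-Stress",
    target_name="testInstances",  # hypothetical target defined elsewhere in the template
    duration="PT2M",
    document_parameters={"DurationSeconds": "60", "InstallDependencies": "True"},
)
print(json.dumps({"actions": {"runCpuStress": action}}, indent=2))
```

Validating the key and value types locally mirrors the format rule stated for documentParameters, so a malformed object fails before the template is submitted.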
You can view the information for an SSM document (including the parameters for the document) by using the Systems Manager console or the command line.
To view information about an SSM document using the console
-
Open the AWS Systems Manager console at https://console.aws.amazon.com/systems-manager/.
-
In the navigation pane, choose Documents.
-
Select the document, and choose the Details tab.
To view information about an SSM document using the command line
Use the SSM describe-document command, for example aws ssm describe-document --name AWSFIS-Run-CPU-Stress in the AWS CLI.
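If you save the command-line output as JSON, a short script can list each parameter with its default. This sketch assumes the output came from something like aws ssm describe-document --name AWSFIS-Run-CPU-Stress --output json and that parameters appear under Document.Parameters with Name, Type, and DefaultValue fields. The sample string is a trimmed, hypothetical excerpt, not real output.

```python
import json


def list_document_parameters(describe_document_json):
    """Return (name, type, default) for each parameter in describe-document output.

    A parameter without a DefaultValue is reported with a default of None.
    """
    document = json.loads(describe_document_json)["Document"]
    return [
        (p["Name"], p.get("Type"), p.get("DefaultValue"))
        for p in document.get("Parameters", [])
    ]


# Hypothetical, trimmed excerpt of describe-document output.
sample = """{"Document": {"Name": "AWSFIS-Run-CPU-Stress", "Parameters": [
    {"Name": "DurationSeconds", "Type": "String"},
    {"Name": "LoadPercent", "Type": "String", "DefaultValue": "100"}]}}"""

for name, type_, default in list_document_parameters(sample):
    print(f"{name:<20} {type_:<8} default={default}")
```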
Pre-configured AWS FIS SSM documents
You can use pre-configured AWS FIS SSM documents with the aws:ssm:send-command action in your experiment templates.
Requirements
-
The pre-configured SSM documents provided by AWS FIS are supported only on the following operating systems:
Amazon Linux 2023, Amazon Linux 2, Amazon Linux
Ubuntu
RHEL 7, 8, 9
CentOS 8, 9
-
The pre-configured SSM documents provided by AWS FIS are supported only on EC2 instances. They are not supported on other types of managed nodes, such as on-premises servers.
To use these SSM documents in experiments on ECS tasks, use the corresponding Amazon ECS actions. For example, the aws:ecs:task-cpu-stress action uses the AWSFIS-Run-CPU-Stress document.
Documents
Difference between action duration and DurationSeconds in AWS FIS SSM documents
Some SSM documents limit their own execution time; for example, some of the pre-configured AWS FIS SSM documents use the DurationSeconds parameter. As a result, you need to specify two independent durations in the AWS FIS action definition:
-
Action duration: For experiments with a single action, the action duration is equivalent to the experiment duration. With multiple actions, the experiment duration depends on the individual action durations and the order in which the actions run. AWS FIS monitors each action until its action duration has elapsed.
-
Document parameter DurationSeconds: The duration, in seconds, for which the SSM document runs.
You can choose different values for the two types of duration:
-
Action duration exceeds DurationSeconds: The SSM document execution finishes before the action is complete. AWS FIS waits until the action duration has elapsed before it starts subsequent actions.
-
Action duration is shorter than DurationSeconds: The SSM document continues to run after the action is complete. If the action duration elapses while the SSM document is still running, AWS FIS sets the action status to Completed. AWS FIS monitors the execution only until the action duration has elapsed.
Note that some SSM documents have variable durations. For example, the AWS FIS SSM documents can optionally install prerequisites, which can extend the overall execution time beyond the specified DurationSeconds value. Thus, if you set the action duration and DurationSeconds to the same value, the SSM document might run longer than the action duration.
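To check which of the cases above applies before you run an experiment, you could compare the two values programmatically. This sketch assumes the action duration is written as a simple ISO 8601 duration such as PT5M, and it handles only the hours, minutes, and seconds components. It cannot account for extra time spent installing dependencies.

```python
import re

# Matches simple ISO 8601 durations such as PT90S, PT5M, or PT1H30M.
_ISO_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def iso_duration_seconds(value):
    """Convert a simple ISO 8601 duration to a number of seconds."""
    match = _ISO_DURATION.match(value)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {value}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def compare_durations(action_duration, duration_seconds):
    """Describe how the action duration relates to the DurationSeconds value."""
    action = iso_duration_seconds(action_duration)
    document = int(duration_seconds)
    if action > document:
        return "document finishes first; FIS waits for the action duration"
    if action < document:
        return "action completes first; FIS stops monitoring the running document"
    return "equal; installing dependencies may still make the document run longer"


print(compare_durations("PT2M", "60"))
print(compare_durations("PT1M", "120"))
```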
AWSFIS-Run-CPU-Stress
Runs CPU stress on an instance using the stress-ng tool. Uses the AWSFIS-Run-CPU-Stress SSM document.
Action type (console only)
aws:ssm:send-command/AWSFIS-Run-CPU-Stress
ARN
arn:aws:ssm:region::document/AWSFIS-Run-CPU-Stress
Document parameters
-
DurationSeconds – Required. The duration of the CPU stress test, in seconds.
-
CPU – Optional. The number of CPU stressors to use. The default is 0, which uses all CPU stressors.
-
LoadPercent – Optional. The target CPU load percentage, from 0 (no load) to 100 (full load). The default is 100.
-
InstallDependencies – Optional. If the value is True, Systems Manager installs the required dependencies on the target instances if they are not already installed. The default is True. The dependency is stress-ng.
The following is an example of the string you can enter in the console.
{"DurationSeconds":"60", "InstallDependencies":"True"}
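A small helper can catch out-of-range values before they reach the console or a template. In this sketch, the CPU and LoadPercent checks follow the ranges described above, values are serialized as strings to match the console example, and sending "False" for InstallDependencies is an assumption about the accepted values.

```python
import json


def cpu_stress_parameters(duration_seconds, cpu=None, load_percent=None,
                          install_dependencies=True):
    """Build the documentParameters string for AWSFIS-Run-CPU-Stress.

    Optional values left as None are omitted so the document defaults apply.
    """
    if duration_seconds <= 0:
        raise ValueError("DurationSeconds must be positive")
    params = {"DurationSeconds": str(duration_seconds)}
    if cpu is not None:
        if cpu < 0:
            raise ValueError("CPU must be 0 (all stressors) or a positive count")
        params["CPU"] = str(cpu)
    if load_percent is not None:
        if not 0 <= load_percent <= 100:
            raise ValueError("LoadPercent must be between 0 and 100")
        params["LoadPercent"] = str(load_percent)
    # Sending "False" is an assumption; this page only describes the True case.
    params["InstallDependencies"] = "True" if install_dependencies else "False"
    return json.dumps(params)


print(cpu_stress_parameters(60))
print(cpu_stress_parameters(300, cpu=2, load_percent=50))
```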
AWSFIS-Run-Disk-Fill
Allocates disk space on the root volume of an instance to simulate a disk full fault. Uses the AWSFIS-Run-Disk-Fill SSM document.
If the experiment injecting this fault is stopped, either manually or through a stop condition, AWS FIS attempts to roll back by canceling the running SSM document. However, if the disk is 100% full, either due to the fault or the fault plus application activity, Systems Manager might be unable to complete the cancel operation. Therefore, if you might need to stop the experiment, ensure that the disk will not become 100% full.
Action type (console only)
aws:ssm:send-command/AWSFIS-Run-Disk-Fill
ARN
arn:aws:ssm:region::document/AWSFIS-Run-Disk-Fill
Document parameters
-
DurationSeconds – Required. The duration of the disk fill test, in seconds.
-
Percent – Optional. The percentage of the disk to allocate during the disk fill test. The default is 95%.
-
InstallDependencies – Optional. If the value is True, Systems Manager installs the required dependencies on the target instances if they are not already installed. The default is True. The dependencies are atd and fallocate.
The following is an example of the string you can enter in the console.
{"DurationSeconds":"60", "InstallDependencies":"True"}
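Because a completely full disk can block the rollback, you might add a local guard that refuses settings leaving no headroom. The ceiling argument below is a hypothetical safety check, not a parameter of AWSFIS-Run-Disk-Fill, and it cannot account for application writes during the experiment.

```python
import json

DEFAULT_PERCENT = 95  # documented default for Percent


def disk_fill_parameters(duration_seconds, percent=None, ceiling=95,
                         install_dependencies=True):
    """Build documentParameters for AWSFIS-Run-Disk-Fill with a local safety ceiling.

    `ceiling` is a hypothetical guard, not a document parameter: it rejects
    fills that would leave too little headroom for a clean rollback.
    """
    if duration_seconds <= 0:
        raise ValueError("DurationSeconds must be positive")
    if ceiling >= 100:
        raise ValueError("ceiling must stay below 100 to leave headroom")
    # Check the value that will actually apply, including the document default.
    effective = DEFAULT_PERCENT if percent is None else percent
    if not 0 < effective <= ceiling:
        raise ValueError(f"Percent {effective} is outside (0, {ceiling}]")
    params = {"DurationSeconds": str(duration_seconds)}
    if percent is not None:
        params["Percent"] = str(percent)
    params["InstallDependencies"] = "True" if install_dependencies else "False"
    return json.dumps(params)


print(disk_fill_parameters(60))
print(disk_fill_parameters(60, percent=80, ceiling=90))
```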
AWSFIS-Run-IO-Stress
Runs IO stress on an instance using the stress-ng tool. Uses the AWSFIS-Run-IO-Stress SSM document.
Action type (console only)
aws:ssm:send-command/AWSFIS-Run-IO-Stress
ARN
arn:aws:ssm:region::document/AWSFIS-Run-IO-Stress
Document parameters
-
DurationSeconds – Required. The duration of the IO stress test, in seconds.
-
Workers – Optional. The number of workers that perform a mix of sequential, random, and memory-mapped read/write operations, forced synchronizing, and cache dropping. Multiple child processes perform different I/O operations on the same file. The default is 1.
-
Percent – Optional. The percentage of free space on the file system to use during the IO stress test. The default is 80%.
-
InstallDependencies – Optional. If the value is True, Systems Manager installs the required dependencies on the target instances if they are not already installed. The default is True. The dependency is stress-ng.
The following is an example of the string you can enter in the console.
{"Workers":"1", "Percent":"80", "DurationSeconds":"60", "InstallDependencies":"True"}
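The documented defaults can also be captured in a small data class, which keeps the console string consistent with the values you intend. This is a sketch: the class name and the range checks on Workers and Percent are assumptions, not limits published for the document.

```python
import json
from dataclasses import dataclass


@dataclass
class IOStressParameters:
    """documentParameters for AWSFIS-Run-IO-Stress; defaults mirror the documented ones."""

    duration_seconds: int
    workers: int = 1
    percent: int = 80
    install_dependencies: bool = True

    def to_json(self):
        if self.duration_seconds <= 0 or self.workers < 1:
            raise ValueError("DurationSeconds must be positive and Workers at least 1")
        if not 0 < self.percent <= 100:
            raise ValueError("Percent must be between 1 and 100")
        return json.dumps({
            "Workers": str(self.workers),
            "Percent": str(self.percent),
            "DurationSeconds": str(self.duration_seconds),
            "InstallDependencies": "True" if self.install_dependencies else "False",
        })


print(IOStressParameters(duration_seconds=60).to_json())
```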
AWSFIS-Run-Kill-Process
Stops the specified process on the instance using the killall command. Uses the AWSFIS-Run-Kill-Process SSM document.
Action type (console only)
aws:ssm:send-command/AWSFIS-Run-Kill-Process
ARN
arn:aws:ssm:region::document/AWSFIS-Run-Kill-Process
Document parameters
-
ProcessName – Required. The name of the process to stop.
-
Signal – Optional. The signal to send along with the command. The possible values are SIGTERM (which the receiver can choose to ignore) and SIGKILL (which cannot be ignored). The default is SIGTERM.
-
InstallDependencies – Optional. If the value is True, Systems Manager installs the required dependencies on the target instances if they are not already installed. The default is True. The dependency is killall.
The following is an example of the string you can enter in the console.
{"ProcessName":"myapplication", "Signal":"SIGTERM"}
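Since only two signal values are accepted, a helper can reject anything else up front. In this sketch, the check that ProcessName is non-empty and has no surrounding spaces is an assumption about sensible input, not a documented rule.

```python
import json

ALLOWED_SIGNALS = ("SIGTERM", "SIGKILL")


def kill_process_parameters(process_name, signal_name="SIGTERM",
                            install_dependencies=True):
    """Build documentParameters for AWSFIS-Run-Kill-Process."""
    if not process_name or process_name.strip() != process_name:
        raise ValueError("ProcessName must be non-empty with no surrounding spaces")
    if signal_name not in ALLOWED_SIGNALS:
        raise ValueError(f"Signal must be one of {ALLOWED_SIGNALS}")
    return json.dumps({
        "ProcessName": process_name,
        "Signal": signal_name,
        "InstallDependencies": "True" if install_dependencies else "False",
    })


print(kill_process_parameters("myapplication"))
```

SIGTERM gives the process a chance to shut down cleanly, so it is a reasonable first choice; SIGKILL tests the case where the process cannot react at all.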
AWSFIS-Run-Memory-Stress
Runs memory stress on an instance using the stress-ng tool. Uses the AWSFIS-Run-Memory-Stress SSM document.
Action type (console only)
aws:ssm:send-command/AWSFIS-Run-Memory-Stress
ARN
arn:aws:ssm:region::document/AWSFIS-Run-Memory-Stress
Document parameters
-
DurationSeconds – Required. The duration of the memory stress test, in seconds.
-
Workers – Optional. The number of virtual memory stressors. The default is 1.
-
Percent – Required. The percentage of virtual memory to use during the memory stress test.
-
InstallDependencies – Optional. If the value is True, Systems Manager installs the required dependencies on the target instances if they are not already installed. The default is True. The dependency is stress-ng.
The following is an example of the string you can enter in the console.
{"Percent":"80", "DurationSeconds":"60", "InstallDependencies":"True"}
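Unlike the other stress documents, AWSFIS-Run-Memory-Stress lists Percent as required, so a helper can make it a mandatory argument. The 1 to 100 range check below is an assumption based on the parameter being a percentage.

```python
import json


def memory_stress_parameters(duration_seconds, percent, workers=None,
                             install_dependencies=True):
    """Build documentParameters for AWSFIS-Run-Memory-Stress.

    Percent is required by the document, so it has no default here.
    """
    if duration_seconds <= 0:
        raise ValueError("DurationSeconds must be positive")
    if not 0 < percent <= 100:
        raise ValueError("Percent must be between 1 and 100")
    params = {"Percent": str(percent), "DurationSeconds": str(duration_seconds)}
    if workers is not None:  # omit to use the documented default of 1
        if workers < 1:
            raise ValueError("Workers must be at least 1")
        params["Workers"] = str(workers)
    params["InstallDependencies"] = "True" if install_dependencies else "False"
    return json.dumps(params)


print(memory_stress_parameters(60, 80))
```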
AWSFIS-Run-Network-Blackhole-Port
Drops inbound or outbound traffic for the protocol and port using the iptables tool. Uses the AWSFIS-Run-Network-Blackhole-Port SSM document.
Action type (console only)
aws:ssm:send-command/AWSFIS-Run-Network-Blackhole-Port
ARN
arn:aws:ssm:region::document/AWSFIS-Run-Network-Blackhole-Port
Document parameters
-
Protocol – Required. The protocol. The possible values are tcp and udp.
-
Port – Required. The port number.
-
TrafficType – Optional. The type of traffic. The possible values are ingress and egress. The default is ingress.
-
DurationSeconds – Required. The duration of the network blackhole test, in seconds.
-
InstallDependencies – Optional. If the value is True, Systems Manager installs the required dependencies on the target instances if they are not already installed. The default is True. The dependencies are atd, dig, and iptables.
The following is an example of the string you can enter in the console.
{"Protocol":"tcp", "Port":"8080", "TrafficType":"egress", "DurationSeconds":"60", "InstallDependencies":"True"}
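The same pattern applies to the blackhole document, where Protocol and TrafficType each accept only two values. The 1 to 65535 port range in this sketch is the usual TCP/UDP port range, used here as an assumption rather than a documented limit.

```python
import json


def blackhole_parameters(protocol, port, duration_seconds, traffic_type="ingress",
                         install_dependencies=True):
    """Build documentParameters for AWSFIS-Run-Network-Blackhole-Port."""
    if protocol not in ("tcp", "udp"):
        raise ValueError("Protocol must be tcp or udp")
    if not 1 <= port <= 65535:
        raise ValueError("Port must be between 1 and 65535")
    if traffic_type not in ("ingress", "egress"):
        raise ValueError("TrafficType must be ingress or egress")
    if duration_seconds <= 0:
        raise ValueError("DurationSeconds must be positive")
    return json.dumps({
        "Protocol": protocol,
        "Port": str(port),
        "TrafficType": traffic_type,
        "DurationSeconds": str(duration_seconds),
        "InstallDependencies": "True" if install_dependencies else "False",
    })


print(blackhole_parameters("tcp", 8080, 60, traffic_type="egress"))
```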
AWSFIS-Run-Network-Latency
Adds latency to the network interface using the tc tool. Uses the AWSFIS-Run-Network-Latency SSM document.
Action type (console only)
aws:ssm:send-command/AWSFIS-Run-Network-Latency
ARN
arn:aws:ssm:region::document/AWSFIS-Run-Network-Latency
Document parameters
-
Interface – Optional. The network interface. The default is eth0.
-
DelayMilliseconds – Optional. The delay, in milliseconds. The default is 200.
-
DurationSeconds – Required. The duration of the network latency test, in seconds.
-
InstallDependencies – Optional. If the value is True, Systems Manager installs the required dependencies on the target instances if they are not already installed. The default is True. The dependencies are atd, dig, and tc.
The following is an example of the string you can enter in the console.
{"DelayMilliseconds":"200", "Interface":"eth0", "DurationSeconds":"60", "InstallDependencies":"True"}
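If you want to observe behavior at several delay levels, you could generate one parameter string per level and run a separate experiment for each. This sweep is a hypothetical usage pattern, not something the document does itself. Interface names vary by AMI and instance type, so confirm the name on the instance (for example, with ip link) before relying on the eth0 default.

```python
import json


def latency_parameters(duration_seconds, delay_ms=200, interface="eth0",
                       install_dependencies=True):
    """Build documentParameters for AWSFIS-Run-Network-Latency."""
    if duration_seconds <= 0 or delay_ms < 0:
        raise ValueError("DurationSeconds must be positive and DelayMilliseconds non-negative")
    return json.dumps({
        "DelayMilliseconds": str(delay_ms),
        "Interface": interface,
        "DurationSeconds": str(duration_seconds),
        "InstallDependencies": "True" if install_dependencies else "False",
    })


# One parameter string per delay level (hypothetical sweep).
sweep = [latency_parameters(60, delay_ms=delay) for delay in (50, 200, 500)]
for document_parameters in sweep:
    print(document_parameters)
```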
AWSFIS-Run-Network-Latency-Sources
Adds latency and jitter to the network interface using the tc tool for traffic to or from specific sources. Uses the AWSFIS-Run-Network-Latency-Sources SSM document.
Action type (console only)
aws:ssm:send-command/AWSFIS-Run-Network-Latency-Sources
ARN
arn:aws:ssm:region::document/AWSFIS-Run-Network-Latency-Sources
Document parameters
-
Interface – Optional. The network interface. The default is eth0.
-
DelayMilliseconds – Optional. The delay, in milliseconds. The default is 200.
-
JitterMilliseconds – Optional. The jitter, in milliseconds. The default is 10.
-
Sources – Required. The sources, separated by commas. The possible values are: an IPv4 address, an IPv4 CIDR block, a domain name, DYNAMODB, and S3. If you specify DYNAMODB or S3, this applies only to the Regional endpoint in the current Region.
-
TrafficType – Optional. The type of traffic. The possible values are ingress and egress. The default is ingress.
-
DurationSeconds – Required. The duration of the network latency test, in seconds.
-
InstallDependencies – Optional. If the value is True, Systems Manager installs the required dependencies on the target instances if they are not already installed. The default is True. The dependencies are atd, dig, jq, and tc.
The following is an example of the string you can enter in the console.
{"DelayMilliseconds":"200", "JitterMilliseconds":"15", "Sources":"S3,www.example.com,72.21.198.67", "Interface":"eth0", "TrafficType":"egress", "DurationSeconds":"60", "InstallDependencies":"True"}
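The Sources value mixes several kinds of entries in one comma-separated string, so it is easy to mistype. The following sketch checks each entry against the kinds listed above before joining them. The domain-name check is a loose heuristic of our own, not the validation the document performs, and the same helper would work for AWSFIS-Run-Network-Packet-Loss-Sources.

```python
import ipaddress
import json

SERVICE_KEYWORDS = {"DYNAMODB", "S3"}


def classify_source(source):
    """Heuristically classify one Sources entry as keyword, ipv4, cidr, or domain."""
    if source in SERVICE_KEYWORDS:
        return "keyword"
    if "/" in source:
        # IPv4Network raises a ValueError subclass if the CIDR block is malformed.
        ipaddress.IPv4Network(source, strict=False)
        return "cidr"
    try:
        ipaddress.IPv4Address(source)
        return "ipv4"
    except ValueError:
        pass
    labels = source.split(".")
    if (" " not in source and len(labels) >= 2 and all(labels)
            and not all(label.isdigit() for label in labels)):
        return "domain"
    raise ValueError(f"unsupported source: {source!r}")


def join_sources(sources):
    """Validate each entry, then build the comma-separated Sources value."""
    for source in sources:
        classify_source(source)
    return ",".join(sources)


params = {
    "DelayMilliseconds": "200",
    "JitterMilliseconds": "15",
    "Sources": join_sources(["S3", "www.example.com", "72.21.198.67"]),
    "Interface": "eth0",
    "TrafficType": "egress",
    "DurationSeconds": "60",
    "InstallDependencies": "True",
}
print(json.dumps(params))
```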
AWSFIS-Run-Network-Packet-Loss
Adds packet loss to the network interface using the tc tool. Uses the AWSFIS-Run-Network-Packet-Loss SSM document.
Action type (console only)
aws:ssm:send-command/AWSFIS-Run-Network-Packet-Loss
ARN
arn:aws:ssm:region::document/AWSFIS-Run-Network-Packet-Loss
Document parameters
-
Interface – Optional. The network interface. The default is eth0.
-
LossPercent – Optional. The percentage of packet loss. The default is 7%.
-
DurationSeconds – Required. The duration of the network packet loss test, in seconds.
-
InstallDependencies – Optional. If the value is True, Systems Manager installs the required dependencies on the target instances. The default is True. The dependencies are atd, dig, and tc.
The following is an example of the string you can enter in the console.
{"LossPercent":"15", "Interface":"eth0", "DurationSeconds":"60", "InstallDependencies":"True"}
AWSFIS-Run-Network-Packet-Loss-Sources
Adds packet loss to the network interface using the tc tool for traffic to or from specific sources. Uses the AWSFIS-Run-Network-Packet-Loss-Sources SSM document.
Action type (console only)
aws:ssm:send-command/AWSFIS-Run-Network-Packet-Loss-Sources
ARN
arn:aws:ssm:region::document/AWSFIS-Run-Network-Packet-Loss-Sources
Document parameters
-
Interface – Optional. The network interface. The default is eth0.
-
LossPercent – Optional. The percentage of packet loss. The default is 7%.
-
Sources – Required. The sources, separated by commas. The possible values are: an IPv4 address, an IPv4 CIDR block, a domain name, DYNAMODB, and S3. If you specify DYNAMODB or S3, this applies only to the Regional endpoint in the current Region.
-
TrafficType – Optional. The type of traffic. The possible values are ingress and egress. The default is ingress.
-
DurationSeconds – Required. The duration of the network packet loss test, in seconds.
-
InstallDependencies – Optional. If the value is True, Systems Manager installs the required dependencies on the target instances. The default is True. The dependencies are atd, dig, jq, and tc.
The following is an example of the string you can enter in the console.
{"LossPercent":"15", "Sources":"S3,www.example.com,72.21.198.67", "Interface":"eth0", "TrafficType":"egress", "DurationSeconds":"60", "InstallDependencies":"True"}
Examples
For an example experiment template, see Run a pre-configured AWS FIS SSM document.
For an example tutorial, see Run CPU stress on an instance.
Troubleshooting
Use the following procedure to troubleshoot issues.
To troubleshoot issues with SSM documents
-
Open the AWS Systems Manager console at https://console.aws.amazon.com/systems-manager/.
-
In the navigation pane, choose Node Management, Run Command.
-
On the Command history tab, use the filters to locate the run of the document.
-
Choose the ID of the command to open its details page.
-
Choose the ID of the instance. Review the output and errors for each step.
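The same per-step details are also available from the command line. This sketch assumes you saved the output of aws ssm list-command-invocations --command-id <command-id> --details as JSON. In that output, each invocation lists its steps under CommandPlugins with Name, Status, and a possibly truncated Output field. The instance ID, step names, and messages in the sample are hypothetical.

```python
import json


def failed_steps(list_invocations_json):
    """Return (instance_id, step_name, status, output) for steps that did not succeed."""
    data = json.loads(list_invocations_json)
    problems = []
    for invocation in data.get("CommandInvocations", []):
        for plugin in invocation.get("CommandPlugins", []):
            if plugin.get("Status") != "Success":
                problems.append((invocation["InstanceId"], plugin["Name"],
                                 plugin.get("Status"), plugin.get("Output", "")))
    return problems


# Hypothetical, trimmed output; step names and messages are made up.
sample = """{"CommandInvocations": [{
    "InstanceId": "i-0123456789abcdef0",
    "CommandPlugins": [
        {"Name": "exampleStep1", "Status": "Success", "Output": "ok"},
        {"Name": "exampleStep2", "Status": "Failed", "Output": "example error"}]}]}"""

for instance_id, step, status, output in failed_steps(sample):
    print(instance_id, step, status, output, sep=" | ")
```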