Use Systems Manager SSM documents with AWS FIS

AWS FIS supports custom fault types through the AWS Systems Manager SSM Agent and the AWS FIS action aws:ssm:send-command. Pre-configured Systems Manager SSM documents (SSM documents) that can be used to create common fault injection actions are available as public AWS documents that begin with the AWSFIS- prefix.

SSM Agent is Amazon software that can be installed and configured on Amazon EC2 instances, on-premises servers, or virtual machines (VMs). This makes it possible for Systems Manager to manage these resources. The agent processes requests from Systems Manager, and then runs them as specified in the request. You can include your own SSM document to inject custom faults, or reference one of the public Amazon-owned documents.

Requirements

For actions that require SSM Agent to run the action on the target, you must ensure the following:

Use the aws:ssm:send-command action

An SSM document defines the actions that Systems Manager performs on your managed instances. Systems Manager includes a number of pre-configured documents, or you can create your own. For more information about creating your own SSM document, see Creating Systems Manager documents in the AWS Systems Manager User Guide. For more information about SSM documents in general, see AWS Systems Manager documents in the AWS Systems Manager User Guide.

AWS FIS provides pre-configured SSM documents. You can view the pre-configured SSM documents under Documents in the AWS Systems Manager console: https://console.aws.amazon.com/systems-manager/documents. You can also choose from a selection of pre-configured documents in the AWS FIS console. For more information, see Pre-configured AWS FIS SSM documents.

To use an SSM document in your AWS FIS experiments, you can use the aws:ssm:send-command action. This action fetches and runs the specified SSM document on your target instances.

When you use the aws:ssm:send-command action in your experiment template, you must specify additional parameters for the action, including the following:

  • documentArn – Required. The Amazon Resource Name (ARN) of the SSM document.

  • documentParameters – Conditional. The required and optional parameters that the SSM document accepts. The format is a JSON object with keys that are strings and values that are either strings or arrays of strings.

  • documentVersion – Optional. The version of the SSM document to run.
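As a rough illustration of how these parameters fit together, the following Python sketch builds one aws:ssm:send-command action entry for an experiment template. The Region, the target name myInstances, and the PT2M action duration are assumptions chosen for the example. The duration and targets fields are not described in this section; they are included on the assumption that the action entry follows the general AWS FIS action format.

```python
import json


def send_command_action(region, document_name, document_parameters,
                        duration, target_name, document_version=None):
    """Build one aws:ssm:send-command action entry for an experiment template."""
    parameters = {
        # Public AWS FIS documents have an empty account field in the ARN.
        "documentArn": f"arn:aws:ssm:{region}::document/{document_name}",
        # Action parameter values are strings, so the document parameters
        # are passed as a JSON-encoded string.
        "documentParameters": json.dumps(document_parameters),
        # Assumption: the action duration is given as an ISO 8601 duration.
        "duration": duration,
    }
    if document_version is not None:
        parameters["documentVersion"] = str(document_version)
    return {
        "actionId": "aws:ssm:send-command",
        "parameters": parameters,
        "targets": {"Instances": target_name},
    }


action = send_command_action(
    region="us-east-1",                  # assumed Region
    document_name="AWSFIS-Run-CPU-Stress",
    document_parameters={"DurationSeconds": "60", "InstallDependencies": "True"},
    duration="PT2M",                     # assumed action duration
    target_name="myInstances",           # hypothetical target name
)
print(json.dumps({"cpuStress": action}, indent=2))
```

The printed JSON can be pasted under the actions key of an experiment template. The documentVersion key is left out unless a version is supplied.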

You can view the information for an SSM document (including the parameters for the document) by using the Systems Manager console or the command line.

To view information about an SSM document using the console
  1. Open the AWS Systems Manager console at https://console.aws.amazon.com/systems-manager/.

  2. In the navigation pane, choose Documents.

  3. Select the document, and choose the Details tab.

To view information about an SSM document using the command line

Use the SSM describe-document command.
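The same information is available programmatically. The following is a minimal Python sketch that assumes a boto3 Systems Manager client is passed in; the function name list_document_parameters is hypothetical. It relies on the DescribeDocument response containing a Document entry with a Parameters list.

```python
def list_document_parameters(ssm_client, document_name):
    """Return (name, type, default) tuples for an SSM document's parameters."""
    response = ssm_client.describe_document(Name=document_name)
    return [
        (p["Name"], p.get("Type"), p.get("DefaultValue"))
        for p in response["Document"].get("Parameters", [])
    ]


# Usage with boto3 (requires credentials and network access):
#   import boto3
#   rows = list_document_parameters(boto3.client("ssm"), "AWSFIS-Run-CPU-Stress")
#   for row in rows:
#       print(row)
```

Passing the client as an argument keeps the function easy to exercise with a stub in tests.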

Pre-configured AWS FIS SSM documents

You can use pre-configured AWS FIS SSM documents with the aws:ssm:send-command action in your experiment templates.

Requirements
  • The pre-configured SSM documents provided by AWS FIS are supported only on the following operating systems:

    • Amazon Linux 2023, Amazon Linux 2, Amazon Linux

    • Ubuntu

    • RHEL 7, 8, 9

    • CentOS 8, 9

  • The pre-configured SSM documents provided by AWS FIS are supported only on EC2 instances. They are not supported on other types of managed nodes, such as on-premises servers.

To use these SSM documents in experiments on ECS tasks, use the corresponding Amazon ECS actions. For example, the aws:ecs:task-cpu-stress action uses the AWSFIS-Run-CPU-Stress document.

Difference between action duration and DurationSeconds in AWS FIS SSM documents

Some SSM documents limit their own execution time. For example, some of the pre-configured AWS FIS SSM documents use the DurationSeconds parameter for this purpose. As a result, you need to specify two independent durations in the AWS FIS action definition:

  • Action duration: For experiments with a single action, the action duration is equivalent to the experiment duration. With multiple actions, the experiment duration depends on individual action durations and the order in which they are run. AWS FIS monitors each action until its action duration has elapsed.

  • Document parameter DurationSeconds: The duration, specified in seconds, for which the SSM document will execute.

You can choose different values for the two types of duration:

  • Action duration exceeds DurationSeconds: The SSM document execution finishes before the action is complete. AWS FIS waits until the action duration has elapsed before it starts subsequent actions.

  • Action duration is shorter than DurationSeconds: The SSM document continues to run after the action is complete. If the SSM document execution is still in progress when the action duration elapses, the action status is set to Completed. AWS FIS monitors the execution only until the action duration has elapsed.

Note that some SSM documents have variable durations. For example, AWS FIS SSM documents have the option to install prerequisites, which can extend the overall execution time beyond the specified DurationSeconds value. Thus, if you set the action duration and DurationSeconds to the same value, the SSM document might run longer than the action duration.
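One way to keep the action duration longer than the document run is to derive it from DurationSeconds plus a margin. The Python sketch below does this under two assumptions: that the action duration is expressed as an ISO 8601 duration string, and that a 60-second margin (the hypothetical buffer_seconds argument) is enough to cover dependency installation. Adjust the margin for your own instances.

```python
import json


def iso8601_duration(total_seconds):
    """Format a positive number of seconds as an ISO 8601 duration, e.g. PT1M30S."""
    if total_seconds <= 0:
        raise ValueError("duration must be positive")
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    parts = [(hours, "H"), (minutes, "M"), (seconds, "S")]
    return "PT" + "".join(f"{value}{unit}" for value, unit in parts if value)


def plan_durations(duration_seconds, buffer_seconds=60):
    """Return action parameters whose action duration exceeds DurationSeconds."""
    return {
        # Action duration: document run time plus an assumed safety margin.
        "duration": iso8601_duration(duration_seconds + buffer_seconds),
        # Document run time, passed to the SSM document itself.
        "documentParameters": json.dumps({"DurationSeconds": str(duration_seconds)}),
    }


print(plan_durations(60))
```

With the margin in place, AWS FIS is still monitoring the action when the document finishes, so a failure late in the run is more likely to be reflected in the action status.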

AWSFIS-Run-CPU-Stress

Runs CPU stress on an instance using the stress-ng tool. Uses the AWSFIS-Run-CPU-Stress SSM document.

Action type (console only)

aws:ssm:send-command/AWSFIS-Run-CPU-Stress

ARN

arn:aws:ssm:region::document/AWSFIS-Run-CPU-Stress

Document parameters
  • DurationSeconds – Required. The duration of the CPU stress test, in seconds.

  • CPU – Optional. The number of CPU stressors to use. The default is 0, which uses all CPU stressors.

  • LoadPercent – Optional. The target CPU load percentage, from 0 (no load) to 100 (full load). The default is 100.

  • InstallDependencies – Optional. If the value is True, Systems Manager installs the required dependencies on the target instances if they are not already installed. The default is True. The dependency is stress-ng.

The following is an example of the string you can enter in the console.

{"DurationSeconds":"60", "InstallDependencies":"True"}
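If you generate this string in code rather than typing it, a small helper can check the ranges described above before the experiment starts. The following Python sketch uses a hypothetical function name, cpu_stress_parameters. It omits optional keys that are not supplied so that the document's own defaults apply. The value "False" for InstallDependencies is an assumption; only "True" is described in this section.

```python
import json


def cpu_stress_parameters(duration_seconds, cpu=None, load_percent=None,
                          install_dependencies=None):
    """Build the documentParameters string for AWSFIS-Run-CPU-Stress."""
    if duration_seconds <= 0:
        raise ValueError("DurationSeconds must be positive")
    params = {"DurationSeconds": str(duration_seconds)}
    if cpu is not None:
        if cpu < 0:
            raise ValueError("CPU must be 0 (all stressors) or a positive count")
        params["CPU"] = str(cpu)
    if load_percent is not None:
        if not 0 <= load_percent <= 100:
            raise ValueError("LoadPercent must be between 0 and 100")
        params["LoadPercent"] = str(load_percent)
    if install_dependencies is not None:
        # Assumption: "False" is the accepted opposite of "True".
        params["InstallDependencies"] = "True" if install_dependencies else "False"
    return json.dumps(params)


print(cpu_stress_parameters(60, install_dependencies=True))
```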

AWSFIS-Run-Disk-Fill

Allocates disk space on the root volume of an instance to simulate a disk full fault. Uses the AWSFIS-Run-Disk-Fill SSM document.

If the experiment injecting this fault is stopped, either manually or through a stop condition, AWS FIS attempts to roll back by canceling the running SSM document. However, if the disk is 100% full, either due to the fault or the fault plus application activity, Systems Manager might be unable to complete the cancel operation. Therefore, if you might need to stop the experiment, ensure that the disk will not become 100% full.

Action type (console only)

aws:ssm:send-command/AWSFIS-Run-Disk-Fill

ARN

arn:aws:ssm:region::document/AWSFIS-Run-Disk-Fill

Document parameters
  • DurationSeconds – Required. The duration of the disk fill test, in seconds.

  • Percent – Optional. The percentage of the disk to allocate during the disk fill test. The default is 95%.

  • InstallDependencies – Optional. If the value is True, Systems Manager installs the required dependencies on the target instances if they are not already installed. The default is True. The dependencies are atd and fallocate.

The following is an example of the string you can enter in the console.

{"DurationSeconds":"60", "InstallDependencies":"True"}
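Because a completely full disk can prevent the rollback described above, it can help to refuse risky Percent values when building the parameters. The Python sketch below is one conservative way to do that. The function name and the max_safe_percent ceiling of 99 are hypothetical choices for illustration, not limits enforced by the document.

```python
import json


def disk_fill_parameters(duration_seconds, percent=95, max_safe_percent=99):
    """Build documentParameters for AWSFIS-Run-Disk-Fill, refusing a 100% fill."""
    if duration_seconds <= 0:
        raise ValueError("DurationSeconds must be positive")
    if not 0 < percent <= max_safe_percent:
        raise ValueError(
            f"Percent must be between 1 and {max_safe_percent} so that "
            "Systems Manager can still cancel the document"
        )
    return json.dumps({
        "DurationSeconds": str(duration_seconds),
        "Percent": str(percent),
    })


print(disk_fill_parameters(60))
```

This check covers only the fault itself. Application activity during the experiment can still consume the remaining space, so leave more headroom on busy volumes.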

AWSFIS-Run-IO-Stress

Runs IO stress on an instance using the stress-ng tool. Uses the AWSFIS-Run-IO-Stress SSM document.

Action type (console only)

aws:ssm:send-command/AWSFIS-Run-IO-Stress

ARN

arn:aws:ssm:region::document/AWSFIS-Run-IO-Stress

Document parameters
  • DurationSeconds – Required. The duration of the IO stress test, in seconds.

  • Workers – Optional. The number of workers that perform a mix of sequential, random, and memory-mapped read/write operations, forced synchronizing, and cache dropping. Multiple child processes perform different I/O operations on the same file. The default is 1.

  • Percent – Optional. The percentage of free space on the file system to use during the IO stress test. The default is 80%.

  • InstallDependencies – Optional. If the value is True, Systems Manager installs the required dependencies on the target instances if they are not already installed. The default is True. The dependency is stress-ng.

The following is an example of the string you can enter in the console.

{"Workers":"1", "Percent":"80", "DurationSeconds":"60", "InstallDependencies":"True"}

AWSFIS-Run-Kill-Process

Stops the specified process on the instance, using the killall command. Uses the AWSFIS-Run-Kill-Process SSM document.

Action type (console only)

aws:ssm:send-command/AWSFIS-Run-Kill-Process

ARN

arn:aws:ssm:region::document/AWSFIS-Run-Kill-Process

Document parameters
  • ProcessName – Required. The name of the process to stop.

  • Signal – Optional. The signal to send along with the command. The possible values are SIGTERM (which the receiver can choose to ignore) and SIGKILL (which cannot be ignored). The default is SIGTERM.

  • InstallDependencies – Optional. If the value is True, Systems Manager installs the required dependencies on the target instances if they are not already installed. The default is True. The dependency is killall.

The following is an example of the string you can enter in the console.

{"ProcessName":"myapplication", "Signal":"SIGTERM"}

AWSFIS-Run-Memory-Stress

Runs memory stress on an instance using the stress-ng tool. Uses the AWSFIS-Run-Memory-Stress SSM document.

Action type (console only)

aws:ssm:send-command/AWSFIS-Run-Memory-Stress

ARN

arn:aws:ssm:region::document/AWSFIS-Run-Memory-Stress

Document parameters
  • DurationSeconds – Required. The duration of the memory stress test, in seconds.

  • Workers – Optional. The number of virtual memory stressors. The default is 1.

  • Percent – Required. The percentage of virtual memory to use during the memory stress test.

  • InstallDependencies – Optional. If the value is True, Systems Manager installs the required dependencies on the target instances if they are not already installed. The default is True. The dependency is stress-ng.

The following is an example of the string you can enter in the console.

{"Percent":"80", "DurationSeconds":"60", "InstallDependencies":"True"}

AWSFIS-Run-Network-Blackhole-Port

Drops inbound or outbound traffic for the protocol and port using the iptables tool. Uses the AWSFIS-Run-Network-Blackhole-Port SSM document.

Action type (console only)

aws:ssm:send-command/AWSFIS-Run-Network-Blackhole-Port

ARN

arn:aws:ssm:region::document/AWSFIS-Run-Network-Blackhole-Port

Document parameters
  • Protocol – Required. The protocol. The possible values are tcp and udp.

  • Port – Required. The port number.

  • TrafficType – Optional. The type of traffic. The possible values are ingress and egress. The default is ingress.

  • DurationSeconds – Required. The duration of the network blackhole test, in seconds.

  • InstallDependencies – Optional. If the value is True, Systems Manager installs the required dependencies on the target instances if they are not already installed. The default is True. The dependencies are atd, dig, and iptables.

The following is an example of the string you can enter in the console.

{"Protocol":"tcp", "Port":"8080", "TrafficType":"egress", "DurationSeconds":"60", "InstallDependencies":"True"}

AWSFIS-Run-Network-Latency

Adds latency to the network interface using the tc tool. Uses the AWSFIS-Run-Network-Latency SSM document.

Action type (console only)

aws:ssm:send-command/AWSFIS-Run-Network-Latency

ARN

arn:aws:ssm:region::document/AWSFIS-Run-Network-Latency

Document parameters
  • Interface – Optional. The network interface. The default is eth0.

  • DelayMilliseconds – Optional. The delay, in milliseconds. The default is 200.

  • DurationSeconds – Required. The duration of the network latency test, in seconds.

  • InstallDependencies – Optional. If the value is True, Systems Manager installs the required dependencies on the target instances if they are not already installed. The default is True. The dependencies are atd, dig, and tc.

The following is an example of the string you can enter in the console.

{"DelayMilliseconds":"200", "Interface":"eth0", "DurationSeconds":"60", "InstallDependencies":"True"}

AWSFIS-Run-Network-Latency-Sources

Adds latency and jitter to the network interface using the tc tool for traffic to or from specific sources. Uses the AWSFIS-Run-Network-Latency-Sources SSM document.

Action type (console only)

aws:ssm:send-command/AWSFIS-Run-Network-Latency-Sources

ARN

arn:aws:ssm:region::document/AWSFIS-Run-Network-Latency-Sources

Document parameters
  • Interface – Optional. The network interface. The default is eth0.

  • DelayMilliseconds – Optional. The delay, in milliseconds. The default is 200.

  • JitterMilliseconds – Optional. The jitter, in milliseconds. The default is 10.

  • Sources – Required. The sources, separated by commas. The possible values are: an IPv4 address, an IPv4 CIDR block, a domain name, DYNAMODB, and S3. If you specify DYNAMODB or S3, this applies only to the Regional endpoint in the current Region.

  • TrafficType – Optional. The type of traffic. The possible values are ingress and egress. The default is ingress.

  • DurationSeconds – Required. The duration of the network latency test, in seconds.

  • InstallDependencies – Optional. If the value is True, Systems Manager installs the required dependencies on the target instances if they are not already installed. The default is True. The dependencies are atd, dig, jq, and tc.

The following is an example of the string you can enter in the console.

{"DelayMilliseconds":"200", "JitterMilliseconds":"15", "Sources":"S3,www.example.com,72.21.198.67", "Interface":"eth0", "TrafficType":"egress", "DurationSeconds":"60", "InstallDependencies":"True"}
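The Sources value mixes several kinds of entries, so a quick local check can catch typos before the experiment runs. The following Python sketch classifies each entry as a service keyword, an IPv4 address, an IPv4 CIDR block, or a domain name, and then joins the entries with commas. The function names and the loose host name pattern are assumptions for illustration; the SSM document performs its own validation, which might differ.

```python
import ipaddress
import re

SERVICE_KEYWORDS = {"DYNAMODB", "S3"}
# Loose host name pattern (an assumption, not the document's own rule).
_HOSTNAME = re.compile(
    r"^(?=.{1,253}$)([A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?\.)+[A-Za-z]{2,63}$"
)


def classify_source(source):
    """Return the kind of a single Sources entry, or raise ValueError."""
    if source in SERVICE_KEYWORDS:
        return "service"
    try:
        ipaddress.IPv4Address(source)
        return "ipv4-address"
    except ValueError:
        pass
    if "/" in source:
        try:
            ipaddress.IPv4Network(source, strict=False)
            return "ipv4-cidr"
        except ValueError:
            pass  # falls through to the final error
    if _HOSTNAME.match(source):
        return "domain"
    raise ValueError(f"unsupported source: {source!r}")


def build_sources(sources):
    """Validate every entry and return the comma-separated Sources string."""
    for source in sources:
        classify_source(source)
    return ",".join(sources)


print(build_sources(["S3", "www.example.com", "72.21.198.67"]))
```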

AWSFIS-Run-Network-Packet-Loss

Adds packet loss to the network interface using the tc tool. Uses the AWSFIS-Run-Network-Packet-Loss SSM document.

Action type (console only)

aws:ssm:send-command/AWSFIS-Run-Network-Packet-Loss

ARN

arn:aws:ssm:region::document/AWSFIS-Run-Network-Packet-Loss

Document parameters
  • Interface – Optional. The network interface. The default is eth0.

  • LossPercent – Optional. The percentage of packet loss. The default is 7%.

  • DurationSeconds – Required. The duration of the network packet loss test, in seconds.

  • InstallDependencies – Optional. If the value is True, Systems Manager installs the required dependencies on the target instances if they are not already installed. The default is True. The dependencies are atd, dig, and tc.

The following is an example of the string you can enter in the console.

{"LossPercent":"15", "Interface":"eth0", "DurationSeconds":"60", "InstallDependencies":"True"}

AWSFIS-Run-Network-Packet-Loss-Sources

Adds packet loss to the network interface using the tc tool for traffic to or from specific sources. Uses the AWSFIS-Run-Network-Packet-Loss-Sources SSM document.

Action type (console only)

aws:ssm:send-command/AWSFIS-Run-Network-Packet-Loss-Sources

ARN

arn:aws:ssm:region::document/AWSFIS-Run-Network-Packet-Loss-Sources

Document parameters
  • Interface – Optional. The network interface. The default is eth0.

  • LossPercent – Optional. The percentage of packet loss. The default is 7%.

  • Sources – Required. The sources, separated by commas. The possible values are: an IPv4 address, an IPv4 CIDR block, a domain name, DYNAMODB, and S3. If you specify DYNAMODB or S3, this applies only to the Regional endpoint in the current Region.

  • TrafficType – Optional. The type of traffic. The possible values are ingress and egress. The default is ingress.

  • DurationSeconds – Required. The duration of the network packet loss test, in seconds.

  • InstallDependencies – Optional. If the value is True, Systems Manager installs the required dependencies on the target instances if they are not already installed. The default is True. The dependencies are atd, dig, jq, and tc.

The following is an example of the string you can enter in the console.

{"LossPercent":"15", "Sources":"S3,www.example.com,72.21.198.67", "Interface":"eth0", "TrafficType":"egress", "DurationSeconds":"60", "InstallDependencies":"True"}

Examples

For an example experiment template, see Run a pre-configured AWS FIS SSM document.

For an example tutorial, see Run CPU stress on an instance.

Troubleshooting

Use the following procedure to troubleshoot issues.

To troubleshoot issues with SSM documents
  1. Open the AWS Systems Manager console at https://console.aws.amazon.com/systems-manager/.

  2. In the navigation pane, choose Node Management, Run Command.

  3. On the Command history tab, use the filters to locate the run of the document.

  4. Choose the ID of the command to open its details page.

  5. Choose the ID of the instance. Review the output and errors for each step.
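If you already know the command ID, the per-step output can also be collected programmatically. The following Python sketch is a rough equivalent of the console procedure and is not part of it. It assumes a boto3 Systems Manager client is passed in and uses the ListCommandInvocations operation with Details enabled. The function name summarize_command is hypothetical, and the sketch reads only the first page of results.

```python
def summarize_command(ssm_client, command_id):
    """Collect per-instance, per-step status and output for one command ID."""
    response = ssm_client.list_command_invocations(
        CommandId=command_id,
        Details=True,  # include the per-step (plugin) results
    )
    summary = []
    for invocation in response.get("CommandInvocations", []):
        for plugin in invocation.get("CommandPlugins", []):
            summary.append({
                "instance": invocation["InstanceId"],
                "step": plugin.get("Name"),
                "status": plugin.get("Status"),
                "output": plugin.get("Output", ""),
            })
    return summary


# Usage with boto3 (requires credentials and network access):
#   import boto3
#   for row in summarize_command(boto3.client("ssm"), "<command-id>"):
#       print(row["instance"], row["step"], row["status"])
```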