使用 AWS FIS aws: ecs: task 操作 - AWS 故障注入服务

本文属于机器翻译版本。若本译文内容与英语原文存在差异,则一律以英文原文为准。

使用 AWS FIS aws: ecs: task 操作

您可以执行 aws:ecs:task 操作,为 Amazon ECS 任务注入故障。

这些操作使用 SSM 代理作为 Sidecar 容器来运行 SSM 文档,这些文档将执行故障注入,并通过 Sidecar 容器将 Amazon ECS 任务注册为 SSM 托管实例。要使用这些操作,您需要更新 Amazon ECS 任务定义,将 SSM 代理添加为 Sidecar 容器,以便将其运行的任务注册为 SSM 托管实例。当您运行 AWS FIS 实验定位时aws:ecs:task, AWS FIS 会使用添加到托管实例的资源标签将您在 AWS FIS 实验模板上指定的目标 Amazon ECS 任务映射到一组 SSM 托管实例。ECS_TASK_ARN标签值是应执行 SSM 文档的关联 Amazon ECS 任务的 ARN,因此在运行实验时不应删除。

操作

限制

  • 以下操作不适用于 AWS Fargate:

    • aws:ecs:task-kill-process

    • aws:ecs:task-network-blackhole-port

    • aws:ecs:task-network-latency

    • aws:ecs:task-network-packet-loss

  • 必须先禁用已启用的 ECS Exec,才能执行此类操作。

要求

  • 向 AWS FIS 实验角色添加以下权限:

    • ssm:SendCommand

    • ssm:ListCommands

    • ssm:CancelCommand

  • 为 Amazon ECS 的任务 IAM 角色添加以下权限:

    • ssm:CreateActivation

    • ssm:AddTagsToResource

    • iam:PassRole

    请注意,您可以将托管实例角色的 ARN 指定为 iam:PassRole 资源。

  • 创建 Amazon ECS 任务执行 IAM 角色并添加 A mazonECS TaskExecution RolePolicy 托管策略。

  • 为附加到注册为托管实例的任务上的托管实例角色添加以下权限:

    • ssm:DeleteActivation

    • ssm:DeregisterManagedInstance

  • AmazonSSM ManagedInstance Core 托管策略添加到注册为托管实例的任务所关联的托管实例角色中。

  • 将环境变量 MANAGED_INSTANCE_ROLE_NAME 设置为托管实例角色的名称。

  • 为 ECS 任务定义添加 SSM 代理容器。命令脚本将 ECS 任务注册为托管实例。

    { "name": "amazon-ssm-agent", "image": "public.ecr.aws/amazon-ssm-agent/amazon-ssm-agent:latest", "cpu": 0, "links": [], "portMappings": [], "essential": false, "entryPoint": [], "command": [ "/bin/bash", "-c", "set -e; yum upgrade -y; yum install jq procps awscli -y; term_handler() { echo \"Deleting SSM activation $ACTIVATION_ID\"; if ! aws ssm delete-activation --activation-id $ACTIVATION_ID --region $ECS_TASK_REGION; then echo \"SSM activation $ACTIVATION_ID failed to be deleted\" 1>&2; fi; MANAGED_INSTANCE_ID=$(jq -e -r .ManagedInstanceID /var/lib/amazon/ssm/registration); echo \"Deregistering SSM Managed Instance $MANAGED_INSTANCE_ID\"; if ! aws ssm deregister-managed-instance --instance-id $MANAGED_INSTANCE_ID --region $ECS_TASK_REGION; then echo \"SSM Managed Instance $MANAGED_INSTANCE_ID failed to be deregistered\" 1>&2; fi; kill -SIGTERM $SSM_AGENT_PID; }; trap term_handler SIGTERM SIGINT; if [[ -z $MANAGED_INSTANCE_ROLE_NAME ]]; then echo \"Environment variable MANAGED_INSTANCE_ROLE_NAME not set, exiting\" 1>&2; exit 1; fi; if ! ps ax | grep amazon-ssm-agent | grep -v grep > /dev/null; then if [[ -n $ECS_CONTAINER_METADATA_URI_V4 ]] ; then echo \"Found ECS Container Metadata, running activation with metadata\"; TASK_METADATA=$(curl \"${ECS_CONTAINER_METADATA_URI_V4}/task\"); ECS_TASK_AVAILABILITY_ZONE=$(echo $TASK_METADATA | jq -e -r '.AvailabilityZone'); ECS_TASK_ARN=$(echo $TASK_METADATA | jq -e -r '.TaskARN'); ECS_TASK_REGION=$(echo $ECS_TASK_AVAILABILITY_ZONE | sed 's/.$//'); ECS_TASK_AVAILABILITY_ZONE_REGEX='^(af|ap|ca|cn|eu|me|sa|us|us-gov)-(central|north|(north(east|west))|south|south(east|west)|east|west)-[0-9]{1}[a-z]{1}$'; if ! [[ $ECS_TASK_AVAILABILITY_ZONE =~ $ECS_TASK_AVAILABILITY_ZONE_REGEX ]]; then echo \"Error extracting Availability Zone from ECS Container Metadata, exiting\" 1>&2; exit 1; fi; ECS_TASK_ARN_REGEX='^arn:(aws|aws-cn|aws-us-gov):ecs:[a-z0-9-]+:[0-9]{12}:task/[a-zA-Z0-9_-]+/[a-zA-Z0-9]+$'; if ! [[ $ECS_TASK_ARN =~ $ECS_TASK_ARN_REGEX ]]; then echo \"Error extracting Task ARN from ECS Container Metadata, exiting\" 1>&2; exit 1; fi; CREATE_ACTIVATION_OUTPUT=$(aws ssm create-activation --iam-role $MANAGED_INSTANCE_ROLE_NAME --tags Key=ECS_TASK_AVAILABILITY_ZONE,Value=$ECS_TASK_AVAILABILITY_ZONE Key=ECS_TASK_ARN,Value=$ECS_TASK_ARN Key=FAULT_INJECTION_SIDECAR,Value=true --region $ECS_TASK_REGION); ACTIVATION_CODE=$(echo $CREATE_ACTIVATION_OUTPUT | jq -e -r .ActivationCode); ACTIVATION_ID=$(echo $CREATE_ACTIVATION_OUTPUT | jq -e -r .ActivationId); if ! amazon-ssm-agent -register -code $ACTIVATION_CODE -id $ACTIVATION_ID -region $ECS_TASK_REGION; then echo \"Failed to register with AWS Systems Manager (SSM), exiting\" 1>&2; exit 1; fi; amazon-ssm-agent & SSM_AGENT_PID=$!; wait $SSM_AGENT_PID; else echo \"ECS Container Metadata not found, exiting\" 1>&2; exit 1; fi; else echo \"SSM agent is already running, exiting\" 1>&2; exit 1; fi" ], "environment": [ { "name": "MANAGED_INSTANCE_ROLE_NAME", "value": "SSMManagedInstanceRole" } ], "environmentFiles": [], "mountPoints": [], "volumesFrom": [], "secrets": [], "dnsServers": [], "dnsSearchDomains": [], "extraHosts": [], "dockerSecurityOptions": [], "dockerLabels": {}, "ulimits": [], "logConfiguration": {}, "systemControls": [] }

    有关可读性更强的脚本版本,请参阅 脚本参考版本

  • 执行 aws:ecs:task-network-blackhole-portaws:ecs:task-network-latencyaws:ecs:task-network-packet-loss 操作时,请务必使用以下选项之一,在 ECS 任务定义中更新 SSM 代理容器。

    • 选项 1:添加特定的 Linux 功能。

      "linuxParameters": { "capabilities": { "add": [ "NET_ADMIN" ] } },
    • 选项 2:添加所有的 Linux 功能。

      "privileged": true,
  • 执行 aws:ecs:task-kill-processaws:ecs:task-network-blackhole-portaws:ecs:task-network-latencyaws:ecs:task-network-packet-loss 操作时,ECS 任务定义必须将 pidMode 设置为 task

脚本参考版本

以下是“需求”部分中更具可读性的脚本版本,供您参考。

#!/usr/bin/env bash # This is the activation script used to register ECS tasks as Managed Instances in SSM # The script retrieves information form the ECS task metadata endpoint to add three tags to the Managed Instance # - ECS_TASK_AVAILABILITY_ZONE: To allow customers to target Managed Instances / Tasks in a specific Availability Zone # - ECS_TASK_ARN: To allow customers to target Managed Instances / Tasks by using the Task ARN # - FAULT_INJECTION_SIDECAR: To make it clear that the tasks were registered as managed instance for fault injection purposes. Value is always 'true'. # The script will leave the SSM Agent running in the background # When the container running this script receives a SIGTERM or SIGINT signal, it will do the following cleanup: # - Delete SSM activation # - Deregister SSM managed instance set -e # stop execution instantly as a query exits while having a non-zero yum upgrade -y yum install jq procps awscli -y term_handler() { echo "Deleting SSM activation $ACTIVATION_ID" if ! aws ssm delete-activation --activation-id $ACTIVATION_ID --region $ECS_TASK_REGION; then echo "SSM activation $ACTIVATION_ID failed to be deleted" 1>&2 fi MANAGED_INSTANCE_ID=$(jq -e -r .ManagedInstanceID /var/lib/amazon/ssm/registration) echo "Deregistering SSM Managed Instance $MANAGED_INSTANCE_ID" if ! aws ssm deregister-managed-instance --instance-id $MANAGED_INSTANCE_ID --region $ECS_TASK_REGION; then echo "SSM Managed Instance $MANAGED_INSTANCE_ID failed to be deregistered" 1>&2 fi kill -SIGTERM $SSM_AGENT_PID } trap term_handler SIGTERM SIGINT # check if the required IAM role is provided if [[ -z $MANAGED_INSTANCE_ROLE_NAME ]] ; then echo "Environment variable MANAGED_INSTANCE_ROLE_NAME not set, exiting" 1>&2 exit 1 fi # check if the agent is already running (it will be if ECS Exec is enabled) if ! ps ax | grep amazon-ssm-agent | grep -v grep > /dev/null; then # check if ECS Container Metadata is available if [[ -n $ECS_CONTAINER_METADATA_URI_V4 ]] ; then # Retrieve info from ECS task metadata endpoint echo "Found ECS Container Metadata, running activation with metadata" TASK_METADATA=$(curl "${ECS_CONTAINER_METADATA_URI_V4}/task") ECS_TASK_AVAILABILITY_ZONE=$(echo $TASK_METADATA | jq -e -r '.AvailabilityZone') ECS_TASK_ARN=$(echo $TASK_METADATA | jq -e -r '.TaskARN') ECS_TASK_REGION=$(echo $ECS_TASK_AVAILABILITY_ZONE | sed 's/.$//') # validate ECS_TASK_AVAILABILITY_ZONE ECS_TASK_AVAILABILITY_ZONE_REGEX='^(af|ap|ca|cn|eu|me|sa|us|us-gov)-(central|north|(north(east|west))|south|south(east|west)|east|west)-[0-9]{1}[a-z]{1}$' if ! [[ $ECS_TASK_AVAILABILITY_ZONE =~ $ECS_TASK_AVAILABILITY_ZONE_REGEX ]] ; then echo "Error extracting Availability Zone from ECS Container Metadata, exiting" 1>&2 exit 1 fi # validate ECS_TASK_ARN ECS_TASK_ARN_REGEX='^arn:(aws|aws-cn|aws-us-gov):ecs:[a-z0-9-]+:[0-9]{12}:task/[a-zA-Z0-9_-]+/[a-zA-Z0-9]+$' if ! [[ $ECS_TASK_ARN =~ $ECS_TASK_ARN_REGEX ]] ; then echo "Error extracting Task ARN from ECS Container Metadata, exiting" 1>&2 exit 1 fi # Create activation tagging with Availability Zone and Task ARN CREATE_ACTIVATION_OUTPUT=$(aws ssm create-activation \ --iam-role $MANAGED_INSTANCE_ROLE_NAME \ --tags Key=ECS_TASK_AVAILABILITY_ZONE,Value=$ECS_TASK_AVAILABILITY_ZONE Key=ECS_TASK_ARN,Value=$ECS_TASK_ARN Key=FAULT_INJECTION_SIDECAR,Value=true \ --region $ECS_TASK_REGION) ACTIVATION_CODE=$(echo $CREATE_ACTIVATION_OUTPUT | jq -e -r .ActivationCode) ACTIVATION_ID=$(echo $CREATE_ACTIVATION_OUTPUT | jq -e -r .ActivationId) # Register with AWS Systems Manager (SSM) if ! amazon-ssm-agent -register -code $ACTIVATION_CODE -id $ACTIVATION_ID -region $ECS_TASK_REGION; then echo "Failed to register with AWS Systems Manager (SSM), exiting" 1>&2 exit 1 fi # the agent needs to run in the background, otherwise the trapped signal # won't execute the attached function until this process finishes amazon-ssm-agent & SSM_AGENT_PID=$! # need to keep the script alive, otherwise the container will terminate wait $SSM_AGENT_PID else echo "ECS Container Metadata not found, exiting" 1>&2 exit 1 fi else echo "SSM agent is already running, exiting" 1>&2 exit 1 fi

实验模板示例

以下是 aws:ecs:task-cpu-stress 操作的实验模板示例。

{ "description": "Run CPU stress on the target ECS tasks", "targets": { "myTasks": { "resourceType": "aws:ecs:task", "resourceArns": [ "arn:aws:ecs:us-east-1:111122223333:task/my-cluster/09821742c0e24250b187dfed8EXAMPLE" ], "selectionMode": "ALL" } }, "actions": { "EcsTask-cpu-stress": { "actionId": "aws:ecs:task-cpu-stress", "parameters": { "duration": "PT1M" }, "targets": { "Tasks": "myTasks" } } }, "stopConditions": [ { "source": "none", } ], "roleArn": "arn:aws:iam::111122223333:role/fis-experiment-role", "tags": {} }