Runtime coverage and troubleshooting for Amazon ECS clusters - Amazon GuardDuty

The runtime coverage for Amazon ECS clusters includes the tasks running on AWS Fargate and Amazon ECS container instances.

For an Amazon ECS cluster that runs on Fargate, the runtime coverage is assessed at the task level. The ECS clusters runtime coverage includes those Fargate tasks that started running after you enabled Runtime Monitoring and automated agent configuration for Fargate (ECS only). Because a Fargate task is immutable by default, GuardDuty can't install the security agent to monitor containers on tasks that are already running. To include such a Fargate task, you must stop the task and start it again. Also check that the service associated with the task is supported.
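As a minimal sketch of how you might find the Fargate tasks that predate Runtime Monitoring, the following Python compares each task's start time with the time the feature was enabled. It assumes task records shaped like the `tasks` entries returned by the Amazon ECS DescribeTasks API (`taskArn`, `launchType`, `startedAt`). The function name and the `enabled_at` value are hypothetical; you would have to record the enablement time yourself.

```python
from datetime import datetime, timezone


def tasks_needing_restart(tasks, enabled_at):
    """Return the ARNs of Fargate tasks that started before enabled_at.

    tasks: iterable of dicts with 'taskArn', 'launchType', and an optional
           timezone-aware 'startedAt' datetime (assumed DescribeTasks shape).
    enabled_at: timezone-aware datetime at which Runtime Monitoring and
           automated agent configuration were enabled (tracked by you).
    """
    stale = []
    for task in tasks:
        if task.get("launchType") != "FARGATE":
            continue  # other launch types are out of scope for this check
        started = task.get("startedAt")
        # A task without a start time has not started running yet.
        if started is not None and started < enabled_at:
            stale.append(task["taskArn"])
    return stale


if __name__ == "__main__":
    enabled = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)  # example value
    sample = [
        {"taskArn": "arn:aws:ecs:us-east-1:111122223333:task/demo/aaa",
         "launchType": "FARGATE",
         "startedAt": datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc)},
        {"taskArn": "arn:aws:ecs:us-east-1:111122223333:task/demo/bbb",
         "launchType": "FARGATE",
         "startedAt": datetime(2024, 5, 2, 9, 0, tzinfo=timezone.utc)},
    ]
    print(tasks_needing_restart(sample, enabled))
```

The tasks that this check returns are the candidates to stop and start again.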

For information about Amazon ECS container instances, see Capacity creation.

Reviewing coverage statistics

The coverage statistics for the Amazon ECS resources associated with your own account or your member accounts are the percentage of healthy Amazon ECS clusters out of all the Amazon ECS clusters in the selected AWS Region. This includes the coverage for Amazon ECS clusters associated with both Fargate and Amazon EC2 instances. The following equation represents this:

(Healthy clusters/All clusters)*100
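The equation can be sketched in Python as follows. The function name is hypothetical, and returning `None` when a Region has no clusters is an assumption made here to avoid dividing by zero, not documented GuardDuty behavior.

```python
def coverage_percentage(healthy_clusters, all_clusters):
    """Compute (Healthy clusters / All clusters) * 100."""
    if all_clusters == 0:
        return None  # assumption: no clusters means no percentage to report
    if not 0 <= healthy_clusters <= all_clusters:
        raise ValueError("healthy_clusters must be between 0 and all_clusters")
    return healthy_clusters / all_clusters * 100


if __name__ == "__main__":
    # Three healthy clusters out of four in the selected Region.
    print(coverage_percentage(3, 4))
```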

Considerations

  • The coverage statistics for the ECS cluster include the coverage status of the Fargate tasks or ECS container instances associated with that ECS cluster. The coverage status of the Fargate tasks includes tasks that either are in the running state or have recently finished running.

  • In the ECS clusters runtime coverage tab, the Container instances covered field indicates the coverage status of the container instances associated with your Amazon ECS cluster.

    If your Amazon ECS cluster contains only Fargate tasks, the count appears as 0/0.

  • If your Amazon ECS cluster is associated with an Amazon EC2 instance that doesn't have a security agent, the Amazon ECS cluster will also have an Unhealthy coverage status.

    To identify and troubleshoot the coverage issue for the associated Amazon EC2 instance, see Troubleshooting Amazon EC2 runtime coverage issues for Amazon EC2 instances.

Choose one of the access methods to review the coverage statistics for your accounts.

Console
  • Sign in to the AWS Management Console and open the GuardDuty console at https://console.aws.amazon.com/guardduty/.

  • In the navigation pane, choose Runtime Monitoring.

  • Choose the Runtime coverage tab.

  • Under the ECS clusters runtime coverage tab, you can view the coverage statistics aggregated by the coverage status of each Amazon ECS cluster that is available in the Clusters list table.

    • You can filter the Cluster list table by the following columns:

      • Account ID

      • Cluster Name

      • Agent management type

      • Coverage status

  • If any of your Amazon ECS clusters have the Coverage status as Unhealthy, the Issue column includes additional information about the reason for the Unhealthy status.

    If your Amazon ECS clusters are associated with an Amazon EC2 instance, navigate to the EC2 instance runtime coverage tab and filter by the Cluster name field to view the associated Issue.

API/CLI
  • Run the ListCoverage API with your own valid detector ID, current Region, and service endpoint. You can filter and sort the cluster list using this API.

    • You can change the example filter-criteria with one of the following options for CriterionKey:

      • ACCOUNT_ID

      • ECS_CLUSTER_NAME

      • COVERAGE_STATUS

      • MANAGEMENT_TYPE

    • You can change the example AttributeName in sort-criteria with the following options:

      • ACCOUNT_ID

      • COVERAGE_STATUS

      • ISSUE

      • ECS_CLUSTER_NAME

      • UPDATED_AT

        This field is updated only when a new task is created in the associated Amazon ECS cluster or when the corresponding coverage status changes.

    • You can change the max-results (up to 50).

    • To find the detectorId for your account and current Region, see the Settings page in the https://console.aws.amazon.com/guardduty/ console, or run the ListDetectors API.

    aws guardduty --region us-east-1 list-coverage --detector-id 12abc34d567e8fa901bc2d34e56789f0 --sort-criteria '{"AttributeName": "ECS_CLUSTER_NAME", "OrderBy": "DESC"}' --filter-criteria '{"FilterCriterion":[{"CriterionKey":"ACCOUNT_ID", "FilterCondition":{"EqualsValue":"111122223333"}}] }' --max-results 5
  • Run the GetCoverageStatistics API to retrieve coverage aggregated statistics based on the statisticsType.

    • You can change the example statisticsType to one of the following options:

      • COUNT_BY_COVERAGE_STATUS – Represents coverage statistics for ECS clusters aggregated by coverage status.

      • COUNT_BY_RESOURCE_TYPE – Coverage statistics aggregated based on the type of AWS resource in the list.

      • You can change the example filter-criteria in the command. You can use the following options for CriterionKey:

        • ACCOUNT_ID

        • ECS_CLUSTER_NAME

        • COVERAGE_STATUS

        • MANAGEMENT_TYPE

        • INSTANCE_ID

    • To find the detectorId for your account and current Region, see the Settings page in the https://console.aws.amazon.com/guardduty/ console, or run the ListDetectors API.

    aws guardduty --region us-east-1 get-coverage-statistics --detector-id 12abc34d567e8fa901bc2d34e56789f0 --statistics-type COUNT_BY_COVERAGE_STATUS --filter-criteria '{"FilterCriterion":[{"CriterionKey":"ACCOUNT_ID", "FilterCondition":{"EqualsValue":"123456789012"}}] }'
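If you generate these commands from a script, a small helper can reject unsupported option values before the CLI is invoked. The following is a minimal sketch that assembles the same arguments as the two example commands; it does not call AWS. The function names are hypothetical, the option lists are copied from the bullets above, and accepting `ASC` for `OrderBy` is an assumption because the example only shows `DESC`.

```python
import json
import shlex

# Option lists copied from the bullets above.
LIST_FILTER_KEYS = {"ACCOUNT_ID", "ECS_CLUSTER_NAME", "COVERAGE_STATUS",
                    "MANAGEMENT_TYPE"}
STATS_FILTER_KEYS = LIST_FILTER_KEYS | {"INSTANCE_ID"}
SORT_KEYS = {"ACCOUNT_ID", "COVERAGE_STATUS", "ISSUE", "ECS_CLUSTER_NAME",
             "UPDATED_AT"}
STATISTICS_TYPES = {"COUNT_BY_COVERAGE_STATUS", "COUNT_BY_RESOURCE_TYPE"}
MAX_RESULTS_LIMIT = 50


def _require(value, allowed, label):
    if value not in allowed:
        raise ValueError(f"{label} must be one of {sorted(allowed)}, got {value!r}")


def _filter_criteria(key, value):
    # Same JSON shape as the --filter-criteria argument in the example commands.
    return json.dumps({"FilterCriterion": [
        {"CriterionKey": key, "FilterCondition": {"EqualsValue": value}}]})


def list_coverage_args(region, detector_id, filter_key, filter_value,
                       sort_key="ECS_CLUSTER_NAME", order_by="DESC",
                       max_results=5):
    """Build the argument list for `aws guardduty list-coverage`."""
    _require(filter_key, LIST_FILTER_KEYS, "filter_key")
    _require(sort_key, SORT_KEYS, "sort_key")
    _require(order_by, {"ASC", "DESC"}, "order_by")  # ASC is an assumption
    if not 1 <= max_results <= MAX_RESULTS_LIMIT:
        raise ValueError(f"max_results must be between 1 and {MAX_RESULTS_LIMIT}")
    return ["aws", "guardduty", "--region", region, "list-coverage",
            "--detector-id", detector_id,
            "--sort-criteria",
            json.dumps({"AttributeName": sort_key, "OrderBy": order_by}),
            "--filter-criteria", _filter_criteria(filter_key, filter_value),
            "--max-results", str(max_results)]


def coverage_statistics_args(region, detector_id, statistics_type,
                             filter_key, filter_value):
    """Build the argument list for `aws guardduty get-coverage-statistics`."""
    _require(statistics_type, STATISTICS_TYPES, "statistics_type")
    _require(filter_key, STATS_FILTER_KEYS, "filter_key")
    return ["aws", "guardduty", "--region", region, "get-coverage-statistics",
            "--detector-id", detector_id,
            "--statistics-type", statistics_type,
            "--filter-criteria", _filter_criteria(filter_key, filter_value)]


if __name__ == "__main__":
    detector = "12abc34d567e8fa901bc2d34e56789f0"  # example ID from this page
    print(shlex.join(list_coverage_args(
        "us-east-1", detector, "ACCOUNT_ID", "111122223333")))
    print(shlex.join(coverage_statistics_args(
        "us-east-1", detector, "COUNT_BY_COVERAGE_STATUS",
        "ACCOUNT_ID", "123456789012")))
```

Returning a list of arguments instead of a single string means you can pass the result to `subprocess.run` without additional shell quoting.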

For more information about coverage issues, see Troubleshooting Amazon ECS-Fargate runtime coverage issues.

Coverage status change with EventBridge notifications

The coverage status of your Amazon ECS cluster might appear as Unhealthy. To know when the coverage status changes, we recommend that you monitor the coverage status periodically and troubleshoot if the status becomes Unhealthy. Alternatively, you can create an Amazon EventBridge rule to receive a notification when the coverage status changes from Unhealthy to Healthy or from Healthy to Unhealthy. By default, GuardDuty publishes these notifications to the EventBridge bus for your account.
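As an illustration of what such a rule could match, the following sketch builds an EventBridge event pattern as a Python dictionary and prints it as JSON. The `source`, `detail-type`, and `resourceType` values are taken from the sample notification schema on this page. The function name is hypothetical, and narrowing the pattern to `"resourceType": "ECS"` is an optional choice made here, not a requirement.

```python
import json

# detail-type values documented in the sample notification schema.
UNHEALTHY = "GuardDuty Runtime Protection Unhealthy"
HEALTHY = "GuardDuty Runtime Protection Healthy"


def coverage_event_pattern(include_unhealthy=True, include_healthy=True):
    """Build an event pattern for ECS runtime coverage status changes."""
    detail_types = []
    if include_unhealthy:
        detail_types.append(UNHEALTHY)
    if include_healthy:
        detail_types.append(HEALTHY)
    if not detail_types:
        raise ValueError("select at least one status change to match")
    return {
        "source": ["aws.guardduty"],
        "detail-type": detail_types,
        # Optional narrowing to ECS resources only.
        "detail": {"resourceDetails": {"resourceType": ["ECS"]}},
    }


if __name__ == "__main__":
    # Paste the printed JSON into the custom event pattern of your rule.
    print(json.dumps(coverage_event_pattern(), indent=2))
```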

Sample notification schema

In an EventBridge rule, you can use the pre-defined sample events and event patterns to receive coverage status notifications. For more information about creating an EventBridge rule, see Create rule in the Amazon EventBridge User Guide.

Additionally, you can create a custom event pattern by using the following example notification schema. Make sure to replace the placeholder values with the values for your account. To get notified when the coverage status of your Amazon ECS cluster changes from Healthy to Unhealthy, the detail-type should be GuardDuty Runtime Protection Unhealthy. To get notified when the coverage status changes from Unhealthy to Healthy, replace the value of detail-type with GuardDuty Runtime Protection Healthy.

{
    "version": "0",
    "id": "event ID",
    "detail-type": "GuardDuty Runtime Protection Unhealthy",
    "source": "aws.guardduty",
    "account": "AWS account ID",
    "time": "event timestamp (string)",
    "region": "AWS Region",
    "resources": [],
    "detail": {
        "schemaVersion": "1.0",
        "resourceAccountId": "string",
        "currentStatus": "string",
        "previousStatus": "string",
        "resourceDetails": {
            "resourceType": "ECS",
            "ecsClusterDetails": {
                "clusterName": "",
                "fargateDetails": {
                    "issues": [],
                    "managementType": ""
                },
                "containerInstanceDetails": {
                    "coveredContainerInstances": int,
                    "compatibleContainerInstances": int
                }
            }
        },
        "issue": "string",
        "lastUpdatedAt": "timestamp"
    }
}
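A rule target such as an AWS Lambda function could read the fields of this notification as sketched below. The field names follow the schema above. The class name, the function name, and every value in the sample event are hypothetical, and the sketch does not assume any particular spelling for the status values.

```python
from dataclasses import dataclass


@dataclass
class CoverageChange:
    cluster_name: str
    account_id: str
    current_status: str
    previous_status: str
    issue: str
    covered_instances: int
    compatible_instances: int


def parse_coverage_event(event):
    """Extract ECS coverage details from a GuardDuty coverage notification."""
    if event.get("source") != "aws.guardduty":
        raise ValueError("not a GuardDuty event")
    detail = event["detail"]
    resource = detail["resourceDetails"]
    if resource.get("resourceType") != "ECS":
        raise ValueError("not an ECS coverage notification")
    cluster = resource["ecsClusterDetails"]
    # Treat a missing containerInstanceDetails block as zero instances.
    instances = cluster.get("containerInstanceDetails", {})
    return CoverageChange(
        cluster_name=cluster["clusterName"],
        account_id=detail["resourceAccountId"],
        current_status=detail["currentStatus"],
        previous_status=detail["previousStatus"],
        issue=detail.get("issue", ""),
        covered_instances=instances.get("coveredContainerInstances", 0),
        compatible_instances=instances.get("compatibleContainerInstances", 0),
    )


if __name__ == "__main__":
    sample = {  # hypothetical values, shaped like the schema above
        "source": "aws.guardduty",
        "detail-type": "GuardDuty Runtime Protection Unhealthy",
        "detail": {
            "resourceAccountId": "111122223333",
            "currentStatus": "example-current-status",
            "previousStatus": "example-previous-status",
            "resourceDetails": {
                "resourceType": "ECS",
                "ecsClusterDetails": {
                    "clusterName": "example-cluster",
                    "fargateDetails": {"issues": [], "managementType": ""},
                    "containerInstanceDetails": {
                        "coveredContainerInstances": 0,
                        "compatibleContainerInstances": 0,
                    },
                },
            },
            "issue": "example issue text",
        },
    }
    print(parse_coverage_event(sample))
```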

Troubleshooting Amazon ECS-Fargate runtime coverage issues

If the coverage status of your Amazon ECS cluster is Unhealthy, you can view the reason under the Issue column.

The following table provides the recommended troubleshooting steps for Fargate (Amazon ECS only) issues. For information about Amazon EC2 instance coverage issues, see Troubleshooting Amazon EC2 runtime coverage issues for Amazon EC2 instances.

Issue type Extra information Recommended troubleshooting steps

Agent not reporting

Agent not reporting for tasks in TaskDefinition - 'TASK_DEFINITION'

Validate that the VPC endpoint for your Amazon ECS cluster's task is correctly configured. For more information, see Validating VPC endpoint configuration.

If your organization has a service control policy (SCP), validate that the permissions boundary is not restricting the guardduty:SendSecurityTelemetry permission. For more information, see Validating your organization service control policy.

VPC_ISSUE; for task in TaskDefinition - 'TASK_DEFINITION'

View the VPC issue details in the extra information.

Agent exited

ExitCode: EXIT_CODE for tasks in TaskDefinition - 'TASK_DEFINITION'

View the issue details in the extra information.

Reason: REASON for tasks in TaskDefinition - 'TASK_DEFINITION'

ExitCode: EXIT_CODE with reason: 'REASON' for tasks in TaskDefinition - 'TASK_DEFINITION'

Agent exited: Reason: CannotPullContainerError: pull image manifest has been retried...

The task execution role must have the following Amazon Elastic Container Registry (Amazon ECR) permissions:

... "ecr:GetAuthorizationToken", "ecr:BatchCheckLayerAvailability", "ecr:GetDownloadUrlForLayer", "ecr:BatchGetImage", ...

For more information, see Provide ECR permissions and subnet details.

After you add the Amazon ECR permissions, you must restart the task.

If the issue persists, see My AWS Step Functions workflow is failing unexpectedly.

VPC Endpoint Creation Failed

Enabling private DNS requires both enableDnsSupport and enableDnsHostnames VPC attributes set to true for vpcId (Service: ECS, Status Code:400, Request ID: a1b2c3d4-5678-90ab-cdef-EXAMPLE11111).

Ensure that the following VPC attributes are set to true: enableDnsSupport and enableDnsHostnames. For more information, see DNS attributes in your VPC.

If you're using the Amazon VPC console at https://console.aws.amazon.com/vpc/ to create the Amazon VPC, make sure to select both Enable DNS hostnames and Enable DNS resolution. For more information, see VPC configuration options.

Agent not provisioned

Unsupported invocation by SERVICE for task(s) in TaskDefinition - 'TASK_DEFINITION'

This task was invoked by a SERVICE that is not supported.

Unsupported CPU architecture 'TYPE' for task(s) in TaskDefinition - 'TASK_DEFINITION'

This task is running on an unsupported CPU architecture. For information about supported CPU architectures, see Validating architectural requirements.

TaskExecutionRole missing from TaskDefinition - 'TASK_DEFINITION'

The ECS task execution role is missing. For information about providing the task execution role and the required permissions, see Provide ECR permissions and subnet details.

Missing network configuration 'CONFIGURATION_DETAILS' for task(s) in TaskDefinition - 'TASK_DEFINITION'

Network configuration issues may show up because of missing VPC configuration, or missing or empty subnets.

Validate that your network configuration is correct. For more information, see Provide ECR permissions and subnet details.

For more information, see Amazon ECS task definition parameters in the Amazon Elastic Container Service Developer Guide.

Others

Unidentified issue, for tasks in TaskDefinition - 'TASK_DEFINITION'

Use the following questions to identify the root cause of the issue:

  • Did the task start before you enabled Runtime Monitoring?

    In Amazon ECS, tasks are immutable. To assess the runtime behavior of a running Fargate task, make sure that Runtime Monitoring is already enabled, and then restart the task so that GuardDuty can add the container sidecar.

  • Is this task part of a service deployment that started before you enabled Runtime Monitoring?

    If yes, you can either restart the service or update the service with forceNewDeployment by using the steps in Updating a service.

    You can also use the UpdateService API or the AWS CLI.

  • Did the task launch after excluding the ECS cluster from Runtime Monitoring?

    When you change the pre-defined GuardDuty tag from GuardDutyManaged-true to GuardDutyManaged-false, GuardDuty will not receive the runtime events for the ECS cluster.

  • Does your service contain a task that has an old format of taskArn?

    GuardDuty Runtime Monitoring doesn't support the coverage for tasks that have the old format of taskArn.

    For information about Amazon Resource Names (ARNs) for Amazon ECS resources, see Amazon Resource Names (ARNs) and IDs.
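To help answer the last question, the following sketch tells the two task ARN formats apart. It assumes the formats described in the Amazon ECS documentation: the old format is `arn:aws:ecs:region:account-id:task/task-id`, and the new format adds the cluster name, as in `arn:aws:ecs:region:account-id:task/cluster-name/task-id`. The function name and the sample ARNs are hypothetical.

```python
def is_old_task_arn_format(task_arn):
    """Return True if the task ARN has no cluster name segment (old format)."""
    parts = task_arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "ecs":
        raise ValueError(f"not an Amazon ECS ARN: {task_arn!r}")
    segments = parts[5].split("/")
    if segments[0] != "task" or len(segments) not in (2, 3) or not all(segments):
        raise ValueError(f"not an Amazon ECS task ARN: {task_arn!r}")
    # Old format: task/<task-id>. New format: task/<cluster-name>/<task-id>.
    return len(segments) == 2


if __name__ == "__main__":
    for arn in (
        "arn:aws:ecs:us-east-1:111122223333:task/1234567890abcdef",
        "arn:aws:ecs:us-east-1:111122223333:task/example-cluster/1234567890abcdef",
    ):
        print(arn, is_old_task_arn_format(arn))
```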