How monitoring works - AMS Accelerate User Guide

How monitoring works

See the following graphics on monitoring architecture in AWS Managed Services (AMS).

The following diagram depicts the AMS Accelerate monitoring architecture.

[Diagram: AMS monitoring architecture]

The following diagram depicts the AMS monitoring architecture after your resources are tagged based on the policy defined using Resource tagger and alarm definitions are deployed.

  • Generation: At the time of account onboarding, AMS configures baseline monitoring (a combination of CloudWatch (CW) alarms and CW event rules) for all resources created in a managed account. The baseline monitoring configuration generates an alert when a CW alarm is triggered or a CW event is generated.

  • Aggregation: All alerts generated by your resources are sent to the AMS monitoring system by directing them to an SNS topic in the account.

  • Processing: AMS analyzes the alerts and processes them based on their potential for impact. Alerts are processed as described next.

    • Alerts with known customer impact: These lead to the creation of a new incident report and AMS follows the incident management process.

      Example alert: When an Amazon EC2 instance fails a system health check, AMS attempts to recover the instance by stopping and restarting it.

    • Alerts with uncertain customer impact: For these types of alerts, AMS sends an incident report, in many cases asking you to verify the impact before AMS takes action. However, if the infrastructure-related checks are passing, then AMS doesn't send an incident report to you.

      For example, an alert for >85% CPU utilization for more than 10 minutes on an Amazon EC2 instance can't immediately be categorized as an incident, because this behavior might be expected based on usage. In this example, AMS Automation performs infrastructure-related checks on the resource. If those checks pass, then AMS doesn't send an alert notification, even if CPU usage exceeds 99%. If Automation detects that infrastructure-related checks are failing on the resource, then AMS sends an alert notification and checks whether mitigation is needed.

      Alert notifications are discussed in detail later in this section. AMS offers mitigation options in the notification. When you reply to the notification confirming that the alert is an incident, AMS creates a new incident report and the AMS incident management process begins. Service notifications that receive a response of "no customer impact," or no response at all for three days, are marked as resolved, and the corresponding alert is also marked as resolved.

    • Alerts with no customer impact: If, after evaluation, AMS determines that the alert doesn't have customer impact, then the alert is closed.

      For example, AWS Health notifies you that an EC2 instance requires replacement, but that instance has already been terminated.
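The processing branches above can be sketched as a small decision function. This is only an illustrative model of the described behavior, not AMS's implementation: the names `Impact`, `Alert`, `process_alert`, and `resolve_notification` are hypothetical, and treating "no response for three days" as `days_without_reply >= 3` is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Impact(Enum):
    KNOWN = "known customer impact"
    UNCERTAIN = "uncertain customer impact"
    NONE = "no customer impact"


@dataclass
class Alert:
    name: str
    impact: Impact
    # Only consulted for UNCERTAIN alerts (hypothetical field).
    infra_checks_passing: bool = True


def process_alert(alert: Alert) -> str:
    """Return the action for an alert, following the three branches described above."""
    if alert.impact is Impact.KNOWN:
        return "create incident"
    if alert.impact is Impact.UNCERTAIN:
        # Passing infrastructure-related checks suppress the notification.
        if alert.infra_checks_passing:
            return "no notification"
        return "send alert notification"
    return "close alert"


def resolve_notification(reply: Optional[str], days_without_reply: int) -> str:
    """Decide the outcome of a sent notification.

    reply is "incident", "no customer impact", or None when nobody answered.
    """
    if reply == "incident":
        return "create incident"
    # Assumption: three full days without a reply counts as "no response".
    if reply == "no customer impact" or (reply is None and days_without_reply >= 3):
        return "resolved"
    return "pending"
```

For instance, a high CPU alert modeled as `Alert("cpu-high", Impact.UNCERTAIN, infra_checks_passing=True)` produces no notification, while the same alert with failing checks produces one.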

Alert notification

As part of alert processing, AWS Managed Services (AMS) analyzes the impact of each alert. When impact can be determined, AMS creates an incident and initiates the incident management process for remediation. If impact can't be determined, then AMS sends an alert notification to the email address associated with your account through a service notification. In some scenarios, this alert notification isn't sent. For example, if the infrastructure-related checks are passing for a high CPU utilization alert, then an alert notification isn't sent to you. For more information about the alert handling process, see the AMS monitoring architecture diagram in How monitoring works.

Tag-based alert notification

Use tags to send alert notifications for your resources to different email addresses. It's a best practice to use tag-based alert notifications because notifications sent to a single email address might cause confusion when multiple developer teams use the same account.

Send alerts to a specific email address

Tag resources that have alerts that must be sent to a specific email address with the key = OwnerTeamEmail, value = EMAIL_ADDRESS.
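As one possible way to apply this tag to Amazon EC2 resources, the sketch below uses the `create_tags` operation of the boto3 EC2 client. The helper name `tag_owner_email`, the instance ID, and the email address are placeholders chosen for illustration. Other resource types use their own tagging APIs.

```python
OWNER_TAG_KEY = "OwnerTeamEmail"


def tag_owner_email(ec2_client, resource_ids, email):
    """Apply the OwnerTeamEmail tag to the given EC2 resources.

    ec2_client is expected to behave like boto3.client("ec2").
    Returns the tag list that was sent, which is handy for logging.
    """
    tags = [{"Key": OWNER_TAG_KEY, "Value": email}]
    ec2_client.create_tags(Resources=list(resource_ids), Tags=tags)
    return tags


# Example usage (requires boto3 and AWS credentials; values are placeholders):
#   import boto3
#   tag_owner_email(boto3.client("ec2"), ["i-0123456789abcdef0"], "team@example.com")
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub before pointing it at a real account.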

Send alerts to multiple email addresses

To use multiple email addresses, specify a comma-separated list of values. For example, key = OwnerTeamEmail, value = EMAIL_ADDRESS_1, EMAIL_ADDRESS_2, EMAIL_ADDRESS_3, .... The value field can't exceed 260 characters in total.
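A minimal sketch of building such a value while guarding the 260-character limit might look like the following. The function name `build_owner_tag_value` is hypothetical, and joining with a bare comma (no space) is an assumption made here to keep the value short; the source does not say whether surrounding spaces are significant.

```python
MAX_TAG_VALUE_LENGTH = 260  # limit stated in the text above


def build_owner_tag_value(emails):
    """Join email addresses into one comma-separated tag value.

    Raises ValueError when the joined value exceeds the stated limit.
    """
    # Assumption: trim whitespace and join with a bare comma.
    value = ",".join(address.strip() for address in emails)
    if len(value) > MAX_TAG_VALUE_LENGTH:
        raise ValueError(
            f"tag value is {len(value)} characters; limit is {MAX_TAG_VALUE_LENGTH}"
        )
    return value
```

When the list of recipients grows past the limit, a distribution list or alias is usually the simpler fix than trimming addresses.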

Use a custom tag key

To use a custom tag key, provide the custom tag key name to your CSDM in an email that explicitly gives consent to activate automated notifications for the tag-based communication. It's a best practice to use the same tagging strategy for contact tags across all your instances and resources.

Note

The tag key OwnerTeamEmail doesn't have to be in camel case. However, tags are case sensitive and it's a best practice to use the recommended format.

The email address must be specified in full, with the "at sign" (@) to separate the local part from the domain. Examples of invalid email addresses: Team.AppATabc.xyz or john.doe. For general guidance on your tagging strategy, see Tagging AWS resources. Don't add personally identifiable information (PII) in your tags. Use distribution lists or aliases wherever possible.
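Before tagging, a quick sanity check along these lines can catch the invalid forms shown above. It is a deliberately loose check, assuming only what the text states (a single "@" separating a non-empty local part from a non-empty domain); it is not a full email validator, and the function names are hypothetical.

```python
def is_valid_owner_email(address: str) -> bool:
    """Check that an address has a local part, one "@", and a domain."""
    local, sep, domain = address.strip().partition("@")
    return bool(sep) and bool(local) and bool(domain) and "@" not in domain


def find_invalid_addresses(tag_value: str):
    """Return the entries of a comma-separated tag value that fail the check."""
    entries = [entry.strip() for entry in tag_value.split(",")]
    return [entry for entry in entries if not is_valid_owner_email(entry)]
```

With this check, both `Team.AppATabc.xyz` and `john.doe` are rejected because neither contains an "@".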