How monitoring works - AMS Accelerate User Guide

How monitoring works

See the following graphics on monitoring architecture in AWS Managed Services (AMS).

The following diagram depicts the AMS Accelerate monitoring architecture.


      AMS monitoring architecture.

After your resources are tagged based on the policy defined using Resource tagger, and alarm definitions are deployed, the following diagram depicts the AMS monitoring architecture.

  • Generation: At the time of account onboarding, AMS configures baseline monitoring (a combination of CloudWatch (CW) alarms, and CW event rules) for all your resources created in a managed account. The baseline monitoring configuration generates an alert when a CW alarm is triggered or a CW event is generated.

  • Aggregation: All alerts generated by your resources are sent to the AMS monitoring system by directing them to an SNS topic in the account.

  • Processing: AMS analyzes the alerts and processes them based on their potential for impact. Alerts are processed as described next.

    • Alerts with known customer impact: These lead to the creation of a new incident report and AMS follows the incident management process.

      Example alert: An Amazon EC2 instance fails a system health check, AMS attempts to recover the instance by stopping and restarting it.

    • Alerts with uncertain customer impact: For these types of alerts, AMS sends an incident report, in many cases asking you to verify the impact before we can take action.

      For example: An alert for >85% CPU utilization for more than 10 minutes on an Amazon EC2 instance can't be immediately categorized as an incident since this behavior may be expected based on usage. For such alerts, AMS sends an alert notification with the details and checks if the alert needs mitigating action. Alert notifications are discussed in detail in this section. We offer options for mitigating actions in the notification, and your reply that confirms that the alert is an incident triggers the creation of a new incident report and the AMS incident management process. Any service notification that receives a response of "no customer impact," or no response at all for three days, is marked as resolved and the corresponding alert is marked as resolved.

    • Alerts with no customer impact: If, after evaluation, AMS determines that the alert doesn't have customer impact, the alert is closed.

      For example, AWS Health notifies of an EC2 instance requiring replacement but that instance has since been terminated.

Alert notification

As a part of the alert processing, based on the impact analysis, AWS Managed Services (AMS) creates an incident and initiates the incident management process for remediation, when impact can be determined. If impact can't be determined, AMS sends an alert notification to the email address associated with your account by way of a service notification; see the diagram on AMS monitoring architecture for alert handling process in How monitoring works.

Tag-based alert notification

We recommend tag-based alert notifications because notifications sent to a single email address can cause confusion when multiple teams use the same account. You can use tags to get alert notifications for different resources sent to different email addresses. For resources with alerts that need to be sent to a specific email address, tag that resource with the key = OwnerTeamEmail, value = EMAIL_ADDRESS (use a group email; do not put personal information in tags). You can also use a custom tag key, but you must provide the custom tag key name to your CSDM with your explicit consent to use it in an email in order to activate automated notification for the tag-based communication. We recommend using the same tagging strategy for contact tags across all your instances and resources.

Note

The tag key value OwnerTeamEmail does not have to be in Pascal case; however, tags are case sensitive and it's best to use the recommended format. The email address must be specified in full, with the "at sign" (@) to separate the local part from the domain. Examples of invalid email addresses: Team.AppATabc.xyz or john.doe. For general guidance on your tagging strategy, see Tagging AWS resources. Do not add personally identifiable information (PII) in your tags, use distribution lists or aliases wherever possible.