Tag-based Alarm Manager - AMS Accelerate Operations Plan

AMS Accelerate applies alarms to your AWS resources using the tag-based Alarm Manager to implement a baseline monitoring strategy and ensure that all your AWS resources are monitored and protected. By integrating with the tag-based Alarm Manager, you can customize the configuration of your AWS resources based on their type, platform, and other tags, to ensure the resources are monitored. Alarm Manager is deployed to your account during onboarding.

How Alarm Manager works

When your account is onboarded to AMS Accelerate, two JSON documents, called configuration profiles, are deployed in your account in AWS AppConfig. Both profile documents reside in the Alarm Manager application and in the AMS Accelerate infrastructure environment.

The two configuration profiles are named AMSManagedAlarms (the default configuration profile) and CustomerManagedAlarms (the customization configuration profile).

  • Default configuration profile:

    • The configuration found in this profile contains the default configuration that AMS Accelerate deploys in all customer accounts. This configuration contains the default AMS Accelerate monitoring policy, which you should not modify because AMS Accelerate can update this profile at any time, erasing any changes you have made.

    • If you want to modify or disable any of these definitions, see Modifying the default configuration and Disabling the default configuration.

  • Customization configuration profile:

    • Any configuration in this profile is entirely managed by you; AMS Accelerate does not overwrite this profile, unless you explicitly request it.

    • You can specify any custom alarm definitions you want in this profile, and you can also specify modifications to the AMS Accelerate-managed default configuration. For more information, see Modifying the default configuration and Disabling the default configuration.

    • If you update this profile, Alarm Manager automatically enforces your changes across all relevant resources in your AWS account. Note that while your changes are enacted automatically, they may take up to 60 minutes to take effect.

    • You can update this profile using the AWS Management Console or AWS CLI/SDK tools. See the AWS AppConfig User Guide for instructions about updating a configuration.

    • The customization profile is initially empty; however, any alarm definitions placed in the profile document are enforced, in addition to the default configuration.

All CloudWatch alarms created by the Alarm Manager contain the tag key ams:alarm-manager:managed and tag value true. This is to ensure that the Alarm Manager manages only those alarms that it creates, and won’t interfere with any of your own alarms. You can see these tags using the Amazon CloudWatch ListTagsForResource API.
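
As a rough illustration of the tag check described above, the following Python sketch tests whether a tag list marks an alarm as managed by Alarm Manager. It assumes you already retrieved the tags (for example, from the Tags field of a ListTagsForResource response); the function name and the sample tags are hypothetical.

```python
MANAGED_TAG_KEY = "ams:alarm-manager:managed"


def is_managed_by_alarm_manager(tags):
    """Return True if the tag list contains ams:alarm-manager:managed = true.

    `tags` is a list of {"Key": ..., "Value": ...} dicts, the shape of the
    "Tags" field in a CloudWatch ListTagsForResource response.
    """
    return any(
        tag.get("Key") == MANAGED_TAG_KEY and tag.get("Value") == "true"
        for tag in tags
    )


# Hypothetical tag lists, for illustration only.
managed_alarm_tags = [
    {"Key": "ams:alarm-manager:managed", "Value": "true"},
    {"Key": "team", "Value": "payments"},
]
own_alarm_tags = [{"Key": "team", "Value": "payments"}]

print(is_managed_by_alarm_manager(managed_alarm_tags))
print(is_managed_by_alarm_manager(own_alarm_tags))
```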

Important

If custom alarm definitions and default alarm definitions are specified with the same ConfigurationID (see Configuration profile document format for monitoring), the custom definitions take priority over default rules.
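
The precedence rule can be pictured with a short Python sketch. This is only an illustration of the documented behavior, not Alarm Manager's actual implementation; the function name and the sample profiles are hypothetical. It assumes that blocks are combined per resource type and ConfigurationID, and that a block whose Enabled field is false is treated as absent.

```python
import copy


def effective_profile(default_profile, custom_profile):
    """Combine two profile documents; custom blocks win on ConfigurationID clashes."""
    merged = copy.deepcopy(default_profile)
    for resource_type, blocks in custom_profile.items():
        # A custom block replaces the whole default block with the same ID.
        merged.setdefault(resource_type, {}).update(copy.deepcopy(blocks))
    # Drop blocks that are disabled; Enabled defaults to true when omitted.
    return {
        resource_type: {
            config_id: block
            for config_id, block in blocks.items()
            if block.get("Enabled", True)
        }
        for resource_type, blocks in merged.items()
    }


# Hypothetical profiles, for illustration only.
default_profile = {
    "AWS::EC2::Instance": {
        "AMSManagedBlock1": {"AlarmDefinition": {"Threshold": 5}},
        "AMSManagedBlock2": {"AlarmDefinition": {"Threshold": 1}},
    }
}
custom_profile = {
    "AWS::EC2::Instance": {
        "AMSManagedBlock1": {"AlarmDefinition": {"Threshold": 10}},
        "AMSManagedBlock2": {"Enabled": False},
        "MyCustomAlarm": {"AlarmDefinition": {"Threshold": 7}},
    }
}

result = effective_profile(default_profile, custom_profile)
print(sorted(result["AWS::EC2::Instance"]))
```

In this sketch, AMSManagedBlock1 takes its threshold from the customization profile, AMSManagedBlock2 is dropped because it is disabled, and MyCustomAlarm is added alongside the defaults.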

Getting started with Alarm Manager

By default, when you onboard with AMS Accelerate, your configuration is deployed to AWS AppConfig, defining an alarm baseline for your resources. The alarm definitions are applied only to resources with the ams:rt:* tags. We recommend that these tags be applied using the Resource Tagger: you set up a basic Resource Tagger configuration in order to let AMS Accelerate know which resources you want managed.

Use Resource Tagger to apply the tag key ams:rt:ams-managed with tag value true to any resources you want AMS Accelerate to monitor.

The following is an example Resource Tagger customization profile that you can use to opt in to monitoring for all of your Amazon EC2 instances. For general information, see Resource Tagger.

{
  "AWS::EC2::Instance": {
    "AMSManageAllEC2Instances": {
      "Enabled": true,
      "Filter": {
        "InstanceId": "*"
      },
      "Tags": [
        {
          "Key": "ams:rt:ams-managed",
          "Value": "true"
        }
      ]
    }
  }
}

For information about how to apply this Resource Tagger configuration, see Viewing or making changes to the Resource Tagger configuration.

Configuration profile document format for monitoring

Both the default configuration profile document and the customization configuration profile document follow the same structure:

{
  "<ResourceType>": {
    "<ConfigurationID>": {
      "Enabled": true,
      "Tag": {
        "Key": "...",
        "Value": "..."
      },
      "AlarmDefinition": {
        ...
      }
    },
    "<ConfigurationID>": {
      ...
    }
  },
  "<ResourceType>": {
    ...
  }
}

  • ResourceType: This key must be one of the following supported strings. The configuration within this JSON object will relate only to the specified AWS resource type. Supported resource types:

    • AWS::EC2::Instance

    • AWS::EC2::Instance::Disk

  • ConfigurationID: This key must be unique in the profile, and uniquely names the following block of configuration. If two configuration blocks in the same ResourceType block have the same ConfigurationID, the one that appears last in the profile takes effect. If you specify a ConfigurationID in your customization profile that is the same as one specified in the default profile, the configuration block defined in the customization profile takes effect.

    • Enabled: (optional, default=true) Specifies if the configuration block will take effect. Set this to false to disable a configuration block. A disabled configuration block behaves as if it's not present in the profile.

    • Tag: Specifies the tag that this alarm definition applies to. Any resource (of the appropriate resource type) that has this tag key and value will have a CloudWatch alarm created with the given definition. This field is a JSON object with the following fields:

      • Key: The key of the tag to match. Keep in mind that if you're using Resource Tagger to apply the tags to the resource, the key for the tag will always begin with ams:rt:.

      • Value: The value of the tag to match.

    • AlarmDefinition: Defines the alarm to be created. Alarm Manager currently only supports single-metric alarms. This is a JSON object whose fields are passed as is to the CloudWatch PutMetricAlarm API call (with the exception of pseudoparameters; for more information, see Configuration profile - pseudoparameter substitution). For information about what fields are required, see the PutMetricAlarm documentation.

      OR

      CompositeAlarmDefinition: Defines a composite alarm to be created. When you create a composite alarm, you specify a rule expression for the alarm that takes into account the alarm state of other alarms that you have created. This is a JSON object whose fields are passed as is to the CloudWatch PutCompositeAlarm API call. The composite alarm goes into ALARM state only if all conditions of the rule are met. The alarms specified in a composite alarm's rule expression can include metric alarms and other composite alarms. For information about what fields are required, see the PutCompositeAlarm documentation.

      Both options provide the following fields:

      • AlarmName: Specifies the name of the alarm you want to create for the resource. This field has all of the same rules as specified in the PutMetricAlarm documentation; however, since the alarm name must be unique in a Region, the Alarm Manager has one additional requirement: you must specify the unique identifier pseudoparameter in the name of the alarm (otherwise, Alarm Manager prepends the unique identifier of the resource to the alarm name). For example, for the AWS::EC2::Instance resource type, you must specify ${EC2::InstanceId} in the alarm name, or it's implicitly added at the start of the alarm name. For the list of identifiers, see Configuration profile - pseudoparameter substitution.

        All other fields are as specified in the PutMetricAlarm or the PutCompositeAlarm documentation.

      • AlarmRule: Specifies which other alarms are to be evaluated to determine this composite alarm's state. Each alarm that you reference must either exist in CloudWatch or be specified in an Alarm Manager configuration profile in your account.

Important

You can specify either AlarmDefinition or CompositeAlarmDefinition in a configuration block of your Alarm Manager configuration document, but you can't use both at the same time.
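
Before you upload a profile document, a quick local check can catch structural mistakes such as a block that contains both definition types. The following Python sketch is a hypothetical helper, not an official validator. The set of supported resource types copies the two types listed above and is an assumption you should adjust, and skipping disabled blocks is also an assumption based on the Enabled field description.

```python
# Assumption: only the resource types listed in this section are supported.
SUPPORTED_RESOURCE_TYPES = {"AWS::EC2::Instance", "AWS::EC2::Instance::Disk"}


def validate_profile(profile):
    """Return a list of human-readable problems found in a profile document."""
    problems = []
    for resource_type, blocks in profile.items():
        if resource_type not in SUPPORTED_RESOURCE_TYPES:
            problems.append(f"{resource_type}: unsupported resource type")
            continue
        for config_id, block in blocks.items():
            if not block.get("Enabled", True):
                continue  # A disabled block behaves as if it is not present.
            where = f"{resource_type}/{config_id}"
            has_metric = "AlarmDefinition" in block
            has_composite = "CompositeAlarmDefinition" in block
            if has_metric and has_composite:
                problems.append(
                    f"{where}: AlarmDefinition and CompositeAlarmDefinition "
                    "cannot be used together"
                )
            elif not (has_metric or has_composite):
                problems.append(f"{where}: no alarm definition found")
            if has_composite and "AlarmRule" not in block["CompositeAlarmDefinition"]:
                problems.append(f"{where}: CompositeAlarmDefinition has no AlarmRule")
    return problems


# Hypothetical profile with one deliberate mistake, for illustration only.
sample_profile = {
    "AWS::EC2::Instance": {
        "GoodBlock": {"AlarmDefinition": {"AlarmName": "${EC2::InstanceId}: CPU Too High"}},
        "BadBlock": {
            "AlarmDefinition": {"AlarmName": "a"},
            "CompositeAlarmDefinition": {"AlarmName": "b", "AlarmRule": "ALARM(\"a\")"},
        },
    }
}

for problem in validate_profile(sample_profile):
    print(problem)
```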

In the following example, the system creates a composite alarm that goes into ALARM state when both of the specified metric alarms exceed their thresholds:

{
  "AWS::EC2::Instance": {
    "LinuxResourceAlarm": {
      "Enabled": true,
      "Tag": {
        "Key": "ams:rt:mylinuxinstance",
        "Value": "true"
      },
      "CompositeAlarmDefinition": {
        "AlarmName": "${EC2::InstanceId} Resource Usage High",
        "AlarmDescription": "Alarm when a linux EC2 instance is using too much CPU and too much Disk",
        "AlarmRule": "ALARM(\"${EC2::InstanceId}: Disk Usage Too High - ${EC2::Disk::UUID}\") AND ALARM(\"${EC2::InstanceId}: CPU Too High\")"
      }
    }
  }
}

Important

When Alarm Manager is not able to create or delete an alarm due to a broken configuration, it sends a notification to the Direct-Customer-Alerts SNS topic. This alarm is called AlarmDependencyError.

We highly recommend that you confirm your subscription to this SNS topic. To receive messages published to a topic, you must subscribe an endpoint to the topic. For details, see Step 1: Create a topic.

Note

Many of the AMS Accelerate-provided baseline alarm definitions list the SNS topic, MMS-Topic, as a target. This is for use in the AMS Accelerate monitoring service, and is the transport mechanism for your alarm notifications to get to AMS Accelerate. Do not specify MMS-Topic as the target for any alarms other than those provided in the baseline (and overrides of the same), as the service ignores unknown alarms. It does not result in AMS Accelerate acting on your custom alarms.

Configuration profile - pseudoparameter substitution

In either of the configuration profiles, you can specify pseudoparameters that are substituted in place as follows:

  • Global - anywhere in the profile:

    • ${AWS::AccountId}: Replaced with your AWS account ID

    • ${AWS::Partition}: Replaced with the partition of the AWS Region the resource is in (this is 'aws' for most Regions); for more information, see the entry for partition in the ARN reference.

    • ${AWS::Region}: Replaced with the Region name of the Region that your resource is deployed to (for example us-east-1)

  • In an AWS::EC2::Instance resource type block:

    • ${EC2::InstanceId}: (identifier) replaced by the instance ID of your Amazon EC2 instance.

  • In an AWS::EC2::Instance::Disk resource type block:

    • ${EC2::InstanceId}: (identifier) Replaced by the instance ID of your Amazon EC2 instance.

    • ${EC2::Disk::Device}: Replaced by the name of the disk. (Linux only, on instances managed by the CloudWatch Agent).

    • ${EC2::Disk::FSType}: Replaced by the file system type of the disk. (Linux only, on instances managed by the CloudWatch Agent).

    • ${EC2::Disk::Path}: Replaced by the disk path. On Linux, this is the mount point of the disk (for example, /), while on Windows this is the drive label (for example, c:/). Available only on instances managed by the CloudWatch Agent.

    • ${EC2::Disk::UUID}: Replaced by a generated UUID that uniquely identifies the disk. This must be specified in the name of the alarm, because an alarm under the AWS::EC2::Instance::Disk resource type creates one alarm per volume. Specifying ${EC2::Disk::UUID} maintains uniqueness of alarm names.

Note

All parameters marked with identifier are used as a prefix for the name of created alarms, unless you specify that identifier in the alarm name.
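
To make the substitution and the identifier prefix rule more concrete, the following Python sketch mimics both on a small scale. It is an illustration only, not the service's implementation: the function names and the sample instance ID are hypothetical, and the single space used between the prefixed identifier and the alarm name is an assumption, because the actual separator is not documented here.

```python
import re

# Matches pseudoparameters such as ${AWS::Region} or ${EC2::InstanceId}.
PSEUDOPARAMETER = re.compile(r"\$\{([A-Za-z0-9:]+)\}")


def substitute(value, context):
    """Recursively replace ${Name} pseudoparameters in strings, lists, and dicts.

    Unknown pseudoparameters are left untouched.
    """
    if isinstance(value, str):
        return PSEUDOPARAMETER.sub(
            lambda match: context.get(match.group(1), match.group(0)), value
        )
    if isinstance(value, list):
        return [substitute(item, context) for item in value]
    if isinstance(value, dict):
        return {key: substitute(item, context) for key, item in value.items()}
    return value


def resolve_alarm_name(name_template, identifier, context):
    """Prefix the identifier when the template does not already mention it."""
    placeholder = "${" + identifier + "}"
    if placeholder not in name_template:
        # Assumption: a single space separates the identifier from the name.
        name_template = placeholder + " " + name_template
    return substitute(name_template, context)


# Hypothetical values, for illustration only.
context = {
    "AWS::Region": "us-east-1",
    "EC2::InstanceId": "i-0123456789abcdef0",
}

print(resolve_alarm_name("${EC2::InstanceId}: CPU Too High", "EC2::InstanceId", context))
print(resolve_alarm_name("CPU Too High", "EC2::InstanceId", context))
print(substitute({"Dimensions": [{"Name": "InstanceId", "Value": "${EC2::InstanceId}"}]}, context))
```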

Configuration examples

In the following example, the system creates an alarm for each disk attached to the matching Linux instance.

{
  "AWS::EC2::Instance::Disk": {
    "LinuxDiskAlarm": {
      "Tag": {
        "Key": "ams:rt:mylinuxinstance",
        "Value": "true"
      },
      "AlarmDefinition": {
        "MetricName": "disk_used_percent",
        "Namespace": "CWAgent",
        "Dimensions": [
          { "Name": "InstanceId", "Value": "${EC2::InstanceId}" },
          { "Name": "device", "Value": "${EC2::Disk::Device}" },
          { "Name": "fstype", "Value": "${EC2::Disk::FSType}" },
          { "Name": "path", "Value": "${EC2::Disk::Path}" }
        ],
        "AlarmName": "${EC2::InstanceId}: Disk Usage Too High - ${EC2::Disk::UUID}"
        ...
      }
    }
  }
}

In the following example, the system creates an alarm for each disk attached to the matching Windows instance.

{
  "AWS::EC2::Instance::Disk": {
    "WindowsDiskAlarm": {
      "Tag": {
        "Key": "ams:rt:mywindowsinstance",
        "Value": "true"
      },
      "AlarmDefinition": {
        "MetricName": "LogicalDisk % Free Space",
        "Namespace": "CWAgent",
        "Dimensions": [
          { "Name": "InstanceId", "Value": "${EC2::InstanceId}" },
          { "Name": "objectname", "Value": "LogicalDisk" },
          { "Name": "instance", "Value": "${EC2::Disk::Path}" }
        ],
        "AlarmName": "${EC2::InstanceId}: Disk Usage Too High - ${EC2::Disk::UUID}"
        ...
      }
    }
  }
}

Viewing Alarm Manager configuration

You can review both the AMSManagedAlarms and CustomerManagedAlarms configuration profiles in AppConfig with GetConfiguration.

The following is an example of the GetConfiguration call:

aws appconfig get-configuration --application AMSAlarmManager --environment AMSInfrastructure --configuration AMSManagedAlarms --client-id any-string outfile.json

  • Application: this is AppConfig's logical unit to provide capabilities; for the Alarm Manager, this is AMSAlarmManager

  • Environment: this is the AMSInfrastructure environment

  • Configuration: to view AMS Accelerate baseline alarms, the value is AMSManagedAlarms; to view customer alarm definitions, the configuration is CustomerManagedAlarms

  • Client ID: this is a unique application instance identifier, which can be any string

  • The alarm definitions can be viewed in the specified output file, which in this case is outfile.json

You can see which version of configuration is deployed to your account by viewing the past deployments in the AMSInfrastructure environment.
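
Once the profile is saved locally, a small script can make it easier to scan. The following Python sketch reads a profile document such as outfile.json and lists the alarm name of each configuration block. The function name is hypothetical, and the sketch assumes the file follows the profile document format described in this section.

```python
import json


def summarize_profile(path):
    """Map each resource type to {ConfigurationID: alarm name or None}."""
    with open(path, encoding="utf-8") as handle:
        profile = json.load(handle)
    summary = {}
    for resource_type, blocks in profile.items():
        for config_id, block in blocks.items():
            # A block holds either a metric alarm or a composite alarm definition.
            definition = (
                block.get("AlarmDefinition")
                or block.get("CompositeAlarmDefinition")
                or {}
            )
            summary.setdefault(resource_type, {})[config_id] = definition.get("AlarmName")
    return summary


# Example usage, assuming outfile.json was produced by the command above:
#
#     for resource_type, blocks in summarize_profile("outfile.json").items():
#         for config_id, alarm_name in blocks.items():
#             print(resource_type, config_id, alarm_name)
```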

Changing the configuration

To add or update new alarm definitions, invoke the CreateHostedConfigurationVersion API.

The AppConfig CLI command expects the content parameter value in base64. On Linux, you can generate this value on the command line, for example with cat alarms-v2.json | base64. For information, see the AWS CLI documentation, Binary/Blob (binary large object).

As an example:

aws appconfig create-hosted-configuration-version --application-id application-id --configuration-profile-id configuration-profile-id --content base64-string --content-type application/json
  • Application ID: ID of the application AMSAlarmManager; you can find this out with the ListApplications API call.

  • Configuration Profile ID: ID of the configuration CustomerManagedAlarms; you can find this out with the ListConfigurationProfiles API call.

  • Content: Base64 string of the content, to be created by creating a document and encoding it in base64: cat alarms-v2.json | base64 (see Binary/Blob (binary large object)).

  • Content Type: MIME type, application/json because alarm definitions are written in JSON.
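
If you build the profile document in a script instead of a file, the base64 step can be done in the same place. The following Python sketch serializes a profile to JSON and encodes it, which is roughly what cat alarms-v2.json | base64 does for a file. The function name and the sample profile are hypothetical.

```python
import base64
import json


def encode_profile(profile):
    """Serialize a profile dict to JSON and return it as a base64 string."""
    raw = json.dumps(profile, indent=2).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


# Hypothetical customization profile, for illustration only.
profile = {
    "AWS::EC2::Instance": {
        "AMSManagedBlock1": {"Enabled": False}
    }
}

encoded = encode_profile(profile)
print(encoded)

# Decoding the string returns the original document.
print(json.loads(base64.b64decode(encoded)) == profile)
```

Unlike the GNU base64 command, which wraps its output at 76 characters by default, this sketch produces a single unbroken line.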

Important

Restrict access to the StartDeployment and StopDeployment API actions to trusted users who understand the responsibilities and consequences of deploying a new configuration to your targets.

To learn more about how to use AWS AppConfig features to create and deploy a configuration, see Working with AWS AppConfig.

Modifying the default configuration

While you can't modify the default configuration profile, you can provide overrides to the defaults by specifying a configuration block in your customization profile with the same ConfigurationID as the default configuration block. If you do this, your whole configuration block replaces the default configuration block; the two blocks are not merged.

For example, consider the following default configuration profile:

{
  "AWS::EC2::Instance": {
    "AMSManagedBlock1": {
      "Enabled": true,
      "Tag": {
        "Key": "ams:rt:ams-monitoring-policy",
        "Value": "ams-monitored"
      },
      "AlarmDefinition": {
        "AlarmName": "${EC2::InstanceId}: AMS Default Alarm",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [
          { "Name": "InstanceId", "Value": "${EC2::InstanceId}" }
        ],
        "Threshold": 5,
        ...
      }
    }
  }
}

In order to change the threshold of this alarm to 10, you must provide the entire alarm definition, not only the parts you want to change. For example, you might provide the following customization profile:

{
  "AWS::EC2::Instance": {
    "AMSManagedBlock1": {
      "Enabled": true,
      "Tag": {
        "Key": "ams:rt:ams-monitoring-policy",
        "Value": "ams-monitored"
      },
      "AlarmDefinition": {
        "AlarmName": "${EC2::InstanceId}: AMS Default Alarm",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [
          { "Name": "InstanceId", "Value": "${EC2::InstanceId}" }
        ],
        "Threshold": 10,
        ...
      }
    }
  }
}

Important

Remember to deploy your configuration changes after you have made them. In AWS AppConfig, you must deploy a new version of the configuration after creating it.
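
Because an override must carry the entire block, it can help to generate it from a copy of the default block rather than typing it by hand. The following Python sketch shows one way to do that. The function name is hypothetical, and the sample default profile is a shortened stand-in for the real one.

```python
import copy


def build_override(default_profile, resource_type, config_id, **alarm_changes):
    """Copy a whole default block and change selected AlarmDefinition fields.

    Returns a customization profile fragment that reuses the same
    ConfigurationID, so that it takes the place of the default block.
    """
    block = copy.deepcopy(default_profile[resource_type][config_id])
    block["AlarmDefinition"].update(alarm_changes)
    return {resource_type: {config_id: block}}


# Shortened, hypothetical copy of a default profile, for illustration only.
default_profile = {
    "AWS::EC2::Instance": {
        "AMSManagedBlock1": {
            "Enabled": True,
            "Tag": {"Key": "ams:rt:ams-monitoring-policy", "Value": "ams-monitored"},
            "AlarmDefinition": {
                "AlarmName": "${EC2::InstanceId}: AMS Default Alarm",
                "Namespace": "AWS/EC2",
                "MetricName": "CPUUtilization",
                "Threshold": 5,
            },
        }
    }
}

override = build_override(
    default_profile, "AWS::EC2::Instance", "AMSManagedBlock1", Threshold=10
)
print(override["AWS::EC2::Instance"]["AMSManagedBlock1"]["AlarmDefinition"])
```

The default profile passed in is left unchanged; only the returned fragment carries the new threshold.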

Deploying configuration changes

Once the customization is completed, these changes must be deployed through StartDeployment.

aws appconfig start-deployment --application-id application_id --environment-id environment_id --deployment-strategy-id deployment_strategy_id --configuration-profile-id configuration_profile_id --configuration-version 1

  • Application ID: ID of the application AMSAlarmManager; you can find this with the ListApplications API call.

  • Environment ID: You can find this with the ListEnvironments API call.

  • Deployment Strategy ID: You can find this with the ListDeploymentStrategies API call.

  • Configuration Profile ID: ID of CustomerManagedAlarms; you can find this with the ListConfigurationProfiles API call.

  • Configuration Version: The version of the configuration profile to be deployed.
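
If you script deployments, assembling the command from the looked-up IDs avoids copy-and-paste mistakes in the option names. The following Python sketch only builds the command string and does not run it. The function name and the ID values are hypothetical placeholders.

```python
import shlex


def start_deployment_command(
    application_id,
    environment_id,
    deployment_strategy_id,
    configuration_profile_id,
    configuration_version,
):
    """Build the start-deployment command line as a shell-safe string."""
    args = [
        "aws", "appconfig", "start-deployment",
        "--application-id", application_id,
        "--environment-id", environment_id,
        "--deployment-strategy-id", deployment_strategy_id,
        "--configuration-profile-id", configuration_profile_id,
        "--configuration-version", str(configuration_version),
    ]
    return shlex.join(args)


# Placeholder IDs, for illustration only; look up the real values with the
# List* API calls described above.
command = start_deployment_command("app123", "env456", "strat789", "prof012", 2)
print(command)
```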

Important

Alarm Manager applies the alarm definitions as specified in the configuration profiles. Any manual modifications that you make to these CloudWatch alarms with the AWS Management Console or the CloudWatch CLI/SDK are automatically reverted, so make sure your changes are defined through Alarm Manager. To understand which alarms are created by the Alarm Manager, you can look for the ams:alarm-manager:managed tag with value true.

Restrict access to the StartDeployment and StopDeployment API actions to trusted users who understand the responsibilities and consequences of deploying a new configuration to your targets.

To learn more about how to use AWS AppConfig features to create and deploy a configuration, see Working with AWS AppConfig.

Rolling back changes

You can roll back alarm definitions through the same deployment mechanism by specifying a previous configuration profile version and running StartDeployment.

Disabling the default configuration

AMS Accelerate provides the default configuration profile in your account based on the baseline alarms. However, this default configuration can be disabled by overriding any of the alarm definitions. You can disable a default configuration rule by overriding the ConfigurationID of the rule in your customization configuration profile and specifying the Enabled field with a value of false.

For example, if the following configuration was present in the default configuration profile:

{
  "AWS::EC2::Instance": {
    "AMSManagedBlock1": {
      "Enabled": true,
      "Tag": {
        "Key": "ams:rt:ams-monitoring-policy",
        "Value": "ams-monitored"
      },
      "AlarmDefinition": {
        ...
      }
    }
  }
}

You could disable this alarm definition by including the following in your customization configuration profile:

{
  "AWS::EC2::Instance": {
    "AMSManagedBlock1": {
      "Enabled": false
    }
  }
}

To make these changes, the CreateHostedConfigurationVersion API must be called with the JSON profile document (see Changing the configuration) and subsequently must be deployed (see Deploying configuration changes). Note that when you create the new configuration version, you must also include any previously created custom alarms that you want in the JSON profile document.
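
Because each new configuration version must contain everything you want to keep, it is safer to start from your current customization profile and add the disabling block to it. The following Python sketch shows that approach. The function name and the sample profile are hypothetical.

```python
import copy


def disable_default_block(current_custom_profile, resource_type, config_id):
    """Return a copy of the customization profile that disables one default block.

    Existing custom alarm definitions are kept, because the new configuration
    version has to contain everything that should remain in effect.
    """
    updated = copy.deepcopy(current_custom_profile)
    updated.setdefault(resource_type, {})[config_id] = {"Enabled": False}
    return updated


# Hypothetical existing customization profile, for illustration only.
current_custom_profile = {
    "AWS::EC2::Instance": {
        "MyCustomAlarm": {
            "Tag": {"Key": "ams:rt:mylinuxinstance", "Value": "true"},
            "AlarmDefinition": {"AlarmName": "${EC2::InstanceId}: My Alarm"},
        }
    }
}

new_profile = disable_default_block(
    current_custom_profile, "AWS::EC2::Instance", "AMSManagedBlock1"
)
print(sorted(new_profile["AWS::EC2::Instance"]))
```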

Important

When AMS Accelerate updates the default configuration profile, it's not calibrated against your configured custom alarms, so review changes to the default alarms when you're overriding them in your customization configuration profile.

Creating additional CloudWatch alarms

You can create additional CloudWatch alarms for AMS Accelerate using custom CloudWatch metrics and alarms for Amazon EC2 instances.

Produce your application monitoring script and custom metric. For more information and access to example scripts, see Monitoring Memory and Disk Metrics for Amazon EC2 Linux Instances.

The CloudWatch monitoring scripts for Linux Amazon EC2 instances demonstrate how to produce and consume custom CloudWatch metrics. These sample Perl scripts comprise a fully functional example that reports memory, swap, and disk space utilization metrics for a Linux instance.

Important

AMS Accelerate does not monitor CloudWatch alarms created by you.