Step 2: Running and managing AWS Resilience Hub resiliency assessments - AWS Resilience Hub

Step 2: Running and managing AWS Resilience Hub resiliency assessments

After you publish a new version of your application, you must run a new resiliency assessment and analyze the results to ensure that your application meets the estimated workload RTO and estimated RPO that are defined in your resiliency policy. The assessment compares each Application Component configuration to the policy and makes alarm, SOP, and test recommendations.

For more information, see the following topics:

Running and monitoring AWS Resilience Hub resiliency assessments

To run resiliency assessments in AWS Resilience Hub and monitor their status, you must use the following APIs:

The following example shows how to start running a new assessment in AWS Resilience Hub using StartAppAssessment API.

Request

aws resiliencehub start-app-assessment \ --app-arn <App_ARN> \ --app-version release \ --assessment-name first-assessment

Response

{ "assessment": { "appArn": "<App_ARN>", "appVersion": "release", "invoker": "User", "assessmentStatus": "Pending", "startTime": "2022-10-27T08:15:10.452000+03:00", "assessmentName": "first-assessment", "assessmentArn": "<Assessment_ARN>", "policy": { "policyArn": "<Policy_ARN>", "policyName": "newPolicy", "dataLocationConstraint": "AnyLocation", "policy": { "AZ": { "rtoInSecs": 172800, "rpoInSecs": 86400 }, "Hardware": { "rtoInSecs": 172800, "rpoInSecs": 86400 }, "Software": { "rtoInSecs": 172800, "rpoInSecs": 86400 } } }, "tags": {} } }

The following example shows how to monitor the status of your assessment in AWS Resilience Hub using DescribeAppAssessment API. You can extract the status of your assessment from the assessmentStatus variable.

Request

aws resiliencehub describe-app-assessment \ --assessment-arn <Assessment_ARN>

Response

{ "assessment": { "appArn": "<App_ARN>", "appVersion": "release", "cost": { "amount": 0.0, "currency": "USD", "frequency": "Monthly" }, "resiliencyScore": { "score": 0.27, "disruptionScore": { "AZ": 0.42, "Hardware": 0.0, "Region": 0.0, "Software": 0.38 } }, "compliance": { "AZ": { "achievableRtoInSecs": 0, "currentRtoInSecs": 4500, "currentRpoInSecs": 86400, "complianceStatus": "PolicyMet", "achievableRpoInSecs": 0 }, "Hardware": { "achievableRtoInSecs": 0, "currentRtoInSecs": 2595601, "currentRpoInSecs": 2592001, "complianceStatus": "PolicyBreached", "achievableRpoInSecs": 0 }, "Software": { "achievableRtoInSecs": 0, "currentRtoInSecs": 4500, "currentRpoInSecs": 86400, "complianceStatus": "PolicyMet", "achievableRpoInSecs": 0 } }, "complianceStatus": "PolicyBreached", "assessmentStatus": "Success", "startTime": "2022-10-27T08:15:10.452000+03:00", "endTime": "2022-10-27T08:15:31.883000+03:00", "assessmentName": "first-assessment", "assessmentArn": "<Assessment_ARN>", "policy": { "policyArn": "<Policy_ARN>", "policyName": "newPolicy", "dataLocationConstraint": "AnyLocation", "policy": { "AZ": { "rtoInSecs": 172800, "rpoInSecs": 86400 }, "Hardware": { "rtoInSecs": 172800, "rpoInSecs": 86400 }, "Software": { "rtoInSecs": 172800, "rpoInSecs": 86400 } } }, "tags": {} } }

Examining assessment results

After your assessment is completed successfully, you can examine the assessment results using the following APIs.

The following example shows how to obtain the alarm recommendations using the Amazon Resource Name (ARN) of the assessment using ListAlarmRecommendations API.

Note

To obtain the SOP and FIS test recommendations, replace with either ListSopRecommendations or ListTestRecommendations.

Request

aws resiliencehub list-alarm-recommendations \ --assessment-arn <Assessment_ARN>

Response

{ "alarmRecommendations": [ { "recommendationId": "78ece7f8-c776-499e-baa8-b35f5e8b8ba2", "referenceId": "app_common:alarm:synthetic_canary:2021-04-01", "name": "AWSResilienceHub-SyntheticCanaryInRegionAlarm_2021-04-01", "description": "A monitor for the entire application, configured to constantly verify that the application API/endpoints are available", "type": "Metric", "appComponentName": "appcommon", "items": [ { "resourceId": "us-west-2", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false } ], "prerequisite": "Make sure CloudWatch Synthetics is setup to monitor the application (see the <a href=\"https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Synthetics_Canaries.html\" target=\"_blank\">docs</a>). \nMake sure that the Synthetics Name passed in the alarm dimension matches the name of the Synthetic Canary. It Defaults to the name of the application.\n" }, { "recommendationId": "d9c72c58-8c00-43f0-ad5d-0c6e5332b84b", "referenceId": "efs:alarm:percent_io_limit:2020-04-01", "name": "AWSResilienceHub-EFSHighIoAlarm_2020-04-01", "description": "Alarm by AWS ResilienceHub that reports when EFS I/O load is more than 90% for too much time", "type": "Metric", "appComponentName": "storageappcomponent-rlb", "items": [ { "resourceId": "fs-0487f945c02f17b3e", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false } ] }, { "recommendationId": "09f340cd-3427-4f66-8923-7f289d4a3216", "referenceId": "efs:alarm:mount_failure:2020-04-01", "name": "AWSResilienceHub-EFSMountFailureAlarm_2020-04-01", "description": "Alarm by AWS ResilienceHub that reports when volume failed to mount to EC2 instance", "type": "Metric", "appComponentName": "storageappcomponent-rlb", "items": [ { "resourceId": "fs-0487f945c02f17b3e", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false } ], "prerequisite": "* Make sure Amazon EFS utils are installed(see the <a href=\"https://github.com/aws/efs-utils#installation\" target=\"_blank\">docs</a>).\n* Make sure cloudwatch logs are enabled in efs-utils (see the <a href=\"https://github.com/aws/efs-utils#step-2-enable-cloudwatch-log-feature-in-efs-utils-config-file-etcamazonefsefs-utilsconf\" target=\"_blank\">docs</a>).\n* Make sure that you've configured `log_group_name` in `/etc/amazon/efs/efs-utils.conf`, for example: `log_group_name = /aws/efs/utils`.\n* Use the created `log_group_name` in the generated alarm. Find `LogGroupName: REPLACE_ME` in the alarm and make sure the `log_group_name` is used instead of REPLACE_ME.\n" }, { "recommendationId": "b0f57d2a-1220-4f40-a585-6dab1e79cee2", "referenceId": "efs:alarm:client_connections:2020-04-01", "name": "AWSResilienceHub-EFSHighClientConnectionsAlarm_2020-04-01", "description": "Alarm by AWS ResilienceHub that reports when client connection number deviation is over the specified threshold", "type": "Metric", "appComponentName": "storageappcomponent-rlb", "items": [ { "resourceId": "fs-0487f945c02f17b3e", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false } ] }, { "recommendationId": "15f49b10-9bac-4494-b376-705f8da252d7", "referenceId": "rds:alarm:health-storage:2020-04-01", "name": "AWSResilienceHub-RDSInstanceLowStorageAlarm_2020-04-01", "description": "Reports when database free storage is low", "type": "Metric", "appComponentName": "databaseappcomponent-hji", "items": [ { "resourceId": "terraform-20220623141426115800000001", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false } ] }, { "recommendationId": "c1906101-cea8-4f77-be7b-60abb07621f5", "referenceId": "rds:alarm:health-connections:2020-04-01", "name": "AWSResilienceHub-RDSInstanceConnectionSpikeAlarm_2020-04-01", "description": "Reports when database connection count is anomalous", "type": "Metric", "appComponentName": "databaseappcomponent-hji", "items": [ { "resourceId": "terraform-20220623141426115800000001", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false } ] }, { "recommendationId": "f169b8d4-45c1-4238-95d1-ecdd8d5153fe", "referenceId": "rds:alarm:health-cpu:2020-04-01", "name": "AWSResilienceHub-RDSInstanceOverUtilizedCpuAlarm_2020-04-01", "description": "Reports when database used CPU is high", "type": "Metric", "appComponentName": "databaseappcomponent-hji", "items": [ { "resourceId": "terraform-20220623141426115800000001", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false } ] }, { "recommendationId": "69da8459-cbe4-4ba1-a476-80c7ebf096f0", "referenceId": "rds:alarm:health-memory:2020-04-01", "name": "AWSResilienceHub-RDSInstanceLowMemoryAlarm_2020-04-01", "description": "Reports when database free memory is low", "type": "Metric", "appComponentName": "databaseappcomponent-hji", "items": [ { "resourceId": "terraform-20220623141426115800000001", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false } ] }, { "recommendationId": "67e7902a-f658-439e-916b-251a57b97c8a", "referenceId": "ecs:alarm:health-service_cpu_utilization:2020-04-01", "name": "AWSResilienceHub-ECSServiceHighCpuUtilizationAlarm_2020-04-01", "description": "Alarm by AWS ResilienceHub that triggers when CPU utilization of ECS tasks of Service exceeds the threshold", "type": "Metric", "appComponentName": "computeappcomponent-nrz", "items": [ { "resourceId": "aws_ecs_service_terraform-us-east-1-demo", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false } ] }, { "recommendationId": "fb30cb91-1f09-4abd-bd2e-9e8ee8550eb0", "referenceId": "ecs:alarm:health-service_memory_utilization:2020-04-01", "name": "AWSResilienceHub-ECSServiceHighMemoryUtilizationAlarm_2020-04-01", "description": "Alarm by AWS ResilienceHub for Amazon ECS that indicates if the percentage of memory that is used in the service, is exceeding specified threshold limit", "type": "Metric", "appComponentName": "computeappcomponent-nrz", "items": [ { "resourceId": "aws_ecs_service_terraform-us-east-1-demo", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false } ] }, { "recommendationId": "1bd45a8e-dd58-4a8e-a628-bdbee234efed", "referenceId": "ecs:alarm:health-service_sample_count:2020-04-01", "name": "AWSResilienceHub-ECSServiceSampleCountAlarm_2020-04-01", "description": "Alarm by AWS Resilience Hub for Amazon ECS that triggers if the count of tasks isn't equal Service Desired Count", "type": "Metric", "appComponentName": "computeappcomponent-nrz", "items": [ { "resourceId": "aws_ecs_service_terraform-us-east-1-demo", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false } ], "prerequisite": "Make sure the Container Insights on Amazon ECS is enabled: (see the <a href=\"https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/deploy-container-insights-ECS-cluster.html\" target=\"_blank\">docs</a>)." } ] }

The following example shows how to obtain the configuration recommendations (recommendations on how to improve your current resiliency) using ListAppComponentRecommendations API.

Request

aws resiliencehub list-app-component-recommendations \ --assessment-arn <Assessment_ARN>

Response

{ "componentRecommendations": [ { "appComponentName": "computeappcomponent-nrz", "recommendationStatus": "MetCanImprove", "configRecommendations": [ { "cost": { "amount": 0.0, "currency": "USD", "frequency": "Monthly" }, "appComponentName": "computeappcomponent-nrz", "recommendationCompliance": { "AZ": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": " Estimated time to restore cluster with volumes. (Estimate is based on averages, real time restore may vary).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Based on the frequency of the backups" }, "Hardware": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": " Estimated time to restore cluster with volumes. (Estimate is based on averages, real time restore may vary).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Based on the frequency of the backups" }, "Software": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": " Estimated time to restore cluster with volumes. (Estimate is based on averages, real time restore may vary).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Based on the frequency of the backups" } }, "optimizationType": "LeastCost", "description": "Current Configuration", "suggestedChanges": [], "haArchitecture": "BackupAndRestore", "referenceId": "original" }, { "cost": { "amount": 0.0, "currency": "USD", "frequency": "Monthly" }, "appComponentName": "computeappcomponent-nrz", "recommendationCompliance": { "AZ": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": " Estimated time to restore cluster with volumes. (Estimate is based on averages, real time restore may vary).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Based on the frequency of the backups" }, "Hardware": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": " Estimated time to restore cluster with volumes. (Estimate is based on averages, real time restore may vary).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Based on the frequency of the backups" }, "Software": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": " Estimated time to restore cluster with volumes. (Estimate is based on averages, real time restore may vary).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Based on the frequency of the backups" } }, "optimizationType": "LeastChange", "description": "Current Configuration", "suggestedChanges": [], "haArchitecture": "BackupAndRestore", "referenceId": "original" }, { "cost": { "amount": 14.74, "currency": "USD", "frequency": "Monthly" }, "appComponentName": "computeappcomponent-nrz", "recommendationCompliance": { "AZ": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 0, "expectedRtoDescription": "No expected downtime. You're launching using EC2, with DesiredCount > 1 in multiple AZs and CapacityProviders with MinSize > 1", "expectedRpoInSecs": 0, "expectedRpoDescription": "ECS Service state is saved on EFS file system. No data loss is expected as objects are be stored in multiple AZs." }, "Hardware": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 0, "expectedRtoDescription": "No expected downtime. You're launching using EC2, with DesiredCount > 1 and CapacityProviders with MinSize > 1", "expectedRpoInSecs": 0, "expectedRpoDescription": "ECS Service state is saved on EFS file system. No data loss is expected as objects are be stored in multiple AZs." }, "Software": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": " Estimated time to restore cluster with volumes. (Estimate is based on averages, real time restore may vary).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Based on the frequency of the backups" } }, "optimizationType": "BestAZRecovery", "description": "Stateful ECS service with launch type EC2 and EFS storage, deployed in multiple AZs. AWS Backup is used to backup EFS and copy snapshots in-region.", "suggestedChanges": [ "Add Auto Scaling Groups and Capacity Providers in multiple AZs", "Change desired count of the setup", "Remove EBS volume" ], "haArchitecture": "BackupAndRestore", "referenceId": "ecs:config:ec2-multi_az-efs-backups:2022-02-16" } ] }, { "appComponentName": "databaseappcomponent-hji", "recommendationStatus": "MetCanImprove", "configRecommendations": [ { "cost": { "amount": 0.0, "currency": "USD", "frequency": "Monthly" }, "appComponentName": "databaseappcomponent-hji", "recommendationCompliance": { "AZ": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": "Estimated time to restore from an RDS backup. (Estimates are averages based on size, real time may vary greatly from estimate).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Estimate based on the backup schedule. (Estimates are calculated from backup schedule, real time restore may vary)." }, "Hardware": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": "Estimated time to restore from snapshot. (Estimates are averages based on size, real time may vary greatly from estimate).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Estimate based on the backup schedule. (Estimates are calculated from backup schedule, real time restore may vary)." }, "Software": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": "Estimated time to restore from snapshot. (Estimates are averages based on size, real time may vary greatly from estimate).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Estimate based on the backup schedule. (Estimates are calculated from backup schedule, real time restore may vary)." } }, "optimizationType": "LeastCost", "description": "Current Configuration", "suggestedChanges": [], "haArchitecture": "BackupAndRestore", "referenceId": "original" }, { "cost": { "amount": 0.0, "currency": "USD", "frequency": "Monthly" }, "appComponentName": "databaseappcomponent-hji", "recommendationCompliance": { "AZ": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": "Estimated time to restore from an RDS backup. (Estimates are averages based on size, real time may vary greatly from estimate).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Estimate based on the backup schedule. (Estimates are calculated from backup schedule, real time restore may vary)." }, "Hardware": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": "Estimated time to restore from snapshot. (Estimates are averages based on size, real time may vary greatly from estimate).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Estimate based on the backup schedule. (Estimates are calculated from backup schedule, real time restore may vary)." }, "Software": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": "Estimated time to restore from snapshot. (Estimates are averages based on size, real time may vary greatly from estimate).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Estimate based on the backup schedule. (Estimates are calculated from backup schedule, real time restore may vary)." } }, "optimizationType": "LeastChange", "description": "Current Configuration", "suggestedChanges": [], "haArchitecture": "BackupAndRestore", "referenceId": "original" }, { "cost": { "amount": 76.73, "currency": "USD", "frequency": "Monthly" }, "appComponentName": "databaseappcomponent-hji", "recommendationCompliance": { "AZ": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 120, "expectedRtoDescription": "Estimated time to promote a secondary instance.", "expectedRpoInSecs": 0, "expectedRpoDescription": "Aurora data is automatically replicated across multiple Availability Zones in a Region." }, "Hardware": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 120, "expectedRtoDescription": "Estimated time to promote a secondary instance.", "expectedRpoInSecs": 0, "expectedRpoDescription": "Aurora data is automatically replicated across multiple Availability Zones in a Region." }, "Software": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 900, "expectedRtoDescription": "Estimate time to backtrack to a stable state.", "expectedRpoInSecs": 300, "expectedRpoDescription": "Estimate for latest restorable time for point in time recovery." } }, "optimizationType": "BestAZRecovery", "description": "Aurora database cluster with one read replica, with backtracking window of 24 hours.", "suggestedChanges": [ "Add read replica in the same region", "Change DB instance to a supported class (db.t3.small)", "Change to Aurora", "Enable cluster backtracking", "Enable instance backup with retention period 7" ], "haArchitecture": "WarmStandby", "referenceId": "rds:config:aurora-backtracking" } ] }, { "appComponentName": "storageappcomponent-rlb", "recommendationStatus": "BreachedUnattainable", "configRecommendations": [ { "cost": { "amount": 0.0, "currency": "USD", "frequency": "Monthly" }, "appComponentName": "storageappcomponent-rlb", "recommendationCompliance": { "AZ": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 0, "expectedRtoDescription": "No data loss in your system", "expectedRpoInSecs": 0, "expectedRpoDescription": "No data loss in your system" }, "Hardware": { "expectedComplianceStatus": "PolicyBreached", "expectedRtoInSecs": 2592001, "expectedRtoDescription": "No recovery option configured", "expectedRpoInSecs": 2592001, "expectedRpoDescription": "No recovery option configured" }, "Software": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 900, "expectedRtoDescription": "Time to recover EFS from backup. (Estimate is based on averages, real time restore may vary).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Recovery Point Objective for EFS from backups, derived from backup frequency" } }, "optimizationType": "BestAZRecovery", "description": "EFS with backups configured", "suggestedChanges": [ "Add additional availability zone" ], "haArchitecture": "MultiSite", "referenceId": "efs:config:with_backups:2020-04-01" }, { "cost": { "amount": 0.0, "currency": "USD", "frequency": "Monthly" }, "appComponentName": "storageappcomponent-rlb", "recommendationCompliance": { "AZ": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 0, "expectedRtoDescription": "No data loss in your system", "expectedRpoInSecs": 0, "expectedRpoDescription": "No data loss in your system" }, "Hardware": { "expectedComplianceStatus": "PolicyBreached", "expectedRtoInSecs": 2592001, "expectedRtoDescription": "No recovery option configured", "expectedRpoInSecs": 2592001, "expectedRpoDescription": "No recovery option configured" }, "Software": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 900, "expectedRtoDescription": "Time to recover EFS from backup. (Estimate is based on averages, real time restore may vary).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Recovery Point Objective for EFS from backups, derived from backup frequency" } }, "optimizationType": "BestAttainable", "description": "EFS with backups configured", "suggestedChanges": [ "Add additional availability zone" ], "haArchitecture": "MultiSite", "referenceId": "efs:config:with_backups:2020-04-01" } ] } ] }