Step 2: Running and managing AWS Resilience Hub resiliency assessments

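After you publish a new version of your application, you must run a new resiliency assessment and analyze the results to ensure that your application meets the estimated workload RTO and estimated workload RPO defined in your resiliency policy. The assessment compares each Application Component configuration to the policy and makes alarm, SOP, and test recommendations.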

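For more information, see the following topics: running and monitoring AWS Resilience Hub resiliency assessments, and examining the assessment results.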

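Running and monitoring AWS Resilience Hub resiliency assessments

To run a resiliency assessment in AWS Resilience Hub and monitor its status, use the StartAppAssessment and DescribeAppAssessment APIs.

The following example shows how to start a new assessment in AWS Resilience Hub using the StartAppAssessment API.

Request

aws resiliencehub start-app-assessment \
    --app-arn <App_ARN> \
    --app-version release \
    --assessment-name first-assessment

Response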

{
  "assessment": {
    "appArn": "<App_ARN>",
    "appVersion": "release",
    "invoker": "User",
    "assessmentStatus": "Pending",
    "startTime": "2022-10-27T08:15:10.452000+03:00",
    "assessmentName": "first-assessment",
    "assessmentArn": "<Assessment_ARN>",
    "policy": {
      "policyArn": "<Policy_ARN>",
      "policyName": "newPolicy",
      "dataLocationConstraint": "AnyLocation",
      "policy": {
        "AZ": { "rtoInSecs": 172800, "rpoInSecs": 86400 },
        "Hardware": { "rtoInSecs": 172800, "rpoInSecs": 86400 },
        "Software": { "rtoInSecs": 172800, "rpoInSecs": 86400 }
      }
    },
    "tags": {}
  }
}
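The policy targets in the response are expressed in seconds. As a rough illustration, the following Python sketch converts them to hours. It assumes the response has already been parsed into a dictionary (for example, from the JSON output of the CLI). The function name `policy_targets_in_hours` and the trimmed `sample` dictionary are hypothetical helpers for this example and are not part of AWS Resilience Hub.

```python
def policy_targets_in_hours(response):
    """Map each disruption type to its (RTO, RPO) policy targets in hours."""
    targets = response["assessment"]["policy"]["policy"]
    return {
        disruption: (limits["rtoInSecs"] / 3600, limits["rpoInSecs"] / 3600)
        for disruption, limits in targets.items()
    }


# Trimmed copy of the sample response above; only the fields used here are kept.
sample = {
    "assessment": {
        "assessmentStatus": "Pending",
        "policy": {
            "policyName": "newPolicy",
            "policy": {
                "AZ": {"rtoInSecs": 172800, "rpoInSecs": 86400},
                "Hardware": {"rtoInSecs": 172800, "rpoInSecs": 86400},
                "Software": {"rtoInSecs": 172800, "rpoInSecs": 86400},
            },
        },
    }
}

for disruption, (rto_hours, rpo_hours) in policy_targets_in_hours(sample).items():
    print(f"{disruption}: RTO {rto_hours:g} h, RPO {rpo_hours:g} h")
```

For the sample policy, every disruption type has an RTO target of 48 hours (172800 / 3600) and an RPO target of 24 hours (86400 / 3600).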

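The following example shows how to monitor the status of your assessment in AWS Resilience Hub using the DescribeAppAssessment API. You can read the status of your assessment from the assessmentStatus field of the response.

Request

aws resiliencehub describe-app-assessment \
    --assessment-arn <Assessment_ARN>

Response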

{
  "assessment": {
    "appArn": "<App_ARN>",
    "appVersion": "release",
    "cost": { "amount": 0.0, "currency": "USD", "frequency": "Monthly" },
    "resiliencyScore": {
      "score": 0.27,
      "disruptionScore": { "AZ": 0.42, "Hardware": 0.0, "Region": 0.0, "Software": 0.38 }
    },
    "compliance": {
      "AZ": {
        "achievableRtoInSecs": 0,
        "currentRtoInSecs": 4500,
        "currentRpoInSecs": 86400,
        "complianceStatus": "PolicyMet",
        "achievableRpoInSecs": 0
      },
      "Hardware": {
        "achievableRtoInSecs": 0,
        "currentRtoInSecs": 2595601,
        "currentRpoInSecs": 2592001,
        "complianceStatus": "PolicyBreached",
        "achievableRpoInSecs": 0
      },
      "Software": {
        "achievableRtoInSecs": 0,
        "currentRtoInSecs": 4500,
        "currentRpoInSecs": 86400,
        "complianceStatus": "PolicyMet",
        "achievableRpoInSecs": 0
      }
    },
    "complianceStatus": "PolicyBreached",
    "assessmentStatus": "Success",
    "startTime": "2022-10-27T08:15:10.452000+03:00",
    "endTime": "2022-10-27T08:15:31.883000+03:00",
    "assessmentName": "first-assessment",
    "assessmentArn": "<Assessment_ARN>",
    "policy": {
      "policyArn": "<Policy_ARN>",
      "policyName": "newPolicy",
      "dataLocationConstraint": "AnyLocation",
      "policy": {
        "AZ": { "rtoInSecs": 172800, "rpoInSecs": 86400 },
        "Hardware": { "rtoInSecs": 172800, "rpoInSecs": 86400 },
        "Software": { "rtoInSecs": 172800, "rpoInSecs": 86400 }
      }
    },
    "tags": {}
  }
}
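Because an assessment starts in the Pending status and finishes later, monitoring usually means calling DescribeAppAssessment repeatedly. The following Python sketch shows one possible polling loop and a helper that lists the disruption types that breach the policy. The names `wait_for_assessment`, `breached_disruptions`, and `canned` are hypothetical. The examples in this section show only the Pending and Success statuses, so treating Failed as a final status is an assumption of this sketch. The real API call is replaced with canned responses so that the example runs without AWS credentials.

```python
import time

# Statuses treated as final. The examples in this section show "Pending" and
# "Success"; treating "Failed" as final is an assumption of this sketch.
TERMINAL_STATUSES = {"Success", "Failed"}


def wait_for_assessment(describe, assessment_arn, interval_secs=10,
                        max_attempts=60, sleep=time.sleep):
    """Poll `describe` until the assessment reaches a terminal status.

    `describe` is any callable that takes an assessment ARN and returns a
    DescribeAppAssessment-style dictionary.
    """
    for _ in range(max_attempts):
        response = describe(assessment_arn)
        if response["assessment"]["assessmentStatus"] in TERMINAL_STATUSES:
            return response
        sleep(interval_secs)
    raise TimeoutError(f"{assessment_arn} did not finish after {max_attempts} polls")


def breached_disruptions(response):
    """Return the disruption types whose complianceStatus is not PolicyMet."""
    compliance = response["assessment"].get("compliance", {})
    return sorted(
        disruption
        for disruption, result in compliance.items()
        if result["complianceStatus"] != "PolicyMet"
    )


# Stand-in for the real API: two canned responses trimmed from the samples above.
canned = iter([
    {"assessment": {"assessmentStatus": "Pending"}},
    {"assessment": {
        "assessmentStatus": "Success",
        "complianceStatus": "PolicyBreached",
        "compliance": {
            "AZ": {"currentRtoInSecs": 4500, "complianceStatus": "PolicyMet"},
            "Hardware": {"currentRtoInSecs": 2595601, "complianceStatus": "PolicyBreached"},
            "Software": {"currentRtoInSecs": 4500, "complianceStatus": "PolicyMet"},
        },
    }},
])

# The sleep function is replaced with a no-op so the demo finishes immediately.
final = wait_for_assessment(lambda arn: next(canned), "<Assessment_ARN>",
                            sleep=lambda secs: None)
print(final["assessment"]["assessmentStatus"], breached_disruptions(final))
```

With the canned data, the loop stops at the second response and reports Hardware as the only breached disruption type, which matches the sample response above (a current hardware RTO of 2595601 seconds against a target of 172800). If you use the AWS SDK for Python, `describe` could be something like `lambda arn: client.describe_app_assessment(assessmentArn=arn)` with `client = boto3.client("resiliencehub")`; verify the call against the SDK documentation before relying on it.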

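Examining the assessment results

After your assessment completes successfully, you can examine its results using the APIs shown in the following examples.

The following example shows how to get the alarm recommendations for an assessment, identified by its Amazon Resource Name (ARN), using the ListAlarmRecommendations API.

Note

To get SOP and FIS test recommendations, use the ListSopRecommendations and ListTestRecommendations APIs instead.

Request

aws resiliencehub list-alarm-recommendations \
    --assessment-arn <Assessment_ARN>

Response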

{
  "alarmRecommendations": [
    {
      "recommendationId": "78ece7f8-c776-499e-baa8-b35f5e8b8ba2",
      "referenceId": "app_common:alarm:synthetic_canary:2021-04-01",
      "name": "AWSResilienceHub-SyntheticCanaryInRegionAlarm_2021-04-01",
      "description": "A monitor for the entire application, configured to constantly verify that the application API/endpoints are available",
      "type": "Metric",
      "appComponentName": "appcommon",
      "items": [
        { "resourceId": "us-west-2", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false }
      ],
      "prerequisite": "Make sure CloudWatch Synthetics is setup to monitor the application (see the <a href=\"https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Synthetics_Canaries.html\" target=\"_blank\">docs</a>). \nMake sure that the Synthetics Name passed in the alarm dimension matches the name of the Synthetic Canary. It Defaults to the name of the application.\n"
    },
    {
      "recommendationId": "d9c72c58-8c00-43f0-ad5d-0c6e5332b84b",
      "referenceId": "efs:alarm:percent_io_limit:2020-04-01",
      "name": "AWSResilienceHub-EFSHighIoAlarm_2020-04-01",
      "description": "Alarm by AWS ResilienceHub that reports when EFS I/O load is more than 90% for too much time",
      "type": "Metric",
      "appComponentName": "storageappcomponent-rlb",
      "items": [
        { "resourceId": "fs-0487f945c02f17b3e", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false }
      ]
    },
    {
      "recommendationId": "09f340cd-3427-4f66-8923-7f289d4a3216",
      "referenceId": "efs:alarm:mount_failure:2020-04-01",
      "name": "AWSResilienceHub-EFSMountFailureAlarm_2020-04-01",
      "description": "Alarm by AWS ResilienceHub that reports when volume failed to mount to EC2 instance",
      "type": "Metric",
      "appComponentName": "storageappcomponent-rlb",
      "items": [
        { "resourceId": "fs-0487f945c02f17b3e", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false }
      ],
      "prerequisite": "* Make sure Amazon EFS utils are installed(see the <a href=\"https://github.com/aws/efs-utils#installation\" target=\"_blank\">docs</a>).\n* Make sure cloudwatch logs are enabled in efs-utils (see the <a href=\"https://github.com/aws/efs-utils#step-2-enable-cloudwatch-log-feature-in-efs-utils-config-file-etcamazonefsefs-utilsconf\" target=\"_blank\">docs</a>).\n* Make sure that you've configured `log_group_name` in `/etc/amazon/efs/efs-utils.conf`, for example: `log_group_name = /aws/efs/utils`.\n* Use the created `log_group_name` in the generated alarm. Find `LogGroupName: REPLACE_ME` in the alarm and make sure the `log_group_name` is used instead of REPLACE_ME.\n"
    },
    {
      "recommendationId": "b0f57d2a-1220-4f40-a585-6dab1e79cee2",
      "referenceId": "efs:alarm:client_connections:2020-04-01",
      "name": "AWSResilienceHub-EFSHighClientConnectionsAlarm_2020-04-01",
      "description": "Alarm by AWS ResilienceHub that reports when client connection number deviation is over the specified threshold",
      "type": "Metric",
      "appComponentName": "storageappcomponent-rlb",
      "items": [
        { "resourceId": "fs-0487f945c02f17b3e", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false }
      ]
    },
    {
      "recommendationId": "15f49b10-9bac-4494-b376-705f8da252d7",
      "referenceId": "rds:alarm:health-storage:2020-04-01",
      "name": "AWSResilienceHub-RDSInstanceLowStorageAlarm_2020-04-01",
      "description": "Reports when database free storage is low",
      "type": "Metric",
      "appComponentName": "databaseappcomponent-hji",
      "items": [
        { "resourceId": "terraform-20220623141426115800000001", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false }
      ]
    },
    {
      "recommendationId": "c1906101-cea8-4f77-be7b-60abb07621f5",
      "referenceId": "rds:alarm:health-connections:2020-04-01",
      "name": "AWSResilienceHub-RDSInstanceConnectionSpikeAlarm_2020-04-01",
      "description": "Reports when database connection count is anomalous",
      "type": "Metric",
      "appComponentName": "databaseappcomponent-hji",
      "items": [
        { "resourceId": "terraform-20220623141426115800000001", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false }
      ]
    },
    {
      "recommendationId": "f169b8d4-45c1-4238-95d1-ecdd8d5153fe",
      "referenceId": "rds:alarm:health-cpu:2020-04-01",
      "name": "AWSResilienceHub-RDSInstanceOverUtilizedCpuAlarm_2020-04-01",
      "description": "Reports when database used CPU is high",
      "type": "Metric",
      "appComponentName": "databaseappcomponent-hji",
      "items": [
        { "resourceId": "terraform-20220623141426115800000001", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false }
      ]
    },
    {
      "recommendationId": "69da8459-cbe4-4ba1-a476-80c7ebf096f0",
      "referenceId": "rds:alarm:health-memory:2020-04-01",
      "name": "AWSResilienceHub-RDSInstanceLowMemoryAlarm_2020-04-01",
      "description": "Reports when database free memory is low",
      "type": "Metric",
      "appComponentName": "databaseappcomponent-hji",
      "items": [
        { "resourceId": "terraform-20220623141426115800000001", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false }
      ]
    },
    {
      "recommendationId": "67e7902a-f658-439e-916b-251a57b97c8a",
      "referenceId": "ecs:alarm:health-service_cpu_utilization:2020-04-01",
      "name": "AWSResilienceHub-ECSServiceHighCpuUtilizationAlarm_2020-04-01",
      "description": "Alarm by AWS ResilienceHub that triggers when CPU utilization of ECS tasks of Service exceeds the threshold",
      "type": "Metric",
      "appComponentName": "computeappcomponent-nrz",
      "items": [
        { "resourceId": "aws_ecs_service_terraform-us-east-1-demo", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false }
      ]
    },
    {
      "recommendationId": "fb30cb91-1f09-4abd-bd2e-9e8ee8550eb0",
      "referenceId": "ecs:alarm:health-service_memory_utilization:2020-04-01",
      "name": "AWSResilienceHub-ECSServiceHighMemoryUtilizationAlarm_2020-04-01",
      "description": "Alarm by AWS ResilienceHub for Amazon ECS that indicates if the percentage of memory that is used in the service, is exceeding specified threshold limit",
      "type": "Metric",
      "appComponentName": "computeappcomponent-nrz",
      "items": [
        { "resourceId": "aws_ecs_service_terraform-us-east-1-demo", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false }
      ]
    },
    {
      "recommendationId": "1bd45a8e-dd58-4a8e-a628-bdbee234efed",
      "referenceId": "ecs:alarm:health-service_sample_count:2020-04-01",
      "name": "AWSResilienceHub-ECSServiceSampleCountAlarm_2020-04-01",
      "description": "Alarm by AWS Resilience Hub for Amazon ECS that triggers if the count of tasks isn't equal Service Desired Count",
      "type": "Metric",
      "appComponentName": "computeappcomponent-nrz",
      "items": [
        { "resourceId": "aws_ecs_service_terraform-us-east-1-demo", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false }
      ],
      "prerequisite": "Make sure the Container Insights on Amazon ECS is enabled: (see the <a href=\"https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/deploy-container-insights-ECS-cluster.html\" target=\"_blank\">docs</a>)."
    }
  ]
}
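A response like the one above can be long, so it may help to reduce it to a work list. The following Python sketch groups the alarms that are not yet implemented by Application Component and flags the alarms that carry a prerequisite note. It assumes the response has been parsed into a dictionary. The function names and the trimmed `sample` are hypothetical, and the prerequisite texts in the sample are shortened.

```python
from collections import defaultdict


def pending_alarms_by_component(response):
    """Group the names of alarms with an unimplemented item by AppComponent."""
    pending = defaultdict(list)
    for rec in response["alarmRecommendations"]:
        if any(not item["alreadyImplemented"] for item in rec.get("items", [])):
            pending[rec["appComponentName"]].append(rec["name"])
    return dict(pending)


def alarms_with_prerequisites(response):
    """Return the names of alarms that carry a non-empty prerequisite note."""
    return [rec["name"] for rec in response["alarmRecommendations"]
            if rec.get("prerequisite")]


# Three entries trimmed from the sample response above (prerequisite text shortened).
sample = {
    "alarmRecommendations": [
        {
            "name": "AWSResilienceHub-SyntheticCanaryInRegionAlarm_2021-04-01",
            "appComponentName": "appcommon",
            "items": [{"resourceId": "us-west-2", "alreadyImplemented": False}],
            "prerequisite": "Make sure CloudWatch Synthetics is setup to monitor the application",
        },
        {
            "name": "AWSResilienceHub-EFSHighIoAlarm_2020-04-01",
            "appComponentName": "storageappcomponent-rlb",
            "items": [{"resourceId": "fs-0487f945c02f17b3e", "alreadyImplemented": False}],
        },
        {
            "name": "AWSResilienceHub-EFSMountFailureAlarm_2020-04-01",
            "appComponentName": "storageappcomponent-rlb",
            "items": [{"resourceId": "fs-0487f945c02f17b3e", "alreadyImplemented": False}],
            "prerequisite": "* Make sure Amazon EFS utils are installed",
        },
    ]
}

for component, names in pending_alarms_by_component(sample).items():
    print(component, len(names), "pending")
print("Need setup first:", alarms_with_prerequisites(sample))
```

In the full sample response every item has alreadyImplemented set to false, so all eleven alarms would appear in the pending list, and three of them (the synthetic canary, EFS mount failure, and ECS sample count alarms) have prerequisites to complete first.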

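The following example shows how to get configuration recommendations (recommendations on how to improve your current resiliency) using the ListAppComponentRecommendations API.

Request

aws resiliencehub list-app-component-recommendations \
    --assessment-arn <Assessment_ARN>

Response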

{
  "componentRecommendations": [
    {
      "appComponentName": "computeappcomponent-nrz",
      "recommendationStatus": "MetCanImprove",
      "configRecommendations": [
        {
          "cost": { "amount": 0.0, "currency": "USD", "frequency": "Monthly" },
          "appComponentName": "computeappcomponent-nrz",
          "recommendationCompliance": {
            "AZ": {
              "expectedComplianceStatus": "PolicyMet",
              "expectedRtoInSecs": 1800,
              "expectedRtoDescription": " Estimated time to restore cluster with volumes. (Estimate is based on averages, real time restore may vary).",
              "expectedRpoInSecs": 86400,
              "expectedRpoDescription": "Based on the frequency of the backups"
            },
            "Hardware": {
              "expectedComplianceStatus": "PolicyMet",
              "expectedRtoInSecs": 1800,
              "expectedRtoDescription": " Estimated time to restore cluster with volumes. (Estimate is based on averages, real time restore may vary).",
              "expectedRpoInSecs": 86400,
              "expectedRpoDescription": "Based on the frequency of the backups"
            },
            "Software": {
              "expectedComplianceStatus": "PolicyMet",
              "expectedRtoInSecs": 1800,
              "expectedRtoDescription": " Estimated time to restore cluster with volumes. (Estimate is based on averages, real time restore may vary).",
              "expectedRpoInSecs": 86400,
              "expectedRpoDescription": "Based on the frequency of the backups"
            }
          },
          "optimizationType": "LeastCost",
          "description": "Current Configuration",
          "suggestedChanges": [],
          "haArchitecture": "BackupAndRestore",
          "referenceId": "original"
        },
        {
          "cost": { "amount": 0.0, "currency": "USD", "frequency": "Monthly" },
          "appComponentName": "computeappcomponent-nrz",
          "recommendationCompliance": {
            "AZ": {
              "expectedComplianceStatus": "PolicyMet",
              "expectedRtoInSecs": 1800,
              "expectedRtoDescription": " Estimated time to restore cluster with volumes. (Estimate is based on averages, real time restore may vary).",
              "expectedRpoInSecs": 86400,
              "expectedRpoDescription": "Based on the frequency of the backups"
            },
            "Hardware": {
              "expectedComplianceStatus": "PolicyMet",
              "expectedRtoInSecs": 1800,
              "expectedRtoDescription": " Estimated time to restore cluster with volumes. (Estimate is based on averages, real time restore may vary).",
              "expectedRpoInSecs": 86400,
              "expectedRpoDescription": "Based on the frequency of the backups"
            },
            "Software": {
              "expectedComplianceStatus": "PolicyMet",
              "expectedRtoInSecs": 1800,
              "expectedRtoDescription": " Estimated time to restore cluster with volumes. (Estimate is based on averages, real time restore may vary).",
              "expectedRpoInSecs": 86400,
              "expectedRpoDescription": "Based on the frequency of the backups"
            }
          },
          "optimizationType": "LeastChange",
          "description": "Current Configuration",
          "suggestedChanges": [],
          "haArchitecture": "BackupAndRestore",
          "referenceId": "original"
        },
        {
          "cost": { "amount": 14.74, "currency": "USD", "frequency": "Monthly" },
          "appComponentName": "computeappcomponent-nrz",
          "recommendationCompliance": {
            "AZ": {
              "expectedComplianceStatus": "PolicyMet",
              "expectedRtoInSecs": 0,
              "expectedRtoDescription": "No expected downtime. You're launching using EC2, with DesiredCount > 1 in multiple AZs and CapacityProviders with MinSize > 1",
              "expectedRpoInSecs": 0,
              "expectedRpoDescription": "ECS Service state is saved on EFS file system. No data loss is expected as objects are be stored in multiple AZs."
            },
            "Hardware": {
              "expectedComplianceStatus": "PolicyMet",
              "expectedRtoInSecs": 0,
              "expectedRtoDescription": "No expected downtime. You're launching using EC2, with DesiredCount > 1 and CapacityProviders with MinSize > 1",
              "expectedRpoInSecs": 0,
              "expectedRpoDescription": "ECS Service state is saved on EFS file system. No data loss is expected as objects are be stored in multiple AZs."
            },
            "Software": {
              "expectedComplianceStatus": "PolicyMet",
              "expectedRtoInSecs": 1800,
              "expectedRtoDescription": " Estimated time to restore cluster with volumes. (Estimate is based on averages, real time restore may vary).",
              "expectedRpoInSecs": 86400,
              "expectedRpoDescription": "Based on the frequency of the backups"
            }
          },
          "optimizationType": "BestAZRecovery",
          "description": "Stateful ECS service with launch type EC2 and EFS storage, deployed in multiple AZs. AWS Backup is used to backup EFS and copy snapshots in-region.",
          "suggestedChanges": [
            "Add Auto Scaling Groups and Capacity Providers in multiple AZs",
            "Change desired count of the setup",
            "Remove EBS volume"
          ],
          "haArchitecture": "BackupAndRestore",
          "referenceId": "ecs:config:ec2-multi_az-efs-backups:2022-02-16"
        }
      ]
    },
    {
      "appComponentName": "databaseappcomponent-hji",
      "recommendationStatus": "MetCanImprove",
      "configRecommendations": [
        {
          "cost": { "amount": 0.0, "currency": "USD", "frequency": "Monthly" },
          "appComponentName": "databaseappcomponent-hji",
          "recommendationCompliance": {
            "AZ": {
              "expectedComplianceStatus": "PolicyMet",
              "expectedRtoInSecs": 1800,
              "expectedRtoDescription": "Estimated time to restore from an RDS backup. (Estimates are averages based on size, real time may vary greatly from estimate).",
              "expectedRpoInSecs": 86400,
              "expectedRpoDescription": "Estimate based on the backup schedule. (Estimates are calculated from backup schedule, real time restore may vary)."
            },
            "Hardware": {
              "expectedComplianceStatus": "PolicyMet",
              "expectedRtoInSecs": 1800,
              "expectedRtoDescription": "Estimated time to restore from snapshot. (Estimates are averages based on size, real time may vary greatly from estimate).",
              "expectedRpoInSecs": 86400,
              "expectedRpoDescription": "Estimate based on the backup schedule. (Estimates are calculated from backup schedule, real time restore may vary)."
            },
            "Software": {
              "expectedComplianceStatus": "PolicyMet",
              "expectedRtoInSecs": 1800,
              "expectedRtoDescription": "Estimated time to restore from snapshot. (Estimates are averages based on size, real time may vary greatly from estimate).",
              "expectedRpoInSecs": 86400,
              "expectedRpoDescription": "Estimate based on the backup schedule. (Estimates are calculated from backup schedule, real time restore may vary)."
            }
          },
          "optimizationType": "LeastCost",
          "description": "Current Configuration",
          "suggestedChanges": [],
          "haArchitecture": "BackupAndRestore",
          "referenceId": "original"
        },
        {
          "cost": { "amount": 0.0, "currency": "USD", "frequency": "Monthly" },
          "appComponentName": "databaseappcomponent-hji",
          "recommendationCompliance": {
            "AZ": {
              "expectedComplianceStatus": "PolicyMet",
              "expectedRtoInSecs": 1800,
              "expectedRtoDescription": "Estimated time to restore from an RDS backup. (Estimates are averages based on size, real time may vary greatly from estimate).",
              "expectedRpoInSecs": 86400,
              "expectedRpoDescription": "Estimate based on the backup schedule. (Estimates are calculated from backup schedule, real time restore may vary)."
            },
            "Hardware": {
              "expectedComplianceStatus": "PolicyMet",
              "expectedRtoInSecs": 1800,
              "expectedRtoDescription": "Estimated time to restore from snapshot. (Estimates are averages based on size, real time may vary greatly from estimate).",
              "expectedRpoInSecs": 86400,
              "expectedRpoDescription": "Estimate based on the backup schedule. (Estimates are calculated from backup schedule, real time restore may vary)."
            },
            "Software": {
              "expectedComplianceStatus": "PolicyMet",
              "expectedRtoInSecs": 1800,
              "expectedRtoDescription": "Estimated time to restore from snapshot. (Estimates are averages based on size, real time may vary greatly from estimate).",
              "expectedRpoInSecs": 86400,
              "expectedRpoDescription": "Estimate based on the backup schedule. (Estimates are calculated from backup schedule, real time restore may vary)."
            }
          },
          "optimizationType": "LeastChange",
          "description": "Current Configuration",
          "suggestedChanges": [],
          "haArchitecture": "BackupAndRestore",
          "referenceId": "original"
        },
        {
          "cost": { "amount": 76.73, "currency": "USD", "frequency": "Monthly" },
          "appComponentName": "databaseappcomponent-hji",
          "recommendationCompliance": {
            "AZ": {
              "expectedComplianceStatus": "PolicyMet",
              "expectedRtoInSecs": 120,
              "expectedRtoDescription": "Estimated time to promote a secondary instance.",
              "expectedRpoInSecs": 0,
              "expectedRpoDescription": "Aurora data is automatically replicated across multiple Availability Zones in a Region."
            },
            "Hardware": {
              "expectedComplianceStatus": "PolicyMet",
              "expectedRtoInSecs": 120,
              "expectedRtoDescription": "Estimated time to promote a secondary instance.",
              "expectedRpoInSecs": 0,
              "expectedRpoDescription": "Aurora data is automatically replicated across multiple Availability Zones in a Region."
            },
            "Software": {
              "expectedComplianceStatus": "PolicyMet",
              "expectedRtoInSecs": 900,
              "expectedRtoDescription": "Estimate time to backtrack to a stable state.",
              "expectedRpoInSecs": 300,
              "expectedRpoDescription": "Estimate for latest restorable time for point in time recovery."
            }
          },
          "optimizationType": "BestAZRecovery",
          "description": "Aurora database cluster with one read replica, with backtracking window of 24 hours.",
          "suggestedChanges": [
            "Add read replica in the same region",
            "Change DB instance to a supported class (db.t3.small)",
            "Change to Aurora",
            "Enable cluster backtracking",
            "Enable instance backup with retention period 7"
          ],
          "haArchitecture": "WarmStandby",
          "referenceId": "rds:config:aurora-backtracking"
        }
      ]
    },
    {
      "appComponentName": "storageappcomponent-rlb",
      "recommendationStatus": "BreachedUnattainable",
      "configRecommendations": [
        {
          "cost": { "amount": 0.0, "currency": "USD", "frequency": "Monthly" },
          "appComponentName": "storageappcomponent-rlb",
          "recommendationCompliance": {
            "AZ": {
              "expectedComplianceStatus": "PolicyMet",
              "expectedRtoInSecs": 0,
              "expectedRtoDescription": "No data loss in your system",
              "expectedRpoInSecs": 0,
              "expectedRpoDescription": "No data loss in your system"
            },
            "Hardware": {
              "expectedComplianceStatus": "PolicyBreached",
              "expectedRtoInSecs": 2592001,
              "expectedRtoDescription": "No recovery option configured",
              "expectedRpoInSecs": 2592001,
              "expectedRpoDescription": "No recovery option configured"
            },
            "Software": {
              "expectedComplianceStatus": "PolicyMet",
              "expectedRtoInSecs": 900,
              "expectedRtoDescription": "Time to recover EFS from backup. (Estimate is based on averages, real time restore may vary).",
              "expectedRpoInSecs": 86400,
              "expectedRpoDescription": "Recovery Point Objective for EFS from backups, derived from backup frequency"
            }
          },
          "optimizationType": "BestAZRecovery",
          "description": "EFS with backups configured",
          "suggestedChanges": [
            "Add additional availability zone"
          ],
          "haArchitecture": "MultiSite",
          "referenceId": "efs:config:with_backups:2020-04-01"
        },
        {
          "cost": { "amount": 0.0, "currency": "USD", "frequency": "Monthly" },
          "appComponentName": "storageappcomponent-rlb",
          "recommendationCompliance": {
            "AZ": {
              "expectedComplianceStatus": "PolicyMet",
              "expectedRtoInSecs": 0,
              "expectedRtoDescription": "No data loss in your system",
              "expectedRpoInSecs": 0,
              "expectedRpoDescription": "No data loss in your system"
            },
            "Hardware": {
              "expectedComplianceStatus": "PolicyBreached",
              "expectedRtoInSecs": 2592001,
              "expectedRtoDescription": "No recovery option configured",
              "expectedRpoInSecs": 2592001,
              "expectedRpoDescription": "No recovery option configured"
            },
            "Software": {
              "expectedComplianceStatus": "PolicyMet",
              "expectedRtoInSecs": 900,
              "expectedRtoDescription": "Time to recover EFS from backup. (Estimate is based on averages, real time restore may vary).",
              "expectedRpoInSecs": 86400,
              "expectedRpoDescription": "Recovery Point Objective for EFS from backups, derived from backup frequency"
            }
          },
          "optimizationType": "BestAttainable",
          "description": "EFS with backups configured",
          "suggestedChanges": [
            "Add additional availability zone"
          ],
          "haArchitecture": "MultiSite",
          "referenceId": "efs:config:with_backups:2020-04-01"
        }
      ]
    }
  ]
}