Step 2: Running and managing AWS Resilience Hub resilience assessments


Running and monitoring AWS Resilience Hub resilience assessments

To run a resilience assessment in AWS Resilience Hub and monitor its status, use the following APIs.

The following example shows how to start a new assessment in AWS Resilience Hub using the StartAppAssessment API.


aws resiliencehub start-app-assessment \
    --app-arn <App_ARN> \
    --app-version release \
    --assessment-name first-assessment


{
    "assessment": {
        "appArn": "<App_ARN>",
        "appVersion": "release",
        "invoker": "User",
        "assessmentStatus": "Pending",
        "startTime": "2022-10-27T08:15:10.452000+03:00",
        "assessmentName": "first-assessment",
        "assessmentArn": "<Assessment_ARN>",
        "policy": {
            "policyArn": "<Policy_ARN>",
            "policyName": "newPolicy",
            "dataLocationConstraint": "AnyLocation",
            "policy": {
                "AZ": {
                    "rtoInSecs": 172800,
                    "rpoInSecs": 86400
                },
                "Hardware": {
                    "rtoInSecs": 172800,
                    "rpoInSecs": 86400
                },
                "Software": {
                    "rtoInSecs": 172800,
                    "rpoInSecs": 86400
                }
            }
        },
        "tags": {}
    }
}
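You can also make the same call from Python with the AWS SDK. The sketch below is a minimal illustration under stated assumptions. The helper names `start_assessment` and `policy_targets_hours` are hypothetical, and `start_fn` is assumed to behave like the `start_app_assessment` method of a Boto3 `resiliencehub` client. The second helper converts the policy's `rtoInSecs` and `rpoInSecs` targets in the response into hours. In the example above, 172800 seconds is 48 hours and 86400 seconds is 24 hours.

```python
from typing import Any, Callable, Dict, Tuple


def start_assessment(start_fn: Callable[..., Dict[str, Any]],
                     app_arn: str,
                     app_version: str = "release",
                     assessment_name: str = "first-assessment") -> str:
    """Start a new assessment and return its ARN (hypothetical helper).

    start_fn is assumed to behave like
    boto3.client("resiliencehub").start_app_assessment.
    """
    response = start_fn(
        appArn=app_arn,
        appVersion=app_version,
        assessmentName=assessment_name,
    )
    # DescribeAppAssessment and the List*Recommendations calls need this ARN.
    return response["assessment"]["assessmentArn"]


def policy_targets_hours(assessment: Dict[str, Any]) -> Dict[str, Tuple[float, float]]:
    """Map each disruption type in the attached policy to (RTO, RPO) in hours."""
    targets = assessment["policy"]["policy"]
    return {
        disruption: (t["rtoInSecs"] / 3600, t["rpoInSecs"] / 3600)
        for disruption, t in targets.items()
    }
```

In practice you might pass `boto3.client("resiliencehub").start_app_assessment` as `start_fn`. Taking the bound method rather than a whole client makes the helper easy to exercise with a stub.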

The following example shows how to monitor the status of an assessment in AWS Resilience Hub using the DescribeAppAssessment API. You can read the assessment status from the assessmentStatus field of the response.


aws resiliencehub describe-app-assessment \
    --assessment-arn <Assessment_ARN>


{
    "assessment": {
        "appArn": "<App_ARN>",
        "appVersion": "release",
        "cost": {
            "amount": 0.0,
            "currency": "USD",
            "frequency": "Monthly"
        },
        "resiliencyScore": {
            "score": 0.27,
            "disruptionScore": {
                "AZ": 0.42,
                "Hardware": 0.0,
                "Region": 0.0,
                "Software": 0.38
            }
        },
        "compliance": {
            "AZ": {
                "achievableRtoInSecs": 0,
                "currentRtoInSecs": 4500,
                "currentRpoInSecs": 86400,
                "complianceStatus": "PolicyMet",
                "achievableRpoInSecs": 0
            },
            "Hardware": {
                "achievableRtoInSecs": 0,
                "currentRtoInSecs": 2595601,
                "currentRpoInSecs": 2592001,
                "complianceStatus": "PolicyBreached",
                "achievableRpoInSecs": 0
            },
            "Software": {
                "achievableRtoInSecs": 0,
                "currentRtoInSecs": 4500,
                "currentRpoInSecs": 86400,
                "complianceStatus": "PolicyMet",
                "achievableRpoInSecs": 0
            }
        },
        "complianceStatus": "PolicyBreached",
        "assessmentStatus": "Success",
        "startTime": "2022-10-27T08:15:10.452000+03:00",
        "endTime": "2022-10-27T08:15:31.883000+03:00",
        "assessmentName": "first-assessment",
        "assessmentArn": "<Assessment_ARN>",
        "policy": {
            "policyArn": "<Policy_ARN>",
            "policyName": "newPolicy",
            "dataLocationConstraint": "AnyLocation",
            "policy": {
                "AZ": {
                    "rtoInSecs": 172800,
                    "rpoInSecs": 86400
                },
                "Hardware": {
                    "rtoInSecs": 172800,
                    "rpoInSecs": 86400
                },
                "Software": {
                    "rtoInSecs": 172800,
                    "rpoInSecs": 86400
                }
            }
        },
        "tags": {}
    }
}
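An assessment starts as `Pending` and only later reports `Success`, so a script usually polls DescribeAppAssessment. The following is a minimal polling sketch. It assumes `describe` behaves like the `describe_app_assessment` method of a Boto3 `resiliencehub` client. The helper name, the 15-second default interval, and the attempt limit are illustrative assumptions, as is treating `Failed` as the other terminal status.

```python
import time
from typing import Any, Callable, Dict

# "Success" appears in the example output; "Failed" is assumed to be the
# other status after which an assessment no longer changes.
TERMINAL_STATUSES = {"Success", "Failed"}


def wait_for_assessment(describe: Callable[..., Dict[str, Any]],
                        assessment_arn: str,
                        delay_secs: float = 15.0,
                        max_attempts: int = 40) -> Dict[str, Any]:
    """Poll DescribeAppAssessment until assessmentStatus is terminal.

    describe is assumed to behave like
    boto3.client("resiliencehub").describe_app_assessment.
    Returns the final "assessment" object from the response.
    """
    status = "unknown"
    for attempt in range(max_attempts):
        assessment = describe(assessmentArn=assessment_arn)["assessment"]
        status = assessment["assessmentStatus"]
        if status in TERMINAL_STATUSES:
            return assessment
        # Still running (for example "Pending"): wait before the next call,
        # but not after the final attempt.
        if attempt < max_attempts - 1:
            time.sleep(delay_secs)
    raise TimeoutError(
        f"assessment {assessment_arn} still {status!r} after {max_attempts} attempts"
    )
```

The returned object has the same shape as the `assessment` field above, so you can read `complianceStatus` and `resiliencyScore` from it once the loop ends.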


After the assessment completes successfully, you can review its results using the following APIs.

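  • DescribeAppAssessment – This API lets you track the current status of your application against your resiliency policy. You can also read the compliance status from the complianceStatus field, and the resiliency score for each disruption type from the resiliencyScore structure (its disruptionScore map). For more information about this API, see https://docs.aws.amazon.com/resilience-hub/latest/APIReference/API_DescribeAppAssessment.html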

  • ListAlarmRecommendations – This API lets you retrieve alarm recommendations using the Amazon Resource Name (ARN) of the assessment. For more information about this API, see https://docs.aws.amazon.com/resilience-hub/latest/APIReference/API_ListAlarmRecommendations.html


    To retrieve SOP (standard operating procedure) and FIS test recommendations, use the ListSopRecommendations and ListTestRecommendations APIs.
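You can read the fields named above directly from the `assessment` object returned by DescribeAppAssessment. The hypothetical helpers below do three things:

- list the disruption types marked `PolicyBreached`
- compute how far a current RTO exceeds the policy's `rtoInSecs` target
- format the overall and per-disruption resiliency scores

In the earlier example output, only `Hardware` is breached. Its current RTO of 2595601 seconds exceeds the 172800-second target by 2422801 seconds.

```python
from typing import Any, Dict, List


def breached_disruptions(assessment: Dict[str, Any]) -> List[str]:
    """Disruption types whose compliance entry reports PolicyBreached."""
    return sorted(
        disruption
        for disruption, result in assessment.get("compliance", {}).items()
        if result.get("complianceStatus") == "PolicyBreached"
    )


def rto_gap_secs(assessment: Dict[str, Any], disruption: str) -> int:
    """Seconds by which the current RTO exceeds the policy RTO (0 if within target)."""
    current = assessment["compliance"][disruption]["currentRtoInSecs"]
    target = assessment["policy"]["policy"][disruption]["rtoInSecs"]
    return max(0, current - target)


def score_report(assessment: Dict[str, Any]) -> List[str]:
    """One line for the overall result, then one line per disruption score."""
    score = assessment["resiliencyScore"]
    lines = [f"overall: {assessment['complianceStatus']} (score {score['score']})"]
    for disruption, value in sorted(score["disruptionScore"].items()):
        lines.append(f"{disruption}: {value}")
    return lines
```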

The following example shows how to retrieve alarm recommendations with the ListAlarmRecommendations API, using the Amazon Resource Name (ARN) of the assessment.


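To retrieve SOP and FIS test recommendations instead, replace ListAlarmRecommendations with ListSopRecommendations or ListTestRecommendations (list-sop-recommendations or list-test-recommendations in the CLI).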


aws resiliencehub list-alarm-recommendations \
    --assessment-arn <Assessment_ARN>


{ "alarmRecommendations": [ { "recommendationId": "78ece7f8-c776-499e-baa8-b35f5e8b8ba2", "referenceId": "app_common:alarm:synthetic_canary:2021-04-01", "name": "AWSResilienceHub-SyntheticCanaryInRegionAlarm_2021-04-01", "description": "A monitor for the entire application, configured to constantly verify that the application API/endpoints are available", "type": "Metric", "appComponentName": "appcommon", "items": [ { "resourceId": "us-west-2", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false } ], "prerequisite": "Make sure Amazon CloudWatch Synthetics is setup to monitor the application (see the <a href=\"https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Synthetics_Canaries.html\" target=\"_blank\">docs</a>). \nMake sure that the Synthetics Name passed in the alarm dimension matches the name of the Synthetic Canary. It Defaults to the name of the application.\n" }, { "recommendationId": "d9c72c58-8c00-43f0-ad5d-0c6e5332b84b", "referenceId": "efs:alarm:percent_io_limit:2020-04-01", "name": "AWSResilienceHub-EFSHighIoAlarm_2020-04-01", "description": "An alarm by AWS Resilience Hub that reports when Amazon EFS I/O load is more than 90% for too much time", "type": "Metric", "appComponentName": "storageappcomponent-rlb", "items": [ { "resourceId": "fs-0487f945c02f17b3e", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false } ] }, { "recommendationId": "09f340cd-3427-4f66-8923-7f289d4a3216", "referenceId": "efs:alarm:mount_failure:2020-04-01", "name": "AWSResilienceHub-EFSMountFailureAlarm_2020-04-01", "description": "An alarm by AWS Resilience Hub that reports when volume failed to mount to EC2 instance", "type": "Metric", "appComponentName": "storageappcomponent-rlb", "items": [ { "resourceId": "fs-0487f945c02f17b3e", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false } ], "prerequisite": "* Make sure Amazon EFS utils are installed(see the <a href=\"https://github.com/aws/efs-utils#installation\" target=\"_blank\">docs</a>).\n* Make sure cloudwatch logs are enabled in efs-utils (see the <a href=\"https://github.com/aws/efs-utils#step-2-enable-cloudwatch-log-feature-in-efs-utils-config-file-etcamazonefsefs-utilsconf\" target=\"_blank\">docs</a>).\n* Make sure that you've configured `log_group_name` in `/etc/amazon/efs/efs-utils.conf`, for example: `log_group_name = /aws/efs/utils`.\n* Use the created `log_group_name` in the generated alarm. Find `LogGroupName: REPLACE_ME` in the alarm and make sure the `log_group_name` is used instead of REPLACE_ME.\n" }, { "recommendationId": "b0f57d2a-1220-4f40-a585-6dab1e79cee2", "referenceId": "efs:alarm:client_connections:2020-04-01", "name": "AWSResilienceHub-EFSHighClientConnectionsAlarm_2020-04-01", "description": "An alarm by AWS Resilience Hub that reports when client connection number deviation is over the specified threshold", "type": "Metric", "appComponentName": "storageappcomponent-rlb", "items": [ { "resourceId": "fs-0487f945c02f17b3e", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false } ] }, { "recommendationId": "15f49b10-9bac-4494-b376-705f8da252d7", "referenceId": "rds:alarm:health-storage:2020-04-01", "name": "AWSResilienceHub-RDSInstanceLowStorageAlarm_2020-04-01", "description": "Reports when database free storage is low", "type": "Metric", "appComponentName": "databaseappcomponent-hji", "items": [ { "resourceId": "terraform-20220623141426115800000001", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false } ] }, { "recommendationId": "c1906101-cea8-4f77-be7b-60abb07621f5", "referenceId": "rds:alarm:health-connections:2020-04-01", "name": "AWSResilienceHub-RDSInstanceConnectionSpikeAlarm_2020-04-01", "description": "Reports when database connection count is anomalous", "type": "Metric", "appComponentName": "databaseappcomponent-hji", "items": [ {
"resourceId": "terraform-20220623141426115800000001", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false } ] }, { "recommendationId": "f169b8d4-45c1-4238-95d1-ecdd8d5153fe", "referenceId": "rds:alarm:health-cpu:2020-04-01", "name": "AWSResilienceHub-RDSInstanceOverUtilizedCpuAlarm_2020-04-01", "description": "Reports when database used CPU is high", "type": "Metric", "appComponentName": "databaseappcomponent-hji", "items": [ { "resourceId": "terraform-20220623141426115800000001", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false } ] }, { "recommendationId": "69da8459-cbe4-4ba1-a476-80c7ebf096f0", "referenceId": "rds:alarm:health-memory:2020-04-01", "name": "AWSResilienceHub-RDSInstanceLowMemoryAlarm_2020-04-01", "description": "Reports when database free memory is low", "type": "Metric", "appComponentName": "databaseappcomponent-hji", "items": [ { "resourceId": "terraform-20220623141426115800000001", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false } ] }, { "recommendationId": "67e7902a-f658-439e-916b-251a57b97c8a", "referenceId": "ecs:alarm:health-service_cpu_utilization:2020-04-01", "name": "AWSResilienceHub-ECSServiceHighCpuUtilizationAlarm_2020-04-01", "description": "An alarm by AWS Resilience Hub that triggers when CPU utilization of ECS tasks of Service exceeds the threshold", "type": "Metric", "appComponentName": "computeappcomponent-nrz", "items": [ { "resourceId": "aws_ecs_service_terraform-us-east-1-demo", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false } ] }, { "recommendationId": "fb30cb91-1f09-4abd-bd2e-9e8ee8550eb0", "referenceId": "ecs:alarm:health-service_memory_utilization:2020-04-01", "name": "AWSResilienceHub-ECSServiceHighMemoryUtilizationAlarm_2020-04-01", "description": "An alarm by AWS Resilience Hub for Amazon ECS that indicates if the percentage of memory that is used in the service, is exceeding specified threshold limit", "type": "Metric", "appComponentName": "computeappcomponent-nrz", "items": [ { "resourceId": "aws_ecs_service_terraform-us-east-1-demo", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false } ] }, { "recommendationId": "1bd45a8e-dd58-4a8e-a628-bdbee234efed", "referenceId": "ecs:alarm:health-service_sample_count:2020-04-01", "name": "AWSResilienceHub-ECSServiceSampleCountAlarm_2020-04-01", "description": "An alarm by AWS Resilience Hub for Amazon ECS that triggers if the count of tasks isn't equal Service Desired Count", "type": "Metric", "appComponentName": "computeappcomponent-nrz", "items": [ { "resourceId": "aws_ecs_service_terraform-us-east-1-demo", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false } ], "prerequisite": "Make sure the Container Insights on Amazon ECS is enabled: (see the <a href=\"https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/deploy-container-insights-ECS-cluster.html\" target=\"_blank\">docs</a>)." } ] }
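Recommendation lists can span several pages, so a script may need to follow `nextToken`. The sketch below assumes that `list_fn` behaves like the `list_alarm_recommendations` method of a Boto3 `resiliencehub` client and accepts `assessmentArn` and `nextToken`. The helper names are hypothetical. After collecting every page, the sketch does two things with the results:

- It groups the recommendations that still have an item with `alreadyImplemented: false` by `appComponentName`.
- It lists the recommendations that carry a `prerequisite` you need to satisfy first.

For SOP or FIS test recommendations, you would pass the `list_sop_recommendations` or `list_test_recommendations` method with `result_key` set to `sopRecommendations` or `testRecommendations`. Those key names are assumed from the API naming pattern, so confirm them in the API reference.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


def collect_all(list_fn: Callable[..., Dict[str, Any]],
                result_key: str,
                assessment_arn: str) -> List[Dict[str, Any]]:
    """Call a List*Recommendations-style function until nextToken is absent."""
    items: List[Dict[str, Any]] = []
    kwargs: Dict[str, Any] = {"assessmentArn": assessment_arn}
    while True:
        page = list_fn(**kwargs)
        items.extend(page.get(result_key, []))
        token = page.get("nextToken")
        if not token:
            return items
        kwargs["nextToken"] = token


def pending_alarms_by_component(recs: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Group alarm names that still have an unimplemented item by component."""
    pending: Dict[str, List[str]] = defaultdict(list)
    for rec in recs:
        if any(not item.get("alreadyImplemented", False) for item in rec.get("items", [])):
            pending[rec["appComponentName"]].append(rec["name"])
    return dict(pending)


def alarms_with_prerequisites(recs: List[Dict[str, Any]]) -> List[str]:
    """Names of recommendations that include a prerequisite note."""
    return [rec["name"] for rec in recs if rec.get("prerequisite")]
```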

The following example shows how to retrieve configuration recommendations (recommendations on how to improve your current resiliency) using the ListAppComponentRecommendations API.


aws resiliencehub list-app-component-recommendations \
    --assessment-arn <Assessment_ARN>


{ "componentRecommendations": [ { "appComponentName": "computeappcomponent-nrz", "recommendationStatus": "MetCanImprove", "configRecommendations": [ { "cost": { "amount": 0.0, "currency": "USD", "frequency": "Monthly" }, "appComponentName": "computeappcomponent-nrz", "recommendationCompliance": { "AZ": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": " Estimated time to restore cluster with volumes. (Estimate is based on averages, real time restore may vary).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Based on the frequency of the backups" }, "Hardware": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": " Estimated time to restore cluster with volumes. (Estimate is based on averages, real time restore may vary).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Based on the frequency of the backups" }, "Software": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": " Estimated time to restore cluster with volumes. (Estimate is based on averages, real time restore may vary).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Based on the frequency of the backups" } }, "optimizationType": "LeastCost", "description": "Current Configuration", "suggestedChanges": [], "haArchitecture": "BackupAndRestore", "referenceId": "original" }, { "cost": { "amount": 0.0, "currency": "USD", "frequency": "Monthly" }, "appComponentName": "computeappcomponent-nrz", "recommendationCompliance": { "AZ": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": " Estimated time to restore cluster with volumes. (Estimate is based on averages, real time restore may vary).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Based on the frequency of the backups" }, "Hardware": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": " Estimated time to restore cluster with volumes. (Estimate is based on averages, real time restore may vary).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Based on the frequency of the backups" }, "Software": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": " Estimated time to restore cluster with volumes. (Estimate is based on averages, real time restore may vary).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Based on the frequency of the backups" } }, "optimizationType": "LeastChange", "description": "Current Configuration", "suggestedChanges": [], "haArchitecture": "BackupAndRestore", "referenceId": "original" }, { "cost": { "amount": 14.74, "currency": "USD", "frequency": "Monthly" }, "appComponentName": "computeappcomponent-nrz", "recommendationCompliance": { "AZ": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 0, "expectedRtoDescription": "No expected downtime. You're launching using EC2, with DesiredCount > 1 in multiple AZs and CapacityProviders with MinSize > 1", "expectedRpoInSecs": 0, "expectedRpoDescription": "ECS Service state is saved on Amazon EFS file system. No data loss is expected as objects are be stored in multiple AZs." }, "Hardware": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 0, "expectedRtoDescription": "No expected downtime. You're launching using EC2, with DesiredCount > 1 and CapacityProviders with MinSize > 1", "expectedRpoInSecs": 0, "expectedRpoDescription": "ECS Service state is saved on Amazon EFS file system. No data loss is expected as objects are be stored in multiple AZs."
}, "Software": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": " Estimated time to restore cluster with volumes. (Estimate is based on averages, real time restore may vary).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Based on the frequency of the backups" } }, "optimizationType": "BestAZRecovery", "description": "Stateful Amazon ECS service with launch type Amazon EC2 and Amazon EFS storage, deployed in multiple AZs. AWS Backup is used to backup Amazon EFS and copy snapshots in-Region.", "suggestedChanges": [ "Add AWS Auto Scaling Groups and Capacity Providers in multiple AZs", "Change desired count of the setup", "Remove Amazon EBS volume" ], "haArchitecture": "BackupAndRestore", "referenceId": "ecs:config:ec2-multi_az-efs-backups:2022-02-16" } ] }, { "appComponentName": "databaseappcomponent-hji", "recommendationStatus": "MetCanImprove", "configRecommendations": [ { "cost": { "amount": 0.0, "currency": "USD", "frequency": "Monthly" }, "appComponentName": "databaseappcomponent-hji", "recommendationCompliance": { "AZ": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": "Estimated time to restore from an RDS backup. (Estimates are averages based on size, real time may vary greatly from estimate).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Estimate based on the backup schedule. (Estimates are calculated from backup schedule, real time restore may vary)." }, "Hardware": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": "Estimated time to restore from snapshot. (Estimates are averages based on size, real time may vary greatly from estimate).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Estimate based on the backup schedule. (Estimates are calculated from backup schedule, real time restore may vary)." 
}, "Software": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": "Estimated time to restore from snapshot. (Estimates are averages based on size, real time may vary greatly from estimate).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Estimate based on the backup schedule. (Estimates are calculated from backup schedule, real time restore may vary)." } }, "optimizationType": "LeastCost", "description": "Current Configuration", "suggestedChanges": [], "haArchitecture": "BackupAndRestore", "referenceId": "original" }, { "cost": { "amount": 0.0, "currency": "USD", "frequency": "Monthly" }, "appComponentName": "databaseappcomponent-hji", "recommendationCompliance": { "AZ": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": "Estimated time to restore from an RDS backup. (Estimates are averages based on size, real time may vary greatly from estimate).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Estimate based on the backup schedule. (Estimates are calculated from backup schedule, real time restore may vary)." }, "Hardware": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": "Estimated time to restore from snapshot. (Estimates are averages based on size, real time may vary greatly from estimate).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Estimate based on the backup schedule. (Estimates are calculated from backup schedule, real time restore may vary)." }, "Software": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": "Estimated time to restore from snapshot. (Estimates are averages based on size, real time may vary greatly from estimate).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Estimate based on the backup schedule. (Estimates are calculated from backup schedule, real time restore may vary)." 
} }, "optimizationType": "LeastChange", "description": "Current Configuration", "suggestedChanges": [], "haArchitecture": "BackupAndRestore", "referenceId": "original" }, { "cost": { "amount": 76.73, "currency": "USD", "frequency": "Monthly" }, "appComponentName": "databaseappcomponent-hji", "recommendationCompliance": { "AZ": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 120, "expectedRtoDescription": "Estimated time to promote a secondary instance.", "expectedRpoInSecs": 0, "expectedRpoDescription": "Aurora data is automatically replicated across multiple Availability Zones in a Region." }, "Hardware": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 120, "expectedRtoDescription": "Estimated time to promote a secondary instance.", "expectedRpoInSecs": 0, "expectedRpoDescription": "Aurora data is automatically replicated across multiple Availability Zones in a Region." }, "Software": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 900, "expectedRtoDescription": "Estimate time to backtrack to a stable state.", "expectedRpoInSecs": 300, "expectedRpoDescription": "Estimate for latest restorable time for point in time recovery." 
} }, "optimizationType": "BestAZRecovery", "description": "Aurora database cluster with one read replica, with backtracking window of 24 hours.", "suggestedChanges": [ "Add read replica in the same Region", "Change DB instance to a supported class (db.t3.small)", "Change to Aurora", "Enable cluster backtracking", "Enable instance backup with retention period 7" ], "haArchitecture": "WarmStandby", "referenceId": "rds:config:aurora-backtracking" } ] }, { "appComponentName": "storageappcomponent-rlb", "recommendationStatus": "BreachedUnattainable", "configRecommendations": [ { "cost": { "amount": 0.0, "currency": "USD", "frequency": "Monthly" }, "appComponentName": "storageappcomponent-rlb", "recommendationCompliance": { "AZ": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 0, "expectedRtoDescription": "No data loss in your system", "expectedRpoInSecs": 0, "expectedRpoDescription": "No data loss in your system" }, "Hardware": { "expectedComplianceStatus": "PolicyBreached", "expectedRtoInSecs": 2592001, "expectedRtoDescription": "No recovery option configured", "expectedRpoInSecs": 2592001, "expectedRpoDescription": "No recovery option configured" }, "Software": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 900, "expectedRtoDescription": "Time to recover Amazon EFS from backup. (Estimate is based on averages, real time restore may vary).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Recovery Point Objective for Amazon EFS from backups, derived from backup frequency" } }, "optimizationType": "BestAZRecovery", "description": "Amazon EFS with backups configured", "suggestedChanges": [ "Add additional availability zone" ], "haArchitecture": "MultiSite", "referenceId": "efs:config:with_backups:2020-04-01" }, { "cost": { "amount": 0.0, "currency": "USD", "frequency": "Monthly" }, "appComponentName": "storageappcomponent-rlb", "recommendationCompliance": { "AZ": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 0, "expectedRtoDescription": "No data loss in your system", "expectedRpoInSecs": 0, "expectedRpoDescription": "No data loss in your system" }, "Hardware": { "expectedComplianceStatus": "PolicyBreached", "expectedRtoInSecs": 2592001, "expectedRtoDescription": "No recovery option configured", "expectedRpoInSecs": 2592001, "expectedRpoDescription": "No recovery option configured" }, "Software": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 900, "expectedRtoDescription": "Time to recover Amazon EFS from backup. (Estimate is based on averages, real time restore may vary).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Recovery Point Objective for Amazon EFS from backups, derived from backup frequency" } }, "optimizationType": "BestAttainable", "description": "Amazon EFS with backups configured", "suggestedChanges": [ "Add additional availability zone" ], "haArchitecture": "MultiSite", "referenceId": "efs:config:with_backups:2020-04-01" } ] } ] }
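Each entry in `componentRecommendations` carries several `configRecommendations`, one per `optimizationType`. This example includes `LeastCost`, `LeastChange`, `BestAZRecovery`, and `BestAttainable`. The hypothetical helpers below do three things:

- pick the option for a given optimization type
- summarize its cost, worst expected RTO across disruption types, and suggested changes
- list the components whose `recommendationStatus` starts with `Breached`

Treating every `Breached*` status as needing attention is an assumption based on the `BreachedUnattainable` value above. For `databaseappcomponent-hji`, for example, the `BestAZRecovery` option costs 76.73 USD per month, and its worst expected RTO is 900 seconds (Software).

```python
from typing import Any, Dict, List, Optional


def pick_option(component: Dict[str, Any],
                optimization_type: str) -> Optional[Dict[str, Any]]:
    """Return the configRecommendations entry for one optimizationType, if any."""
    for option in component.get("configRecommendations", []):
        if option.get("optimizationType") == optimization_type:
            return option
    return None


def option_summary(option: Dict[str, Any]) -> Dict[str, Any]:
    """Cost, worst expected RTO across disruption types, and suggested changes."""
    compliance = option["recommendationCompliance"]
    return {
        "referenceId": option["referenceId"],
        "cost": option["cost"],  # amount, currency, and frequency as reported
        "worstRtoSecs": max(c["expectedRtoInSecs"] for c in compliance.values()),
        "suggestedChanges": option["suggestedChanges"],
    }


def breached_components(components: List[Dict[str, Any]]) -> List[str]:
    """Components whose recommendationStatus begins with "Breached" (assumption)."""
    return [
        c["appComponentName"]
        for c in components
        if c.get("recommendationStatus", "").startswith("Breached")
    ]
```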