Étape 2 : Exécution et gestion des AWS Resilience Hub évaluations de résilience - AWS Centre de résilience

Les traductions sont fournies par des outils de traduction automatique. En cas de conflit entre le contenu d'une traduction et celui de la version originale en anglais, la version anglaise prévaudra.

Étape 2 : Exécution et gestion des AWS Resilience Hub évaluations de résilience

Après avoir publié une nouvelle version de votre application, vous devez exécuter une nouvelle évaluation de la résilience et analyser les résultats pour vous assurer que votre application répond au RTO de charge de travail estimé et au RPO estimés définis dans votre politique de résilience. L'évaluation compare la configuration de chaque composant d'application à la politique et émet des recommandations en matière d'alarme, de procédure opérationnelle normalisée et de test.

Pour plus d'informations, consultez les rubriques suivantes :

Exécution et surveillance des AWS Resilience Hub évaluations de résilience

Pour exécuter des évaluations de résilience AWS Resilience Hub et surveiller leur état, vous devez utiliser les API suivantes :

L'exemple suivant montre comment démarrer une nouvelle évaluation à l'AWS Resilience Hubaide de l'StartAppAssessmentAPI.

Demande

aws resiliencehub start-app-assessment \ --app-arn <App_ARN> \ --app-version release \ --assessment-name first-assessment

Réponse

{ "assessment": { "appArn": "<App_ARN>", "appVersion": "release", "invoker": "User", "assessmentStatus": "Pending", "startTime": "2022-10-27T08:15:10.452000+03:00", "assessmentName": "first-assessment", "assessmentArn": "<Assessment_ARN>", "policy": { "policyArn": "<Policy_ARN>", "policyName": "newPolicy", "dataLocationConstraint": "AnyLocation", "policy": { "AZ": { "rtoInSecs": 172800, "rpoInSecs": 86400 }, "Hardware": { "rtoInSecs": 172800, "rpoInSecs": 86400 }, "Software": { "rtoInSecs": 172800, "rpoInSecs": 86400 } } }, "tags": {} } }

L'exemple suivant montre comment surveiller le statut de votre évaluation à l'AWS Resilience Hubaide de l'DescribeAppAssessmentAPI. Vous pouvez extraire le statut de votre évaluation à partir de la assessmentStatus variable.

Demande

aws resiliencehub describe-app-assessment \ --assessment-arn <Assessment_ARN>

Réponse

{ "assessment": { "appArn": "<App_ARN>", "appVersion": "release", "cost": { "amount": 0.0, "currency": "USD", "frequency": "Monthly" }, "resiliencyScore": { "score": 0.27, "disruptionScore": { "AZ": 0.42, "Hardware": 0.0, "Region": 0.0, "Software": 0.38 } }, "compliance": { "AZ": { "achievableRtoInSecs": 0, "currentRtoInSecs": 4500, "currentRpoInSecs": 86400, "complianceStatus": "PolicyMet", "achievableRpoInSecs": 0 }, "Hardware": { "achievableRtoInSecs": 0, "currentRtoInSecs": 2595601, "currentRpoInSecs": 2592001, "complianceStatus": "PolicyBreached", "achievableRpoInSecs": 0 }, "Software": { "achievableRtoInSecs": 0, "currentRtoInSecs": 4500, "currentRpoInSecs": 86400, "complianceStatus": "PolicyMet", "achievableRpoInSecs": 0 } }, "complianceStatus": "PolicyBreached", "assessmentStatus": "Success", "startTime": "2022-10-27T08:15:10.452000+03:00", "endTime": "2022-10-27T08:15:31.883000+03:00", "assessmentName": "first-assessment", "assessmentArn": "<Assessment_ARN>", "policy": { "policyArn": "<Policy_ARN>", "policyName": "newPolicy", "dataLocationConstraint": "AnyLocation", "policy": { "AZ": { "rtoInSecs": 172800, "rpoInSecs": 86400 }, "Hardware": { "rtoInSecs": 172800, "rpoInSecs": 86400 }, "Software": { "rtoInSecs": 172800, "rpoInSecs": 86400 } } }, "tags": {} } }

Examen des résultats d'évaluation

Une fois votre évaluation terminée avec succès, vous pouvez examiner les résultats de l'évaluation à l'aide des API suivantes.

L'exemple suivant montre comment obtenir les recommandations d'alarme à l'aide de l'Amazon Resource Name (ARN) de l'évaluation à l'aide de l'ListAlarmRecommendationsAPI.

Note

Pour obtenir les recommandations de test SOP et FIS, remplacez par ouListSopRecommendations. ListTestRecommendations

Demande

aws resiliencehub list-alarm-recommendations \ --assessment-arn <Assessment_ARN>

Réponse

{ "alarmRecommendations": [ { "recommendationId": "78ece7f8-c776-499e-baa8-b35f5e8b8ba2", "referenceId": "app_common:alarm:synthetic_canary:2021-04-01", "name": "AWSResilienceHub-SyntheticCanaryInRegionAlarm_2021-04-01", "description": "A monitor for the entire application, configured to constantly verify that the application API/endpoints are available", "type": "Metric", "appComponentName": "appcommon", "items": [ { "resourceId": "us-west-2", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false } ], "prerequisite": "Make sure CloudWatch Synthetics is setup to monitor the application (see the <a href=\"https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Synthetics_Canaries.html\" target=\"_blank\">docs</a>). \nMake sure that the Synthetics Name passed in the alarm dimension matches the name of the Synthetic Canary. It Defaults to the name of the application.\n" }, { "recommendationId": "d9c72c58-8c00-43f0-ad5d-0c6e5332b84b", "referenceId": "efs:alarm:percent_io_limit:2020-04-01", "name": "AWSResilienceHub-EFSHighIoAlarm_2020-04-01", "description": "Alarm by AWS ResilienceHub that reports when EFS I/O load is more than 90% for too much time", "type": "Metric", "appComponentName": "storageappcomponent-rlb", "items": [ { "resourceId": "fs-0487f945c02f17b3e", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false } ] }, { "recommendationId": "09f340cd-3427-4f66-8923-7f289d4a3216", "referenceId": "efs:alarm:mount_failure:2020-04-01", "name": "AWSResilienceHub-EFSMountFailureAlarm_2020-04-01", "description": "Alarm by AWS ResilienceHub that reports when volume failed to mount to EC2 instance", "type": "Metric", "appComponentName": "storageappcomponent-rlb", "items": [ { "resourceId": "fs-0487f945c02f17b3e", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false } ], "prerequisite": "* Make sure Amazon EFS utils are installed(see the <a href=\"https://github.com/aws/efs-utils#installation\" target=\"_blank\">docs</a>).\n* Make sure cloudwatch logs are enabled in efs-utils (see the <a href=\"https://github.com/aws/efs-utils#step-2-enable-cloudwatch-log-feature-in-efs-utils-config-file-etcamazonefsefs-utilsconf\" target=\"_blank\">docs</a>).\n* Make sure that you've configured `log_group_name` in `/etc/amazon/efs/efs-utils.conf`, for example: `log_group_name = /aws/efs/utils`.\n* Use the created `log_group_name` in the generated alarm. Find `LogGroupName: REPLACE_ME` in the alarm and make sure the `log_group_name` is used instead of REPLACE_ME.\n" }, { "recommendationId": "b0f57d2a-1220-4f40-a585-6dab1e79cee2", "referenceId": "efs:alarm:client_connections:2020-04-01", "name": "AWSResilienceHub-EFSHighClientConnectionsAlarm_2020-04-01", "description": "Alarm by AWS ResilienceHub that reports when client connection number deviation is over the specified threshold", "type": "Metric", "appComponentName": "storageappcomponent-rlb", "items": [ { "resourceId": "fs-0487f945c02f17b3e", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false } ] }, { "recommendationId": "15f49b10-9bac-4494-b376-705f8da252d7", "referenceId": "rds:alarm:health-storage:2020-04-01", "name": "AWSResilienceHub-RDSInstanceLowStorageAlarm_2020-04-01", "description": "Reports when database free storage is low", "type": "Metric", "appComponentName": "databaseappcomponent-hji", "items": [ { "resourceId": "terraform-20220623141426115800000001", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false } ] }, { "recommendationId": "c1906101-cea8-4f77-be7b-60abb07621f5", "referenceId": "rds:alarm:health-connections:2020-04-01", "name": "AWSResilienceHub-RDSInstanceConnectionSpikeAlarm_2020-04-01", "description": "Reports when database connection count is anomalous", "type": "Metric", "appComponentName": "databaseappcomponent-hji", "items": [ { "resourceId": "terraform-20220623141426115800000001", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false } ] }, { "recommendationId": "f169b8d4-45c1-4238-95d1-ecdd8d5153fe", "referenceId": "rds:alarm:health-cpu:2020-04-01", "name": "AWSResilienceHub-RDSInstanceOverUtilizedCpuAlarm_2020-04-01", "description": "Reports when database used CPU is high", "type": "Metric", "appComponentName": "databaseappcomponent-hji", "items": [ { "resourceId": "terraform-20220623141426115800000001", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false } ] }, { "recommendationId": "69da8459-cbe4-4ba1-a476-80c7ebf096f0", "referenceId": "rds:alarm:health-memory:2020-04-01", "name": "AWSResilienceHub-RDSInstanceLowMemoryAlarm_2020-04-01", "description": "Reports when database free memory is low", "type": "Metric", "appComponentName": "databaseappcomponent-hji", "items": [ { "resourceId": "terraform-20220623141426115800000001", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false } ] }, { "recommendationId": "67e7902a-f658-439e-916b-251a57b97c8a", "referenceId": "ecs:alarm:health-service_cpu_utilization:2020-04-01", "name": "AWSResilienceHub-ECSServiceHighCpuUtilizationAlarm_2020-04-01", "description": "Alarm by AWS ResilienceHub that triggers when CPU utilization of ECS tasks of Service exceeds the threshold", "type": "Metric", "appComponentName": "computeappcomponent-nrz", "items": [ { "resourceId": "aws_ecs_service_terraform-us-east-1-demo", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false } ] }, { "recommendationId": "fb30cb91-1f09-4abd-bd2e-9e8ee8550eb0", "referenceId": "ecs:alarm:health-service_memory_utilization:2020-04-01", "name": "AWSResilienceHub-ECSServiceHighMemoryUtilizationAlarm_2020-04-01", "description": "Alarm by AWS ResilienceHub for Amazon ECS that indicates if the percentage of memory that is used in the service, is exceeding specified threshold limit", "type": "Metric", "appComponentName": "computeappcomponent-nrz", "items": [ { "resourceId": "aws_ecs_service_terraform-us-east-1-demo", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false } ] }, { "recommendationId": "1bd45a8e-dd58-4a8e-a628-bdbee234efed", "referenceId": "ecs:alarm:health-service_sample_count:2020-04-01", "name": "AWSResilienceHub-ECSServiceSampleCountAlarm_2020-04-01", "description": "Alarm by AWS Resilience Hub for Amazon ECS that triggers if the count of tasks isn't equal Service Desired Count", "type": "Metric", "appComponentName": "computeappcomponent-nrz", "items": [ { "resourceId": "aws_ecs_service_terraform-us-east-1-demo", "targetAccountId": "12345678901", "targetRegion": "us-west-2", "alreadyImplemented": false } ], "prerequisite": "Make sure the Container Insights on Amazon ECS is enabled: (see the <a href=\"https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/deploy-container-insights-ECS-cluster.html\" target=\"_blank\">docs</a>)." } ] }

L'exemple suivant montre comment obtenir les recommandations de configuration (recommandations sur la manière d'améliorer votre résilience actuelle) à l'aide de l'ListAppComponentRecommendationsAPI.

Demande

aws resiliencehub list-app-component-recommendations \ --assessment-arn <Assessment_ARN>

Réponse

{ "componentRecommendations": [ { "appComponentName": "computeappcomponent-nrz", "recommendationStatus": "MetCanImprove", "configRecommendations": [ { "cost": { "amount": 0.0, "currency": "USD", "frequency": "Monthly" }, "appComponentName": "computeappcomponent-nrz", "recommendationCompliance": { "AZ": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": " Estimated time to restore cluster with volumes. (Estimate is based on averages, real time restore may vary).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Based on the frequency of the backups" }, "Hardware": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": " Estimated time to restore cluster with volumes. (Estimate is based on averages, real time restore may vary).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Based on the frequency of the backups" }, "Software": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": " Estimated time to restore cluster with volumes. (Estimate is based on averages, real time restore may vary).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Based on the frequency of the backups" } }, "optimizationType": "LeastCost", "description": "Current Configuration", "suggestedChanges": [], "haArchitecture": "BackupAndRestore", "referenceId": "original" }, { "cost": { "amount": 0.0, "currency": "USD", "frequency": "Monthly" }, "appComponentName": "computeappcomponent-nrz", "recommendationCompliance": { "AZ": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": " Estimated time to restore cluster with volumes. (Estimate is based on averages, real time restore may vary).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Based on the frequency of the backups" }, "Hardware": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": " Estimated time to restore cluster with volumes. (Estimate is based on averages, real time restore may vary).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Based on the frequency of the backups" }, "Software": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": " Estimated time to restore cluster with volumes. (Estimate is based on averages, real time restore may vary).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Based on the frequency of the backups" } }, "optimizationType": "LeastChange", "description": "Current Configuration", "suggestedChanges": [], "haArchitecture": "BackupAndRestore", "referenceId": "original" }, { "cost": { "amount": 14.74, "currency": "USD", "frequency": "Monthly" }, "appComponentName": "computeappcomponent-nrz", "recommendationCompliance": { "AZ": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 0, "expectedRtoDescription": "No expected downtime. You're launching using EC2, with DesiredCount > 1 in multiple AZs and CapacityProviders with MinSize > 1", "expectedRpoInSecs": 0, "expectedRpoDescription": "ECS Service state is saved on EFS file system. No data loss is expected as objects are be stored in multiple AZs." }, "Hardware": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 0, "expectedRtoDescription": "No expected downtime. You're launching using EC2, with DesiredCount > 1 and CapacityProviders with MinSize > 1", "expectedRpoInSecs": 0, "expectedRpoDescription": "ECS Service state is saved on EFS file system. No data loss is expected as objects are be stored in multiple AZs." }, "Software": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": " Estimated time to restore cluster with volumes. (Estimate is based on averages, real time restore may vary).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Based on the frequency of the backups" } }, "optimizationType": "BestAZRecovery", "description": "Stateful ECS service with launch type EC2 and EFS storage, deployed in multiple AZs. AWS Backup is used to backup EFS and copy snapshots in-region.", "suggestedChanges": [ "Add Auto Scaling Groups and Capacity Providers in multiple AZs", "Change desired count of the setup", "Remove EBS volume" ], "haArchitecture": "BackupAndRestore", "referenceId": "ecs:config:ec2-multi_az-efs-backups:2022-02-16" } ] }, { "appComponentName": "databaseappcomponent-hji", "recommendationStatus": "MetCanImprove", "configRecommendations": [ { "cost": { "amount": 0.0, "currency": "USD", "frequency": "Monthly" }, "appComponentName": "databaseappcomponent-hji", "recommendationCompliance": { "AZ": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": "Estimated time to restore from an RDS backup. (Estimates are averages based on size, real time may vary greatly from estimate).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Estimate based on the backup schedule. (Estimates are calculated from backup schedule, real time restore may vary)." }, "Hardware": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": "Estimated time to restore from snapshot. (Estimates are averages based on size, real time may vary greatly from estimate).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Estimate based on the backup schedule. (Estimates are calculated from backup schedule, real time restore may vary)." }, "Software": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": "Estimated time to restore from snapshot. (Estimates are averages based on size, real time may vary greatly from estimate).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Estimate based on the backup schedule. (Estimates are calculated from backup schedule, real time restore may vary)." } }, "optimizationType": "LeastCost", "description": "Current Configuration", "suggestedChanges": [], "haArchitecture": "BackupAndRestore", "referenceId": "original" }, { "cost": { "amount": 0.0, "currency": "USD", "frequency": "Monthly" }, "appComponentName": "databaseappcomponent-hji", "recommendationCompliance": { "AZ": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": "Estimated time to restore from an RDS backup. (Estimates are averages based on size, real time may vary greatly from estimate).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Estimate based on the backup schedule. (Estimates are calculated from backup schedule, real time restore may vary)." }, "Hardware": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": "Estimated time to restore from snapshot. (Estimates are averages based on size, real time may vary greatly from estimate).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Estimate based on the backup schedule. (Estimates are calculated from backup schedule, real time restore may vary)." }, "Software": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 1800, "expectedRtoDescription": "Estimated time to restore from snapshot. (Estimates are averages based on size, real time may vary greatly from estimate).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Estimate based on the backup schedule. (Estimates are calculated from backup schedule, real time restore may vary)." } }, "optimizationType": "LeastChange", "description": "Current Configuration", "suggestedChanges": [], "haArchitecture": "BackupAndRestore", "referenceId": "original" }, { "cost": { "amount": 76.73, "currency": "USD", "frequency": "Monthly" }, "appComponentName": "databaseappcomponent-hji", "recommendationCompliance": { "AZ": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 120, "expectedRtoDescription": "Estimated time to promote a secondary instance.", "expectedRpoInSecs": 0, "expectedRpoDescription": "Aurora data is automatically replicated across multiple Availability Zones in a Region." }, "Hardware": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 120, "expectedRtoDescription": "Estimated time to promote a secondary instance.", "expectedRpoInSecs": 0, "expectedRpoDescription": "Aurora data is automatically replicated across multiple Availability Zones in a Region." }, "Software": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 900, "expectedRtoDescription": "Estimate time to backtrack to a stable state.", "expectedRpoInSecs": 300, "expectedRpoDescription": "Estimate for latest restorable time for point in time recovery." } }, "optimizationType": "BestAZRecovery", "description": "Aurora database cluster with one read replica, with backtracking window of 24 hours.", "suggestedChanges": [ "Add read replica in the same region", "Change DB instance to a supported class (db.t3.small)", "Change to Aurora", "Enable cluster backtracking", "Enable instance backup with retention period 7" ], "haArchitecture": "WarmStandby", "referenceId": "rds:config:aurora-backtracking" } ] }, { "appComponentName": "storageappcomponent-rlb", "recommendationStatus": "BreachedUnattainable", "configRecommendations": [ { "cost": { "amount": 0.0, "currency": "USD", "frequency": "Monthly" }, "appComponentName": "storageappcomponent-rlb", "recommendationCompliance": { "AZ": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 0, "expectedRtoDescription": "No data loss in your system", "expectedRpoInSecs": 0, "expectedRpoDescription": "No data loss in your system" }, "Hardware": { "expectedComplianceStatus": "PolicyBreached", "expectedRtoInSecs": 2592001, "expectedRtoDescription": "No recovery option configured", "expectedRpoInSecs": 2592001, "expectedRpoDescription": "No recovery option configured" }, "Software": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 900, "expectedRtoDescription": "Time to recover EFS from backup. (Estimate is based on averages, real time restore may vary).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Recovery Point Objective for EFS from backups, derived from backup frequency" } }, "optimizationType": "BestAZRecovery", "description": "EFS with backups configured", "suggestedChanges": [ "Add additional availability zone" ], "haArchitecture": "MultiSite", "referenceId": "efs:config:with_backups:2020-04-01" }, { "cost": { "amount": 0.0, "currency": "USD", "frequency": "Monthly" }, "appComponentName": "storageappcomponent-rlb", "recommendationCompliance": { "AZ": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 0, "expectedRtoDescription": "No data loss in your system", "expectedRpoInSecs": 0, "expectedRpoDescription": "No data loss in your system" }, "Hardware": { "expectedComplianceStatus": "PolicyBreached", "expectedRtoInSecs": 2592001, "expectedRtoDescription": "No recovery option configured", "expectedRpoInSecs": 2592001, "expectedRpoDescription": "No recovery option configured" }, "Software": { "expectedComplianceStatus": "PolicyMet", "expectedRtoInSecs": 900, "expectedRtoDescription": "Time to recover EFS from backup. (Estimate is based on averages, real time restore may vary).", "expectedRpoInSecs": 86400, "expectedRpoDescription": "Recovery Point Objective for EFS from backups, derived from backup frequency" } }, "optimizationType": "BestAttainable", "description": "EFS with backups configured", "suggestedChanges": [ "Add additional availability zone" ], "haArchitecture": "MultiSite", "referenceId": "efs:config:with_backups:2020-04-01" } ] } ] }