Engineering and operating for resilience in a single Region
Before you dive into multi-Region concepts, confirm that your workload is already as resilient as possible in a single Region. To do so, evaluate your workload against the Reliability pillar and the Operational Excellence pillar of the AWS Well-Architected Framework, and make any necessary changes based on your trade-offs and risk assessment. The following concepts are covered in the AWS Well-Architected Framework:
To take single-Region resilience further, review and apply the concepts that are discussed in the paper Advanced Multi-AZ Resilience Patterns: Detecting and Mitigating Gray Failures. This paper provides best practices for using replicas in each Availability Zone to contain failures, and it expands on the multi-AZ concepts that are introduced in the AWS Well-Architected Framework. Although a multi-Region architecture can mitigate failure modes that are bound to Availability Zones, a multi-Region approach comes with trade-offs that you should consider. That is why we recommend that you start with a multi-AZ approach, and then evaluate each specific workload against the fundamentals for multi-Region architectures to determine whether a multi-Region approach can increase that workload's resilience.