REL01-BP02 Manage service quotas across accounts and regions - Reliability Pillar

If you are using multiple accounts or Regions, request the appropriate quotas in all environments in which your production workloads run.

Desired outcome: Services and applications should not be affected by service quota exhaustion for configurations that span accounts or Regions or that have resilience designs using zone, Region, or account failover.

Common anti-patterns:

  • Allowing resource usage in one isolation Region to grow with no mechanism to maintain capacity in the others.

  • Manually setting all quotas independently in isolation Regions.

  • Not considering the effect of resiliency architectures (such as active/passive) on future quota needs during a degradation, when traffic shifts to the non-primary Region.

  • Not evaluating quotas regularly and making necessary changes in every Region and account where the workload runs.

  • Not leveraging quota request templates to request increases across multiple Regions and accounts.

  • Not updating service quotas due to incorrectly thinking that increasing quotas has cost implications like compute reservation requests.

Benefits of establishing this best practice: You can verify that secondary Regions or accounts can handle your current load if Regional services become unavailable. This can help reduce the number of errors or the level of degradation that occurs during the loss of a Region.

Level of risk exposed if this best practice is not established: High

Implementation guidance

Service quotas are tracked per account. Unless otherwise noted, each quota is AWS Region-specific. In addition to the production environments, also manage quotas in all applicable non-production environments so that testing and development are not hindered. Maintaining a high degree of resiliency requires that service quotas are assessed continually (whether automated or manual).

With more workloads spanning Regions through designs that use Active/Active, Active/Passive-Hot, Active/Passive-Cold, and Active/Passive-Pilot Light approaches, it is essential to understand quota levels in every Region and account. Past traffic patterns are not always a good indicator of whether a service quota is set correctly.

Equally important, the value of a given named service quota is not always the same in every Region. In one Region, the value could be five, and in another Region the value could be ten. Management of these quotas must span all the same services, accounts, and Regions to provide consistent resilience under load.

Reconcile all the service quota differences across Regions (active or passive) and create processes to continually reconcile these differences. Test plans for passive Region failover are rarely scaled to peak active capacity, so game day or tabletop exercises can fail to reveal differences in service quotas between Regions, and the correct limits then go unmaintained.

Service quota drift, the condition where the limit for a specific named quota is changed in one Region but not in all Regions, is very important to track and assess. Consider changing the quota in every Region that carries traffic or could potentially carry traffic.
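As a rough illustration of why a lightly loaded test can miss a quota gap, the sketch below compares a passive Region's quota with the load it would need to carry after a failover. The `RegionQuota` type, the `failover_shortfall` function, and all numbers are hypothetical and are not part of any AWS API; it also assumes the quota is measured in the same unit as usage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionQuota:
    """Hypothetical record of one named quota in one Region."""
    region: str
    quota_value: float  # applied quota value in this Region
    peak_usage: float   # observed peak usage against that quota


def failover_shortfall(active: RegionQuota, passive: RegionQuota) -> float:
    """Return how far the passive Region's quota falls short of carrying
    its own peak usage plus the active Region's peak usage.

    Returns 0.0 when the passive Region has enough quota.
    """
    required = active.peak_usage + passive.peak_usage
    return max(0.0, required - passive.quota_value)


# Illustrative values only: the passive Region looks healthy at its own
# low usage, but cannot absorb the active Region's peak.
active = RegionQuota("us-east-1", quota_value=100.0, peak_usage=80.0)
passive = RegionQuota("us-west-2", quota_value=50.0, peak_usage=5.0)

shortfall = failover_shortfall(active, passive)
if shortfall > 0:
    print(f"{passive.region} needs {shortfall} more quota to absorb failover")
```

A check like this only covers quotas that scale with load; quotas that cap configuration items (for example, a count of resources per Region) would need a comparison against the configuration you intend to deploy instead.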

  • Select relevant accounts and Regions based on your service requirements, latency, regulatory, and disaster recovery (DR) requirements.

  • Identify service quotas across all relevant accounts, Regions, and Availability Zones. The limits are scoped to account and Region. These values should be compared for differences.
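The comparison described above can be sketched as a small drift check. The function names and the `L-EXAMPLE…` quota codes are hypothetical. `fetch_quotas` assumes the boto3 `service-quotas` client and its `list_service_quotas` paginator, and it is not called here, so the comparison logic runs without AWS credentials.

```python
from typing import Dict, Optional

QuotaMap = Dict[str, float]  # quota code -> applied value


def fetch_quotas(service_code: str, region: str) -> QuotaMap:
    """Fetch applied quota values for one service in one Region.

    Assumes boto3 is installed and credentials are configured.
    """
    import boto3  # imported lazily so the rest of the sketch has no dependency

    client = boto3.client("service-quotas", region_name=region)
    paginator = client.get_paginator("list_service_quotas")
    quotas: QuotaMap = {}
    for page in paginator.paginate(ServiceCode=service_code):
        for quota in page["Quotas"]:
            quotas[quota["QuotaCode"]] = quota["Value"]
    return quotas


def find_quota_drift(
    quotas_by_region: Dict[str, QuotaMap],
) -> Dict[str, Dict[str, Optional[float]]]:
    """Return quota codes whose value differs between Regions.

    A quota that is missing in a Region is reported with the value None.
    """
    all_codes = set().union(*(q.keys() for q in quotas_by_region.values()))
    drift: Dict[str, Dict[str, Optional[float]]] = {}
    for code in sorted(all_codes):
        values = {region: q.get(code) for region, q in quotas_by_region.items()}
        if len(set(values.values())) > 1:
            drift[code] = values
    return drift


# Illustrative data: one quota is five in one Region and ten in another.
sample = {
    "us-east-1": {"L-EXAMPLE1": 5.0, "L-EXAMPLE2": 20.0},
    "us-west-2": {"L-EXAMPLE1": 10.0, "L-EXAMPLE2": 20.0},
}
for code, values in find_quota_drift(sample).items():
    print(code, values)
```

To extend the same comparison across accounts, one option is to key the outer dictionary by an account and Region pair, with credentials for each account supplied separately.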

Implementation steps

  • Review Service Quotas values that might have exceeded a risk level of usage. AWS Trusted Advisor provides alerts for 80% and 90% threshold breaches.

  • Review values for service quotas in any Passive Regions (in an Active/Passive design). Verify that load will successfully run in secondary Regions in the event of a failure in the primary Region.

  • Automate assessing if any service quota drift has occurred between Regions in the same account and act accordingly to change the limits.

  • If your organizational units (OUs) are structured in a supported manner, update service quota templates to reflect changes in any quotas that should be applied to multiple Regions and accounts.

    • Create a template and associate Regions to the quota change.

    • Review all existing service quota templates for any changes required (Region, limits, and accounts).
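The usage review in the first step above can be approximated with a simple utilization check. This is a minimal sketch that reuses the 80% and 90% thresholds mentioned for AWS Trusted Advisor; the function name, the level labels, and the sample numbers are hypothetical, and the usage figures are assumed to come from your own monitoring.

```python
def utilization_level(usage: float, quota: float,
                      warn: float = 0.80, critical: float = 0.90) -> str:
    """Classify usage against a quota as "ok", "warning", or "critical"."""
    if quota <= 0:
        raise ValueError("quota must be positive")
    ratio = usage / quota
    if ratio >= critical:
        return "critical"
    if ratio >= warn:
        return "warning"
    return "ok"


# Illustrative (Region, usage, quota) readings for one named quota.
readings = [
    ("us-east-1", 95.0, 100.0),
    ("us-west-2", 85.0, 100.0),
    ("eu-west-1", 10.0, 100.0),
]
for region, usage, quota in readings:
    print(region, utilization_level(usage, quota))
```

Running the same check in every Region and account where the workload runs, including passive Regions, keeps the review consistent with the steps above.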
