Overview of global tables - AWS Prescriptive Guidance

Overview of global tables

Key facts

  • There are two versions of global tables: version 2017.11.29 (legacy) (sometimes called v1) and version 2019.11.21 (current) (sometimes called v2). This guide focuses exclusively on the current version.

  • DynamoDB (without global tables) is a Regional service, which means that it is highly available and intrinsically resilient to failures of infrastructure, including the failure of an entire Availability Zone. A single-Region DynamoDB table is designed for 99.99% availability. For more information, see the DynamoDB service-level agreement (SLA).

  • A DynamoDB global table replicates its data between two or more Regions. A multi-Region DynamoDB table is designed for 99.999% availability. With proper planning, global tables can help create an architecture that is resilient to Regional failures.

  • Global tables employ an active-active replication model. From the perspective of DynamoDB, the table in each Region has equal standing to accept read and write requests. After receiving a write request, the local replica table replicates the write operation to other participating remote Regions in the background.

  • Items are replicated individually. Items that are updated within a single transaction might not be replicated together.

  • Each table partition in the source Region replicates its write operations in parallel with every other partition. The sequence of write operations within a remote Region might not match the sequence of write operations that happened within the source Region. For more information about table partitions, see the blog post Scaling DynamoDB: How partitions, hot keys, and split for heat impact performance.

  • A newly written item is usually propagated to all replica tables within a second. Replication between nearby Regions tends to be faster.

  • Amazon CloudWatch provides a ReplicationLatency metric for each Region pair. It is calculated by looking at arriving items, comparing their arrival time with their initial write time, and computing an average. Timings are stored within CloudWatch in the source Region. Viewing the average and maximum timings can be useful for determining the average and worst-case replication lag. There is no SLA on this latency.

  • If an individual item is updated at about the same time (within this ReplicationLatency window) in two different Regions, and the second write operation happens before the first write operation has been replicated, there’s a potential for write conflicts. Global tables resolve such conflicts by using a last writer wins mechanism, based on the timestamp of the write operations. The first operation “loses” to the second operation. These conflicts aren’t recorded in CloudWatch or AWS CloudTrail.

  • Each item has a last write timestamp held as a private system property. The last writer wins approach is implemented by using a conditional write operation that requires the incoming item’s timestamp to be greater than the existing item’s timestamp.

  • A global table replicates all items to all participating Regions. If you want to have different replication scopes, you can create multiple global tables and assign each table different participating Regions.

  • The local Region accepts write operations even if the replica Region is offline or ReplicationLatency grows. The local table continues to attempt replicating items to the remote table until each item succeeds.

  • In the unlikely event that a Region goes fully offline, when it comes back online later, all pending outbound and inbound replications will be retried. No special action is required to bring the tables back in sync. The last writer wins mechanism ensures that the data eventually becomes consistent.

  • You can add a new Region to a DynamoDB table at any time. DynamoDB handles the initial sync and ongoing replication. You can also remove a Region (even the original Region), and this will delete the local table in that Region.

  • DynamoDB does not have a global endpoint. All requests are made to a Regional endpoint that accesses the global table instance that’s local to that Region.

  • Calls to DynamoDB should not go across Regions. The best practice is for an application that is homed to one Region to directly access only the local DynamoDB endpoint for its Region. If problems are detected within a Region (in the DynamoDB layer or in the surrounding stack), end user traffic should be routed to a different application endpoint that’s hosted in a different Region. Global tables ensure that the application homed in every Region has access to the same data.
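The last writer wins behavior described in the key facts above can be illustrated with a small in-memory simulation. This is only a sketch of the idea, not how DynamoDB is implemented: the Replica and Item classes, the last_write_ts field, the item key, and the timestamps are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    value: str
    last_write_ts: float  # hypothetical stand-in for the private system timestamp


class Replica:
    """Toy in-memory replica table for one Region (not the DynamoDB API)."""

    def __init__(self, region):
        self.region = region
        self.items = {}

    def local_write(self, key, value, ts):
        # A local write is always accepted by the Region that receives it.
        item = Item(value, ts)
        self.items[key] = item
        return item

    def apply_replicated(self, key, incoming):
        # Conditional write: accept the replicated item only if its timestamp
        # is greater than the timestamp of the item already stored.
        existing = self.items.get(key)
        if existing is None or incoming.last_write_ts > existing.last_write_ts:
            self.items[key] = incoming
            return True
        return False


us = Replica("us-east-1")
eu = Replica("eu-west-1")

# Two writes to the same item land in different Regions before either
# write has been replicated to the other Region.
first = us.local_write("order#1", "pending", ts=100.0)
second = eu.local_write("order#1", "shipped", ts=100.2)

# Replication then delivers each write to the other Region.
eu_accepted = eu.apply_replicated("order#1", first)   # older write is rejected
us_accepted = us.apply_replicated("order#1", second)  # newer write is applied

print(us.items["order#1"].value, eu.items["order#1"].value)
```

In this sketch both replicas converge on the value from the later write, and the earlier write is dropped without any error being raised, which mirrors why such conflicts do not appear in CloudWatch or CloudTrail.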

Use cases

Global tables provide these common benefits:

  • Lower-latency read operations. You can place a copy of the data closer to the end user to reduce network latency during read operations. The local copy trails the most recent writes in other Regions by approximately the ReplicationLatency value.

  • Lower-latency write operations. An end user can write to a nearby Region to reduce network latency and the time to complete the write operation. The write traffic must be carefully routed to ensure that there are no conflicts. Techniques for routing are discussed in a later section.

  • Increased resiliency and disaster recovery. If a Region has degraded performance or a full outage, you can evacuate it (move away some or all requests going to that Region) and meet a recovery point objective (RPO) and recovery time objective (RTO) measured in seconds. Using global tables also increases the DynamoDB SLA for monthly uptime percentage from 99.99% to 99.999%.

  • Seamless Region migration. You can add a new Region and then delete the old Region to migrate a deployment from one Region to another, without any downtime at the data layer.

For example, Fidelity Investments presented at re:Invent 2022 on how they use DynamoDB global tables for their Order Management System. Their goal was to achieve reliably low-latency processing at a scale they couldn’t attain with on-premises processing while also maintaining resilience to Availability Zone and Regional failures.
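The Region migration use case listed above (add a new Region, then remove the old one) can be sketched as two UpdateTable requests for a version 2019.11.21 global table. The helper functions below only build the request parameters; the table name "Orders" and the Region names are hypothetical, and the boto3 call shown in the comment is an assumption about how you might submit the requests, not a complete migration procedure.

```python
def add_replica_request(table_name, region):
    """Build UpdateTable parameters that add a replica in `region`."""
    return {
        "TableName": table_name,
        "ReplicaUpdates": [{"Create": {"RegionName": region}}],
    }


def remove_replica_request(table_name, region):
    """Build UpdateTable parameters that remove the replica in `region`."""
    return {
        "TableName": table_name,
        "ReplicaUpdates": [{"Delete": {"RegionName": region}}],
    }


def migration_requests(table_name, old_region, new_region):
    """Return the two requests, in order, for a Region-to-Region migration."""
    return [
        add_replica_request(table_name, new_region),
        remove_replica_request(table_name, old_region),
    ]


# Hypothetical table and Regions, for illustration only.
steps = migration_requests("Orders", "us-east-1", "us-west-2")
for step in steps:
    print(step)

# Assumed usage with boto3 (not executed here):
#   client = boto3.client("dynamodb", region_name="us-east-1")
#   client.update_table(**steps[0])
#   ...wait until the new replica is ACTIVE and application traffic has moved...
#   client.update_table(**steps[1])
```

Keeping each request to a single replica action makes the order explicit: the old replica should be removed only after the new replica has finished its initial sync and the application has been repointed to the new Regional endpoint.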