CAP theorem - Availability and Beyond: Understanding and Improving the Resilience of Distributed Systems on AWS

CAP theorem

Another way that we might think about availability is in relation to the CAP theorem. The theorem states that a distributed system, one made up of multiple nodes storing data, cannot simultaneously provide more than two out of the following three guarantees:

  • Consistency: Every read request receives the most recent write or an error when consistency can’t be guaranteed.

  • Availability: Every request receives a non-error response, even when nodes are down or unavailable.

  • Partition tolerance: The system continues to operate despite the loss of an arbitrary number of messages between nodes.

(For more details, see Seth Gilbert and Nancy Lynch, Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services, ACM SIGACT News, Volume 33 Issue 2 (2002), pg. 51–59.)

Most distributed systems have to tolerate network failures, and thus, network partitioning has to be allowed. This means that these workloads have to make a choice between consistency and availability when a network partition occurs. If the workload chooses availability, then it always returns a response, but with potentially inconsistent data. If it chooses consistency, then during a network partition it would return an error since the workload can’t be sure about the consistency of the data.

For workloads whose goal it is to provide higher levels of availability, they might choose Availability and Partition tolerance (AP) to prevent returning errors (being unavailable) during a network partition. This results in requiring a more relaxed consistency model, like eventual consistency or monotonic consistency.