A Study Guide to Confusingly Named Consistency Guarantees
There are a huge number of terms used when describing consistency guarantees in distributed systems. What’s the difference between serializable and linearizable? What’s a ‘dirty read’? How do these concepts map between SQL and NoSQL databases? What, in other words, is Kyle Kingsbury actually talking about? I personally find the concepts very confusing. For one thing, everyone uses slightly different terminology for similar or adjacent concepts. Martin Kleppmann helpfully points out that it’s not just me. The study of consistency guarantees grew out of the analysis of early databases, and the terms reflect those systems’ naming conventions, however odd, and their isolation levels, however desirable or practical.
Right now I work mostly with messaging middleware. It’s surprising how many conversations I’ve had with people who implicitly assume their messaging solutions offer transactional guarantees. Professionally, I feel like it’s important to get these concepts straight in my own head to be clear when helping others. So here is a quick and dirty study guide of confusingly named consistency models.
ACID transactions
Transactions are the units of work of relational database management systems (RDBMS): sets of read and write operations ending in a commit which saves any changes. Transactions are said to be Atomic, Consistent, Isolated, and Durable.
Durability: Starting last, durability is a reference to when data was saved on actual tape. Durability can be approximated to ‘will persist after a shutdown or restart’. In modern databases, replication can also be considered a guarantee of durability.
Atomicity: Transactions are all or nothing. They can have many steps involving a series of reads and writes to different tables in the database before committing. Atomicity guarantees abortability by “manipulating multiple objects as one”. If a transaction fails or aborts, none of the intermediate operations will be persisted.
Consistency: ACID consistency refers more to an accounting guarantee than consistent state. The condition requires that the integrity of the system should be maintained after a state change. For instance, after a bank transfer from one account to another you should expect that the net amount in the two accounts stays the same. Seeing the amount arrive in one account before it appears to have been removed from the other (or vice versa) would break consistency.
Isolation: The effect of all the transactions is as if they occurred serially (one after the other)… hence the term serializability. Concurrent operations are never visible to the transaction. Instead, transactions are interleaved as if all of their operations started after the ‘previous’ transaction’s operations finished.
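The atomicity and consistency conditions above can be illustrated with a toy in-memory ‘database’. This is a minimal sketch, not how a real RDBMS is implemented, and all names (`transfer`, `InsufficientFunds`) are made up for illustration:

```python
# Toy illustration of atomicity and ACID consistency: a transfer either
# commits both legs or rolls back, so the invariant "total money across
# accounts is constant" is preserved.

class InsufficientFunds(Exception):
    pass

def transfer(accounts, src, dst, amount):
    snapshot = dict(accounts)          # remember state for rollback
    try:
        accounts[src] -= amount
        if accounts[src] < 0:
            raise InsufficientFunds(src)
        accounts[dst] += amount        # 'commit': both legs applied together
    except Exception:
        accounts.clear()
        accounts.update(snapshot)      # abort: none of the steps persist
        raise

accounts = {"alice": 100, "bob": 50}
transfer(accounts, "alice", "bob", 30)
assert accounts == {"alice": 70, "bob": 80}

try:
    transfer(accounts, "alice", "bob", 1000)   # fails part-way through
except InsufficientFunds:
    pass
assert sum(accounts.values()) == 150   # invariant still holds after the abort
```

The rollback-on-failure is the atomicity guarantee; the preserved invariant is the (accounting sense of) consistency.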
Strict Serializability
Serializability relates to the effect of “groups of one or more operations over one or more objects”. In a distributed system there can be different levels of serializability. For instance, consider a database which commits transactions T1 and T2 serially (in that order). The database is backed up asynchronously to a read replica. There are no checks in the replica on order of receipt, T2 arrives before T1, and the transactions are committed in reverse order. Both the database and its replica have committed their transactions in a locally consistent serial ordering, but the replica’s ordering is obviously wrong. “One copy serializability” refers to all replicas of a system acting as if there were only one copy of the data (i.e. the serial order of operations is the same regardless of which replica is contacted).
One copy serializable systems can still present anomalies due to clock skew. For instance, a (committed) write with timestamp t=2 to a data object is ignored because an earlier (committed) write’s timestamp is t=3, regardless of the way the user thinks time is supposed to work. It’s also possible for all writes to be overridden by an immortal write, where the system decides one commit is serially after any other future transaction. A system is said to be linearizable if such time-travel anomalies are not possible:
“Linearizability is a guarantee about single operations on single objects. It provides a real-time (i.e. wall clock) guarantee on the behaviour of a set of single operations (often reads and writes) on a single object (e.g., distributed register or data item). […] Linearizability […] is synonymous with atomic consistency [ (the C in CAP) ]. […] We say linearizability is composable (or “local”) because, if operations on each object in a system are linearizable, then all operations in the system are linearizable.” - Peter Bailis
Strict (or Strong) Serializability refers to systems which are both (one-copy) serializable and linearizable.
There are two main ways of ensuring strict serializability:
- 2-Phase Locking: When data is read, the transaction acquires a shared lock which stops all writes to that data.
- Other transactions can still read (by acquiring their own shared locks on) the same data. A general query can lock the entire database until it is complete.
- Writes take an exclusive lock which stops any other reads (via shared locks) or writes (via other exclusive locks) while it is held.
- In 2-phase locking, all the locks a transaction needs are acquired (phase 1) before any of them are released (phase 2).
- There are several different flavours of 2-phase locking with different rules around when a transaction’s locks can be taken and released within each phase (e.g. in strong strict 2-phase locking, no locks are released until after commit).
- H-Store (2007, commercialised as VoltDB): Take all requests and force them to run serially. Effectively make interaction with the DB single-threaded. Requires very fast transactions:
- Data is held in memory (disk reads are too slow)
- Stored procedures are used so each transaction is transmitted in one request
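The lock-compatibility rules of 2-phase locking can be sketched in a few lines. This is a deliberately single-threaded toy (no blocking, no deadlock detection; a conflict simply raises), with hypothetical names, modelling strong strict 2PL where every lock is released together at commit:

```python
# Minimal sketch of strong strict 2-phase locking lock compatibility:
# shared locks coexist; an exclusive lock excludes everything else;
# all of a transaction's locks are released together at commit.

class LockConflict(Exception):
    pass

class LockTable:
    def __init__(self):
        self.shared = {}      # key -> set of txn ids holding shared locks
        self.exclusive = {}   # key -> txn id holding the exclusive lock

    def acquire_shared(self, txn, key):
        owner = self.exclusive.get(key)
        if owner is not None and owner != txn:
            raise LockConflict(f"{key} is write-locked by T{owner}")
        self.shared.setdefault(key, set()).add(txn)

    def acquire_exclusive(self, txn, key):
        readers = self.shared.get(key, set()) - {txn}
        owner = self.exclusive.get(key)
        if readers or (owner is not None and owner != txn):
            raise LockConflict(f"{key} is locked by another transaction")
        self.exclusive[key] = txn

    def release_all(self, txn):   # shrinking phase: drop everything at once
        for holders in self.shared.values():
            holders.discard(txn)
        self.exclusive = {k: v for k, v in self.exclusive.items() if v != txn}

locks = LockTable()
locks.acquire_shared(1, "x")      # T1 reads x
locks.acquire_shared(2, "x")      # shared locks are compatible
try:
    locks.acquire_exclusive(2, "x")   # T2 cannot write while T1 holds a read lock
except LockConflict:
    pass
locks.release_all(1)              # T1 commits and releases
locks.acquire_exclusive(2, "x")   # now T2's write lock succeeds
```

A real lock manager would block (and detect deadlocks) rather than raise, but the compatibility matrix is the same.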
Less-than-strict serializability
Serializability and linearizability require coordination. As a result, a strict serializable system cannot guarantee availability in an asynchronous network (CAP Theorem et al.). In order to achieve better availability guarantees, various systems have been designed with more relaxed subsets of conditions than both serializability and linearizability. Jepsen has an incredibly helpful map of these various consistency models.
Relaxing Serializability
I’ll cover serializability models starting with the least strict.
- Read Uncommitted prevents dirty writes:
- One transaction overwrites another transaction’s uncommitted write to the same data object.
- e.g. T1 writes x=1 -> T2 writes x=4 -> T2 commits -> T1 commits -> x=4
- Read Committed builds on read uncommitted and adds a guarantee against dirty reads:
- One transaction reads writes that another transaction has not yet committed.
- e.g. T1 writes x=1 -> T2 reads x=1 -> T1 commits
- Read committed isolation can fail due to read skew:
- A read transaction concurrent with a write transaction gets an inconsistent state of the data. For instance, one key is read before the write transaction was started and another after.
- e.g. x=1, y=0 -> T1 reads x=1 -> T2 writes x=3 -> T2 writes y=4 -> T2 commits -> T1 reads y=4 -> T1 commits read (x=1, y=4)
- Cursor stability is an extension of read committed where each transaction holds a lock (the “cursor”) on the object it is currently reading, preventing concurrent transactions from modifying it until the cursor moves on.
- Repeatable Read extends cursor stability by holding read locks on every item read until the transaction ends (it does not lock predicates, i.e. WHERE clauses, so phantom rows can still appear).
- Snapshot Isolation guarantees that every transaction acts as if it is the only one being conducted on the database for its duration. The transaction gets a point-in-time ‘snapshot’ of the database.
- In multi-version concurrency control (MVCC) snapshot isolation, a database internally keeps multiple versions of the same data item and will present each transaction with the one relevant to its ‘time-cut’. Practically, this is achieved by taking non-blocking locks whenever there is a read or write. The locks record any conflicts in data that has been written to during a transaction. At the end of the transaction, the lock ledger is analysed to decide if the transaction is safe. The transaction is aborted atomically if it violates serializability. If there is a lot of contention, there will be many aborts.
- Snapshot isolation can fail to be serializable due to write skew:
- Writes based on out of date read state lead to violations of consistency invariants (e.g. ‘at least 1 instance of x must be true at all times’). “By the time the write is committed, the premise of the decision is no longer true” (Kleppmann). If the writes happen to different places/shards in the database, a straightforward lock won’t stop the database from being put in an inconsistent state.
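Write skew is easiest to see in the classic on-call rota example. Below is a minimal sketch (plain dicts standing in for a snapshot-isolated store) where the invariant is ‘at least one doctor on call’ and both transactions check it against their own, now-stale, snapshot:

```python
# Write skew under snapshot isolation: each transaction reads from its own
# consistent snapshot, so neither sees the other's concurrent write, and
# both disjoint writes commit without conflict.

oncall = {"alice": True, "bob": True}   # shared, committed state

# Both transactions take their snapshot at the same moment.
snap_t1 = dict(oncall)
snap_t2 = dict(oncall)

# T1: Alice asks to go off call; the check passes against T1's snapshot.
if sum(snap_t1.values()) >= 2:
    oncall["alice"] = False   # T1 commits

# T2: Bob asks to go off call; the check also passes, against T2's stale snapshot.
if sum(snap_t2.values()) >= 2:
    oncall["bob"] = False     # T2 commits

# "By the time the write is committed, the premise of the decision is no
# longer true": nobody is on call.
assert sum(oncall.values()) == 0
```

Because the two transactions wrote to different objects, no write-write conflict is detected; a serializable system would have aborted one of them.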
Relaxing Linearizability
Causality is important for building meaningful applications. However, there are different levels of causal guarantees available:
- Monotonic reads: Once a process has seen a particular value on a read, no subsequent reads will return an earlier value for the same data. Reads cannot be rewound.
- Monotonic writes: A process’s writes are guaranteed to be persisted in their commit order. If process Pa writes x=1 and (commits and) then writes x=2, all processes reading that data over its history will see them in that order.
- Read-your-writes: A process will always read its most up-to-date writes. This is causal consistency for a single process.
- Session causality: A process that reads a value x=1 from an object and then writes x=2 to it must see its updated write on subsequent reads. In other words, read-your-writes consistency is guaranteed for the duration of a session. Guarantees do not overlap if a session ends.
- Causal Consistency: Processes agree on the order of causally related operations. If Pa confirms a series of writes to Pb, Pb will read the correct value and write history. Pc with no causal relationship to Pa has no such guarantee.
- Sequential consistency: All processes agree on a global order of operations. No process will read data that is inconsistent with that order. However, processes don’t necessarily agree on where they are in that ordering. They don’t have to agree on what time ‘it is’.
Linearizability is sequential consistency with a real-time guarantee.
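One common way to enforce session guarantees like monotonic reads is a version token: the client remembers the highest version it has observed and rejects (or retries against another replica) any read that would rewind it. A minimal sketch, with all names (`Replica`, `StaleReplica`) hypothetical:

```python
# Sketch of monotonic reads via a client-side version token: the session
# refuses to read from a replica that is behind what it has already seen.

class StaleReplica(Exception):
    pass

class Replica:
    def __init__(self, version, value):
        self.version, self.value = version, value

class Session:
    def __init__(self):
        self.last_seen = 0   # highest version this session has observed

    def read(self, replica):
        if replica.version < self.last_seen:
            raise StaleReplica("replica is behind this session")
        self.last_seen = replica.version
        return replica.value

fresh = Replica(version=2, value="x=2")
stale = Replica(version=1, value="x=1")

session = Session()
assert session.read(fresh) == "x=2"
try:
    session.read(stale)      # would rewind the session's view: rejected
except StaleReplica:
    pass
```

The same token, attached to writes, is the usual building block for read-your-writes as well.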
Reaching consensus in distributed systems
These consistency models were developed primarily for single monolithic databases. They’ve been extended in different ways to apply to microservices.
One convention for microservices is to say that there is a transaction boundary around each service (assuming they each have a local database). This approach treats each microservice as its own monolith and doesn’t attempt to achieve any higher-level transactional guarantees.
On the other hand, implementing transactions across a multi-node database requires consensus. The two most common consensus protocols are Raft and Paxos. Sarah Christoff has a great summary of the differences between the two so I won’t try to repeat it here.
Implementing a consensus protocol can be very difficult in practice because they are quite sensitive to latency. For instance, Raft relies on a heartbeat from other nodes to know when to trigger an election. It is generally suggested not to span AZs (or potentially even datacenters) when deploying CP systems which rely on consensus (e.g. Zookeeper) because the latency hit can severely impact their ability to maintain a stable quorum.
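The latency sensitivity is visible in the heartbeat mechanism itself. A rough sketch of a Raft-style follower (timeout values are the commonly cited 150–300 ms range; the class and method names are made up for this illustration):

```python
# Why consensus is latency-sensitive: a Raft-style follower starts an
# election if no heartbeat arrives within its randomized election timeout.
# A slow inter-datacenter link can exceed that timeout and trigger a
# spurious election even though the leader is healthy.

import random

class Follower:
    def __init__(self, now=0.0):
        self.timeout = random.uniform(0.150, 0.300)  # seconds
        self.last_heartbeat = now

    def on_heartbeat(self, now):
        self.last_heartbeat = now     # leader is alive: reset the clock

    def should_start_election(self, now):
        return now - self.last_heartbeat > self.timeout

f = Follower(now=0.0)
f.on_heartbeat(now=0.1)
assert not f.should_start_election(now=0.2)   # heartbeat arrived recently
assert f.should_start_election(now=1.0)       # 900ms of silence: election
```

Every spurious election pauses writes while a new leader is chosen, which is why stretching a quorum across high-latency links hurts availability.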
Prioritizing Availability
The cost in downtime and coordination effort of guaranteeing consistency is too much for certain types of applications. Firms whose globe-spanning operations require higher availability than consensus-based approaches can provide favour AP systems. Werner Vogels describes these eventually consistent models as a means of “building reliable distributed systems at a worldwide scale”. With no subsequent updates, eventually all reads will converge on the same value.
“If no failures occur, the maximum size of the inconsistency window can be determined based on factors such as communication delays, the load on the system, and the number of replicas involved in the replication scheme.” - Werner Vogels
If eventual consistency is a good enough guarantee for an application there are many options for how to reconcile state. However, all options require a more relaxed interpretation of what constitutes the lifetime of a transaction (and, informally, how long it might take to roll back):
- Compensating transactions: rolling back writes after the fact at the app level. Sagas are a pattern for long-lived transactions. They use message passing to signal the need to roll back (or ‘commit’) operations from independent ACID transactions on different services.
- Apologies: Reaching out to customers if an invariant has been violated and compensating them for the inconsistency (e.g. offering a gift voucher if the product they ordered is not in fact available)
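The saga pattern’s essential shape is a list of local transactions, each paired with a compensating action that is run, in reverse order, if a later step fails. A minimal in-process sketch (a real saga would coordinate the steps via messages between services; all names here are hypothetical):

```python
# Sketch of the saga pattern: each local transaction has a compensating
# action; a failure part-way through runs the compensations for the steps
# that already committed, in reverse order.

class StepFailed(Exception):
    pass

def fail():
    # stand-in for a downstream failure (e.g. the shipping service is down)
    raise StepFailed("shipping unavailable")

def run_saga(steps):
    """steps: list of (action, compensation) pairs of callables."""
    done = []
    try:
        for action, compensation in steps:
            action()
            done.append(compensation)
    except StepFailed:
        for compensation in reversed(done):   # app-level rollback
            compensation()
        return "aborted"
    return "committed"

log = []
saga = [
    (lambda: log.append("reserve stock"), lambda: log.append("release stock")),
    (lambda: log.append("charge card"),   lambda: log.append("refund card")),
    (fail,                                lambda: log.append("cancel shipping")),
]
assert run_saga(saga) == "aborted"
assert log == ["reserve stock", "charge card", "refund card", "release stock"]
```

Note the relaxed transaction lifetime: between “charge card” and “refund card” the system is visibly inconsistent, which is exactly the window the apologies approach addresses.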
Server-side Consistency
These consistency models treat the database as a black box. In practice, the number of nodes of a distributed database required to process reads or writes has a meaningful impact on the guarantees it can offer.
- N = number of replicas for given data (which will all eventually be updated)
- W = number of replicas that need to ack a write in order for the write to be committed
- R = number of replicas that need to ack a (consistent) read in order for the read to be committed
- (obviously W <= N and R <= N)
- W + R > N: The read and write sets will always overlap. The database will be at least partially unavailable in the event of a partition. The values of W and R determine whether reads or writes (or both) will fail.
- R = 1, W = N: Optimize for reads. Writes will fail if there is a failure or partition.
- W = 1, R = N: Optimize for writes. Reads will fail if there is a failure or partition.
- W < (N + 1)/2: Writes to less than quorum. Write sets for subsequent writes might not overlap. Monotonic writes cannot be guaranteed.
- W + R <= N: Weak/eventual consistency. Reads can occur from a set of nodes that have not seen a write yet.
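The quorum arithmetic above fits in a small function. A sketch (the classification strings and helper name are my own, not standard terminology):

```python
# Classify what a replicated store with N replicas, W write acks and
# R read acks can promise, per the quorum rules above.

def classify(n, w, r):
    assert 1 <= w <= n and 1 <= r <= n
    notes = []
    if w + r > n:
        notes.append("read and write sets overlap: reads see the latest write")
    else:
        notes.append("weak/eventual: a read may miss the latest write")
    if w < (n + 1) / 2:
        notes.append("write sets may not overlap: monotonic writes not guaranteed")
    return notes

# R = 1, W = N: optimized for reads; any single replica can serve a read.
assert "read and write sets overlap: reads see the latest write" in classify(3, 3, 1)
# W + R <= N: a read quorum can consist entirely of stale replicas.
assert "weak/eventual: a read may miss the latest write" in classify(3, 1, 1)
# W below majority: two concurrent writes can land on disjoint replica sets.
assert any("monotonic writes" in note for note in classify(3, 1, 3))
```

This is the same dial Dynamo-style stores expose per request: turning W and R down buys availability at the cost of the overlap guarantee.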
Round up
Clearly, there are levels to this. Pay attention to the guarantees provided by your database. Be careful when making promises to your customers about what your system guarantees. And be clear about what you can offer. Bear in mind that the way we think about databases and stateful systems in general is as much about their history as the logic that underpins them.
Sources
- Designing Data-Intensive Applications, Martin Kleppmann
- Transactions: myths, surprises and opportunities, Martin Kleppmann
- Eventually Consistent - Revisited, Werner Vogels
- Consistency without consensus in production systems, Peter Bourgon
- An explanation of the difference between Isolation levels vs. Consistency levels, Daniel Abadi
- Correctness Anomalies Under Serializable Isolation, Daniel Abadi
- Overview of Consistency Levels in Database Systems, Daniel Abadi
- Consistency Models, Kyle Kingsbury
- Serializability, linearizability, and locality, Kyle Kingsbury
- Strong consistency models, Kyle Kingsbury
- Linearizability versus Serializability, Peter Bailis
- You. Must. Build. A. Raft!, Sarah Christoff
- The morning paper: A Critique of ANSI SQL Isolation Levels, Adrian Colyer (on Berenson et al.)
- Using sagas to maintain data consistency in a microservice architecture, Chris Richardson