Consistency models
Why this post?
I have found somehow difficult to find an authoritative source which describes what is the contract that a certain consistency model defines with the programmer. Moreover, some resources seem to contradict each other.
The idea of this post is to document my findings around some of the most common consistency models that can be found in distributed systems which we interact with. When possible I'll add the resource where I found the definition.
Consistency model as abstraction
In distributed systems, a consistency model is a contract between the system and the developer who uses it. A system is said to support a certain consistency model if operations on memory respect the rules defined by the model.
They are a powerful abstraction which helps to describe a system in terms of its observable properties.
From a practical point of view, a consistency model answers to questions like: If Client 1 adds an object to a distributed data store at 1:00 pm and Client 2 tries to read the same object at 1:01 pm, will Client 2 be able to get the new object? Or will it get a 404 because the resource was not found?
The answer is that...it depends. There are many factors to consider to answer such a question. One of them is the consistency model adopted by the distributed data store.
Sequential consistency
The result of any execution is the same as if the operations of all processors were executed in some sequential order and the operations of each individual processor appear in this sequence in the order specified by its program.
Conceptually there is one global memory and a switch that connects an arbitrary processor to the memory at any time. Each processor issues memory operations in program order and the switch provides the global serialization among all the processors.
Therefore: Sequential consistency is not deterministic because multiple execution of the distributed program might lead to a different order of operations.
In the example above, four possible execution orders are possible no matter what is the actual order in which the operations were issued.
Important: no matter what the order of the operations are, it is guaranteed that is the same among all the processors. This also means that each process will observe the same history of operations. For example, in the first case b.1
will read the state which is the result of all the operations completed before b.1
: so b.1
is going to read a value for the account which is '150'.
Strict consistency
Also called Strong consistency or Linearizability.
Like sequential consistency, but the execution order of programs between processors must be the order in which those operations were issued.
As analogy, this is pretty much what we would get from a multi threaded program in which there is no cache between the cores and the main memory, and neither the compiler nor the CPU is allowed to reorder instructions.
Therefore: If each program in each processor is deterministic, then the distributed program is deterministic.
Causal consistency
Relax sequential consistency allowing also reordering of the operations in the same processors when they are not 'causally related'.
PRAM consistency
The write operations (and only the writes) of each processor appear in the order specified by the program.
Serializability
Unlike the models described above, serializability defines guarantees on group of operations (called transactions). It guaranties the execution of a set of transactions (usually read and writes operations) over multiple items is equivalent to some
serial execution of the transactions.
Unlike linearizability, serializability does not impose a deterministic order, in this sense is similar to sequential consistency but for a group of operations.
Strict serializability, instead, guaranties the same constraint expressed by linearizability but for transactions: transactions must be executed in the order they were issued.
The definition looks a bit different from the ones above because it comes from database people, while sequential consistency, etc. come from distributed system people.
Read after write
The model guaranties immediate visibility of updates to an item to all the clients. The concept is similar to the one of Memory Barrier present in modern processors.
Note: Sometimes what is called Read after write in some systems, it's actually Read after create (for example in S3).
Read after create
The model guaranties immediate visibility of newly created data to all the clients. Updates to an existing object might not be immediately visible to all clients.
Conclusions
Those were my two cents about one of the many aspects around distributed systems. I'll keep adding more to the list as I go. Of course, any feedback would be greatly appreciated!
Top comments (12)
I had a hard time to understand why should I read the article after the first part, I think an intro would be good,what are distributed systems,who makes them, who uses this patterns with real examples.
I think I understood the patterns but not why they exists, where can I use them etc.
Hi! thanks for the feedback.
I changed a bit the introduction based on your input.
The article is not supposed to be an introduction to distributed systems, instead it aims to document a specific aspect of such systems, which are consistency models.
Can Sequential consistency see out of date data?
No, under sequential consistency each processor(or node) observes the same history of operations. This means that each node is gonna read fresh state. (Fresh here means that the read operations read the last write. Note that the info might become outdated right after)
Thank you very much for your quick reply. Each processor(or node) observes the same history of operations, maybe the client read the stale data from different processor, for example, a client writes a variable x=0 through process 1, and then writes a variable y=0, and process 1 propagate these events to the process 2, If the client initiates a read operation before the event reaches process 2, will it read the outdated data?
U re welcome :)
what you described in your example cannot happen under sequential consistency.
The model is an abstract contract of the behaviour the system must respect at any point in time.
Coming back to your example you, just to make it a bit more concrete, we could:
Both these approaches will cause the clients to observe a system which behaves as sequential consistent.
Thank you very much. Your answer has solved my doubts. : -)
Glad it helped :)
Hello! Loved the post! I was wondering about how to implement sequential consistency in a system that can accept write requests from multiple processes simultaneously, like multi leader replication system? Thanks.
Hi, thanks. What do you want to build?
Depending on the system and the database you plan to use, some sort of session affinity could be enough to guarantee sequential consistency. Session affinity in this context means that all the reads/writes coming from one process are going to be executed by the same node in the order in which they were issued.
Anyway, I do not have all the info you have - without that it is hard for me to give a more concrete answer.
you mentioned about "casual consistency", is it supposed to be "causal consistency" ?
Yes, it s a typo. Thanks for reporting it