Let's assume that you are working as an Engineer designing some new kind of database, what's the most important thing that you need to make sure exists in your database system for it to be practical? Well it would have to be the ACID properties. Even among the ACID properties the two most important ones are atomicity and isolation. This is exactly why we need concurrency control. It helps us maintain these two properties in Postgres.
There are three broad concurrency control techniques: Multi-version Concurrency Control (MVCC), Strict Two-Phase Locking (S2PL), and Optimistic Concurrency Control (OCC), each with its own set of variations as well. Needless to say this might get a little complicated here but I'll try my best to keep it simple. MVCC is simple; there's two versions when writing, the old version and the new version and both are maintained, readers don’t block writers, and writers don’t block readers. S2PL is a locking based mechanism instead where the one who has the lock at a particular time can perform and operation only while others wait to attain the lock. Snapshot Isolation (SI) is a complex version of MVCC used by Postgres. It's essentially just like having a backup always and when someone wants to read data, we lookup the transaction id, apply some visibility checks and see where should we read from; the backup or the updated data.
Transaction IDs serve as unique identifier for the the transactions. There are special txids as well indicating things like frozen transactions and so on.
Heap Tuple
This a very important data structure that stores a bunch of relevant information such as which transaction id updated the tuple, which transaction id inserted the tuple, how many commands were executed, etc. Insertion, Deletion, and Updating is made possible by using this data structure and manipulating its fields to maintain consistency.
When inserting an index tuple, PostgreSQL uses the FSM (Free Space Map) of the corresponding table or index to select the page which can be inserted it. Commit log holds the statuses of all the transactions. There's four statuses: IN_PROGRESS, COMMITTED, ABORTED, and SUB_COMMITTED.
Transaction Snapshot
It is the snapshot of all the transactions and their statuses at a particular point in time.
Visibility Rule Checks
In this part we have all the things mentioned above in the article and based on certain rules we can check whether a transaction is visible or invisible. Let's explore Rule 1 only to get the idea: Rule 1: If Status(t_xmin) = ABORTED ⇒ Invisible. Now since tx_min deals with insertion, if insertion is aborted it makes sense that the transaction becomes invisible.
Serializable Snapshot Isolation
In order to prevent anomalies like Dirty Reads, Non-Repeatable Reads, and Phantom Reads, SSI is used. It uses a graph based approach and tries to make sure no cycle is produced.
Top comments (0)