A brand new job
Imagine you are a brand new Data Analyst at The Coffee Company.
Like every Data Analyst, you know SQL well and are in charge of writing new queries to understand the business better.
You want to discover the database and decides to write a bunch of transaction to gather information.
All you know is that your company uses a PostgreSQL database.
Discovering data
You want to discover the coffee table, then do other queries and, eventually, display the data from the beginning again at the end.
BEGIN;
SELECT * FROM coffees
...// some queries without updating anything
SELECT * FROM coffees
COMMIT;
BEGIN
coffee_id | name | unit_price | country
-----------+---------------------+------------+------------
1 | Black Honey | 14.4 | Costa Rica
3 | Las Fincas Granadas | 13.3 | Costa Rica
2 | Ambela Hakala | 15.0 | Ethiopia
(3 rows)
coffee_id | name | unit_price | country
-----------+---------------------+------------+------------
1 | Black Honey | 14.4 | Costa Rica
3 | Las Fincas Granadas | 13.3 | Costa Rica
2 | Ambela Hakala | 17.0 | Ethiopia
(3 rows)
COMMIT;
Surprisingly Ambela Hakala's price changed, but we did not do any updates.
And there is no doubt this is not another coffee with the same name since the coffee_id is the same...
Isolation Principle
Data should stay consistent and not change from one query to another if there is no update or insert in your transaction.
The main asset of a relational database is consistency; the Isolation principle of ACID ensures that your data should not be changed in a transaction.
Can you still trust your database?
Concurrency issue: Unrepeatable Read
What happens here is that You have experienced the unrepeatable read issue.
Indeed there are a hundred different parallel connections to your database. Hence there are a hundred other queries and potential update or insert done in parallel.
Hemileia vastatrix
What You did not know, several months ago, a fungus named Hemileia vastatrix, which causes Coffee Leaf rust, destroyed several coffee plants leading to a fall in coffee production and hence the price raised.
Your colleague Bob, whom you have yet to learn, has been in charge of changing the price of Ambela Hakala to align with the market price.
If we wanted to write only one query with your query and Bob's query to see something similar to what the database sees, we would write :
BEGIN;
SELECT * FROM coffees
... // Some Queries without editing anything
UPDATE coffees set unit_price = 17.0 where name like 'Am%'
SELECT * FROM coffees
COMMIT;
BEGIN
coffee_id | name | unit_price | country
-----------+---------------------+------------+------------
1 | Black Honey | 14.4 | Costa Rica
3 | Las Fincas Granadas | 13.3 | Costa Rica
2 | Ambela Hakala | 15.0 | Ethiopia
(3 rows)
coffee_id | name | unit_price | country
-----------+---------------------+------------+------------
1 | Black Honey | 14.4 | Costa Rica
3 | Las Fincas Granadas | 13.3 | Costa Rica
2 | Ambela Hakala | 17.0 | Ethiopia
(3 rows)
COMMIT;
What does this update do in your transaction?
As we said before, the Isolation principle should avoid that. But theory and practice are two different things.
Because ACID principles have a cost, and the more we want consistency, the more we will lose on another point.
Consistency vs. Performance
Ensuring a high level of consistency means a drop in performance. Indeed if you put a high level of consistency, you have to ensure that data are not accessed and changed when manipulating them. It means you reduce the possibility of concurrency.
When your fellows had configured this database, they put the default configuration for the concurrency into the table.
In Postgres, it is Read Committed
; this means that when you want to access data, if this data is modified in another transaction, it has to be committed first, and then you can access it.
Local Consistency
As a data analyst, you cannot change the database configuration. But you can ensure local consistency for your query by adding local isolation to your transaction.
Here we can use the Read Repeatable
isolation level. Then Bob's update will not interfere with your transaction, and you will have consistent data at the beginning and end of your transaction.
START TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT * FROM coffees
...// some queries without updating anything
SELECT * FROM coffees
COMMIT;
BEGIN
coffee_id | name | unit_price | country
-----------+---------------------+------------+------------
1 | Black Honey | 14.4 | Costa Rica
3 | Las Fincas Granadas | 13.3 | Costa Rica
2 | Ambela Hakala | 15.0 | Ethiopia
(3 rows)
coffee_id | name | unit_price | country
-----------+---------------------+------------+------------
1 | Black Honey | 14.4 | Costa Rica
3 | Las Fincas Granadas | 13.3 | Costa Rica
2 | Ambela Hakala | 15.0 | Ethiopia
(3 rows)
COMMIT;
Now the data is the same through all your queries, and we can be confident about the data we are manipulating.
What a first day
The isolation principle in the ACID theorem is essential. But you discovered that in practice, it is different from in theory. Isolation is not automatically ensured for your transaction in a relational database.
Improving consistency means dropping performance; we can only have everything everywhere at a time. But sometimes, we do not have a choice since we often deal with access data or need a snapshot of a particular moment.
The isolation level set at the transaction level prevents you from changing database settings and disturbing everyone. While you can ensure that the data you are accessing are consistent through all your transaction if needed.
More is to come about concurrency in Postgres stay tune !
Keep in Touch
On Twitter : @yet_anotherdev
Top comments (0)