Introduction
Durability and Consistency are essential for relational databases. They are part of the ACID principles.
To ensure this durability and consistency, relational databases use several techniques. One of these is to use a WAL file (Write-Ahead Log file).
Let's see what this means and how it affects the performance of our read and writes operations. And determines what can suit more for an ETL process.
Tables
Logged Tables
When you create a table in a relational database, you make a logged table. Every manipulation (DML statement) on your data will be logged to the WAL file before being run on your database in a LOGGED
table.
CREATE TABLE logged_table(test integer);
For example, if you do an insert and then delete in a transaction as follows:
BEGIN;
INSERT INTO logged_table VALUES (1);
DELETE FROM logged_table;
COMMIT;
This will write the WAL file, the insertion you want to perform, and the deletion you wish to achieve.
Then it will perform the insertion and, eventually, the deletion.
This ensures :
- Data stay consistent
- Recovering data in case of systems failures
Unlogged Tables
There is another type of table in a relational database unlogged table.
CREATE UNLOGGED TABLE unlogged_table(test integer);
Performance
As you can tell, writing every operation in a log file before executing has an overhead cost.
With the simple tables, we had created before :
select now();
insert into logged_table values(generate_series(1,20000000));
select now();
clock_timestamp
-------------------------------
2023-06-19 09:07:55.525075+04
(1 row)
INSERT 0 20000000
clock_timestamp
-------------------------------
2023-06-19 09:08:20.054688+04
(1 row)
-------------------------------------------------------------
select now();
insert into unlogged_table values(generate_series(1,20000000));
select now();
insert into test2 values(generate_series(1,20000000));
select clock_timestamp();
clock_timestamp
-------------------------------
2023-06-19 09:07:37.594111+04
(1 row)
INSERT 0 20000000
clock_timestamp
------------------------------
2023-06-19 09:07:41.37622+04
(1 row)
Unlogged tables enable us to manipulate data faster. Here for a significant amount of data, we have a more than 50% difference in speed.
ETL process needs
Does an ETL process need for WAL? It might not be. Indeed ETL processes are processes that are often rerun when data changes. It is easy, and we must ensure data are correctly formatted and stored.
Speed is more valuable than ACID principles here since we can rerun our pipeline without fearing losing our data.
In this case, the unlogged table can be a valuable asset.
Conclusion
Our favorite RDBMS has many features, and the more you dig into it, the more you realize that there are things to discover.
Using any feature for the sake of it is something other than what we want. But in this case, knowing the difference between a Logged table and an Unlogged table can result in winning times and speed of execution in our ETL pipeline.
See you around for the following article. I hope you enjoy it!
Keep in Touch
On Twitter : @yet_anotherDev
On Linkedin : Lucas Barret
Top comments (0)