This blog aims to assist you in understanding the initial concepts of Chapter:7 [Heap Only Tuple and Index-Only Scans] from the book The Internals of PostgreSQL.
Note: Ensure that you have a thorough understanding of
Chapter 6 and basics of PostreSQL before we proceed to Chapter 7 Part-1, as it forms the foundation for our exploration.
So, Let's Start:
Heap Only Tuple (HOT)
The HOT was implemented in version 8.3 to effectively use the pages of both index and table when the updated row is stored in the same table page that stores the old row.
The HOT also reduces the necessity of VACUUM processing.
Update a Row Without HOT
Assume that the table ‘tbl’ has two columns: ‘id’ and ‘data’; ‘id’ is the primary key of ‘tbl’.
The table ‘tbl’ has 1000 tuples; the last tuple of which the id is ‘1000’ is stored in the 5th page of the table. The last tuple is pointed from the corresponding index tuple, of which the key is ‘1000’ and whose tid is ‘(5,1)’. Refer to Figure(a) below.
Update a row without HOT in PostgreSQL is depicted in figure below:
In this case, PostgreSQL inserts not only the new table tuple but also the new index tuple in the index page. Refer to Figure(b) above.
The inserting of the index tuples consumes the index page space, and both the inserting and vacuuming costs of the index tuples are high.
How HOT Performs
- When a row is updated with HOT, if the updated row will be stored in the same table page that stores the old row, PostgreSQL does not insert the corresponding index tuple and sets the HEAP_HOT_UPDATED bit and the HEAP_ONLY_TUPLE bit to the t_informask2 fields of the old tuple and the new tuple, respectively.
Update a row with HOT in PostgreSQL is depicted in figure below:
HEAP_HOT_UPDATED and HEAP_ONLY_TUPLE bits in PostgreSQL is depicted in figure below:
For example, in this case, ‘Tuple_1’ and ‘Tuple_2’ are set to the HEAP_HOT_UPDATED bit and the HEAP_ONLY_TUPLE bit, respectively.
In addition, the HEAP_HOT_UPDATED and the HEAP_ONLY_TUPLE bits are used regardless of the pruning and the defragmentation processes, are executed.
A problem arises if the dead tuples in the table pages are removed. For example, in Figure(a) below , if ‘Tuple_1’ is removed since it is a dead tuple, ‘Tuple_2’ cannot be accessed from the index.
To resolve this problem, at an appropriate time, PostgreSQL redirects the line pointer that points to the old tuple to the line pointer that points to the new tuple. In PostgreSQL, this processing is called pruning. Figure(b) below despicts how PostgreSQL accesses the updated tuples after pruning.
The pruning processing will be executed, if possible, when a SQL command is executed such as SELECT, UPDATE, INSERT and DELETE. The exact execution timing is not described in this chapter because it is very complicated.
Pruning of the line pointers in PostgreSQL is depicted in figure below:
I hope, this blog has helped you in understanding the initial concepts of Heap Only Tuple and Index-Only Scans in PostreSQL.
Check out summary of Chapter : 7 Part-2
If you want to understand PostgreSQL In-Depth.
Top comments (0)