This blog aims to assist you in understanding the final concepts of Chapter:8 [Buffer Manager] from the book The Internals of PostgreSQL.
Note: Ensure that you have a thorough understanding of
Chapter 8 Part-2 and basics of PostreSQL before we proceed to Chapter 8 Part-3, as it forms the foundation for our exploration.
So, Let's Continue:
3. Loading a Page from Storage to a Victim Buffer Pool Slot
1) Create the buffer_tag of the desired page and look up the buffer table. In this example, we assume that the buffer_tag is 'Tag_M' (the desired page is not found).
(2) Select a victim buffer pool slot using the clock-sweep algorithm, obtain the old entry, which contains the buffer_id of the victim pool slot, from the buffer table and pin the victim pool slot in the buffer descriptors layer. In this example, the buffer_id of the victim slot is 5 and the old entry is ‘Tag_F, id=5’. The clock sweep is described in the next subsection.
(3) Flush (write and fsync) the victim page data if it is dirty; otherwise proceed to step (4). The dirty page must be written to storage before overwriting with new data. Flushing a dirty page is performed as follows:
Acquire the shared content_lock and the exclusive io_in_progress lock of the descriptor with buffer_id 5 (released in step 6).
Change the states of the corresponding descriptor; the io_in_progress bit is set to '1' and the just_dirtied bit is set to '0'.
Depending on the situation, the XLogFlush() function is invoked to write WAL data on the WAL buffer to the current WAL segment file (details are omitted; WAL and the XLogFlush function are described in Chapter 9).
Flush the victim page data to storage.
Change the states of the corresponding descriptor; the io_in_progress bit is set to '0' and the valid bit is set to '1'.
Release the io_in_progress and content_lock locks.
(4) Acquire the old BufMappingLock partition that covers the slot that contains the old entry, in exclusive mode.
(5) Acquire the new BufMappingLock partition and insert the new entry to the buffer table:
Create the new entry comprised of the new buffer_tag 'Tag_M' and the victim's buffer_id.
Acquire the new BufMappingLock partition that covers the slot containing the new entry in exclusive mode.
Insert the new entry to the buffer table.
(6) Delete the old entry from the buffer table, and release the old BufMappingLock partition.
(7) Load the desired page data from the storage to the victim buffer slot. Then, update the flags of the descriptor with buffer_id 5; the dirty bit is set to '0 and initialize other bits.
(8) Release the new BufMappingLock partition.
(9) Access the buffer pool slot with buffer_id 5.
Loading a page from storage to a victim buffer pool slot in PostgreSQL is depicted in figure below:
Loading a page from storage to a victim buffer pool slot (continued) in PostgreSQL is depicted in figure below:
Page Replacement Algorithm: Clock Sweep
This algorithm is a variant of NFU (Not Frequently Used) with low overhead; it selects less frequently used pages efficiently.
The nextVictimBuffer, an unsigned 32-bit integer, is always pointing to one of the buffer descriptors and rotates clockwise.
The pseudocode of the algorithm is follows:
WHILE true
(1) Obtain the candidate buffer descriptor pointed by the nextVictimBuffer
(2) IF the candidate descriptor is unpinned THEN
(3) IF the candidate descriptor's usage_count == 0 THEN
BREAK WHILE LOOP /* the corresponding slot of this descriptor is victim slot. */
ELSE
Decrease the candidate descriptpor's usage_count by 1
END IF
END IF
(4) Advance nextVictimBuffer to the next one
END WHILE
(5) RETURN buffer_id of the victim
Clock Sweep in PostgreSQL is depicted in figure below:
1) The nextVictimBuffer points to the first descriptor (buffer_id 1); however, this descriptor is skipped because it is pinned.
2) The nextVictimBuffer points to the second descriptor (buffer_id 2). This descriptor is unpinned but its usage_count is 2; thus, the usage_count is decreased by 1 and the nextVictimBuffer advances to the third candidate.
3) The nextVictimBuffer points to the third descriptor (buffer_id 3). This descriptor is unpinned and its usage_count is 0; thus, this is the victim in this round.
Whenever the nextVictimBuffer sweeps an unpinned descriptor, its usage_count is decreased by 1. Therefore, if unpinned descripters exist in the buffer pool, this algorithm can always find a victim, whose usage_count is 0, by rotating the nextVictimBuffer.
Ring buffer
PostgreSQL uses a ring buffer when reading or writing a huge table.
The ring buffer is a small and temporary buffer area allocated to shared memory.
Conditions to allocate Ring Buffer
Ring buffer size for bulk-reading is 256 KB when scanning a relation larger than shared_buffers/4.
Ring buffer size for bulk-writing is 16 MB when executing certain SQL commands like COPY FROM, CREATE TABLE AS, etc.
Ring buffer size for vacuum-processing is 256 KB when an autovacuum performs vacuum processing.
The ring buffer is released immediately after use.
The ring buffer helps prevent cache thrashing and maintains a higher cache hit ratio during massive table operations.
Flushing of dirty pages
The checkpointer process writes a checkpoint record to the Write-Ahead Log (WAL) segment file and flushes dirty pages during checkpointing.
The checkpointer's primary role is to perform checkpointing and ensure data consistency.
The background writer process aims to minimize the impact of intensive checkpointing on database activity.
The background writer flushes dirty pages gradually and continuously, reducing the overall workload on the system.
By default, the background writer wakes up every 200 milliseconds (controlled by bgwriter_delay) and flushes a maximum of bgwriter_lru_maxpages (default is 100 pages).
The background writer's primary goal is to maintain a steady pace of flushing dirty pages, keeping the system running smoothly.
I hope, this blog has helped you in understanding the final concepts of Buffer Manager in PostreSQL.
Check out summary of Chapter : 9 Part-1
If you want to understand PostgreSQL In-Depth.
Top comments (0)