DEV Community

Nile Lazarus
Nile Lazarus

Posted on • Edited on

Demystifying the Internals of PostgreSQL - Chapter 6

Welcome back to the sixth instalment in our journey towards understanding The Internals of PostgreSQL.

In the last blog, we covered chapter 5 which delves into how PostgreSQL handles Concurrency Control.

In this blog, we will be exploring chapter 6 which covers Vacuum Processing. So without further ado, let's begin.

Introduction

Vacuum Processing is a maintenance process in PostgreSQL which we briefly touched upon at the end of the last blog. The main responsibilities that Vacuum Processing handles are removing dead tuples and freezing transaction IDs.

There are two types of methods used in vacuum processing of dead tuples:

  1. Concurrent VACUUM: dead tuples are simply removed and other transactions can still read the table during this process.
  2. Full VACUUM: dead tuples are removed and live tuples are defragmented as well. Other transactions cannot read the table while this process is underway.

Up until version 8.0 of PostgreSQL, vacuum processing had to be done manually and was only automated in 2005.
Additionally, this is a costly process which is why the Visibility Map (VM) feature was introduced in version 8.4 to increase efficiency.

Outline of Concurrent VACUUM

Vacuum Processing handles 3 tasks for all or some tables in the database:

  1. Removing dead tuples: removes dead tuples, defragments live tuples, and and removes indices of dead tuples.
  2. Freezing old txids: freeze old txids, update frozen txids, and remove unnecessary parts of the clog if possible
  3. Others: update FSM and VM of processed tables, and update statistics like pg_stat_all_tables.

For each table in the database, PostgreSQL implements processing techniques called first block, second block, and then third block. After this, statistics and system catalogs are updated, and unnecessary files and pages of the clog are removed if possible.
Let's now delve into what exactly the first block, second block, and third block do.

First Block

First block is responsible for freeze processing and removing index tuples which point to dead tuples.
PostgreSQL will first create a list of dead tuples, freeze old tuples, and store the list in maintenance_work_mem. After this, the list is used to remove the index tuples of the dead tuples.
Once this is done, PostgreSQL moves onto Second Block.

Second Block

Second block removes dead tuples and updates the FSM and VM.
Dead tuples are removed and the remaining live tuples are reordered (defragmentation) after which the FSM and VM of the page this was performed on are updated.

Third Block

Third block performs cleanup for the deleted indexes and updates the statistics and system catalogs.

Visibility Map

As mentioned before, vacuum processing is a costly process hence PostgreSQL introduced the Visibility Map in version 8.4 to improve efficiency.
The VM basically holds information on which pages have dead tuples which allows the vacuum process to skip pages that have no dead tuples.
The efficiency provided by VMs was further enhanced in version 9.6 such that VMs now also had page visibility and information on which pages contain frozen tuples.

Freeze Processing

Freeze processing has two modes:

  1. Lazy Mode: only pages which contain dead tuples are scanned using the VM of each respective table. Tuples whose t_xmin is less than the freezeLimit txid are frozen. The following formula is used to calculate freezeLimit txid:
    freezeLimit_txid = (OldestXmin − vacuum_freeze_min_age)

  2. Eager Mode: every page is scanned regardless of whether it does or does not contain dead tuples. System catalogs are also updated and unnecessary parts of the clog are removed if possible. The eager mode is performed when the following condition is satisfied:
    pg_database.datfrozenxid < (OldestXmin − vacuum_freeze_table_age)

Full VACUUM

Although Concurrent VACUUM seems thorough at a glance, it falls short in several areas such as reducing table sizes even after several dead tuples have been removed.
This negatively impacts both the efficiency of disk space usage and the overall performance of the database.
To tackle this issue, PostgreSQL provides the Full VACUUM mode which takes the following steps:

  1. Creates new table file
  2. Copies live tuples to the new table file
  3. Deletes the old file, rebuilds indexes and updates statistics, FSM, and VM

Top comments (0)