DEV Community

Cover image for Navigating PostgreSQL - Pages and Blocks
Aditya Agrawal
Aditya Agrawal

Posted on • Originally published at adiagr.com

Navigating PostgreSQL - Pages and Blocks

In PostgreSQL, data is stored in pages. A page is the smallest unit of data that can be read from or written to disk. Understanding how PostgreSQL organizes data at the storage level helps in optimizing database performance and understanding the internals of the database system.

What Are Pages and Blocks?

In PostgreSQL, the terms "page" and "block" are often used interchangeably. A page is a fixed-length block of data, typically 8KB (8192 bytes) in size, though this can be modified when compiling the server.

When PostgreSQL reads or writes data to disk, it does so in these page-sized chunks. This is the fundamental storage and I/O unit in PostgreSQL.

postgres=# SELECT current_setting('block_size');
 current_setting
-----------------
 8192
(1 row)
Enter fullscreen mode Exit fullscreen mode

Page Structure

Each page in PostgreSQL follows a specific layout with several components:

Page Layout

Anatomy of a PostgreSQL page

  1. PageHeader (24 bytes) - Contains metadata about the page
  2. ItemIdData - Array of item identifiers pointing to the actual items
  3. Free space - Unallocated space where new data can be inserted
  4. Items - The actual data items (rows/tuples)
  5. Special space - Used by index access methods to store specialized data

Let's look at these components in more detail:


Don't be intimidated by these technical details! Understanding PostgreSQL's storage layout provides context for query execution and performance. This knowledge forms the foundation for deeper exploration of PostgreSQL's internal workings in future articles.


Page Header (PageHeaderData)

The page header occupies the first 24 bytes of each page and contains crucial information:

Page header

Field Size Description
pd_lsn 8 bytes LSN (Log Sequence Number) for WAL (Write-Ahead Logging)
pd_checksum 2 bytes Page checksum for data integrity
pd_flags 2 bytes Flag bits
pd_lower 2 bytes Offset to the start of free space
pd_upper 2 bytes Offset to the end of free space
pd_special 2 bytes Offset to the start of special space
pd_pagesize_version 2 bytes Page size and version information
pd_prune_xid 4 bytes Oldest unpruned XMAX on the page
create extension pageinspect;

-- Get the page header of the first page of the table: test
postgres=# SELECT * FROM page_header(get_raw_page(test,0));
   lsn     | checksum | flags | lower | upper | special | pagesize | version | prune_xid 
-----------|----------|-------|-------|-------|---------|----------|---------|----------
0/16FC9DB0 |        0 |     4 |   928 |   960 |    8192 |     8192 |       4 |         0
(1 row)
Enter fullscreen mode Exit fullscreen mode

Item Identifiers (ItemIdData)

Following the page header are item identifiers, each requiring 4 bytes. These act as pointers to the actual items stored in the page. Each identifier contains:

  • A byte-offset to the start of an item
  • The length of the item in bytes
  • Attribute bits affecting its interpretation

When a new row is inserted, a new item identifier is allocated from the beginning of the unallocated space, and pd_lower is increased accordingly.

Items (Rows/Tuples)

The actual data items (rows or tuples) are stored in the space allocated backward from the end of unallocated space. Each row has a structure called HeapTupleHeaderData followed by the actual column values.

The row header contains information like:

  • Transaction IDs (t_xmin, t_xmax)
  • Command IDs (t_cid)
  • Item pointer (t_ctid)
  • Information masks and flags
  • Offset to user data

CTID: The Physical Location Identifier

In PostgreSQL, each row has a unique physical identifier called the CTID (Current Tuple Identifier), which represents its location as a pair of values: (page_number, tuple_index).

Page Types

PostgreSQL supports several types of pages, each with a specific purpose:

  • Table Pages: Store table rows
  • Index Pages: Store index entries
  • Bitmap Pages: Store bitmap index entries
  • Heap Pages: Store heap tuples
  • Special Pages: Store special data (e.g., TOAST data)

Takeaways

  • Pages are the smallest unit of data that can be read from or written to disk in PostgreSQL.
  • Each page has a header that contains metadata about the page.
  • Pages are used to store data for tables, indexes, and other database objects.
  • All data read and written to disk is done in pages.

References


Originally published at https://www.adiagr.com

Top comments (0)