In PostgreSQL, data is stored in pages. A page is the smallest unit of data that can be read from or written to disk. Understanding how PostgreSQL organizes data at the storage level helps in optimizing database performance and understanding the internals of the database system.
What Are Pages and Blocks?
In PostgreSQL, the terms "page" and "block" are often used interchangeably. A page is a fixed-length block of data, typically 8KB (8192 bytes) in size, though this can be modified when compiling the server.
When PostgreSQL reads or writes data to disk, it does so in these page-sized chunks. This is the fundamental storage and I/O unit in PostgreSQL.
postgres=# SELECT current_setting('block_size');
current_setting
-----------------
8192
(1 row)
Page Structure
Each page in PostgreSQL follows a specific layout with several components:
Anatomy of a PostgreSQL page
- PageHeader (24 bytes) - Contains metadata about the page
- ItemIdData - Array of item identifiers pointing to the actual items
- Free space - Unallocated space where new data can be inserted
- Items - The actual data items (rows/tuples)
- Special space - Used by index access methods to store specialized data
Let's look at these components in more detail:
Don't be intimidated by these technical details! Understanding PostgreSQL's storage layout provides context for query execution and performance. This knowledge forms the foundation for deeper exploration of PostgreSQL's internal workings in future articles.
Page Header (PageHeaderData)
The page header occupies the first 24 bytes of each page and contains crucial information:
Field | Size | Description |
---|---|---|
pd_lsn |
8 bytes | LSN (Log Sequence Number) for WAL (Write-Ahead Logging) |
pd_checksum |
2 bytes | Page checksum for data integrity |
pd_flags |
2 bytes | Flag bits |
pd_lower |
2 bytes | Offset to the start of free space |
pd_upper |
2 bytes | Offset to the end of free space |
pd_special |
2 bytes | Offset to the start of special space |
pd_pagesize_version |
2 bytes | Page size and version information |
pd_prune_xid |
4 bytes | Oldest unpruned XMAX on the page |
create extension pageinspect;
-- Get the page header of the first page of the table: test
postgres=# SELECT * FROM page_header(get_raw_page(‘test’,0));
lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid
-----------|----------|-------|-------|-------|---------|----------|---------|----------
0/16FC9DB0 | 0 | 4 | 928 | 960 | 8192 | 8192 | 4 | 0
(1 row)
Item Identifiers (ItemIdData)
Following the page header are item identifiers, each requiring 4 bytes. These act as pointers to the actual items stored in the page. Each identifier contains:
- A byte-offset to the start of an item
- The length of the item in bytes
- Attribute bits affecting its interpretation
When a new row is inserted, a new item identifier is allocated from the beginning of the unallocated space, and pd_lower is increased accordingly.
Items (Rows/Tuples)
The actual data items (rows or tuples) are stored in the space allocated backward from the end of unallocated space. Each row has a structure called HeapTupleHeaderData
followed by the actual column values.
The row header contains information like:
- Transaction IDs (
t_xmin
,t_xmax
) - Command IDs (
t_cid
) - Item pointer (
t_ctid
) - Information masks and flags
- Offset to user data
CTID: The Physical Location Identifier
In PostgreSQL, each row has a unique physical identifier called the CTID (Current Tuple Identifier), which represents its location as a pair of values: (page_number, tuple_index).
Page Types
PostgreSQL supports several types of pages, each with a specific purpose:
- Table Pages: Store table rows
- Index Pages: Store index entries
- Bitmap Pages: Store bitmap index entries
- Heap Pages: Store heap tuples
- Special Pages: Store special data (e.g., TOAST data)
Takeaways
- Pages are the smallest unit of data that can be read from or written to disk in PostgreSQL.
- Each page has a header that contains metadata about the page.
- Pages are used to store data for tables, indexes, and other database objects.
- All data read and written to disk is done in pages.
References
- Stomatics: PostgreSQL Internals
- Sud Gajula: What is Page in Postgres?
- HeVodata: PostgreSQL Data Structure
- doxygen: ItemIdData
- PostgreSQL: Storage Layout
- PostgreSQL: Page Layout
Originally published at https://www.adiagr.com
Top comments (0)