TL;DR: if you want predictable results with LIMIT or FETCH FIRST in any database, you need an ORDER BY.
Here is a quick blog post explaining the following observation: running the same query returns different results even though the table is not changing.
I've set the Repeatable Read isolation level and locked the table in exclusive mode to guarantee no concurrent modifications.
Here is how you can reproduce it.
I created a one-million-row table in PostgreSQL:
postgres=# create table demo as select generate_series(1,1000000) n;
SELECT 1000000
postgres=# vacuum analyze demo;
VACUUM
It's not best practice to create a table without a primary key, but I don't need one here as I'll only run sequential scans.
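You can verify that such a query reads the table with a sequential scan (the cost and row estimates below come from default planner settings and may differ slightly on your system):

postgres=# explain select * from demo limit 3;
                              QUERY PLAN
---------------------------------------------------------------------
 Limit  (cost=0.00..0.04 rows=3 width=4)
   ->  Seq Scan on demo  (cost=0.00..14425.00 rows=1000000 width=4)
(2 rows)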
The trick is that I've run a sequential scan in the background, in a loop for five seconds:
\! s=$SECONDS ; while [ $(( $SECONDS - $s )) -lt 5 ] ; do psql -c 'select * from demo' > /dev/null ; done & sleep 1
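For readability, here is the same one-liner broken out with comments (the logic is identical; psql's \! simply passes the command to the shell):

s=$SECONDS                                   # remember the start time
while [ $(( $SECONDS - $s )) -lt 5 ]         # loop for five seconds
do
  psql -c 'select * from demo' > /dev/null   # one full sequential scan, output discarded
done &                                       # run the loop in the background
sleep 1                                      # give the background scans a head start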
I run my select * from demo limit 3; within those five seconds:
postgres=# begin transaction isolation level repeatable read;
BEGIN
postgres=*# lock table demo in exclusive mode;
LOCK TABLE
postgres=*# select * from demo limit 3;
n
--------
791905
791906
791907
(3 rows)
postgres=*# select * from demo limit 3;
n
--------
795521
795522
795523
(3 rows)
postgres=*# select * from demo limit 3;
n
--------
799137
799138
799139
(3 rows)
postgres=*# commit;
COMMIT
An SQL statement without an ORDER BY clause does not guarantee any specific output order. Typically, a sequential scan begins at the start of the table, as a heap table has no defined logical order. However, if another sequential scan is already running concurrently and filling the shared buffers, my sequential scan can piggyback on it and fetch rows from wherever that scan currently is.
Once the other scan completes at the end of the table, my sequential scan continues from the beginning of the table until it reaches its own starting point, so every row is still returned exactly once. That wrap-around doesn't show up here because the LIMIT clause returns only the first three rows fetched, and those are simply the three rows the other session was reading at that moment.
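Nothing is lost to this wrap-around; the scan still visits every row, so an aggregate over the whole table is unaffected:

postgres=*# select count(*) from demo;
  count
---------
 1000000
(1 row)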
This feature dates back to PostgreSQL 8.3 and can be disabled with synchronize_seqscans.
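The parameter is on by default; checking it from the same session before turning it off:

postgres=*# show synchronize_seqscans;
 synchronize_seqscans
----------------------
 on
(1 row)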
postgres=*# select * from demo limit 3;
n
--------
817217
817218
817219
(3 rows)
postgres=*# set synchronize_seqscans to off;
SET
postgres=*# select * from demo limit 3;
n
---
1
2
3
(3 rows)
postgres=*# select * from demo limit 3;
n
---
1
2
3
(3 rows)
postgres=*# commit;
COMMIT
This feature allows PostgreSQL to serve two queries with one sequential scan of the same table, minimizing I/O during concurrent sequential scans when the table has more pages than fit in shared_buffers. A PostgreSQL sequential scan reads pages into the buffer pool, which all sessions share, and picking up a page that another session has just read is probably faster than an I/O call.
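One way to observe the sharing is with buffer statistics. EXPLAIN (ANALYZE, BUFFERS) reports how many pages were found already in shared buffers (shared hit) versus fetched with an I/O call (read). Run it while the background loop is active and compare the counters; the exact numbers depend on shared_buffers and on how far the concurrent scan has progressed:

postgres=# explain (analyze, buffers) select count(*) from demo;

The plan output includes a line like Buffers: shared hit=… read=… under the Seq Scan node; a high hit count during a concurrent scan is the synchronization at work.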
Do you know any other databases that do this? Oracle Database does the opposite: it tries to avoid synchronization and bypasses the shared buffer pool for large full table scans (direct path reads).
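To close the loop on the TL;DR: if the result order matters, say so. With an ORDER BY, the same query is deterministic whether or not a concurrent scan is running:

postgres=# select * from demo order by n limit 3;
 n
---
 1
 2
 3
(3 rows)

PostgreSQL may still synchronize the underlying scan with a concurrent one, but the sort guarantees which three rows come back.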