Franck Pachot for YugabyteDB

Posted on Jan 30

⛔ ORDER BY in Subquery

#yugabytedb #distributed #postgres #sql

Here is a quick post to show the danger of not understanding how SQL and ORDER BY works.
In relational databases, SQL operates on unordered sets of rows. The sets or rows are tables, views, results of functions, or subqueries. You may think there's an order because the RDBMS may store the rows physically ordered and retrieve them with this apparent order. But it's just a side effect of the implementation, ready to break at runtime (with different physical ordering, different execution plans, different parallelization, or any other smart optimization).
The ORDER BY in SQL is different. It's not a relational operation on sets of rows but an operation on the result exposed outside the database when the application wants to fetch this set of rows in a specific order.

There are two consequences to keep in mind:

If you have no ORDER BY in the outer query, you can assume a random order, even if there's an ORDER BY in a subquery.
If you use OFFSET and LIMIT without an ORDER BY, you can assume that you pick a random subset from the result.

Here is an example in PostgreSQL:

create table t1 ( id uuid primary key default gen_random_uuid(), val1 int );
create table t2 ( id uuid primary key default gen_random_uuid(), val2 int );
insert into t1 ( val1 ) select generate_series(1,100);
insert into t2 select id val1 from t1;

I run the following query to get OFFSET 10 LIMIT 10 from a query that has no ORDER BY in the outer query:

postgres=# select * from t1
 offset 10 limit 10
;
                  id                  | val1
--------------------------------------+------
 6e21f38d-18ed-4834-bea1-b9a0d91ed288 |   52
 a4c00d5f-fd7c-42fc-97ee-cd309af47a09 |   77
 a5dd4329-137c-4aa2-b503-dc885378f155 |   59
 0ca0ef72-e39a-4eb7-9777-b1304016d968 |   15
 e6add227-d551-4429-bb67-a32ff4fbf76b |   83
 58a9285f-d68a-4173-a8b5-bdc7e679ed3b |   89
 c1464177-c3e8-4054-8616-a805350de739 |   84
 b5f492cc-0690-47da-9081-70d990857cae |   92
 e7638b03-8716-4321-ab00-b47c1b91b3f1 |   61
 c6cb972e-309c-4ee0-b1f5-7f73482672d9 |   88
(10 rows)

The ordering of the rows is random, driven by how they were physically stored and read from the heap table.

With an ORDER BY, the rows are sorted before OFFSET and LIMIT, so the result becomes deterministic:

postgres=# select * from t1
 order by val1
 offset 10 limit 10
;
                  id                  | val1
--------------------------------------+------
 cc4e4a88-c92f-4cfa-8d29-8b462be60619 |   11
 7212d0ab-b810-4002-928e-fa89607c059d |   12
 8f256559-fe98-4c21-a896-896edb436000 |   13
 3897cc06-813e-4d96-aaa2-83493cf3cea1 |   14
 0ca0ef72-e39a-4eb7-9777-b1304016d968 |   15
 9e4fe2f0-81f6-4086-b0a9-21b0407ac021 |   16
 76167b8a-f920-4c60-b72d-cbf37b7497c9 |   17
 52397fbc-c814-4e17-8c9a-bfda5a2e72e3 |   18
 85fee8c0-1080-4431-b650-ed7f5ce424aa |   19
 13ec825d-cb2d-4031-934a-6f203e978ccf |   20
(10 rows)

If I keep the ORDER BY in a subquery and join it with another table, the pagination is random again:

postgres=# select * from
(
select * from t1
 order by val1 --> not a good idea in a subquery
) t1_ordered join t2 using(id)
 offset 10 limit 10
;
                  id                  | val1
--------------------------------------+------
 6a66b081-33ed-4a59-9772-9b11f6581ea8 |   92
 88cf2bd1-c54c-40c1-a794-f2df63cd2730 |   61
 b776d312-4e63-412a-8f8c-a6c11efd958a |   45
 6154c15c-1d34-414d-9ba3-41e7bbed453f |   64
 3ba9f440-25d1-4162-86bc-8f0ad36abaca |   66
 0bfda8d3-66b1-41b8-884f-fa85d43e37c4 |   33
 d64885a2-6b1c-447d-9be7-d7fe9726919c |   58
 baf08b7c-ea2c-475c-a2cb-6cbe47524a27 |   40
 2e698c37-5557-4998-b9e6-b4f17f599039 |   36
 6219f25c-23b6-48f1-9e5f-3ce33ef7c975 |   78
(10 rows)

PostgreSQL obeyed my ORDER BY, as an EXPLAIN shows a SORT operation, but the join method had another driving table, and the order was not preserved in the join result:

postgres=# explain (analyze, costs off, summary off)
select * from
(
select * from t1
 order by val1 --> not a good idea in a subquery
) t1_ordered join t2 using(id)
 offset 10 limit 10
;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Limit (actual time=0.056..0.061 rows=10 loops=1)
   ->  Hash Join (actual time=0.051..0.058 rows=20 loops=1)
         Hash Cond: (t2.id = t1.id)
         ->  Seq Scan on t2 (actual time=0.006..0.007 rows=20 loops=1)
         ->  Hash (actual time=0.040..0.040 rows=100 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 14kB
               ->  Sort (actual time=0.024..0.029 rows=100 loops=1)
                     Sort Key: t1.val1
                     Sort Method: quicksort  Memory: 28kB
                     ->  Seq Scan on t1 (actual time=0.002..0.008 rows=100 loops=1)
(10 rows)

Unfortunately, during your tests, you may be "lucky" (I'll rather call that "unlucky" because you failed to find a bug) and have an execution plan that preserves the order:

postgres=# set enable_hashjoin=off;
SET
postgres=# set enable_mergejoin=off;
SET

postgres=# select * from
(
select * from t1
 order by val1 --> not a good idea in a subquery
) t1_ordered join t2 using(id)
 offset 10 limit 10
;
                  id                  | val1
--------------------------------------+------
 2fca8f19-7692-4eb1-918d-3bfaffd506a2 |   11
 1d78f1de-1c51-4215-9c0e-c52e1b43b735 |   12
 6187a2f1-2760-482c-9e88-21dadbf948ee |   13
 a2810410-63b9-4e12-8f36-e7e5fbeeb9e7 |   14
 65869c8f-4fae-474c-a724-37730be02267 |   15
 34719c89-131d-422f-bace-a169941daa87 |   16
 aab0465b-e9c2-4590-919d-eb4de46033dc |   17
 4f0c2070-7472-416f-a65b-acc6584f700d |   18
 73ad5b4d-b95e-48b2-88cb-942cecff27d3 |   19
 9476611c-b658-4653-9c3d-6dfb739ded36 |   20
(10 rows)

The reason is that a Nested Loop join driven by the subquery was used, and the order was preserved:

postgres=# explain (analyze, costs off, summary off)
select * from
(
select * from t1
 order by val1 --> not a good idea in a subquery
) t1_ordered join t2 using(id)
 offset 10 limit 10
;
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Limit (actual time=0.061..0.070 rows=10 loops=1)
   ->  Nested Loop (actual time=0.049..0.068 rows=20 loops=1)
         ->  Sort (actual time=0.035..0.036 rows=20 loops=1)
               Sort Key: t1.val1
               Sort Method: quicksort  Memory: 28kB
               ->  Seq Scan on t1 (actual time=0.009..0.015 rows=100 loops=1)
         ->  Index Only Scan using t2_pkey on t2 (actual time=0.001..0.001 rows=1 loops=20)
               Index Cond: (id = t1.id)
               Heap Fetches: 0
(9 rows)

It is essential to understand that your query has a bug. You get the expected result as a side effect of a join algorithm that can change anytime during your application lifetime. It's the worst bug: it can pass all UAT and validations, but you will get the wrong results one day. There's only one way to find this bug: code review.

You can migrate to another database, like YugabyteDB, to benefit from resilience, elasticity, and better storage.

I execute the same on YugabyteDB, disabling hash and merge join to get a nested loop:

yugabyte=> select * from
(
select * from t1
 order by val1 --> not a good idea in a subquery
) t1_ordered join t2 using(id)
 offset 10 limit 10
;
                  id                  | val1
--------------------------------------+------
 40e79207-acd4-4967-9f64-be604a071b43 |   30
 3cf7efff-e35b-4367-a25b-882d0d354034 |  100
 3caee5cf-f5a0-4bbc-bfc6-fe01e731bf61 |   89
 3432d77f-2f5b-40d8-8a02-36a637a9f8ce |   22
 e556b1c7-61eb-4404-8ff1-7c37b58804a2 |   87
 6badf12b-b9cc-4552-9328-14a01fa53d16 |   83
 86906c1f-f2d7-4a68-813a-1babcac794ac |   23
 0f3c0f0b-5847-4560-a658-6e7f87421b4f |   34
 1c3e91ce-0c71-44ae-98d7-f7bfade33aff |   42
 3c620fa7-2e0d-46d6-b564-03643583414d |   91
(10 rows)

The result is not ordered. If you were unaware that you had a bug in your query, you might think that the new database is not PostgreSQL compatible: it was working before and is not working anymore.

No, it was not working before. An implementation detail just disguised your bug.

YugabyteDB optimizes the nested loop join by batching the results from the driving table and pushing them down as an array to the join condition:

yugabyte=> explain (analyze, costs off, summary off)
select * from
(
select * from t1
 order by val1 --> not a good idea in a subquery
) t1_ordered join t2 using(id)
 offset 10 limit 10
;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Limit (actual time=2.960..2.972 rows=10 loops=1)
   ->  YB Batched Nested Loop Join (actual time=2.144..2.967 rows=20 loops=1)
         Join Filter: (t1.id = t2.id)
         ->  Sort (actual time=1.532..1.541 rows=100 loops=1)
               Sort Key: t1.val1
               Sort Method: quicksort  Memory: 31kB
               ->  Seq Scan on t1 (actual time=0.714..1.492 rows=100 loops=1)
         ->  Index Scan using t2_pkey on t2 (actual time=0.630..0.635 rows=10 loops=2)
               Index Cond: (id = ANY (ARRAY[t1.id, $1, $2, ..., $1023]))
(9 rows)

The subquery was sorted, but the batching didn't preserve this order. There is no error in the YugabyteDB execution plan because this is how SQL works: without an ORDER BY, there's no guarantee of a sorted result.

There's only one solution: fix your SQL statement by adding an ORDER BY to the outer query:

yugabyte=> select * from
(
select * from t1
 order by val1 
) t1_ordered join t2 using(id)
 order by val1
 offset 10 limit 10
;
                  id                  | val1
--------------------------------------+------
 4262b2eb-0451-4cef-83be-443b7c6d2897 |   11
 ab7af8dc-1dbe-4147-bc30-2881a3c1c6dc |   12
 4c1a81d5-63a8-4f7c-9f1e-fed123be6ef5 |   13
 5cdf0c28-1a34-48e4-98aa-8c4ddb4dc9df |   14
 bf1308b4-9d32-47b5-81ed-922eb6993127 |   15
 11ff2dae-474b-494f-be0a-b083852e36f1 |   16
 44ce8afe-e622-4571-9fd4-f0a9cc8ece2a |   17
 28418fcc-6c88-40a3-88bb-1ec15d1b753f |   18
 73db374f-d020-4a28-b0d1-52933e1cf1c9 |   19
 b5a8091a-4a1b-4774-a48d-31385ce5465f |   20
(10 rows)

The result is correct. Of course, the ORDER BY in the subquery can be removed, but I left it there to show how YugabyteDB is smart. It didn't apply an additional sort operation and was still using a batched nested loop:

yugabyte=> explain (analyze, costs off, summary off)
select * from
(
select * from t1
 order by val1
) t1_ordered join t2 using(id)
 order by val1
 offset 10 limit 10
;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Limit (actual time=3.347..3.351 rows=10 loops=1)
   ->  YB Batched Nested Loop Join (actual time=2.080..3.348 rows=20 loops=1)
         Join Filter: (t1.id = t2.id)
         Sort Keys: t1.val1
         ->  Sort (actual time=1.484..1.493 rows=100 loops=1)
               Sort Key: t1.val1
               Sort Method: quicksort  Memory: 31kB
               ->  Seq Scan on t1 (actual time=0.698..1.445 rows=100 loops=1)
         ->  Index Scan using t2_pkey on t2 (actual time=0.795..0.813 rows=50 loops=2)
               Index Cond: (id = ANY (ARRAY[t1.id, $1, $2, ..., $1023]))
(10 rows)

There is only one line different here: Sort Keys: t1.val1
This is not an additional blocking sort operation, only a guarantee that the order is preserved, as the SQL statement requires.

Without an order in the subquery, the result would be the same but with a sort operation after the join instead of before.

Relying on a sorted outcome from a query lacking an ORDER BY clause in the final projection (the outermost select) constitutes a bug in your application, regardless of whether the result appears correct.

Top comments (2)

Robert Frunzke • Feb 1 • Edited

So, bottom line is:

do never rely on order clauses in sub-queries,
usually omit them at all,
and instead add an order clause to outer query,
… except, if you want to direct your dbms into a specific execution plan - but this is neither reliable nor portable.

Right?

Franck Pachot • Feb 12

Yes, with some exceptions, of course. After checking that the RDBMS behaves like I think it does, I've used ORDER BY LIMIT in lateral join subqueries. So that fits in your point 4.

DEV Community

⛔ ORDER BY in Subquery

Top comments (2)

Read next

Front-End to Full-Stack Journey Part 2: SQL & SQLite

Migrating from Azure Database for PostgreSQL to Neon

⚡ 🚀 Quickly Start Dev Environment For MySQL, PostgreSQL, MongoDB, Redis, and Kafka Using Docker Compose 🔥

Building an Advanced CRUD API with Sequelize, PostgreSQL, and JWT Authentication in Node.js