DEV Community

Franck Pachot for YugabyteDB

Posted on

⛔ ORDER BY in Subquery

Here is a quick post to show the danger of not understanding how SQL and ORDER BY works.
In relational databases, SQL operates on unordered sets of rows. The sets or rows are tables, views, results of functions, or subqueries. You may think there's an order because the RDBMS may store the rows physically ordered and retrieve them with this apparent order. But it's just a side effect of the implementation, ready to break at runtime (with different physical ordering, different execution plans, different parallelization, or any other smart optimization).
The ORDER BY in SQL is different. It's not a relational operation on sets of rows but an operation on the result exposed outside the database when the application wants to fetch this set of rows in a specific order.

There are two consequences to keep in mind:

  • If you have no ORDER BY in the outer query, you can assume a random order, even if there's an ORDER BY in a subquery.
  • If you use OFFSET and LIMIT without an ORDER BY, you can assume that you pick a random subset from the result.

Here is an example in PostgreSQL:

create table t1 ( id uuid primary key default gen_random_uuid(), val1 int );
create table t2 ( id uuid primary key default gen_random_uuid(), val2 int );
insert into t1 ( val1 ) select generate_series(1,100);
insert into t2 select id val1 from t1;
Enter fullscreen mode Exit fullscreen mode

I run the following query to get OFFSET 10 LIMIT 10 from a query that has no ORDER BY in the outer query:

postgres=# select * from t1
 offset 10 limit 10
;
                  id                  | val1
--------------------------------------+------
 6e21f38d-18ed-4834-bea1-b9a0d91ed288 |   52
 a4c00d5f-fd7c-42fc-97ee-cd309af47a09 |   77
 a5dd4329-137c-4aa2-b503-dc885378f155 |   59
 0ca0ef72-e39a-4eb7-9777-b1304016d968 |   15
 e6add227-d551-4429-bb67-a32ff4fbf76b |   83
 58a9285f-d68a-4173-a8b5-bdc7e679ed3b |   89
 c1464177-c3e8-4054-8616-a805350de739 |   84
 b5f492cc-0690-47da-9081-70d990857cae |   92
 e7638b03-8716-4321-ab00-b47c1b91b3f1 |   61
 c6cb972e-309c-4ee0-b1f5-7f73482672d9 |   88
(10 rows)
Enter fullscreen mode Exit fullscreen mode

The ordering of the rows is random, driven by how they were physically stored and read from the heap table.

With an ORDER BY, the rows are sorted before OFFSET and LIMIT, so the result becomes deterministic:

postgres=# select * from t1
 order by val1
 offset 10 limit 10
;
                  id                  | val1
--------------------------------------+------
 cc4e4a88-c92f-4cfa-8d29-8b462be60619 |   11
 7212d0ab-b810-4002-928e-fa89607c059d |   12
 8f256559-fe98-4c21-a896-896edb436000 |   13
 3897cc06-813e-4d96-aaa2-83493cf3cea1 |   14
 0ca0ef72-e39a-4eb7-9777-b1304016d968 |   15
 9e4fe2f0-81f6-4086-b0a9-21b0407ac021 |   16
 76167b8a-f920-4c60-b72d-cbf37b7497c9 |   17
 52397fbc-c814-4e17-8c9a-bfda5a2e72e3 |   18
 85fee8c0-1080-4431-b650-ed7f5ce424aa |   19
 13ec825d-cb2d-4031-934a-6f203e978ccf |   20
(10 rows)
Enter fullscreen mode Exit fullscreen mode

If I keep the ORDER BY in a subquery and join it with another table, the pagination is random again:

postgres=# select * from
(
select * from t1
 order by val1 --> not a good idea in a subquery
) t1_ordered join t2 using(id)
 offset 10 limit 10
;
                  id                  | val1
--------------------------------------+------
 6a66b081-33ed-4a59-9772-9b11f6581ea8 |   92
 88cf2bd1-c54c-40c1-a794-f2df63cd2730 |   61
 b776d312-4e63-412a-8f8c-a6c11efd958a |   45
 6154c15c-1d34-414d-9ba3-41e7bbed453f |   64
 3ba9f440-25d1-4162-86bc-8f0ad36abaca |   66
 0bfda8d3-66b1-41b8-884f-fa85d43e37c4 |   33
 d64885a2-6b1c-447d-9be7-d7fe9726919c |   58
 baf08b7c-ea2c-475c-a2cb-6cbe47524a27 |   40
 2e698c37-5557-4998-b9e6-b4f17f599039 |   36
 6219f25c-23b6-48f1-9e5f-3ce33ef7c975 |   78
(10 rows)
Enter fullscreen mode Exit fullscreen mode

PostgreSQL obeyed my ORDER BY, as an EXPLAIN shows a SORT operation, but the join method had another driving table, and the order was not preserved in the join result:

postgres=# explain (analyze, costs off, summary off)
select * from
(
select * from t1
 order by val1 --> not a good idea in a subquery
) t1_ordered join t2 using(id)
 offset 10 limit 10
;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Limit (actual time=0.056..0.061 rows=10 loops=1)
   ->  Hash Join (actual time=0.051..0.058 rows=20 loops=1)
         Hash Cond: (t2.id = t1.id)
         ->  Seq Scan on t2 (actual time=0.006..0.007 rows=20 loops=1)
         ->  Hash (actual time=0.040..0.040 rows=100 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 14kB
               ->  Sort (actual time=0.024..0.029 rows=100 loops=1)
                     Sort Key: t1.val1
                     Sort Method: quicksort  Memory: 28kB
                     ->  Seq Scan on t1 (actual time=0.002..0.008 rows=100 loops=1)
(10 rows)
Enter fullscreen mode Exit fullscreen mode

Unfortunately, during your tests, you may be "lucky" (I'll rather call that "unlucky" because you failed to find a bug) and have an execution plan that preserves the order:

postgres=# set enable_hashjoin=off;
SET
postgres=# set enable_mergejoin=off;
SET

postgres=# select * from
(
select * from t1
 order by val1 --> not a good idea in a subquery
) t1_ordered join t2 using(id)
 offset 10 limit 10
;
                  id                  | val1
--------------------------------------+------
 2fca8f19-7692-4eb1-918d-3bfaffd506a2 |   11
 1d78f1de-1c51-4215-9c0e-c52e1b43b735 |   12
 6187a2f1-2760-482c-9e88-21dadbf948ee |   13
 a2810410-63b9-4e12-8f36-e7e5fbeeb9e7 |   14
 65869c8f-4fae-474c-a724-37730be02267 |   15
 34719c89-131d-422f-bace-a169941daa87 |   16
 aab0465b-e9c2-4590-919d-eb4de46033dc |   17
 4f0c2070-7472-416f-a65b-acc6584f700d |   18
 73ad5b4d-b95e-48b2-88cb-942cecff27d3 |   19
 9476611c-b658-4653-9c3d-6dfb739ded36 |   20
(10 rows)

Enter fullscreen mode Exit fullscreen mode

The reason is that a Nested Loop join driven by the subquery was used, and the order was preserved:

postgres=# explain (analyze, costs off, summary off)
select * from
(
select * from t1
 order by val1 --> not a good idea in a subquery
) t1_ordered join t2 using(id)
 offset 10 limit 10
;
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Limit (actual time=0.061..0.070 rows=10 loops=1)
   ->  Nested Loop (actual time=0.049..0.068 rows=20 loops=1)
         ->  Sort (actual time=0.035..0.036 rows=20 loops=1)
               Sort Key: t1.val1
               Sort Method: quicksort  Memory: 28kB
               ->  Seq Scan on t1 (actual time=0.009..0.015 rows=100 loops=1)
         ->  Index Only Scan using t2_pkey on t2 (actual time=0.001..0.001 rows=1 loops=20)
               Index Cond: (id = t1.id)
               Heap Fetches: 0
(9 rows)

Enter fullscreen mode Exit fullscreen mode

It is essential to understand that your query has a bug. You get the expected result as a side effect of a join algorithm that can change anytime during your application lifetime. It's the worst bug: it can pass all UAT and validations, but you will get the wrong results one day. There's only one way to find this bug: code review.

You can migrate to another database, like YugabyteDB, to benefit from resilience, elasticity, and better storage.

I execute the same on YugabyteDB, disabling hash and merge join to get a nested loop:

yugabyte=> select * from
(
select * from t1
 order by val1 --> not a good idea in a subquery
) t1_ordered join t2 using(id)
 offset 10 limit 10
;
                  id                  | val1
--------------------------------------+------
 40e79207-acd4-4967-9f64-be604a071b43 |   30
 3cf7efff-e35b-4367-a25b-882d0d354034 |  100
 3caee5cf-f5a0-4bbc-bfc6-fe01e731bf61 |   89
 3432d77f-2f5b-40d8-8a02-36a637a9f8ce |   22
 e556b1c7-61eb-4404-8ff1-7c37b58804a2 |   87
 6badf12b-b9cc-4552-9328-14a01fa53d16 |   83
 86906c1f-f2d7-4a68-813a-1babcac794ac |   23
 0f3c0f0b-5847-4560-a658-6e7f87421b4f |   34
 1c3e91ce-0c71-44ae-98d7-f7bfade33aff |   42
 3c620fa7-2e0d-46d6-b564-03643583414d |   91
(10 rows)

Enter fullscreen mode Exit fullscreen mode

The result is not ordered. If you were unaware that you had a bug in your query, you might think that the new database is not PostgreSQL compatible: it was working before and is not working anymore.

No, it was not working before. An implementation detail just disguised your bug.

YugabyteDB optimizes the nested loop join by batching the results from the driving table and pushing them down as an array to the join condition:

yugabyte=> explain (analyze, costs off, summary off)
select * from
(
select * from t1
 order by val1 --> not a good idea in a subquery
) t1_ordered join t2 using(id)
 offset 10 limit 10
;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Limit (actual time=2.960..2.972 rows=10 loops=1)
   ->  YB Batched Nested Loop Join (actual time=2.144..2.967 rows=20 loops=1)
         Join Filter: (t1.id = t2.id)
         ->  Sort (actual time=1.532..1.541 rows=100 loops=1)
               Sort Key: t1.val1
               Sort Method: quicksort  Memory: 31kB
               ->  Seq Scan on t1 (actual time=0.714..1.492 rows=100 loops=1)
         ->  Index Scan using t2_pkey on t2 (actual time=0.630..0.635 rows=10 loops=2)
               Index Cond: (id = ANY (ARRAY[t1.id, $1, $2, ..., $1023]))
(9 rows)

Enter fullscreen mode Exit fullscreen mode

The subquery was sorted, but the batching didn't preserve this order. There is no error in the YugabyteDB execution plan because this is how SQL works: without an ORDER BY, there's no guarantee of a sorted result.

There's only one solution: fix your SQL statement by adding an ORDER BY to the outer query:

yugabyte=> select * from
(
select * from t1
 order by val1 
) t1_ordered join t2 using(id)
 order by val1
 offset 10 limit 10
;
                  id                  | val1
--------------------------------------+------
 4262b2eb-0451-4cef-83be-443b7c6d2897 |   11
 ab7af8dc-1dbe-4147-bc30-2881a3c1c6dc |   12
 4c1a81d5-63a8-4f7c-9f1e-fed123be6ef5 |   13
 5cdf0c28-1a34-48e4-98aa-8c4ddb4dc9df |   14
 bf1308b4-9d32-47b5-81ed-922eb6993127 |   15
 11ff2dae-474b-494f-be0a-b083852e36f1 |   16
 44ce8afe-e622-4571-9fd4-f0a9cc8ece2a |   17
 28418fcc-6c88-40a3-88bb-1ec15d1b753f |   18
 73db374f-d020-4a28-b0d1-52933e1cf1c9 |   19
 b5a8091a-4a1b-4774-a48d-31385ce5465f |   20
(10 rows)
Enter fullscreen mode Exit fullscreen mode

The result is correct. Of course, the ORDER BY in the subquery can be removed, but I left it there to show how YugabyteDB is smart. It didn't apply an additional sort operation and was still using a batched nested loop:

yugabyte=> explain (analyze, costs off, summary off)
select * from
(
select * from t1
 order by val1
) t1_ordered join t2 using(id)
 order by val1
 offset 10 limit 10
;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Limit (actual time=3.347..3.351 rows=10 loops=1)
   ->  YB Batched Nested Loop Join (actual time=2.080..3.348 rows=20 loops=1)
         Join Filter: (t1.id = t2.id)
         Sort Keys: t1.val1
         ->  Sort (actual time=1.484..1.493 rows=100 loops=1)
               Sort Key: t1.val1
               Sort Method: quicksort  Memory: 31kB
               ->  Seq Scan on t1 (actual time=0.698..1.445 rows=100 loops=1)
         ->  Index Scan using t2_pkey on t2 (actual time=0.795..0.813 rows=50 loops=2)
               Index Cond: (id = ANY (ARRAY[t1.id, $1, $2, ..., $1023]))
(10 rows)

Enter fullscreen mode Exit fullscreen mode

There is only one line different here: Sort Keys: t1.val1
This is not an additional blocking sort operation, only a guarantee that the order is preserved, as the SQL statement requires.

Without an order in the subquery, the result would be the same but with a sort operation after the join instead of before.


Relying on a sorted outcome from a query lacking an ORDER BY clause in the final projection (the outermost select) constitutes a bug in your application, regardless of whether the result appears correct.

Top comments (0)