Order-preserving OR-Expansion in MongoDB and how $in:[,,] is treated as a range with skip scan, or an equality for the ESR rule

#mongodb #postgres #index

I concluded the previous post with a question. We have explored how queries with range and equality filters behave differently based on the order of fields in the index key. Now, let’s examine queries that use the $in[] operator.
This operator can be viewed as an equality filter that handles multiple values or as a range filter that skips scans across multiple ranges. In the previous post, we saw that MongoDB can use an index scan for both.

I use the same collection as in the previous post:

mdb> db.demoesr.drop();

mdb> db.demoesr.insertMany([
 { e:42, s:"a", r:10 },
 { e:42, s:"b", r:20 },
 { e:42, s:"b", r:10 },
 { e:42, s:"d", r:30 },
 { e:99,        r:11 }
]);

I will query one value of "e" for various values of "r" and sort by "s". Rather than creating an Equality, Sort, Range index (ESR), I believe the Range condition is much more selective and crucial than the sort, so I will create an index with the order of Equality, Range, Sort (ERS):

test> db.demoesr.createIndex(
      { e: 1 , r: 1, s : 1 }
);
e_1_r_1_s_1

This post aims to determine whether a $in[] functions similarly to an Equality filter with multiple values or resembles a Range filter that skips some ranges.
I run the following query where e:{ $eq: 42 } and r:{ $in:[...]} with a list of more than 200 values:

test> db.demoesr.find(
      { e:{ $eq: 42 }, r:{ $in:[
201, 200, 199, 198, 197, 196, 195, 194, 193, 192, 191, 190, 189, 188, 187, 186, 185, 184, 183, 182, 181, 180, 179, 178, 177, 176, 175, 174, 173, 172, 171, 170, 169, 168, 167, 166, 165, 164, 163, 162, 161, 160, 159, 158, 157, 156, 155, 154, 153, 152, 151, 150, 149, 148, 147, 146, 145, 144, 143, 142, 141, 140, 139, 138, 137, 136, 135, 134, 133, 132, 131, 130, 129, 128, 127, 126, 125, 124, 123, 122, 121, 120, 119, 118, 117, 116, 115, 114, 113, 112, 111, 110, 109, 108, 107, 106, 105, 104, 103, 102, 101, 100, 99, 98, 97, 96, 95, 94, 93, 92, 91, 90, 89, 88, 87, 86, 85, 84, 83, 82, 81, 80, 79, 78, 77, 76, 75, 74, 73, 72, 71, 70, 69, 68, 67, 66, 65, 64, 63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 49, 48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1
      ] } } , 
      { e:1,s:1,r:1,"_id":0 }
      ).sort({ s: 1 }).explain("executionStats").executionStats
;

{
  executionSuccess: true,
  nReturned: 4,
  totalKeysExamined: 5,
  totalDocsExamined: 0,
  executionStages: {
    stage: 'SORT',
    nReturned: 4,
    sortPattern: { s: 1 },
    inputStage: {
      stage: 'PROJECTION_COVERED',
      nReturned: 4,
      transformBy: { e: 1, s: 1, r: 1, _id: 0 },
      inputStage: {
        stage: 'IXSCAN',
        nReturned: 4,
        keyPattern: { e: 1, r: 1, s: 1 },
        indexName: 'e_1_r_1_s_1',
        isMultiKey: false,
        multiKeyPaths: { e: [], r: [], s: [] },
        direction: 'forward',
        indexBounds: {
          e: [ '[42, 42]' ],
          r: [
            '[1, 1]',   '[2, 2]',   '[3, 3]',   '[4, 4]',   '[5, 5]',
            '[6, 6]',   '[7, 7]',   '[8, 8]',   '[9, 9]',   '[10, 10]',
            '[11, 11]', '[12, 12]', '[13, 13]', '[14, 14]', '[15, 15]',
            '[16, 16]', '[17, 17]', '[18, 18]', '[19, 19]', '[20, 20]',
            '[21, 21]', '[22, 22]', '[23, 23]', '[24, 24]', '[25, 25]',
            '[26, 26]', '[27, 27]', '[28, 28]', '[29, 29]', '[30, 30]',
            '[31, 31]', '[32, 32]', '[33, 33]', '[34, 34]', '[35, 35]',
            '[36, 36]', '[37, 37]', '[38, 38]', '[39, 39]', '[40, 40]',
            '[41, 41]', '[42, 42]', '[43, 43]', '[44, 44]', '[45, 45]',
            '[46, 46]', '[47, 47]', '[48, 48]', '[49, 49]', '[50, 50]',
            '[51, 51]', '[52, 52]', '[53, 53]', '[54, 54]', '[55, 55]',
            '[56, 56]', '[57, 57]', '[58, 58]', '[59, 59]', '[60, 60]',
            '[61, 61]', '[62, 62]', '[63, 63]', '[64, 64]', '[65, 65]',
            '[66, 66]', '[67, 67]', '[68, 68]', '[69, 69]', '[70, 70]',
            '[71, 71]', '[72, 72]', '[73, 73]', '[74, 74]', '[75, 75]',
            '[76, 76]', '[77, 77]', '[78, 78]', '[79, 79]', '[80, 80]',
            '[81, 81]', '[82, 82]', '[83, 83]', '[84, 84]', '[85, 85]',
            '[86, 86]', '[87, 87]', '[88, 88]', '[89, 89]', '[90, 90]',
            '[91, 91]', '[92, 92]', '[93, 93]', '[94, 94]', '[95, 95]',
            '[96, 96]', '[97, 97]', '[98, 98]', '[99, 99]', '[100, 100]',
            ... 101 more items
          ],
          s: [ '[MinKey, MaxKey]' ]
        },
        keysExamined: 5,
        seeks: 1
      }
    }
  }
}

This is the best way to filter the documents for retrieval, utilizing multiple ranges with a skip scan as discussed in the previous post. However, because it doesn't adhere to the ESR rule, the index failed to return the entries in the expected order, necessitating an additional SORT operation.

Sort Merge over a OR Expansion

I used more than 200 values to demonstrate the general behavior of the $in[] operator within the IXSCAN range. MongoDB optimizes performance when the list contains 200 values or fewer.

Here is the execution plan with a smaller list:

test> db.demoesr.find(
      { e:{ $eq: 42 }, r:{ $in:[10,30] } } , 
      { e:1,s:1,r:1,"_id":0 }
      ).sort({ s: 1 }).explain("executionStats").executionStats
;

{
  executionSuccess: true,
  nReturned: 3,
  totalKeysExamined: 3,
  totalDocsExamined: 0,
  executionStages: {
    stage: 'PROJECTION_DEFAULT',
    nReturned: 3,
    transformBy: { e: 1, s: 1, r: 1, _id: 0 },
    inputStage: {
      stage: 'SORT_MERGE',
      nReturned: 3,
      sortPattern: { s: 1 },
      inputStages: [
        {
          stage: 'IXSCAN',
          nReturned: 2,
          keyPattern: { e: 1, r: 1, s: 1 },
          indexName: 'e_1_r_1_s_1',
          isMultiKey: false,
          multiKeyPaths: { e: [], r: [], s: [] },
          direction: 'forward',
          indexBounds: {
            e: [ '[42, 42]' ],
            r: [ '[10, 10]' ],
            s: [ '[MinKey, MaxKey]' ]
          },
          keysExamined: 2,
          seeks: 1
        },
        {
          stage: 'IXSCAN',
          nReturned: 1,
          keyPattern: { e: 1, r: 1, s: 1 },
          indexName: 'e_1_r_1_s_1',
          isMultiKey: false,
          multiKeyPaths: { e: [], r: [], s: [] },
          indexBounds: {
            e: [ '[42, 42]' ],
            r: [ '[30, 30]' ],
            s: [ '[MinKey, MaxKey]' ]
          },
          keysExamined: 1,
          seeks: 1
        }
      ]
    }
  }
}

When processing 200 or fewer values, MongoDB treats them as one equality filter for each value, utilizing a separate IXSCAN. If the following field in the index key is used to sort the result, each index scan returns the keys in the correct sort order. MongoDB reads from these sorted segments and merges them while preserving the order, effectively performing a SORT_MERGE to bypass a sort operation.
Since each IXSCAN treats the $in[] value as an Equality instead of a Range, my index configuration consists of Equality, Equality, Sort rather than Equality, Range, Sort. It follows the Equality, Sort, Range (ESR) rule.

I've rarely seen such intelligent optimization in a database before. Oracle Database can convert a list into a concatenation (UNION ALL) using a technique known as OR Expansion, yet it lacks a merge sort feature on top of it. On the other hand, PostgreSQL can perform a merge sort on concatenated UNION but does not support the OR Expansion transformation to get it transparently. YugabyteDB employs a similar method for batching the nested loop join by pushing down the join condition as an array.
The internal name of this feature in MongoDB query planner is Explode for Sort, as a combination of Index Scan Explosion and Merge Sort and is advantageous with pagination when the limit is pushed down to each index scan.

This feature is also documented in the ESR rule additional considerations: https://www.mongodb.com/docs/manual/tutorial/equality-sort-range-rule/#additional-considerations

DEV Community

Order-preserving OR-Expansion in MongoDB and how $in:[,,] is treated as a range with skip scan, or an equality for the ESR rule

Sort Merge over a OR Expansion

Top comments (0)

Read next

Set up MongoDB primary and secondary with Docker.

🚀 Building a RESTful API in Go: A Practical Guide

Containerizing a MERN Stack Application!

Don't Make This Mistake: Choosing the Wrong Database Could Cost You!