DEV Community

Cover image for Unique Index on NULL Values in SQL & NoSQL
Franck Pachot for AWS Heroes

Posted on • Edited on

Unique Index on NULL Values in SQL & NoSQL

You can create a unique index explicitly or implicitly with a unique constraint to ensure that a group of columns has no duplicates.
However, how do you handle columns that are NULL or have no value? Is the lack of a value a simple indicator that you don't want to duplicate, or should it be treated as unknown and potentially equivalent to any value, thus not violating the unique constraint?

I will consider the following three scenarios:

  • NoSQL absence of value in a document, like in MongoDB
  • SQL standard for NULL, as seen in PostgreSQL or YugabyteDB
  • Oracle Database implementation, which varies from the SQL standard

Let's take a telco example. I record calls from a caller to a callee at a specific time. I create a unique index to protect my database and prevent duplicates, as phone network devices may send a call record twice. In SQL, I would declare such table and index:

create table calls (
    id int generated always as identity primary key,
    time timestamp,
    callee varchar(11),
    caller varchar(11)
);

create unique index  calls_time_callee_caller_idx
 on calls (time, callee, caller)
;

Enter fullscreen mode Exit fullscreen mode

Although it may not be the best choice of datatype, it is runnable on all SQL databases.

I've declared no NOT NULL columns. Usually, the call time, caller, and callee are known, but there may be No Caller ID. I'll use this to expose the behavior with NULL values in a UNIQUE INDEX.

NULL's three-valued logic in SQL can be confusing (read @rtukpe's SQL NULLs are Weird!). To make it easier, I will start with a NoSQL database that prioritizes developer experience.

NoSQL behavior: MongoDB

A call record is a document sent from a telco network device. I use only the key attributes for this demo, but it usually includes many nested data structures. A document database may be a good fit for ingesting it.

In NoSQL, I don't need to create the collection beforehand, but I create the unique index to protect my database integrity:

db.calls.createIndex( 
 { "time": 1 , "callee": 1, "caller" : 1 }, { unique: true } 
) ;
Enter fullscreen mode Exit fullscreen mode

I insert a first record:

mongodb> db.calls.insertOne( 
 { time: new Date("2025-02-10T08:00:00Z"),
   callee: "+0000000000",
   caller: "+1111111111" 
} );

{
  acknowledged: true,
  insertedId: ObjectId('677d64a4e74b535fe6d4b0c2')
}
Enter fullscreen mode Exit fullscreen mode

I insert a second record from a different caller:

mongodb> db.calls.insertOne( 
 { time: new Date("2025-02-10T08:00:00Z"),
   callee: "+0000000000",
   caller: "+2222222222" 
} );

{
  acknowledged: true,
  insertedId: ObjectId('677d64a4e74b535fe6d4b0c2')
}
Enter fullscreen mode Exit fullscreen mode

I receive the first record again:

mongodb> db.calls.insertOne( 
 { time: new Date("2025-02-10T08:00:00Z"),
   callee: "+0000000000",
   caller: "+1111111111" 
} );

MongoServerError: E11000 duplicate key error collection: mongodb.calls index: time_1_callee_1_caller_1 dup key: { time: new Date(1739174400000), callee: "+0000000000", caller: "+1111111111" }
Enter fullscreen mode Exit fullscreen mode

MongoDB raises an error to guarantee the integrity of my database. That was my goal when creating a unique index.

I receive a record where the caller is unknown:

mongodb> db.calls.insertOne( 
 { time: new Date("2025-02-10T08:00:00Z"),
   callee: "+0000000000",
   caller: null 
} );

{
  acknowledged: true,
  insertedId: ObjectId('677d6a4be74b535fe6d4b0cb')
}
Enter fullscreen mode Exit fullscreen mode

I receive the same record, and once again, MongoDB prevents the insertion of a duplicate record:

mongodb> db.calls.insertOne( 
 { time: new Date("2025-02-10T08:00:00Z"),
   callee: "+0000000000",
   caller: null 
} );

MongoServerError: E11000 duplicate key error collection: mongodb.calls index: time_1_callee_1_caller_1 dup key: { time: new Date(1739174400000), callee: "+0000000000", caller: null }
Enter fullscreen mode Exit fullscreen mode

I get the same with a record that has no caller attribute, which is similar to caller: null because null in MongoDB is an absence of value:

mongodb> db.calls.insertOne( 
 { time: new Date("2025-02-10T08:00:00Z"),
   callee: "+0000000000"
} );

MongoServerError: E11000 duplicate key error collection: mongodb.calls index: time_1_callee_1_caller_1 dup key: { time: new Date(1739174400000), callee: "+0000000000", caller: null }
Enter fullscreen mode Exit fullscreen mode

Here are the three records that were inserted - no duplicates:

mongodb> db.calls.find();
[
  {
    _id: ObjectId('677d65a6e74b535fe6d4b0c5'),
    time: ISODate('2025-02-10T08:00:00.000Z'),
    callee: '+0000000000',
    caller: '+1111111111'
  },
  {
    _id: ObjectId('677d65ade74b535fe6d4b0c6'),
    time: ISODate('2025-02-10T08:00:00.000Z'),
    callee: '+0000000000',
    caller: '+2222222222'
  },
  {
    _id: ObjectId('677d6a4be74b535fe6d4b0cb'),
    time: ISODate('2025-02-10T08:00:00.000Z'),
    callee: '+0000000000',
    caller: null
  }
]
Enter fullscreen mode Exit fullscreen mode

Nulls represent the explicit absence of value in MongoDB. Two documents with the same lack of value in a key are considered to have the same key, raising a duplicate key in a unique index.

This contrasts with SQL, which has a fixed structure requiring all columns to be present in every record. In SQL, NULL signifies unknown values rather than an absence, and two unknown values are not the same until they are known to have the same value.

It is easy to get the same behavior in MongoDB. I re-create my index as a partial index that ignores null or unexisting caller so that the unique constraint concerns only documents where all key attributes are present:

mongodb> db.calls.dropIndex(
 { time: 1, callee: 1, caller: 1 }
);

{ nIndexesWas: 2, ok: 1 }

mongodb> db.calls.createIndex( 
  { time: 1, callee: 1, caller: 1 },
  { 
    unique: true, 
    partialFilterExpression: { 
      caller: { $type: "string" }  
    }
  }
);

time_1_callee_1_caller_1

Enter fullscreen mode Exit fullscreen mode

Such an index considers only entries with a caller of type string, ignoring the absence of a value or null, which is of type null.

I can insert documents with the absence of caller, or null caller:

mongodb> db.calls.insertMany( [
 { time: new Date("2025-02-10T08:00:00Z"),
   callee: "+0000000000",
   caller: null 
 } ,
 { time: new Date("2025-02-10T08:00:00Z"),
   callee: "+0000000000"
 } 
] );

{
  acknowledged: true,
  insertedIds: {
    '0': ObjectId('677d6f3ae74b535fe6d4b0d0'),
    '1': ObjectId('677d6f3ae74b535fe6d4b0d1')
  }
}

Enter fullscreen mode Exit fullscreen mode

Still, it detects duplicates when the caller is present:

mongodb> db.calls.insertOne( 
 { time: new Date("2025-02-10T08:00:00Z"),
   callee: "+0000000000",
   caller: "+1111111111" 
} );

MongoServerError: E11000 duplicate key error collection: mongodb.calls index: time_1_callee_1_caller_1 dup key: { time: new Date(1739174400000), callee: "+0000000000", caller: "+1111111111" }
Enter fullscreen mode Exit fullscreen mode

I have five calls, with some having no caller or null caller:

mongodb> db.calls.find();
[
  {
    _id: ObjectId('677d65a6e74b535fe6d4b0c5'),
    time: ISODate('2025-02-10T08:00:00.000Z'),
    callee: '+0000000000',
    caller: '+1111111111'
  },
  {
    _id: ObjectId('677d65ade74b535fe6d4b0c6'),
    time: ISODate('2025-02-10T08:00:00.000Z'),
    callee: '+0000000000',
    caller: '+2222222222'
  },
  {
    _id: ObjectId('677d6a4be74b535fe6d4b0cb'),
    time: ISODate('2025-02-10T08:00:00.000Z'),
    callee: '+0000000000',
    caller: null
  },
  {
    _id: ObjectId('677d6f3ae74b535fe6d4b0d0'),
    time: ISODate('2025-02-10T08:00:00.000Z'),
    callee: '+0000000000',
    caller: null
  },
  {
    _id: ObjectId('677d6f3ae74b535fe6d4b0d1'),
    time: ISODate('2025-02-10T08:00:00.000Z'),
    callee: '+0000000000'
  }
]
Enter fullscreen mode Exit fullscreen mode

This mimics the SQL behavior where the unique constraint is raised only when the value is known:

mongodb> db.calls.updateOne( 
 { caller: null },
 { $set: { caller: "+3333333333" } } 
);

{
  acknowledged: true,
  insertedId: null,
  matchedCount: 1,
  modifiedCount: 1,
  upsertedCount: 0
}

mongodb> db.calls.updateOne( 
 { caller: null }, 
 { $set: { caller: "+3333333333" } } 
);

MongoServerError: E11000 duplicate key error collection: mongodb.calls index: time_1_callee_1_caller_1 dup key: { time: new Date(1739174400000), callee: "+0000000000", caller: "+3333333333" }

Enter fullscreen mode Exit fullscreen mode

With MongoDB, two document keys with a null attribute or without this attribute are the same index key. You can create a partial index to ignore them.

SQL behavior: PostgreSQL (or YugabyteDB)

SQL is different: NULL is not the absence of a column value. If a column is absent, you must create a different table to store your row without it. A NULL in SQL is an unknown value, possibly because it is not known at the insert time and is expected to be updated later.
A unique constraint is raised only when duplicates exist among the known values, ignoring the null ones.

I've created a table with the description above in YugabyteDB 2.25, which is compatible with PostgreSQL 15:

yugabyte=> select version();
                                                                                            version                                                                     
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 15.2-YB-2.25.0.0-b0 on aarch64-unknown-linux-gnu, compiled by clang version 17.0.6 (https://github.com/yugabyte/llvm-project.git 9b881774e40024e901fc6f3d313607b071c08631), 64-bit
(1 row)

yugabyte=# \d calls
                                    Table "public.calls"
 Column |            Type             | Collation | Nullable |           Default
--------+-----------------------------+-----------+----------+------------------------------
 id     | integer                     |           | not null | generated always as identity
 time   | timestamp without time zone |           |          |
 callee | character varying(11)       |           |          |
 caller | character varying(11)       |           |          |
Indexes:
    "calls_pkey" PRIMARY KEY, lsm (id ASC)
    "calls_time_callee_caller_idx" UNIQUE, lsm ("time" ASC, callee ASC, caller ASC)

Enter fullscreen mode Exit fullscreen mode

Duplicate keys are detected:

yugabyte=# insert into calls (time, callee, caller)
 values ('2025-02-10T08:00:00Z', '+0000000000', '+1111111111')
;
INSERT 0 1
yugabyte=# insert into calls (time, callee, caller)
 values ('2025-02-10T08:00:00Z', '+0000000000', '+2222222222')
;
INSERT 0 1
yugabyte=# insert into calls (time, callee, caller)
 values ('2025-02-10T08:00:00Z', '+0000000000', '+1111111111')
;
ERROR:  duplicate key value violates unique constraint "calls_time_callee_caller_idx"

Enter fullscreen mode Exit fullscreen mode

I'm able to insert multiple similar rows when one of the columns in key is null:

yugabyte=# insert into calls (time, callee, caller)
 values ('2025-02-10T08:00:00Z', '+0000000000', null)
;
INSERT 0 1
yugabyte=# insert into calls (time, callee, caller)
 values ('2025-02-10T08:00:00Z', '+0000000000', null)
;
INSERT 0 1
Enter fullscreen mode Exit fullscreen mode

You must think about it like this:

  • a SQL constraint raises an error only when its condition evaluates to false
  • equality between unknown values (nulls) is neither true nor false. It is unknown
  • the unknown condition result is null, not false, and then doesn't raise an error
  • it is different from a SELECT DISTINCT or SELECT UNIQUE that shows only what is known to be unique, ignoring the unknown ones

In PostgreSQL 15 (as well as YugabyteDB 2.25 and 2025.1, which will be released soon), you can change this behavior with the NULLS NOT DISTINCT clause of CREATE INDEX. It will detect existing duplicates and reject future duplicate insertions:

yugabyte=# create unique index calls_unique_index 
 on calls (time, callee, caller)
 NULLS NOT DISTINCT
;

ERROR:  could not create unique index "calls_unique_index"
Enter fullscreen mode Exit fullscreen mode

Higher versions of PostgreSQL will show which row failed:

DETAIL:  Key ("time", callee, caller)=(2025-02-10 08:00:00, +0000000000, null) is duplicated.
Enter fullscreen mode Exit fullscreen mode

I check them and delete them:

yugabyte=# select * from calls;
 id |        time         |   callee    |   caller
----+---------------------+-------------+-------------
  1 | 2025-02-10 08:00:00 | +0000000000 | +1111111111
  2 | 2025-02-10 08:00:00 | +0000000000 | +2222222222
  4 | 2025-02-10 08:00:00 | +0000000000 |
  5 | 2025-02-10 08:00:00 | +0000000000 |
(4 rows)

yugabyte=# delete from calls where id=5;
DELETE 1
Enter fullscreen mode Exit fullscreen mode

When YugabyteDB CREATE INDEX fails during backfilling and without transactional DDL, it may remain INVALID and have to be dropped:

yugabyte=> \d calls
                                    Table "public.calls"
 Column |            Type             | Collation | Nullable |           Default
--------+-----------------------------+-----------+----------+------------------------------
 id     | integer                     |           | not null | generated always as identity
 time   | timestamp without time zone |           |          |
 callee | character varying(11)       |           |          |
 caller | character varying(11)       |           |          |
Indexes:
    "calls_pkey" PRIMARY KEY, lsm (id ASC)
    "calls_time_callee_caller_idx" UNIQUE, lsm ("time" ASC, callee ASC, caller ASC)
    "calls_unique_index" UNIQUE, lsm ("time" ASC, callee ASC, caller ASC) NULLS NOT DISTINCT INVALID

yugabyte=> drop index calls_unique_index;
DROP INDEX
Enter fullscreen mode Exit fullscreen mode

As I removed the duplicates and the NULLS NOT DISTINCT index was successful:

yugabyte=# create unique index calls_unique_index 
 on calls (time, callee, caller)
 NULLS NOT DISTINCT
;
CREATE INDEX

                                    Table "public.calls"
 Column |            Type             | Collation | Nullable |           Default
--------+-----------------------------+-----------+----------+------------------------------
 id     | integer                     |           | not null | generated always as identity
 time   | timestamp without time zone |           |          |
 callee | character varying(11)       |           |          |
 caller | character varying(11)       |           |          |
Indexes:
    "calls_pkey" PRIMARY KEY, lsm (id ASC)
    "calls_time_callee_caller_idx" UNIQUE, lsm ("time" ASC, callee ASC, caller ASC)
    "calls_unique_index" UNIQUE, lsm ("time" ASC, callee ASC, caller ASC) NULLS NOT DISTINCT
Enter fullscreen mode Exit fullscreen mode

With such an index, two-valued logic is used, and similar keys are duplicated, even if they contain nulls:

yugabyte=# insert into calls (time, callee, caller)
 values ('2025-02-10T08:00:00Z', '+0000000000', null)
;
ERROR:  duplicate key value violates unique constraint "calls_unique_index"

Enter fullscreen mode Exit fullscreen mode

In SQL, all-null index entries follow the same rule. With NULLS NOT DISTINCT, only one can be inserted:

yugabyte=# insert into calls (time, callee, caller)
 values (null, null, null)
;
INSERT 0 1

yugabyte=# insert into calls (time, callee, caller)
 values (null, null, null)
;
ERROR:  duplicate key value violates unique constraint "calls_unique_index"
Enter fullscreen mode Exit fullscreen mode

PostgreSQL follows the SQL standard, and YugabyteDB follows PostgreSQL behavior. Markus Winand references the few databases that implement NOT NULL DISTINCT:
Image description
YugabyteDB doesn't need a dedicated line because it is PostgreSQL compatible: Support NULLS NOT DISTINCT on unique Index

Oracle Database behavior

I've created a table with the description above:

Oracle 23ai> info calls
TABLE: CALLS
         LAST ANALYZED:
         ROWS         :
         SAMPLE SIZE  :
         INMEMORY     :DISABLED
         COMMENTS     :

Columns
NAME         DATA TYPE           NULL  DEFAULT    COMMENTS
*ID          NUMBER(38,0)        No    "ADMIN"."ISEQ$$_149713".nextval
 TIME        TIMESTAMP(6)        Yes
 CALLEE      VARCHAR2(11 BYTE)   Yes
 CALLER      VARCHAR2(11 BYTE)   Yes

Indexes
INDEX_NAME                            UNIQUENESS    STATUS    FUNCIDX_STATUS    COLUMNS
_____________________________________ _____________ _________ _________________ _______________________
ADMIN.SYS_C0024911                    UNIQUE        VALID                       ID
ADMIN.CALLS_TIME_CALLEE_CALLER_IDX    UNIQUE        VALID                       TIME, CALLEE, CALLER

Enter fullscreen mode Exit fullscreen mode

Duplicate keys are detected:

Oracle 23ai> insert into calls (time, callee, caller)
 values (timestamp '2025-02-10 08:00:00', '+0000000000', '+1111111111')
;

1 row inserted.

Oracle 23ai> insert into calls (time, callee, caller)
 values (timestamp '2025-02-10 08:00:00', '+0000000000', '+2222222222')
;

1 row inserted.

Oracle 23ai> insert into calls (time, callee, caller)
 values (timestamp '2025-02-10 08:00:00', '+0000000000', '+1111111111')
;

Error starting at line : 1 in command -
insert into calls (time, callee, caller)
 values (timestamp '2025-02-10 08:00:00', '+0000000000', '+1111111111')
Error report -
ORA-00001: unique constraint (ADMIN.CALLS_TIME_CALLEE_CALLER_IDX) violated on table ADMIN.CALLS columns (TIME, CALLEE, CALLER)
ORA-03301: (ORA-00001 details) row with column values (TIME:10-FEB-25 08.00.00.000000 AM, CALLEE:'+0000000000', CALLER:'+1111111111') already exists

Enter fullscreen mode Exit fullscreen mode

I try to insert multiple similar rows when one of the columns in the key is null, which should be allowed by SQL standard:

Oracle 23ai> insert into calls (time, callee, caller)
 values (timestamp '2025-02-10 08:00:00', '+0000000000', null)
;

1 row inserted.

Oracle 23ai> insert into calls (time, callee, caller)
 values (timestamp '2025-02-10 08:00:00', '+0000000000', null)
;

Error starting at line : 1 in command -
insert into calls (time, callee, caller)
 values (timestamp '2025-02-10 08:00:00', '+0000000000', null)
Error report -
ORA-00001: unique constraint (ADMIN.CALLS_TIME_CALLEE_CALLER_IDX) violated on table ADMIN.CALLS columns (TIME, CALLEE, CALLER)
ORA-03301: (ORA-00001 details) row with column values (TIME:10-FEB-25 08.00.00.000000 AM, CALLEE:'+0000000000', CALLER:NULL) already exists

Enter fullscreen mode Exit fullscreen mode

The Oracle Database behaves like in my MongoDB example rather than the SQL one, or with NOT NULL DISTINCT, even if it doesn't support this clause. But there's a difference:

Oracle 23ai> insert into calls (time, callee, caller)
      values (null, null, null)
     ;

1 row inserted.

Oracle 23ai> insert into calls (time, callee, caller)
      values (null, null, null)
     ;

1 row inserted.

Oracle 23ai> insert into calls (time, callee, caller)
      values (null, null, null)
     ;

1 row inserted.

Oracle 23ai> select * from calls;

   ID TIME                               CALLEE         CALLER
_____ __________________________________ ______________ ______________
    7 10-FEB-25 08.00.00.000000000 AM    +0000000000    +1111111111
    8 10-FEB-25 08.00.00.000000000 AM    +0000000000    +2222222222
   10 10-FEB-25 08.00.00.000000000 AM    +0000000000
   12
   13
   14

6 rows selected.

Enter fullscreen mode Exit fullscreen mode

With Oracle, all NULLs are distinct, but some NULLs are more distinct than others 😉

  • two NULL are considered as NOT DISTINCT when other columns in the index key are equal
  • two NULL are considered as DISTINCT when all columns in the index key are NULL

The reason is the singular implementation of indexes in Oracle, where a zero-length value in the key represents a null, and null index entries are not indexed (all indexes on nullable values are actually partial indexes). Note that an empty string is a zero-length value, and behaves like a null (similar to another NoSQL database, DynamoDB). This differs from the SQL standard but must remain compatible with how it has always worked.

Oracle 23ai supports none of [NOT] NULL DISTINCT or partial indexing clauses. However, you can use the fact that all indexes are partial and apply NVL, COALESCE, or CASE to replace a NULL with a unique value, like the primary key:

Oracle 23ai> drop index calls_time_callee_caller_idx;

Oracle 23ai> create unique index  calls_time_callee_caller_idx
 on calls ( time, callee, nvl(caller,id) )
;
Enter fullscreen mode Exit fullscreen mode

This index doesn't consider nulls as violating the unique constraint because the constraint is created with an index on different values:

Oracle 23ai> insert into calls (time, callee, caller)
 values (timestamp '2025-02-10 08:00:00', '+0000000000', null)
;

1 row inserted.

Oracle 23ai> select * from calls;

   ID TIME                               CALLEE         CALLER
_____ __________________________________ ______________ ______________
    7 10-FEB-25 08.00.00.000000000 AM    +0000000000    +1111111111
    8 10-FEB-25 08.00.00.000000000 AM    +0000000000    +2222222222
   10 10-FEB-25 08.00.00.000000000 AM    +0000000000
   12
   13
   14
   15 10-FEB-25 08.00.00.000000000 AM    +0000000000

7 rows selected.
Enter fullscreen mode Exit fullscreen mode

Seeing this, you may wonder how Oracle does when using the Oracle Database API for MongoDB when inserting all null values, and it is compatible with how MongoDB works:

Connecting to:          mongodb://<credentials>@XXX.adb.us-ashburn-1.oraclecloudapps.com:27017/ora_mdb?authMechanism=PLAIN&authSource=%24external&ssl=true&retryWrites=false&loadBalanced=true&appName=mongosh+2.3.7
Using MongoDB:          4.2.14
Using Mongosh:          2.3.7

ora_mdb> db.calls.createIndex(
...  { "time": 1 , "callee": 1, "caller" : 1 }, { unique: true }
... ) ;
time_1_callee_1_caller_1
ora_mdb> db.calls.insertOne(  { } );
{
  acknowledged: true,
  insertedId: ObjectId('677d97ff5c88a98fedd4b0ca')
}
ora_mdb> db.calls.insertOne(  { } );
Uncaught:
MongoServerError[MONGO-11000]: ORA-00001: unique constraint (ORA.$ora:calls.time_1_callee_1_caller_1) violated on table ORA.calls columns (SYS_NC00005$, SYS_NC00006$, SYS_NC00007$)
ORA-03301: (ORA-00001 details) row with column values (SYS_NC00005$:'01', SYS_NC00006$:'01', SYS_NC00007$:'01') already exists
Enter fullscreen mode Exit fullscreen mode

The behavior looks correct, with the same documents not being considered distinct and raising the duplicate key error. From the message, it seems that the values in the Oracle index are ('01','01','01'), which explains why it doesn't behave like an all NULL entry. Of curiosity, here is the index that Oracle creates in the SQL schema for a MongoDB index:

ora_sql> set ddl storage off
DDL Option STORAGE was set to OFF

ora_sql> ddl ora."$ora:calls.time_1_callee_1_caller_1"

  CREATE UNIQUE MULTIVALUE INDEX "ORA"."$ora:calls.time_1_callee_1_caller_1" ON "ORA"."calls" (
JSON_MKMVI(JSON_TABLE( "DATA", '$' PRESENT ON EMPTY MINIMAL CROSS PRODUCT WITH ERROR ON PARALLEL ARRAYS COLUMNS( NESTED PATH '$."time"[*]' COLUMNS( "K0" ANY ORA_RAWCOMPARE PATH '$' ERROR ON ERROR PRESENT ON EMPTY NULL ON MISMATCH ) , NESTED PATH '$."callee"[*]' COLUMNS( "K1" ANY ORA_RAWCOMPARE PATH '$' ERROR ON ERROR PRESENT ON EMPTY NULL ON MISMATCH ) , NESTED PATH '$."caller"[*]' COLUMNS( "K2" ANY ORA_RAWCOMPARE PATH '$' ERROR ON ERROR PRESENT ON EMPTY NULL ON MISMATCH ) ) )  AS "K0","K1","K2"), 
JSON_QUERY("DATA" FORMAT OSON , '$."callee"[*]' RETURNING ANY ORA_RAWCOMPARE ASIS  WITHOUT ARRAY WRAPPER ERROR ON ERROR PRESENT ON EMPTY NULL ON MISMATCH TYPE(LAX)  MULTIVALUE), 
JSON_QUERY("DATA" FORMAT OSON , '$."caller"[*]' RETURNING ANY ORA_RAWCOMPARE ASIS  WITHOUT ARRAY WRAPPER ERROR ON ERROR PRESENT ON EMPTY NULL ON MISMATCH TYPE(LAX)  MULTIVALUE))
  PCTFREE 10 INITRANS 20 MAXTRANS 68 COMPUTE STATISTICS
  TABLESPACE "DATA" ;

Enter fullscreen mode Exit fullscreen mode

Oracle implements a MongoDB-compatible API on top of the Oracle Database through many transformations in the SQL queries. Storing a document in an OSON column is easy, but indexing documents requires more complexity to emulate MongoDB behavior.


This long post explored an elementary example: a table with nullable columns used in a unique index. All databases behave differently except when they are genuinely compatible, such as PostgreSQL and YugabyteDB.
There are a few things to remember:

  • Null in MongoDB indicates the absence of a value. Two null or absent keys are considered not distinct and may raise a duplicate key error. Partial indexes can be used to behave differently. In a future blog post, we will see how to differentiate an explicit null from the absence of the attribute.
  • Null in SQL indicates that the value is unknown, and a three-value logic is applied when comparing to check for duplicates. Comparing with unknown is unknown and does not raise a duplicate error.
  • PostgreSQL and YugabyteDB are compatible with the SQL standard and allow you to choose the behavior of unique indexes with an SQL:2016 clause.
  • Oracle (and SQL Server) behave differently, but expression-based indexes can provide workarounds.
  • In SQL, most columns should be declared NOT NULL to avoid problems. Unfortunately, this is not the default and may require more tables and joins.
  • I didn't mention DynamoDB, where nulls indicate an empty value, like in Oracle Database, because it cannot be part of the primary key, and there are no unique secondary indexes.

No database is inherently better or worse than the others. The biggest mistake is neglecting to understand your database's behavior. When migrating from Oracle to a different database or vice versa, understanding the distinct handling of NULL values is crucial.

I love SQL and consider NULL to mean "exists but not yet known" (except for the behavior described earlier in Oracle and some other cases, like in an outer join result). I have never had trouble using NULL only for existing attributes with unknown values. However, SQL's differences from other programming languages can lead to frequent misuse of NULL, and a lousy developer experience when used in conditions. In contrast, MongoDB provides a more intuitive API for developers who are used to different languages, and its NULL handling adheres to the same principles, avoiding the confusing three-valued logic.

When I refer to the SQL standard, I'm referring to the definition of UNIQUE CONSTRAINT rather than UNIQUE INDEX. Indexes are implementation details that help find values, including duplicates. SQL defines uniqueness as:

A unique constraint is satisfied if and only if no two rows in a table have the same non-null values in the unique columns.

Read Troubles with Nulls, Views from the Users if you doubt that two-valued logic is more popular than the SQL behavior. Choose the DB that works best for you, but I recommend you learn how it handles nulls in all cases.

Some interesting links on the topic:

Top comments (0)