When creating the data model, we often use the NULL value to signify a missing value in certain fields for specific rows. However, be cautious with NULL values, as they can lead to data mismatches when performing the SQL queries. The following article will highlight common issues encountered by new developers when using SQL queries and provide strategies to resolve these problems.
What is the null value in SQL?
A NULL value is used to represent a data value that does not exist. In other words, It's a placeholder for the missing or the value that users do not know at the current time.
To illustrate, please consider the question: how many Apple stores are in the world? If the answer is 0
, it means that we have no Apple stores in the world. But if the answer is "NULL", it stands for we don't know about it, and maybe we need to check on the internet and respond to it later.
Damn.... we're getting into quite a bit of theory. Please get back to the main topic to find the magic of NULL value.
Why Null value is special in SQL?
Imagine that Graham was a newbie developer of a ABC startup company. In this company, he has the role of managing the HR system. Throughout this article, we will concentrate on the employees table likes:
CREATE TABLE employees (
employee_id serial primary key,
employee_name varchar(50),
age numeric,
base_salary numeric,
additional_salary numeric,
CONSTRAINT ck_age CHECK ( age > 0 ),
CONSTRAINT ck_base_salary CHECK ( base_salary >= 0 ),
CONSTRAINT ck_additional_salary CHECK ( additional_salary >= 0 )
);
INSERT INTO employees (employee_name, age, base_salary, additional_salary)
VALUES
('Graham', 24, 2000, 1000),
('Lynx', 25, 1300, null),
('Tian', 23, 1300, null),
('Kent', 32, 1040, null),
(null, 28, 3000, null),
(null, 50, 1000, null);
NULL value cannot be used with arithmetic operators.
On a beautiful day, his manager - Max, told him: "Please help me calculate the salary of our employees, you can calculate by sum base_salary
with additional_salary
".
"Yes sir, it's so easy, give me 5 seconds" - Graham said.
Since he was a junior backend developer who wasn't familiar with SQL queries, he wrote naive SQL like:
SELECT
employee_id,
employee_name,
age,
(base_salary + additional_salary) as monthly_salary
FROM employees;
"This is my result, it's so easy,": Graham said.
However, upon reviewing Graham's report, Max, who came from a technical background, angrily remarked, "You forgot to convert NULL values to zero, didn't you?".
Surely you already know what happened. This problem caused by NULL fundamentally changes how it interacts with arithmetic operations. In other words, we cannot use NULL with any arithmetic operation such as: +, -, *, /,...
To resolve this issue, we might use two approaches:
- Convert NULL to zero: in PostgreSQL, the COALESCE function is used to convert NULL to the specific default value. For example:
SELECT
employee_id,
employee_name,
age,
COALESCE(base_salary, 0) + COALESCE(additional_salary, 0) as monthly_salary
FROM employees;
-
Setting a default value: when designing the employee table, we can set default values for
base_salary
andadditional_salary
to0
. Please keep in mind that we can not use the default value for all circumstances. To illustrate, if you want to calculate BMI, we have to calculate by the formula:weight (kg) / [height (m)]2
. If you convert, for example, the weight to zero, the result will be:
SELECT COALESCE(NULL, 0) / POWER(1.8, 2)
The result will be zero (this result should be NULL to show the unknown BMI caused by missing information). It definitely causes some confusion for users looking at it.
📝Summary: be careful when using NULL with arithmetic operators since any arithmetic operation involving NULL will result in NULL. Consider using the default value with the nullable column when performing the calculation.
NULL value cannot be used in the comparison operation.
Ignoring Graham's previous mistake since he's just a new member, Max assigned him a new task: "Please find the employees named Graham, Lynx, Tian, and employees who did not input their names in our system".
Graham thinks: "Hmm, it's also so easy, I will make up for my previous mistake". He wrote the SQL query:
SELECT employee_id, employee_name, age
FROM public.employees
WHERE employee_name IN ('Graham', 'Lynx', 'Tian', NULL)
The result of this SQL was:
"Yeah, this is the information of Graham, Lynx, and Tian. Moreover, wonderful, all of our employees have updated their names, my boss".
After looking at this result, Tom has some suspicions about the veracity of the reports. Originally a technical person, he has double-checked in the database.
SELECT employee_id, employee_name, age FROM employees
Surprisingly, the result was:
Do you know what is happening in this example? the answer is "the NULL value cannot be equal or unequal to any value, so that cannot perform any comparison on NULL values" It means that we cannot use IN, NOT IN, =, <>, >, <, >=, <=,...
to compare with NULL.
To correct this report, we have to write a SQL query like:
SELECT employee_id, employee_name, age
FROM public.employees
WHERE employee_name in ('Graham', 'Lynx', 'Tian') OR employee_name IS null
Moreover, please be careful when using the NULL value in conjunction with NOT IN. For instance, Graham wishes to retrieve all employees except those who haven't provided their name or his name. If he employs NOT IN in this scenario:
SELECT employee_id, employee_name, age
FROM employees
WHERE employee_name NOT IN ('Graham', null)
Bummm, it seems like you have imagined the previous SQL results. We don't have any records in this query.
Why is it that when we use the IN
operator, we retrieve multiple records, but when we use NOT IN
, we end up with no results at all? Let's move to the next section to deep comprehend this issue.
📝Summary*: NULL only operates with IS
and IS NOT NULL
.
NULL value with conjunctive operator.
To clarify, In SQL, conjunctive operations are used to combine conditions in a query. The main conjunctive operators are:
1. **AND: This operator combines two or more conditions, and all conditions must be true for the overall condition to be true.
2. **OR: This operator combines two or more conditions, and at least one condition must be true for the overall condition to be true.
Ok, enough theory :), let's get started on this section.
As the previous section, NULL can not perform with any comparison operators, have you considered what occurs when employing conjunctive operators with NULL values?
Using NULL with AND
Let's back to Graham's workplace, he wants to find no-name employees and those more than 30 years old. If he performs:
SELECT employee_id, employee_name, age
FROM EMPLOYEES
WHERE employee_name = NULL AND age > 30
Hmmm, the result is that Graham cannot get any data:
It's so complicated, Graham tries to investigate what happens when the where condition USER_NAME = NULL AND AGE > 30
. He can use:
SELECT employee_id, employee_name, age, employee_name = NULL AND age > 30
FROM EMPLOYEES;
After executing this query, he gets the answer:
Weird, why do we have 2 records with null values? and the others are false? Graham noticed that when he used NULL AND FALSE
the result was FALSE
. In contrast, NULL AND TRUE
, the result was NULL. Since where statement only matches and gets records when the query is true, he cannot retrieve any records.
He tries to double-check again by the SQL:
SELECT NULL AND TRUE AS NULL_TRUE, NULL AND FALSE AS NULL_FALSE;
Wonderful, his statement was true.
Using NULL with OR
Let's slightly modify Graham's previous request: "finding no-name employees OR those more than 30 years old.". We continue sightly change the Graham's SQL:
SELECT employee_id, employee_name, age
FROM EMPLOYEES
WHERE employee_name = NULL OR age > 30
The results may surprise you a bit. We have 2 records in this case.
We also try to investigate a previous topic:
SELECT employee_id, employee_name, age, USER_NAME = NULL OR AGE > 30
FROM EMPLOYEES;
In this scenario, NULL OR TRUE
gave the TRUE
and NULL OR FALSE
retrieve the NULL
value.
Clarify some points
We moved on to 2 sections about NULL with the conjunctive operator. Please keep in mind that to clarify this section, Graham uses USER_NAME = NULL
(actually, he must use USER_NAME IS NULL
). In PostgreSQL, dealing with NULL values involves adhering to Three-valued logic (3VL). The following picture will clarify more detail about 3VL:
If you wish to delve deeper into this theory, please refer to the following resource for more information: Three-valued logic - Wikipedia
Conclusion
Managing NULL
values can pose significant challenges when using certain operators in SQL statements. Carelessness in handling NULL
values can lead to mismatches and affect user experience. Some individuals opt to avoid NULL
values by substituting them with empty strings or default values. However, this approach can create confusion when attempting to represent unknown or undefined data in a table.
References
- Three-Valued Logic Three-valued logic - Wikipedia
- Postgres Documentation: PostgreSQL
After Credit
In section 2, we had issues when using IN and NOT IN we didn't have any records, but when using NULL
with IN
, we found some records except NULL
.
Using IN with NULL
In PostgreSQL, performing the SQL with IN like:
SELECT column_name(s)
FROM table_name
WHERE column_name IN (value1, value2, NULL, ...);
When executing, PostgreSQL will convert this SQL to column_name = value1 OR column_name = value2 OR column_name = NULL OR ...
like
SELECT column_name(s)
FROM table_name
WHERE column_name = value1 OR column_name = value2 OR column_name = NULL OR ...
You might already be aware of this, right? due to Three-valued logic, TRUE OR NULL
evaluates to TRUE
. This explains why all records that meet the WHERE
condition are returned, except those where the value is NULL
.
Using NOT IN with NULL
Similar to the previous section, we execute the SQL statement:
SELECT column_name(s)
FROM table_name
WHERE column_name NOT IN (value1, value2, NULL, ...);
PostgreSQL will convert it to the SQL like:
SELECT column_name(s)
FROM table_name
WHERE column_name != value1 AND column_name != value2 AND column_name != NULL AND ..
When using NOT IN
in PostgreSQL, each row in table_name
is checked to determine if the value in column_name
does not exist within the specified list of values from a subquery or an explicit list. According to Three-valued logic (3VL), NULL != NULL
evaluates to NULL
. Additionally, operations such as NULL AND TRUE
result in NULL
, and NULL AND FALSE
result in FALSE
. Therefore, based on these logical rules, the statement within the query cannot be evaluated as true
. Consequently, regardless of the values included in the NOT IN
list, the query will yield zero results.
To deal with this problem we can use one of the following two ways:
-
Using
AND
condition
SELECT column_name(s)
FROM table_name
WHERE column_name NOT IN (value1, value2,...) AND column is not NULL
-
Using NOT EXISTS
If we use
NOT EXISTS
, we have to convert the value list to a temporary table or SQL select.
SELECT *
FROM EMPLOYEES e
WHERE NOT EXISTS (
SELECT
FROM
(VALUES ('Graham'), (NULL)) AS t (user_name)
WHERE t.employee_name = e.employee_name OR e.employee_name IS NULL
);
In practice, it's recommended to use EXISTS
and NOT EXISTS
for better performance, but we'll cover this topic in upcoming articles.
Thanks for reading this section! I hope that this article can be helpful when performing the SQL query. If you have any questions, or discussions or notice any mistakes, please put comments at the end of this section. See you in the next article :)
Author: Thanh Quach
Top comments (6)
Noice bro!
great <3
nice
Nice work. Thanks
cool, keep it up
Great