DEV Community

Cover image for Aggregates with NULL: count(), min(), max(), sum()
Franck Pachot for YugabyteDB

Posted on

Aggregates with NULL: count(), min(), max(), sum()

In the previous blog, I explained that a NULL in a column indicates an unknown value rather than a non-existent one. In a table row, values must exist, with a value or a null, when the value is unknown. In a normalized data model, missing values are represented by the absence of rows, not by null. However, outer joins can create rows for non-existing values. All their columns, including the primary key, contain NULLs.

Arithmetic with NULL is simple when you consider it as unknown.

Many operations have an unknown result when one operand is unknown:

yugabyte=# with xyz(x,y,z) as (
 values ( 
     cast(0 as int), cast(300 as int), cast(null as int)
  )
) select x,y,z
 , x+y+z as "x+y+z"
 , x-y-z as "x-y-z"
 , x*y*z as "x*y*z"
from xyz
;
 x |  y  | z | x+y+z | x-y-z | x*y*z
---+-----+---+-------+-------+-------
 0 | 300 |   |       |       |
(1 row)
Enter fullscreen mode Exit fullscreen mode

It makes sense. How can you know the sum without knowing all values?

Unfortunately, databases may have different implementations where a function is explicitly defined as ignoring null operands. For example, that's what PostgreSQL does for for greatest() and least() functions:

with xyz(x,y,z) as (
 values ( cast(0 as int), cast(300 as int), cast(null as int))
) select xyz.*
 , greatest(x,y,z) "greatest(x,y,z)"
 ,    least(x,y,z)    "least(x,y,z)"
from xyz
;

 x |  y  | z | greatest(x,y,z) | least(x,y,z)
---+-----+---+-----------------+--------------
 0 | 300 |   |             300 |            0
(1 row)
Enter fullscreen mode Exit fullscreen mode

According to the SQL standard, these functions should return null when one argument is null. PostgreSQL does differently, and YugabyteDB is compatible with PostgreSQL.

The documentation explains the behavior, and Markus Winand compares all databases: https://modern-sql.com/caniuse/greatest-least#null
Image description

In the SQL standard, aggregate functions behave differently. They are defined as ignoring null values.

I'm using the table created in the previous post, where I had three employees with a commission: one unknown, one known to be zero, and another known to be 300.

yugabyte=# select * from commissions;
 empno | comm
-------+------
  7839 |
  7499 |  300
  7844 |    0
(3 rows)

yugabyte=# select 
 count(*), count(comm) , min(comm) , max(comm) , sum(comm)
 from commissions
;

 count | min | max | sum
-------+-----+-----+-----
     2 |   0 | 300 | 300
(1 row)
Enter fullscreen mode Exit fullscreen mode

Unknown values are ignored, and two employees have commissions with known values: one at zero and another at 300. This distinction between zero and null is crucial and shows up here.

Furthermore, it's vital to distinguish between a non-existent commission and one that is existing but unknown or zero. Although I have only two known commissions, I know I have three in total, including an unknown one. This highlights the difference between counting column values and counting rows (which incorporate existing commissions that may be unknown):

yugabyte=# select count(*) as "count(*)", count(comm) as "count(comm)"
 from commissions
;

 count(*) | count(comm)
----------+-------------
        3 |           2
(1 row)
Enter fullscreen mode Exit fullscreen mode

count(*) counts rows, including those with null values, but count() on a column counts the known values for this column.

In my last post, I discussed how a left outer join can display all employees alongside their commissions. Employees with existing but unknown commissions have their column value null, except their primary key. Employees with no existing commission show up in the outer join but with their primary key columns being also null.

Here are all counts on the left outer join, counting rows, counting null keys, and counting null columns:

yugabyte=# select employees.*, commissions.*
 from employees left outer join commissions using(empno)
;
 empno |    job    | ename  | sal  | empno | comm
-------+-----------+--------+------+-------+------
  7839 | PRESIDENT | KING   | 5000 |  7839 |
  7499 | SALESMAN  | ALLEN  | 1600 |  7499 |  300
  7844 | SALESMAN  | TURNER | 1500 |  7844 |    0
  7788 | ANALYST   | SCOTT  | 3000 |       |
(4 rows)

yugabyte=# select count(*) as "employees"
     , count(commissions.empno) as "emp. with commissions"
     , count(comm) "emp. with known commissions"
 from employees left outer join commissions using(empno)
;
 employees | emp. with commissions | emp. with known commissions
-----------+-----------------------+-----------------------------
         4 |                     3 |                           2
(1 row)
Enter fullscreen mode Exit fullscreen mode

The databases show four employees. Three of them have a commission, and two of them have a known commission.

Top comments (0)