I have seen situations where different but compatible types were used in referential integrity, also known as foreign key, constraints. When we say data types are compatible, we mean there is a conversion path between the types.
The pg_cast catalog stores data type conversion paths, both built-in and user-defined, but it does not represent every type conversion that the system knows how to perform. Still, we can infer some compatible types from pg_cast.
Let us have a look at the compatible types for integer and varchar:
postgres=# SELECT castsource::regtype, casttarget::regtype FROM pg_cast WHERE castsource IN ('integer'::regtype, 'varchar'::regtype) ORDER BY castsource;
castsource | casttarget
-------------------+-------------------
integer | bigint
integer | smallint
integer | real
integer | double precision
integer | numeric
integer | money
integer | boolean
integer | oid
integer | regproc
integer | regprocedure
integer | regoper
integer | regoperator
integer | regclass
integer | regcollation
integer | regtype
integer | regconfig
integer | regdictionary
integer | regrole
integer | regnamespace
integer | "char"
integer | bit
character varying | character
character varying | "char"
character varying | name
character varying | regclass
character varying | xml
character varying | character varying
character varying | text
(28 rows)
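pg_cast also records the context in which each conversion may be applied, in the castcontext column: 'i' means the cast can happen implicitly, 'a' only in assignment contexts, and 'e' only when requested explicitly. As a quick sketch, we can check the conversion between integer and bigint, the pair used in the example below:

-- castcontext: 'i' = implicit, 'a' = assignment only, 'e' = explicit only
SELECT castsource::regtype,
       casttarget::regtype,
       castcontext
FROM pg_cast
WHERE castsource = 'integer'::regtype
  AND casttarget = 'bigint'::regtype;

integer to bigint is an implicit cast, which is part of why a foreign key between these two types raises no warning at creation time.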
Let us assume we have a products table and another table storing orders of those products. We want to ensure that the orders table only contains orders of products that actually exist. So, we define a foreign key constraint in the orders table that references the products table:
postgres=# CREATE TABLE products (
product_number bigint PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
product_name text,
price numeric
);
CREATE TABLE
postgres=# CREATE TABLE orders (
order_id integer PRIMARY KEY,
product_number integer REFERENCES products (product_number),
quantity integer
);
CREATE TABLE
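Note that PostgreSQL accepts this constraint without complaint even though products.product_number is bigint while orders.product_number is integer. If we want to inspect what was created, one way (a minimal sketch) is to query pg_constraint:

-- show the foreign key definitions on the orders table
SELECT conname, pg_get_constraintdef(oid)
FROM pg_constraint
WHERE conrelid = 'orders'::regclass
  AND contype = 'f';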
Everything seems to work. Let us populate the tables:
postgres=# INSERT INTO products (product_name, price) VALUES ('book', 10), ('pen', 5), ('bag', 15);
INSERT 0 3
postgres=# INSERT INTO orders (order_id, product_number, quantity) VALUES (1, 1, 1), (2, 2, 1), (3, 3, 1);
INSERT 0 3
Also, everything works. But what happens when product_number grows beyond 2147483647 (the maximum value of integer)?
postgres=# SELECT setval('products_product_number_seq', 2147483648);
setval
------------
2147483648
(1 row)
postgres=# INSERT INTO products (product_name, price) VALUES ('shoe', 20) RETURNING product_number;
product_number
----------------
2147483649
(1 row)
postgres=# INSERT into orders (order_id, product_number, quantity) VALUES (4, 2147483649, 1);
ERROR: integer out of range
We get an integer out of range error. One way to fix this is to change the type of orders.product_number to bigint, but we risk a full table rewrite and application downtime for the duration of the change. Let us find out:
postgres=# SELECT relname, relfilenode FROM pg_class WHERE relname = 'orders';
relname | relfilenode
---------+-------------
orders | 42404
(1 row)
postgres=# ALTER TABLE orders ALTER COLUMN product_number TYPE bigint ;
ALTER TABLE
postgres=# SELECT relname, relfilenode FROM pg_class WHERE relname = 'orders';
relname | relfilenode
---------+-------------
orders | 42419
(1 row)
We can see that relfilenode has changed, which tells us the table was rewritten. Imagine having to do this in a heavily loaded, large database.
There are other ways of adjusting the type, such as using triggers and mirrored columns, but that can be taxing.
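As a rough sketch of the mirrored-column approach (all object names here are illustrative, and the batching and swap details are deliberately simplified):

-- 1. add a mirrored column of the target type (no rewrite, no default)
ALTER TABLE orders ADD COLUMN product_number_new bigint;

-- 2. keep it in sync for new writes
CREATE FUNCTION orders_mirror_product_number() RETURNS trigger AS $$
BEGIN
  NEW.product_number_new := NEW.product_number;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_mirror_trg
BEFORE INSERT OR UPDATE ON orders
FOR EACH ROW EXECUTE FUNCTION orders_mirror_product_number();

-- 3. backfill existing rows in small batches, for example:
UPDATE orders SET product_number_new = product_number
WHERE product_number_new IS NULL AND order_id BETWEEN 1 AND 10000;

-- 4. finally, in a short lock window: drop the old column, rename
--    the new one, and recreate indexes and the foreign key.

This avoids one long table rewrite at the cost of more moving parts and a carefully choreographed cutover.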
The most common source of type mismatch in referential integrity is the use of types that have a size or length constraint, such as smallint, integer, serial, and character varying(N).
So, is there a way we can catch this kind of mismatch earlier, maybe during development/review?
We can use the query below, which produces a helpful message for each type mismatch found in referential integrity constraints:
WITH pk AS (
SELECT CASE nc.nspname
WHEN 'public' THEN nc.nspname || '.' || c.conrelid::regclass::text
ELSE c.conrelid::regclass::text
END AS conrelid,
c.conindid::regclass,
string_agg(a.attname, ', ' order by a.attnum) as attnames,
string_agg(format_type(a.atttypid, a.atttypmod), ', ' order by a.attnum) AS atttypid
FROM pg_namespace AS nc
JOIN pg_class AS r ON nc.oid = r.relnamespace
JOIN pg_attribute AS a ON r.oid = a.attrelid
JOIN pg_constraint AS c ON r.oid = c.conrelid
WHERE a.attnum = ANY (c.conkey)
AND NOT a.attisdropped
AND c.contype = 'p'
AND r.relkind IN ('r', 'p')
AND nc.nspname NOT IN ('information_schema', 'pg_catalog')
GROUP BY nc.nspname, c.conrelid::regclass::text, c.conindid::regclass
), fk AS (
SELECT CASE nc.nspname
WHEN 'public' THEN nc.nspname || '.' || c.conrelid::regclass::text
ELSE c.conrelid::regclass::text
END AS conrelid,
c.conindid::regclass,
string_agg(a.attname, ', ' order by a.attnum) as attnames,
string_agg(format_type(a.atttypid, a.atttypmod), ', ' order by a.attnum) AS atttypid
FROM pg_namespace AS nc
JOIN pg_class AS r ON nc.oid = r.relnamespace
JOIN pg_attribute AS a ON r.oid = a.attrelid
JOIN pg_constraint AS c ON r.oid = c.conrelid
JOIN pk ON pk.conindid = c.conindid
WHERE a.attnum = ANY (c.conkey)
AND NOT a.attisdropped
AND c.contype = 'f'
AND r.relkind IN ('r', 'p')
AND nc.nspname NOT IN ('information_schema', 'pg_catalog')
GROUP BY nc.nspname, c.conrelid::regclass::text, c.conindid::regclass
)
SELECT 'Parent ' || pk.conrelid || '.{' || pk.attnames || '} datatype {' || pk.atttypid || '} is different from child ' || fk.conrelid || '.{' || fk.attnames || '} datatype {' || fk.atttypid || '}' AS info
FROM pk
JOIN fk ON pk.conindid = fk.conindid
WHERE pk.atttypid != fk.atttypid;
info
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Parent public.products.{product_number} datatype {bigint} is different from child public.orders.{product_number} datatype {integer}
The query above can also be incorporated into a continuous integration pipeline.
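For example, assuming the query is saved as fk_type_mismatch.sql and a connection string is available in DATABASE_URL (both names are assumptions), a CI step could be a minimal sketch like:

#!/bin/sh
# fail the build if any foreign key type mismatch is reported
mismatches=$(psql -XtA "$DATABASE_URL" -f fk_type_mismatch.sql)
if [ -n "$mismatches" ]; then
  echo "$mismatches"
  exit 1
fi

Here -X skips psqlrc, and -t -A produce bare, unaligned output so any non-empty result means at least one mismatch was found.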
Summary
It is easy to pick the wrong data types when creating referential integrity constraints, and this can have costly implications during the lifecycle of an application. A few of the implications are:
- Application errors due to a size/length constraint, such as in integer and character varying(N).
- Application downtime when changing to the right data type.
- If there is a rewrite, the rewrite will temporarily require as much as double the disk space of the table.
Key takeaways
Review the data types in referential integrity as part of your database design review.