Aurora DSQL does not support Foreign Keys.
How can you ensure referential integrity?
It's often stated that the application can handle this rather than the database, but the application lacks one critical capability: the ability to see what ongoing concurrent transactions are doing.
To prevent database corruption during a race condition, when you insert a child row you must verify that the parent key exists and has not been concurrently deleted. Similarly, when you delete a parent key after confirming there are no child rows, it's vital to ensure that no one is concurrently inserting one.
While the database can observe the activities of other transactions, the application cannot due to transaction isolation. How can you address this? There are four solutions that I described in Declarative vs. Application side Foreign Key referential integrity:
- Declare the Foreign Key in the database. But 🚫 Aurora DSQL doesn't support Foreign Keys.
- Use the serializable isolation level to get an error when conflicting concurrent operations occur. But 🚫 Aurora DSQL doesn't support Serializable.
- Lock the child table when you delete the parent key. But 🚫 Aurora DSQL doesn't support LOCK TABLE.
- Lock the parent key when inserting a row in the child table ✅.
Let's explore this row-locking possibility.
- SELECT FOR SHARE would be sufficient to prevent a concurrent deletion. But 🚫 Aurora DSQL doesn't support SELECT FOR SHARE.
- One solution remains: SELECT FOR UPDATE on the parent row when inserting into the child table ✅.
I'll use the "orders"/"orderlines" schema that I created in a previous post:
dsql=> \d orders
Table "public.orders"
Column | Type | Collation | Nullable | Default
------------+--------------------------+-----------+----------+-------------------
order_id | uuid | | not null | gen_random_uuid()
country | text | | |
created_at | timestamp with time zone | | | now()
Indexes:
"orders_pkey" PRIMARY KEY, btree_index (order_id) INCLUDE (country, created_at)
"orders_country_created_at_idx" btree_index (country, created_at)
dsql=> \d orderlines
Table "public.orderlines"
Column | Type | Collation | Nullable | Default
-------------------------+--------------------------+-----------+----------+---------
order_id | uuid | | not null |
line_id | integer | | not null |
product | text | | |
denormalized_country | text | | |
denormalized_created_at | timestamp with time zone | | |
Indexes:
"orderlines_pkey" PRIMARY KEY, btree_index (order_id, line_id) INCLUDE (product, denormalized_country, denormalized_created_at)
"orderlines_denormalized_country_product_denormalized_create_idx" btree_index (denormalized_country, product, denormalized_created_at)
dsql=>
I wasn't able to declare the foreign key, but "order_id" in "orderlines" should reference an "order_id" in "orders".
I insert an order that I'll use in my test and keep its value in the :order_id
variable:
dsql=> insert into orders (order_id)
values ('ffffffff-ffff-ffff-ffff-ffffffffffff')
returning order_id
dsql-> \gset
INSERT 0 1
Here is a transaction that deletes a row from the "orders" table along with its child rows in the "orderlines" table:
delete from orderlines where order_id=:'order_id';
delete from orders where order_id=:'order_id';
If a concurrent transaction inserts a row in "orderlines", the delete will not see it, and it may appear after the transaction is committed. The application must prevent this situation to avoid creating an orphan row.
Here is a safe transaction that inserts a row in "orderlines" only if the parent exists in "orders":
begin;
\set exists false
select true as exists
from orders
where order_id = :'new_order'
for update
\gset
\if :exists
insert into orderlines (order_id, line_id)
values ( :'new_order' , 1 );
\endif
commit;
The SELECT FOR UPDATE declares an exclusive lock intent that conflicts with a concurrent transaction that would delete the parent row. If the application strictly uses these procedures to manipulate orders and order lines, it guarantees referential integrity without a foreign key.
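The same procedure can be wrapped in application code. Here is a minimal sketch, assuming a psycopg 3-style connection whose `execute()` returns a cursor; the function name and error handling are illustrative, not part of the post:

```python
def insert_orderline(conn, order_id, line_id):
    """Insert a child row only if its parent exists in "orders".

    The SELECT ... FOR UPDATE declares the lock intent on the parent row,
    so a concurrent delete of the parent conflicts at commit time.
    `conn` is assumed to be a psycopg 3-style connection; call this inside
    a transaction and commit afterwards.
    """
    cur = conn.execute(
        "SELECT true FROM orders WHERE order_id = %s FOR UPDATE",
        (order_id,),
    )
    if cur.fetchone() is None:
        # Parent is missing (or already deleted): refuse the insert.
        raise LookupError(f"order {order_id} does not exist")
    conn.execute(
        "INSERT INTO orderlines (order_id, line_id) VALUES (%s, %s)",
        (order_id, line_id),
    )
```

The commit can still fail with an OC000 conflict if the parent is deleted concurrently, so the caller must be prepared to retry the whole transaction.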
Let's test it. In the first session, I check that the parent exists with a FOR UPDATE and insert the child row:
dsql=> begin;
BEGIN
dsql=*> \set exists false
dsql=*> select true as exists
from orders
where order_id = 'ffffffff-ffff-ffff-ffff-ffffffffffff'
for update
dsql-*> \gset
dsql=*> \if :exists
dsql=*> insert into orderlines (order_id, line_id)
values ( 'ffffffff-ffff-ffff-ffff-ffffffffffff' , 1 )
;
INSERT 0 1
dsql=*> \endif
dsql=*>
While the transaction is ongoing, as I haven't committed yet, another transaction deletes the parent row:
dsql=*> \! psql -c "delete from orders where order_id = 'ffffffff-ffff-ffff-ffff-ffffffffffff'" -c "delete from orderlines where order_id = 'ffffffff-ffff-ffff-ffff-ffffffffffff'" & sleep 1
DELETE 1
DELETE 0
This is successful because Aurora DSQL uses optimistic concurrency control: the SELECT FOR UPDATE intent is not synchronized to other transactions, so the concurrent delete is not blocked.
When I try to commit my inserting transaction, the conflict is detected, and it fails:
dsql=*> commit;
ERROR: change conflicts with another transaction, please retry: (OC000)
dsql=>
By adding this data-integrity logic to our business logic, we can enforce referential integrity and avoid race conditions that corrupt our database. But how does it scale?
In PostgreSQL-compatible databases, we may prefer SELECT FOR SHARE over SELECT FOR UPDATE so that multiple inserts for the same parent can be processed concurrently. Say the order is a big one, shared by multiple users who each add their order lines.
I insert the parent again:
dsql=> insert into orders (order_id)
values ('ffffffff-ffff-ffff-ffff-ffffffffffff')
returning order_id
dsql-> \gset
INSERT 0 1
A first session inserts a line 1:
dsql=> begin;
BEGIN
dsql=*> select true as exists
from orders
where order_id = 'ffffffff-ffff-ffff-ffff-ffffffffffff'
for update
;
exists
--------
t
(1 row)
dsql=*> insert into orderlines (order_id, line_id)
values ( 'ffffffff-ffff-ffff-ffff-ffffffffffff' , 1 )
;
INSERT 0 1
It hasn't been committed yet, while another session is inserting line 2:
dsql=> begin;
BEGIN
dsql=*> select true as exists
from orders
where order_id = 'ffffffff-ffff-ffff-ffff-ffffffffffff'
for update
;
exists
--------
t
(1 row)
dsql=*> insert into orderlines (order_id, line_id)
values ( 'ffffffff-ffff-ffff-ffff-ffffffffffff' , 2 )
;
INSERT 0 1
dsql=*> commit;
COMMIT
The second session committed, but when the first session tries to commit, it fails:
dsql=*> commit;
ERROR: change conflicts with another transaction, please retry: (OC000)
The application must retry without any business justification, since inserting two rows under the same parent should be permitted. Now imagine using the same method to reference the product catalog: only one order per product could be processed at a time.
Ironically, optimistic concurrency control, designed to enhance scalability, restricts throughput here. SELECT FOR SHARE would solve this by allowing such transactions to run concurrently. However, at least in the preview version, it is unavailable in Aurora DSQL.
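Because these OC000 conflicts are expected under optimistic concurrency control, every transaction should run inside a retry loop. A minimal sketch with exponential backoff and jitter; the `TransactionConflict` exception is a stand-in for whatever error your driver raises on an OC000 commit failure:

```python
import random
import time

class TransactionConflict(Exception):
    """Stand-in for the driver error raised on an OC000 commit conflict."""

def run_with_retry(txn, max_attempts=5, base_delay=0.01):
    """Run a transaction callable, retrying on optimistic-concurrency
    conflicts with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return txn()
        except TransactionConflict:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            # Backoff grows with each attempt; jitter spreads out retries.
            time.sleep(base_delay * (2 ** attempt) * random.random())
```

The callable must re-execute the whole transaction, including the SELECT FOR UPDATE check, because the parent row may have disappeared between attempts.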
Write skew is usually prevented with a serializable isolation level. For databases that do not support it, you can adapt the data model and use SELECT FOR UPDATE to achieve the same result, transforming the write-skew anomaly into a write-write conflict that the database can detect.