Randall

🏁🐘Winning Race Conditions With PostgreSQL

Race conditions suck! They can be extremely difficult to debug, and often only occur naturally in production at the most critical moments (times of high traffic). Fortunately, many databases (such as PostgreSQL) give us powerful tools to manage and avoid race conditions.

In this article, we will do a deep dive into how to avoid race conditions while using PostgreSQL, focusing especially on enforcing uniqueness constraints. We'll start with some simpler cases, then move on to more complex ones, so feel free to skip down depending on your level of experience.

This article assumes that you are familiar with race conditions, why they happen, and why they suck. (If you aren't familiar, don't worry: you will learn the hard way soon enough! We all do.) The later cases also assume at least an intermediate knowledge of relational databases.

This article will use the Node.js pg library for examples.

Let's just dive right in and look at some different cases where our code and data are vulnerable to race conditions, and find out how to fix them.

Case 1: Simple Uniqueness

Most applications do not allow two users to share the same email address. If a user tries to register with an email already registered by another account, registration should fail.

One might write such a uniqueness check in JavaScript like this:

async function registerUser(email, hashedPassword) {
  const existingUserResult = await pool.query(
    'SELECT 1 FROM "user" WHERE "email" = $1;',
    [email],
  );

  if (existingUserResult.rows.length > 0) {
    throw new Error('User already exists');
  }

  await pool.query(
    'INSERT INTO "user" ("email", "hashedPassword") VALUES ($1, $2);',
    [email, hashedPassword],
  );
}

But this code is vulnerable to race conditions. If two users try to sign up at the exact same time with the same email, then it's possible for a sequence of events like this to happen:

  1. User #1 SELECT operation
  2. User #2 SELECT operation
  3. User #1 INSERT operation
  4. User #2 INSERT operation

Since the SELECT operations happen for both users before the INSERT happens for either one, both users pass the duplicate check and proceed to the INSERT. Concurrent, asynchronous request handling makes such interleavings possible.

Fortunately, this problem is easy to fix: we just need to add a UNIQUE constraint to the email column in the user table. We can do this when creating the table:

CREATE TABLE "user" (
  "email" VARCHAR UNIQUE NOT NULL,
  "hashedPassword" VARCHAR NOT NULL
);

Or later, by using ALTER TABLE.
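
For example, a minimal sketch of adding the constraint to an existing table (the constraint name is our own choice):

ALTER TABLE "user"
ADD CONSTRAINT "user_email_unique" UNIQUE ("email");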

By setting a UNIQUE constraint on the email column, we are telling the database that no two rows may have the same value for email, and the database itself will enforce this for us. Even if the two INSERTs happen at exactly the same time, concurrency features in the database guarantee that only one will succeed and the other will error.
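
In fact, with the constraint in place, the up-front SELECT is no longer load-bearing; we can just attempt the INSERT and translate a unique violation into a friendly error. Here's a minimal sketch of that approach (23505 is PostgreSQL's unique_violation error code, which the pg library exposes as err.code):

async function registerUser(email, hashedPassword) {
  try {
    await pool.query(
      'INSERT INTO "user" ("email", "hashedPassword") VALUES ($1, $2);',
      [email, hashedPassword],
    );
  } catch (err) {
    // 23505 = unique_violation: another row already has this email.
    if (err.code === '23505') {
      throw new Error('User already exists');
    }
    throw err;
  }
}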

Case 2: Compound Uniqueness

Let's say we're running a multi-user blogging application, like Dev.to, and we want to allow users to create one highlighted post per week. Our posts table might look like this:

CREATE TABLE "posts" (
  "userId" INT NOT NULL,
  "createdAt" TIMESTAMP WITHOUT TIMEZONE NOT NULL,
  "highlighted" BOOLEAN NOT NULL,
  "postContent" TEXT NOT NULL
);

We could write code similar to the first example, with a uniqueness check like this:

const existingWeeklyHighlightedPostResult = await pool.query(
  `
    SELECT 1
    FROM "posts"
    WHERE
      "userId" = ?
      AND
      "highlighted" IS TRUE
      AND
      /* "createdAt" is greater than the first moment
       * of the current week */
      "createdAt" >= DATE_TRUNC('week', NOW());
  `,
  [userId],
);

if (existingWeeklyHighlightedPostResult.rows.length > 0) {
  throw new Error('You already have a highlighted post this week');
}

However, this has the same issue: if a user submits two highlighted posts at the same time, both might pass the uniqueness check. Maybe the user accidentally double-clicked the submit button, or maybe they're trying to exploit this to get more ad revenue 🤷

Like last time, can we create a UNIQUE constraint to help us? Yes! Even though we store an exact timestamp rather than the week a post was created in, PostgreSQL's support for indexes on expressions gives us what we need. We also need the partial index feature, so that the constraint is enforced only on highlighted posts:

CREATE UNIQUE INDEX "one_highlighted_post_per_week_constraint"
ON "posts" ("userId", DATE_TRUNC('week', "createdAt"))
WHERE "highlighted" IS TRUE;

With this UNIQUE index, the database will not allow two rows to both have "highlighted" IS TRUE and share the same combination of "userId" and DATE_TRUNC('week', "createdAt"). In other words, a user may only have one highlighted post per week.

Rows where "highlighted" IS FALSE are exempt from this constraint, so we can insert as many of them as we want.
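
To see the index in action, here's a sketch of two highlighted inserts for the same (made-up) user in the same week; the second one fails:

INSERT INTO "posts" VALUES (42, NOW(), TRUE, 'First post');  -- succeeds
INSERT INTO "posts" VALUES (42, NOW(), TRUE, 'Second post');
-- ERROR:  duplicate key value violates unique constraint
-- "one_highlighted_post_per_week_constraint"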

Case 3: Compound Multiple Uniqueness

Same case as above, but rather than one highlighted post per week, we want to allow users to make three highlighted posts per week. Can we do this with a UNIQUE constraint like we did above?

Once again, yes, but the solution here may be a bit more convoluted. First, we create the same posts table, but instead of a BOOLEAN highlighted column we add an INT weeklyHighlightedPostNumber column:

CREATE TABLE "posts" (
  "userId" INT NOT NULL,
  "createdAt" TIMESTAMP WITHOUT TIME ZONE NOT NULL,
  "weeklyHighlightedPostNumber" INT,
  "postContent" TEXT NOT NULL
);

If a post is highlighted, then its "weeklyHighlightedPostNumber" will be an integer. If the post is not highlighted, "weeklyHighlightedPostNumber" will be NULL.

Now we add a constraint that forces weeklyHighlightedPostNumber to be a number between 1 and 3, if it's not NULL:

ALTER TABLE "posts" ADD CONSTRAINT num_weekly_posts_constraint CHECK
(
  ("weeklyHighlightedPostNumber" IS NULL)
  OR
  ("weeklyHighlightedPostNumber" BETWEEN 1 AND 3)
);

Now we can add a UNIQUE constraint:

CREATE UNIQUE INDEX "three_highlighted_posts_per_week_constraint"
ON "posts" (
  "userId",
  DATE_TRUNC('week', "createdAt"),
  "weeklyHighlightedPostNumber"
)
WHERE "weeklyHighlightedPostNumber" IS NOT NULL;

This enforces that no two highlighted posts (rows with a non-NULL "weeklyHighlightedPostNumber") may share the same combination of "userId", DATE_TRUNC('week', "createdAt"), and "weeklyHighlightedPostNumber". Since the earlier CHECK constraint restricts "weeklyHighlightedPostNumber" to values between 1 and 3, this limits each user to three highlighted posts per week.

This does mean that when inserting a post, you need to figure out the next available post number. We can do this with a bit of SQL in our INSERT operation. This solution also handles gaps (for example if you have three highlighted posts and delete the second one). It does start getting a little hairy, but check it out:

async function createHighlightedPost(userId, content) {
  const insertResult = await pool.query(
    `
      WITH next_highlighted_post_num AS MATERIALIZED (
        /* Find the lowest post number (1-3) not yet taken
         * by this user in the current week. */
        SELECT series_num
        FROM GENERATE_SERIES(1, 3) AS series_num
        WHERE NOT EXISTS (
          SELECT *
          FROM posts
          WHERE
            posts."userId" = $1
            AND
            DATE_TRUNC('week', NOW()) <= posts."createdAt"
            AND
            posts."weeklyHighlightedPostNumber" = series_num
        )
        LIMIT 1
      )
      /* If no number is free, the CTE is empty and nothing
       * is inserted (rowCount will be 0). */
      INSERT INTO posts
      SELECT $1, NOW(), series_num, $2
      FROM next_highlighted_post_num;
    `,
    [userId, content],
  );

  if (insertResult.rowCount === 0) {
    throw new Error('Could not create highlighted post');
  }
}

Of course, you could instead just write a simple SELECT query to get all currently existing "weeklyHighlightedPostNumber"s for the given user and current week, then choose the new row's "weeklyHighlightedPostNumber" in JavaScript before INSERTing.
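
A sketch of that simpler approach might look like this; note that the unique index still backstops us, so if two of these run concurrently and pick the same number, one INSERT will fail with a unique violation rather than letting a fourth highlighted post through:

async function createHighlightedPost(userId, content) {
  const takenResult = await pool.query(
    `
      SELECT "weeklyHighlightedPostNumber"
      FROM "posts"
      WHERE
        "userId" = $1
        AND "weeklyHighlightedPostNumber" IS NOT NULL
        AND "createdAt" >= DATE_TRUNC('week', NOW());
    `,
    [userId],
  );

  // Pick the first number from 1-3 not already taken this week.
  const takenNumbers = takenResult.rows.map(
    (row) => row.weeklyHighlightedPostNumber,
  );
  const nextNumber = [1, 2, 3].find((n) => !takenNumbers.includes(n));

  if (nextNumber === undefined) {
    throw new Error('Could not create highlighted post');
  }

  await pool.query(
    'INSERT INTO "posts" VALUES ($1, NOW(), $2, $3);',
    [userId, nextNumber, content],
  );
}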

Case 4: Compound Referential Uniqueness

This case will be similar to case #2, but instead of "one highlighted post per week", we'll modify the condition slightly to "must wait 7 days before making another highlighted post".

In case #2, if a user makes a highlighted post on a Wednesday, they're next able to make one at the start of the next week (Monday).

But here in case #4, if a user makes a highlighted post on a Wednesday, they have to wait until the same time next Wednesday before they can make another.

This would require any constraint to reference the user's previous highlighted post creation date, which is something that typical UNIQUE constraints simply cannot do. To solve this we'll need to bust out some extra features: transactions and advisory locks.

This will be the final and trickiest case that we look at.

We will use the same table schema as in case #2:

CREATE TABLE "posts" (
  "userId" INT NOT NULL,
  "createdAt" TIMESTAMP WITHOUT TIMEZONE NOT NULL,
  "highlighted" BOOLEAN NOT NULL,
  "postContent" TEXT NOT NULL
);

And here is a solution in JavaScript:

async function createHighlightedPost(userId, content) {
  const postInsertLockNamespace = 5000;
  const queryRunner = await pool.connect();

  try {
    await queryRunner.query('BEGIN TRANSACTION;');

    await queryRunner.query(
      'SELECT PG_ADVISORY_XACT_LOCK($1, $2);',
      [postInsertLockNamespace, userId],
    );

    const existingWeeklyHighlightedPosts = await queryRunner.query(
      `
        SELECT 1
        FROM posts
        WHERE
          "userId" = ?
          AND
          highlighted IS TRUE
          AND
          "createdAt" >= NOW() - INTERVAL '1 week'
      `,
      [userId],
    );

    if (existingWeeklyHighlightedPosts.rows.length > 0) {
      throw new Error(
        'Already have a highlighted post in the previous seven days'
      );
    }

    await queryRunner.query(
      'INSERT INTO "posts" VALUES (?, ?, ?, ?);',
      [userId, new Date(), true, content],
    );

    await queryRunner.query('COMMIT');
  } catch (err) {
    await queryRunner.query('ROLLBACK');
    throw err;
  } finally {
    queryRunner.release();
  }
}

For this to work, anything that is going to modify (INSERT, UPDATE, or DELETE) records in the posts table must first run SELECT PG_ADVISORY_XACT_LOCK(5000, userId);. This takes a lock, and any other transaction that tries to take a lock with the same arguments will have to wait for it to be released.

If we're careful to always take this lock before modifying posts, then after we take it, nothing else will modify posts for that user until we release it. That means the result of our SELECT for existingWeeklyHighlightedPosts remains correct until our INSERT completes and the transaction commits. This effectively prevents a user from submitting a highlighted post if they have already submitted one in the past seven days, even if they're spamming us with parallel requests.

However, it can be hard to make sure your code is well-behaved and always takes the lock before modifying posts, especially if other developers are working on the same codebase (or someone (totally not yourself, of course) logs in with pgAdmin and runs ad-hoc queries!). If anyone modifies posts without properly taking the lock first, this approach breaks.

To help a little bit, we can create a trigger that automatically takes the lock whenever you INSERT or UPDATE or DELETE rows in this table. That would look like this:

CREATE FUNCTION "take_post_modify_lock_function"()
RETURNS TRIGGER
LANGUAGE PLPGSQL
AS $$
BEGIN
  -- OLD is only defined for UPDATE and DELETE;
  -- NEW is only defined for INSERT and UPDATE.
  IF TG_OP IN ('UPDATE', 'DELETE') THEN
    PERFORM PG_ADVISORY_XACT_LOCK(5000, OLD."userId");
  END IF;

  IF TG_OP IN ('INSERT', 'UPDATE') THEN
    PERFORM PG_ADVISORY_XACT_LOCK(5000, NEW."userId");
  END IF;

  -- Returning NULL from a BEFORE trigger cancels the operation,
  -- so we must return OLD for DELETEs and NEW otherwise.
  IF TG_OP = 'DELETE' THEN
    RETURN OLD;
  END IF;

  RETURN NEW;
END;
$$;

CREATE TRIGGER "take_post_modify_lock_trigger"
BEFORE INSERT OR UPDATE OR DELETE ON "posts"
FOR EACH ROW
EXECUTE PROCEDURE "take_post_modify_lock_function"();

This trigger will force any modification of posts to take the lock, so you don't need to remember to do it in code when you modify posts.

You do still need to take the lock in code before SELECTing to verify your constraint condition, though. There is no such thing as a SELECT trigger, and even if there were, taking the lock would be wasteful for queries that don't need it (like fetching a list of posts to display on the homepage). If you don't take the lock before SELECTing, then someone else might acquire it and INSERT before you perform the INSERT you're planning based on the results of your SELECT.

‡ The 5000 chosen above for the first argument to PG_ADVISORY_XACT_LOCK() is arbitrary. We could have chosen any other number. What's important is that it should be different from any other types of locks you create, so that locks with different meanings don't overlap. For example if we had another table comments and wanted to do a similar thing with locks, we could use 5001 for it.
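
For example, taking that hypothetical comments lock for user 42 would look like:

SELECT PG_ADVISORY_XACT_LOCK(5001, 42);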

SERIALIZABLE Transaction Isolation Level

There's actually some secret alternative magic sauce that can make this all work without any explicit locking: the SERIALIZABLE transaction isolation level. Most SQL databases have it, but the details differ between them. For example, PostgreSQL's version provides much stronger guarantees than MySQL's (which can be a little deceptive).

When you use the SERIALIZABLE transaction isolation level and you do a SELECT inside of a transaction, PostgreSQL "remembers" what you SELECTed, and if any data changes before the transaction finishes in a way that would cause your SELECT query to have returned a different result, then your transaction will get a "serialization failure" and have to ROLLBACK.
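
You can also opt in per transaction rather than database-wide. A minimal sketch in SQL (the middle statements are placeholders for whatever your operation does):

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- ... the SELECTs and INSERTs for one logical operation ...
COMMIT;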

I cannot stress enough that this is an extremely powerful feature. If you simply switch it on as the default (ALTER DATABASE <DATABASE NAME> SET DEFAULT_TRANSACTION_ISOLATION TO SERIALIZABLE;) and consistently use transactions, then you can write your code without having to think about explicit locking. However, there are a number of guidelines and pitfalls to be aware of:

  1. Use explicit transactions (BEGIN TRANSACTION; statement) for any operations you do that execute more than one SQL statement (such as the createHighlightedPost() JavaScript function we looked at above).
  2. Avoid running any transactions with lower isolation levels, and be very careful if you do, since it can be harder to reason about how transactions with different isolation levels interact.
  3. Be ready to retry transactions, since you may encounter serialization failures pretty regularly, which is normal and expected (see the retry sketch after this list).
  4. Keep transactions as short as possible, since having more transactions running at any given time will increase the chance of serialization failures.
  5. Understand that the predicate locking used by SERIALIZABLE transactions does introduce some non-trivial overhead.
  6. Be aware that certain patterns don't work well under SERIALIZABLE transaction isolation level. For example using SELECT ... FOR UPDATE SKIP LOCKED to implement a highly concurrent queue will not work (you'll just get bombarded with serialization failures).
  7. Make sure you have indexes that support your queries, since transactions that have to do full table scans can greatly increase the amount of serialization failures.
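
Here's a minimal retry sketch for guideline #3, assuming the operation is a function that runs exactly one transaction per call (40001 is PostgreSQL's serialization_failure error code, which the pg library exposes as err.code):

async function withSerializableRetry(operation, maxAttempts = 3) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      // 40001 = serialization_failure; the whole transaction
      // can safely be retried from the beginning.
      if (err.code === '40001' && attempt < maxAttempts) {
        continue;
      }
      throw err;
    }
  }
}

For example: await withSerializableRetry(() => createHighlightedPost(userId, content));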

Conclusion

Sometimes having to worry about race conditions is a "good problem to have", since it suggests that you have a lot of users using your system. Hopefully they're paying you!

But as the user base grows, race conditions can cause increasingly mysterious and crippling faults, and can sometimes even be triggered deliberately by attackers.

I hope this article gave you some new techniques to deal with these cases, and if you know of any other good ones, please share in the comments!

While writing this, I had a billion ideas for additional cases, footnotes, and asterisks, but couldn't include it all without making quite a mess. If you have any questions or are interested in additional cases, feel free to comment!

Top comments (2)

Akash Kava

No matter what you do, you will still hit the race condition even after using NOT EXISTS. The best way is to retry after failure and check whether the record already exists on the next attempt. There should be at least two attempts: first check if the record exists; if not, try to insert; if the insert fails, check again whether the record exists.

I have used this method in an entity access ORM I have written for TypeScript, which has two methods, selectOrInsert and upsert, both of which retry the operation at least 3 times.

Randall • Edited

Absolutely, or you can choose to just let the error bubble up to the user so they can give it another shot, especially if you expect the error to be very rare under normal conditions. In these cases the goal is to avoid violating the constraints you want on your data, not necessarily to avoid errors.