DEV Community

Cover image for Breaking IndexedDB consistency to explore its transactions
Ryan Cooke
Ryan Cooke

Posted on

Breaking IndexedDB consistency to explore its transactions

Did you know every browser has a built-in key-value database?


I’ve been building a Chrome extension, and learning about IndexedDB while using it for an offline mode. Everything that the app reads from the API gets processed and stored locally. Then the app renders and is reactive to changes from the database.

Having done loads of backend development, I'm curious to know how I can use IndexedDB (IDB for short) with strong consistency - basically without losing data. After reading in some threads that data is easy to corrupt in IDB, it's time to explore the transaction model, and suss out what code patterns might lead to unexepected inconsistency.

So I coded up a little app that would do lots of writes into the same "store" (IDBs name for table). The idea is to use web workers to execute simultaneous writes across threads, and see if we can stomp on parallel updates to lose data. "Stomping" means using concurrency to have code execute and overwrite each others' state.

Since I use Svelte for everything, I set up a simple app with tailwind and daisyUI. You can see an example of this in my post "Svelte Without Kit".

First, the docs

After perusing IDB docs on transactions, I didn't see any details about thread safety or anything about under the hood. Is there some kind of multiversion concurrency control happening? Hard to say. But there was one example, and it emphasized that transactions are applied in order by when the transaction is created. Still not sure what that means across threads, but an ordering guarantee is a good sign that transactions provide some predictability, so let's write some code to find out!

You can find the complete code of the following files:

  • App.svelte // Svelte component and UI
  • worker.js // web worker that interacts with IDB
  • pool.js // utility function for distributing work among workers
  • database.js // utility function for connecting to IDB

Test One: update inside transaction

My first test sends messages to a web worker to update a "thread" object in IDB, and then verifies that the update succeeded. The workers each add a random number "labelId" to an array on the thread object. The idea is to test - if workers update the same thread object's array simultaneously, will one overwrite array the other in IDB and lose a labelId?

Here's the database code for updating the thread object, notice I am adding the labelId after starting the transaction and fetching the thread object from the store -

// threadId and labelId variables are in scope
database() // returns a promise with an instance of IDBDatabase
.then((db) => {
  // start transaction
  const txn = db.transaction(['threads'], 'readwrite')
  txn.oncomplete = () => db.close()
  const store = txn.objectStore('threads')

  return new Promise((resolve, reject) => {
    // read thread object from IDB store
    const getThread = store.get(threadId)
    getThread.onsuccess = ({ target }) => {
      // create new thread object if it doesn't exist
      const thread = target.result || { id: threadId, labelIds: [], attr: true }
      // add labelId to array in thread object
      if (!thread.labelIds.includes(labelId)) {
        thread.labelIds.push(labelId)
      }

      // write thread object to IDB store
      const putThread = store.put(thread)
      putThread.onsuccess = () => resolve(thread)
      putThread.onerror = () => reject(new Error(`Error writing message: ${putThread.error}`))
    }
    getThread.onerror = () => reject(new Error(`Error getting message: ${getThread.error}`))
  })
})
Enter fullscreen mode Exit fullscreen mode

For our App UI, we have a button to toggle the test running or not, and a running test will messages to a pool of workers. Refer to below for the full code, but it basically looks like this:

  function startStopTest() {
    if (testRunning) {
      const promises = []
      for (let step = 0; step < 15; step++) {
        const lId = Math.floor(Math.random() * 1000)
        const p = messageWorker({ threadId: step % 5, labelId: lId, action: 'addLabel' }, url)
          .then(//... another message)
          .then(//... another message)
        promises.push(p)
      }

      Promise.all(promises)
        .then(function (results) {
          //... process results
        })
        .then(tick)
        .then(startStopTest)
    }
  }
Enter fullscreen mode Exit fullscreen mode

Let's fire it up!

Test one results

Result

There are no missing labelIds in any thread object's list, so no data is being corrupted across simultaneous updates. Well that was underwhelming!

Test Two: Add another update

Next I'll try adding a second update to the worker, this time flipping a boolean on the object. The database code is the same as before - start transaction, get a thread object, modify it, put object back in store. My idea is to see if adding more updates to the same object can cause corruption.

./worker.js
function toggleAttr(threadId, labelId) {
  return database().then((db) => {
    const txn = db.transaction(['threads'], 'readwrite')
    txn.oncomplete = () => db.close()
    const store = txn.objectStore('threads')

    return new Promise((resolve, reject) => {
      const getThread = store.get(threadId)
      getThread.onsuccess = ({ target }) => {
        const thread = target.result
        thread.attr = !thread.attr
        const putThread = store.put(thread)
        putThread.onsuccess = () => resolve()
        putThread.onerror = () => reject(new Error(`Error writing thread: ${putThread.error}`))
      }
      getThread.onerror = () => reject(new Error(`Error getting thread: ${getThread.error}`))
    })
  })
}
Enter fullscreen mode Exit fullscreen mode

Get it going!

Test 2 results

Result

Same result, no corruption

Test Three: separate read and write transactions

I'm curious what happens if I read the object in one transaction, update the list, then write the object in a separate transaction. This will test if the consistency I've achieved so far is a result of transaction serialization or something.

Let's modify our worker code, it will fetch a thread object in one transaction:

function getThread(db, threadId) {
  const txn = db.transaction(['threads'], 'readonly')
  const store = txn.objectStore('threads')

  return new Promise((resolve, reject) => {
    const getThread = store.get(threadId)
    getThread.onsuccess = ({ target }) => {
      const thread = target.result || { id: threadId, labelIds: [], attr: true }
      resolve({ db, thread })
    }
    getThread.onerror = () => reject(new Error(`Error getting thread: ${getThread.error}`))
  })
}
Enter fullscreen mode Exit fullscreen mode

Then it updates the object list, before starting a new transaction and writing the object to IDB:

database()
  .then((db) => getThread(db, threadId))
  .then(({ db, thread }) => {
    if (!thread.labelIds.includes(labelId)) {
      thread.labelIds.push(labelId)
    }

    return new Promise((resolve, reject) => {
      const txn = db.transaction(['threads'], 'readwrite')
      txn.oncomplete = () => db.close()
      const store = txn.objectStore('threads')
      const putThread = store.put(thread)
      putThread.onsuccess = () => resolve()
      putThread.onerror = () => reject(new Error(`Error writing thread: ${putThread.error}`))
    })
  })
Enter fullscreen mode Exit fullscreen mode

Let 'er rip!

Result

Immediately we see errors, missing labelIds:

Test 3 results

Corruption accomplished! Almost half of the objects are missing labelIds. That's bad!

This suggests that updating a thread object outside of a transaction, then writing it to the store, is susceptible to data loss. But before you jump up and say "obviously!", think about the complex applications you've built. It's pretty common for data modifications to happen outside of database transactions, for example in response to user input, and then to push that data to storage, for example through an API. But IDB is only consistent with modifications to an object that is also fetched/read within the same transaction. At least according to this test.

Test Four: mixing updates inside and outside transactions

Finally, I wanted to know if updates made outside transactions could conflict with updates happening inside transactions. So I added another "conflicting" action to the worker, one that deletes the labelId outside of a transaction. And reverted the original add labelId action to use an in-transaction update, which should be safe from data loss?

Result

Tests were still passing:

Test 4 results

But labelIds were not being removed from the thread objects:

labelId array is not empty

Once I update the delete labelId worker to modify inside a transaction, labelId lists look more reasonable:

labelId array is empty

Analysis

Ok so what's going on here? IndexedDB seems capable of handling concurrent updates as long as objects are read, updated and written in the same transaction.

If I go back to the IDB docs, I recall how it highlighted that transactions followed some kind of ordering, based on when they are created. This leads me to think that transactions are "serialized", meaning they act as if they are executed one at a time (very watered-down definition of serialization).

Here's a sketch of the pattern that leads to data loss -

Order won't stop data loss

Notice that there is no way to order the "put" transactions that doesn't lose data. If Worker1 writes first, its array of labelIds will get overwritten by Worker2, and vice versa.

Now what if we update within a transaction -

Order doesn't cause data loss

Order doesn't matter for these transactions, as long as they are executed serially, the object in IDB doesn't lose any data!

Conclusion

So it seems that IDB is safe as long as updates are made to objects fetched within the same transaction. I even tested transactions with long running code between the get and put, and data consistency held. (There are some situations where transactions get closed after they are passed between closures, but that raises an error).

While it's great to see that IDB can be used with consistency, this mechanism dramatically affects how code using IDB is written! You can't reliably accumulate changes to an object, and then write that object to the database, without potential for stomping on another update happening simultaneously. So if your app has multiple code paths for updating IDB, this is a risk.

I should also note that there are some updates where order does really matter. For example, adding then removing a value relies on the "add" happening before the "remove", otherwise the final value is wrong. Again, pretty common for this to happen in a web app, say a user clicks really fast.

These kind of consistency problems are exacerbated by the fact that IDB only supports a primitive put method - there's no way to make partial or conditional updates. So as a developer, I'm left with solving for consistency, and it requires me to be very deliberate about how state is updated.

Ah well, keep on coding!

App.svelte

Simple page with a button to start and stop tests, as well as a summary of the results of the running test.

<script>
  import { messageWorker } from './pool'
  import { tick } from 'svelte'

  let successCount = 0,
    failedCount = 0,
    totalCount = 0,
    lastError = undefined,
    testRunning = false

  const url = new URL('./worker.js', import.meta.url)

  function toggleTest() {
    testRunning = !testRunning
    if (testRunning) {
      successCount = failedCount = totalCount = 0
      lastError = undefined
    }
    startStopTest()
  }

  function startStopTest() {
    if (testRunning) {
      const promises = []
      for (let step = 0; step < 15; step++) {
        const lId = Math.floor(Math.random() * 1000)
        const p = messageWorker({ threadId: step % 5, labelId: lId, action: 'addLabel' }, url)
          .then(function ({ threadId, labelId }) {
            return messageWorker({ threadId, labelId, action: 'toggleAttr' }, url)
          })
          .then(function ({ threadId, labelId }) {
            return messageWorker({ threadId, labelId, action: 'verifyLabel' }, url)
          })
          .then(function ({ threadId, labelId }) {
            return messageWorker({ threadId, labelId, action: 'deleteLabel' }, url)
          })
        promises.push(p)
      }

      Promise.all(promises)
        .then(function (results) {
          results.forEach(({ error, threadId, labelId }) => {
            if (error) {
              failedCount = failedCount + 1
              lastError = { error, threadId, labelId }
            } else {
              successCount = successCount + 1
            }

            totalCount = totalCount + 1
          })
          return Promise.resolve()
        })
        .then(tick)
        .then(startStopTest)
    }
  }
</script>

<main>
  {#if testRunning}
    <button class="btn btn-primary" on:click={toggleTest}>Stop test</button>
  {:else}
    <button class="btn btn-warning" on:click={toggleTest}>Start test</button>
  {/if}
  <p class="text-sm font-light py-4">
    {totalCount} total tests, {successCount} passed, {failedCount} failed
  </p>
  {#if lastError != undefined}
    <p class="text-lg text-error">
      {lastError.error?.message || 'Unknown Error'}
    </p>
  {:else}
    <p class="font-semibold italic text-lg text-primary-content">No errors</p>
  {/if}
</main>
Enter fullscreen mode Exit fullscreen mode

database.js

A utility function to open a connection to the database, and return the connection in a promise.

// ./database.js
export function database() {
  return new Promise((resolve, reject) => {
    const request = indexedDB.open('test_db', 1)
    request.onsuccess = () => resolve(request.result)
    request.onerror = () => reject(request.error)
    request.onblocked = () => console.warn('pending till unblocked')
    request.onupgradeneeded = (e) => {
      const up = request.result
      if (e.oldVersion < 1) {
        const threads = up.createObjectStore('threads', {
          keyPath: 'id',
          autoIncrement: false
        })
      }
    }
  })
}
Enter fullscreen mode Exit fullscreen mode

pool.js

A utility function to manage a pool of web workers and message a worker. A promise is returned, that will be invoked when the worker responds. Promises are tracked by a nonce value is that is passed with the message.

// ./pool.js

const workers = []
const promises = {}

function resolveMessage(data) {
  promises[data.nonce].resolve(data)
  delete promises[data.nonce]
}

export function messageWorker(message, url) {
  const ix = Math.floor(Math.random() * 5)
  if (workers.at(ix) == undefined) {
    workers[ix] = new Worker(url, { type: 'module' })
    workers[ix].onmessage = (event) => resolveMessage(event.data)
  }

  const rand = Math.floor(Math.random() * 50000)
  const nonce = `${rand}`
  const promise = new Promise((resolve) => {
    promises[nonce] = { resolve }
  })

  workers[ix].postMessage({ nonce, ...message })
  return promise
}
Enter fullscreen mode Exit fullscreen mode

worker.js

A background worker for manipulating data with various transaction patterns.

//./worker.js
import { database } from './database'

function handleMessage(message, respond) {
  let pr
  if (message.action == 'addLabel')
    pr = addLabel(message.threadId, message.labelId)
  else if (message.action == 'toggleAttr')
    pr = toggleAttr(message.threadId, message.labelId)
  else if (message.action == 'verifyLabel')
    pr = verifyLabel(message.threadId, message.labelId)
  else if (message.action == 'deleteLabel')
    pr = deleteLabel(message.threadId, message.labelId)
  else
    pr = Promise.reject(new Error('No action in message'))

  pr
   .then(() => respond({ success: true, ...message }))
   .catch((error) =>
    respond({ error, success: false, ...message })
  )
}

function addLabel(threadId, labelId) {
  return database().then((db) => {
    const txn = db.transaction(['threads'], 'readwrite')
    txn.oncomplete = () => db.close()
    const store = txn.objectStore('threads')

    return new Promise((resolve, reject) => {
      const getThread = store.get(threadId)
      getThread.onsuccess = ({ target }) => {
        const thread = target.result || { id: threadId, labelIds: [], attr: true }
        if (!thread.labelIds.includes(labelId)) {
          thread.labelIds.push(labelId)
        }

        const putThread = store.put(thread)
        putThread.onsuccess = () => resolve()
        putThread.onerror = () => reject(new Error(`Error writing thread: ${putThread.error}`))
      }
      getThread.onerror = () => reject(new Error(`Error getting thread: ${getThread.error}`))
    })
  })
}

function toggleAttr(threadId, labelId) {
  return database()
    .then((db) => getThread(db, threadId))
    .then(({ db, thread }) => {
      thread.attr = !thread.attr
      return new Promise(function (resolve, reject) {
        const txn = db.transaction(['threads'], 'readwrite')
        txn.oncomplete = () => db.close()
        const store = txn.objectStore('threads')
        const putThread = store.put(thread)
        putThread.onsuccess = () => resolve()
        putThread.onerror = () => reject(new Error(`Error writing thread: ${putThread.error}`))
      })
    })
}

function verifyLabel(threadId, labelId) {
  return database()
    .then((db) => getThread(db, threadId))
    .then(({ db, thread }) => {
      db.close()
      if (thread == undefined) {
        return Promise.reject(new Error(`Missing thread ${threadId}`))
      } else if (!thread.labelIds.includes(labelId)) {
        return Promise.reject(new Error(`Thread ${threadId} is missing label ${labelId}`))
      } else {
        return Promise.resolve()
      }
    })
}

function deleteLabel(threadId, labelId) {
  return database()
    .then((db) => getThread(db, threadId))
    .then(({ db, thread }) => {
      thread.labelIds = thread.labelIds.filter((lid) => lid != labelId)

      return new Promise((resolve, reject) => {
        const txn = db.transaction(['threads'], 'readwrite')
        txn.oncomplete = () => db.close()
        const store = txn.objectStore('threads')

        const putThread = store.put(thread)
        putThread.onsuccess = () => resolve()
        putThread.onerror = () => reject(new Error(`Error writing thread: ${putThread.error}`))
      })
    })
}

function getThread(db, threadId) {
  const txn = db.transaction(['threads'], 'readonly')
  const store = txn.objectStore('threads')

  return new Promise((resolve, reject) => {
    const getThread = store.get(threadId)
    getThread.onsuccess = ({ target }) => {
      const thread = target.result || { id: threadId, labelIds: [], attr: true }
      resolve({ db, thread })
    }
    getThread.onerror = () => reject(new Error(`Error getting thread: ${getThread.error}`))
  })
}

self.onmessage = (message) => {
  handleMessage(message.data, postMessage)
}
Enter fullscreen mode Exit fullscreen mode

Top comments (2)

Collapse
 
graphographer profile image
John Swartz

Thanks so much for these thoughtful tests and writeup. Have you come up with a strategy for addressing this in your apps?

I was thinking perhaps something like tracking an active transaction as a singleton and using it til it's closed, assuming that it has been opened with the correct object stores; and if not, awaiting its completion before creating a new transaction.

A more conservative approach might be to just use a mutex throughout: every unit of work / transaction must await the completion of any current transaction.

Collapse
 
debussyman profile image
Ryan Cooke

Keeping an "active transaction" singleton comes with some pretty big drawbacks - it limits the app from using web workers, and fails to work if multiple tabs are open to the same page/site.

So far I've been able to get away with modeling the data where every update is idempotent. Sometimes this does require keeping extra data, such as keeping a list of deleted elements rather than deleting them from the object outright.

But for more complex applications, I do think the mutex approach would be best. Check out the Web Locks API, as it is designed to work across multiple open tabs.