DEV Community

Rap2h
Rap2h

Posted on

Fast & reliable JSONB deep structure update in PostgreSQL

Assuming you have a PostgreSQL database with a survey table. A survey has a questions column (JSONB), each question has choices.

Sample content of the questions column for the "web dev" survey:

[
  {
    "id": "c98b53c5",
    "text": "pick a language",
    "choices": [
      {"id": "235d0482", "text": "Rust"},
      {"id": "652ee824", "text": "PHP"},
      {"id": "f543366f", "text": "Forth"}
      // etc.
    ]
  },
  {
    "id": "73294354",
    "text": "pick a frontend framework",
    "choices": [
      {"id": "3feba90d", "text": "React"},
      {"id": "d6aa4dbd", "text": "Vue"},
      {"id": "9beb5da3", "text": "Angular"}
      // etc.
    ]
  },
]
Enter fullscreen mode Exit fullscreen mode

We want to replace the choice with ID 9beb5da3 (Angular) in question 73294354 (pick a frontend framework) for survey 8bf9fedb (the ID of the survey for web dev survey)

- {"id": "9beb5da3", "text": "Angular"}
+ {"id": "1acffbaf", "text": "Svelte"}
Enter fullscreen mode Exit fullscreen mode

To update a subpart of a JSONB column, we could use jsonb_set. Given a path with indices and keys, we can replace the target object with a new object:

update survey
set questions = jsonb_set(
    questions,
    '{1,choices,2}', -- path to angular
    '{"id": "1acffbaf", "text": "Svelte"}'::jsonb,
    false)
where id = '8bf9fedb'; -- the survey id
Enter fullscreen mode Exit fullscreen mode

Still, the problem is that we don't know the actual positions of the items in the arrays. We have to find them first. The only thing we know is the IDs of the object.

🤔 "I know I want to update choice 9beb5da3 in question 73294354; I don't know what is their positions in their respective arrays."

One could have run one first query to get the whole column data, then iterate over objects with their favorite programming language to find item positions, then use these positions to configure the jsonb_set path parameter in order to run the update query.

This approach has two caveats:

  • It's slow; get all the record data from DB, find positions programmatically, then update the data in the database.
  • It's not reliable: the positions of data could have changed during this process, between the two queries.

We can do this with only one operation: one update main query that contains two sub-queries to build indices.

Solution

First, we need to get the index of the question 73294354:

select question, questionIndex
from survey
cross join jsonb_array_elements(questions) 
    with ordinality arr(question, questionIndex)
where id='8bf9fedb' and question->>'id'='73294354'
Enter fullscreen mode Exit fullscreen mode

jsonb_array_elements function expands the array to a set of JSON values, with ordinality adds a number (starting from 1, increasing by 1) for each row of the function output. This query returns:

question questionindex
{"id": "73294354", "text": "pick a frontend framework", "choices": [{"id": "3feba90d", "text": "React"}, {"id": "d6aa4dbd", "text": "Vue"}, {"id": "9beb5da3", "text": "Angular"}]} 2

Then, we need to get the index of the choice 9beb5da3 (which is in the question 73294354: we have to be sure we search only in this question, in case some choices share the same index in different questions).

To do so, we have to use the first query as a subquery, then get the index:

-- This is the previous query, see above.
-- Using "WITH", also known as Common Table Expressions (CTE),
-- helps in reading complex queries (avoid nested subquery).
with questions_table as (
    select question, questionIndex
    from survey
    cross join jsonb_array_elements(questions)
        with ordinality arr(question, questionIndex)
    where id = '8bf9fedb' and question->>'id'='73294354'
)
-- We can now use questions_table to find choice index.
select choice, choiceIndex
from questions_table
cross join jsonb_array_elements((question->'choices')::jsonb)
    with ordinality arr(choice, choiceIndex)
where choice->>'id'='9beb5da3'
Enter fullscreen mode Exit fullscreen mode

We just used again jsonb_array_elements and with ordinality, nothing new here. As a result, we have got the choice we want to change and it's position.

choice choiceindex
{"id": "9beb5da3", "text": "Angular"} 3

Now, we are ready to update the record, since we have the index to build the path. Building the path is simply creating a string thanks to the index we have from previous steps:

'{' || questionIndex-1 || ',choices,' || choiceIndex-1 || '}'
Enter fullscreen mode Exit fullscreen mode

☝️ Since ordinality starts to 1, we have subtracted 1 to the indices.

We can modify the second query to directly return the path we want:

with questions_table as (
    select question, questionIndex
    from survey
    cross join jsonb_array_elements(questions)
        with ordinality arr(question, questionIndex)
    where id = '8bf9fedb' and question->>'id'='73294354'
)
select
  ('{'||questionIndex-1||',choices,'||choiceIndex-1||'}')
  as path
from questions_table
cross join jsonb_array_elements((question->'choices')::jsonb)
    with ordinality arr(choice, choiceIndex)
where choice->>'id'='9beb5da3'
Enter fullscreen mode Exit fullscreen mode
path
{1,choices,2}

For the last step, we need to create the update query that uses the two previous queries to actually update the choice:

with questions_table as (
    select question, questionIndex
    from survey
    cross join jsonb_array_elements(questions)
        with ordinality arr(question, questionIndex)
    where id = '8bf9fedb' and question->>'id'='73294354'
), choices_table as (
    select ('{'||questionIndex-1||',choices,'||choiceIndex-1||'}')::text[] as path
    from questions_table
    cross join jsonb_array_elements((question->'choices')::jsonb)
        with ordinality arr(choice, choiceIndex)
    where choice->>'id'='9beb5da3'
)
update survey
set questions = jsonb_set(
    questions,
    path, -- the path we built just before.
    '{"id": "1acffbaf", "text": "Svelte"}'::jsonb,
    false)
from choices_table
where id = '8bf9fedb';
Enter fullscreen mode Exit fullscreen mode
1 row affected in 7 ms
Enter fullscreen mode Exit fullscreen mode

It's fast, since we are only reading and writing one row in survey. It's reliable because we are doing only one query.

Some links:

Feel free to post a comment if you need more information.

Top comments (0)