DEV Community

Cover image for A better way to search arrays in Postgres
Aditya Agrawal
Aditya Agrawal

Posted on • Originally published at adiagr.in

A better way to search arrays in Postgres

When using arrays in Postgres often time you would have a need to filter based on an element in the array. This can be done using a simple gin index on the column.

Example

Assume resources Expense, Tag, and Org.

  • 5M Expenses
  • 10 Orgs
  • 2K Tags
create table expenses(
   id serial primary key,
   org_id integer not null,
   tag_ids integer[] not null
);

insert into expenses 
   select 
       id, 
       1+id%9 org_id, 
       gi.user_ids
   from 
      generate_series(1, 5000000) id
   join lateral (
      select 
          i%5000000 pid, 
          array_agg(round(random() * 2000)) user_ids
      from generate_series(1, 5.5*5000000) i
      group by i%5000000
   ) gi on pid=id
;
Enter fullscreen mode Exit fullscreen mode

Roughly every expense can have 5 tags. We need to be able to list all expenses in an org with a certain tag.

The org_id filter in most cases comes from access layers where a user with a token can access the data of one org only for which the session is active.

select 
    *
from
    expenses ex
where
    ex.org_id=1 
    and ex.tag_ids @> '{7}'
;
Enter fullscreen mode Exit fullscreen mode

Note: @> is 'contains' operator
Postgresql: functions-array

No Indexes:

Execution Time: 1682.684 ms
Enter fullscreen mode Exit fullscreen mode

Index on org_id:

create index on expenses(org_id);
Execution Time: 228.352 ms
Enter fullscreen mode Exit fullscreen mode

Improvement of 7.3x

Can we do better? Enter Gin Index on tag_ids:

create index on expenses(org_id);
create index on expenses using gin (tag_ids);
Execution Time: 43.473 ms
Enter fullscreen mode Exit fullscreen mode

Improvement of another 5.2x

Can we do even better? Enter Partial Index:

create index on expenses using gin (org_id, tag_ids);
ERROR:  data type integer has no default operator class for access method "gin"
Enter fullscreen mode Exit fullscreen mode

Enter btree_gin extension:

create extension btree_gin;
create index on expenses using gin (org_id, tag_ids);

Execution Time: 7.293 ms
Enter fullscreen mode Exit fullscreen mode

Improvement of another 5.9x

Summary:

Time: 1682.68 -> 228.35 -> 43.47 -> 7.29ms
Enter fullscreen mode Exit fullscreen mode

Improvements:

  • total: 230x
  • from just org_id index: 31x

Postgres provides you with an extension out of the box called 'btree_gin'.

btree_gin provides sample GIN operator classes that implement B-tree equivalent behavior for simple non-composite data types.

References:

Top comments (0)