Franck Pachot for YugabyteDB

UUIDv7 in YugabyteDB

When creating a user-defined function that runs during inserts, such as one that generates a default value, make sure it does not break write buffering in YugabyteDB by causing a row-by-row flush. The function body must be a simple expression (no generate_series, no WITH RECURSIVE). Below is an example of how to generate a UUIDv7 under this constraint.
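For contrast, here is a hypothetical sketch (the function name and body are invented for illustration) of a default-value function that would not qualify, because its body relies on a set-returning generate_series instead of a simple expression:

create or replace function not_buffer_friendly() returns uuid as $$
  -- The set-returning generate_series call prevents the body from being
  -- a simple expression, so per the rule above it can defeat write buffering
  select md5(random()::text || g::text)::uuid
  from generate_series(1, 1) as g
$$ language sql;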

Here is a function that generates a UUIDv7 from a timestamp (defaulting to now()) and a shift interval from this timestamp (defaulting to no shift):

create or replace function
 uuid7(shift_interval interval DEFAULT '0 milliseconds', p_timestamp timestamp with time zone default now())
 returns uuid as $$
  select (
    -- Calculate timestamp in milliseconds (shifted by the interval if provided)
    lpad(to_hex(trunc(extract(epoch from p_timestamp + shift_interval )*10^3)::bigint), 12, '0') ||  -- 48-bit timestamp in milliseconds
    -- Calculate random part A, apply version (UUIDv7), and convert to hex with padding
    lpad(to_hex(
     trunc((extract(epoch from p_timestamp + shift_interval ) * 10^6  -- Convert timestamp to microseconds
     - trunc(extract(epoch from p_timestamp + shift_interval ) * 10^3) * 10^3) * 4.096  -- Scale the sub-millisecond fraction (0-999 µs) to 12 bits
     )::bigint | x'0000000000007000'::bigint), 4, '0') ||  -- Set version to '7' (UUIDv7 version)
    -- Generate random part B (62 random bits), apply variant, and convert to hex with padding
    lpad(to_hex(
     trunc(random() * 2^30)::bigint << 32 |  -- First part of the random value (30 bits shifted)
     trunc(random() * 2^32)::bigint |  -- Second part of the random value (32 bits)
     x'8000000000000000'::bigint), 16, '0')  -- Set variant to '10xx...' (UUIDv7 variant)
   )::uuid;  -- Combine everything into a final UUID and cast as uuid
$$ language sql;
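As a quick sanity check (a sketch; the actual values will differ because of the random bits and the current timestamp, and the fixed timestamp is only an illustration), you can call the function directly:

-- Generate a UUIDv7 for now(), one shifted a year ahead, and one for a fixed timestamp
select uuid7()                                        as now_uuid,
       uuid7(interval '1 year')                       as shifted_uuid,
       uuid7(interval '0', timestamptz '2024-01-01')  as fixed_ts_uuid;

The first 12 hexadecimal digits encode the millisecond timestamp, so values generated later sort after values generated earlier.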

This function can be used in the DEFAULT clause of a column in CREATE TABLE. I apply a modulo eight to the backend PID for the shift interval. This distributes the inserts across eight segments for improved throughput, while entries inserted at the same time remain colocated within each range:

create table demo ( 
 id uuid primary key 
         default uuid7( pg_backend_pid()%8 * interval '1 year') 
);

explain (analyze, dist, costs off)
insert into demo select from generate_series(1,1000) returning *;

Let's check the scalability in a distributed SQL database with explain (analyze, dist), which displays the number of write operations and the number of flushes:

yugabyte=> explain (analyze, dist, costs off)
insert into demo select from generate_series(1,1000) returning *;
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Insert on demo (actual time=0.127..6.836 rows=1000 loops=1)
   Storage Table Write Requests: 1000
   ->  Function Scan on generate_series (actual time=0.068..2.585 rows=1000 loops=1)
 Planning Time: 0.231 ms
 Execution Time: 17.823 ms
 Storage Read Requests: 0
 Storage Rows Scanned: 0
 Storage Write Requests: 1000
 Catalog Read Requests: 0
 Catalog Write Requests: 0
 Storage Flush Requests: 1
 Storage Flush Execution Time: 9.960 ms
 Storage Execution Time: 9.960 ms
 Peak Memory Usage: 140 kB


The plan indicates that 1,000 Write Requests have been grouped into one Flush Request. This approach is efficient, as the function did not require a flush for each row.
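To see where the rows landed, you can decode the 48-bit millisecond timestamp from the first 12 hex digits of each UUID (a sketch using the text-to-bit hex cast; in a single session all rows share the same pg_backend_pid(), so they all fall into the same one-year segment):

-- Decode the embedded millisecond timestamp and group by the one-year segment
select date_trunc('year',
         to_timestamp((('x' || replace(left(id::text, 13), '-', ''))::bit(48)::bigint) / 1000.0)
       ) as segment,
       count(*)
from demo
group by segment
order by segment;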
