DEV Community

Judy
Judy

Posted on

Search for the closest matching record within the group:From SQL to SPL

The table mytable in the MS SQL database has one ConfirmationStarted and multiple Closed statuses for each ID.
Image description
Now we need to find the record closest to ConfirmationStarted among all the Closed records before ConfirmationStarted in each ID, and retrieve the ID and time fields of the record.
Image description
SQL solution:

WITH cte AS (
    SELECT ID, CreatedAt, NewStatus,
        ROW_NUMBER() OVER (PARTITION BY ID ORDER BY CreatedAt DESC) AS rn
    FROM mytable
    WHERE NewStatus = 'Closed'
    AND CreatedAt < (
        SELECT CreatedAt FROM mytable AS sub
        WHERE sub.ID = mytable.ID AND sub.NewStatus = 'ConfirmationStarted'
    )
)
SELECT ID, CreatedAt as xdate
FROM cte
WHERE rn = 1
ORDER BY ID;
With cte AS (
    SELECT ID, CreatedAt, NewStatus,
        ROW_NUMBER() OVER (PARTITION BY ID ORDER BY CreatedAt DESC) AS rn
    FROM mytable
    WHERE NewStatus = 'Closed'
    AND CreatedAt < (
        SELECT CreatedAt FROM mytable AS sub
        WHERE sub.ID = mytable.ID AND sub.NewStatus = 'ConfirmationStarted'
    )
)
SELECT ID, CreatedAt as xdate
FROM cte
WHERE rn = 1
ORDER BY ID;

Enter fullscreen mode Exit fullscreen mode

SQL does not have natural sequence numbers, so it needs to generate sequence numbers using window functions first. After SQL grouping, it must aggregate immediately and records within the group cannot be filtered. It can only be solved in a roundabout way by filtering repeatedly using multi-level subqueries. The overall code is a bit cumbersome and difficult to understand.

SPL has natural sequence numbers and provides rich position related calculations. SPL grouping can retain subsets after grouping, making it easier to process data within the group.

Image description
A1: Load data from the database and sort it by time.

A2: Group by ID, but do not aggregate.

A3: Filter each group of data, first find the records before ConfirmationStarted, and then filter out Closed from them, getting the last one. The select function is used for conditional filtering, which supports position related calculations during filtering, @c represents starting from the first record that makes the condition true and stopping when encountering a record that makes the condition false. @1 represents getting the first piece of the results, and @z represents filtering from back to front.

A2-A4 can be combined into one statement: =A1.group(ID;~.select@c(NewStatus!="ConfirmationStarted").select@z1(NewStatus=="Closed").CreatedAt:xdate)

Open source address

Free download

Top comments (0)