DEV Community

Judy
Judy

Posted on

Flexible interval aggregation:From SQL to SPL

The range of the ID field in a table of Oracle database is integers 1-2000.

Image description

Now we need to group IDs into multiple consecutive intervals, sum up the values within each group, and arrange the calculation results in order of intervals. The intervals are in the form of a sequence of numbers that can be flexibly modified. For example, when the numerical sequence is 10, 20, 100, 1000, 2000, it means grouping and aggregating IDs in the intervals of 1-10, 11-20, 21-100, 101-1000, and 1001-2000. Note that the starting value format of the grouping string is: member of the numerical sequence+1.

Image description

SQL:

WITH id_ranges (min_value, max_value) AS (
  SELECT LAG(COLUMN_VALUE, 1, 0) OVER (ORDER BY COLUMN_VALUE) + 1,
         COLUMN_VALUE
  FROM   TABLE(SYS.ODCINUMBERLIST(10,20,100,1000,2000))
)
SELECT r.min_value || '-' || r.max_value AS id_range,
       SUM(value) AS sum_of_values
FROM   dummy_data d
       INNER JOIN id_ranges r
       ON     d.id BETWEEN r.min_value AND r.max_value
GROUP BY r.min_value, r.max_value
order by  r.min_value
Enter fullscreen mode Exit fullscreen mode

SQL does not provide a function to find which interval a value belongs to, so it cannot directly group by intervals. It needs to take a detour to create a temporary interval table, and then group and aggregate after association. The structure is quite complex.

SPL code is much simpler and easier to understand:

Image description

A1: Load data.

A3: Directly group and aggregate by interval, the pseg function returns the interval number where a certain value is located, without the need for table creation or association, @r represents left-open, right-closed interval.

A4: Change the group number to the specified string format.

SPL is open source free, open source address

Top comments (0)