DEV Community

Judy
Judy

Posted on

Farewell to Window Functions, Embrace esProc SPL

Early SQL was extremely unsuited for ordered computations. While theoretically possible, the practical difficulties made it essentially unusable.

The introduction of window functions improved ordered computations to some extent. However, SQL’s foundation is still based on unordered sets. Even with patches like window functions, it remains cumbersome.

Yet, there were no better alternatives, so database-based operations still require using SQL combined with window functions. While loading data to process in Python can mitigate some of the difficulties, Python’s poor integration makes it difficult to work with Java applications. If coding directly in Java, it would be more cumbersome than using window functions.

esProc SPL solves these problems fundamentally. SPL code is easier to understand than SQL window functions, and easier to write and debug as well. We can say goodbye to cumbersome window functions.

Let’s demonstrate how SPL completely outperforms window functions through specific calculation scenarios.

For example, to calculate how many trading days it took for a stock to exceed $100 after its IPO, a straightforward approach is to sort the data by trading day, then identify the position of the first record exceeding $100. The corresponding index is the desired result.

Coding in SQL window functions:

SELECT MIN(NO) 
FROM (
    SELECT Price, ROW_NUMBER() OVER ( ORDER BY Date ) NO
        FROM stock
      )
WHERE Price>100
Enter fullscreen mode Exit fullscreen mode

First, calculate the row number using a window function. Then, use an outer query to filter the data and find the minimum row number.

Since SQL is based on unordered sets, data tables inherently lack the concept of order or position. Therefore, even if you sort the data table, window functions cannot utilize the order; the ORDER BY clause must be written within the window function. This is an inherent problem with SQL, and window functions have to inherit this flaw.

The trouble doesn’t stop there. In the SQL statement, NO is a computed column, but its minimum value cannot be directly calculated; an additional layer of nesting is required.

For a computed column without involving window functions, the minimum value can be calculated directly. For example, calculate the minimum value by multiplying the closing price by the exchange rate of the day:

SELECT MIN(Price*Rate) FROM stock
Enter fullscreen mode Exit fullscreen mode

Window functions can only operate on the result sets of other operations, often requiring another layer of nested query. Consequently, the syntax is more complex than ordinary SQL.

SPL data tables are ordered and incorporate the concept of position. For stock data ordered by date, the straightforward approach mentioned earlier can be used. The resulting code is remarkably concise.

stock.pselect(Price>100)
Enter fullscreen mode Exit fullscreen mode

pselect is SPL’s function for calculating position, which returns the position of the first member satisfying the specified condition, that is, the number of trading days required for the stock to first exceed $100.

Now, let’s calculate the price increase when the stock first exceeded $100. We first identify the position of the first record where the closing price exceeds $100. The increase is then calculated by subtracting the previous day’s closing price from the closing price of that record. This remains a natural approach.

Using SQL window functions would be more convoluted:

WITH T AS (
    SELECT Price,
        ROW_NUMBER() OVER ( ORDER BY Date ) NO,
        Price - LAG(Price) OVER ( ORDER BY Date) Rising
    FROM Stock
    )
SELECT Rising 
FROM T WHERE NO = (
    SELECT MIN(NO) FROM T WHERE Price>100 
   )
Enter fullscreen mode Exit fullscreen mode

SQL cannot implement the previous approach of first finding the position and then calculating the adjacent difference. Instead, it must first calculate the difference for all rows and then locate the desired position. Additionally, since ORDER BY can only be specified within the window function, it must be repeated twice. If clauses like PARTITION BY are involved, they must also be repeated, resulting in redundant code that complicates both writing and reading.

Thanks to SPL’s support for positional calculation, the code is very simple:

stock.calc(stock.pselect(Price>100),Price-Price[-1])
Enter fullscreen mode Exit fullscreen mode

First, use the pselect function to calculate the position of the desired record. Then, calculate the difference between the closing price at that position and the previous record’s closing price. SPL supports cross-row calculation. Price[-1] represents the previous record’s closing price.

Such scenarios are very common in practice. Let’s continue with stock calculations to explore more examples.

Calculate the start and end dates of the longest consecutive price increase interval for a stock.

Window function-based solution:

with t1 as (
    select Date, Price, 
        case when Price>lag(Price) over(order by Date) then 0 else 1 end UpDownTag
    from stock
    ),
t2 as (
    select Date, Price, sum(UpDownTag) over(order by Date) NoRisingDays
    from t1
    ),
t3 as (
    select NoRisingDays, count(*) ContinuousDays 
    from t2 group by NoRisingDays
    ),
t4 as (
    select * 
    from t2
    where NoRisingDays in (
        select NoRisingDays
        from t3
        where ContinuousDays =(select max(ContinuousDays) from t3)
            )
    )
select min(Date) start, max(Date) end
from t4
group by NoRisingDays;
Enter fullscreen mode Exit fullscreen mode

This code nests multiple-layer subqueries and redundantly repeats the same ORDER BY clause within the window function, leading to significant complexity.

SPL avoids hard-to-understand window functions, resulting in very concise code:

stock.group@i(Price<Price[-1]).maxp@a(~.len()).new(~(1).Date:start,~.m(-1).Date:end)
Enter fullscreen mode Exit fullscreen mode

group@i performs ordered grouping based on condition changes, with each group representing a consecutive price increase interval. maxp@a returns all the longest consecutive price increase intervals. The ~ symbol denotes the current member being processed in the loop.

Identify all stocks with a longest consecutive price increase interval exceeding 5 days.

SQL solution:

select Code, max(ContinuousDays) as longestUpDays
from (
    select Code, count(*) as ContinuousDays
    from (
        select Code, Date, 
            sum(UpDownTag) over (partition by Code order by Code, Date) as NoRisingDays
        from (
            select Code, Date, 
                case when Price > lag(Price) over (partition by Code order by Code, Date)  then 0
                else 1 end as UpDownTag
            from stock
        )
    )
    group by Code, NoRisingDays
)
group by Code
having max(ContinuousDays)>5;
Enter fullscreen mode Exit fullscreen mode

This code nests four-layer subqueries, redundantly appearing the same partition and order information.

SPL is still very simple:

stock.group(Code;~.group@i(Price<=Price[-1]).max(~.len()):max_increase_days).select(max_increase_days>5)
Enter fullscreen mode Exit fullscreen mode

Calculate the price increase on days when the stock price is higher than both the preceding 5 days and the following 5 days.

Window function-based solution:

SELECT Price/Price_pre-1 AS raise
FROM(
    SELECT Date, Price, ROWNUM AS rn,
        MAX(Price) OVER (
            ORDER BY Date ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING) AS max_pre,
        MAX(Price) OVER (
            ORDER BY Date ROWS BETWEEN 1 FOLLOWING AND 5 FOLLOWING) AS max_suf,
        LAG(Price,1) OVER (ORDER BY Date) AS Price_pre
    FROM stock
    )
WHERE rn>5 AND rn<=(select count(*) FROM stock)-5 AND Price>max_pre  
AND Price>max_suf;
Enter fullscreen mode Exit fullscreen mode

The same ORDER BY information is repeated three times within the window function, and a nested subquery is also required.

It’s still simple using SPL:

Image description

Such cases are common in other business scenarios, so we will not elaborate further here.

By leveraging SPL’s computational capabilities, window functions can be avoided. This enables SQL to focus solely on data retrieval, while SPL handles the complex logic. For instance, suppose the data in the previous example is sourced from a MySQL database:

Image description
A1: Connect to the database and retrieve data using an SQL statement.

A2: Execute complex calculations using very concise SPL code.

By adopting SPL, you just need to write simple SQL in applications, avoiding the need for complex and hard-to-understand window functions. Additionally, since window function syntax differs across databases, this approach also resolves compatibility issues between heterogeneous databases, achieving multiple benefits in one stroke.

SPL is open source and free,
Source address
Free download

Top comments (0)