Today, we will explore the Window Functions in the GBase 8s database, focusing on the Window Frame clause and how it helps us process and analyze data more efficiently.
Window functions are a tool introduced in the SQL standard that allows us to perform calculations on a set of rows (called a window) in a query without losing row details, as traditional aggregate functions do. With window functions, we can perform complex calculations such as cumulative sums, moving averages, rankings, etc., while maintaining the integrity of the data rows. The Window Frame clause is an essential part of window functions, defining the scope of the window, i.e., which rows will be included in the calculation. By flexibly defining the window scope, we can meet various complex data analysis requirements.
Overview
The Window Frame clause returns a subset of rows within each window partition. The frame clause specifies how this subset is defined by a specific number of rows or a range of values. The frame is determined relative to the current row, allowing the frame to move within the partition based on the current row's position.
For example:
- By defining a frame as all rows from the start of the partition to the current row, we can calculate the cumulative sum up to the current row.
- By defining a frame as extending N rows on either side of the current row, we can calculate a rolling average.
Syntax
Key Terms in the Window Frame Clause
The following keywords define the scope of the window:
Element | Description |
---|---|
OFFSET | An unsigned integer representing the offset from the current row position. Cannot be negative. If zero, it specifies the current row. |
RANGE | A value-based frame clause. |
ROWS | A row-based frame clause. |
UNBOUNDED | All rows from the current row to the limit of the window partition. |
UNBOUNDED PRECEDING | The starting boundary is the first row in the partition. |
UNBOUNDED FOLLOWING | The ending boundary is the last row in the partition. |
PRECEDING | Specifies a negative offset from the current row. |
FOLLOWING | Specifies a positive offset from the current row. |
CURRENT ROW | The current row. |
Row-Based (ROWS) Window Frame
The ROWS keyword creates a row-based window frame. The frame is defined by the positions of the starting and ending rows, where the offset represents the number of rows to return.
Example 1: Return the current row and the previous 6 rows.
avg(price) OVER (ORDER BY year, day
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
Example 2: Requirement: Partition employees by department, sort by salary, and return the average salary of the current employee and the previous employee within each partition.
-- Create table
CREATE TABLE employees (
id serial PRIMARY KEY,
emp_id VARCHAR(20) NOT NULL,
name VARCHAR(20) NOT NULL,
age int NOT NULL,
salary decimal(10,2) NOT NULL
);
-- Insert data
INSERT INTO employees (emp_id, name, age, salary) VALUES
('E00A', 'carry', 18, 5000),
('E00A', 'lili', 28, 5500),
('E00B', 'tom', 22, 6000),
('E00B', 'amy', 25, 6500),
('E00B', 'chun', 26, 7000),
('E00C', 'zouzou', 25, 5200),
('E00D', 'candy', 23, 5800),
('E00D', 'mumu', 29, 6200);
-- Query: Partition by department, sort by salary, and calculate the average salary of the current and previous employee.
SELECT
emp_id, name, salary,
AVG(salary) OVER (
PARTITION BY emp_id
ORDER BY salary
ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
) AS avg_salary
FROM employees;
-- Result
EMP_ID NAME SALARY AVG_SALARY
E00A carry 5000.00 5000.000000000
E00A lili 5500.00 5250.000000000
E00B tom 6000.00 6000.000000000
E00B amy 6500.00 6250.000000000
E00B chun 7000.00 6750.000000000
E00C zouzou 5200.00 5200.000000000
E00D candy 5800.00 5800.000000000
E00D mumu 6200.00 6000.000000000
8 row(s) retrieved.
Value-Based (RANGE) Window Frame
The RANGE keyword creates a value-based frame clause. The frame is defined by rows within a specific value range, where the offset is the difference between the row value and the current row value. The frame consists of the current row and rows that meet the criteria set by the sorting key in the ORDER BY clause and the specified offset. The offset represents the number of units of the sorting key's data type. The sorting key must be numeric, DATE, or DATETIME. For example, if the sorting key is of DATE type, the offset represents a specific number of days.
Example 1: Return rows with dates within 2 days before or after the current row.
COUNT(*) OVER (ORDER BY ship_date
RANGE BETWEEN 2 PRECEDING AND 2 FOLLOWING)
Example 2: Requirement: Partition employees by department, sort by age, and return the average salary of employees whose age differs by no more than 5 years.
SELECT
emp_id, name, age, salary,
AVG(salary) OVER (
PARTITION BY emp_id
ORDER BY age
RANGE BETWEEN 5 PRECEDING AND 5 FOLLOWING
) AS avg_salary
FROM employees;
-- Result
emp_id NAME age salary avg_salary
E00A carry 18 5000.00 5000.000000000
E00A lili 28 5500.00 5500.000000000
E00B tom 22 6000.00 6500.000000000
E00B amy 25 6500.00 6500.000000000
E00B chun 26 7000.00 6500.000000000
E00C zouzou 25 5200.00 5200.000000000
E00D candy 23 5800.00 5800.000000000
E00D mumu 29 6200.00 6200.000000000
In the E00A partition, the age difference between the two employees is greater than 5 years, so the average salary is their own salary. In the E00B partition, the age difference is within 5 years, so the average salary includes the other employee's salary.
Conclusion
Through the above introduction, we can see that the Window Functions and Window Frame clause in GBase 8s provide powerful data analysis capabilities. Whether it's precise row-based control or flexible value-based processing, window functions help us quickly achieve complex calculation requirements. In practical applications, we can choose the appropriate window type and scope based on specific needs to process and analyze data more efficiently.
We hope this article helps you better understand and apply window functions, unlocking new perspectives in data analysis. If you have any questions or suggestions about window functions or GBase 8s, feel free to leave a comment. Let's learn and grow together!
Top comments (0)