Introduction
TL;DR : Why to use UNPIVOT
over UNION ALL
? Performance, UNION ALL
scans all the data for each subquery, with UNPIVOT
the scan happens only once.
We shall perform now the same operation with AWS Athena
Feel free to check the Oracle XE CROSS JOIN UNPIVOT tips or a AWS documentation.
I'm writing this guide to provide a simple step-by-step manual easy to understand.
UNPIVOT
If you are new to Athena
you may wish to go through the quick start guide
Let's start straight away with something trivial :
select 'SUNDAY' as day0, 'MONDAY' as day1;
and use array
select array['SUNDAY', 'MONDAY'] as day;
now, let's flatten the array
with UNNEST
:
select * from unnest(array['SUNDAY', 'MONDAY']) t(day);
and add another column into the output :
with
in_data as (select * from unnest(array['SUNDAY', 'MONDAY'], array[0, 1]) t(day, day_order))
select * from in_data;
That was fast and straight to the point! Let's explore CROSS JOIN
CROSS JOIN
In a nutshell CROSS JOIN
operation is simple = it's just a Cartesian product of the two tables, take all the possible combinations from the values on the left with the values on the right.
Few examples :
with
week_data as (select '2024W01' as week_ref),
day_data as (select 'SUNDAY' as day_ref)
select * from week_data cross join day_data;
with UNION ALL
:
with
week_data as (select '2024W01' as week_ref),
day_data as (select 'SUNDAY' as day_ref
union all select 'MONDAY' as day_ref)
select * from week_data cross join day_data;
and with array
:
with
week_data as (select '2024W01' as week_ref),
day_data as (select array['SUNDAY','MONDAY'] as day_ref)
select * from week_data cross join day_data;
let's flatten the array :
with
week_data as (select '2024W01' as week_ref),
day_data as (select * from unnest(array['SUNDAY','MONDAY']) r(day_ref))
select * from week_data cross join day_data;
and add another one to show the full power of the CROSS JOIN
:
with
week_data as (select * from unnest(array['2024W01', '2024W02']) l(week_ref)),
day_data as (select * from unnest(array['SUNDAY','MONDAY']) r(day_ref))
select * from week_data cross join day_data;
Finally we are ready to combine both CROSS JOIN
and UNNEST
CROSS JOIN UNNEST
The operation I'm going to perform is named UNPIVOT
in Oracle XE
, see here at the same time BigQuery
has both UNPIVOT
and UNNEST
, you may check here
Let's have some test data :
with
in_data as (select '2024W01' as week_ref, 'SUNDAY' as day0, 0 as day_order0, 'MONDAY' as day1, 1 as day_order1)
select * from in_data;
Firstly, let's generate the result with UNION ALL
:
with
in_data as (select '2024W01' as week_ref, 'SUNDAY' as day0, 0 as day_order0, 'MONDAY' as day1, 1 as day_order1)
select week_ref, day0 as day, day_order0 as day_order from in_data
union all
select week_ref, day1 as day, day_order1 as day_order from in_data;
The query above will scan the in_data
as many times as many subqueries you have, in this example we have 2 subqueries.
and finally the performance optimization with CROSS JOIN UNNEST
:
with
in_data as (select '2024W01' as week_ref, 'SUNDAY' as day0, 0 as day_order0, 'MONDAY' as day1, 1 as day_order1)
select week_ref, day, day_order
from in_data
cross join unnest(array[day0, day1], array[day_order0, day_order1]) t(day, day_order);
Using UNPIVOT
increases performance of your data analytics requests and saves the energy.
Enjoy !
Top comments (0)