WHAT
An OLAP cube is an aggregation of a fact metric on a number of dimensions. CUBE refers to a multi-dimensional dataset.
The advantage of OLAP cubes is that they are easy to communicate to business users.
The data should be stored at the finest - atomic - level of the data.
OPERATIONS
ROLL-UP
Aggregate or combine values and reduce number of rows or columns.
Example: Sum up sales of each city by country.
DRILL-DOWN
Decompose values and increase number of rows or columns
Example: Decompose sales of a city into sales of districts.
SLICING
Reducing N dimensions to N-1 dimensions by restricting on dimension to a single value.
Example: Taking out all data for one month.
DICING
Same dimensions but computing a sub-cube by restricting some of the values of the dimensions.
QUERY OPTIMIZATION
Business users want to slice, dice, roll-up and drill-down. Every combination will most likely go through all the facts table. This is not very performant.
We can use GROUP BY CUBE(groupme, andme, andmetoo). This will make 1 pass through the facts table and aggregate all possible combinations. This output is most of the time enough to answer all aggregations from business users, without processing the whole facts table. It also has the advantage that it is faster and easier to read than to write several queries and use UNION three times to get them together.
SELECT dimDate.month, dimStore.country, sum(sales_amount) as revenue
FROM factSales
JOIN dimDate on (dimDate.date_key = factSales.date_key)
JOIN dimStore on (dimStore.store_key = factSales.store_key)
GROUP by cube (dimDate.month, dimStore.country);
Example output:
total, total by month, total by country, total by month and country
TECHNOLOGY
MOLAP
Pre-aggregate the OLAP cubes and save them on a special purpose Non-relational database.
ROLAP
Compute the OLAP cubes on the fly from existing relational databases where the dimensional model resides.
Code examples in Python notebooks
Sketchnote
To have all the above information in one view, I made a sketchnote.
Top comments (0)