The snowflake schema is a type of database schema that organizes data into a centralized fact table surrounded by normalized dimensions. Unlike a star schema, where dimensions are typically denormalized into flat tables, the snowflake schema splits dimensions into related sub-dimensions, reducing data redundancy and improving storage efficiency.
Dimensions with hierarchies can be decomposed into a snowflake structure when you want to avoid joins to big dimension tables when you are using an aggregate of the fact table. For example, if you have brand information that you want to separate out from a product dimension table, you can create a brand snowflake that consists of a single row for each brand and that contains significantly fewer rows than the product dimension table. The following figure shows a snowflake structure for the brand and product line elements and the brand_agg aggregate table.
Snowflake schemas are especially useful in scenarios where certain attributes apply only to subsets of a dimension, leading to sparse data and inefficiencies in traditional denormalized structures.
Below are three practical use cases where the snowflake schema is applied, along with clear data models to demonstrate how it works.
Use Case 1: Large Customer Dimension
Scenario:
In businesses such as online marketing, there are two types of customers:
Anonymous Visitors: Identified only by cookie data, with minimal attributes.
Registered Customers: Have detailed information, including demographics, address, and payment history.
Storing these two types of entities in a single table results in inefficiencies as most attributes remain null for anonymous visitors.
Solution:
Using a snowflake schema:
The base Customer Dimension holds common attributes for both visitors and registered customers.
Separate sub-dimensions store specific attributes for Visitors and Registered Customers.
Data Model:
+---------------------+ +--------------------------+
| Customer (Base) | | Customer Details (Snow) |
+---------------------+ +--------------------------+
| Customer_ID (PK) |--------| Customer_ID (FK) |
| Customer_Type | | Demographics |
| Last_Visit_Date | | Address |
| Signup_Date | | Payment_History |
+---------------------+ +--------------------------+
|
|
V
+---------------------+
| Visitors (Snow) |
+---------------------+
| Visitor_ID (PK) |
| Cookie_ID |
| Visit_Frequency |
| Browsing_History |
+---------------------+
Use Case 2: Financial Product Dimension
Scenario:
In financial services, different product types (e.g., loans and insurance) have distinct attributes. Attempting to store all attributes in one dimension results in sparse data, as many attributes will not apply to all products.
Solution:
Using a snowflake schema:
The base Product Dimension contains attributes common to all products.
Separate sub-dimensions store specialized attributes for different product types.
Data Model:
+----------------------+ +------------------------------+
| Product (Base) | | Product Details (Snowflake) |
+----------------------+ +------------------------------+
| Product_ID (PK) |--------| Product_ID (FK) |
| Product_Type | | Specialized_Attribute_1 |
| Core_Attribute | | Specialized_Attribute_2 |
+----------------------+ +------------------------------+
|
|
V
+----------------------+
| Loan Products (Snow) |
+----------------------+
| Loan_ID (PK) |
| Interest_Rate |
| Loan_Term |
| Collateral_Type |
+----------------------+
|
|
V
+-----------------------+
| Insurance Products |
+-----------------------+
| Insurance_ID (PK) |
| Coverage_Type |
| Premium_Amount |
| Policy_Duration |
+-----------------------+
Use Case 3: Multi-Enterprise Calendar Dimension
Scenario:
In international businesses, calendars vary by country. For example:
The US might have specific fiscal quarters and national holidays.
The UK might have unique bank holidays.
India might have a calendar with festival-specific dates.
Storing all attributes in one table leads to complexity and inefficiency.
Solution:
Using a snowflake schema:
The base Calendar Dimension contains attributes common to all countries.
Separate sub-dimensions store country-specific calendar attributes.
Data Model:
+----------------------+ +----------------------------+
| Calendar (Base) | | US Calendar (Snowflake) |
+----------------------+ +----------------------------+
| Calendar_ID (PK) |--------| Calendar_ID (FK) |
| Date | | National_Holiday |
| Week_Number | | US_Fiscal_Quarter |
| Fiscal_Year | | US_Specific_Attribute |
+----------------------+ +----------------------------+
|
|
V
+----------------------------+
| UK Calendar (Snowflake) |
+----------------------------+
| Calendar_ID (FK) |
| National_Holiday |
| Bank_Holiday |
| UK_Fiscal_Quarter |
+----------------------------+
Conclusion
The snowflake schema is an efficient and organized approach to handling complex dimensions with sparse data. By breaking down dimensions into smaller, logical sub-dimensions, it:
- Reduces storage requirements.
- Improves query performance for specific attribute groups.
- Enhances clarity in schema design.
The examples above highlight how snowflake schemas can be applied to real-world scenarios, such as customer data, financial products, and multi-country calendars, ensuring data is both accessible and efficiently structured.
Top comments (0)