Study Notes 4.1.1 - Analytics Engineering Basics

1. Introduction to Analytics Engineering

  • Analytics Engineering is a relatively new role that bridges the gap between Data Engineers and Data Analysts.
  • The evolution of cloud data warehouses (e.g., BigQuery, Snowflake, Redshift) has significantly reduced storage and computation costs.
  • Tools like Fivetran and Stitch have simplified data loading (the extract and load steps of ETL), reducing the need for custom ingestion pipelines.
  • SQL-first BI tools like Looker and Mode have introduced version control into data workflows, improving self-service analytics and data governance.
  • The rise of these tools has changed how data teams operate and how stakeholders consume data, creating a need for the Analytics Engineer role.

2. Traditional Data Team Roles & the Role of Analytics Engineers

Traditional Data Roles

  1. Data Engineer
    • Prepares and maintains data infrastructure.
    • Ensures data pipelines are functional and efficient.
    • Focuses on the technical aspect of data storage and processing.
  2. Data Analyst
    • Works with data stored in the infrastructure to generate business insights.
    • Primarily uses SQL, BI tools, and dashboards to answer business questions.
  3. Data Scientist (not explicitly discussed but relevant)
    • Uses statistical and machine learning techniques to analyze data.
    • Writes code but is not necessarily a software engineer.

Challenges & the Emergence of Analytics Engineers

  • Data Scientists & Analysts are writing more code, but they are not trained as software engineers.
  • Data Engineers are skilled in software engineering but lack knowledge about business-specific data usage.
  • Analytics Engineers fill this gap by applying software engineering best practices (e.g., modularity, version control, CI/CD) to the work of Data Analysts and Scientists.

[Figure: roles in a data team]

3. Tools & Responsibilities of an Analytics Engineer

Core Responsibilities

  • Integrating software engineering best practices into data transformation workflows.
  • Managing data pipelines efficiently.
  • Ensuring data quality, reliability, and accessibility.
  • Collaborating closely with Data Engineers, Analysts, and Business Teams.

Common Tools Used by Analytics Engineers

  1. Data Loading (EL): Fivetran, Stitch.
  2. Data Storage Management: Cloud Data Warehouses (BigQuery, Snowflake, Redshift).
  3. Data Transformation & Modeling: dbt (Data Build Tool), Dataform (see the example model below).
  4. Data Visualization & BI Tools: Google Data Studio, Looker, Mode.

[Figure: analytics engineering tooling]
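
Since dbt appears throughout these notes, here is a minimal sketch of the kind of artifact an Analytics Engineer maintains: a dbt staging model. The `raw.orders` source and its columns are illustrative assumptions, not something defined in these notes.

```sql
-- models/staging/stg_orders.sql (hypothetical dbt staging model)
-- dbt compiles this SELECT into a view or table in the warehouse.
-- {{ source('raw', 'orders') }} resolves to a raw table loaded by
-- an EL tool such as Fivetran or Stitch, declared in a sources YAML.
select
    order_id,
    customer_id,
    cast(order_ts as date) as order_date,
    round(amount_cents / 100.0, 2) as amount_usd
from {{ source('raw', 'orders') }}
where order_id is not null
```

Because a model like this is just a version-controlled SQL file, software engineering practices such as code review, testing, and CI/CD apply to it directly.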


4. ETL vs. ELT: Key Differences & Use Cases

ETL (Extract, Transform, Load)

  • Process: Data is extracted from sources and transformed before being loaded into the warehouse.
  • Advantages:
    • Ensures clean and structured data before storage.
    • More compliance-friendly (suitable for regulated industries).
  • Disadvantages:
    • Slower implementation due to upfront transformation.
    • Requires a schema to be defined before data is loaded.

ELT (Extract, Load, Transform)

  • Process: Data is extracted and directly loaded into the warehouse before transformation.
  • Advantages:
    • Faster and more flexible (data is readily available for analysis).
    • Leverages cheap cloud storage and computation.
  • Disadvantages:
    • Data might be less structured initially, requiring on-the-fly transformations.
    • Can lead to messy datasets if not managed properly.

[Figure: ETL vs. ELT]
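
To make the ELT flow concrete, here is a minimal sketch of an in-warehouse transformation: the raw table is assumed to have been loaded untouched by an EL tool, and the "T" step is plain SQL run by the warehouse engine. All table and column names are illustrative.

```sql
-- ELT in practice: transform after loading, inside the warehouse.
create or replace table analytics.payments_clean as
select
    payment_id,
    order_id,
    lower(status) as payment_status,       -- normalize casing
    cast(paid_at as timestamp) as paid_at  -- enforce types late
from raw.payments
where payment_id is not null;              -- basic quality filter
```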


5. Data Modeling Concepts (Kimball’s Dimensional Modeling)

Goals of Dimensional Modeling

  • Make data understandable to business users.
  • Optimize for fast query performance.
  • Unlike third normal form (3NF), dimensional modeling accepts some redundancy in exchange for usability and query performance.

Fact Tables vs. Dimension Tables

1. Fact Tables

  • Contain measurements, metrics, or facts related to business processes.
  • Represent actions or events (e.g., sales, orders).
  • Can be thought of as verbs in data modeling.

2. Dimension Tables

  • Provide context to fact tables.
  • Contain descriptive attributes (e.g., customer, product, location).
  • Can be thought of as nouns in data modeling.

Star Schema

  • A widely used data modeling approach in which a fact table is surrounded by multiple dimension tables, forming a star-like structure (see the SQL sketch below).
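
As a minimal sketch, a star schema for sales might look like the SQL below, with one fact table referencing two dimension tables (all names and columns are illustrative assumptions):

```sql
-- Dimensions: the "nouns" that give facts their context.
create table dim_customer (
    customer_key  int primary key,
    customer_name varchar(100),
    country       varchar(50)
);

create table dim_product (
    product_key  int primary key,
    product_name varchar(100),
    category     varchar(50)
);

-- Fact: the "verb"; one row per sale event, holding the metrics.
create table fact_sales (
    sale_id      int primary key,
    customer_key int references dim_customer (customer_key),
    product_key  int references dim_product (product_key),
    sale_date    date,
    quantity     int,
    amount_usd   numeric(12, 2)
);

-- Typical query: aggregate the facts, slice by the dimensions.
select d.country, p.category, sum(f.amount_usd) as revenue
from fact_sales f
join dim_customer d on f.customer_key = d.customer_key
join dim_product  p on f.product_key  = p.product_key
group by d.country, p.category;
```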

6. Data Processing Analogy: The Kitchen Model

Kimball’s book presents an analogy comparing data processing to a restaurant workflow:

The Kitchen Model

  1. Staging Area (Raw Data) → Storage Room
    • Similar to raw ingredients in a restaurant.
    • Not accessible to everyone; only data engineers who know how to handle raw data work with it.
  2. Processing Area (Data Models) → Kitchen
    • Like a kitchen where raw ingredients are turned into prepared meals.
    • Handled by Analytics Engineers using data transformation tools (e.g., dbt, SQL transformations); see the dbt sketch after this list.
    • Focus on efficiency, standardization, and consistency.
  3. Presentation Area (Final Data) → Dining Hall
    • The final stage where data is presented to business users.
    • Similar to serving prepared food to restaurant customers.
    • Data is structured, visualized, and easy to interpret using BI tools.
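
In dbt terms, the three areas map naturally onto model layers. A minimal sketch of a "dining hall" model, built on the hypothetical stg_orders staging model sketched earlier:

```sql
-- models/marts/fct_daily_sales.sql (hypothetical dbt mart model)
-- Storage room -> raw tables, touched only via source().
-- Kitchen      -> staging models such as stg_orders.
-- Dining hall  -> mart models like this one, served to BI tools.
select
    order_date,
    count(*) as order_count,
    sum(amount_usd) as revenue
from {{ ref('stg_orders') }}
group by order_date
```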

7. Summary & Key Takeaways

  • Analytics Engineers bridge the gap between Data Engineers and Analysts by incorporating software engineering practices into data analytics.
  • They use modern tools like dbt, Fivetran, Snowflake, and BI tools to model, transform, and present data.
  • Understanding ETL vs. ELT is crucial for deciding how data should be processed efficiently.
  • Kimball’s Dimensional Modeling helps make data user-friendly and performant, prioritizing usability over strict normalization.
  • The Kitchen Model is a helpful analogy to understand the stages of data processing, from raw data to business-ready insights.
