DEV Community

Cover image for Study Note 2.2.5: Orchestrate dbt Models with Postgres in Kestra
Pizofreude
Pizofreude

Posted on

Study Note 2.2.5: Orchestrate dbt Models with Postgres in Kestra

Overview

This introduces how to use dbt (data build tool) with Kestra to perform data transformations post-extraction and loading into a database.

Key Concepts

  1. DBT (Data Build Tool):
    • A tool used for transforming data in a database.
    • It allows for transformations using SQL queries.
  2. Kestra:
    • A workflow orchestration tool used to automate data workflows.
    • In this context, it automates the process of using dbt to transform data.
  3. Postgres Database:
    • The database where the extracted data is loaded and stored.
    • The example uses trip data (green and yellow) for the first three months of 2019.

Workflow Steps

  1. Data Extraction and Loading:
    • Extract data from GitHub.
    • Load the data directly into a table in the Postgres database.
  2. DBT Integration with Kestra:
    • Use Kestra to automate dbt commands for data transformation.
    • Clone the repository containing the dbt project and necessary files.
    • Set up the Postgres database connection within Kestra to access the data.
  3. DBT Project Setup:
    • Use a Git repository containing the dbt project files.
    • The repository includes SQL commands and configurations for the dbt tasks.
  4. Executing the Workflow:
    • Ensure the Postgres database and environment variables are correctly set up.
    • Execute the workflow in Kestra to start the dbt build process.
    • Monitor as dbt creates new tables and performs transformations.
  5. Results and Analysis:
    • View the transformed data in PG Admin.
    • Observe how the data has become more structured and useful compared to the raw CSV files.

Key Takeaways

  • DBT Workflow in Kestra: This setup allows for an automated approach to transforming data, making it easier to manage and analyze.
  • Enhanced Data Usability: Transformed data is more organized and meaningful, aiding in better analysis and insights.
  • Future Learning: The course will cover more on visualization and using dbt with BigQuery in subsequent lessons.

Practical Example

Workflow Diagram:

  • Extract: Data from GitHub.
  • Load: Directly into Postgres tables.
  • Transform: Using dbt within Kestra.
  • Analyze: View transformed data in PG Admin.

DBT Commands in Kestra:

  • Sync files from the repository.
  • Execute dbt build to create and transform tables.

Database Tables:

  • Green trip data
  • Yellow trip data
  • Transformed tables with enriched information for analysis.

Conclusion

This session provided a glimpse into how dbt can be integrated with Kestra to automate data transformation processes in a Postgres database. This approach makes data more accessible and ready for analysis, streamlining the data engineering workflow. Future sessions will delve deeper into visualization techniques and using dbt with other data warehouses like BigQuery.

Top comments (0)