DEV Community

Cover image for Study Notes 4.3.2 - Testing and Documenting the Project
Pizofreude
Pizofreude

Posted on

Study Notes 4.3.2 - Testing and Documenting the Project

1. Introduction to Data Testing

  • Objective: Ensure data delivered to end-users is accurate and prevent building models on incorrect data.
  • DVT Tests: Data Validation Tests (DVT) are assumptions about data. They are SQL-based checks that validate data integrity.
    • Purpose: Identify errors quickly and stop execution if assumptions fail.
    • Implementation: Defined in YAML files and compiled into SQL code.

2. Types of DVT Tests

  • Four Default Tests:
    1. Unique Test: Ensures a field's uniqueness within a data model.
    2. Not Null Test: Ensures a field does not contain null values.
    3. Accepted Values Test: Validates that a field contains only specified values.
    4. Foreign Key Test: Ensures a field has a valid relationship with another table.
  • Example:
    • Payment Type Description: Must be one of the values 1, 2, 3, 4, or 5.
    • Pickup Location: Must relate to a valid taxi zone in the ref_taxi_lookup table.
    • Trip ID: Must be unique and not null (primary key).

3. Compiling and Running Tests

  • YAML File: Contains test definitions.
  • SQL Compilation: Tests are translated into SQL queries.
    • Example: SELECT * FROM production.my_schema.yellow_trip_data WHERE trip_id IS NULL.
  • Test Execution: If no values are returned, the test passes. If values are returned, the test fails or issues a warning.

4. Macros in DVT

  • Cross-Database Macros: Abstract SQL flavor differences across databases (e.g., BigQuery vs. PostgreSQL).
    • Example: DATE_TRUNC and SAFE_CAST macros adapt to the underlying database.
  • Usage: Simplifies SQL code and ensures compatibility across databases.

5. Code Generation with Kosan Package

  • Purpose: Automates the generation of YAML files and base models.
  • Features:
    • Generate Source: Creates YAML files for database sources.
    • Generate Base Model: Creates CTE-based models.
    • Generate Model YAML: Automates YAML file creation for models.
  • Usage:
    • Add the Kosan package to the project.
    • Use functions like generate_model_yaml to create YAML files for multiple models.

6. Adding Tests to Models

  • Steps:
    1. Define unique and not null tests for primary keys (e.g., trip_id).
    2. Add relationship tests for foreign keys (e.g., pickup_location_id must relate to ref_taxi_lookup).
    3. Add accepted values tests for specific fields (e.g., payment_type must be 1, 2, 3, 4, or 5).
  • Severity Levels:
    • Warning: Issues a warning but does not stop execution.
    • Error: Stops execution if the test fails.
    • Threshold: Define thresholds for warnings (e.g., up to 10 records can fail).

7. Building and Testing the Project

  • Process:
    1. Build the project, starting with sources and seeds.
    2. Execute tests for each model as it is built.
    3. Parallel execution: Independent models are built simultaneously.
  • Test Results:
    • Warnings or errors are displayed for failed tests.
    • Example: A warning is issued if payment_type contains values outside the accepted range.

8. Documentation in DVT

  • Purpose: Create comprehensive documentation for the project.
  • Sources of Documentation:
    1. YAML Files: Descriptions of models and columns.
    2. Code: Lineage and dependencies extracted from the code.
    3. Information Schema: Metadata like table size.
  • Generating Documentation:
    • Use dbt docs generate to create documentation files.
    • Use dbt docs serve to host documentation locally.
  • Documentation Features:
    • Lineage diagrams.
    • Test results and descriptions.
    • Compiled SQL code and dependencies.

9. Hosting Documentation

  • Locally: Use dbt docs serve to host documentation on localhost.
  • Cloud: Documentation can be hosted on dbt Cloud.
  • Example: Documentation includes model descriptions, column details, and test results.

10. Additional Tools and Packages

  • dbt Utils: Provides additional utilities for testing and documentation.
  • dbt Expectations: Based on Great Expectations, offers advanced testing capabilities.

11. Key Takeaways

  • Testing: Essential for ensuring data accuracy and integrity.
  • Automation: Use tools like Kosan to automate YAML and model generation.
  • Documentation: Critical for maintaining transparency and understanding of the project.
  • Flexibility: DVT macros and packages allow for cross-database compatibility and advanced testing.

Example YAML Snippet for Tests

version: 2

models:
  - name: yellow_trip_data
    columns:
      - name: trip_id
        tests:
          - unique
          - not_null
      - name: pickup_location_id
        tests:
          - relationships:
              to: ref_taxi_lookup
              field: location_id
      - name: payment_type
        tests:
          - accepted_values:
              values: [1, 2, 3, 4, 5]

Enter fullscreen mode Exit fullscreen mode

Commands for Documentation

# Generate documentation
dbt docs generate

# Serve documentation locally
dbt docs serve

Enter fullscreen mode Exit fullscreen mode

By following these steps and utilizing the tools discussed, you can ensure your data models are robust, well-tested, and thoroughly documented.

Top comments (0)