Overview
Course Duration and Structure
- Duration: 6 modules plus 2 workshops.
- Format: Weekly modules covering key data engineering topics.
- Interactivity: Q&A sessions, Slack discussions, and GitHub contributions.
Key Resources
- Slack: Main platform for discussions.
- GitHub Repository: Contains all course materials.
- Telegram Channel: For announcements and updates.
- Environment Setup: Use GitHub Codespaces or a cloud virtual machine. Both are easier to get started with than a local installation, which comes with some hurdles, especially on Windows. Linux FTW! openSUSE Tumbleweed 😉
Key Topics and Modules
Week 1: Environment Setup and Basics
- Key Tools: Docker, Terraform, GitHub Codespaces.
- Focus: Preparing the environment for the course.
- Skills Required: Basic command-line knowledge, Docker commands, Python basics.
Week 2: Workflow Orchestration
- Tool: Kestra (orchestration tool).
- Content:
  - Simplify scripts created in Week 1.
  - Convert CSV files to Parquet format and upload to Google Cloud Storage.
- Notes:
  - The NYC Taxi and Limousine Commission dataset was used as an example.
Week 3: Data Warehousing
- Tool: Google BigQuery.
- Focus: Storing and querying large datasets.
Week 4: dbt (data build tool)
- Content:
  - Transform data for analysis.
  - Build visualizations and dashboards.
Week 5: Batch Processing
- Tool: Apache Spark.
- Focus:
  - Batch processing similar to dbt.
  - Provides finer control over data pipelines.
Week 6: Streaming and Real-Time Data Processing
- Tools: Kafka, RisingWave (open-source SQL streaming tool).
- Focus:
  - Stream processing using SQL.
  - Introduction to stream-based architectures.
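To get a feel for "stream processing using SQL" before Week 6, here is a minimal sketch of a tumbling-window count written as plain SQL. It is not RisingWave code: it uses Python's built-in `sqlite3` over a small, made-up batch of ride events, so it only imitates what a streaming engine would keep up to date continuously. The table name `rides`, the columns, and the 60-second window are all assumptions for illustration.

```python
import sqlite3

# Hypothetical ride events: (event time in seconds, pickup zone).
EVENTS = [
    (0, "A"), (12, "A"), (45, "B"),
    (61, "A"), (75, "B"), (119, "B"),
    (120, "A"),
]

WINDOW_SECONDS = 60  # assumed window size, not taken from the course


def rides_per_window(events, window_seconds=WINDOW_SECONDS):
    """Count rides per zone in fixed, non-overlapping (tumbling) windows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rides (event_ts INTEGER, zone TEXT)")
    conn.executemany("INSERT INTO rides VALUES (?, ?)", events)
    rows = conn.execute(
        """
        SELECT (event_ts / :w) * :w AS window_start,  -- integer division
               zone,
               COUNT(*) AS ride_count
        FROM rides
        GROUP BY window_start, zone
        ORDER BY window_start, zone
        """,
        {"w": window_seconds},
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for window_start, zone, ride_count in rides_per_window(EVENTS):
        print(window_start, zone, ride_count)
```

The difference in a real streaming setup is that events keep arriving, and the engine updates the per-window counts incrementally instead of re-running the query over a finished table.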
Workshops
- Workflow Orchestration:
  - Practical session to consolidate Week 2 content.
- Streaming Data with SQL:
  - Hands-on workshop focusing on real-time data pipelines.
Final Project
- Objective: Create a comprehensive data engineering project. Completing it is required to graduate with a certificate.
- Guidelines:
  - Use any tools and concepts covered in the course.
  - Option to partner with nonprofits or work independently.
  - Focus on practical, real-world data use cases.
- Submission:
  - Homework files are available in the GitHub cohort repository.
  - Submit projects via the new automated platform that replaces Google Forms: Course Management Platform. Note: if signing up with GitHub Auth fails (e.g. Server Error 500), use Slack Auth or GAuth instead. This is a known bug that is still being worked on.
Expectations and Requirements
- Prerequisites:
  - Familiarity with Python and basic programming concepts.
  - Command-line proficiency.
- Time Commitment: Flexible; follow your own pace.
- Certificates: Awarded upon successful completion of the final project. Homework submissions count toward an internal ranking system, which is there to motivate participants.
Additional Tips
- GitHub Contributions:
  - Star the course repository to help it trend.
  - Resolve a few tickets from the GitHub issues as open-source contributions.
  - Engage with the community by sharing insights or asking questions.
- Slack:
  - Check the FAQ document before posting queries.
  - Use the relevant channels to interact with peers, or ask @ZoomcampQABot directly in the #course-data-engineering channel. Reach out to instructors only as a last resort.
- Environment:
  - Codespaces offers a simple setup with pre-installed tools like Docker and Python.
  - Cloud virtual machines provide flexibility for advanced setups.
Career Insights and Recommendations
- Job Outlook:
  - Despite tech layoffs, demand for data engineers remains strong.
  - Skills in platforms like GCP, AWS, and Azure are valuable.
- Certifications:
  - Beneficial, especially for early-career professionals and consultants.
- Applications:
  - Data engineering techniques are foundational for ML and analytics roles.
Why This Course is Free
- Motivation: Sharing knowledge with the community.
- Support: Funded by sponsors like Kestra, dlt, Mage, DTHub, and RisingWave.
- Community Contribution:
  - Participants can support the course by sharing it, giving feedback, or donating to DataTalksClub from their training budgets.
Miscellaneous Notes
- Consider Learning in Public to stay motivated; it also earns extra points toward the ranking.
- Office Hours: Scheduled for specific topics like Kestra and project guidance.
- FAQ Document: Comprehensive guide available for common queries.
- Past Student Contributions:
  - Many alumni have shared tools and insights to improve the course.
- Data Architect Path:
  - Consider learning about Kimball methodologies.
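For anyone curious about the Kimball side, the core idea of dimensional modeling is a star schema: a central fact table of measurements joined to descriptive dimension tables. Below is a minimal sketch using Python's built-in `sqlite3`. The tables `fact_trip` and `dim_zone`, their columns, and the sample rows are made up for illustration and do not come from the course materials.

```python
import sqlite3


def build_star_schema():
    """Create a tiny in-memory star schema: one dimension, one fact table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Dimension: descriptive attributes, one row per zone (hypothetical).
        CREATE TABLE dim_zone (
            zone_key  INTEGER PRIMARY KEY,
            zone_name TEXT,
            borough   TEXT
        );
        -- Fact: one row per trip, with measures and keys to dimensions.
        CREATE TABLE fact_trip (
            trip_id         INTEGER PRIMARY KEY,
            pickup_zone_key INTEGER REFERENCES dim_zone (zone_key),
            fare_amount     REAL
        );
        INSERT INTO dim_zone VALUES
            (1, 'Zone A', 'Borough X'),
            (2, 'Zone B', 'Borough X'),
            (3, 'Zone C', 'Borough Y');
        INSERT INTO fact_trip VALUES
            (1, 1, 10.0),
            (2, 2, 5.5),
            (3, 3, 20.0),
            (4, 1, 4.5);
        """
    )
    return conn


def fare_by_borough(conn):
    """Aggregate a fact measure by a dimension attribute."""
    return conn.execute(
        """
        SELECT d.borough, SUM(f.fare_amount) AS total_fare
        FROM fact_trip AS f
        JOIN dim_zone AS d ON d.zone_key = f.pickup_zone_key
        GROUP BY d.borough
        ORDER BY d.borough
        """
    ).fetchall()


if __name__ == "__main__":
    print(fare_by_borough(build_star_schema()))
```

The typical analytical query then has the same shape every time: join the fact table to a dimension, group by a dimension attribute, and aggregate a measure.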
By following this structured approach, we can maximize our learning experience in the Data Engineering Zoomcamp 2025 Cohort.
Good luck everyone!