
Aman Gupta

Posted on • Originally published at dlthub.com

Replacing SaaS ETL with Python dlt: A painless experience for Yummy.eu

About Yummy.eu

Yummy is a lean-ops meal-kit company that streamlines the entire food preparation process for customers in emerging markets by providing personalized recipes,
nutritional guidance, and even shopping services. Their innovative approach ensures a hassle-free, nutritionally optimized meal experience,
making daily cooking convenient and enjoyable.

Yummy is a food-box business. Sitting at the intersection of gastronomy and logistics, this market is highly competitive,
so Yummy needs to be fast and well informed in its operations.

Pipelines are not yet a commodity.

At Yummy, efficiency and timeliness are paramount. Initially, Martin, Yummy’s CTO, chose to purchase data pipelining tools for their operational and analytical
needs, aiming to maximize time efficiency. However, the real-world performance of these solutions did not meet expectations, which
led to a reassessment of their approach.

What’s important: velocity, reliability, and time to delivery. Money is secondary.

Martin was initially satisfied with the ease of setup provided by the SaaS services.

The tipping point came when an update to Yummy’s database introduced a new log table. The vendor’s default settings automatically replicated every new table in full on each refresh, which led to unexpectedly high fees. This situation highlighted the need for greater control over data management processes and prompted a shift towards more transparent and cost-effective solutions.

💡 Proactive management of data pipeline settings is essential.
Automatic replication of new tables, while common, often leads to increased costs without adding value, especially if those tables are not immediately needed.
Understanding and adjusting these settings can lead to significant cost savings and more efficient data use.
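
With dlt, this kind of control is explicit in code. The sketch below is a minimal illustration, assuming a Postgres source and hypothetical table names: only an explicit allow-list of tables is replicated, so a newly created log table is simply ignored rather than copied in full on every refresh.

```python
import dlt
from dlt.sources.sql_database import sql_database

# Replicate only an explicit allow-list of tables; a table added later
# (such as a new log table) is ignored instead of being fully copied
# on every refresh. Connection string and table names are placeholders.
source = sql_database(
    "postgresql://user:password@localhost:5432/yummy",
    table_names=["orders", "customers", "recipes"],
)

pipeline = dlt.pipeline(
    pipeline_name="yummy_replication",
    destination="duckdb",  # swap in your warehouse of choice
    dataset_name="raw",
)

print(pipeline.run(source))
```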

10x faster, 182x cheaper with dlt + async + Modal

Motivated to find a solution that balanced cost with performance, Martin explored dlt, a tool known for its simplicity in building data pipelines.
By combining dlt with asynchronous operations and using Modal for managed execution, he achieved substantial improvements:

  • Data processing speed increased tenfold.
  • Cost reduced by 182 times compared to the traditional SaaS tool.
  • The new system supports extracting data once and writing to multiple destinations without additional costs.

For a peek into how Martin implemented this solution, see his async Postgres source on GitHub.
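
Martin’s actual implementation lives in the linked repository. Purely as an illustration of the dlt + Modal combination, here is a minimal sketch that runs a dlt pipeline on a schedule in Modal. The app name, secret, tables, and destination are assumptions, not details from Yummy’s setup.

```python
import modal

# Build an image with dlt and a Postgres driver; run the load hourly.
image = modal.Image.debian_slim().pip_install("dlt[postgres]")
app = modal.App("yummy-dlt-pipeline", image=image)

@app.function(
    schedule=modal.Period(hours=1),
    secrets=[modal.Secret.from_name("pg-credentials")],  # hypothetical secret
)
def load_postgres() -> None:
    import dlt
    from dlt.sources.sql_database import sql_database

    pipeline = dlt.pipeline(
        pipeline_name="yummy_pg",
        destination="postgres",  # illustrative destination
        dataset_name="raw",
    )
    # Source credentials come from the Modal secret via environment variables.
    info = pipeline.run(sql_database(table_names=["orders", "customers"]))
    print(info)
```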


Taking back control with open source has never been easier

Taking control of your data stack is more accessible than ever with the broad array of open-source tools available. SQL copy pipelines, often seen as a basic utility in data management, do not generally differ significantly between platforms. They perform similar transformations and schema management, making them a commodity available at minimal cost.

SQL to SQL copy pipelines are widespread, yet many service providers charge exorbitant fees for these simple tasks. In contrast, these pipelines can often be set up and run at a fraction of the cost—sometimes just the price of a few coffees.

At dltHub, we advocate for leveraging straightforward, freely available resources to regain control over your data processes and budget effectively.

Setting up a SQL pipeline can take just a few minutes with the right tools; the sketch below shows how little code is involved.
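
This is a minimal sketch under assumed names (an `orders` table with an `updated_at` cursor column and an `id` primary key; credentials read from dlt’s standard config), showing an incremental SQL-to-SQL copy.

```python
import dlt
from dlt.sources.sql_database import sql_table

# Source credentials are read from dlt's standard config
# (.dlt/secrets.toml or environment variables).
orders = sql_table(
    table="orders",  # hypothetical table
    incremental=dlt.sources.incremental("updated_at"),  # assumed cursor column
)
orders.apply_hints(primary_key="id")  # assumed primary key, needed for merge

pipeline = dlt.pipeline(
    pipeline_name="orders_copy",
    destination="postgres",  # any supported SQL destination works here
    dataset_name="analytics",
)

# The first run backfills; later runs fetch only rows with a newer updated_at.
print(pipeline.run(orders, write_disposition="merge"))
```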

For additional support or to connect with fellow data professionals, join our community.
