DEV Community

# dataengineering

Posts

From Data to Decisions: How Machine Learning Works in 2025

1
Comments
3 min read
Mastering Dynamic Allocation in Apache Spark: A Practical Guide with Real-World Insights

Comments
3 min read
Designing robust and scalable relational databases: A series of best practices.

9
Comments 2
17 min read
Deep Dive into Dremio's File-based Auto Ingestion into Apache Iceberg Tables

Comments
13 min read
What are the major advantages of a cloud warehouse solution over an on-premises data warehouse solution?

Comments
5 min read
Databricks vs. Hadoop: Which Platform is Best for Predictive Analytics?

Comments
7 min read
Talend vs. Apache Kafka: Which Data Tool Drives Better Business Insights?

Comments
6 min read
OLAP (Online Analytical Processing)

9
Comments
3 min read
LightningChart Python 1.0

Comments
1 min read
Data Pipeline Filters 101: Choosing Between Static and Dynamic Approaches

Comments
1 min read
Ensuring Data Quality: Best Practices and Automation

Comments
6 min read
Data Science Simplified: Tips for Aspiring Data Scientists in 2025

1
Comments
4 min read
Dremio, Apache Iceberg and their role in AI-Ready Data

Comments
7 min read
SAP S/4HANA Cloud

Comments
2 min read
Leveraging Python's Pattern Matching and Comprehensions for Data Analytics

Comments
12 min read
Understanding Star Schema vs. Snowflake Schema

Comments
1 min read
One Off to One Data Platform: The Unscalable Data Platform [Part 1]

9
Comments
3 min read
Mastering Workflow Automation with Apache Airflow for Data Engineering

Comments
6 min read
Data Modeling - Entities and Events

Comments
6 min read
My journey learning Apache Spark

Comments
2 min read
SQL "SELECT INTO" vs "INSERT INTO SELECT" statements.

Comments
1 min read
My Journey into Data AI and Machine Learning

Comments
1 min read
The Ultimate Data Engineering Roadmap: From Beginner to Pro

6
Comments 1
8 min read
Intro to SQL using Apache Iceberg and Dremio

2
Comments
22 min read
Why Data Security is Broken and How to Fix it?

1
Comments
5 min read
From ETL and ELT to Reverse ETL

Comments
4 min read
*Mastering Informatica Intelligent Cloud Services (IICS) for Cloud Data Integration*

1
Comments
3 min read
The Future of Agentic Systems Podcast (1:42:26)

10
Comments 1
1 min read
What is Data Engineering?

Comments
1 min read
End-to-End ETL and Sales Dashboard on WWI dataset in Microsoft Fabric

Comments
7 min read
Hands-on with Apache Iceberg & Dremio on Your Laptop within 10 Minutes

Comments
19 min read
All About Parquet Part 09 - Parquet in Data Lake Architectures

Comments
5 min read
All About Parquet Part 02 - Parquet's Columnar Storage Model

Comments
4 min read
All About Parquet Part 06 - Encoding in Parquet | Optimizing for Storage

Comments
6 min read
Data Analysis: The Unsung Hero of Modern Business

Comments
2 min read
Analyzing Airbnb Listings in Chicago: A Power BI Dashboard Project

1
Comments
4 min read
5 Best ETL Tools: A Comprehensive Comparison Guide

1
Comments
3 min read
Data Engineering with Scala: Mastering Real-Time Data Processing with Apache Flink and Google Pub/Sub

1
Comments
15 min read
Why Is an Apache Spark RDD Immutable?

Comments
3 min read
Data Engineering in Observability: The Backbone of Modern Monitoring

1
Comments
5 min read
Real-Time Air Traffic Data Analysis with Spark Structured Streaming and Apache Kafka

1
Comments
8 min read
Oracle to Snowflake Migration: Steps, Challenges & Best Practices

1
Comments
3 min read
Data Engineering in 2024: Innovations and Trends Shaping the Future

5
Comments 2
13 min read
AWS DATA ENGINEER - 101

3
Comments
2 min read
The Journey From a CSV File to Apache Hive Table

6
Comments
6 min read
Achieving Clean and Scalable PySpark Code: A Guide to Avoiding Redundancy

Comments
5 min read
Chapter 2 - Data Models and Query Languages

2
Comments
7 min read
All About Parquet Part 01 - An Introduction

1
Comments
4 min read
All About Parquet Part 10 - Performance Tuning and Best Practices with Parquet

3
Comments
6 min read
All About Parquet Part 03 - Parquet File Structure | Pages, Row Groups, and Columns

2
Comments
5 min read
All About Parquet Part 07 - Metadata in Parquet | Improving Data Efficiency

1
Comments
5 min read
All About Parquet Part 04 - Schema Evolution in Parquet

2
Comments
5 min read
All About Parquet Part 05 - Compression Techniques in Parquet

2
Comments
5 min read
All About Parquet Part 08 - Reading and Writing Parquet Files in Python

3
Comments
5 min read
From a Unified Bronze Layer to Multiple Silver Layers: Streamlining Data Transformation in Databricks Unity Catalog

2
Comments
5 min read
Clear Link Between DevSecOps and Data Engineering

Comments
1 min read
Still Using SQL, Python, & Excel for Data Deduplication? Here's Why You Need Better Tools.

5
Comments
4 min read
Building a Big Data Playground Sandbox for Learning

5
Comments
5 min read
Capture Browser XHR/Fetch API Response Automatically into JSON Files

Comments
1 min read
The True Cost of Poor Data Quality: Why It Matters and How to Improve It

3
Comments
6 min read