Parquet files are a powerhouse for storing large, columnar datasets in big data workflows. Traditionally, querying them requires bigdata frameworks like Apache Spark or Hadoop, which bring hefty setup and resource demands. But what if you could skip the heavy lifting and build a lightweight RESTful API instead? Enter DuckDB — a nimble embedded SQL engine that queries Parquet files directly. In this article, I’ll show you how to pair it with *Quarkus *(a cloud-native Java framework) and *Kotlin *(for its clean, safe syntax) to expose Parquet data over HTTP. We’ll use Jdbi’s bindMap to bind REST request bodies, Quarkus’s config for query templates, and add periodic loading of Parquet files from S3. Let’s craft this efficient API!
Why This Stack?
Here’s a quick look at our tools:
- Parquet: A columnar file format optimized for analytics, widely used in big data ecosystems
- DuckDB: An embedded SQL engine that queries Parquet files directly—no servers or clusters needed
- Quarkus: A fast, lightweight framework built for modern apps, with excellent REST support
- Kotlin: A concise, null-safe language that makes coding with Quarkus a joy.
Our goal
Query Parquet data with DuckDB, serve it via Quarkus REST endpoints, and manage query templates with Quarkus’s config—all in Kotlin, without the baggage of traditional big data frameworks.
The Big Picture
The flow is sleek:
- Quarkus periodically downloads Parquet files from S3 based on a configurable path (To add this in next article)
- DuckDB queries these files directly using SQL that configed in yaml.
- Quarkus serves the results as REST endpoints, with Jdbi binding JSON request bodies to query templates from the config.
This delivers a dynamic, lightweight API for Parquet data over HTTP.
Let’s Build It
Step 1: Set Up with Quarkus Initializr
Head to code.quarkus.io to scaffold your project:
Artifact: parquet-rest
Build Tool: Maven
Java Version: 21
Extensions:
- REST
- REST Jackson
- Kotlin (quarkus-kotlin)
Generate, download, and unzip
Add DuckDB and Jdbi to pom.xml:
<!-- DuckDB JDBC Driver -->
<dependency>
<groupId>org.duckdb</groupId>
<artifactId>duckdb_jdbc</artifactId>
<version>1.2.0</version>
</dependency>
<!-- JDBI Core -->
<dependency>
<groupId>org.jdbi</groupId>
<artifactId>jdbi3-kotlin</artifactId>
<version>3.48.0</version>
</dependency>
Step 2: ConfigMapping for Query Templates
In this demo, we use userdata.parquet
from timestored and put the file to src/test
Define query templates in application.properties
Quarkus's config mappings is possible to group multiple configuration properties in a single interface that share the same prefix.
rest.queries.get-all=SELECT * FROM 'src/test/userdata.parquet'
rest.queries.findUsersByLastName=SELECT * FROM 'src/test/userdata.parquet' where last_name = :lastName
We can Load them with a config class
import io.smallrye.config.ConfigMapping
import io.smallrye.config.WithName
@ConfigMapping(prefix = "rest")
interface QueryConfig {
@get:WithName("queries")
val queries: Map<String, String>
}
Here, @get:WithName("queries")
turns the queries()
method into a field-like queries
property in Kotlin, making access clean and intuitive: config.queries["get-all"]
instead of config.queries()
. This is a standout feature of Quarkus’s ConfigMapping with Kotlin!
Step 3: Querying Parquet with DuckDB and Jdbi
Create a service to query the Parquet file:
import jakarta.enterprise.context.ApplicationScoped
import org.jdbi.v3.core.Jdbi
import org.jdbi.v3.core.kotlin.withHandleUnchecked
@ApplicationScoped
class ParquetService(
val config: QueryConfig,
) {
private val jdbi = Jdbi.create("jdbc:duckdb:")
fun executeQuery(
queryName: String,
parameters: Map<String, Any>,
): List<Map<String, Any>> {
val query =
config.queries[queryName]
?: throw IllegalArgumentException("Query with name '$queryName' not found in configuration")
return jdbi.withHandleUnchecked { handle ->
handle
.createQuery(query)
.bindMap(parameters)
.mapToMap()
.list()
}
}
}
Here, withHandleUnchecked
simplifies the code by throwing unchecked exceptions (like SQLException
) directly, avoiding the need to declare a checked exception type. It’s a Kotlin-friendly tweak from jdbi3-kotlin
, keeping things concise.
Step 4: REST Endpoint
Expose a POST endpoint:
import jakarta.ws.rs.Consumes
import jakarta.ws.rs.POST
import jakarta.ws.rs.Path
import jakarta.ws.rs.Produces
import jakarta.ws.rs.core.MediaType
import org.jboss.resteasy.reactive.RestPath
@Path("/query")
class QueryResource(
val parquetService: ParquetService,
) {
@POST
@Path("/{queryName}")
@Consumes(MediaType.APPLICATION_JSON)
@Produces(MediaType.APPLICATION_JSON)
fun queryWithParams(
@RestPath queryName: String,
body: Map<String, Any>,
) = parquetService.executeQuery(queryName, body)
}
Step 5: Testing with RestAssured-Kotlin
Add a test in 'src/test/kotlin'
@QuarkusTest
class QueryResourceTest {
@Test
fun testQueryEndpoint() {
Given {
body("{}")
header("Content-Type", "application/json")
} When {
post("/query/get-all")
} Then {
statusCode(200)
}
}
}
Here, restassured kotlin-extensions DSL made RESTful test more readable in Given-When-Then tyle
Testing It Out
Run the app:
./mvnw quarkus:dev
Test with http test
### GET All
POST http://localhost:8080/query/get-all
Content-Type: application/json
{}
### filter-by-column
POST http://localhost:8080/query/findUsersByLastName
Content-Type: application/json
{
"lastName": "Mason"
}
Why This Shines
This setup skips big data frameworks for a lightweight win. DuckDB queries Parquet natively, Quarkus handles REST and scheduling, and Jdbi3-Kotlin’s withHandleUnchecked keeps queries sleek. ConfigMapping with @get:WithName makes config.queries a joy, and RestAssured ensures it all works reliably.
Wrapping Up
We’ve built a RESTful API for Parquet files with DuckDB, Quarkus, and Kotlin, dodging heavy frameworks. With Jdbi3-Kotlin, ConfigMapping’s field-like access, and RestAssured tests, it’s robust and simple. Try it with your own data—tweak the queries and share your thoughts in the comments!
The full project can be found on my github.
Top comments (0)