Brandy Chang

Posted on Feb 23

Turning Parquet File into a Queryable RESTful with DuckDB, Quarkus & Kotlin

#kotlinserverside #kotlin #parquet #duckdb

Parquet files are a powerhouse for storing large, columnar datasets in big data workflows. Traditionally, querying them requires bigdata frameworks like Apache Spark or Hadoop, which bring hefty setup and resource demands. But what if you could skip the heavy lifting and build a lightweight RESTful API instead? Enter DuckDB — a nimble embedded SQL engine that queries Parquet files directly. In this article, I’ll show you how to pair it with *Quarkus *(a cloud-native Java framework) and *Kotlin *(for its clean, safe syntax) to expose Parquet data over HTTP. We’ll use Jdbi’s bindMap to bind REST request bodies, Quarkus’s config for query templates, and add periodic loading of Parquet files from S3. Let’s craft this efficient API!

Why This Stack?

Here’s a quick look at our tools:

Parquet: A columnar file format optimized for analytics, widely used in big data ecosystems
DuckDB: An embedded SQL engine that queries Parquet files directly—no servers or clusters needed
Quarkus: A fast, lightweight framework built for modern apps, with excellent REST support
Kotlin: A concise, null-safe language that makes coding with Quarkus a joy.

Our goal

Query Parquet data with DuckDB, serve it via Quarkus REST endpoints, and manage query templates with Quarkus’s config—all in Kotlin, without the baggage of traditional big data frameworks.

The Big Picture

The flow is sleek:

Quarkus periodically downloads Parquet files from S3 based on a configurable path (To add this in next article)
DuckDB queries these files directly using SQL that configed in yaml.
Quarkus serves the results as REST endpoints, with Jdbi binding JSON request bodies to query templates from the config.

This delivers a dynamic, lightweight API for Parquet data over HTTP.

Let’s Build It

Step 1: Set Up with Quarkus Initializr

Head to code.quarkus.io to scaffold your project:

Artifact: parquet-rest
Build Tool: Maven
Java Version: 21
Extensions:

REST
REST Jackson
Kotlin (quarkus-kotlin)

Generate, download, and unzip

Add DuckDB and Jdbi to pom.xml:

        <!-- DuckDB JDBC Driver -->
        <dependency>
            <groupId>org.duckdb</groupId>
            <artifactId>duckdb_jdbc</artifactId>
            <version>1.2.0</version>
        </dependency>

        <!-- JDBI Core -->
        <dependency>
            <groupId>org.jdbi</groupId>
            <artifactId>jdbi3-kotlin</artifactId>
            <version>3.48.0</version>
        </dependency>

Step 2: ConfigMapping for Query Templates

In this demo, we use userdata.parquet from timestored and put the file to src/test

Define query templates in application.properties

Quarkus's config mappings is possible to group multiple configuration properties in a single interface that share the same prefix.

rest.queries.get-all=SELECT * FROM 'src/test/userdata.parquet'
rest.queries.findUsersByLastName=SELECT * FROM 'src/test/userdata.parquet' where last_name = :lastName

We can Load them with a config class

import io.smallrye.config.ConfigMapping
import io.smallrye.config.WithName

@ConfigMapping(prefix = "rest")
interface QueryConfig {
    @get:WithName("queries")
    val queries: Map<String, String>
}

Here, @get:WithName("queries") turns the queries() method into a field-like queries property in Kotlin, making access clean and intuitive: config.queries["get-all"] instead of config.queries(). This is a standout feature of Quarkus’s ConfigMapping with Kotlin!

Step 3: Querying Parquet with DuckDB and Jdbi

Create a service to query the Parquet file:

import jakarta.enterprise.context.ApplicationScoped
import org.jdbi.v3.core.Jdbi
import org.jdbi.v3.core.kotlin.withHandleUnchecked

@ApplicationScoped
class ParquetService(
    val config: QueryConfig,
) {
    private val jdbi = Jdbi.create("jdbc:duckdb:")

    fun executeQuery(
        queryName: String,
        parameters: Map<String, Any>,
    ): List<Map<String, Any>> {
        val query =
            config.queries[queryName]
                ?: throw IllegalArgumentException("Query with name '$queryName' not found in configuration")

        return jdbi.withHandleUnchecked { handle ->
            handle
                .createQuery(query)
                .bindMap(parameters)
                .mapToMap()
                .list()
        }
    }
}

Here, withHandleUnchecked simplifies the code by throwing unchecked exceptions (like SQLException) directly, avoiding the need to declare a checked exception type. It’s a Kotlin-friendly tweak from jdbi3-kotlin, keeping things concise.

Step 4: REST Endpoint

Expose a POST endpoint:

import jakarta.ws.rs.Consumes
import jakarta.ws.rs.POST
import jakarta.ws.rs.Path
import jakarta.ws.rs.Produces
import jakarta.ws.rs.core.MediaType
import org.jboss.resteasy.reactive.RestPath

@Path("/query")
class QueryResource(
    val parquetService: ParquetService,
) {
    @POST
    @Path("/{queryName}")
    @Consumes(MediaType.APPLICATION_JSON)
    @Produces(MediaType.APPLICATION_JSON)
    fun queryWithParams(
        @RestPath queryName: String,
        body: Map<String, Any>,
    ) = parquetService.executeQuery(queryName, body)
}

Step 5: Testing with RestAssured-Kotlin

Add a test in 'src/test/kotlin'

@QuarkusTest
class QueryResourceTest {
    @Test
    fun testQueryEndpoint() {
        Given {
            body("{}")
            header("Content-Type", "application/json")
        } When {
            post("/query/get-all")
        } Then {
            statusCode(200)
        }
    }
}

Here, restassured kotlin-extensions DSL made RESTful test more readable in Given-When-Then tyle

Testing It Out

Run the app:

./mvnw quarkus:dev

Test with http test

### GET All
POST http://localhost:8080/query/get-all
Content-Type: application/json

{}

### filter-by-column
POST http://localhost:8080/query/findUsersByLastName
Content-Type: application/json

{
  "lastName": "Mason"
}

Why This Shines

This setup skips big data frameworks for a lightweight win. DuckDB queries Parquet natively, Quarkus handles REST and scheduling, and Jdbi3-Kotlin’s withHandleUnchecked keeps queries sleek. ConfigMapping with @get:WithName makes config.queries a joy, and RestAssured ensures it all works reliably.

Wrapping Up

We’ve built a RESTful API for Parquet files with DuckDB, Quarkus, and Kotlin, dodging heavy frameworks. With Jdbi3-Kotlin, ConfigMapping’s field-like access, and RestAssured tests, it’s robust and simple. Try it with your own data—tweak the queries and share your thoughts in the comments!

The full project can be found on my github.

DEV Community