
Kamesh Sampath

Posted on • Originally published at Medium

Let's Build Together: A Local Playground for Apache Polaris

Why I Built a Developer-First Apache Polaris Starter Kit?

Photo by Maxime Agnelli on Unsplash

As builders, we all know the pain of setting up a new development environment. Hours spent configuring dependencies, troubleshooting integration issues, and getting different services to play nicely together. When I started working with Apache Polaris, I faced these same challenges – and decided to do something about it.

The Challenge: Getting Started with Apache Polaris

Apache Polaris is a powerful open source Iceberg REST catalog implementation, originally contributed to the Apache Software Foundation by Snowflake. This donation to open source has made enterprise-grade data catalog capabilities accessible to the broader data community via simple REST APIs.

Setting up Polaris in a development environment can be challenging. You need:

  • A robust container orchestration platform
  • A working metastore (typically PostgreSQL)
  • S3-compatible storage
  • Various security configurations and credentials

Each of these components requires careful setup and configuration. For builders just getting started or wanting to experiment with Polaris, this overhead can be a significant barrier.

The Solution: A Complete Development Environment

This is why I created an open source starter kit that provides everything needed to get Polaris up and running in a local development environment. The project follows the true spirit of open source collaboration, building upon and integrating with other excellent open source tools in the ecosystem.

The kit automates the setup of:

  • A lightweight k3s Kubernetes cluster using k3d
  • LocalStack for AWS S3 emulation
  • PostgreSQL metastore with proper configurations
  • All necessary security credentials and configurations

A key aspect of this starter kit is its comprehensive automation using Ansible. The polaris-forge-setup directory houses Ansible playbooks that:

  • Automate the entire setup process
  • Verify if components are ready for use
  • Handle catalog setup and configuration
  • Provide cleanup capabilities for development iterations
  • Enable smooth transitions to higher environments
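
As a flavor of what such a readiness check looks like, a task along these lines (a hedged sketch, not the repo's actual playbook code; it assumes the kubernetes.core collection is installed) can wait for the Polaris deployment to report Available:

```yaml
# Sketch: block until the Polaris deployment becomes Available
- name: Wait for Polaris to be ready
  kubernetes.core.k8s_info:
    kind: Deployment
    name: polaris
    namespace: polaris
    wait: true
    wait_condition:
      type: Available
      status: "True"
    wait_timeout: 300
```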

This automation-first approach serves two purposes:

  1. Immediate Development: Developers can get started quickly with minimal manual intervention
  2. Production Readiness: The Ansible scripts serve as a template for scaling to higher environments, making it easier to adapt the setup for staging or production use cases

By keeping everything open source and focusing on community-driven development, we ensure that builders can:

  • Learn from the implementation
  • Customize for their specific needs
  • Contribute improvements back to the community
  • Build upon a foundation of trusted open source tools

What is Snowflake OpenCatalog?

Snowflake OpenCatalog is a fully managed, enterprise-grade service built on upstream Apache Polaris, making it easy to integrate with your existing data stack. By handling the operational complexities of running Polaris at scale, it lets teams focus on their data applications:

  • Managed Infrastructure: Snowflake handles all operational aspects including:

    • Polaris server management and scaling
    • Security and access control
    • High availability and reliability
    • Regular updates and maintenance
  • Enterprise Integration: Seamless connectivity with:

    • Snowflake's ecosystem of data services
    • Popular query engines and tools
    • Existing data governance frameworks
    • Enterprise security systems
  • Production-Ready Features:

    • Advanced access controls and auditing
    • Cross-region and cross-cloud support
    • Enterprise-grade SLAs
    • Professional support

From Local Development to Enterprise Scale

This starter kit provides an ideal path for builders working with Apache Polaris and considering OpenCatalog for production deployment. By working with the upstream version in this development environment, you:

  1. Gain hands-on experience with core concepts
  2. Understand the underlying architecture
  3. Can prototype and test implementations
  4. Build expertise that transfers to OpenCatalog
  5. Have a clear path to production scaling

When you're ready to move to production, the concepts and patterns you've learned here will help you make the most of OpenCatalog's enterprise capabilities while Snowflake handles the operational complexity.

Technical Design Decisions

Why Kubernetes with k3s and k3d?

While Docker Compose is often the go-to choice for local development environments, Apache Polaris's distributed nature benefits significantly from Kubernetes's capabilities. Here's why:

  1. Advanced Networking: Kubernetes provides sophisticated networking between components:

    • Automatic service discovery and DNS resolution
    • Internal load balancing for scalable services
    • Ingress management for external access
    • Network policies for traffic control
  2. Declarative Configuration: Using tools like Helm and Kustomize, we can:

    • Maintain separate configurations for different environments
    • Version control our infrastructure setup
    • Apply consistent changes across deployments
    • Manage complex dependencies between services
  3. Reliable State Management:

    • StatefulSets for databases and stateful services
    • PersistentVolumes for durable storage
    • Backup and restore capabilities
    • Data replication when needed
  4. Security and Configuration:

    • Native secrets management
    • Role-Based Access Control (RBAC)
    • ConfigMaps for configuration management
    • Service accounts for component authentication
  5. Production Readiness:

    • Same tools and patterns used in production
    • Easy scaling of components
    • Built-in monitoring and logging
    • Consistent behavior across environments

I specifically chose k3s because it's lightweight and perfect for development environments. Using k3d allows us to run k3s in Docker containers, making it even more convenient for local development. It provides a full Kubernetes experience without the resource overhead of something like minikube.

LocalStack for S3 Integration

While we could have required developers to use actual AWS S3, LocalStack provides a perfect local alternative. It emulates AWS services locally, which means:

  • No cloud costs during development
  • No need for AWS credentials
  • Faster development cycles
  • Ability to work offline

PostgreSQL as the Metastore

PostgreSQL was a natural choice for the metastore. It's:

  • Well-documented and widely used
  • Easy to containerize
  • Highly reliable
  • Supported out of the box by Polaris

Kustomize for Deployment Management

Kustomize allows us to manage Kubernetes manifests in a clean, declarative way. It makes it easy to:

  • Maintain different configurations for different environments
  • Override settings without modifying base configurations
  • Keep configurations DRY and maintainable
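
For instance, an overlay can swap the container image for a locally built one without touching the base manifests. A minimal sketch with placeholder registry and path names (not the repo's actual layout):

```yaml
# overlays/local/kustomization.yaml — placeholder names for illustration
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
images:
  - name: apache-polaris-server-pgsql
    newName: my-registry.example.com/apache-polaris-server-pgsql
    newTag: dev
```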

Getting Started

Let me walk you through how to get up and running with this starter kit.

First, ensure you have the prerequisites installed:
# Required tools and their version checks:

# Docker (Desktop or Engine)
docker --version
# Example output: Docker version 24.0.7

# Kubernetes CLI
kubectl version --client
# Example output: Client Version: v1.28.2

# k3d (>= 5.0.0)
k3d version
# Example output: k3d version v5.6.0

# Python (>= 3.11)
python --version
# Example output: Python 3.12.1

# uv (Python packaging tool)
uv --version
# Example output: uv 0.1.12

# Task
task --version
# Example output: Task version: v3.34.1

# LocalStack (>= 3.0.0)
localstack --version
# Example output: 3.0.0

Initial Setup

Clone the repository and set up your environment:

git clone https://github.com/snowflake-labs/polaris-local-forge
cd polaris-local-forge

# Set up environment variables
export PROJECT_HOME="$PWD"
export KUBECONFIG="$PWD/.kube/config"
export K3D_CLUSTER_NAME=polaris-local-forge
export K3S_VERSION=v1.32.1-k3s1
export FEATURES_DIR="$PWD/k8s"

Python Environment Setup

# Install uv
pip install uv

# Set up Python environment
uv python pin 3.12
uv venv
source .venv/bin/activate  # On Unix-like systems
uv sync

Deploy the Environment

The setup process is automated through several scripts:

# Generate required sensitive files
$PROJECT_HOME/polaris-forge-setup/prepare.yml

# Create and set up the cluster
$PROJECT_HOME/bin/setup.sh

# Wait for deployments to be ready
$PROJECT_HOME/polaris-forge-setup/cluster_checks.yml --tags namespace,postgresql,localstack

Deploy Polaris

This is where things get interesting - deploying Polaris itself. You have two options for the container images:

Option 1: Use Pre-built Images

Apache Polaris doesn't currently publish official images, but you can use our pre-built images with PostgreSQL dependencies:

docker pull ghcr.io/snowflake-labs/polaris-local-forge/apache-polaris-server-pgsql
docker pull ghcr.io/snowflake-labs/polaris-local-forge/apache-polaris-admin-tool-pgsql

Option 2: Build Images Locally

Alternatively, you can build the images from source:

# Update IMAGE_REGISTRY in Taskfile.yml, then run:
task images

If you choose to build locally, remember to update the image references in:

  • k8s/polaris/deployment.yaml
  • k8s/polaris/bootstrap.yaml
  • k8s/polaris/purge.yaml

Deploy and Verify

Apply the Kubernetes manifests:

# Apply Polaris manifests
kubectl apply -k $PROJECT_HOME/k8s/polaris

# Verify deployments and jobs
$PROJECT_HOME/polaris-forge-setup/cluster_checks.yml --tags polaris

Setting Up Your First Catalog

Before creating your first catalog, configure your AWS environment variables:

export AWS_ENDPOINT_URL=http://localstack.localstack:14566
export AWS_ACCESS_KEY_ID=test
export AWS_SECRET_ACCESS_KEY=test
export AWS_REGION=us-east-1

# Run the catalog setup
$PROJECT_HOME/polaris-forge-setup/catalog_setup.yml

Pro Tip: You can customize the default catalog settings by modifying values in polaris-forge-setup/defaults/main.yml. This file contains configurable parameters for your catalog, principal roles, and permissions.

Play with the Catalog

Once your catalog is set up, you can explore its functionality using the provided Jupyter notebook. The notebook at notebooks/verify_setup.ipynb walks you through:

  • Creating a namespace
  • Defining a table
  • Inserting sample data
  • Verifying data storage in LocalStack

This hands-on exploration helps you understand how Polaris integrates with:

  • The PostgreSQL metastore for catalog management
  • LocalStack's S3 emulation for data storage
  • The overall Apache Iceberg table format structure
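
If you prefer scripting these steps outside the notebook, PyIceberg's REST catalog takes a configuration along these lines. This is a sketch only: the URI, port, credential placeholder, and catalog name are assumptions based on this post's LocalStack setup, not values copied from the repo — take the real ones from the starter kit's generated output.

```python
# Illustrative PyIceberg REST catalog configuration for this environment.
catalog_config = {
    "uri": "http://localhost:18181/api/catalog",  # assumed port-forwarded Polaris endpoint
    "credential": "<client_id>:<client_secret>",  # principal credentials from catalog setup
    "warehouse": "polardb",                       # catalog name (assumed)
    "s3.endpoint": "http://localhost:14566",      # LocalStack S3 (assumed host mapping)
    "s3.access-key-id": "test",
    "s3.secret-access-key": "test",
    "s3.region": "us-east-1",
}

# With the cluster running you would then do (requires `pyiceberg`):
# from pyiceberg.catalog import load_catalog
# catalog = load_catalog("polaris", **catalog_config)
# catalog.create_namespace("demo")
```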

You can visually verify your setup by checking the LocalStack console at https://app.localstack.cloud/inst/default/resources/s3/polardb, where you'll see:

  • Catalog storage structure
  • Metadata files
  • Actual data files

Video Walkthrough

For a detailed visual guide of setting up and using this development environment, check out my walkthrough video:

Apache Polaris Local Development Setup

This video demonstrates the entire process from initial setup to running your first queries.

Troubleshooting Tips

If you run into issues, here are some helpful commands for debugging:

# Check Polaris server logs
kubectl logs -f -n polaris deployment/polaris

# Check PostgreSQL logs
kubectl logs -f -n polaris statefulset/postgresql

# Check LocalStack logs
kubectl logs -f -n localstack deployment/localstack

# Check events in the polaris namespace
kubectl get events -n polaris --sort-by='.lastTimestamp'

The Impact: Streamlined Development Experience

With this starter kit, what used to take days of setup and configuration now takes minutes. Builders can focus on creating and experimenting with Polaris rather than wrestling with infrastructure setup.

The kit is open source and available on GitHub. I welcome contributions and feedback from the community. Together, we can make the development experience even better for everyone working with Apache Polaris.

Building should be about creating, not configuring. This starter kit aims to remove the friction from getting started with Apache Polaris, allowing builders to focus on what matters most – creating great applications.

Don't forget to check out another project where I used this starter kit: https://github.com/kameshsampath/balloon-popper-demo.

Related Projects and Tools

Core Components

  • Apache Polaris - Data Catalog and Governance Platform
  • PyIceberg - Python library for Apache Iceberg
  • LocalStack - AWS Cloud Service Emulator
  • k3d - k3s in Docker
  • k3s - Lightweight Kubernetes Distribution
  • Ansible - Automation Platform

Development Tools

  • Docker - Container Platform
  • Kubernetes - Container Orchestration
  • Helm - Kubernetes Package Manager
  • kubectl - Kubernetes CLI
  • uv - Python Packaging Tool
