Ali Alp

Why Running Databases on Kubernetes is Like Storing Critical Data on a Fragile Flash Drive

When managing stateful applications like databases in Kubernetes, the concept of Persistent Volume Claims (PVCs) plays a crucial role. But what does a PVC really do, and how does it compare to something familiar, like plugging a flash drive into a physical server? Let’s explore this with a relatable analogy and dive into the potential risks, including "dangling" storage, especially when the CSI (Container Storage Interface) controller is down. Ultimately, we’ll conclude why running databases in Kubernetes may not be the best idea.

The Flash Drive Analogy: A Simple Comparison

Imagine you’re running a database on a physical server. Instead of using the server’s internal storage, you plug in a flash drive and point the database to use that for its data storage. The flash drive represents an external storage device that you can connect or disconnect at will. As long as it’s connected, your database knows where its data lives.

In Kubernetes, Persistent Volume Claims (PVCs) serve a similar purpose. A PVC is like the database requesting storage, while the actual storage (whether it’s cloud-backed, network-attached, or local) is the Persistent Volume (PV). When you create a PVC, you’re asking Kubernetes to attach the storage (the PV) to your database container (pod). This is like connecting the flash drive to your physical server.
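
To make the analogy concrete, here is a minimal sketch of what that "plugging in" looks like in practice. The names (db-data, my-database), the image, and the size are assumptions for illustration, not anything prescribed by Kubernetes:

```yaml
# The PVC: the database "asking" for a flash drive of a given size.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data
spec:
  accessModes:
    - ReadWriteOnce              # one node at a time may mount it read-write
  resources:
    requests:
      storage: 10Gi              # how large a "flash drive" we are requesting
---
# The pod: "plugs in" the claimed storage as a volume.
apiVersion: v1
kind: Pod
metadata:
  name: my-database
spec:
  containers:
    - name: postgres
      image: postgres:16
      volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data   # where the database keeps its files
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: db-data       # ties the pod to the PVC above
```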

Comparison: Flash Drives vs. PVC/PV in Kubernetes

| Aspect | Flash Drive on a Physical Server | Persistent Volume Claim (PVC) in Kubernetes |
| --- | --- | --- |
| Attachment/Detachment | Manually plugged in or removed | Dynamically attached/detached via PVC requests and the CSI controller |
| Risk of Orphaned Storage | If not ejected properly, could result in corruption | Risk of "dangling" PVC/PV if the CSI controller fails or malfunctions |
| Storage Dependency | Directly dependent on the physical connection | Dependent on Kubernetes scheduling and the CSI controller for proper attachment |
| Data Path | Manually configured in the database's configuration | PVC automatically binds to the pod as a volume once the claim is fulfilled |
| Failure Scenario | Data corruption if the flash drive is improperly removed | Dangling PV/PVC, inaccessible data, or risk of corruption if CSI fails |
| Management | Manual plug-and-play | Automated via Kubernetes, but requires the CSI controller to work reliably |

Dangling Storage: A Risk in Both Worlds

Much like unplugging a flash drive without properly ejecting it, storage in Kubernetes can become "dangling" if the CSI controller fails or a PV isn't cleanly detached from a pod. Dangling storage happens when a PVC remains bound to a PV but the pod can no longer access it, leading to risks like:

  • Data corruption: If storage is not properly detached or reattached, data can become corrupted.
  • Orphaned resources: Unused PVs can remain allocated, consuming capacity that nothing is using.
  • Manual recovery: Much like fixing a broken flash drive connection, you may need to intervene manually to clean up or reattach the storage to your pods.
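
For illustration, a dangling volume typically shows up as a PV stuck in the Released phase, with a claimRef still pointing at a PVC that no longer exists. This is a hypothetical sketch of what such a PV might look like in `kubectl get pv -o yaml` output, not captured from a real cluster:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-1234-example         # hypothetical dynamically provisioned name
spec:
  capacity:
    storage: 10Gi
  persistentVolumeReclaimPolicy: Retain   # the PV outlives its claim
  claimRef:
    namespace: default
    name: db-data                # this PVC may already have been deleted
status:
  phase: Released                # bound once, now orphaned until manually cleaned up
```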

These issues become especially problematic with stateful applications like databases, where losing access to storage can result in catastrophic consequences.

Storage Lifecycle in Kubernetes

When a pod requests storage through a PVC, Kubernetes does its best to bind that request to an available PV. Once bound, the claim works much like plugging in a flash drive: the PV stays attached until it is explicitly released.
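
What happens when the claim is released is governed by the reclaim policy. A brief sketch, with an assumed StorageClass name and an example CSI provisioner: Retain keeps the volume (and its data) around for manual recovery, while Delete removes it together with the claim:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: db-storage               # assumed name for illustration
provisioner: ebs.csi.aws.com     # example CSI driver; yours will differ
reclaimPolicy: Retain            # keep the PV (and its data) after the PVC is deleted
volumeBindingMode: WaitForFirstConsumer   # bind only once a pod is actually scheduled
```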

While this model works for many types of workloads, it introduces risks for stateful applications, such as databases, that require consistent, uninterrupted access to their data. If the CSI controller fails or Kubernetes is unable to manage the storage properly, your database may find itself unable to access its critical data.

Why Running Databases in Kubernetes Can Be a Bad Idea

Databases are notoriously sensitive to storage-related issues. They rely on low-latency, high-availability storage systems, and interruptions in storage access can lead to data corruption, downtime, or even total data loss.

Kubernetes, with its emphasis on dynamic orchestration and transient workloads, was not originally designed for highly stateful, storage-intensive applications like databases. While tools like PVCs and PVs have been introduced to support stateful applications, they come with inherent risks:

  • Complex orchestration: The dynamic nature of Kubernetes can lead to unexpected behaviors, like storage being reattached inappropriately or resources dangling if something fails.
  • CSI controller dependency: A failed CSI controller can leave your database without access to its critical data, potentially causing significant downtime or data loss.
  • Dangling storage risks: Orphaned PVs or improperly released PVCs can make it hard to recover your data, and manually fixing these issues can be complex and error-prone.
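
For context, the standard way to run a database in Kubernetes is a StatefulSet with volumeClaimTemplates, which stamps out one PVC per replica. A minimal sketch with assumed names; note that every PVC it creates is still subject to the CSI failure modes listed above:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:          # one PVC per replica: data-postgres-0, -1, -2
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```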

Conclusion: Why Databases and Kubernetes Aren’t Always a Great Match

While Kubernetes has evolved to handle stateful applications, its underlying architecture is still fundamentally designed around stateless, ephemeral workloads. This makes running databases in Kubernetes a potentially risky move, especially for production environments that require strong guarantees around data persistence, high availability, and low-latency access.

In short, much like how you wouldn’t rely on a fragile external flash drive for your mission-critical database storage, it’s risky to rely on PVCs and PVs in Kubernetes without accounting for potential failures in storage management, particularly the CSI controller. The risk of dangling storage, data inaccessibility, and manual recovery means that Kubernetes may not be the best fit for running databases, particularly in environments where stability, reliability, and data integrity are paramount.

Top comments (4)

Ildefonso Junquero

Ok. Assuming you're right, what would be the correct approach in your POV? Deploying a cluster on physical servers and connecting them to a cloud provider where the workload runs?

Please, don’t just explain issues, but propose solutions.

On the other hand I wouldn’t compare K8s storage with a pen drive, but with a NAS.

Ali Alp

Hi :)

My points of view:

  • For highly stateful workloads like databases, dedicated infrastructure or managed cloud databases are generally safer.
  • Use Kubernetes for stateless applications or workloads that can handle the dynamic nature of orchestration.

P.S.

The flash drive example works because most people have experienced issues with plugging and unplugging a drive improperly, which mirrors the risks of "dangling" storage in Kubernetes when the CSI controller fails. You are right, comparing with a NAS would work as well, but where is the fun in that :)

Brandon Kauffman

I'd disagree. You can use local storage on a node and dedicate a node to just the database. Then you use anti-affinities to make sure no more than one pod for the database is scheduled on the node.

For example, I use CNPG to run a Postgres instance and two replicas, all on different nodes with dedicated local storage. Kubernetes is my replacement for managing Patroni and allows me to automate failover with minimal or no extra setup.
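
The anti-affinity bit looks roughly like this in the pod spec (a sketch with an assumed app: postgres label; CNPG generates its own manifests):

```yaml
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: postgres        # assumed label on the database pods
        topologyKey: kubernetes.io/hostname   # at most one matching pod per node
```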

Ali Alp

Using CNPG with local storage and anti-affinities for Postgres in Kubernetes does improve failover and distribution across nodes. However, it still faces many of the same challenges discussed in the article. For example, node failure can lead to data loss if the local storage is tied to that node, and Kubernetes isn't natively designed to handle stateful workloads like databases. This setup also complicates backup, recovery, and data consistency across nodes, making it vulnerable to the same risks highlighted in the article.