Ajeet Singh Raina
Deploy Ollama on a Local Kubernetes cluster in 5 Minutes

Ollama, the rapidly growing tool for running large language models locally, has taken the developer world by storm. The models it serves can generate text, translate languages, and write many kinds of creative content. But how do you fit Ollama effectively into your development workflow? Enter Docker Desktop and Kubernetes – a powerful combination that lets you run Ollama seamlessly in a containerized environment.

Why Kubernetes and Docker Desktop?

Docker Desktop provides a user-friendly platform for building and running containerized applications. Ollama, packaged as a Docker image, fits perfectly into this ecosystem. Kubernetes, on the other hand, orchestrates container deployment and management, ensuring efficient resource allocation and scalability for your Ollama instance.

Setting the Stage

  • Install Docker Desktop: Download and install Docker Desktop on your machine. This provides the foundation for building and running containerized applications.

  • Pull the Ollama Image: Use the docker pull command to fetch the official Ollama image from Docker Hub (see the command after this list). This image contains all the libraries and dependencies Ollama needs.

  • Create a Kubernetes Pod: Define a Kubernetes pod YAML file specifying the Ollama image, resource requirements, and any desired configurations. This file instructs Kubernetes on how to deploy and manage the Ollama container.

  • Deploy the Pod: Use the kubectl apply command to deploy the pod based on your YAML definition. Kubernetes will then create and manage the Ollama container, ensuring it has the necessary resources to function effectively.
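For reference, pulling the image ahead of time looks like this. It is optional on Docker Desktop, since Kubernetes pulls the image automatically on first deploy; pin a specific tag instead of latest if you want reproducible setups:

docker pull ollama/ollama:latest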

Benefits of this approach:

  • Isolation and Scalability: Running Ollama in a container isolates it from your system's environment, preventing conflicts and ensuring a clean execution. Additionally, Kubernetes allows you to easily scale your Ollama deployment by adding more pods, catering to increased workload demands.

  • Resource Management: Kubernetes effectively manages resource allocation for the Ollama container, preventing it from hogging system resources and impacting other applications.

  • Portability and Collaboration: Docker containers and Kubernetes deployments are inherently portable. Share your Ollama deployment configuration with your team, allowing them to easily run it on their own Docker Desktop and Kubernetes environment, fostering seamless collaboration.

Getting Started

  • Install Docker Desktop
  • Enable Kubernetes


Ensure that a single-node Kubernetes cluster is up and running with the following command:

kubectl get nodes
NAME             STATUS   ROLES           AGE   VERSION
docker-desktop   Ready    control-plane   57s   v1.29.1

Open a terminal, copy the content below into a file called ollama.yaml, and save it anywhere on your system.

apiVersion: v1
kind: Pod
metadata:
  name: ollama-pod
spec:
  containers:
  - name: ollama
    image: ollama/ollama:latest # Replace with desired Ollama image tag
    resources:
      requests:
        memory: "2Gi"
        cpu: "1"
      limits:
        memory: "4Gi"
        cpu: "2"
    ports:
    - containerPort: 11434
  restartPolicy: Always


If you're new to Kubernetes YAML, this breakdown might be useful:

  • apiVersion: The Kubernetes API version used (v1 in this case).
  • kind: The type of object being defined (Pod in this case).
  • metadata: Metadata about the Pod, including its name (ollama-pod).
  • spec: The Pod's configuration, starting with containers, an array of container definitions.
  • name: The name of the container (ollama).
  • image: The Docker image to use (ollama/ollama:latest); replace latest with a specific tag if desired.
  • resources: Resource requests and limits for the container.
  • requests: The minimum guaranteed resources for the container.
  • limits: The maximum resources the container can use.
  • memory: The memory request and limit (2Gi and 4Gi here).
  • cpu: The CPU request and limit (1 and 2 here).
  • ports: Exposes the container's port (11434, Ollama's default).
  • restartPolicy: How the Pod is restarted on failure (Always in this case).

Bringing up the Pod

kubectl apply -f ollama.yaml

Verify that the Pod is up and running:

kubectl describe po ollama-pod
Name:             ollama-pod
Namespace:        default
Priority:         0
Service Account:  default
Node:             docker-desktop/192.168.65.3
Start Time:       Wed, 21 Feb 2024 10:01:34 +0530
Labels:           <none>
Annotations:      <none>
Status:           Running
IP:               10.1.0.6
IPs:
  IP:  10.1.0.6
Containers:
  ollama:
    Container ID:   docker://e04e664eea3123151f6f90806951d101826a3689000f27fabeab2c53de36e977
    Image:          ollama/ollama:latest
    Image ID:       docker-pullable://ollama/ollama@sha256:2bb3fa14517aff428033cce369a2cac3baf9215fed5b401f87e30b52e39ae124
    Port:           11434/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Wed, 21 Feb 2024 10:01:37 +0530
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     2
      memory:  4Gi
    Requests:
      cpu:        1
      memory:     2Gi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6l4gz (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       True
  ContainersReady             True
  PodScheduled                True
Volumes:
  kube-api-access-6l4gz:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  11s   default-scheduler  Successfully assigned default/ollama-pod to docker-desktop
  Normal  Pulling    10s   kubelet            Pulling image "ollama/ollama:latest"
  Normal  Pulled     8s    kubelet            Successfully pulled image "ollama/ollama:latest" in 2.082s (2.082s including waiting)
  Normal  Created    8s    kubelet            Created container ollama
  Normal  Started    8s    kubelet            Started container ollama
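Once the Pod reports Running, you can smoke-test the Ollama API from your host. The sketch below is one way to do it, assuming the Pod is named ollama-pod as in the manifest above; llama2 is only an example model name, so substitute any model from the Ollama library:

# Forward the Ollama API port to your host
kubectl port-forward pod/ollama-pod 11434:11434

# In a second terminal, pull a model inside the container
kubectl exec -it ollama-pod -- ollama pull llama2

# Ask the API for a completion
curl http://localhost:11434/api/generate -d '{"model": "llama2", "prompt": "Why is the sky blue?"}'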


Running Open WebUI in a Kubernetes Pod

The Ollama API alone has no user interface, so let's pair it with Open WebUI. The combined manifest below defines a Service in front of the Ollama Pod, the Pod itself (now carrying an app: ollama label so the Service's selector can find it), an Open WebUI Deployment pointed at that Service, a PersistentVolumeClaim for the WebUI's data, and a Secret for its session key. Save it as, say, ollama-webui.yaml:

apiVersion: v1
kind: Service
metadata:
  name: ollama-service
spec:
  selector:
    app: ollama
  ports:
  - protocol: TCP
    port: 11434
    targetPort: 11434

---

apiVersion: v1
kind: Pod
metadata:
  name: ollama-pod
  labels:
    app: ollama
spec:
  containers:
  - name: ollama
    image: ollama/ollama:latest # Replace with desired Ollama image tag
    resources:
      requests:
        memory: "2Gi"
        cpu: "1"
      limits:
        memory: "4Gi"
        cpu: "2"
    ports:
    - containerPort: 11434
  restartPolicy: Always

---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: open-webui
spec:
  replicas: 1
  selector:
    matchLabels:
      app: open-webui
  template:
    metadata:
      labels:
        app: open-webui
    spec:
      containers:
      - name: open-webui
        image: ghcr.io/open-webui/open-webui:main
        env:
        - name: OLLAMA_API_BASE_URL
          value: http://ollama-service:11434/api # Replace with Ollama service name or URL
        - name: WEBUI_SECRET_KEY
          valueFrom:
            secretKeyRef:
              name: open-webui-secret
              key: web-secret
        volumeMounts:
        - name: open-webui-data
          mountPath: /app/backend/data
      volumes:
      - name: open-webui-data
        persistentVolumeClaim:
          claimName: open-webui-pvc

---

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: open-webui-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

---

apiVersion: v1
kind: Secret
metadata:
  name: open-webui-secret
stringData:
  web-secret: "" # Replace with your actual secret value
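Before applying, give WEBUI_SECRET_KEY a real value: either fill in the stringData field above, or delete that Secret block and create the Secret from the command line as sketched below. The local port 3000 is an arbitrary choice; Open WebUI serves on port 8080 inside its container:

# Create the secret with a random value (only if you removed the Secret block above)
kubectl create secret generic open-webui-secret --from-literal=web-secret=$(openssl rand -hex 32)

# Deploy the combined manifest
kubectl apply -f ollama-webui.yaml

# Forward the WebUI and open http://localhost:3000 in your browser
kubectl port-forward deployment/open-webui 3000:8080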

Conclusion

This blog post has explored how to leverage Docker Desktop and Kubernetes to effectively run Ollama within a containerized environment. By combining these powerful tools, you gain several advantages:

  • Isolation and Scalability: Ollama runs in a dedicated container, preventing conflicts with your system and enabling easy scaling to meet increased demands.
  • Resource Management: Kubernetes efficiently allocates resources to the Ollama container, ensuring optimal performance without impacting other applications.
  • Portability and Collaboration: Docker containers and Kubernetes deployments are inherently portable, allowing seamless sharing of your Ollama setup with your team.
