Olivier Bourgeois for Google Cloud

Originally published at Medium

Deploy Gemini-powered LangChain applications on GKE

In my previous post, we explored how LangChain simplifies the development of AI-powered applications. We saw how its modularity, flexibility, and extensibility make it a powerful tool for working with large language models (LLMs) like Gemini. Now, let's take it a step further and see how we can deploy and scale our LangChain applications using the robust infrastructure of Google Kubernetes Engine (GKE) and the power of Gemini!

Why GKE for LangChain?

You might be wondering, "Why bother with Kubernetes? Isn't it complex?" While Kubernetes does have a learning curve (trust me, I've been through that!), GKE simplifies its management significantly by handling the heavy lifting for you, so you can focus on your application.

Here's why GKE is an excellent choice for deploying LangChain applications:

  • Scalability: GKE allows you to easily scale your application up or down based on demand. This is crucial for handling fluctuating traffic to your AI-powered features. Imagine your chatbot suddenly going viral - GKE ensures it doesn't crash under the load.
  • Reliability: With GKE, your application runs on a cluster of machines, providing high availability and fault tolerance. If one machine fails, your application keeps running seamlessly.
  • Resource efficiency: GKE optimizes resource utilization, ensuring your application uses only what it needs. This can lead to cost savings, especially when dealing with resource-intensive LLMs.
  • Seamless integration with Google Cloud: GKE integrates smoothly with other Google Cloud services like Cloud Storage, Cloud SQL, and, importantly, Vertex AI, where Gemini and other LLMs are hosted.
  • Versioning and rollbacks: GKE allows you to easily manage different versions of your application, making updates and rollbacks a breeze. This is incredibly useful when experimenting with different prompts or model parameters.

But that's enough talking; let's build something!

Deploying LangChain on GKE

Let's walk through an example of deploying a simple LangChain application that uses Gemini on GKE. We'll build a basic service, similar to the example from the previous post, but this time, it will be packaged as a containerized application ready for deployment.

Containerize your LangChain application

First, we need to package our LangChain application into a Docker container. This involves creating a Dockerfile that specifies the environment and dependencies for our application. Here is a Python application using LangChain and Gemini, which we'll save as app.py:

from flask import Flask, request
from langchain_core.prompts import ChatPromptTemplate
from langchain_google_genai import ChatGoogleGenerativeAI

# The Gemini client authenticates via the GOOGLE_API_KEY environment variable.
llm = ChatGoogleGenerativeAI(model="gemini-1.5-pro")

# A simple prompt: a system instruction plus the user's question.
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant that answers questions about a given topic.",
        ),
        ("human", "{input}"),
    ]
)

# Pipe the prompt into the model to form a runnable chain.
chain = prompt | llm

def create_app():
    app = Flask(__name__)

    @app.route("/ask", methods=["POST"])
    def talk_to_gemini():
        # Pass the user's input through the chain and return the model's text.
        user_input = request.json["input"]
        response = chain.invoke({"input": user_input})
        return response.content

    return app

if __name__ == "__main__":
    app = create_app()
    app.run(host="0.0.0.0", port=80)
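Before containerizing anything, it's worth a quick local sanity check. Here's a minimal sketch, assuming the dependencies are installed and you have a valid Gemini API key, which the langchain-google-genai client reads from the GOOGLE_API_KEY environment variable:

# Install dependencies and provide the API key
pip install langchain langchain-google-genai flask
export GOOGLE_API_KEY=YOUR_GOOGLE_API_KEY

# Start the app, then query it from another terminal
python app.py
curl -X POST -H "Content-Type: application/json" \
  -d '{"input": "What is LangChain?"}' \
  http://localhost:80/ask

Binding to port 80 may require elevated privileges on your machine; if so, change the port in app.py for the local run.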

Then, create a Dockerfile to define how to assemble our image:

# Use an official Python runtime as a parent image
FROM python:3-slim

# Set the working directory in the container
WORKDIR /app

# Copy the current directory contents into the container at /app
COPY . /app

# Install any needed packages specified in requirements.txt
RUN pip install -r requirements.txt

# Make port 80 available to the world outside this container
EXPOSE 80

# Run app.py when the container launches
CMD [ "python", "app.py" ]

For our dependencies, create a requirements.txt file listing LangChain, its Gemini integration package, and Flask as our web framework:

langchain
langchain-google-genai
flask

Finally, build the container image and push it to Artifact Registry. Don't forget to replace PROJECT_ID with your Google Cloud project ID.

# Authenticate with Google Cloud
gcloud auth login

# Create the repository
gcloud artifacts repositories create images \
  --repository-format=docker \
  --location=us

# Configure Docker authentication for the Artifact Registry host
gcloud auth configure-docker us-docker.pkg.dev

# Build the image
docker build -t us-docker.pkg.dev/PROJECT_ID/images/my-langchain-app:v1 .

# Push the image
docker push us-docker.pkg.dev/PROJECT_ID/images/my-langchain-app:v1

After a handful of seconds, your container image should now be stored in your Artifact Registry repository.
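If you want to double-check, one way is to list the images in the repository:

gcloud artifacts docker images list us-docker.pkg.dev/PROJECT_ID/images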

Deploy to GKE

Now, let's deploy this image to our GKE cluster. You can create a GKE cluster through the Google Cloud Console or using the gcloud command-line tool, again taking care to replace PROJECT_ID:

gcloud container clusters create-auto langchain-cluster \
  --project=PROJECT_ID \
  --region=us-central1
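Cluster creation takes a few minutes. You can confirm the cluster is up with:

gcloud container clusters list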

Once your cluster is up and running, create a YAML file with your Kubernetes deployment and service manifests. Let's call it deployment.yaml, replacing PROJECT_ID as well as YOUR_GOOGLE_API_KEY with your Gemini API key:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: langchain-deployment
spec:
  replicas: 3 # Scale as needed
  selector:
    matchLabels:
      app: langchain-app
  template:
    metadata:
      labels:
        app: langchain-app
    spec:
      containers:
      - name: langchain-container
        image: us-docker.pkg.dev/PROJECT_ID/images/my-langchain-app:v1
        ports:
        - containerPort: 80
        env:
        - name: GOOGLE_API_KEY
          value: YOUR_GOOGLE_API_KEY
---
apiVersion: v1
kind: Service
metadata:
  name: langchain-service
spec:
  selector:
    app: langchain-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  type: LoadBalancer # Exposes the service externally
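A quick aside: the manifest above stores the API key in plain text, which is fine for a demo but risky for anything you'd commit to version control. A safer pattern is a Kubernetes Secret; here's a sketch, where gemini-api-key is just an example name:

# Store the key in a Secret instead of the manifest
kubectl create secret generic gemini-api-key \
  --from-literal=GOOGLE_API_KEY=YOUR_GOOGLE_API_KEY

The env section of the deployment would then reference the Secret rather than a literal value:

        env:
        - name: GOOGLE_API_KEY
          valueFrom:
            secretKeyRef:
              name: gemini-api-key
              key: GOOGLE_API_KEY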

Apply the manifest to your cluster:

# Fetch credentials so kubectl points at your cluster
gcloud container clusters get-credentials langchain-cluster --region us-central1

# Deploy the manifest
kubectl apply -f deployment.yaml

This creates a deployment with three replicas of your LangChain application and exposes it externally through a load balancer. You can adjust the number of replicas based on your expected load.
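If you'd rather not pick a replica count by hand, GKE can scale the deployment for you. Here's a minimal sketch using a Horizontal Pod Autoscaler, assuming you've added a CPU request to the container spec (CPU-based autoscaling needs one):

kubectl autoscale deployment langchain-deployment \
  --min=3 --max=10 --cpu-percent=70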

Interact with your deployed application

Once the service is deployed, you can get the external IP address of your application using:

export EXTERNAL_IP=$(kubectl get service/langchain-service \
  --output jsonpath='{.status.loadBalancer.ingress[0].ip}')
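Note that the load balancer can take a minute or two to provision, and the command above returns an empty string until the IP is assigned. You can watch for it with:

kubectl get service langchain-service --watch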

You can now send requests to your LangChain application running on GKE. For example:

curl -X POST -H "Content-Type: application/json" \
  -d '{"input": "Tell me a fun fact about hummingbirds"}' \
  http://$EXTERNAL_IP/ask

Taking it further

This is just a basic example, but you can expand on it in many ways:

  • Integrate with other Google Cloud services: Use Cloud SQL to store conversation history, or Cloud Storage to load documents for your chatbot to reference.
  • Implement more complex LangChain flows: Build sophisticated applications with chains, agents, and memory, all running reliably on GKE.
  • Set up CI/CD: Automate the build and deployment process using tools like Cloud Build and Cloud Deploy.
  • Monitor and optimize: Use Cloud Monitoring and Cloud Logging to track the performance and health of your application.

Continue your journey

Deploying LangChain applications on GKE with Gemini unlocks a new level of scalability, reliability, and efficiency. You can now build and run powerful AI-powered applications that can handle real-world demands. By combining the developer-friendly nature of LangChain, the power of Gemini, and the robustness of GKE, you have all the tools you need to create truly impressive and impactful applications.

Next steps:

In a future post, I will look into using an open model called Gemma!
