In my previous post, we explored how LangChain simplifies the development of AI-powered applications. We saw how its modularity, flexibility, and extensibility make it a powerful tool for working with large language models (LLMs) like Gemini. Now, let's take it a step further and see how we can deploy and scale our LangChain applications using the robust infrastructure of Google Kubernetes Engine (GKE) and the power of Gemini!
Why GKE for LangChain?
You might be wondering, "Why bother with Kubernetes? Isn't it complex?" While Kubernetes does have a learning curve (trust me, I've been through it!), GKE simplifies its management significantly by handling the heavy lifting for you so you can focus on your application.
Here's why GKE is an excellent choice for deploying LangChain applications:
- Scalability: GKE allows you to easily scale your application up or down based on demand. This is crucial for handling fluctuating traffic to your AI-powered features. Imagine your chatbot suddenly going viral - GKE ensures it doesn't crash under the load.
- Reliability: With GKE, your application runs on a cluster of machines, providing high availability and fault tolerance. If one machine fails, your application keeps running seamlessly.
- Resource efficiency: GKE optimizes resource utilization, ensuring your application uses only what it needs. This can lead to cost savings, especially when dealing with resource-intensive LLMs.
- Seamless integration with Google Cloud: GKE integrates smoothly with other Google Cloud services like Cloud Storage, Cloud SQL, and, importantly, Vertex AI, where Gemini and other LLMs are hosted.
- Versioning and rollbacks: GKE allows you to easily manage different versions of your application, making updates and rollbacks a breeze. This is incredibly useful when experimenting with different prompts or model parameters.
But that's enough talking; let's build something!
Deploying LangChain on GKE
Let's walk through an example of deploying a simple LangChain application that uses Gemini on GKE. We'll build a basic service, similar to the example from the previous post, but this time, it will be packaged as a containerized application ready for deployment.
Containerize your LangChain application
First, we need to package our LangChain application into a Docker container. This involves creating a Dockerfile that specifies the environment and dependencies for our application. Here is a Python application using LangChain and Gemini, which we'll save as app.py:
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import ChatPromptTemplate
from flask import Flask, request

# Gemini model wrapped by LangChain, combined with a simple prompt into a chain
llm = ChatGoogleGenerativeAI(model="gemini-1.5-pro")

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant that answers questions about a given topic.",
        ),
        ("human", "{input}"),
    ]
)

chain = prompt | llm


def create_app():
    app = Flask(__name__)

    @app.route("/ask", methods=['POST'])
    def talkToGemini():
        # Pass the user's question through the prompt | llm chain
        user_input = request.json['input']
        response = chain.invoke({"input": user_input})
        return response.content

    return app


if __name__ == "__main__":
    app = create_app()
    app.run(host='0.0.0.0', port=80)
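If you'd like to try the service before containerizing it, a quick local sanity check might look like this (a sketch that assumes you have a Gemini API key handy):

# Install the dependencies (the same ones we'll put in requirements.txt shortly)
pip install langchain langchain-google-genai flask

# langchain-google-genai reads the Gemini API key from this environment variable
export GOOGLE_API_KEY=YOUR_GOOGLE_API_KEY

# Start the app (binding port 80 locally may require elevated privileges)
python app.py

# In another terminal, ask a question
curl -X POST -H "Content-Type: application/json" \
  -d '{"input": "What is LangChain?"}' \
  http://localhost/ask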
Then, create a Dockerfile to define how to assemble our image:
# Use an official Python runtime as a parent image
FROM python:3-slim
# Set the working directory in the container
WORKDIR /app
# Copy the current directory contents into the container at /app
COPY . /app
# Install any needed packages specified in requirements.txt
RUN pip install -r requirements.txt
# Make port 80 available to the world outside this container
EXPOSE 80
# Run app.py when the container launches
CMD [ "python", "app.py" ]
For our dependencies, create the requirements.txt file containing LangChain and a web framework, Flask:
langchain
langchain-google-genai
flask
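With app.py, the Dockerfile, and requirements.txt in place, you can optionally test the container locally before pushing it anywhere. A minimal sketch, assuming Docker is installed on your machine (langchain-app-local is just a throwaway tag):

# Build a local image
docker build -t langchain-app-local .

# Run it, mapping host port 8080 to port 80 in the container
docker run -p 8080:80 -e GOOGLE_API_KEY=YOUR_GOOGLE_API_KEY langchain-app-local

# In another terminal, send a test request
curl -X POST -H "Content-Type: application/json" \
  -d '{"input": "Tell me a fun fact about hummingbirds"}' \
  http://localhost:8080/ask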
Finally, build the container image and push it to Artifact Registry. Don't forget to replace PROJECT_ID with your Google Cloud project ID.
# Authenticate with Google Cloud
gcloud auth login
# Create the repository
gcloud artifacts repositories create images \
--repository-format=docker \
--location=us
# Configure authentication to the desired repository
gcloud auth configure-docker us-docker.pkg.dev/PROJECT_ID/images
# Build the image
docker build -t us-docker.pkg.dev/PROJECT_ID/images/my-langchain-app:v1 .
# Push the image
docker push us-docker.pkg.dev/PROJECT_ID/images/my-langchain-app:v1
After a handful of seconds, your container image should now be stored in your Artifact Registry repository.
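If you'd like to double-check, you can list the images in the repository:

gcloud artifacts docker images list us-docker.pkg.dev/PROJECT_ID/images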
Deploy to GKE
Now, let's deploy this image to our GKE cluster. You can create a GKE cluster through the Google Cloud Console or with the gcloud command-line tool, again taking care to replace PROJECT_ID:
gcloud container clusters create-auto langchain-cluster \
--project=PROJECT_ID \
--region=us-central1
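Autopilot cluster creation usually takes a few minutes. If you want to confirm the cluster is ready before continuing, one way is to check its status:

gcloud container clusters describe langchain-cluster \
    --project=PROJECT_ID \
    --region=us-central1 \
    --format="value(status)"

It should print RUNNING once the cluster is available.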
Once your cluster is up and running, create a YAML file with your Kubernetes deployment and service manifests. Let's call it deployment.yaml, replacing PROJECT_ID as well as YOUR_GOOGLE_API_KEY with your Gemini API key:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: langchain-deployment
spec:
  replicas: 3 # Scale as needed
  selector: # Must match the labels on the pod template below
    matchLabels:
      app: langchain-app
  template:
    metadata:
      labels:
        app: langchain-app
    spec:
      containers:
      - name: langchain-container
        image: us-docker.pkg.dev/PROJECT_ID/images/my-langchain-app:v1
        ports:
        - containerPort: 80
        env:
        - name: GOOGLE_API_KEY
          value: YOUR_GOOGLE_API_KEY
---
apiVersion: v1
kind: Service
metadata:
  name: langchain-service
spec:
  selector:
    app: langchain-app
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: LoadBalancer # Exposes the service externally
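One note on the API key: setting it as a plain-text value in the manifest keeps this example simple, but for anything beyond a demo you would typically store it in a Kubernetes Secret and reference it from the Deployment. A minimal sketch (the Secret name gemini-api-key is just an example, and the kubectl command assumes your kubeconfig already points at the cluster, which the next step sets up):

# Create the Secret in the cluster
kubectl create secret generic gemini-api-key \
    --from-literal=GOOGLE_API_KEY=YOUR_GOOGLE_API_KEY

Then, in the container spec of the Deployment, replace the env entry with:

env:
- name: GOOGLE_API_KEY
  valueFrom:
    secretKeyRef:
      name: gemini-api-key
      key: GOOGLE_API_KEY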
Apply the manifest to your cluster:
# Get the context of your cluster
gcloud container clusters get-credentials langchain-cluster --region us-central1
# Deploy the manifest
kubectl apply -f deployment.yaml
This creates a deployment with three replicas of your LangChain application and exposes it externally through a load balancer. You can adjust the number of replicas based on your expected load.
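Speaking of adjusting replicas: you can change the count later without editing the manifest, or let Kubernetes scale for you. A quick sketch (the CPU-based autoscaler assumes the container declares CPU requests, which the manifest above doesn't set, so treat it as a starting point):

# Manually scale the deployment to 5 replicas
kubectl scale deployment langchain-deployment --replicas=5

# Or let Kubernetes adjust between 3 and 10 replicas based on CPU utilization
kubectl autoscale deployment langchain-deployment --min=3 --max=10 --cpu-percent=70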
Interact with your deployed application
Once the service is deployed, you can get the external IP address of your application using:
export EXTERNAL_IP=`kubectl get service/langchain-service \
--output jsonpath='{.status.loadBalancer.ingress[0].ip}'`
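The load balancer can take a minute or two to be provisioned, so the IP may be empty at first. If so, you can watch the service until an external IP shows up:

kubectl get service langchain-service --watch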
You can now send requests to your LangChain application running on GKE. For example:
curl -X POST -H "Content-Type: application/json" \
-d '{"input": "Tell me a fun fact about hummingbirds"}' \
http://$EXTERNAL_IP/ask
Taking it further
This is just a basic example, but you can expand on it in many ways:
- Integrate with other Google Cloud services: Use Cloud SQL to store conversation history, or Cloud Storage to load documents for your chatbot to reference.
- Implement more complex LangChain flows: Build sophisticated applications with chains, agents, and memory, all running reliably on GKE.
- Set up CI/CD: Automate the build and deployment process using tools like Cloud Build and Cloud Deploy.
- Monitor and optimize: Use Cloud Monitoring and Cloud Logging to track the performance and health of your application.
Continue your journey
Deploying LangChain applications on GKE with Gemini unlocks a new level of scalability, reliability, and efficiency. You can now build and run powerful AI-powered applications that can handle real-world demands. By combining the developer-friendly nature of LangChain, the power of Gemini, and the robustness of GKE, you have all the tools you need to create truly impressive and impactful applications.
Next steps:
- Dive deeper into the GKE documentation.
- Explore the Vertex AI documentation for more advanced LLM management and deployment options.
- Check out the LangChain documentation for more complex use cases and examples.
In a future post, I will look into using an open model called Gemma!