Olivier Bourgeois for Google Cloud

Streamline your LangChain deployments with LangServe

Throughout this LangChain series, we've explored the power and flexibility of LangChain, from deploying it on Google Kubernetes Engine (GKE) with Gemini to running open models like Gemma. Now, let's introduce an interesting complement to help us deploy LangChain-powered applications as a REST API: LangServe.

What is LangServe?

LangServe is a helpful tool designed to simplify the deployment of LangChain applications as REST APIs. Instead of manually handling the REST logic for your LLM deployment (such as exposing endpoints or serving API documentation), you can let LangServe take care of that for you. It's built by the same team behind LangChain, ensuring seamless integration and a developer-friendly experience.

Why use LangServe?

In the previous parts of this LangChain series, we've seen how to deploy a LangChain-powered application and how to talk to it. Isn't that enough? Well, LangServe offers several key advantages:

  • Rapid deployment: LangServe drastically reduces the amount of boilerplate code needed to expose your LangChain applications as APIs.
  • Automatic API documentation: LangServe automatically generates interactive API documentation for your deployed chains, making it easy for others (or your future self, if you're like me) to understand and use your services.
  • Built-in playground: LangServe provides a simple web playground for interacting with your deployed LangChain applications directly from your browser. This is incredibly helpful for testing and debugging.
  • Standardized interface: LangServe helps you create consistent, well-structured APIs for your LangChain applications, making them easier to integrate with other services and front-end applications.
  • Simplified client interaction: LangServe comes with a corresponding client library that simplifies calling your deployed chains from other Python or JavaScript applications.

How does LangServe work?

LangServe leverages the power of FastAPI and pydantic to create a robust and efficient serving layer for your LangChain applications. It essentially wraps your LangChain chains or agents, turning them into FastAPI endpoints.

Let's look at an example and see how that all comes together.

Building a LangServe application

Let's say you have the following LangChain application that uses Gemini:

from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant that answers questions about a given topic.",
        ),
        ("human", "{input}"),
    ]
)

chain = prompt | llm
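If you want to sanity-check this chain before serving it, you can invoke it directly. A minimal sketch, assuming your GOOGLE_API_KEY environment variable is already set:

# Invoke the chain directly; without an output parser, the result is an AIMessage
response = chain.invoke({"input": "Tell me about LangChain"})
print(response.content)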

Here's how you would adapt it for LangServe, which you can save as app.py:

from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langserve import add_routes
import uvicorn
from fastapi import FastAPI

app = FastAPI(
  title="LangChain Server",
  version="1.0",
  description="A simple API server using LangChain's Runnable interfaces",
)

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant that answers questions about a given topic.",
        ),
        ("human", "{input}"),
    ]
)

chain = prompt | llm | StrOutputParser()

add_routes(
    app,
    chain,
    path="/my-chain",
)

if __name__ == "__main__":
    # Bind to 0.0.0.0 so the server is reachable from outside the container
    uvicorn.run(app, host="0.0.0.0", port=8000)

Then, create a requirements.txt file with our dependencies:

langserve
langchain-google-genai
uvicorn
fastapi
sse_starlette

And that's it! With these simple changes, your chain is now ready to be served. You can install dependencies and run this application using the following commands. Make sure to replace the your_google_api_key string with your Gemini API key.

export GOOGLE_API_KEY="your_google_api_key"
pip install -r requirements.txt
python app.py

This will start a server on port 8000, as specified in the uvicorn.run call.
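LangServe also registers plain REST endpoints for the chain, such as /my-chain/invoke and /my-chain/stream. As a quick smoke test, here's a sketch that calls the invoke endpoint with the requests package (treat the exact request and response field names as assumptions if your LangServe version differs):

import requests

# LangServe wraps the chain's own input in an outer "input" field on /invoke
payload = {"input": {"input": "Tell me about LangServe"}}

resp = requests.post("http://localhost:8000/my-chain/invoke", json=payload)
resp.raise_for_status()

# The chain's result is nested under "output" in the response body
print(resp.json()["output"])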

Interacting with your LangServe application

Once your server is running, you can interact with it in several ways:

  • Through the automatically generated API docs: Navigate to http://localhost:8000/docs in your browser to see the interactive API documentation.
  • Using the built-in playground: Go to http://localhost:8000/my-chain/playground/ to try out your chain directly in a simple web interface.
  • Using the LangServe client: You can use the provided client library to interact with your API programmatically from other Python or JavaScript applications. Here's a simple Python example:
from langserve import RemoteRunnable

remote_chain = RemoteRunnable("http://localhost:8000/my-chain")
response = remote_chain.invoke({"input": "Tell me about Google Cloud Platform"})
print(response)
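Since RemoteRunnable implements the same Runnable interface as a local chain, you can also stream the response as it's generated. A minimal sketch against the same server:

from langserve import RemoteRunnable

remote_chain = RemoteRunnable("http://localhost:8000/my-chain")

# Stream chunks as they arrive instead of waiting for the full answer
for chunk in remote_chain.stream({"input": "Tell me about Google Cloud Platform"}):
    print(chunk, end="", flush=True)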

Containerizing our application

You can also easily containerize your LangServe application to deploy on a platform like GKE, just like we did with our previous examples.

First, create a Dockerfile to define how to assemble our image:

# Use an official Python runtime as a parent image
FROM python:3-slim

# Set the working directory in the container
WORKDIR /app

# Copy the current directory contents into the container at /app
COPY . /app

# Install any needed packages specified in requirements.txt
RUN pip install -r requirements.txt

# Make port 8000 (the port the app listens on) available outside this container
EXPOSE 8000

# Run app.py when the container launches
CMD [ "python", "app.py" ]

Finally, build the container image and push it to Artifact Registry. Don't forget to replace PROJECT_ID with your Google Cloud project ID.

# Authenticate with Google Cloud
gcloud auth login

# Create the repository
gcloud artifacts repositories create images \
  --repository-format=docker \
  --location=us

# Configure authentication to the desired repository
gcloud auth configure-docker us-docker.pkg.dev/PROJECT_ID/images

# Build the image
docker build -t us-docker.pkg.dev/PROJECT_ID/images/my-langchain-app:v1 .

# Push the image
docker push us-docker.pkg.dev/PROJECT_ID/images/my-langchain-app:v1

After a handful of seconds, your container image should now be stored in your Artifact Registry repository.

Now, let's deploy this image to our GKE cluster. You can create a GKE cluster through the Google Cloud Console or using the gcloud command-line tool, again taking care to replace PROJECT_ID:

gcloud container clusters create-auto langchain-cluster \
  --project=PROJECT_ID \
  --region=us-central1

Once your cluster is up and running, create a YAML file with your Kubernetes deployment and service manifests. Let's call it deployment.yaml, replacing PROJECT_ID with your project ID and YOUR_GOOGLE_API_KEY with your Gemini API key:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: langchain-deployment
spec:
  replicas: 3 # Scale as needed
  selector: # Must match the pod template labels below
    matchLabels:
      app: langchain-app
  template:
    metadata:
      labels:
        app: langchain-app
    spec:
      containers:
      - name: langchain-container
        image: us-docker.pkg.dev/PROJECT_ID/images/my-langchain-app:v1
        ports:
        - containerPort: 8000
        env:
        - name: GOOGLE_API_KEY
          value: YOUR_GOOGLE_API_KEY
---
apiVersion: v1
kind: Service
metadata:
  name: langchain-service
spec:
  selector:
    app: langchain-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8000
  type: LoadBalancer # Exposes the service externally

Apply the manifest to your cluster:

# Get the context of your cluster
gcloud container clusters get-credentials langchain-cluster --region us-central1

# Deploy the manifest
kubectl apply -f deployment.yaml

This creates a deployment with three replicas of your LangChain application and exposes it externally through a load balancer. You can adjust the number of replicas based on your expected load.
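Once the load balancer has been assigned an external IP address (you can check with kubectl get service langchain-service), you can point the same LangServe client at your GKE deployment. A sketch, with EXTERNAL_IP standing in for the address you get back:

from langserve import RemoteRunnable

# The service exposes port 80, so no port needs to be specified in the URL
remote_chain = RemoteRunnable("http://EXTERNAL_IP/my-chain")
print(remote_chain.invoke({"input": "Tell me about Google Kubernetes Engine"}))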

Conclusion

LangServe bridges the gap between development and production, making it easier than ever to share your AI applications with the world. By providing a simple, standardized way to serve your chains as APIs, LangServe unlocks a whole new level of accessibility and usability for your LangChain projects. Whether you're building internal tools or public-facing applications, LangServe streamlines the process, letting you focus on crafting impactful applications with LangChain.

Next Steps:

  • Dive into the LangServe documentation for a more in-depth look at its features and capabilities.
  • Experiment with deploying a LangServe application to GKE using the containerization techniques we've covered.
  • Explore the LangServe client library to see how you can easily integrate your deployed chains with other applications.

With this post, we conclude our journey through the world of LangChain, from its core concepts to advanced deployment strategies with GKE, open models, and now, streamlined serving with LangServe. I hope this series has empowered you to build and deploy your own amazing AI-powered applications!
