LangChain and FastAPI working in tandem provide a strong setup for the asynchronous streaming endpoints that LLM-integrated applications need. Modern chat applications live or die by how effectively they handle live data streams and how quickly they can respond.
Introduction to LangChain
LangChain is a library that simplifies the incorporation of language models into applications. It provides an abstracted layer over various components such as large language models (LLMs), data retrievers, and vector storage solutions. This abstraction allows developers to integrate and switch between different backend providers or technologies seamlessly.
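For instance, LangChain's chat model classes share a common interface, so swapping one provider for another is typically a one-line change. A minimal sketch, assuming the langchain-openai and langchain-anthropic packages are installed and the matching API keys are set (model names are illustrative):

from langchain_openai import ChatOpenAI
# from langchain_anthropic import ChatAnthropic

chat = ChatOpenAI(model="gpt-4o-mini")
# chat = ChatAnthropic(model="claude-3-5-sonnet-20240620")  # same interface

# Both classes expose the same invoke/stream/astream methods.
response = chat.invoke("Who are you?")
print(response.content)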
Introduction to FastAPI
FastAPI is a modern, high-performance web framework for building APIs with Python 3.7+ based on standard Python type hints. It is designed for creating RESTful APIs quickly and efficiently, with automatic interactive API documentation provided by Swagger UI and ReDoc.
Combining LangChain with FastAPI
By combining LangChain with FastAPI, developers can create robust, asynchronous streaming APIs that handle real-time data efficiently. This integration is particularly useful for applications that require live updates, such as chat applications or real-time analytics dashboards.
Setting Up the FastAPI Project
First, install the necessary packages:
pip install fastapi langchain langchain-openai pydantic uvicorn
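Since the endpoint below uses ChatOpenAI, also make sure your OpenAI API key is available in the environment (the key value here is a placeholder):

export OPENAI_API_KEY=your-key-here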
Next, define the FastAPI router and Pydantic models to structure and validate incoming messages.
from fastapi import FastAPI, APIRouter
from pydantic import BaseModel
from typing import List

app = FastAPI()
router = APIRouter()

class Message(BaseModel):
    role: str
    content: str

class ChatPayload(BaseModel):
    messages: List[Message]

    class Config:
        schema_extra = {
            "example": {
                "messages": [{"role": "user", "content": "Who are you?"}]
            }
        }
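A note on versions: in Pydantic v2, the Config key schema_extra was renamed to json_schema_extra; the snippet above uses the Pydantic v1 spelling. In either version, the example payload will appear pre-filled in the interactive API documentation.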
Creating the Streaming API Endpoint
Create an endpoint that receives chat messages and streams the model's response back to the client using LangChain, emitting the output as server-sent events (SSE).
from fastapi import Request
from fastapi.responses import StreamingResponse
from langchain_openai import ChatOpenAI
import json

@router.post("/api/completion")
async def stream(request: Request, payload: ChatPayload):
    chat = ChatOpenAI()
    # Convert the validated Pydantic models into (role, content) tuples,
    # one of the message formats LangChain chat models accept as input.
    messages = [(m.role, m.content) for m in payload.messages]
    return StreamingResponse(
        send_completion_events(messages, chat=chat),
        media_type="text/event-stream",
    )

async def send_completion_events(messages, chat):
    async for patch in chat.astream_log(messages):
        for op in patch.ops:
            # astream_log emits JSON Patch operations; new output chunks
            # arrive as "add" operations appended to /streamed_output.
            if op["op"] == "add" and op["path"] == "/streamed_output/-":
                content = op["value"] if isinstance(op["value"], str) else op["value"].content
                json_dict = {"type": "llm_chunk", "content": content}
                json_str = json.dumps(json_dict)
                # Each SSE event is a "data:" line followed by a blank line.
                yield f"data: {json_str}\n\n"

app.include_router(router)
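Note the event framing in the final yield: each server-sent event is a data: line terminated by a blank line (two newline characters). On the wire, the client receives something like this (chunk contents are illustrative):

data: {"type": "llm_chunk", "content": "I"}

data: {"type": "llm_chunk", "content": " am"}

data: {"type": "llm_chunk", "content": " an AI assistant."}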
Running the FastAPI Application
Run the FastAPI application using Uvicorn, an ASGI server implementation for Python (the command below assumes the code above lives in a file named main.py):
uvicorn main:app --reload
Navigate to http://127.0.0.1:8000/docs to see the interactive API documentation generated by FastAPI. This documentation provides an easy way to test the API endpoints.
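Swagger UI will not render the stream chunk by chunk, though. To watch tokens arrive in real time, you can consume the endpoint programmatically. Below is a minimal client sketch, assuming the third-party httpx package is installed (the URL and payload match the example above):

import asyncio
import httpx

async def main():
    payload = {"messages": [{"role": "user", "content": "Who are you?"}]}
    async with httpx.AsyncClient(timeout=None) as client:
        # Stream the response instead of buffering it all in memory.
        async with client.stream(
            "POST", "http://127.0.0.1:8000/api/completion", json=payload
        ) as response:
            async for line in response.aiter_lines():
                # SSE events arrive as "data: ..." lines separated by blank lines.
                if line.startswith("data: "):
                    print(line[len("data: "):])

asyncio.run(main())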
Why JSON Patch?
LangChain's astream_log method streams the state of a run as a sequence of JSON Patch (RFC 6902) operations, so a basic understanding of JSON Patch is essential for implementing this integration effectively. JSON Patch provides an efficient way to update parts of a JSON document incrementally without resending the entire document, which is particularly useful in real-time applications where data changes frequently.
Brief Overview of JSON Patch
JSON Patch supports several operation types, including:
- Add: Inserts a new value into the JSON document at the specified path.
- Remove: Removes the value at the specified path.
- Replace: Replaces the value at the specified path with a new value.
- Move: Moves a value from one path to another in the document.
- Copy: Copies a value from one path to another.
- Test: Tests that the value at the specified path matches a specified value.
Consider the original document:
{
  "baz": "qux",
  "foo": "bar"
}
Applying the patch:
[
  { "op": "replace", "path": "/baz", "value": "boo" },
  { "op": "add", "path": "/hello", "value": ["world"] },
  { "op": "remove", "path": "/foo" }
]
Results in:
{
  "baz": "boo",
  "hello": ["world"]
}
JSON Patch allows for efficient, incremental updates, making it ideal for applications that require frequent or real-time updates to their data.
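To see these semantics in Python, the example above can be replayed with the third-party jsonpatch package (an assumption of this sketch; it is not required by the server code earlier in the article):

import jsonpatch

document = {"baz": "qux", "foo": "bar"}
patch = [
    {"op": "replace", "path": "/baz", "value": "boo"},
    {"op": "add", "path": "/hello", "value": ["world"]},
    {"op": "remove", "path": "/foo"},
]

# apply_patch applies the operations in order and returns the new document.
result = jsonpatch.apply_patch(document, patch)
print(result)  # {'baz': 'boo', 'hello': ['world']}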
Conclusion
By integrating LangChain with FastAPI, developers can build efficient asynchronous streaming APIs capable of handling real-time data. This setup is ideal for applications like chatbots, where timely responses and data processing are crucial. FastAPI's ease of use and LangChain's abstraction capabilities, combined with the efficiency of JSON Patch, make this combination a powerful tool for modern web development.
Want to learn more about building Responsive LLMs? Check out my course on newline: Responsive LLM Applications with Server-Sent Events
I cover:
- How to design systems for AI applications
- How to stream the answer of a Large Language Model
- Differences between Server-Sent Events and WebSockets
- Importance of real-time for GenAI UI
- How asynchronous programming in Python works
- How to integrate LangChain with FastAPI
- What problems Retrieval Augmented Generation can solve
- How to create an AI agent ... and much more.
Worth checking out if you want to build your own LLM applications: the course provides extensive code examples and helps you go from concept to deployment.