David Mezzetti for NeuML

Posted on Nov 1, 2023 • Edited on Apr 25, 2024 • Originally published at neuml.hashnode.dev

Custom API Endpoints

#ai #llm #rag #vectordatabase

The txtai API is a web-based service backed by FastAPI. Semantic search, LLM orchestration and Language Model Workflows can all run through the API.

While the API is extremely flexible and complex logic can be executed through YAML-driven workflows, some may prefer to create an endpoint in Python.

This article introduces API extensions and shows how they can be used to define custom Python endpoints that interact with txtai applications.

Install dependencies

Install txtai and all dependencies.

# Install txtai with the API extras (quoted so shells like zsh don't expand the brackets)
pip install "txtai[api]" datasets

Define the extension

First, we'll create an application that defines a persistent embeddings database and LLM. Then we'll combine those two into a RAG endpoint through the API.

The code below creates an API endpoint at /rag. This is a GET endpoint that takes a text parameter as input.

app.yml

# Embeddings index
writable: true
embeddings:
  hybrid: true
  content: true

# LLM pipeline
llm:
  path: google/flan-t5-large
  torch_dtype: torch.bfloat16

rag.py

from fastapi import APIRouter
from txtai.api import application, Extension


class RAG(Extension):
    """
    API extension
    """

    def __call__(self, app):
        app.include_router(RAGRouter().router)


class RAGRouter:
    """
    API router
    """

    router = APIRouter()

    @staticmethod
    @router.get("/rag")
    def rag(text: str):
        """
        Runs a retrieval augmented generation (RAG) pipeline.

        Args:
            text: input text

        Returns:
            response
        """

        # Run embeddings search
        results = application.get().search(text, 3)
        context = " ".join([x["text"] for x in results])

        prompt = f"""
        Answer the following question using only the context below.

        Question: {text}
        Context: {context}
        """

        return {
            "response": application.get().pipeline("llm", (prompt,))
        }
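The prompt in rag.py is an indented f-string, so the method's indentation becomes part of the text sent to the LLM. If that matters for a given model, one option is to build the prompt in a small helper. The sketch below is only an illustration: build_prompt is a hypothetical name, not part of txtai, and it assumes each search result is a dict with a "text" key, as in the endpoint above.

```python
def build_prompt(question, results):
    """
    Builds a RAG prompt from a question and a list of search results.

    Hypothetical helper, not part of txtai. Each result is assumed to be
    a dict with a "text" key.
    """

    # Join the text of each search result into a single context string
    context = " ".join(x["text"] for x in results)

    # Assemble the prompt line by line so no source indentation leaks in
    return "\n".join([
        "Answer the following question using only the context below.",
        "",
        f"Question: {question}",
        f"Context: {context}",
    ])
```

Inside the endpoint, this would be called as build_prompt(text, results) in place of the inline f-string.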

Start the API instance

Let's start the API with the RAG extension.

CONFIG=app.yml EXTENSIONS=rag.RAG nohup uvicorn "txtai.api:app" &> api.log &
sleep 60
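The fixed sleep gives the server time to load the models. An alternative is to poll until the server answers. The sketch below is one way to do that from Python. The names wait_until_ready and api_is_up are hypothetical, and it assumes the API listens on http://localhost:8000 and only accepts connections once the application has finished loading.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(probe, timeout=60.0, interval=1.0):
    """
    Calls probe() until it returns True or timeout seconds have elapsed.
    Returns True if the probe succeeded, False otherwise.
    """

    deadline = time.monotonic() + timeout
    while True:
        try:
            if probe():
                return True
        except OSError:
            # Connection refused or timed out, server is likely still starting
            pass

        if time.monotonic() >= deadline:
            return False

        time.sleep(interval)


def api_is_up(url="http://localhost:8000"):
    """
    Probe for a local API instance (the URL is an assumption).
    Any HTTP response, including an error status, counts as up.
    """

    try:
        with urllib.request.urlopen(url, timeout=5):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status, so it is accepting requests
        return True


# Example usage (not run here): wait_until_ready(api_is_up, timeout=120)
```

Passing the probe as a function keeps the retry loop independent of how readiness is checked.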

Create the embeddings database

Next, we'll create the embeddings database using the ag_news dataset. This is a set of news stories from the mid-2000s.

from datasets import load_dataset
import requests

ds = load_dataset("ag_news", split="train")

# API endpoint
url = "http://localhost:8000"
headers = {"Content-Type": "application/json"}

# Add data
batch = []
for text in ds["text"]:
    batch.append({"text": text})
    if len(batch) == 4096:
        requests.post(f"{url}/add", headers=headers, json=batch, timeout=120)
        batch = []

if batch:
    requests.post(f"{url}/add", headers=headers, json=batch, timeout=120)

# Build index
index = requests.get(f"{url}/index")
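The loading loop above follows a common pattern: accumulate rows, send a full batch, then send whatever is left over. As a sketch, the same pattern can be pulled into a small generator. batches is a hypothetical helper name, not part of txtai or Requests.

```python
def batches(items, size):
    """
    Yields lists of at most size items, including a final partial batch.
    """

    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []

    # Emit the remaining items, if any
    if batch:
        yield batch


# Example usage with the variables defined above (not run here):
# for batch in batches(({"text": text} for text in ds["text"]), 4096):
#     response = requests.post(f"{url}/add", headers=headers, json=batch, timeout=120)
#     response.raise_for_status()
```

Calling raise_for_status() on each response, as in the commented usage, would surface a failed upload immediately instead of silently building a partial index.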

Run queries

Now that we have a knowledge source indexed, let's run a set of queries. The code below defines a function that calls the /rag endpoint and returns the response. Keep in mind this dataset is from 2004.

While the Python Requests library is used in this article, this is a simple web endpoint that can be called from any programming language.
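Since /rag is a plain GET endpoint, a client in any language only needs to build a URL with the text query parameter properly encoded. The sketch below shows what that URL looks like using only the Python standard library. rag_url is a hypothetical helper and the base URL is the local instance assumed in this article.

```python
from urllib.parse import urlencode


def rag_url(base, text):
    """
    Builds the URL for a GET request to the /rag endpoint.
    """

    # urlencode escapes spaces, question marks and other reserved characters
    return f"{base}/rag?{urlencode({'text': text})}"


print(rag_url("http://localhost:8000", "Who won the World Series?"))
# http://localhost:8000/rag?text=Who+won+the+World+Series%3F
```

The resulting URL can be requested with curl, a browser or any HTTP client.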

def rag(text):
    # Pass the query via params so Requests URL-encodes it
    return requests.get(f"{url}/rag", params={"text": text}).json()["response"]

rag("Who is the current President?")

'George W. Bush'

rag("Who lost the presidential election?")

'John Kerry'

rag("Who won the World Series?")

'Boston'

rag("Who did the Red Sox beat to win the world series?")

'Cardinals'

rag("What major hurricane hit the USA?")

'Charley'

rag("What mobile phone manufacturer has the largest current marketshare?")

'Nokia'

Wrapping up

This article showed how to extend a txtai application with custom Python endpoints. Applications have a robust workflow framework, but complex logic is sometimes easier to write in Python, and API extensions make that possible.
