Introduction
In my previous post, I covered how I implemented pgvector in a RAG (Retrieval-Augmented Generation) system for a helpdesk chatbot. While the system performed well, accuracy issues arose:
- Some retrieved helpdesk articles weren’t fully relevant
- The chatbot sometimes misinterpreted the query context
- Long answers from the retrieved content confused GPT
To fix these, I optimized the retrieval process, improved embedding quality, and refined GPT’s response generation. In this post, I’ll walk through the optimizations that improved the model’s accuracy.
Key Optimizations for Higher Accuracy
To improve accuracy, I focused on three key areas:
1️⃣ Enhanced Query Preprocessing: Cleaning and reformulating user queries
2️⃣ Better Retrieval Strategies: Improving pgvector search results
3️⃣ Refining Response Generation: Giving GPT a structured context
1. Enhancing Query Preprocessing
User queries are often ambiguous, unstructured, or too short. For example:
- User input: "VPN issue" → Too vague
- Better reformulated query: "How to fix VPN connection issues on Windows?"
Improvements in Query Preprocessing
✅ Synonym Expansion: Expanding user queries with relevant synonyms
✅ Query Normalization: Lowercasing, removing special characters
✅ Prompt Expansion: Reformulating short queries into full questions
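As a sketch of the normalization step, here is a minimal helper (the function name and the exact cleanup rules are my own choices, not a library API): it lowercases the query, strips special characters, and collapses whitespace.

```typescript
// Minimal query normalization: lowercase, strip special characters, collapse whitespace.
// The rules here are a sketch — tune them against your own query logs.
function normalizeQuery(query: string): string {
  return query
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ") // drop punctuation and symbols
    .replace(/\s+/g, " ") // collapse runs of whitespace
    .trim();
}
```

Running the preprocessed query through a step like this before embedding keeps near-identical inputs ("VPN Issue!!" vs "vpn issue") from producing different vectors.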
Example: Query Expansion Using NLP
To improve query quality, I used Natural Language Processing (NLP) techniques for synonym-based expansion using WordNet. This helps broaden the search scope and retrieve more relevant documents in a RAG-based system.
⚠ Experimental Feature: the natural library flags its WordNet integration as experimental, so this API may undergo significant changes. Use it with caution in production environments.
Optimized TypeScript Implementation
I used Node.js + TypeScript with the natural library, which provides a WordNet interface.
```typescript
import natural from "natural";

const wordnet = new natural.WordNet();

// Expand the query by appending synonyms alongside each original word
async function expandQuery(query: string): Promise<string> {
  const words = query.split(" ");
  const expandedWords: string[] = [];

  for (const word of words) {
    expandedWords.push(word); // always keep the original term
    const synonyms = await getSynonyms(word);
    // Append up to two synonyms that differ from the original word
    expandedWords.push(...synonyms.filter((s) => s !== word).slice(0, 2));
  }
  // De-duplicate before rejoining
  return [...new Set(expandedWords)].join(" ");
}

function getSynonyms(word: string): Promise<string[]> {
  return new Promise((resolve) => {
    wordnet.lookup(word, (results) => {
      if (results.length > 0) {
        resolve(results[0].synonyms);
      } else {
        resolve([]);
      }
    });
  });
}
```

How It Works
- Splits the user’s query into individual words.
- Fetches synonyms from WordNet for each word.
- Keeps each original word and appends up to two distinct synonyms.
- De-duplicates the words and reconstructs the expanded query.

Example Output
🔹 User Query: "VPN problem"
🔹 Expanded Query: "VPN problem trouble" (illustrative — the exact words depend on WordNet’s entries)
By expanding queries, the chatbot can retrieve more relevant helpdesk documents, improving retrieval accuracy in a pgvector-powered RAG system. 🚀
2. Improving Retrieval Strategies in pgvector
After preprocessing queries, the next challenge was improving retrieval precision.
Issue: Irrelevant Results
The pgvector search sometimes returned loosely related articles, reducing accuracy.
Optimization: Hybrid Search (Vector + Keyword)
To improve precision, I combined:
- Vector search (pgvector): Finds semantically similar content
- Keyword filtering (SQL LIKE/FULL TEXT SEARCH): Ensures relevance
```sql
SELECT id, title, content, embedding <=> $1 AS distance
FROM helpdesk_articles
WHERE title ILIKE '%' || $2 || '%'
ORDER BY distance
LIMIT 3;
```
🔹 How It Helps:
- Vector search ranks results by semantic similarity
- Keyword matching filters out irrelevant content
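To show how this hybrid query might be wired up from application code, here is a sketch of a helper that pairs the SQL with its parameters. The helper name is my own; executing it through the pg client (e.g. `pool.query(...)`) is assumed, not shown. pgvector accepts a vector parameter as a string literal like `[0.1,0.2,0.3]`.

```typescript
// Pairs the hybrid search SQL with its parameters: a pgvector literal and a keyword.
function buildHybridSearchQuery(embedding: number[], keyword: string) {
  return {
    text: `SELECT id, title, content, embedding <=> $1 AS distance
           FROM helpdesk_articles
           WHERE title ILIKE '%' || $2 || '%'
           ORDER BY distance
           LIMIT 3`,
    // $1: query embedding as a pgvector string literal, $2: keyword filter
    values: [`[${embedding.join(",")}]`, keyword],
  };
}

// Usage with the pg client (assumed): const { rows } = await pool.query(buildHybridSearchQuery(vec, "vpn"));
```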
Indexing for Faster Search
To speed up retrieval, I added an IVFFlat index in pgvector. Since the query above uses the cosine distance operator (<=>), the index needs the matching vector_cosine_ops operator class:
```sql
CREATE INDEX ON helpdesk_articles USING ivfflat (embedding vector_cosine_ops);
```
🔹 Result: Faster, more precise document retrieval.
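One IVFFlat knob worth knowing about: at query time, the ivfflat.probes setting controls how many index lists are scanned, trading speed for recall (the value below is illustrative — tune it for your data):

```sql
-- Scan more lists per query: higher recall, slower search (default is 1)
SET ivfflat.probes = 10;
```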
3. Refining GPT’s Response Generation
Even after improving retrieval, GPT sometimes misunderstood context.
Issue: Poorly Structured Responses
GPT occasionally misinterpreted retrieved documents, leading to long-winded or vague answers.
Optimization: Structured Context for GPT
Instead of feeding raw helpdesk documents, I formatted them before passing to GPT.
```typescript
async function generateStructuredResponse(userQuery: string) {
  const relevantDocs = await searchHelpdesk(userQuery);

  // Structure the retrieved data for GPT (summarizeText is async, so await each summary)
  const structuredContext = (
    await Promise.all(
      relevantDocs.map(
        async (doc) => `Title: ${doc.title}\nSummary: ${await summarizeText(doc.content)}`
      )
    )
  ).join("\n\n");

  const response = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [
      { role: "system", content: "You are a support agent. Answer using provided helpdesk articles." },
      { role: "user", content: `User query: ${userQuery}` },
      { role: "assistant", content: `Relevant Articles:\n${structuredContext}` },
    ],
  });
  return response.choices[0].message.content;
}
```
🔹 Why This Works:
- Shorter, more structured inputs → GPT understands better
- Summarized content reduces noise → GPT focuses on key points
Summarizing Long Helpdesk Articles
When a retrieved document was too long, I summarized it before passing it to GPT using OpenAI’s gpt-4-turbo.
```typescript
async function summarizeText(text: string) {
  const response = await openai.chat.completions.create({
    model: "gpt-4-turbo",
    messages: [
      { role: "system", content: "Summarize the following helpdesk article into key points." },
      { role: "user", content: text },
    ],
  });
  return response.choices[0].message.content;
}
```
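Summarizing every document adds an extra model call per article, so a simple guard can skip summarization for short texts. This is a sketch with my own threshold, using the rough chars-per-token heuristic rather than an exact tokenizer:

```typescript
// Rough heuristic: ~4 characters per token for English text.
// The threshold is illustrative — tune it to your context window budget.
const MAX_CONTEXT_TOKENS = 300;

function needsSummary(text: string, maxTokens: number = MAX_CONTEXT_TOKENS): boolean {
  const approxTokens = Math.ceil(text.length / 4);
  return approxTokens > maxTokens;
}

// Usage: const summary = needsSummary(doc.content) ? await summarizeText(doc.content) : doc.content;
```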
🔹 Example:
Before:
"To reset your password, go to settings. Click 'Forgot Password'. Enter your email and follow the instructions sent to your email."
After Summarization:
"Go to Settings > Forgot Password > Follow email instructions."
This compresses the content while keeping the essential information, giving GPT cleaner input for accurate responses.
Final Results: Accuracy Improvements
After applying these optimizations:
📌 Before Optimizations:
- 🔴 GPT sometimes generated irrelevant answers
- 🔴 Retrieved documents were not always the best match
- 🔴 Long documents confused GPT, leading to vague responses
📌 After Optimizations:
✅ More relevant search results using hybrid search
✅ Shorter, well-structured GPT inputs → Clearer, more concise responses
✅ 50% reduction in GPT hallucinations
Example Before vs. After Optimization
User Query:
"How do I reset my password?"
📉 Before Optimization:
"Resetting passwords requires authentication. If you have trouble logging in, try changing your credentials in the account settings."
📈 After Optimization:
"Go to Settings > Forgot Password. Follow the email instructions to reset your password."
Accuracy improved significantly! 🚀
Key Takeaways
✅ Best Practices for Optimizing RAG Accuracy
✔ Preprocess user queries: Expand, clean, and normalize input
✔ Improve retrieval with hybrid search: Combine vector + keyword search
✔ Index embeddings efficiently: Use IVFFLAT indexing for fast lookups
✔ Summarize long documents: Shorter context improves GPT’s response quality
✔ Structure inputs to GPT: Provide a clear and concise format
Final Thoughts
Integrating pgvector with RAG was a game-changer for helpdesk chatbots, but improving accuracy required deeper optimization. By enhancing retrieval, refining GPT input, and handling long documents better, I significantly improved response precision.
If you’ve worked with RAG + pgvector, I’d love to hear your thoughts! Drop your experiences in the comments. 💬