Introduction
In my previous post, I covered how I implemented pgvector in a RAG (Retrieval-Augmented Generation) system for a helpdesk chatbot. While the system performed well, accuracy issues arose:
- Some retrieved helpdesk articles weren’t fully relevant
- The chatbot sometimes misinterpreted the query context
- Long answers from the retrieved content confused GPT
To fix these, I optimized the retrieval process, improved embedding quality, and refined GPT’s response generation. In this post, I’ll walk through the optimizations that improved the model’s accuracy.
Key Optimizations for Higher Accuracy
To improve accuracy, I focused on three key areas:
1️⃣ Enhanced Query Preprocessing: Cleaning and reformulating user queries
2️⃣ Better Retrieval Strategies: Improving pgvector search results
3️⃣ Refining Response Generation: Giving GPT a structured context
1. Enhancing Query Preprocessing
User queries are often ambiguous, unstructured, or too short. For example:
- User input: "VPN issue" → Too vague
- Better reformulated query: "How to fix VPN connection issues on Windows?"
Improvements in Query Preprocessing
✅ Synonym Expansion: Expanding user queries with relevant synonyms
✅ Query Normalization: Lowercasing, removing special characters
✅ Prompt Expansion: Reformulating short queries into full questions
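As a sketch of the normalization step, here is a minimal helper (the function name and the exact cleanup rules are my own choices, not a library API): it lowercases the query, strips special characters, and collapses whitespace.

```typescript
// Minimal query normalization: lowercase, strip special characters, collapse whitespace.
// The rules here are a sketch — tune them against your own query logs.
function normalizeQuery(query: string): string {
  return query
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ") // drop punctuation and symbols
    .replace(/\s+/g, " ") // collapse runs of whitespace
    .trim();
}
```

Running the preprocessed query through a step like this before embedding keeps near-identical inputs ("VPN Issue!!" vs "vpn issue") from producing different vectors.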
Example: Query Expansion Using NLP
To improve query quality, I used Natural Language Processing (NLP) techniques for synonym-based expansion using WordNet. This helps broaden the search scope and retrieve more relevant documents in a RAG-based system.
⚠ Experimental Feature: the natural library flags its WordNet integration as experimental, so this API may undergo significant changes. Use it with caution in production environments.
Optimized TypeScript Implementation
I used Node.js + TypeScript with the natural library, which provides a WordNet interface.
```typescript
import natural from "natural";

const wordnet = new natural.WordNet();

// Expand the query by appending synonyms alongside each original word
async function expandQuery(query: string): Promise<string> {
  const words = query.split(" ");
  const expandedWords: string[] = [];

  for (const word of words) {
    expandedWords.push(word); // always keep the original term
    const synonyms = await getSynonyms(word);
    // Append up to two synonyms that differ from the original word
    expandedWords.push(...synonyms.filter((s) => s !== word).slice(0, 2));
  }
  // De-duplicate before rejoining
  return [...new Set(expandedWords)].join(" ");
}

function getSynonyms(word: string): Promise<string[]> {
  return new Promise((resolve) => {
    wordnet.lookup(word, (results) => {
      if (results.length > 0) {
        resolve(results[0].synonyms);
      } else {
        resolve([]);
      }
    });
  });
}
```

How It Works
- Splits the user’s query into individual words.
- Fetches synonyms from WordNet for each word.
- Keeps each original word and appends up to two distinct synonyms.
- De-duplicates the words and reconstructs the expanded query.

Example Output
🔹 User Query: "VPN problem"
🔹 Expanded Query: "VPN problem trouble" (illustrative — the exact words depend on WordNet’s entries)
By expanding queries, the chatbot can retrieve more relevant helpdesk documents, improving retrieval accuracy in a pgvector-powered RAG system. 🚀
2. Improving Retrieval Strategies in pgvector
After preprocessing queries, the next challenge was improving retrieval precision.
Issue: Irrelevant Results
The pgvector search sometimes returned loosely related articles, reducing accuracy.
Optimization: Hybrid Search (Vector + Keyword)
To improve precision, I combined:
- Vector search (pgvector): Finds semantically similar content
- Keyword filtering (SQL LIKE/FULL TEXT SEARCH): Ensures relevance
```sql
SELECT id, title, content, embedding <=> $1 AS distance
FROM helpdesk_articles
WHERE title ILIKE '%' || $2 || '%'
ORDER BY distance
LIMIT 3;
```
🔹 How It Helps:
- Vector search ranks results by semantic similarity
- Keyword matching filters out irrelevant content
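To show how this hybrid query might be wired up from application code, here is a sketch of a helper that pairs the SQL with its parameters. The helper name is my own; executing it through the pg client (e.g. `pool.query(...)`) is assumed, not shown. pgvector accepts a vector parameter as a string literal like `[0.1,0.2,0.3]`.

```typescript
// Pairs the hybrid search SQL with its parameters: a pgvector literal and a keyword.
function buildHybridSearchQuery(embedding: number[], keyword: string) {
  return {
    text: `SELECT id, title, content, embedding <=> $1 AS distance
           FROM helpdesk_articles
           WHERE title ILIKE '%' || $2 || '%'
           ORDER BY distance
           LIMIT 3`,
    // $1: query embedding as a pgvector string literal, $2: keyword filter
    values: [`[${embedding.join(",")}]`, keyword],
  };
}

// Usage with the pg client (assumed): const { rows } = await pool.query(buildHybridSearchQuery(vec, "vpn"));
```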
Indexing for Faster Search
To speed up retrieval, I added an IVFFlat index in pgvector. Since the query above uses the cosine distance operator (<=>), the index needs the matching vector_cosine_ops operator class:
```sql
CREATE INDEX ON helpdesk_articles USING ivfflat (embedding vector_cosine_ops);
```
🔹 Result: Faster, more precise document retrieval.
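One IVFFlat knob worth knowing about: at query time, the ivfflat.probes setting controls how many index lists are scanned, trading speed for recall (the value below is illustrative — tune it for your data):

```sql
-- Scan more lists per query: higher recall, slower search (default is 1)
SET ivfflat.probes = 10;
```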
3. Refining GPT’s Response Generation
Even after improving retrieval, GPT sometimes misunderstood context.
Issue: Poorly Structured Responses
GPT occasionally misinterpreted retrieved documents, leading to long-winded or vague answers.
Optimization: Structured Context for GPT
Instead of feeding raw helpdesk documents, I formatted them before passing to GPT.
```typescript
async function generateStructuredResponse(userQuery: string) {
  const relevantDocs = await searchHelpdesk(userQuery);

  // Structure the retrieved data for GPT (summarizeText is async, so await each summary)
  const structuredContext = (
    await Promise.all(
      relevantDocs.map(
        async (doc) => `Title: ${doc.title}\nSummary: ${await summarizeText(doc.content)}`
      )
    )
  ).join("\n\n");

  const response = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [
      { role: "system", content: "You are a support agent. Answer using provided helpdesk articles." },
      { role: "user", content: `User query: ${userQuery}` },
      { role: "assistant", content: `Relevant Articles:\n${structuredContext}` },
    ],
  });
  return response.choices[0].message.content;
}
```
🔹 Why This Works:
- Shorter, more structured inputs → GPT understands better
- Summarized content reduces noise → GPT focuses on key points
Summarizing Long Helpdesk Articles
When a retrieved document was too long, I summarized it before passing it to GPT using OpenAI’s gpt-4-turbo.
```typescript
async function summarizeText(text: string) {
  const response = await openai.chat.completions.create({
    model: "gpt-4-turbo",
    messages: [
      { role: "system", content: "Summarize the following helpdesk article into key points." },
      { role: "user", content: text },
    ],
  });
  return response.choices[0].message.content;
}
```
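Summarizing every document adds an extra model call per article, so a simple guard can skip summarization for short texts. This is a sketch with my own threshold, using the rough chars-per-token heuristic rather than an exact tokenizer:

```typescript
// Rough heuristic: ~4 characters per token for English text.
// The threshold is illustrative — tune it to your context window budget.
const MAX_CONTEXT_TOKENS = 300;

function needsSummary(text: string, maxTokens: number = MAX_CONTEXT_TOKENS): boolean {
  const approxTokens = Math.ceil(text.length / 4);
  return approxTokens > maxTokens;
}

// Usage: const summary = needsSummary(doc.content) ? await summarizeText(doc.content) : doc.content;
```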
🔹 Example:
Before:
"To reset your password, go to settings. Click 'Forgot Password'. Enter your email and follow the instructions sent to your email."
After Summarization:
"Go to Settings > Forgot Password > Follow email instructions."
This compresses the content while keeping the essential information, giving GPT cleaner input for accurate responses.
Final Results: Accuracy Improvements
After applying these optimizations:
📌 Before Optimizations:
- 🔴 GPT sometimes generated irrelevant answers
- 🔴 Retrieved documents were not always the best match
- 🔴 Long documents confused GPT, leading to vague responses
📌 After Optimizations:
✅ More relevant search results using hybrid search
✅ Shorter, well-structured GPT inputs → Clearer, more concise responses
✅ 50% reduction in GPT hallucinations
Example Before vs. After Optimization
User Query:
"How do I reset my password?"
📉 Before Optimization:
"Resetting passwords requires authentication. If you have trouble logging in, try changing your credentials in the account settings."
📈 After Optimization:
"Go to Settings > Forgot Password. Follow the email instructions to reset your password."
Accuracy improved significantly! 🚀
Key Takeaways
✅ Best Practices for Optimizing RAG Accuracy
✔ Preprocess user queries: Expand, clean, and normalize input
✔ Improve retrieval with hybrid search: Combine vector + keyword search
✔ Index embeddings efficiently: Use IVFFLAT indexing for fast lookups
✔ Summarize long documents: Shorter context improves GPT’s response quality
✔ Structure inputs to GPT: Provide a clear and concise format
Final Thoughts
Integrating pgvector with RAG was a game-changer for helpdesk chatbots, but improving accuracy required deeper optimization. By enhancing retrieval, refining GPT input, and handling long documents better, I significantly improved response precision.
If you’ve worked with RAG + pgvector, I’d love to hear your thoughts! Drop your experiences in the comments. 💬