This tutorial was created using OpenAI's Deep Research capability.
In this step-by-step tutorial, we’ll build a .NET (C#) console application that lets users upload text documents (like PDFs or .txt files), generates embeddings for their content using OpenAI’s text-embedding-ada-002 model, stores those embeddings in a local index (simulating a database), and implements a simple chat loop to query the documents. The application will retrieve relevant document sections in response to user questions – a basic example of Retrieval-Augmented Generation (RAG). We’ll use the official OpenAI .NET SDK and standard libraries, with detailed explanations of each step.
Prerequisites
- .NET 6 (or higher) installed on your machine (the tutorial uses .NET 6/7 syntax).
- A C# development environment (Visual Studio, VS Code, or the .NET CLI).
- An OpenAI API Key with access to the embeddings endpoint. You can get one by logging into the OpenAI platform and creating a new secret key (NuGet Gallery | OpenAI 2.1.0). (Keep your API key private!)
- Some sample text documents to test (e.g., a .txt file or a PDF with text content).
Setup Summary
- Create a new .NET console project.
- Install the OpenAI .NET SDK via NuGet.
- Prepare code to load and chunk documents (for PDFs, text extraction is needed).
- Use OpenAI’s text-embedding-ada-002 model to generate embeddings for each document chunk.
- Store embeddings in a simple local database or in-memory structure.
- Implement a console chat loop: for each query, generate the query embedding, perform a vector similarity search among stored embeddings, and return the best matching document references.
- Run the app and ask questions about your documents!
Along the way, we’ll explain core concepts like RAG, embeddings, and vector similarity search, and provide suggestions for enhancements (using vector databases, caching, etc.).
What is Retrieval-Augmented Generation (RAG)?
Retrieval-Augmented Generation (RAG) is a technique that combines information retrieval with generative AI models. In RAG, a large language model (LLM) is not limited to its built-in training knowledge; instead, it retrieves relevant data from an external knowledge base (documents, databases, etc.) to augment its responses (Retrieval-augmented generation - Wikipedia) (What is RAG? - Retrieval-Augmented Generation AI Explained - AWS). In practice, this means when a user asks a question, the system will first fetch relevant text from your documents and feed that into the LLM so that the answer can include up-to-date or domain-specific information. This helps the model give more accurate and context-specific answers (and reduces hallucinations, where the model might otherwise invent unsupported answers).
(RAG 101: Demystifying Retrieval-Augmented Generation Pipelines | NVIDIA Technical Blog) High-level RAG workflow: Documents (e.g., PDFs, text files) are ingested and embedded (converted to vectors) and stored in a vector index. At query time, the user’s question is also embedded and used to retrieve semantically relevant document chunks from the index. These retrieved pieces of text are then given to the LLM to generate a final answer.
In our console app, we will implement a simplified RAG pipeline:
- Document Ingestion: reading user-provided files, preprocessing them (e.g., converting PDF to text), and splitting into chunks if they’re large.
- Embedding Generation: calling OpenAI’s embedding model to get a numerical vector for each chunk of text. Each vector represents the semantic meaning of that text.
- Storage: saving those vectors (and references to the original text) in a local list or database for quick search.
- Query: in a loop, accept user questions, embed the question, search for the most similar document vectors (this is the vector search step), and return the matching document content or references as the answer.
This approach augments the assistant (our console app) with knowledge from your documents, without retraining any model. Next, let’s briefly cover what embeddings are and how vector search works, since these are core to our implementation.
What are Embeddings and Vector Search?
Embeddings are numerical representations of text. In simpler terms, an embedding is a list of floating-point numbers (a vector) that encodes the semantic meaning of a piece of text (OpenAI Embeddings :: Spring AI Reference). Texts with similar meaning will have embeddings that are closer together in this vector space. For example, the sentences “How to start a car” and “Methods for car ignition” would yield vectors that are more similar to each other than to the vector for “Recipe for apple pie.” OpenAI’s text-embedding-ada-002 model produces a 1536-dimensional vector for each input text, where each dimension is a float. These high-dimensional vectors capture nuanced semantic relationships (NuGet Gallery | OpenAI 2.1.0).
Vector Search (Semantic Search) is the process of finding which vectors in a collection are most similar to a given query vector. Rather than keyword matching, this finds text that means the same thing. A common way to measure similarity between two vectors is cosine similarity, which measures the angle between the vectors (Embeddings giving incorrect results - API - OpenAI Developer Community). Two identical vectors have a cosine similarity of 1 (meaning they point in the same direction), whereas unrelated vectors have a similarity closer to 0. In practice, to answer a question we will:
- Compute the embedding of the user’s query (a query vector).
- Calculate the similarity between this query vector and each stored document vector (e.g., using cosine similarity).
- Pick the top K most similar document vectors – those likely containing relevant info.
- Return those document snippets or their references as the result.
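To make this concrete, here is a tiny, self-contained sketch of the comparison step using made-up 3-element vectors in place of real 1536-dimensional embeddings (the values are illustrative only; the full CosineSimilarity helper we actually use appears in Step 4):
// Toy 3-dimensional vectors standing in for real 1536-dimensional embeddings
float[] carStart  = { 0.9f, 0.1f, 0.0f };  // "How to start a car"
float[] carIgnite = { 0.8f, 0.2f, 0.1f };  // "Methods for car ignition"
float[] applePie  = { 0.1f, 0.0f, 0.9f };  // "Recipe for apple pie"
// cos(a, b) = dot(a, b) / (|a| * |b|)
static float Cos(float[] a, float[] b)
{
    double dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.Length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
    return (float)(dot / (Math.Sqrt(na) * Math.Sqrt(nb)));
}
Console.WriteLine(Cos(carStart, carIgnite)); // ~0.98: similar meaning
Console.WriteLine(Cos(carStart, applePie));  // ~0.11: unrelated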
By using embeddings and vector search, our application can find relevant information even if the question uses different wording than the documents. Now, let’s get into the implementation steps.
Step 1: Create a New .NET Console Project
First, set up a new console application. You can use the .NET CLI or Visual Studio:
- Using .NET CLI: Open a terminal/command prompt in your desired folder and run:
dotnet new console -n DocumentSearchRAG
This creates a new console app in a folder named DocumentSearchRAG. You can replace the name as you like.
- Using Visual Studio: Create a new Console App project and give it a name (e.g., DocumentSearchRAG).
Once created, open the project. We’ll be writing our code in Program.cs (using top-level statements or a Main method) for simplicity.
Step 2: Install the OpenAI .NET SDK
To interact with OpenAI’s API, we’ll use the official OpenAI .NET SDK (NuGet package OpenAI). Install this package:
- Via CLI: run the following from the project directory:
dotnet add package OpenAI
This adds the OpenAI NuGet package to your project.
- Via Visual Studio: open the NuGet Package Manager, search for “OpenAI” by OpenAI, Inc., and install it.
After installation, you should see the OpenAI package in your project dependencies. Now you have access to OpenAI’s clients in C#.
OpenAI API Key Configuration: Ensure your OpenAI API key is available to the app. The SDK will need it to authenticate with the API. It’s best not to hard-code the key. Instead, store it in an environment variable or a secure store. For example, you can set an environment variable OPENAI_API_KEY with your key (on Windows, use System Properties or setx; on Linux/macOS, use export OPENAI_API_KEY=<your key>). The official docs recommend using an environment variable to avoid exposing the key in code (NuGet Gallery | OpenAI 2.1.0).
Step 3: Loading and Processing Documents
Next, we’ll allow the user to provide documents for the knowledge base. Our console app will prompt the user for file paths to load. We will handle text files directly and (briefly) discuss PDFs.
Document Loading Strategy:
- Text files (.txt): We can read the full text easily using standard .NET I/O.
- PDF files: .NET doesn’t have a built-in PDF reader in the standard libraries, so extracting text from PDFs requires an external library or tool (e.g., PdfPig or iText7). For simplicity, in this tutorial we will assume any PDF is already in text form or skip PDF content extraction. (In a real app, you could use a library like PdfPig to get the text and then split by pages or paragraphs (Chat with your documents using OpenAI embeddings in .NET/C# - crispycode.net); a sketch of this appears right after this list.)
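If you do want basic PDF support, below is a minimal sketch using the open-source PdfPig library (added with dotnet add package PdfPig). It simply concatenates the text of every page, separated by blank lines so the paragraph splitter in this step can find boundaries. Treat it as a starting point rather than production-grade extraction:
using UglyToad.PdfPig;
using System.Text;
// Extract the raw text of every page in a PDF into a single string.
string ExtractPdfText(string pdfPath)
{
    var sb = new StringBuilder();
    using (PdfDocument document = PdfDocument.Open(pdfPath))
    {
        foreach (var page in document.GetPages())
        {
            sb.AppendLine(page.Text);  // the page's extracted text
            sb.AppendLine();           // blank line = paragraph boundary for our chunker
        }
    }
    return sb.ToString();
}
You could then call content = ExtractPdfText(path); in the .pdf branch of the file-loading loop shown below.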
We will also split documents into smaller chunks if they are large. Chunking by paragraph or a fixed size helps in two ways: it keeps each chunk within the token limit of the embedding model (around 8191 tokens for text-embedding-ada-002), and it allows more fine-grained retrieval (so the user gets a specific part of the document as the answer, not the whole document). Here, we’ll do a simple split by paragraphs for demonstration.
Let’s write the code to load documents and prepare their text segments:
using OpenAI.Embeddings;
using System.Text.RegularExpressions;
string apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY")
?? throw new Exception("Please set the OPENAI_API_KEY environment variable.");
// Initialize the OpenAI Embedding client for text-embedding-ada-002
EmbeddingClient embedClient = new EmbeddingClient("text-embedding-ada-002", apiKey);
// Data structure to store our document chunks and embeddings.
// Note: with top-level statements, type declarations must come after all statements,
// so the DocumentChunk record itself is declared at the very bottom of Program.cs (see Step 4):
// record DocumentChunk(string Source, string Text, float[] EmbeddingVector);
List<DocumentChunk> knowledgeBase = new List<DocumentChunk>();
// Function to split text into chunks by empty line (paragraphs) or a max length
IEnumerable<string> SplitIntoChunks(string text, int maxChars = 1000)
{
// Split by two newlines as paragraph separators
var paragraphs = Regex.Split(text, @"\r?\n\s*\r?\n");
foreach (var para in paragraphs)
{
if (string.IsNullOrWhiteSpace(para)) continue;
string trimmed = para.Trim();
// If paragraph is very long, further split by sentence or maxChars
if (trimmed.Length > maxChars)
{
// Split by period for simplicity (could be smarter)
string[] sentences = trimmed.Split('.', StringSplitOptions.RemoveEmptyEntries);
string chunk = "";
foreach (var sentence in sentences)
{
if (chunk.Length + sentence.Length < maxChars)
{
chunk += sentence + ". ";
}
else
{
yield return chunk;
chunk = sentence + ". ";
}
}
if (!string.IsNullOrWhiteSpace(chunk))
yield return chunk;
}
else
{
yield return trimmed;
}
}
}
// Prompt user to input document file paths
Console.WriteLine("Enter the path of a text document to upload (or press Enter to finish):");
string? path;
while (!string.IsNullOrEmpty(path = Console.ReadLine()))
{
if (!System.IO.File.Exists(path))
{
Console.WriteLine($"File not found: {path}");
}
else
{
string ext = System.IO.Path.GetExtension(path).ToLower();
string content = "";
if (ext == ".txt")
{
content = System.IO.File.ReadAllText(path);
}
else if (ext == ".pdf")
{
Console.WriteLine("PDF support not implemented in this example. Please provide a text file.");
// In a real scenario, use a PDF library to extract text here.
content = "";
}
else
{
Console.WriteLine("Unsupported file type. Please provide a .txt or .pdf file.");
content = "";
}
if (!string.IsNullOrWhiteSpace(content))
{
// Split file content into chunks
int chunkIndex = 0;
foreach (string chunk in SplitIntoChunks(content))
{
try
{
// Generate embedding for the chunk
OpenAIEmbedding embedding = embedClient.GenerateEmbedding(chunk);
float[] vector = embedding.ToFloats().ToArray(); // convert to float array
// Create a record for this chunk with source file name and chunk text
knowledgeBase.Add(new DocumentChunk(System.IO.Path.GetFileName(path) + $"#chunk{chunkIndex}", chunk, vector));
chunkIndex++;
}
catch (Exception ex)
{
Console.WriteLine($"Error embedding chunk from {path}: {ex.Message}");
}
}
Console.WriteLine($"Loaded {path} and created {knowledgeBase.Count} chunks.");
}
}
Console.WriteLine("Enter another file path (or press Enter to finish):");
}
Code Explanation:
- We retrieve the API key from the environment and initialize an EmbeddingClient for the text-embedding-ada-002 model. This client will be used to generate embeddings. If the API key isn’t set, we throw an exception to remind the user.
- We define a record DocumentChunk to hold each chunk’s source (e.g., filename or document id), the text content, and the embedding vector. (Because the program uses top-level statements, the record declaration itself goes at the bottom of Program.cs, after all statements – see Step 4.)
- The knowledgeBase list acts as our in-memory database of vectors.
- The SplitIntoChunks function uses a regex to split text by blank lines (assuming those separate paragraphs). It also ensures no chunk exceeds a certain length (maxChars). If a paragraph is too long, we further split it by sentences (a simple strategy; one could also split by tokens or fixed character counts). This ensures we don’t feed extremely large text into a single embedding call and that our chunks stay a manageable size.
- We then prompt the user in a loop to enter file paths. For each path:
  - Check existence; if not found, notify the user and continue.
  - If it’s a .txt file, read all text. If it’s a .pdf, we currently skip actual parsing and alert the user (in a real app, you’d plug in PDF text extraction here). We ignore other file types.
  - If content is obtained, we call SplitIntoChunks to break it into smaller pieces. For each chunk, we call embedClient.GenerateEmbedding(chunk) to get an OpenAIEmbedding object and convert it to a float array (via ToFloats().ToArray()).
  - We add a new DocumentChunk to our knowledgeBase, identifying it by the file name plus a chunk index (e.g., “report.pdf#chunk0”, “report.pdf#chunk1”, etc.), along with the text and the embedding vector.
  - Any exceptions from the API call are caught and logged (for example, if the chunk is too large or a network issue occurs).
- We continue prompting for more files until the user hits Enter on an empty line, then exit the loop. After this, our knowledgeBase contains all the document embeddings, ready for search.
At this point, we have processed the documents. Each document’s text has been embedded into a vector and stored in memory. In a real application, you might save these vectors to a database or file so that you don’t have to regenerate them each time the app runs. For example, you could store them in an SQLite database or a JSON file. But for our tutorial, an in-memory list suffices.
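As a minimal sketch of that persistence idea (assuming the DocumentChunk record and knowledgeBase list defined above), you could serialize the list to JSON with System.Text.Json and reload it on the next run, skipping the embedding calls for documents you have already processed:
using System.Text.Json;
// Save the in-memory knowledge base to disk after embedding.
void SaveKnowledgeBase(List<DocumentChunk> chunks, string filePath)
{
    string json = JsonSerializer.Serialize(chunks);
    File.WriteAllText(filePath, json);
}
// Load a previously saved knowledge base (empty list if no file exists yet).
List<DocumentChunk> LoadKnowledgeBase(string filePath)
{
    if (!File.Exists(filePath)) return new List<DocumentChunk>();
    string json = File.ReadAllText(filePath);
    return JsonSerializer.Deserialize<List<DocumentChunk>>(json) ?? new List<DocumentChunk>();
}
For example, call SaveKnowledgeBase(knowledgeBase, "knowledgebase.json") after the upload loop and LoadKnowledgeBase("knowledgebase.json") at startup.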
Step 4: Implementing Vector Similarity Search
With document embeddings in hand, we need a way to find which chunks are most relevant to a user’s query. We will implement a simple cosine similarity function to compare the query vector with each stored vector. Cosine similarity ranges from -1 to 1, where 1 means the vectors are identical in direction (very similar in meaning) and 0 means they are orthogonal (unrelated) (Embeddings giving incorrect results - API - OpenAI Developer Community). We’ll retrieve the top matches (e.g., top 3) for each query.
Let’s add a helper to compute cosine similarity between two vectors, and then the chat loop:
// Helper: Compute cosine similarity between two float vectors
float CosineSimilarity(float[] vec1, float[] vec2)
{
if (vec1.Length != vec2.Length) throw new Exception("Vectors must be same length");
double dot = 0.0;
double norm1 = 0.0;
double norm2 = 0.0;
for (int i = 0; i < vec1.Length; i++)
{
dot += vec1[i] * vec2[i];
norm1 += vec1[i] * vec1[i];
norm2 += vec2[i] * vec2[i];
}
return (float)(dot / (Math.Sqrt(norm1) * Math.Sqrt(norm2)));
}
// Start chat loop for queries
Console.WriteLine("\nDocument upload complete. You can now ask questions about the documents.");
Console.WriteLine("Type your question (or 'exit' to quit):");
string? query;
while (!string.IsNullOrEmpty(query = Console.ReadLine()))
{
if (query.Equals("exit", StringComparison.OrdinalIgnoreCase))
break;
// Generate embedding for the user query
OpenAIEmbedding queryEmbedding;
try
{
queryEmbedding = embedClient.GenerateEmbedding(query);
}
catch (Exception ex)
{
Console.WriteLine($"Error generating embedding for query: {ex.Message}");
continue;
}
float[] queryVector = queryEmbedding.ToFloats().ToArray();
// Find the most similar document chunks
int topK = 3;
var topMatches = knowledgeBase
.Select(chunk => new
{
Chunk = chunk,
Score = CosineSimilarity(queryVector, chunk.EmbeddingVector)
})
.OrderByDescending(x => x.Score)
.Take(topK)
.Where(x => x.Score > 0.0f) // only consider if similarity > 0
.ToList();
if (topMatches.Count == 0)
{
Console.WriteLine("No relevant documents found for your query.");
}
else
{
Console.WriteLine("Relevant documents:");
foreach (var match in topMatches)
{
Console.WriteLine($"- {match.Chunk.Source} (Similarity: {match.Score:F2})");
// Optionally, print a snippet of the text:
string snippet = match.Chunk.Text;
if (snippet.Length > 200) snippet = snippet.Substring(0, 200) + "...";
Console.WriteLine($" Excerpt: \"{snippet}\"");
}
}
Console.WriteLine("\nAsk another question (or 'exit' to quit):");
}

// Type declarations must come after all top-level statements in Program.cs,
// so the DocumentChunk record used throughout the program is declared here, at the bottom of the file.
record DocumentChunk(string Source, string Text, float[] EmbeddingVector);
Code Explanation:
- We defined CosineSimilarity(float[] vec1, float[] vec2) to compute the similarity between two vectors. It calculates the dot product of the two vectors and divides by the product of their magnitudes (Euclidean norms) (Embeddings giving incorrect results - API - OpenAI Developer Community). This gives a value between -1 and 1. (In practice, similarity scores between text embeddings almost always land in the 0 to 1 range.)
- We then start a loop to interact with the user. The user can type a question; if they type “exit”, we break out of the loop and end the program.
- For each query:
  - We use embedClient.GenerateEmbedding(query) to get the embedding vector for the question (just like we did for documents).
  - We iterate over all stored knowledgeBase chunks and compute the cosine similarity between the query vector and each chunk’s vector.
  - We take the top K results (here topK = 3) by sorting in descending order of similarity. These are the most semantically relevant chunks to the question.
  - If we found any matches (i.e., the list isn’t empty), we print them out. For each match, we display the source (file name and chunk) and the similarity score (formatted to two decimals for readability). We also print an excerpt of the chunk’s text as a preview. (The code limits the excerpt to the first 200 characters for neatness.)
  - If no matches were found (which could happen if the knowledge base is empty or the query is unrelated), we notify the user that no relevant docs were found.
- The loop then prompts for another question, until the user exits.
This completes the core functionality. The console app can load documents, build embeddings, and answer queries by retrieving relevant document references and excerpts.
Step 5: Run and Test the Application
Compile and run the application (e.g., using dotnet run in the project directory, or by running it in your IDE).
Uploading Documents: When the app starts, it will ask for a document path. Enter the path to a .txt file (for example, C:\docs\sample.txt). If you want to test PDFs, you should first convert them to text or integrate a PDF text extraction library. You can enter multiple files one by one. After you finish entering files, just hit Enter on an empty line.
The app will then embed the documents. You should see output like:
Loaded sample.txt and created 5 chunks.
Enter another file path (or press Enter to finish):
Once you finish adding files, the app will say you can ask questions.
Querying: Now type in a question related to the content of the documents you provided. For example, if one of your documents was about a company’s policies, you might ask: “What is the leave policy for vacations?” The program will generate the embedding for your question, compare it against all document embeddings, and print out the top matches:
Relevant documents:
- CompanyPolicy.txt#chunk3 (Similarity: 0.88)
Excerpt: "…employees are entitled to 15 days of paid vacation leave per year…"
- CompanyPolicy.txt#chunk1 (Similarity: 0.85)
Excerpt: "…Our leave policy states that requests for vacation must be submitted…"
Each result shows which document (and chunk) is relevant and a short excerpt. This way, you know where the answer is coming from. You can then open that document or even extend the program to display more of the content.
Try asking a few questions to see how it retrieves different chunks. If the results seem off, remember that embeddings capture semantic meaning, so phrasing matters – try rewording the query if needed. Also, if your documents were not loaded or chunked properly, the answers may be incomplete.
Core Concepts Recap
- OpenAI Embeddings: We used text-embedding-ada-002, an OpenAI model that converts text into a 1536-dimensional vector representation. The SDK call GenerateEmbedding(text) returns this vector. These embeddings let us measure semantic similarity between texts (OpenAI Embeddings :: Spring AI Reference).
- Vector Similarity (Cosine): To find relevant info, we don’t do keyword search. Instead, we compute cosine similarity between the query vector and each document vector to find the closest matches (most similar meaning) (Embeddings giving incorrect results - API - OpenAI Developer Community).
- Retrieval-Augmented Generation: Our app demonstrates the “retrieval” part by fetching text chunks related to the query. In a full RAG system, those chunks would be fed into an LLM (like GPT-4) to generate a comprehensive answer that cites or uses them. Even without generating new text, our app gives the user the relevant source material for their query, which is the main goal of a Q&A system on custom data (Retrieval-augmented generation - Wikipedia).
Suggestions for Enhancements
This basic implementation can be improved and extended in many ways:
- Use a Vector Database: For larger document sets, storing embeddings in a real vector database (like Pinecone, Weaviate, Milvus, etc.) would make similarity search faster and more scalable (Retrieval Augmented Generation). You could use a local solution like SQLite with a vector extension or an in-memory ANN (approximate nearest neighbors) library for better performance on millions of vectors.
- Caching Embeddings: If your document set doesn’t change often, save the embeddings to disk (in a file or database) after the first run. This way, you don’t have to recompute them every time you start the app. Similarly, cache query embeddings if you expect repeat questions.
- PDF and File Type Support: Integrate libraries to handle PDFs (e.g., PdfPig or iTextSharp) and other formats (Word docs, HTML). This would involve extracting text from those formats during ingestion (Chat with your documents using OpenAI embeddings in .NET/C# - crispycode.net). You might also store metadata like page numbers or section titles to provide more context in the response.
- Smarter Chunking: The simple paragraph split may not be optimal. Consider splitting by sentences or semantic boundaries and using overlapping windows (so important info split between paragraphs isn’t missed). Libraries like LangChain or LlamaIndex (if using Python) implement advanced chunking strategies, and similar logic can be adopted in C#.
- Interactive Chat Improvements: Our console loop is basic. You could enhance it by integrating an actual OpenAI chat completion call to have the model generate a natural language answer from the retrieved text. For example, send a prompt to a chat model (such as GPT-3.5 or newer) including the top 3 excerpts as context and the user’s question – then have the model formulate an answer and cite the document names. This would fully demonstrate RAG (retrieval + generation). Just be mindful of token limits when constructing the prompt. (A sketch of this appears right after this list.)
- User Interface: Eventually, you may want a friendlier interface. This could be a simple ASP.NET Core web application or a GUI that allows users to upload files and ask questions without using the console.
- Error Handling and Logging: Add more robust error handling, and possibly logging, especially for the embedding API calls and file I/O. This will help in diagnosing issues (like API rate limits or file parse errors).
- Use of External Libraries: While we stuck to the OpenAI SDK and core libraries, there are .NET libraries and frameworks for working with AI. For instance, Microsoft’s Semantic Kernel or libraries like Microsoft.Extensions.AI can streamline some of these steps (like caching and chunking). Exploring these could reduce the amount of boilerplate code you write.
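To illustrate the chat-completion enhancement mentioned in the list above, here is a minimal sketch using the ChatClient from the same OpenAI SDK. It assumes it runs inside the query loop right after topMatches is computed, with using OpenAI.Chat; added at the top of Program.cs; the model name (gpt-4o-mini) and the prompt wording are example choices, not requirements:
using OpenAI.Chat;  // add this directive at the top of the file
// Concatenate the retrieved excerpts into a context block for the model.
string context = string.Join("\n\n", topMatches.Select(m => $"[{m.Chunk.Source}]\n{m.Chunk.Text}"));
ChatClient chatClient = new ChatClient("gpt-4o-mini", apiKey);
var messages = new List<ChatMessage>
{
    new SystemChatMessage("Answer the user's question using only the provided document excerpts. Cite the source names you used."),
    new UserChatMessage($"Document excerpts:\n{context}\n\nQuestion: {query}")
};
ChatCompletion completion = chatClient.CompleteChat(messages);
// Print the generated answer after the raw matches.
Console.WriteLine("\nAnswer:");
Console.WriteLine(completion.Content[0].Text);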
By implementing these enhancements, you can create a more production-ready solution. But even with the basic version from this tutorial, you have a working demonstration of using OpenAI embeddings for semantic search over custom documents – a powerful technique to augment AI applications with private knowledge.
References:
- Retrieval-Augmented Generation concept (Retrieval-augmented generation - Wikipedia)
- OpenAI embeddings and relatedness (OpenAI Embeddings :: Spring AI Reference; Embeddings giving incorrect results - API - OpenAI Developer Community)
- PDF chunking strategy (Chat with your documents using OpenAI embeddings in .NET/C# - crispycode.net)