Implementing RAG with Azure OpenAI in .NET (C#)

This tutorial was created using OpenAI's Deep Research capability.

Retrieval-Augmented Generation (RAG) combines a document retrieval step with an OpenAI LLM to ground the model’s answers on your data. Below are best practices for a quick yet robust prototype in C#, focusing on storage, embeddings, vector search, and resources.

Document Storage: Azure Blob vs. Local

Use Azure Blob Storage for realistic scenarios – Storing documents in Azure Blob Storage is the recommended approach, especially if you plan to integrate with Azure Cognitive Search. Azure’s RAG tutorials upload source files (e.g. PDFs) to a Blob container so an indexer can ingest and chunk them automatically (RAG tutorial: Build an indexing pipeline - Azure AI Search | Microsoft Learn). Blob Storage is scalable and allows Azure Search or other services to pull content easily.

Local storage for quick prototyping – For a simple prototype or local development, you can read files from the local filesystem. This avoids provisioning cloud storage initially. It’s acceptable to start with local files if you have just a few documents. Keep in mind you’ll likely transition to Blob Storage for production or if you want to use Azure Search indexers (which work natively with Azure Blob). In short, local storage is fine for an isolated proof-of-concept, but Azure Blob Storage is preferred for anything beyond a toy example or when using Azure’s managed pipelines.

Authentication: In a prototype, using the storage account’s API key or connection string is easiest for access. You can supply the Blob Storage key (or a SAS token) in your configuration to let your .NET code upload or download files. This is simpler than setting up Azure AD roles at this stage. (You can later move to managed identities for better security once the basic solution works.)
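
For illustration, a minimal upload sketch with the Azure.Storage.Blobs SDK might look like the following (assuming a .NET 6+ console app with implicit usings); the "docs" container name, the file name, and the BLOB_CONNECTION_STRING environment variable are placeholders, not values from any official sample:

```csharp
using Azure.Storage.Blobs;

// Connection string (it contains the account key) read from an environment variable.
var connectionString = Environment.GetEnvironmentVariable("BLOB_CONNECTION_STRING");

var container = new BlobServiceClient(connectionString)
    .GetBlobContainerClient("docs"); // container name is a placeholder
await container.CreateIfNotExistsAsync();

// Upload a local document so it can be ingested later (e.g. by an Azure Search indexer).
await container.GetBlobClient("handbook.pdf")
    .UploadAsync("./handbook.pdf", overwrite: true);
```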

Embedding Model: Choosing the Best for Azure OpenAI

Use OpenAI’s text-embedding-ada-002 model – The ADA v2 text embedding model is currently the go-to for Azure OpenAI RAG scenarios. It produces 1536-dimensional embeddings and offers strong semantic representation of text. In fact, Azure’s AI Search integration expects an embedding model like text-embedding-ada-002 for vectorization (Azure OpenAI Embedding skill - Azure AI Search | Microsoft Learn). As of now, text-embedding-ada-002 is the primary available embedding model in Azure OpenAI (other newer embedding models like “text-embedding-3-large” are in preview or not yet generally available to all users (embedding model for RAG, OpenAI Studio - Microsoft Q&A)).

Why ADA-002? It’s proven to perform well on a variety of semantic search tasks and is the recommended default. Using this model ensures your document chunks and user queries are mapped to the same vector space for meaningful similarity matching (Azure OpenAI Embedding skill - Azure AI Search | Microsoft Learn). When you create your Azure OpenAI resource, deploy the ada-002 embedding model (e.g., as an “embeddings” deployment) so your .NET app can call it. For example, you’ll call the Azure OpenAI Embeddings API with this model to vectorize your document chunks and user questions.
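
As a rough sketch of that call using the Azure.AI.OpenAI 1.x client library (type names differ in older previews and in the newer 2.x packages), it could look like this; the endpoint, the AZURE_OPENAI_KEY variable, and the "embeddings" deployment name are assumptions about how you configured your resource:

```csharp
using Azure;
using Azure.AI.OpenAI;

// Endpoint, key variable, and the "embeddings" deployment name are placeholders.
var client = new OpenAIClient(
    new Uri("https://<your-resource>.openai.azure.com/"),
    new AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_OPENAI_KEY")!));

Response<Embeddings> response = await client.GetEmbeddingsAsync(
    new EmbeddingsOptions("embeddings", new[] { "What is our refund policy?" }));

// text-embedding-ada-002 returns a 1536-dimensional vector.
ReadOnlyMemory<float> queryEmbedding = response.Value.Data[0].Embedding;
```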

Vector Database: Azure AI Search vs. pgvector vs. Others

Once you have embeddings, you need a vector store to index and query them for nearest matches:

  • Azure AI Search (Cognitive Search) – This is often the simplest and most integrated choice for an Azure-based RAG prototype. Azure Cognitive Search now supports vector search natively, and it’s a “proven solution” for RAG in enterprise scenarios (RAG and generative AI - Azure AI Search | Microsoft Learn). You can create a search index with a vector field to store embeddings, and use the Search service’s API to perform cosine similarity searches. The benefit is that Azure Search also provides robust indexing (including built-in chunking and enrichment skills) and security controls. If your data doesn’t change frequently in real-time, and you’re okay managing an index, Azure AI Search is compelling (Choose an Azure service for vector search - Azure Architecture Center | Microsoft Learn). It offloads a lot of the heavy lifting: for instance, you can set up an indexer that automatically chunks documents and calls the embedding model to vectorize content during ingestion (RAG tutorial: Build an indexing pipeline - Azure AI Search | Microsoft Learn).

  • PostgreSQL with pgvector – Using a relational database can be convenient if you prefer a lightweight setup or already have a database in your stack. PostgreSQL’s pgvector extension allows you to store embedding vectors and run similarity queries via SQL. This approach is straightforward: you insert your vectors into a table and use the cosine (or inner product) distance operators provided by pgvector to find nearest neighbors. If your team is already comfortable with Postgres, leveraging it might be the easiest solution for your scenario (Choose an Azure service for vector search - Azure Architecture Center | Microsoft Learn). It’s a good option for quick prototypes because you can avoid learning a new service – just add the extension to an Azure Database for PostgreSQL or a local Postgres instance, and you have a basic vector database ready. Keep in mind that performance may be lower than a specialized vector store at very large vector counts, but for moderate volumes it works well (see the Npgsql sketch after this list).

  • FAISS or other libraries – FAISS (Facebook AI Similarity Search) is a high-performance library for vector similarity search. It’s typically used in Python or C++ environments, but there are .NET bindings and alternatives (such as the Milvus vector database, which has a .NET client, or Pinecone via its API) if you want to explore them. For a quick C# prototype, introducing FAISS may add complexity unless you’re already familiar with it. However, if you need an on-premises or in-memory solution without external services, you could integrate a vector search library. Many RAG prototypes skip a database entirely and just keep an in-memory list of vectors, scanning them with brute-force cosine similarity when the data set is very small – this doesn’t scale, but it can jumpstart a proof-of-concept (a minimal sketch follows this list).
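
To make the last point concrete, here is a minimal brute-force sketch; chunks is assumed to be a small in-memory List<(string Text, ReadOnlyMemory<float> Embedding)> built at ingestion time, and queryEmbedding is the vector returned by the embeddings call shown earlier:

```csharp
using System.Linq;

// Brute-force nearest-neighbour search over an in-memory list of chunks.
// Fine for a handful of documents; not a substitute for a real vector index.
static double Cosine(ReadOnlyMemory<float> a, ReadOnlyMemory<float> b)
{
    var x = a.Span;
    var y = b.Span;
    double dot = 0, na = 0, nb = 0;
    for (int i = 0; i < x.Length; i++)
    {
        dot += x[i] * y[i];
        na  += x[i] * x[i];
        nb  += y[i] * y[i];
    }
    return dot / (Math.Sqrt(na) * Math.Sqrt(nb));
}

var topChunks = chunks
    .OrderByDescending(c => Cosine(queryEmbedding, c.Embedding))
    .Take(3)
    .ToList();
```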
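For the PostgreSQL/pgvector option above, one low-dependency sketch is to send the vector as a string literal and cast it in SQL; the connection string, table, and column names are placeholders, and the Pgvector NuGet package offers a typed mapping if you prefer that route:

```csharp
using System.Globalization;
using System.Linq;
using Npgsql;

// Assumes a table created with the pgvector extension, e.g.:
//   CREATE EXTENSION vector;
//   CREATE TABLE chunks (id serial PRIMARY KEY, content text, embedding vector(1536));
await using var conn = new NpgsqlConnection(Environment.GetEnvironmentVariable("PG_CONNECTION_STRING"));
await conn.OpenAsync();

// pgvector accepts a vector literal such as '[0.1,0.2,...]'; <=> is cosine distance.
var vectorLiteral = "[" + string.Join(",",
    queryEmbedding.ToArray().Select(f => f.ToString(CultureInfo.InvariantCulture))) + "]";

await using var cmd = new NpgsqlCommand(
    "SELECT content FROM chunks ORDER BY embedding <=> @q::vector LIMIT 3", conn);
cmd.Parameters.AddWithValue("q", vectorLiteral);

await using var reader = await cmd.ExecuteReaderAsync();
while (await reader.ReadAsync())
    Console.WriteLine(reader.GetString(0));
```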

Recommendation: For an Azure-oriented prototype, Azure Cognitive Search is often the best blend of simplicity and capability. It’s managed, requires minimal code for indexing/querying, and ties in nicely with Azure OpenAI (for example, Azure’s “OpenAI on your data” feature can use an Azure Search index as the knowledge source). If you prefer not to provision Azure Search or want everything local, Postgres/pgvector is a solid fallback that keeps things simple while following best practices (vector indexing in a database). In contrast, FAISS or dedicated vector DBs (Pinecone, Weaviate, etc.) might be overkill for a quick demo unless you specifically want to evaluate them. The bottom line is that any vector store can work – RAG isn’t limited to one technology (AI Architecture Design - Azure Architecture Center | Microsoft Learn) – so choose the one that lets you iterate fastest while meeting your needs.
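
To illustrate how little query code Azure AI Search needs, here is a minimal sketch using the Azure.Search.Documents SDK (11.5 or later); the index name, the contentVector/content field names, and the SEARCH_KEY variable are assumptions about how you set up your index, and queryEmbedding is the vector from the earlier embeddings sketch:

```csharp
using Azure;
using Azure.Search.Documents;
using Azure.Search.Documents.Models;

// Endpoint, index name, and key variable are placeholders.
var searchClient = new SearchClient(
    new Uri("https://<your-search-service>.search.windows.net"),
    "rag-index",
    new AzureKeyCredential(Environment.GetEnvironmentVariable("SEARCH_KEY")!));

var options = new SearchOptions
{
    VectorSearch = new()
    {
        Queries =
        {
            new VectorizedQuery(queryEmbedding)
            {
                KNearestNeighborsCount = 3,
                Fields = { "contentVector" } // vector field name is an assumption
            }
        }
    },
    Size = 3
};

// Pure vector search: no search text, just nearest neighbors on the vector field.
SearchResults<SearchDocument> results = await searchClient.SearchAsync<SearchDocument>(null, options);
await foreach (SearchResult<SearchDocument> result in results.GetResultsAsync())
    Console.WriteLine(result.Document["content"]);
```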

Authentication with API Keys (Azure OpenAI & Storage)

For a quick prototype, API key-based authentication is the way to go. Both Azure OpenAI and Azure Storage support API keys that you can use in your .NET application, avoiding the overhead of Azure AD authentication setup. Using the keys is straightforward and gets you up and running quickly. Microsoft’s documentation notes that while you can use Azure AD roles for tighter security, “keys are easier to start with” for development (Quickstart: Generative Search (RAG) - Azure AI Search | Microsoft Learn).

In practice, this means:

  • Azure OpenAI – Use the Key and Endpoint from your Azure OpenAI resource. For example, when calling the Azure OpenAI REST API (or SDK), set the api-key header with your key. In C#, if using the OpenAI client libraries or REST calls, ensure you include the key. (Azure OpenAI also requires an API version in the endpoint URL.)
  • Azure Storage (Blobs) – Use the storage account key or a SAS token to authenticate your Blob client in Azure SDK for .NET. For instance, you can use new BlobServiceClient(<connection_string>) where the connection string contains the key. This avoids needing to configure managed identities or OAuth in a prototype.

Using API keys for both services will let your prototype run with minimal friction. Just be sure to keep the keys safe (don’t hard-code them in public code; use something like user-secrets or environment variables in your .NET project). You can later swap to managed identities when moving toward production, but for now API keys are perfectly fine and conform to Azure’s recommended quickstart practices (Quickstart: Generative Search (RAG) - Azure AI Search | Microsoft Learn).
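
As a small sketch of that advice, you can load the keys through the .NET configuration stack instead of hard-coding them; the configuration key names below are placeholders:

```csharp
using Microsoft.Extensions.Configuration;

// Load secrets from user-secrets (development) and environment variables.
// Requires the Microsoft.Extensions.Configuration.UserSecrets package and
// a UserSecretsId in the project (dotnet user-secrets init).
var config = new ConfigurationBuilder()
    .AddUserSecrets<Program>()
    .AddEnvironmentVariables()
    .Build();

var openAiKey = config["AzureOpenAI:Key"];
var storageConnectionString = config["Storage:ConnectionString"];
```

During development you can populate these values with, for example, dotnet user-secrets set "AzureOpenAI:Key" "<your-key>", so nothing sensitive lands in source control.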

Official Microsoft Tutorials & Resources

Microsoft provides several tutorials and examples to guide you in building a RAG solution on Azure:

  • Azure Learn RAG Tutorial Series – The Azure Cognitive Search documentation has an end-to-end RAG tutorial series. For example, there’s a Quickstart: Generative search (RAG) guide that shows how to set up an Azure AI Search index and query it with Azure OpenAI. “In this quickstart, you send queries to a chat completion model for a conversational search experience over your indexed content on Azure AI Search” (Quickstart: Generative Search (RAG) - Azure AI Search | Microsoft Learn). It walks through setting up the services (OpenAI, Search) and using them together (the sample code is in Python, but the concepts apply equally to .NET). There are also Azure Architecture Center articles on designing RAG solutions, and a tutorial on building an indexing pipeline that covers chunking, embedding, and indexing documents (RAG tutorial: Build an indexing pipeline - Azure AI Search | Microsoft Learn).

  • “ChatGPT + Enterprise Data” C# Sample – Microsoft has an official .NET sample on GitHub (Azure-Samples) that demonstrates a full RAG application using Azure OpenAI and Azure Cognitive Search. “This sample… uses Azure OpenAI Service to access the ChatGPT model, and Azure AI Search for data indexing and retrieval.” (GitHub - Azure-Samples/azure-search-openai-demo-csharp: A sample app for the Retrieval-Augmented Generation pattern running in Azure, using Azure Cognitive Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences.). It’s a Blazor web app that allows you to ask questions to an internal knowledge base. The repo comes with scripts to set up Azure Blob Storage (for documents), an Azure Search index, and Azure OpenAI deployments. This is a great resource to see best practices in action (chunking docs, storing embeddings in the index, constructing prompts with citations, etc.) in a C# codebase. You can deploy it to Azure or run locally once you configure the keys.

  • Microsoft Learn Modules – Look for learning modules or workshops on “Azure OpenAI and Cognitive Search”. For instance, Azure AI Search’s documentation includes an overview of RAG and suggests approaches in different languages. They even mention templates for .NET that create an end-to-end solution (RAG and generative AI - Azure AI Search | Microsoft Learn). Additionally, the AI Show or Azure webinars often showcase building a chatbot with your own data on Azure. Checking Microsoft Learn for “OpenAI on your data” or “knowledge mining with Azure OpenAI” can yield step-by-step content.

By leveraging these resources, you can follow a proven path. Start by uploading a few documents to Blob (or using provided sample data), create a vector index (either via Azure Search or manually with pgvector), obtain embeddings with Ada-002, and then use the OpenAI GPT-35-Turbo or GPT-4 model to answer questions with the retrieved content. The tutorials will reinforce the best practices: store data securely, use efficient embedding models, query via vector similarity, and authenticate with API keys for simplicity.
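
Putting that last step into code, a minimal sketch of the grounded chat call with the Azure.AI.OpenAI 1.x client might look like this; it reuses client and topChunks from the earlier sketches, and the "chat" deployment name and prompt wording are assumptions, not the official sample's prompt:

```csharp
using System.Linq;
using Azure;
using Azure.AI.OpenAI;

// Concatenate the retrieved chunks into a "sources" block for grounding.
var sources = string.Join("\n---\n", topChunks.Select(c => c.Text));

var chatOptions = new ChatCompletionsOptions("chat", new ChatRequestMessage[]
{
    new ChatRequestSystemMessage(
        "Answer using only the sources below. If the answer is not in the sources, say you don't know.\n\nSources:\n" + sources),
    new ChatRequestUserMessage("What is our refund policy?")
});

Response<ChatCompletions> completion = await client.GetChatCompletionsAsync(chatOptions);
Console.WriteLine(completion.Value.Choices[0].Message.Content);
```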
