You’ve probably heard of DeepSeek R1, the powerful Chinese reasoning LLM that’s challenging OpenAI’s dominance in the AI world. And let’s be honest—you’ve likely seen tons of tutorials on chatting with PDFs using its local versions. But… that’s where most of them stop.
So, I thought—why not go beyond that? 🚀
What if you had a local RAG researcher that:
Queries your local documents 📄
Performs web searches 🌍
Generates structured reports in your format 📝
💡 See for yourself!
By the end of this guide, you’ll learn:
🔹 Why you need a local RAG researcher
🔹 How to use local DeepSeek R1 models with Ollama
🔹 How to run your own fully local RAG researcher
So… are you ready to level up your AI research game? Let’s dive in! ⏳💡
🔥 Explore the project in my GitHub repository now!
Do You Really Need a Local RAG Researcher? 🤔
You already know the perks of local LLMs—privacy, zero ongoing costs, offline access, and full control over the models. But why take it a step further with a local RAG researcher?
Let’s look at two simple cases:
📌 Your boss asks for a report on the company’s progress using internal documents.
📌 You want to analyze your personal finances and get a tailored report with insights.
In both cases, sending sensitive data to a cloud AI isn’t an option unless you want your finances leaked 💸 or your boss furious 😡.
Sure, you could chat with your documents using a local model, but you'd still waste time manually compiling a final report.
🔹 That's where your local RAG researcher comes in.
Just tell it what you need and how the report should be structured, and it will:
✅ Generate relevant research queries
✅ Retrieve key insights from your docs
✅ Perform web searches for real-time data (if enabled!)
✅ Summarize everything into a structured report
🚀 And it does all this entirely on your local machine.
Why Use DeepSeek R1? 🧠
When DeepSeek released their top-tier DeepSeek R1 model, they didn't stop at just one breakthrough. Instead, they took it a step further by developing a series of distilled models—smaller, open-source LLMs built on top of existing architectures like LLaMA and Qwen. These models were fine-tuned using reasoning datasets generated by the larger DeepSeek R1 model.
In other words, the full 671B-parameter DeepSeek R1 model served as a teacher model, transferring its reasoning abilities to these smaller versions through carefully curated training datasets. This method allowed them to retain strong reasoning capabilities while being much more lightweight and efficient. 💡
And the results? 🔥
📊 Just look at the table above—DeepSeek’s distilled LLaMA 70B model already outperforms OpenAI’s o1-mini on reasoning benchmarks. That’s huge!
But what’s even more exciting?
If you don’t have a powerful GPU, these smaller models change everything. You can now run the DeepSeek-R1-14B model on your own machine and still achieve reasoning performance comparable to OpenAI’s o1-mini, while outperforming the latest GPT-4o and Claude 3.5 Sonnet models on many reasoning benchmarks.
This means powerful, private, and cost-free reasoning models—all accessible from your local setup. 🚀
💡 The possibilities are endless, and they’re now in your hands.
Researcher Agent Architecture
(If you can't wait to test it out, skip to setup—we won't tell! 😉)
Before setting up the RAG researcher, let’s take a quick look at how it’s built 🔍.
Local ChromaDB Vector Store
As with any RAG application, we need a vector database for search. Once you upload your files (PDFs, TXT, MD, or CSV) from the UI, we follow these standard steps:
1. Load the Documents: We use LangChain loaders to handle different file formats easily.

# LangChain community loaders for the supported file types
from langchain_community.document_loaders import CSVLoader, TextLoader, PDFPlumberLoader

def process_uploaded_files(uploaded_files):
    ...
    # Inside a loop over the uploaded files:
    # pick the appropriate loader based on each file's extension
    if file_extension == "csv":
        loader = CSVLoader(temp_file_path)
    elif file_extension in ["txt", "md"]:
        loader = TextLoader(temp_file_path)
    elif file_extension == "pdf":
        loader = PDFPlumberLoader(temp_file_path)
    else:
        continue  # skip unsupported file types
    ...
💡 Tip: Experimenting with different file types or chunking strategies? Just swap in a new loader or splitter logic to suit your needs!
2. Chunk the Documents: We split each document into smaller, semantically meaningful chunks.
3. Create the Vector Database: We use a local HuggingFace embedding model to convert document chunks into vector representations, then store them in the local Chroma database.
# Embeddings, semantic chunking, and the local Chroma vector store
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_experimental.text_splitter import SemanticChunker
from langchain_community.vectorstores import Chroma

def add_documents(documents):
    embeddings = HuggingFaceEmbeddings()

    # Split the new documents into semantically meaningful chunks
    semantic_text_splitter = SemanticChunker(embeddings)
    documents = semantic_text_splitter.split_documents(documents)

    # Create embeddings and store them in the local Chroma vector DB
    vectorstore = Chroma(embedding_function=embeddings)
    vectorstore.add_documents(documents)
    return vectorstore
🔑 Key Point: Semantic chunking splits text based on meaning rather than just character count, improving accuracy when fetching relevant snippets later.
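Once the store is built, it’s easy to sanity-check retrieval before wiring up the agent. Here’s a quick sketch, assuming the vectorstore returned by add_documents above (the query string is just an example):

# Retrieve the 3 chunks most similar to a test question
results = vectorstore.similarity_search("What were the key findings in the Q3 report?", k=3)
for doc in results:
    print(doc.page_content[:200])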
Researcher Agent
(Don’t worry—we’ll keep this quick. But trust me, you’ll want to see how it works. ✨)
Our researcher agent is an adaptive RAG pipeline built using LangGraph. Let’s break down how it transforms your instructions into polished reports.
Core Workflow
The agent functions as a state machine that manages research generation, document retrieval, and synthesis.
Step 1: Generate Search Queries
The agent starts by analyzing the provided instructions and generating precise research queries:
def generate_research_queries(...):
    query_writer_prompt = RESEARCH_QUERY_WRITER_PROMPT.format(
        max_queries=max_queries,
        date=datetime.datetime.now().strftime("%Y/%m/%d %H:%M")
    )

    # Using a local DeepSeek R1 model with Ollama
    result = invoke_ollama(
        model='deepseek-r1:7b',
        system_prompt=query_writer_prompt,
        user_prompt=f"Generate research queries for this user instruction: {user_instructions}",
        output_format=Queries
    )
We provide the reasoning model with the current date and the maximum number of queries, letting it handle the rest:
RESEARCH_QUERY_WRITER_PROMPT = """You are an expert Research Query Writer specializing in designing precise and effective queries to fulfill user research tasks.
Your goal is to generate the necessary queries to complete the user's research goal based on their instructions. Ensure the queries are concise, relevant, and avoid redundancy.
Your output must be a JSON object containing a single key "queries":
{{ "queries": ["Query 1", "Query 2",...] }}
# NOTE:
* You can generate up to {max_queries} queries, but only as many as needed to effectively address the user's research goal.
* **Today is: {date}**
"""
💡 Tip: You can adjust the number of search queries directly from the UI.
Step 2: Process Queries
Once generated, queries are processed in parallel for speed. Each query runs through a mini-research subgraph:
query_search_subgraph.add_edge(START, "retrieve_rag_documents")
query_search_subgraph.add_edge("retrieve_rag_documents", "evaluate_retrieved_documents")
query_search_subgraph.add_conditional_edges("evaluate_retrieved_documents", route_research)
query_search_subgraph.add_edge("web_research", "summarize_query_research")
query_search_subgraph.add_edge("summarize_query_research", END)
Here’s what happens inside the subgraph for each query:
1. Retrieve Documents: Fetches the top 3 most relevant snippets from ChromaDB using similarity search.

def retrieve_rag_documents(...):
    vectorstore_retriever = vectorstore.as_retriever(
        search_type="similarity", search_kwargs={"k": 3}
    )
    documents = vectorstore_retriever.invoke(query)  # fetch the top matches for the query
    return {"retrieved_documents": documents}
2. Relevance Check: A secondary reasoning agent evaluates whether the retrieved documents actually answer the query (relevant or not).

def evaluate_retrieved_documents(...):
    evaluation_prompt = RELEVANCE_EVALUATOR_PROMPT.format(...)
    # "Are these docs useful? Yes/No"
    evaluation = invoke_ollama(...)
    return {"are_documents_relevant": evaluation.is_relevant}
3. Adaptive Routing: Based on the relevance check, each query is routed to its next step:
✅ Relevant? → Move to summarization.
❌ Irrelevant? → Launch web search (if enabled).
😔 No web search? → Skip the query.

def route_research(...):
    if state["are_documents_relevant"]:
        return "summarize_query_research"
    elif enable_web_search:
        return "web_research"
    else:
        return "__end__"

⚠️ Note: Web research uses the Tavily API to gather high-quality sources (academic papers, verified blogs); see the sketch right after this list.
4. Summarize Research Findings: A summarizer agent compiles a concise, focused summary based on the retrieved documents or web search results for each query.

def summarize_query_research(...):
    summary_prompt = SUMMARIZER_PROMPT.format(...)
    summary = invoke_ollama(
        model='deepseek-r1:7b',
        system_prompt=summary_prompt,
        user_prompt=f"Generate a research summary for this query: {query}"
    )
    return {"search_summaries": [summary]}

We instruct the summarizer agent to craft an objective summary of the key findings while skipping irrelevant facts:

SUMMARIZER_PROMPT = """Your goal is to generate a focused, evidence-based research summary from the provided documents.

KEY OBJECTIVES:
1. Extract and synthesize critical findings from each source
2. Present key data points and metrics that support main conclusions
3. Identify emerging patterns and significant insights
4. Structure information in a clear, logical flow

REQUIREMENTS:
- Begin immediately with key findings - no introductions
- Focus on verifiable data and empirical evidence
- Keep the summary brief, avoid repetition and unnecessary details
- Prioritize information directly relevant to the query

Query: {query}
Retrieved Documents: {documents}
"""
Step 3: Generate Final Report
The researcher compiles all findings into a well-structured report following the user-defined format.
def generate_final_answer(...):
    report_structure = config["configurable"].get("report_structure", "")
    answer_prompt = REPORT_WRITER_PROMPT.format(
        instruction=state["user_instructions"],
        report_structure=report_structure,
        information="\n\n---\n\n".join(state["search_summaries"])
    )

    result = invoke_ollama(
        model='deepseek-r1:7b',
        system_prompt=answer_prompt,
        user_prompt="Generate a research summary using the provided information."
    )
    return {"final_answer": parse_output(result)["response"]}
This is the prompt that the report writer agent must follow:
REPORT_WRITER_PROMPT = """Your goal is to use the provided information to write a comprehensive and accurate report that answers all the user's questions.
The report must strictly follow the structure requested by the user.
USER INSTRUCTION:
{instruction}
REPORT STRUCTURE:
{report_structure}
PROVIDED INFORMATION:
{information}
# **CRITICAL GUIDELINES:**
- Adhere strictly to the structure specified in the user's instruction.
- Start IMMEDIATELY with the summary content - no introductions or meta-commentary
- Focus ONLY on factual, objective information
- Avoid redundancy, repetition, or unnecessary commentary.
"""
The crafted report appears in the UI, ready for you to review or copy.
Excited to see it in action? Let’s fire up the agent! 🔥
How to Run It Yourself?
Tech Stack Used
Ollama: Runs the DeepSeek R1 model locally.
LangGraph: Builds AI agents and defines the researcher's workflow.
ChromaDB: Local vector database for RAG-based retrieval.
Streamlit: Provides a UI for interacting with the researcher.
🛠 Step 1: Install Ollama
Ollama is available for macOS, Linux, and Windows. Follow these steps to install it:
1️⃣ Visit the official Ollama download page.
2️⃣ Select your operating system (macOS, Linux, or Windows).
3️⃣ Click the Download button.
4️⃣ Follow the system-specific installation instructions.
🛠 Step 2: Run DeepSeek R1 Locally
Once Ollama is installed, you can run DeepSeek R1 models. There are multiple options depending on your system’s capabilities.
For this researcher agent, I used the 7B model, but you can choose the 1.5B model or even larger ones if your machine can handle them!
🔹 Pull the DeepSeek R1 Model
To download the 1.5B parameter model, run:
ollama pull deepseek-r1:1.5b
🔹 Run DeepSeek R1
Once downloaded, you can interact with the model using:
ollama run deepseek-r1:1.5b
🔑 Key Point: Larger models (e.g., 32B, 70B) provide better reasoning and research quality but require significant RAM.
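Once a model is pulled, the researcher agent talks to it from Python. The invoke_ollama helper used throughout the graph is essentially a thin wrapper around the official ollama client; here’s a rough sketch of the idea (the repo’s actual helper also supports structured output via the output_format argument you saw earlier, so it may differ):

# pip install ollama
import ollama

def invoke_ollama(model: str, system_prompt: str, user_prompt: str) -> str:
    # Minimal sketch: call a local Ollama model with system + user prompts
    response = ollama.chat(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response["message"]["content"]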
🛠 Step 3: Set Up the RAG Researcher with Streamlit
1️⃣ Clone the Project
To run the researcher agent, clone the GitHub repository and install the required dependencies (preferably in a virtual environment):
git clone https://github.com/kaymen99/local-rag-researcher-deepseek
cd local-rag-researcher-deepseek
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
pip install -r requirements.txt
2️⃣ Set Up Web Search API (Optional)
(Skip this if you don't want to use web search.)
The researcher agent uses the Tavily API for web search. You’ll need an API key (they give a lot of free credit—basically FREE). Add it to your .env file:
# Tavily API key for SearchTool
TAVILY_API_KEY="tvly-..."
3️⃣ Launch the Streamlit App
I've already built a Streamlit UI for the researcher agent (see app.py). It allows you to configure the agent, upload documents, and see real-time updates.
To launch the app, run:
streamlit run app.py
🔹 Visualize in LangGraph Studio
Since the researcher is built with LangGraph, you can visualize its workflow in LangGraph Studio. (You'll need a free LangSmith account for this)
pip install -U "langgraph-cli[inmem]"
langgraph dev
🚀 Test It Out!
Upload your documents and provide research instructions—the agent will handle the rest!
Customizations 🛠️
🔹 Add Custom Report Structures
By default, the researcher uses a "standard" report structure, but you can customize it.
1️⃣ Add new structures inside the report_structures folder in the project root directory.
2️⃣ Select your preferred structure from the UI.
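The structure itself is free-form text that gets injected into the report writer prompt (the {report_structure} placeholder shown earlier), so a custom file can be as simple as something like this (purely illustrative, not a file from the repo):

Executive Summary: a short overview of the main findings.
Key Insights: bullet points of the most important results, with supporting data.
Recommendations: concrete next steps based on the findings.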
🔹 Swap LLM Providers
By default, the researcher uses Ollama and the local DeepSeek R1 models, but you can easily switch to external LLM providers like OpenAI, Claude, or even the larger Cloud-based DeepSeek R1 models.
To change providers:
1️⃣ Uncomment the relevant code in assistant/graph.py.
2️⃣ Add the necessary API keys to your .env file.
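If you go that route, the swap is essentially a different chat-completion wrapper. For example, an OpenAI-backed equivalent of invoke_ollama could look roughly like this (a hypothetical sketch using the official openai package, not the repo’s exact code):

# pip install openai
import os
from openai import OpenAI

def invoke_openai(system_prompt: str, user_prompt: str, model: str = "gpt-4o") -> str:
    # Hypothetical sketch: same interface as invoke_ollama, backed by the OpenAI API
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    completion = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    )
    return completion.choices[0].message.content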
Final Thoughts
✅ You’ve successfully set up DeepSeek R1 locally with Ollama!
✅ You’ve created a powerful RAG researcher tailored to your needs!
✅ Start exploring—upload documents, define report formats, and let your agent do the work!
💡 Want more? Follow my Dev blog and check out my GitHub for more AI projects & tutorials! 🚀
📖 Also, don’t miss my guide to reasoning LLMs! Check out DeepSeek R1 & OpenAI o3/o1: Your Guide to Reasoning LLMs to learn how these models really work!