You’ve probably heard of DeepSeek R1, the powerful Chinese reasoning LLM that’s challenging OpenAI’s dominance in the AI world. And let’s be honest—you’ve likely seen tons of tutorials on chatting with PDFs using its local versions. But… that’s where most of them stop.
So, I thought—why not go beyond that? 🚀
What if you had a local RAG researcher that:
Queries your local documents 📄
Performs web searches 🌍
Generates structured reports in your format 📝
💡 See for yourself!
By the end of this guide, you’ll learn:
🔹 Why you need a local RAG researcher
🔹 How to use local DeepSeek R1 models with Ollama
🔹 How to run your own fully local RAG researcher
So… are you ready to level up your AI research game? Let’s dive in! ⏳💡
🔥 Explore the project in my GitHub repository now!
Do You Really Need a Local RAG Researcher? 🤔
You already know the perks of local LLMs—privacy, zero ongoing costs, offline access, and full control over the models. But why take it a step further with a local RAG researcher?
Let’s look at two simple cases:
📌 Your boss asks for a report on the company’s progress using internal documents.
📌 You want to analyze your personal finances and get a tailored report with insights.
In both cases, sending sensitive data to a cloud AI isn’t an option unless you want your finances leaked 💸 or your boss furious 😡.
Sure, you could chat with your documents using a local model, but you'd still waste time manually compiling a final report.
🔹 That's where your local RAG researcher comes in.
Just tell it what you need and how the report should be structured, and it will:
✅ Generate relevant research queries
✅ Retrieve key insights from your docs
✅ Perform web searches for real-time data (if enabled!)
✅ Summarize everything into a structured report
🚀 And it does all this entirely on your local machine.
Why Use DeepSeek R1? 🧠
When DeepSeek released their top-tier DeepSeek R1 model, they didn't stop at just one breakthrough. Instead, they took it a step further by developing a series of distilled models—smaller, open-source LLMs built on top of existing architectures like LLaMA and Qwen. These models were fine-tuned using reasoning datasets generated by the larger DeepSeek R1 model.
In other words, the full 671B-parameter DeepSeek R1 model served as a teacher model, transferring its reasoning abilities to these smaller versions through carefully curated training datasets. This method allowed them to retain strong reasoning capabilities while being much more lightweight and efficient. 💡
And the results? 🔥
📊 Just look at the table above—DeepSeek’s distilled LLaMA 70B model already outperforms OpenAI’s o1-mini on reasoning benchmarks. That’s huge!
But what’s even more exciting?
If you don’t have a powerful GPU, these smaller models change everything. You can now run the DeepSeek-R1-14B model on your own machine and still achieve reasoning performance comparable to OpenAI’s o1-mini, while outperforming the latest GPT-4o and Claude 3.5 Sonnet models on many reasoning benchmarks.
This means powerful, private, and cost-free reasoning models—all accessible from your local setup. 🚀
💡 The possibilities are endless, and they’re now in your hands.
Researcher Agent Architecture
(If you can't wait to test it out, skip to setup—we won't tell! 😉)
Before setting up the RAG researcher, let’s take a quick look at how it’s built 🔍.
Local ChromaDB Vector Store
As with any RAG application, we need a vector database for search. Once you upload your files (PDFs, TXT, MD, or CSV) from the UI, we follow these standard steps:
1. Load the Documents: We use LangChain loaders to handle different file formats easily.

# LangChain community loaders for the supported file types
from langchain_community.document_loaders import CSVLoader, TextLoader, PDFPlumberLoader

def process_uploaded_files(uploaded_files):
    ...
    # Inside a loop over the uploaded files:
    # pick the appropriate loader based on each file's extension
    if file_extension == "csv":
        loader = CSVLoader(temp_file_path)
    elif file_extension in ["txt", "md"]:
        loader = TextLoader(temp_file_path)
    elif file_extension == "pdf":
        loader = PDFPlumberLoader(temp_file_path)
    else:
        continue  # skip unsupported file types
    ...
💡 Tip: Experimenting with different file types or chunking strategies? Just swap in a new loader or splitter logic to suit your needs!
2. Chunk the Documents: We split each document into smaller, semantically meaningful chunks.
3. Create the Vector Database: We use a local HuggingFace embedding model to convert document chunks into vector representations, then store them in the local Chroma database.
# Embeddings, semantic chunking, and the local Chroma vector store
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_experimental.text_splitter import SemanticChunker
from langchain_community.vectorstores import Chroma

def add_documents(documents):
    embeddings = HuggingFaceEmbeddings()

    # Split the new documents into semantically meaningful chunks
    semantic_text_splitter = SemanticChunker(embeddings)
    documents = semantic_text_splitter.split_documents(documents)

    # Create embeddings and store them in the local Chroma vector DB
    vectorstore = Chroma(embedding_function=embeddings)
    vectorstore.add_documents(documents)
    return vectorstore
🔑 Key Point: Semantic chunking splits text based on meaning rather than just character count, improving accuracy when fetching relevant snippets later.
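Once the store is built, it’s easy to sanity-check retrieval before wiring up the agent. Here’s a quick sketch, assuming the vectorstore returned by add_documents above (the query string is just an example):

# Retrieve the 3 chunks most similar to a test question
results = vectorstore.similarity_search("What were the key findings in the Q3 report?", k=3)
for doc in results:
    print(doc.page_content[:200])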
Researcher Agent
(Don’t worry—we’ll keep this quick. But trust me, you’ll want to see how it works. ✨)
Our researcher agent is an adaptive RAG pipeline built using LangGraph. Let’s break down how it transforms your instructions into polished reports.
Core Workflow
The agent functions as a state machine that manages research generation, document retrieval, and synthesis.
Step 1: Generate Search Queries
The agent starts by analyzing the provided instructions and generating precise research queries:
def generate_research_queries(...):
    query_writer_prompt = RESEARCH_QUERY_WRITER_PROMPT.format(
        max_queries=max_queries,
        date=datetime.datetime.now().strftime("%Y/%m/%d %H:%M")
    )

    # Using a local DeepSeek R1 model with Ollama
    result = invoke_ollama(
        model='deepseek-r1:7b',
        system_prompt=query_writer_prompt,
        user_prompt=f"Generate research queries for this user instruction: {user_instructions}",
        output_format=Queries
    )
We provide the reasoning model with the current date and the maximum number of queries, letting it handle the rest:
RESEARCH_QUERY_WRITER_PROMPT = """You are an expert Research Query Writer specializing in designing precise and effective queries to fulfill user research tasks.
Your goal is to generate the necessary queries to complete the user's research goal based on their instructions. Ensure the queries are concise, relevant, and avoid redundancy.
Your output must be a JSON object containing a single key "queries":
{{ "queries": ["Query 1", "Query 2",...] }}
# NOTE:
* You can generate up to {max_queries} queries, but only as many as needed to effectively address the user's research goal.
* **Today is: {date}**
"""
💡 Tip: You can adjust the number of search queries directly from the UI.
Step 2: Process Queries
Once generated, queries are processed in parallel for speed. Each query runs through a mini-research subgraph:
query_search_subgraph.add_edge(START, "retrieve_rag_documents")
query_search_subgraph.add_edge("retrieve_rag_documents", "evaluate_retrieved_documents")
query_search_subgraph.add_conditional_edges("evaluate_retrieved_documents", route_research)
query_search_subgraph.add_edge("web_research", "summarize_query_research")
query_search_subgraph.add_edge("summarize_query_research", END)
Here’s what happens inside the subgraph for each query:
1. Retrieve Documents: Fetches the top 3 most relevant snippets from ChromaDB using similarity search.

def retrieve_rag_documents(...):
    vectorstore_retriever = vectorstore.as_retriever(
        search_type="similarity", search_kwargs={"k": 3}
    )
    documents = vectorstore_retriever.invoke(query)  # fetch the top matches for the query
    return {"retrieved_documents": documents}
2. Relevance Check: A secondary reasoning agent evaluates whether the retrieved documents actually answer the query (relevant or not).

def evaluate_retrieved_documents(...):
    evaluation_prompt = RELEVANCE_EVALUATOR_PROMPT.format(...)
    # "Are these docs useful? Yes/No"
    evaluation = invoke_ollama(...)
    return {"are_documents_relevant": evaluation.is_relevant}
3. Adaptive Routing: Based on the relevance check, each query is routed to its next step:
✅ Relevant? → Move to summarization.
❌ Irrelevant? → Launch web search (if enabled).
😔 No web search? → Skip the query.

def route_research(...):
    if state["are_documents_relevant"]:
        return "summarize_query_research"
    elif enable_web_search:
        return "web_research"
    else:
        return "__end__"

⚠️ Note: Web research uses the Tavily API to gather high-quality sources (academic papers, verified blogs); see the sketch right after this list.
4. Summarize Research Findings: A summarizer agent compiles a concise, focused summary based on the retrieved documents or web search results for each query.

def summarize_query_research(...):
    summary_prompt = SUMMARIZER_PROMPT.format(...)
    summary = invoke_ollama(
        model='deepseek-r1:7b',
        system_prompt=summary_prompt,
        user_prompt=f"Generate a research summary for this query: {query}"
    )
    return {"search_summaries": [summary]}

We instruct the summarizer agent to craft an objective summary of the key findings while skipping irrelevant facts:

SUMMARIZER_PROMPT = """Your goal is to generate a focused, evidence-based research summary from the provided documents.

KEY OBJECTIVES:
1. Extract and synthesize critical findings from each source
2. Present key data points and metrics that support main conclusions
3. Identify emerging patterns and significant insights
4. Structure information in a clear, logical flow

REQUIREMENTS:
- Begin immediately with key findings - no introductions
- Focus on verifiable data and empirical evidence
- Keep the summary brief, avoid repetition and unnecessary details
- Prioritize information directly relevant to the query

Query: {query}
Retrieved Documents: {documents}
"""
Step 3: Generate Final Report
The researcher compiles all findings into a well-structured report following the user-defined format.
def generate_final_answer(...):
    report_structure = config["configurable"].get("report_structure", "")
    answer_prompt = REPORT_WRITER_PROMPT.format(
        instruction=state["user_instructions"],
        report_structure=report_structure,
        information="\n\n---\n\n".join(state["search_summaries"])
    )

    result = invoke_ollama(
        model='deepseek-r1:7b',
        system_prompt=answer_prompt,
        user_prompt="Generate a research summary using the provided information."
    )
    return {"final_answer": parse_output(result)["response"]}
This is the prompt that the report writer agent must follow:
REPORT_WRITER_PROMPT = """Your goal is to use the provided information to write a comprehensive and accurate report that answers all the user's questions.
The report must strictly follow the structure requested by the user.
USER INSTRUCTION:
{instruction}
REPORT STRUCTURE:
{report_structure}
PROVIDED INFORMATION:
{information}
# **CRITICAL GUIDELINES:**
- Adhere strictly to the structure specified in the user's instruction.
- Start IMMEDIATELY with the summary content - no introductions or meta-commentary
- Focus ONLY on factual, objective information
- Avoid redundancy, repetition, or unnecessary commentary.
"""
The crafted report appears in the UI, ready for you to review or copy.
Excited to see it in action? Let’s fire up the agent! 🔥
How to Run It Yourself?
Tech Stack Used
Ollama: Runs the DeepSeek R1 model locally.
LangGraph: Builds AI agents and defines the researcher's workflow.
ChromaDB: Local vector database for RAG-based retrieval.
Streamlit: Provides a UI for interacting with the researcher.
🛠 Step 1: Install Ollama
Ollama is available for macOS, Linux, and Windows. Follow these steps to install it:
1️⃣ Visit the official Ollama download page.
2️⃣ Select your operating system (macOS, Linux, or Windows).
3️⃣ Click the Download button.
4️⃣ Follow the system-specific installation instructions.
🛠 Step 2: Run DeepSeek R1 Locally
Once Ollama is installed, you can run DeepSeek R1 models. There are multiple options depending on your system’s capabilities.
For this researcher agent, I used the 7B model, but you can choose the 1.5B model or even larger ones if your machine can handle them!
🔹 Pull the DeepSeek R1 Model
To download the 1.5B parameter model, run:
ollama pull deepseek-r1:1.5b
🔹 Run DeepSeek R1
Once downloaded, you can interact with the model using:
ollama run deepseek-r1:1.5b
🔑 Key Point: Larger models (e.g., 32B, 70B) provide better reasoning and research quality but require significant RAM.
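Once a model is pulled, the researcher agent talks to it from Python. The invoke_ollama helper used throughout the graph is essentially a thin wrapper around the official ollama client; here’s a rough sketch of the idea (the repo’s actual helper also supports structured output via the output_format argument you saw earlier, so it may differ):

# pip install ollama
import ollama

def invoke_ollama(model: str, system_prompt: str, user_prompt: str) -> str:
    # Minimal sketch: call a local Ollama model with system + user prompts
    response = ollama.chat(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response["message"]["content"]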
🛠 Step 3: Set Up the RAG Researcher with Streamlit
1️⃣ Clone the Project
To run the researcher agent, clone the GitHub repository and install the required dependencies (preferably in a virtual environment):
git clone https://github.com/kaymen99/local-rag-researcher-deepseek
cd local-rag-researcher-deepseek
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
pip install -r requirements.txt
2️⃣ Set Up Web Search API (Optional)
(Skip this if you don't want to use web search.)
The researcher agent uses the Tavily API for web search. You’ll need an API key (they give a lot of free credit—basically FREE). Add it to your .env file:
# Tavily API key for SearchTool
TAVILY_API_KEY="tvly-..."
3️⃣ Launch the Streamlit App
I've already built a Streamlit UI for the researcher agent (see app.py). It allows you to configure the agent, upload documents, and see real-time updates.
To launch the app, run:
streamlit run app.py
🔹 Visualize in LangGraph Studio
Since the researcher is built with LangGraph, you can visualize its workflow in LangGraph Studio. (You'll need a free LangSmith account for this)
pip install -U "langgraph-cli[inmem]"
langgraph dev
🚀 Test It Out!
Upload your documents and provide research instructions—the agent will handle the rest!
Customizations 🛠️
🔹 Add Custom Report Structures
By default, the researcher uses a "standard" report structure, but you can customize it.
1️⃣ Add new structures inside the report_structures folder in the project root directory.
2️⃣ Select your preferred structure from the UI.
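The structure itself is free-form text that gets injected into the report writer prompt (the {report_structure} placeholder shown earlier), so a custom file can be as simple as something like this (purely illustrative, not a file from the repo):

Executive Summary: a short overview of the main findings.
Key Insights: bullet points of the most important results, with supporting data.
Recommendations: concrete next steps based on the findings.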
🔹 Swap LLM Providers
By default, the researcher uses Ollama and the local DeepSeek R1 models, but you can easily switch to external LLM providers like OpenAI, Claude, or even the larger Cloud-based DeepSeek R1 models.
To change providers:
1️⃣ Uncomment the relevant code in assistant/graph.py.
2️⃣ Add the necessary API keys to your .env file.
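If you go that route, the swap is essentially a different chat-completion wrapper. For example, an OpenAI-backed equivalent of invoke_ollama could look roughly like this (a hypothetical sketch using the official openai package, not the repo’s exact code):

# pip install openai
import os
from openai import OpenAI

def invoke_openai(system_prompt: str, user_prompt: str, model: str = "gpt-4o") -> str:
    # Hypothetical sketch: same interface as invoke_ollama, backed by the OpenAI API
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    completion = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    )
    return completion.choices[0].message.content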
Final Thoughts
✅ You’ve successfully set up DeepSeek R1 locally with Ollama!
✅ You’ve created a powerful RAG researcher tailored to your needs!
✅ Start exploring—upload documents, define report formats, and let your agent do the work!
💡 Want more? Follow my Dev blog and check out my GitHub for more AI projects & tutorials! 🚀
📖 Also, don’t miss my guide to reasoning LLMs! Check out DeepSeek R1 & OpenAI o3/o1: Your Guide to Reasoning LLMs to learn how these models really work!