Chloe Williams for Zilliz

Originally published at zilliz.com

DeepSeek Always Busy? Deploy It Locally with Milvus in Just 10 Minutes—No More Waiting!

If you’ve tried using DeepSeek-R1 and encountered the message, “Server is busy. Please try again later,” you know how disruptive it can be. Waiting on busy servers can interrupt your workflow, especially when you need quick answers or are focused on a task.

A practical way to avoid this is to run DeepSeek-R1 directly on your own machine. This lets you bypass server delays and gives you more control over how and when you use the model. In this guide, we’ll walk through setting up DeepSeek-R1 locally using a few tools that work together. Ollama will help you download and run the model on your system. We’ll then use AnythingLLM to make interacting with DeepSeek-R1 easier through a simple interface. Finally, we’ll integrate Milvus, a vector database, so the model can reference external data, such as your own documents or custom information, when answering questions.

You don’t need top-tier hardware to get started. DeepSeek-R1 offers smaller versions of the model that can run on more common setups. Whether you’re using DeepSeek-R1 for work, research, or personal projects, setting it up locally will help you avoid server issues and tailor the model to your needs. Let’s begin by setting up DeepSeek-R1 with Ollama.

Deploying DeepSeek-R1 with Ollama: Installation and Setup

Begin by installing Ollama, which simplifies the process of running AI models on your local machine. Start by visiting the Ollama download page and selecting the installer that corresponds to your operating system.

Figure: Ollama download page

Once the download is complete, run it and follow the on-screen prompts to finish the setup. After installation, confirm that Ollama is correctly set up by opening your command-line interface and typing:


ollama --version


If everything is installed properly, you’ll see the version number displayed. This check ensures that your environment is ready to manage AI models locally.

Next, download DeepSeek-R1. For most users, the 7-billion-parameter (7B) version strikes a practical balance between performance and resource needs, generally requiring a GPU with around 18 GB of VRAM. If your hardware is less capable, the 1.5B model (about 3.9 GB of VRAM) is available. For advanced setups, you can consider the full 671B model, though it demands far more resources. Adjust the command below according to the model size you want:


ollama pull deepseek-r1:7b


The above command will download the specified model to your computer as shown below:

Figure: Installation of deepseek-r1:7b via Ollama to a computer

Once the model download is complete, launch DeepSeek-R1 by running:


ollama run deepseek-r1:7b


Replace 7b with 1.5b or another version if you chose a different model. After the model starts, you can begin interacting with it by typing your prompts directly into the command line.

Figure: A deepseek-r1:7b sample session where a user asks a query and the model gives a response

The screenshot above shows a sample session where we ask, “Who are you?” and DeepSeek-R1 responds with an introduction, confirming that the model is running and ready to assist.
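Ollama also exposes a local REST API (on port 11434 by default), which is handy if you’d rather script prompts than type them interactively. Below is a minimal Python sketch of that approach; it assumes the deepseek-r1:7b model pulled above and uses the requests library (pip install requests):

import requests

# Ask the locally running model a question through Ollama's local REST API.
# Assumes Ollama's default port (11434) and the deepseek-r1:7b model pulled earlier.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:7b",
        "prompt": "Who are you?",
        "stream": False,  # return one complete response instead of a token stream
    },
    timeout=300,
)
response.raise_for_status()
print(response.json()["response"])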

Installing and Configuring AnythingLLM

While interacting through the command line works for basic testing, it can be cumbersome for ongoing use. AnythingLLM provides a chat-style interface that makes conversations with your locally running models more intuitive. It also supports multiple language model backends, offers a straightforward way to upload custom data, and can integrate with vector databases like Milvus. Below is an overview of its features and how to install and connect it to Ollama.

Why Use AnythingLLM?

Some of the features are:

  • Interactive Chat: AnythingLLM replaces manual command-line prompts with a chat window, making it easier to see your conversation flow and reference past queries.

  • Centralized Model Management: You can run different language models under one interface and switch between them without juggling multiple terminals.

  • Data Integration: Built-in support for vector databases (like Milvus) lets you upload documents, research files, or other resources so the model can reference them when answering questions.

  • Easy Embeddings Configuration: Choose which embedding model to use for converting your documents and user queries into vectors, helping the system find relevant information more accurately.

These features provide a smoother workflow for anyone who needs more than basic terminal interactions. Let’s see how we can install and configure AnythingLLM:

Step 1: Download and Install AnythingLLM

Go to the AnythingLLM website and choose the installer that matches your operating system. After the file is downloaded, run it and follow the prompts to complete the setup. Once installed, open AnythingLLM, and you’ll be greeted by a Get Started page or wizard, indicating that the application is ready for its initial configuration.

Step 2: Skipping Initial Prompts

When you first open AnythingLLM, you’ll see prompts for LLM Preference, Data Handling & Privacy, and an optional Survey. For now, choose Skip on each screen so we can demonstrate how to access and configure these options within the app itself. You won’t see these prompts every time you launch AnythingLLM, so learning where to find them later will help you adjust settings or update preferences as needed.

Step 3: Creating Your First Workspace

After skipping the initial prompts, you’ll be asked to create your first workspace.

Choose a name for your workspace and then click the right arrow to proceed. We have named ours Milvus-DeepSeek-Local-Deploy. This workspace provides a dedicated area for organizing conversations and storing any documents you upload. Once created, you’ll see the chat interface:

This is where you’ll interact with DeepSeek-R1 after we connect the model.

Step 4: Connecting DeepSeek-R1 to Your Workspace

To connect your model, click the Settings button next to your workspace name. In the panel that opens, select Chat Settings. You’ll see options to choose a Workspace LLM Provider and set a Workspace Chat model.

Since we’re using Ollama, pick Ollama from the list of providers. Then, click Workspace Chat model and choose any model you’ve downloaded through Ollama, including DeepSeek-R1 (in our case, deepseek-r1:7b).

Figure: Chat Settings Panel Showing Ollama as the LLM Provider and deepseek-r1:7b as the selected model

Once you select a model, save your changes. Your workspace is now connected to the chosen model, letting you interact with it directly through the chat interface. Go back to your chat interface to start chatting.

Step 5: Testing the Model with a Query

Now that your workspace is connected to DeepSeek-R1, you can test how it responds to questions it hasn’t been trained on. Try asking, “Who is Chris Churilo?” and note that the model doesn’t provide a clear answer.

Figure: LLM’s response to “Who is Chris Churilo?”

Because the model lacks information about Chris Churilo, it can’t give a detailed response. This gap illustrates the need for an external knowledge source. Milvus, a vector database, can store relevant data about Chris Churilo in the form of embeddings so the LLM can reference it.

Setting Up Milvus and Integrating It with AnythingLLM

Before linking your workspace to external data, you'll need to deploy a vector database that can store embeddings: numerical representations of data, such as words or images, in a continuous vector space that capture their meanings or features so that similar items sit closer together. Let’s see how to install Milvus and then configure AnythingLLM to use it as the vector database, enabling the model to give more informed answers.
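To make the idea concrete, here is a toy Python sketch (the vectors are invented for illustration; real embedders produce hundreds of dimensions) showing how cosine similarity scores semantically related items higher:

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity: close to 1.0 for similar directions, near 0.0 for unrelated ones.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional "embeddings" standing in for real ones.
query = np.array([0.9, 0.1, 0.0])  # e.g., the user's question
doc_a = np.array([0.8, 0.2, 0.1])  # a passage with similar meaning
doc_b = np.array([0.0, 0.3, 0.9])  # an unrelated passage

print(cosine_similarity(query, doc_a))  # higher score: retrieved first
print(cosine_similarity(query, doc_b))  # lower score: less relevant

A vector database like Milvus performs this kind of similarity comparison at scale, using indexes so the nearest vectors can be found without scanning every stored embedding.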

Installing Milvus

First, ensure that Docker and Docker Compose are installed on your machine. Then, open your terminal and download the Milvus standalone Docker Compose file by running:


wget https://github.com/milvus-io/milvus/releases/download/v2.5.4/milvus-standalone-docker-compose.yml -O docker-compose.yml


Next, open the downloaded docker-compose.yml file in your preferred text editor. Locate the configuration settings under the standalone service and update the COMMON_USER and COMMON_PASSWORD fields with your desired credentials.


environment:
  ETCD_ENDPOINTS: etcd:2379
  MINIO_ADDRESS: minio:9000
  COMMON_USER: milvus
  COMMON_PASSWORD: milvus


We will use these credentials later when configuring AnythingLLM to connect to Milvus. After saving your changes, return to the terminal and start Milvus by executing the following command:


docker-compose up -d


This command launches Milvus in the background. You can verify that Milvus is running by checking your Docker containers with docker ps or by visiting its health endpoint if configured.

Figure: Starting Milvus using docker compose on the command line
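For a quick programmatic check, a small sketch like the following should print a 200 status when Milvus is healthy; it assumes the standalone image’s default health port (9091) is exposed, as in the stock Docker Compose file:

import requests

# Milvus standalone exposes a health endpoint on port 9091 by default.
# Adjust the URL if your docker-compose.yml maps a different port.
response = requests.get("http://localhost:9091/healthz", timeout=5)
print(response.status_code, response.text)  # expect: 200 OK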

Now that Milvus is installed, we’re ready to integrate it with AnythingLLM, allowing your workspace to leverage this external data source for improved model responses.

Integrating Milvus with AnythingLLM

Now that Milvus is running, let’s configure AnythingLLM to store and retrieve embeddings for your custom data. Follow the steps below to enable more context-driven responses from your DeepSeek-R1 model.

1. Open the Settings

Click the Open Settings button at the bottom-left corner of the AnythingLLM interface. This will bring up a panel with several configuration categories.

Figure: AnythingLLM main interface with the Open Settings button highlighted

2. Select the Vector Database

Within the settings panel, expand AI Providers and select Vector Database. Under Vector Database Provider, choose Milvus. Fields will appear for the address and credentials you configured in the Milvus Docker Compose file:

  • Milvus DB Address: http://localhost:19530

  • Milvus Username: (the username you set)

  • Milvus Password: (the password you set)

Click Save to confirm your settings.

Figure: Vector Database settings in AnythingLLM with Milvus selected
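If you want to sanity-check the address and credentials outside the UI, a quick pymilvus sketch works (pip install pymilvus). The token takes the form username:password, using the values from your docker-compose.yml; note that Milvus only enforces these credentials if authentication is enabled in your deployment:

from pymilvus import MilvusClient

# Connect with the same address and credentials entered in AnythingLLM.
# The token format is "username:password" (here, the docker-compose.yml defaults).
client = MilvusClient(uri="http://localhost:19530", token="milvus:milvus")
print(client.list_collections())  # succeeds only if the connection is accepted
client.close()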

3. Choose an Embedder (Optional)

Still under AI Providers, you can pick a different embedding model if desired. If no selection is made, AnythingLLM uses its default embedder. This step is optional for basic usage but can be useful if you want to optimize how text is vectorized. In this case, we’ll use the default option.

Figure: Embedding model selection in AnythingLLM

4. Upload Documents

Return to the main interface or your workspace view. Upload any documents you want the model to reference by dragging and dropping them, or by selecting them through the interface. AnythingLLM will list your files, ready for embedding.

Figure: Document upload area showing newly added files

5. Move Documents to the Workspace

Once your documents appear, select each file and choose Move to Workspace. In the right panel, click Save and Embed. AnythingLLM converts the documents into vector embeddings and stores them in Milvus, making the content accessible to DeepSeek-R1.

Figure: Moving documents to a workspace and embedding them

After completing these steps, your DeepSeek-R1 model can reference the embedded documents. For instance, if you now ask, “Who is Chris Churilo?”, the model will know who she is, as you can see below:

Figure: DeepSeek-R1 reasoning about how to answer the query “Who is Chris Churilo?” using Milvus

The model does this by looking up the relevant information stored in Milvus, providing a more detailed response than before. Here is the final answer to our query.

Figure: DeepSeek-R1 answering the query “Who is Chris Churilo?” using Milvus

The model now responds with a detailed answer that cites the uploaded document. In the screenshot above, you can see how the model pulls relevant details about Chris Churilo’s role, background, and contributions, information that wasn’t available before integrating Milvus. This enhanced response demonstrates the value of combining a local LLM deployment with a vector database to create a more informed and context-aware AI experience.
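Under the hood, this retrieval step is a vector similarity search against your workspace’s collection. Here is a rough sketch of what AnythingLLM does for you, assuming you kept its default embedder (which is based on all-MiniLM-L6-v2 and produces 384-dimensional vectors) and using the workspace collection name from this walkthrough:

from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

# Embed the question with the same kind of model the workspace uses, so the
# query vector's dimension matches the vectors stored in the collection.
model = SentenceTransformer("all-MiniLM-L6-v2")
query_vector = model.encode("Who is Chris Churilo?").tolist()

client = MilvusClient(uri="http://localhost:19530")
results = client.search(
    collection_name="Anythingllm_milvus_deepseek_local_deploy",  # your workspace's collection
    data=[query_vector],
    limit=3,              # the three most similar document chunks
    output_fields=["*"],  # return the stored fields alongside each match
)
for hit in results[0]:
    print(hit["distance"], hit["entity"])
client.close()

The retrieved chunks are then placed into the prompt as context, which is what lets DeepSeek-R1 answer with specifics it was never trained on.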

Verifying Collections in Milvus

Once you’ve embedded documents through AnythingLLM, you can confirm that Milvus is storing them correctly by listing your collections. If you haven’t already installed pymilvus, start by installing it with this command:


pip install pymilvus


Then run the following Python code to connect to Milvus, retrieve a list of collections, and print their names:


from pymilvus import MilvusClient

# Connect to Milvus
client = MilvusClient(uri="http://localhost:19530")  # Replace with your URI if it's different

# List all collections
collections = client.list_collections()

# Print results
print(f"Found {len(collections)} collections:")
for idx, collection in enumerate(collections, 1):
    print(f"{idx}. {collection}")

# Optional: Close the connection when done
client.close()


If you’ve just installed Milvus, you won’t see many collections. However, after using AnythingLLM to embed documents, you’ll notice a new collection named after the workspace you used. In the example output below, the relevant collection is listed as 21. Anythingllm_milvus_deepseek_local_deploy.

This confirms that AnythingLLM has successfully created a dedicated collection in Milvus to store the embeddings for your uploaded data.
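You can also inspect the collection itself to confirm the embeddings landed, for example by checking its schema and row count (a small sketch using the same client; the collection name comes from the listing above):

from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")
name = "Anythingllm_milvus_deepseek_local_deploy"  # collection created by AnythingLLM
print(client.describe_collection(name))   # schema: fields, vector dimension, etc.
print(client.get_collection_stats(name))  # e.g., {'row_count': ...}
client.close()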

We’ve now walked through deploying DeepSeek-R1 locally, configuring AnythingLLM for a user-friendly interface, and integrating Milvus as a vector database. This setup allows the model to access custom data and answer questions more effectively.

Conclusion

Setting up DeepSeek-R1 locally with the help of Ollama, AnythingLLM, and Milvus opens up new possibilities for customizing and enhancing your AI workflows. This approach not only gives you full control over your environment but also allows your model to access specific data sources, improving its relevance and accuracy. With this setup, you're no longer limited by server availability or generic responses—your AI can now work directly with the information you provide, tailored to your needs.
