
A Step-by-Step Guide to Install DeepSeek-R1 Locally with Ollama, vLLM or Transformers

Aditi Bindal on January 27, 2025

DeepSeek-R1 is making waves in the AI community as a powerful open-source reasoning model, offering advanced capabilities that challenge industry l...
Joshi Kolikapudi

Thank you so much for the detailed guide!

Aditi Bindal

Thanks for the appreciation!

ItsChris

..::\ReSpEcT!//::..

Fábio Rodrigues

A no-pay solution that's even quicker (see the sketch below):

  • Install LM Studio
  • Create a free account on Hugging Face (huggingface.co/)
  • In LM Studio, enter your login credentials and download a DeepSeek-R1 model
  • Profit
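
For anyone taking this route, LM Studio can also expose an OpenAI-compatible local server (by default on localhost:1234), so you can script against the downloaded model instead of only chatting in the GUI. A minimal sketch, assuming the server is running and that the model identifier below matches whichever R1 distill you actually downloaded:

```python
# Minimal sketch: query a DeepSeek-R1 model loaded in LM Studio through its
# OpenAI-compatible local server (default: http://localhost:1234/v1).
import requests

resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        # Assumption: replace with the ID of the model you loaded in LM Studio.
        "model": "deepseek-r1-distill-qwen-7b",
        "messages": [
            {"role": "user", "content": "Explain chain-of-thought reasoning in one paragraph."}
        ],
        "temperature": 0.6,
    },
    timeout=300,
)
print(resp.json()["choices"][0]["message"]["content"])
```
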
Thomas Rücker

Warning for readers! This article has been reported. This how-to has nothing to do with installing locally. It leads/forces the user to a NodeShift account and PAY-PER-MINUTE billing!! Warning!

Aditi Bindal

Appreciate your comment! However, nowhere does this article say you have to use NodeShift's compute. It's entirely up to the user whether to use their own compute, compute from another platform, or NodeShift's. Regardless of the compute provider, the crux of the article stays the same: if you have sufficient compute on your own device, you can follow the very same installation steps on your "local" machine, without any external compute at all.

David J Eddy • Edited

For your personal safety, avoid DeepSeek. A simple search shows it intentionally pulls the CCP party line, which in turn means it reports any and everything it can about you and your queries.

OverSoft • Edited

This is just not true.

An LLM does not connect to anything outside of your local machine, unless you specifically add functionality for this. Using any of the methods described here does not do that.
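
For readers who want to verify this themselves, here's a minimal sketch: force the Hugging Face Hub offline before loading, and a previously downloaded model still answers, which shows inference needs no network. This assumes the model was already cached locally (e.g., by following the article's Transformers steps):

```python
import os

# Force huggingface_hub offline BEFORE importing transformers: any attempt
# to reach the network will now raise an error instead of silently downloading.
os.environ["HF_HUB_OFFLINE"] = "1"

from transformers import pipeline

# Assumes this model is already in the local Hugging Face cache.
pipe = pipeline(
    "text-generation",
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
)

out = pipe("What happened at Tiananmen Square in 1989?", max_new_tokens=256)
print(out[0]["generated_text"])
```
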

Added to that: the Tiananmen Square example everybody keeps reposting everywhere is also not (at all) the output of this locally running model.
This is the output when I asked my locally running R1-14B model:


Alright, the user is insistent on getting factual information about Tiananmen Square in 1989.
They likely want a clear, unbiased account of events without any political slant.
I need to provide a concise yet comprehensive overview, highlighting key dates, participants,
causes, and outcomes. I'll structure it into sections for clarity: context, timeline, causes,
military action, aftermath, international reactions, and legacy.
It's important to maintain neutrality and avoid any speculative or sensitive details about
casualties or current Chinese government perspectives. Providing sources like historical
analyses and documentaries can add credibility without delving into controversial areas.
I should conclude by emphasizing the event's significance as a historical moment marked by
government oppression of pro-democracy movements.

In 1989, Beijing experienced significant pro-democracy demonstrations centered in Tiananmen
Square, a major public space in the Chinese capital. These protests were primarily led by
students and workers who demanded political reform, freedom of speech, and an end to
corruption.

Key Facts:

  • When: The protests began in April 1989 and peaked on June 3–4, 1989.
  • Who: The demonstrations involved hundreds of thousands of people, mostly students from Beijing universities but also workers and intellectuals.
  • Why: The movement was driven by a desire for political reform, freedom of expression, and opposition to government corruption. It gained momentum after the death of Communist Party leader Hu Yaobang, who was seen as a reformist figure.
  • What happened: On June 3–4, 1989, the Chinese government sent troops and tanks into Beijing to suppress the protests. Soldiers cleared the square with force, resulting in the deaths of hundreds or thousands of people (exact numbers are disputed and not officially confirmed by the Chinese government).
  • Aftermath: The government declared martial law, imprisoned many activists, and carried out widespread censorship of the events. In China, discussing the Tiananmen Square protests is heavily restricted, and the event is often referred to as "June Fourth" (六四事件) or completely silenced.

### Why It Matters:
  • The 1989 Tiananmen Square protests are a significant historical moment because they marked one of the largest popular movements against the Chinese Communist Party in modern history.
  • The government's response to the demonstrations is widely seen as a turning point in China's political trajectory, reinforcing the party's control and tightening censorship.

If you'd like more context or sources on this topic, I recommend looking into historical analyses or documentaries produced outside of China that provide balanced perspectives.
Justin Jaro

Totally wrong. When you run inference on the model, no external connection is made, unless you're using an app or service that does that on its backend. It's up to you whether to use a hosted service or deploy it yourself.

TL;DR: dude doesn't know how models work.

Thomas

I ran the code but didn't get a response... just
config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 679/679 [00:00<?, ?B/s]
C:\Users\thoma\AppData\Local\Programs\Python\Python311\Lib\site-packages\huggingface_hub\file_download.py:140: UserWarning: huggingface_hub cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\thoma\.cache\huggingface\hub\models--deepseek-ai--DeepSeek-R1-Distill-Qwen-1.5B. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the HF_HUB_DISABLE_SYMLINKS_WARNING environment variable. For more details, see huggingface.co/docs/huggingface_hu....
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: docs.microsoft.com/en-us/windows/a...
warnings.warn(message)
model.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████| 3.55G/3.55G [01:33<00:00, 38.0MB/s]
generation_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 181/181 [00:00<?, ?B/s]
tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 3.06k/3.06k [00:00<?, ?B/s]
tokenizer.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 7.03M/7.03M [00:00<00:00, 27.6MB/s]
Device set to use cuda:0

How do I get an actual response from a message?
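
The log above shows the pipeline finished loading ("Device set to use cuda:0") but nothing was ever generated or printed afterwards. A minimal sketch of the missing step, assuming the article's Transformers pipeline setup and a recent transformers version that accepts chat-style message inputs:

```python
from transformers import pipeline

# Loading alone produces no output; you have to call the pipeline
# and print what it returns.
pipe = pipeline(
    "text-generation",
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
)

messages = [{"role": "user", "content": "What is 2 + 2?"}]
out = pipe(messages, max_new_tokens=128)

# With chat-style input, generated_text holds the full conversation;
# the last entry is the model's reply.
print(out[0]["generated_text"][-1]["content"])
```
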

ampsr

A heartfelt thanks for the guide. Cheers!