Overview
DeepSeek-R1 is a state-of-the-art large language model developed by the Chinese AI startup DeepSeek. With 671 billion parameters, it matches the performance of leading reasoning models such as OpenAI's o1, excelling in tasks such as mathematics, coding, and complex reasoning.
The model was trained using 2,048 NVIDIA H800 GPUs over approximately two months, highlighting its substantial computational demands.
Given its size, deploying DeepSeek-R1 requires significant hardware resources. The table below outlines the hardware requirements for DeepSeek-R1 and its distilled variants:
Hardware Requirements for DeepSeek-R1
Model Variant | Parameters (B) | VRAM Requirement (GB) | Recommended GPU Configuration |
---|---|---|---|
DeepSeek-R1 | 671 | ~1,342 | Multi-GPU setup (e.g., NVIDIA A100 80GB ×16) |
DeepSeek-R1-Distill-Qwen-1.5B | 1.5 | ~0.7 | NVIDIA RTX 3060 12GB or higher |
DeepSeek-R1-Distill-Qwen-7B | 7 | ~3.3 | NVIDIA RTX 3070 8GB or higher |
DeepSeek-R1-Distill-Llama-8B | 8 | ~3.7 | NVIDIA RTX 3070 8GB or higher |
DeepSeek-R1-Distill-Qwen-14B | 14 | ~6.5 | NVIDIA RTX 3080 10GB or higher |
DeepSeek-R1-Distill-Qwen-32B | 32 | ~14.9 | NVIDIA RTX 4090 24GB |
DeepSeek-R1-Distill-Llama-70B | 70 | ~32.7 | NVIDIA RTX 4090 24GB ×2 |
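The figures above can be roughly reproduced from parameter count and numeric precision. The sketch below is a back-of-the-envelope estimate of weight memory only; the assumption that the full model is stored in FP16 and that the distilled rows correspond to roughly 4-bit quantized weights is mine, not something the table states, and the table's distilled figures come out slightly lower than this simple formula.

```python
def weight_vram_gb(params_billion: float, bits_per_param: float) -> float:
    """Rough VRAM needed just to hold the weights, in GB.

    Ignores KV cache, activations, and framework overhead, which add more
    on top depending on context length and batch size.
    """
    bytes_per_param = bits_per_param / 8
    return params_billion * bytes_per_param  # 1e9 params * bytes / 1e9 = GB

# Full model in FP16: 671 * 2 bytes ~= 1,342 GB, matching the table.
print(f"DeepSeek-R1 671B @ FP16: {weight_vram_gb(671, 16):,.0f} GB")

# Distilled rows are roughly consistent with ~4-bit quantized weights.
for name, billions in [("Qwen-1.5B", 1.5), ("Qwen-7B", 7), ("Llama-8B", 8),
                       ("Qwen-14B", 14), ("Qwen-32B", 32), ("Llama-70B", 70)]:
    print(f"R1-Distill-{name} @ 4-bit: {weight_vram_gb(billions, 4):.1f} GB")
```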
Key Considerations
🔹 VRAM Usage
- The VRAM requirements above are approximate and vary with quantization, context length, batch size, and other configuration details.
🔹 Distributed GPU Setup
- Deploying the full DeepSeek-R1 671B model requires a multi-GPU setup, as no single GPU can hold its weights (see the sketch after this list).
🔹 Distilled Models for Lower VRAM Usage
- Distilled variants provide optimized performance with reduced computational requirements, making them more suitable for single-GPU setups.
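As one way to picture the multi-GPU case, here is a minimal sketch assuming vLLM as the serving stack (my choice; the article does not name a runtime). Tensor parallelism shards each weight matrix across cards so that no single GPU has to hold the full model; the smaller distilled variants can simply drop that argument and run on one GPU.

```python
from vllm import LLM, SamplingParams

# Load the 70B distilled variant across two GPUs via tensor parallelism.
llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
    tensor_parallel_size=2,   # shard weights across 2 GPUs
    dtype="half",             # FP16 weights: ~140 GB in total, i.e. two 80 GB
                              # cards; matching the table's 2x RTX 4090 24GB
                              # would additionally require a quantized
                              # (e.g. 4-bit) checkpoint
)

params = SamplingParams(temperature=0.6, max_tokens=256)
outputs = llm.generate(["Why do large models need tensor parallelism?"], params)
print(outputs[0].outputs[0].text)
```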
For developers and researchers without access to high-end hardware, these distilled versions offer a more accessible alternative, retaining significant reasoning capabilities while reducing resource consumption.
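To illustrate how accessible the distilled variants are, the sketch below calls a locally running Ollama server and prints the generation speed. Ollama is an assumption on my part; the article does not prescribe a runtime, although the comment below reports Ollama-style metrics.

```python
import requests

# Assumes Ollama is running locally and the model has already been pulled,
# e.g. with: ollama pull deepseek-r1:14b
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "deepseek-r1:14b", "prompt": "tell me a story", "stream": False},
    timeout=600,
)
data = resp.json()

# Ollama reports durations in nanoseconds.
eval_rate = data["eval_count"] / (data["eval_duration"] / 1e9)
print(data["response"][:200], "...")
print(f"generated {data['eval_count']} tokens at {eval_rate:.2f} tokens/s")
```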
Conclusion
Deploying DeepSeek-R1 necessitates substantial computational power, particularly for the full-scale 671B model. However, the availability of distilled variants provides flexibility, making it possible to run capable versions on far more modest hardware.
Comments
I could run the 14B on my RTX 3070. It was okay, but not as fast as the 8B. Definitely recommend downloading both; sometimes it's worth the time. Thank you for this info. Here are the 14B metrics if anyone is interested.
prompt:
tell me a story
metrics:
total duration: 1m10.9480334s
load duration: 20.8813ms
prompt eval count: 26 token(s)
prompt eval duration: 423ms
prompt eval rate: 61.47 tokens/s
eval count: 521 token(s)
eval duration: 1m10.173s
eval rate: 7.42 tokens/s
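For anyone checking these numbers, both rates follow directly from the counts and durations above (the format matches what Ollama prints with --verbose, though the commenter does not say which runtime was used):

```python
# Reproduce the reported rates from the counts and durations above.
prompt_tokens, prompt_seconds = 26, 0.423
gen_tokens, gen_seconds = 521, 70.173

print(f"prompt eval rate: {prompt_tokens / prompt_seconds:.2f} tokens/s")  # ~61.47
print(f"eval rate:        {gen_tokens / gen_seconds:.2f} tokens/s")        # ~7.42
```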