Introduction
What's up, everyone? This is Dev. This blog breaks down the abstract of DeepSeek-R1's research paper. In case you have not heard about DeepSeek, check out the following video to know more about it.
A Brief Introduction to DeepSeek-R1
DeepSeek is a Chinese AI company that develops open-source large language models. They recently released their first-generation reasoning models: DeepSeek-R1-Zero and DeepSeek-R1. The latter model took the internet by storm, and everyone is (or was) talking about it. The major reason is that DeepSeek-R1 achieves performance comparable to OpenAI's o1 model on reasoning tasks.
Understanding the Abstract
The abstract starts by introducing the models. It then describes how DeepSeek-R1-Zero was trained: via large-scale reinforcement learning, without supervised fine-tuning as a preliminary step.
Large Scale Reinforcement Learning
- Reinforcement Learning (RL) refers to learning through trial and error.
- The model gradually improves based on the feedback (rewards) it receives.
- When RL is done at a large scale, it is called 'Large Scale Reinforcement Learning'.
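The trial-and-error idea can be seen in a toy example. The sketch below is a simple multi-armed bandit, not DeepSeek's actual RL setup: an agent repeatedly picks an action, receives a noisy reward, and gradually shifts toward whichever action earns the most.

```python
import random

def run_bandit(true_rewards, steps=2000, epsilon=0.1, seed=0):
    """Trial-and-error learning: estimate each action's reward from feedback."""
    rng = random.Random(seed)
    counts = [0] * len(true_rewards)
    values = [0.0] * len(true_rewards)   # estimated reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            # explore: try a random action
            action = rng.randrange(len(true_rewards))
        else:
            # exploit: pick the action that has worked best so far
            action = max(range(len(true_rewards)), key=lambda a: values[a])
        reward = true_rewards[action] + rng.gauss(0, 0.1)  # noisy feedback
        counts[action] += 1
        # running average of observed rewards for this action
        values[action] += (reward - values[action]) / counts[action]
    return values

values = run_bandit([0.2, 0.8, 0.5])
print(max(range(3), key=lambda a: values[a]))  # the agent settles on action 1
```

The model is never told which action is correct; it discovers it purely from reward feedback, which is the core idea behind RL-based training.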
Supervised Fine-Tuning
- When a model is taught using examples paired with correct answers, it is called 'Supervised Fine-Tuning' (SFT).
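In contrast to RL, supervised learning hands the model the correct answer for every example. Here is a minimal sketch of that idea using a one-parameter linear model (real SFT does the same thing with a neural network and text data):

```python
def supervised_fit(pairs, lr=0.1, epochs=100):
    """Fit a model to (input, correct-answer) pairs via gradient descent."""
    w = 0.0  # model parameter, starts untrained
    for _ in range(epochs):
        for x, y in pairs:               # each labeled example
            pred = w * x
            grad = 2 * (pred - y) * x    # gradient of squared error
            w -= lr * grad               # nudge toward the correct answer
    return w

# The examples teach the rule y = 3x; the model recovers w close to 3.
w = supervised_fit([(1, 3), (2, 6), (3, 9)])
print(round(w, 2))  # → 3.0
```

The key difference from RL: here the "teacher" supplies exact targets, whereas in RL the model only gets a reward signal for its own attempts.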
In general, training an AI model involves SFT first, followed by large-scale RL. However, as per the abstract, DeepSeek-R1-Zero was trained directly through large-scale RL, skipping the SFT step.
Moreover, the abstract claims that through RL, DeepSeek-R1-Zero emerged with numerous powerful and intriguing reasoning behaviors. Nonetheless, it encountered challenges such as poor readability and language mixing.
To address these issues, they introduced DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL.
Multi-Stage Training
- Instead of training the model all at once, it is trained in separate steps or stages.
- Each stage improves the model and prepares it for the next one.
Cold-Start Data
- 'Cold-Start' means that before the model learns through RL, it is given some starting knowledge: a small set of curated examples to fine-tune on first.
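Putting the two ideas together, the pipeline order can be sketched as below. This is just my reading of the abstract, not the paper's actual code; the stage functions are hypothetical stand-ins that simply record what happened to the model.

```python
def cold_start_sft(model, seed_examples):
    # Stage 1 (hypothetical): fine-tune on a small curated set
    # so RL does not start from a blank slate.
    return model + ["sft:" + ex for ex in seed_examples]

def reinforcement_learning(model, prompts):
    # Stage 2 (hypothetical): improve reasoning via reward feedback.
    return model + ["rl:" + p for p in prompts]

def train_pipeline(seed_examples, prompts):
    model = []                                    # untrained model (toy state)
    model = cold_start_sft(model, seed_examples)  # cold start comes first
    model = reinforcement_learning(model, prompts)
    return model

model = train_pipeline(["example1"], ["prompt1"])
print(model)  # each stage's contribution is recorded in order
```

The point is simply the ordering: cold-start data is consumed before RL begins, and each stage hands its output to the next.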
Following this methodology, DeepSeek-R1 demonstrated performance comparable to the OpenAI-o1-1217 model on reasoning tasks.
To add to this, to support the research community, DeepSeek-R1-Zero and DeepSeek-R1 are open-sourced. Along with them, they released six dense models distilled from DeepSeek-R1, based on Qwen and Llama: 1.5B, 7B, 8B, 14B, 32B, and 70B.
Distilled Model
- A Distilled Model is a smaller, faster and more efficient version of a larger AI Model.
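A toy sketch of the idea (not DeepSeek's actual distillation recipe): a small "student" model is trained to imitate a large "teacher" model's outputs, instead of learning from raw labels.

```python
def teacher(x):
    # Stand-in for a large, already-trained model:
    # it "knows" the target function 2x + 1.
    return 2 * x + 1

def distill(student_w, student_b, xs, lr=0.05, epochs=200):
    """Train a small student model to imitate the teacher's outputs."""
    for _ in range(epochs):
        for x in xs:
            pred = student_w * x + student_b
            target = teacher(x)             # the teacher provides the target
            err = pred - target
            student_w -= lr * 2 * err * x   # gradient steps toward imitation
            student_b -= lr * 2 * err
    return student_w, student_b

w, b = distill(0.0, 0.0, [0, 1, 2, 3])
print(round(w, 2), round(b, 2))  # student ends up matching the teacher
```

The student ends up behaving like the teacher while being much smaller, which is why the distilled 1.5B–70B models can retain much of DeepSeek-R1's reasoning ability at a fraction of the cost.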
In the above-mentioned video, I ran a distilled model of DeepSeek-R1 using Ollama and prompted it with a complex programming question.
Conclusion
Thank you for reading the blog. Here is DeepSeek-R1's research paper in case you want to check it out.
Moreover, I am also working on a website that lists the AI-model terms I come across while reading research papers, along with descriptions as I understand them. If you are also interested in learning AI through books and research papers, this can be helpful.