Ďēv Šhãh 🥑

Decoding DeepSeek R1's Research Abstract

Introduction

What's up, everyone? This is Dev. In this blog, we will break down the abstract of DeepSeek-R1's research paper. In case you have not heard about DeepSeek, check out the following video to learn more about it.

Brief about DeepSeek R1

DeepSeek is a Chinese AI company that develops open-source large language models. They recently released their first-generation reasoning models: DeepSeek-R1-Zero and DeepSeek-R1. The latter model took the internet by storm, and everyone is/was talking about it. The major reason is that DeepSeek-R1 achieved performance comparable to OpenAI's o1 model on reasoning tasks.

Understanding the Abstract

The abstract starts by introducing the models. It then describes how DeepSeek-R1-Zero was trained: via large-scale reinforcement learning, without supervised fine-tuning as a preliminary step.

Large Scale Reinforcement Learning

  • Reinforcement Learning (RL) refers to learning in which a model improves through trial and error.
  • The model gradually gets better based on the feedback (reward) it receives.
  • When a model undergoes RL at a large scale, this is called 'Large Scale Reinforcement Learning'.
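To make the trial-and-error idea concrete, here is a tiny, hypothetical toy loop (nothing like DeepSeek's actual training setup) where a "model" learns which of three actions earns the most reward, using only feedback:

```python
import random

random.seed(0)

# Hidden reward for each action -- the model does not know these values.
rewards = {0: 0.1, 1: 0.9, 2: 0.3}
values = {a: 0.0 for a in rewards}   # the model's current estimates
counts = {a: 0 for a in rewards}

for step in range(500):
    # Explore occasionally; otherwise exploit the best-known action.
    if random.random() < 0.1:
        action = random.choice(list(rewards))
    else:
        action = max(values, key=values.get)
    reward = rewards[action]              # feedback from the environment
    counts[action] += 1
    # Update the estimate as a running average of observed rewards.
    values[action] += (reward - values[action]) / counts[action]

best = max(values, key=values.get)
print(best)  # the model has learned that action 1 pays best
```

No examples or correct answers were given here; the model discovered the best action purely from reward feedback, which is the core idea behind RL.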

Supervised Fine-Tuning

  • When a model is taught using examples paired with correct answers, this is called 'Supervised Fine-Tuning' (SFT).
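In contrast to RL, supervised learning gives the model the correct answer for each input. A minimal sketch (a hypothetical one-parameter "model", not an actual language model) looks like this:

```python
# Toy supervised fine-tuning: the model adjusts its single weight w
# to match known correct answers (here, the hidden rule is y = 2x).
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (input, correct answer)

w = 0.0      # model parameter, starts untrained
lr = 0.01    # learning rate
for epoch in range(200):
    for x, y_true in examples:
        y_pred = w * x
        error = y_pred - y_true
        w -= lr * error * x   # gradient step toward the correct answer

print(round(w, 2))  # ≈ 2.0: the model learned the mapping from examples
```

The key difference from the RL loop above: here every training step is supervised by a known correct answer, rather than by a reward signal alone.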

In general, when training an AI model, the usual order is SFT first, followed by large-scale RL. However, as per the abstract, DeepSeek-R1-Zero was trained directly through large-scale RL, skipping the SFT step.

Moreover, the abstract claims that through RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and intriguing reasoning behaviors. Nonetheless, it encountered challenges such as poor readability and language mixing.

To address these issues, they introduced DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL.

Multi-Stage Training

  • Instead of training the model all at once, it is trained in different steps or stages.
  • Each stage improves the model and prepares it for the next one.

Cold-Start Data

  • 'Cold-start' means that, before the model learns through RL, it is given some starting knowledge from a small set of initial training data.
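Putting the two ideas together, here is a hypothetical two-stage sketch (toy numbers, not DeepSeek's actual pipeline): a cold-start phase gives a one-parameter "model" starting knowledge from labeled examples, then an RL-style phase refines it using only reward feedback.

```python
w = 0.0  # the "model": a single parameter

# Stage 1: cold start -- fit a few known (input, answer) pairs,
# so RL does not have to start from scratch.
for _ in range(100):
    for x, y in [(1.0, 2.0), (2.0, 4.0)]:
        w -= 0.01 * (w * x - y) * x          # supervised gradient step

# Stage 2: RL-style refinement -- nudge w toward higher reward.
# The reward peaks at a (hidden) target of w = 2.5.
def reward(w):
    return -(w - 2.5) ** 2

for _ in range(200):
    # Trial and error: try small perturbations, keep the better direction.
    if reward(w + 0.01) > reward(w - 0.01):
        w += 0.01
    else:
        w -= 0.01

print(round(w, 1))  # ≈ 2.5: cold start got close, RL finished the job
```

Each stage starts from the parameters the previous stage produced, which is the essence of multi-stage training.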

Following this methodology, DeepSeek-R1 demonstrated performance comparable to the OpenAI-o1-1217 model on reasoning tasks.

In addition, to support the research community, DeepSeek-R1-Zero and DeepSeek-R1 have been open-sourced. Along with them, they released six dense models distilled from DeepSeek-R1, based on Qwen and Llama: 1.5B, 7B, 8B, 14B, 32B and 70B.

Distilled Model

  • A distilled model is a smaller, faster and more efficient version of a larger AI model, trained to mimic the larger model's behavior.
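A minimal sketch of the distillation idea (hypothetical toy models, not the actual Qwen/Llama distillation): a small "student" is trained to imitate a larger "teacher" model's outputs instead of learning from raw labeled data.

```python
# Toy knowledge distillation: the teacher's outputs become the
# student's training targets.
def teacher(x):
    # Stands in for the large model (here, just a fixed function).
    return 3.0 * x + 1.0

w, b = 0.0, 0.0   # the student's two parameters
inputs = [0.0, 1.0, 2.0, 3.0]

for _ in range(2000):
    for x in inputs:
        target = teacher(x)          # the teacher's output is the label
        pred = w * x + b
        err = pred - target
        w -= 0.01 * err * x          # gradient steps to mimic the teacher
        b -= 0.01 * err

print(round(w, 2), round(b, 2))  # ≈ 3.0 1.0: student matches the teacher
```

The student ends up reproducing the teacher's behavior with far fewer resources, which is why the distilled 1.5B–70B models can run on much smaller hardware than full DeepSeek-R1.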

In the video mentioned above, I ran a distilled model of DeepSeek-R1 using Ollama and prompted it with a complex programming question.

Conclusion

Thank you for reading the blog. Here is DeepSeek-R1's research paper in case you want to check it out.

Moreover, I am also working on a website that lists the terms related to AI models that I come across while reading research papers, along with their descriptions as I understand them. If you are also interested in learning AI through books and research papers, it may be helpful.
