Over the past two weeks, a new contender in the AI revolution has seemingly come out of nowhere: DeepSeek, with its V3 and R1 models, two LLMs that rival OpenAI's. R1 was built in roughly six weeks for a reported $5.6 million, by a Chinese company no less. It's not only as good as, if not better than, GPT-4o and o1, but it's also free and open source, which lets us developers run genuinely powerful LLMs locally and offline for the first time. This blog post is a deep dive into what exactly DeepSeek is, why you should care in the first place, how they pulled it off, and most importantly, how you can take advantage of it to build your next million-dollar tech startup.
What is DeepSeek?
DeepSeek is a Chinese AI company specializing in open-source large language models, founded just 18 months ago, in July 2023. DeepSeek R1 isn't their first LLM, but it is their first reasoning model, comparable to OpenAI's o1.
Why should I care?
- It's just as good as, if not better than, OpenAI's models
- It's completely free to use on their official website
- DeepSeek's API is over 96% cheaper than OpenAI's (comparing R1's input-token price on a cache miss with o1's input-token price)
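That 96% figure is easy to verify yourself. As a sketch, here's the back-of-the-envelope math; the prices below are per 1M input tokens as listed at the time of writing, so treat them as assumptions and re-check both pricing pages before relying on them:

```javascript
// Back-of-the-envelope version of the "over 96% cheaper" claim.
// Prices are per 1M input tokens at the time of writing (assumed values).
const r1InputCacheMiss = 0.55; // DeepSeek R1, input tokens (cache miss), USD
const o1Input = 15.0;          // OpenAI o1, input tokens, USD

const savingsPercent = (1 - r1InputCacheMiss / o1Input) * 100;
console.log(savingsPercent.toFixed(1) + "% cheaper"); // "96.3% cheaper"
```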
- It's 100% free and open source, released under the MIT License (Source), which means you can run it locally on your own computer (we'll learn how in this post). You can check it out on GitHub.
How did they pull it off?
There are tons of different ways they did this. I've chosen some of the most important to highlight. Note that these are highly, highly over-simplified. If you want a more complex deep-dive into how these work, check out the sources.
1. Heavy Low-Level Optimization
Due to US Government restrictions on selling high-end chips to China, DeepSeek, a Chinese company, didn't have access to the most powerful NVIDIA cards (e.g., NVIDIA H100s) to train their models on, which meant they had to figure out how to squeeze maximum performance out of the chips they already had (NVIDIA H800s). To summarize, they used Mixture-of-Experts (MoE) and Multi-head Latent Attention (MLA) techniques to maximize GPU performance. (Source)
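The MoE idea can be sketched in a few lines of JavaScript. This is a toy illustration, not DeepSeek's actual implementation: a router scores every expert for each token, and only the top-k highest-scoring experts actually run, so most of the network sits idle on any given token.

```javascript
// Toy sketch of Mixture-of-Experts routing (illustrative only):
// the router scores each expert, and only the top-k experts process the token.
function topKExperts(scores, k) {
  return scores
    .map((score, expertId) => ({ expertId, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((e) => e.expertId);
}

// 8 experts, but only the 2 best-scoring ones handle this token
const routerScores = [0.1, 0.7, 0.05, 0.9, 0.2, 0.3, 0.15, 0.4];
const active = topKExperts(routerScores, 2);
console.log(active); // [3, 1] -- the other 6 experts stay idle, saving compute
```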
2. Only Train what's Necessary
Typically, training part of an AI model meant updating the whole thing, even if some parts didn't contribute anything, which led to a massive waste of resources. To solve this, they introduced Auxiliary-Loss-Free Load Balancing, which works by introducing a bias factor to prevent overloading one chip while under-utilizing another (Source). This resulted in only 5% of the model's parameters being trained per token, and a roughly 91% lower training cost than GPT-4 (GPT-4 reportedly cost $63 million to train (Source), while V3 cost $5.576 million (Source)).
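A heavily simplified sketch of the bias idea, in JavaScript: each expert carries a bias term that gets nudged down when the expert is overloaded and up when it's under-used, so routing evens out over time without adding an auxiliary loss function. The update step `gamma` and the load numbers below are hypothetical; see the DeepSeek-V3 report for the real update rule.

```javascript
// Toy sketch of auxiliary-loss-free load balancing (hypothetical values):
// overloaded experts get their routing bias nudged down, under-used ones up.
const gamma = 0.01; // bias update step size (hypothetical)

function updateBiases(biases, loads, targetLoad) {
  return biases.map((b, i) =>
    loads[i] > targetLoad ? b - gamma : b + gamma
  );
}

let biases = [0, 0, 0, 0];
const loads = [120, 40, 60, 80]; // tokens routed to each expert; target is 75
biases = updateBiases(biases, loads, 75);
console.log(biases); // [-0.01, 0.01, 0.01, -0.01]
```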
3. Compression
Under the hood, the model caches lots of key-value pairs for attention. Storing all of them at full size would eat up a ton of memory. To fix this, DeepSeek uses Low-Rank Key-Value (KV) Joint Compression: the key-value pairs are compressed with a down-projection matrix, and this compressed version is what's stored. When the data is needed, the projection is reversed to recover an approximation of the original values, shrinking the cache and reducing memory usage at the cost of a small reconstruction step. (Source)
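Here's a tiny numeric sketch of the down-project / up-project idea (the matrices are made up for illustration; in the real model they are learned, and the reconstruction is only approximate):

```javascript
// Toy sketch of low-rank KV compression: cache a small "latent" vector
// instead of the full key/value vector, and reconstruct it on demand.
function matVec(matrix, vec) {
  return matrix.map((row) => row.reduce((sum, w, j) => sum + w * vec[j], 0));
}

// Hypothetical tiny example: compress a 4-dim vector down to 2 dims
const down = [[1, 0, 0, 0], [0, 1, 0, 0]];           // down-projection (learned in practice)
const up   = [[1, 0], [0, 1], [0, 0], [0, 0]];       // up-projection (learned in practice)

const kv = [4, 3, 0, 0];
const compressed = matVec(down, kv);       // [4, 3] -- this is what gets cached
const restored   = matVec(up, compressed); // [4, 3, 0, 0]
console.log(compressed.length, "numbers cached instead of", kv.length);
```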
4. Reinforcement Learning
Part of the way the model was trained is a lot like how you would train a dog.
- The model was given complex, yet easy-to-validate questions to answer.
- If it answers correctly, it's "rewarded," reinforcing those patterns
- If it answers incorrectly, it adjusts itself to improve on future attempts
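The steps above can be sketched as a toy reward function (this is not DeepSeek's training code, just an illustration of why "easy to validate" matters): because the answer can be checked mechanically, a simple checker can assign the reward with no human in the loop.

```javascript
// Toy sketch of the reward signal: easy-to-validate questions let a
// mechanical checker score the model's answer.
function reward(expectedAnswer, modelAnswer) {
  return modelAnswer === expectedAnswer ? 1 : -1; // +1 reinforces, -1 triggers adjustment
}

console.log(reward(12, 12)); // 1  -> reinforce this reasoning pattern
console.log(reward(12, 13)); // -1 -> adjust for future attempts
```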
The Result:
| Training Costs | Pre-Training | Context Extension | Post-Training | Total |
|---|---|---|---|---|
| In H800 GPU Hours | 2664K | 119K | 5K | 2788K |
| In USD | $5.328M | $0.23M | $0.01M | $5.576M |
(Source)
How can I take advantage of it?
DeepSeek's models are pretty easy to get started with. Here's how you can use them:
1. Online
You can use DeepSeek V3 and R1 for free on their official website.
Warning: Anything you enter or receive on the website may be stored and can be viewed by DeepSeek or the Chinese government
2. API
DeepSeek has an official API in case you don't want to self-host the models yourself. It's over 96% cheaper than OpenAI's (comparing R1's input-token price on a cache miss with o1's input-token price).
How to Use
The API itself is pretty straightforward. You can use it with the openai package on NPM or PIP, or by making an HTTP request. For this demo I will be using Node.js, working in an empty folder with an index.js file and a package.json file.
WARNING: NEVER STORE API KEYS ON THE CLIENT-SIDE
- Apply for an API key
- Download the package:

```shell
npm install openai
```

- Make a request wherever you need it:
```javascript
import OpenAI from "openai";

// Point the OpenAI client at DeepSeek's API instead of OpenAI's
const openai = new OpenAI({
  baseURL: "https://api.deepseek.com",
  apiKey: "<DeepSeek API Key>"
});

const completion = await openai.chat.completions.create({
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What is 5 + 7?" }
  ],
  model: "deepseek-chat"
});

console.log(completion.choices[0].message.content);
```
- Run it:

```shell
node index.js
```
Output:
To find the sum of 5 and 7, follow these steps:
Start with the first number:
5
Add the second number to it:
5 + 7
Perform the addition:
5 + 7 = 12
Final Answer: 12
Pretty easy, isn't it?
3. Locally
Moving on to the fun stuff now, ✨ self-hosting ✨. Unfortunately, the full model is around 400 GB. Most people don't have that much storage to dedicate to one model, and hosting it for your startup would be extremely expensive. Luckily, there are distilled models: smaller fine-tuned models that are significantly easier to run. The bigger the model, the smarter but slower it is. Let's first try running DeepSeek on our machine.
- Download Ollama from the Ollama homepage
- Choose the size you want to run locally

Note: V3 only has one size
Note: sizes as of Feb 2nd, 2025
| # of Parameters (in billions) | Size (in GB) | Model |
|---|---|---|
| 1.5B | 1.1GB | deepseek-r1:1.5b |
| 7B | 4.7GB | deepseek-r1:7b |
| 8B | 4.9GB | deepseek-r1:8b |
| 14B | 9GB | deepseek-r1:14b |
| 32B | 20GB | deepseek-r1:32b |
| 70B | 43GB | deepseek-r1:70b |
| 671B | 404GB | deepseek-r1:671b or deepseek-v3 |
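If you're unsure which size to pull, here's a small helper (hypothetical, just using the sizes from the table above) that picks the largest distilled model that fits in a given amount of free disk space:

```javascript
// Pick the largest model from the table above that fits in freeGB of disk.
// Sizes are taken from the table (as of Feb 2nd, 2025).
const models = [
  { name: "deepseek-r1:1.5b", sizeGB: 1.1 },
  { name: "deepseek-r1:7b",   sizeGB: 4.7 },
  { name: "deepseek-r1:8b",   sizeGB: 4.9 },
  { name: "deepseek-r1:14b",  sizeGB: 9 },
  { name: "deepseek-r1:32b",  sizeGB: 20 },
  { name: "deepseek-r1:70b",  sizeGB: 43 },
  { name: "deepseek-r1:671b", sizeGB: 404 },
];

function largestThatFits(freeGB) {
  const fitting = models.filter((m) => m.sizeGB <= freeGB);
  return fitting.length ? fitting[fitting.length - 1].name : null;
}

console.log(largestThatFits(16)); // "deepseek-r1:14b"
```

Keep in mind that disk space isn't the only constraint; you also need enough RAM (or VRAM) to load the model.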
On your machine
Open your terminal and run `ollama run` followed by the model name from the table above (for example, `ollama run deepseek-r1:7b`). Once it has finished downloading, you'll be dropped into an interactive chat prompt.
Now you can enter any prompt and get an answer. It even works offline!
In a project
Ollama also has a package for NPM and PIP. Note that for this demo, I will be using an empty folder with a package.json and index.js file.
- Download package:

```shell
npm install ollama
```
- Invoke:

```javascript
import ollama from "ollama";

const response = await ollama.chat({
  model: "<MODEL>",
  messages: [
    { role: "user", content: "List 5 foods from Italy. Explain their origins" }
  ]
});

console.log(response.message.content);
```
- Run it:

```shell
node index.js
```
Output:
<think>
Okay, so I need to list five Italian foods and explain their origins. Hmm, let's see... I'm not an expert on Italian cuisine, but I know a few basics. Maybe I can start by thinking about the most famous dishes and then research where they come from.
First, pasta comes to mind. I think pasta has been around for a long time, maybe even before Christ. I've heard that Marco Polo brought it back to Europe, but I'm not
sure if he actually introduced it to Italy or if it was already there. I should probably look into that more. Also, pizza is a big one—everyone knows it's from Italy.
But again, when exactly? I think the word "pizza" comes from "pie," which refers to a flatbread. Maybe bakers in Italy started making something like modern pizza centuries ago.
Lasagna is another dish I can think of. It's a layered pasta with fillings and béchamel sauce. I believe it has roots in Emilia-Romagna, but I'm not sure about the exact origins or when it was first made. Then there's risotto, which is a rice dish cooked with broth and other ingredients. I think it originated in Lombardy, maybe around the 17th century.
Gelato is Italian ice cream, right? I've heard that it has been around since the Middle Ages. Maybe Arab traders introduced sugar to Italy, and then gelato became popular as a sweet treat. So, sugar was key here, and the techniques for making gelato might have come from the Arab world.
Wait, am I missing any other major dishes? Oh, like tiramisu. That's a classic, but I'm not sure about its exact origin either. It seems to be a dessert from the Emilia-Romagna region as well. Maybe it was created in the 1800s or something. But since the user only asked for five foods, maybe that's enough.
Let me try to organize this:
1. Pasta: Originating before Christ, possibly brought to Italy by Marco Polo but already existing there.
2. Pizza: From ancient Rome, a flatbread that evolved into modern pizza in Campania.
3. Lasagna: From Emilia-Romagna, medieval dish with layers of pasta and fillings.
4. Risotto: Originating in Lombardy around the 17th century, using rice and broth.
5. Gelato: Middle Ages, influenced by Arab traders introducing sugar.
I think that's a good list. Each has its own history, but I should double-check some details to ensure accuracy. For example, pasta's history is a bit tricky because it was known before Marco Polo, but his travels popularized it in Europe. Also, gelato's origins might be tied to the invention of sugar, which came through trade with
Arab countries.
I wonder if there are other foods I could have included instead. Maybe something like risotto is more specific, or perhaps arancini, which are stuffed pasta balls, but they're a variant of pasta too. Or maybe minestrone soup, but that's more of a category than a specific dish.
No, the five listed seem to cover a good range from different regions and time periods in Italy. I think that's sufficient for this task.
</think>
Here is a list of five Italian foods along with their origins:
1. **Pasta**: Originating well before Christ, pasta has ancient roots and was likely already present in Italy when Marco Polo returned from his travels, where he popularized its use in Europe.
2. **Pizza**: The word "pizza" refers to a flatbread that traces its origins back to ancient Rome. Modern pizza as we know it evolved in Campania, particularly in areas like Napoli.
3. **Lasagna**: Hailing from the Emilia-Romagna region, lasagna is a medieval dish characterized by its layered structure of pasta interleaved with fillings and covered in béchamel sauce.
4. **Risotto**: Originating in Lombardy during the 17th century, risotto is a rice dish cooked with broth and various ingredients, known for its creamy texture.
5. **Gelato**: This Italian ice cream has medieval roots, influenced by Arab traders who introduced sugar to Italy. Gelato's techniques have been passed down through generations, becoming a beloved treat.
This selection highlights the diverse culinary history of Italy, spanning regions and centuries.
Woah. That looks really weird, doesn't it? The reason is pretty simple: the response is in a format called Markdown. We have three options for dealing with it.
1. Embrace it
Markdown is like a better version of plain text. In fact, this blog is written in Markdown, which is what lets me bold, italicize, or strikethrough text. If Markdown is what you want, then you're all set!
2. Convert it to plain-text
We can use a library called remove-markdown to strip the Markdown formatting from the text.
- Download package:

```shell
npm install remove-markdown
```
- Update Code:

```javascript
import ollama from "ollama";
import removeMd from "remove-markdown";

const response = await ollama.chat({
  model: "<MODEL>",
  messages: [
    { role: "user", content: "List 5 foods from Italy. Explain their origins." }
  ]
});

// Strip the Markdown formatting before printing
console.log(removeMd(response.message.content));
```
3. Convert to HTML
If you're trying to render the response in the browser, we can use the marked library to convert the Markdown into HTML.
- Download Package:

```shell
npm install marked
```
- Update Code:

```javascript
import ollama from "ollama";
import { writeFileSync } from "fs";
import { parse } from "marked";

const response = await ollama.chat({
  model: "<MODEL>",
  messages: [
    { role: "user", content: "List 5 foods from Italy. Explain their origins." }
  ]
});

// Optionally save the converted HTML to a file to view it in a browser
writeFileSync("response.html", `
<body>
  ${parse(response.message.content)}
</body>
`);
```
Conclusion
DeepSeek is a powerful new contender in the AI industry, and its breakthroughs in training efficiency have given us developers the opportunity to use AI in ways that were previously impractical. What will you build with it?