Over the past two weeks, a new contender in the AI revolution has seemingly come out of nowhere: DeepSeek, with its V3 and R1 models, two LLMs that rival OpenAI's. R1 was built in roughly six weeks for a reported $5.6 million, by a Chinese company no less. It's not only as good as, if not better than, GPT-4o and o1, but it's also free and open source, which lets us developers run genuinely powerful LLMs locally and offline for the first time. This blog post is a deep dive into what exactly DeepSeek is, why you should care in the first place, how they pulled it off, and most importantly, how you can take advantage of it to build your next million-dollar tech startup.
What is DeepSeek?
DeepSeek is a Chinese AI company specializing in open-source large language models, founded just 18 months ago, in July 2023. DeepSeek R1 isn't their first LLM, but it is their first reasoning model, comparable to OpenAI's o1.
Why should I care?
- It's just as good as, if not better than, OpenAI's models
- It's completely free to use on their official website
- DeepSeek's API is over 96% cheaper than OpenAI's (comparing R1's input-token price on a cache miss with o1's input-token price)
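That 96% figure is easy to verify yourself. As a sketch, here's the back-of-the-envelope math; the prices below are per 1M input tokens as listed at the time of writing, so treat them as assumptions and re-check both pricing pages before relying on them:

```javascript
// Back-of-the-envelope version of the "over 96% cheaper" claim.
// Prices are per 1M input tokens at the time of writing (assumed values).
const r1InputCacheMiss = 0.55; // DeepSeek R1, input tokens (cache miss), USD
const o1Input = 15.0;          // OpenAI o1, input tokens, USD

const savingsPercent = (1 - r1InputCacheMiss / o1Input) * 100;
console.log(savingsPercent.toFixed(1) + "% cheaper"); // "96.3% cheaper"
```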
- It's 100% free and open source, released under the MIT License (Source), which means you can run it locally on your own computer (we'll learn how in this post). You can check it out on GitHub.
How did they pull it off?
There are tons of different ways they did this. I've chosen some of the most important to highlight. Note that these are highly, highly over-simplified. If you want a more complex deep-dive into how these work, check out the sources.
1. Heavy Low-Level Optimization
Due to US Government restrictions on selling high-end chips to China, DeepSeek, a Chinese company, didn't have access to the most powerful NVIDIA cards (e.g., NVIDIA H100s) to train their models on, which meant they had to figure out how to squeeze maximum performance out of the chips they already had (NVIDIA H800s). To summarize, they used Mixture-of-Experts (MoE) and Multi-head Latent Attention (MLA) techniques to maximize GPU performance. (Source)
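The MoE idea can be sketched in a few lines of JavaScript. This is a toy illustration, not DeepSeek's actual implementation: a router scores every expert for each token, and only the top-k highest-scoring experts actually run, so most of the network sits idle on any given token.

```javascript
// Toy sketch of Mixture-of-Experts routing (illustrative only):
// the router scores each expert, and only the top-k experts process the token.
function topKExperts(scores, k) {
  return scores
    .map((score, expertId) => ({ expertId, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((e) => e.expertId);
}

// 8 experts, but only the 2 best-scoring ones handle this token
const routerScores = [0.1, 0.7, 0.05, 0.9, 0.2, 0.3, 0.15, 0.4];
const active = topKExperts(routerScores, 2);
console.log(active); // [3, 1] -- the other 6 experts stay idle, saving compute
```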
2. Only Train what's Necessary
Typically, training part of an AI model meant updating the whole thing, even if some parts didn't contribute anything, which led to a massive waste of resources. To solve this, they introduced Auxiliary-Loss-Free Load Balancing, which works by introducing a bias factor to prevent overloading one chip while under-utilizing another (Source). This resulted in only 5% of the model's parameters being trained per token, and a roughly 91% lower training cost than GPT-4 (GPT-4 reportedly cost $63 million to train (Source), while V3 cost $5.576 million (Source)).
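A heavily simplified sketch of the bias idea, in JavaScript: each expert carries a bias term that gets nudged down when the expert is overloaded and up when it's under-used, so routing evens out over time without adding an auxiliary loss function. The update step `gamma` and the load numbers below are hypothetical; see the DeepSeek-V3 report for the real update rule.

```javascript
// Toy sketch of auxiliary-loss-free load balancing (hypothetical values):
// overloaded experts get their routing bias nudged down, under-used ones up.
const gamma = 0.01; // bias update step size (hypothetical)

function updateBiases(biases, loads, targetLoad) {
  return biases.map((b, i) =>
    loads[i] > targetLoad ? b - gamma : b + gamma
  );
}

let biases = [0, 0, 0, 0];
const loads = [120, 40, 60, 80]; // tokens routed to each expert; target is 75
biases = updateBiases(biases, loads, 75);
console.log(biases); // [-0.01, 0.01, 0.01, -0.01]
```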
3. Compression
Under the hood, the model caches lots of key-value pairs for attention. Storing all of them at full size would eat up a ton of memory. To fix this, DeepSeek uses Low-Rank Key-Value (KV) Joint Compression: the key-value pairs are compressed with a down-projection matrix, and this compressed version is what's stored. When the data is needed, the projection is reversed to recover an approximation of the original values, shrinking the cache and reducing memory usage at the cost of a small reconstruction step. (Source)
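Here's a tiny numeric sketch of the down-project / up-project idea (the matrices are made up for illustration; in the real model they are learned, and the reconstruction is only approximate):

```javascript
// Toy sketch of low-rank KV compression: cache a small "latent" vector
// instead of the full key/value vector, and reconstruct it on demand.
function matVec(matrix, vec) {
  return matrix.map((row) => row.reduce((sum, w, j) => sum + w * vec[j], 0));
}

// Hypothetical tiny example: compress a 4-dim vector down to 2 dims
const down = [[1, 0, 0, 0], [0, 1, 0, 0]];           // down-projection (learned in practice)
const up   = [[1, 0], [0, 1], [0, 0], [0, 0]];       // up-projection (learned in practice)

const kv = [4, 3, 0, 0];
const compressed = matVec(down, kv);       // [4, 3] -- this is what gets cached
const restored   = matVec(up, compressed); // [4, 3, 0, 0]
console.log(compressed.length, "numbers cached instead of", kv.length);
```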
4. Reinforcement Learning
Part of the way the model was trained is a lot like how you would train a dog.
- The model was given complex, yet easy-to-validate questions to answer.
- If it answers correctly, it's "rewarded," reinforcing those patterns
- If it answers incorrectly, it adjusts itself to improve on future attempts
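The steps above can be sketched as a toy reward function (this is not DeepSeek's training code, just an illustration of why "easy to validate" matters): because the answer can be checked mechanically, a simple checker can assign the reward with no human in the loop.

```javascript
// Toy sketch of the reward signal: easy-to-validate questions let a
// mechanical checker score the model's answer.
function reward(expectedAnswer, modelAnswer) {
  return modelAnswer === expectedAnswer ? 1 : -1; // +1 reinforces, -1 triggers adjustment
}

console.log(reward(12, 12)); // 1  -> reinforce this reasoning pattern
console.log(reward(12, 13)); // -1 -> adjust for future attempts
```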
The Result:
| Training Costs | Pre-Training | Context Extension | Post-Training | Total |
|---|---|---|---|---|
| In H800 GPU Hours | 2664K | 119K | 5K | 2788K |
| In USD | $5.328M | $0.23M | $0.01M | $5.576M |
(Source)
How can I take advantage of it?
DeepSeek's models are pretty easy to get started with. Here's how you can use them:
1. Online
You can use DeepSeek V3 and R1 for free on their official website.
Warning: Anything you enter or receive on the website may be stored and can be viewed by DeepSeek or the Chinese government
2. API
DeepSeek has an official API in case you don't want to self-host the models yourself. It's over 96% cheaper than OpenAI's (comparing R1's input-token price on a cache miss with o1's input-token price).
How to Use
The API itself is pretty straightforward. You can use it with the openai package on NPM or PIP, or by making an HTTP request. For this demo I will be using Node.js, working in an empty folder with an index.js file and a package.json file.
WARNING: NEVER STORE API KEYS ON THE CLIENT-SIDE
- Apply for an API key
- Download the package:

```shell
npm install openai
```

- Make a request wherever you need it:
```javascript
import OpenAI from "openai";

// Point the OpenAI client at DeepSeek's API instead of OpenAI's
const openai = new OpenAI({
  baseURL: "https://api.deepseek.com",
  apiKey: "<DeepSeek API Key>"
});

const completion = await openai.chat.completions.create({
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What is 5 + 7?" }
  ],
  model: "deepseek-chat"
});

console.log(completion.choices[0].message.content);
```
- Run it:

```shell
node index.js
```
Output:
To find the sum of 5 and 7, follow these steps:
Start with the first number:
5
Add the second number to it:
5 + 7
Perform the addition:
5 + 7 = 12
Final Answer: 12
Pretty easy, isn't it?
3. Locally
Moving on to the fun stuff now, ✨ self-hosting ✨. Unfortunately, the full model is around 400 GB. Most people don't have that much storage to dedicate to one model, and hosting it for your startup would be extremely expensive. Luckily, there are distilled models: smaller fine-tuned models that are significantly easier to run. The bigger the model, the smarter but slower it is. Let's first try running DeepSeek on our machine.
- Download Ollama from the Ollama homepage
- Choose the size you want to run locally

Note: V3 only has one size
Note: sizes as of Feb 2nd, 2025
| # of Parameters (in billions) | Size (in GB) | Model |
|---|---|---|
| 1.5B | 1.1GB | deepseek-r1:1.5b |
| 7B | 4.7GB | deepseek-r1:7b |
| 8B | 4.9GB | deepseek-r1:8b |
| 14B | 9GB | deepseek-r1:14b |
| 32B | 20GB | deepseek-r1:32b |
| 70B | 43GB | deepseek-r1:70b |
| 671B | 404GB | deepseek-r1:671b or deepseek-v3 |
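If you're unsure which size to pull, here's a small helper (hypothetical, just using the sizes from the table above) that picks the largest distilled model that fits in a given amount of free disk space:

```javascript
// Pick the largest model from the table above that fits in freeGB of disk.
// Sizes are taken from the table (as of Feb 2nd, 2025).
const models = [
  { name: "deepseek-r1:1.5b", sizeGB: 1.1 },
  { name: "deepseek-r1:7b",   sizeGB: 4.7 },
  { name: "deepseek-r1:8b",   sizeGB: 4.9 },
  { name: "deepseek-r1:14b",  sizeGB: 9 },
  { name: "deepseek-r1:32b",  sizeGB: 20 },
  { name: "deepseek-r1:70b",  sizeGB: 43 },
  { name: "deepseek-r1:671b", sizeGB: 404 },
];

function largestThatFits(freeGB) {
  const fitting = models.filter((m) => m.sizeGB <= freeGB);
  return fitting.length ? fitting[fitting.length - 1].name : null;
}

console.log(largestThatFits(16)); // "deepseek-r1:14b"
```

Keep in mind that disk space isn't the only constraint; you also need enough RAM (or VRAM) to load the model.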
On your machine
Open your terminal and run `ollama run` followed by the model name from the table above (for example, `ollama run deepseek-r1:7b`). Once it has finished downloading, you'll be dropped into an interactive chat prompt.
Now you can enter any prompt and get an answer. It even works offline!
In a project
Ollama also has a package for NPM and PIP. Note that for this demo, I will be using an empty folder with a package.json and index.js file.
- Download package:

```shell
npm install ollama
```
- Invoke:

```javascript
import ollama from "ollama";

const response = await ollama.chat({
  model: "<MODEL>",
  messages: [
    { role: "user", content: "List 5 foods from Italy. Explain their origins" }
  ]
});

console.log(response.message.content);
```
- Run it:

```shell
node index.js
```
Output:
<think>
Okay, so I need to list five Italian foods and explain their origins. Hmm, let's see... I'm not an expert on Italian cuisine, but I know a few basics. Maybe I can start by thinking about the most famous dishes and then research where they come from.
First, pasta comes to mind. I think pasta has been around for a long time, maybe even before Christ. I've heard that Marco Polo brought it back to Europe, but I'm not
sure if he actually introduced it to Italy or if it was already there. I should probably look into that more. Also, pizza is a big one—everyone knows it's from Italy.
But again, when exactly? I think the word "pizza" comes from "pie," which refers to a flatbread. Maybe bakers in Italy started making something like modern pizza centuries ago.
Lasagna is another dish I can think of. It's a layered pasta with fillings and béchamel sauce. I believe it has roots in Emilia-Romagna, but I'm not sure about the exact origins or when it was first made. Then there's risotto, which is a rice dish cooked with broth and other ingredients. I think it originated in Lombardy, maybe around the 17th century.
Gelato is Italian ice cream, right? I've heard that it has been around since the Middle Ages. Maybe Arab traders introduced sugar to Italy, and then gelato became popular as a sweet treat. So, sugar was key here, and the techniques for making gelato might have come from the Arab world.
Wait, am I missing any other major dishes? Oh, like tiramisu. That's a classic, but I'm not sure about its exact origin either. It seems to be a dessert from the Emilia-Romagna region as well. Maybe it was created in the 1800s or something. But since the user only asked for five foods, maybe that's enough.
Let me try to organize this:
1. Pasta: Originating before Christ, possibly brought to Italy by Marco Polo but already existing there.
2. Pizza: From ancient Rome, a flatbread that evolved into modern pizza in Campania.
3. Lasagna: From Emilia-Romagna, medieval dish with layers of pasta and fillings.
4. Risotto: Originating in Lombardy around the 17th century, using rice and broth.
5. Gelato: Middle Ages, influenced by Arab traders introducing sugar.
I think that's a good list. Each has its own history, but I should double-check some details to ensure accuracy. For example, pasta's history is a bit tricky because it was known before Marco Polo, but his travels popularized it in Europe. Also, gelato's origins might be tied to the invention of sugar, which came through trade with
Arab countries.
I wonder if there are other foods I could have included instead. Maybe something like risotto is more specific, or perhaps arancini, which are stuffed pasta balls, but they're a variant of pasta too. Or maybe minestrone soup, but that's more of a category than a specific dish.
No, the five listed seem to cover a good range from different regions and time periods in Italy. I think that's sufficient for this task.
</think>
Here is a list of five Italian foods along with their origins:
1. **Pasta**: Originating well before Christ, pasta has ancient roots and was likely already present in Italy when Marco Polo returned from his travels, where he popularized its use in Europe.
2. **Pizza**: The word "pizza" refers to a flatbread that traces its origins back to ancient Rome. Modern pizza as we know it evolved in Campania, particularly in areas like Napoli.
3. **Lasagna**: Hailing from the Emilia-Romagna region, lasagna is a medieval dish characterized by its layered structure of pasta interleaved with fillings and covered in béchamel sauce.
4. **Risotto**: Originating in Lombardy during the 17th century, risotto is a rice dish cooked with broth and various ingredients, known for its creamy texture.
5. **Gelato**: This Italian ice cream has medieval roots, influenced by Arab traders who introduced sugar to Italy. Gelato's techniques have been passed down through generations, becoming a beloved treat.
This selection highlights the diverse culinary history of Italy, spanning regions and centuries.
Woah. That looks really weird, doesn't it? The reason is pretty simple: the response is in a format called Markdown. We have three options for dealing with it.
1. Embrace it
Markdown is like a better version of plain text. In fact, this blog is written in Markdown, which is what lets me bold, italicize, or strikethrough text. If Markdown is what you want, then you're all set!
2. Convert it to plain-text
We can use a library called remove-markdown to strip the Markdown formatting from the text.
- Download package:

```shell
npm install remove-markdown
```
- Update Code:

```javascript
import ollama from "ollama";
import removeMd from "remove-markdown";

const response = await ollama.chat({
  model: "<MODEL>",
  messages: [
    { role: "user", content: "List 5 foods from Italy. Explain their origins." }
  ]
});

// Strip the Markdown formatting before printing
console.log(removeMd(response.message.content));
```
3. Convert to HTML
If you're trying to render the response in the browser, we can use the marked library to convert the Markdown into HTML.
- Download Package:

```shell
npm install marked
```
- Update Code:

```javascript
import ollama from "ollama";
import { writeFileSync } from "fs";
import { parse } from "marked";

const response = await ollama.chat({
  model: "<MODEL>",
  messages: [
    { role: "user", content: "List 5 foods from Italy. Explain their origins." }
  ]
});

// Optionally save the converted HTML to a file to view it in a browser
writeFileSync("response.html", `
<body>
  ${parse(response.message.content)}
</body>
`);
```
Conclusion
DeepSeek is a powerful new contender in the AI industry, and its breakthroughs in training efficiency have given us developers the opportunity to use AI in ways that were previously impractical. What will you build with it?