Artur Ampilogov

Is DeepSeek’s Influence Overblown?

At the beginning of this week, the tech stock market crashed after the appearance of a new AI model called "DeepSeek." Nvidia's stock (NVDA) fell by 17%.

According to the official paper, DeepSeek cost only $5.6 million to train, with impressive results. This is a remarkable achievement for a large language model (LLM). For comparison, OpenAI's CEO Sam Altman admitted that training GPT-4 cost over $100 million, without saying how much more.
Some AI specialists suspect that DeepSeek's training cost is underreported. Nevertheless, the hidden gem is not how much the model cost to train but how drastically it reduced runtime requirements.

Modern LLMs have a complex structure, but the number of parameters is an essential metric for any deep-learning model.

What are AI model parameters?

Consider a video of a snow leopard wearing a red coat at a fashion show, which I generated using Pika 2.1.

How does AI know what "red" is?

Imagine creating a small neural network that says whether a given pixel is red. Programs usually represent an arbitrary color as a mix of three values: red, green, and blue, also known as RGB, where each value ranges from 0 to 255. For example, (0, 0, 0) is black, (0, 204, 102) is green, and (204, 0, 0) is red.

For output, we will use the formula:

Input 1 (red) * Weight 1 + 
Input 2 (green) * Weight 2 + 
Input 3 (blue) * Weight 3 = Output

The weights are unknown. The output should range from 0 to 100, where 100 means the pixel is definitely red, 80 means it is likely red, and 0 means it is not red at all. During training, the neural network is shown a million examples along with the expected result. The model has to figure out constant weight values that return the correct result for at least 95% of the training data.
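To make this concrete, suppose training ended up with the purely hypothetical weights 0.4, -0.2, and -0.2. Plugging the example pixels from above into the formula gives:

204 * 0.4 +   0 * (-0.2) +   0 * (-0.2) =  81.6   (likely red)
  0 * 0.4 + 204 * (-0.2) + 102 * (-0.2) = -61.2   (not red at all; anything at or below 0 counts as 0)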


The output neuron can then be connected to further neurons, serving as an input that carries the notion of the pixel's "redness".

Now imagine such a network with millions of input values, for example, the text characters of a big PDF file, the pixels of a large image, or a user prompt. A trained neural network, with its numerous connections between neurons and the learned weights representing the strength of each signal, can compute all sorts of results: understand the prompt's requirements, generate a photo or text, derive a logical response, or modify a video.

The actual formula is slightly more sophisticated. In addition to weights, bias variables are added to tilt the output, and the output is finally normalized to a fraction between 0 and 1.

The derived constant weights and bias values are the model parameters. They represent actual AI "brain" patterns.
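Here is a minimal, runnable Python sketch of the whole idea. The labeling rule, the learning rate, and the use of a sigmoid with gradient descent are illustrative assumptions for this toy example, not how large models are actually trained, but the principle is the same: the weights and bias start out unknown and end up as learned parameters.

import math
import random

def redness(weights, bias, rgb):
    # Weighted sum of the (normalized) RGB inputs plus a bias, squashed to a
    # fraction between 0 and 1 by a sigmoid, then scaled to the 0..100 range.
    z = sum(w * x for w, x in zip(weights, rgb)) + bias
    return 100 / (1 + math.exp(-z))

def random_example():
    # Hypothetical labeling rule: a pixel counts as "red" when the red channel
    # clearly dominates the green and blue channels.
    r, g, b = (random.randint(0, 255) for _ in range(3))
    label = 1.0 if r > 150 and g < 100 and b < 100 else 0.0
    return (r / 255, g / 255, b / 255), label

# The weights are unknown at first; gradient descent finds them from examples.
weights, bias = [0.0, 0.0, 0.0], 0.0
learning_rate = 0.5
for _ in range(50_000):
    inputs, label = random_example()
    prediction = redness(weights, bias, inputs) / 100   # back to a 0..1 fraction
    error = prediction - label
    weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]
    bias -= learning_rate * error

print("learned parameters:", [round(w, 2) for w in weights], round(bias, 2))
print("redness of (204, 0, 0):  ", round(redness(weights, bias, (204/255, 0, 0))))
print("redness of (0, 204, 102):", round(redness(weights, bias, (0, 204/255, 102/255))))

After training, the three weights and the bias are the network's parameters: four numbers for this toy network, compared to hundreds of billions for a modern LLM.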

How big are modern LLMs?

  • OpenAI GPT-3, released in 2020, has 175 billion parameters. GPT-4's size was never disclosed, but it is estimated at 1.8 trillion parameters, split across multiple internal models of about 220 billion parameters each.
  • Google PaLM 2, released in 2023, allegedly has 340 billion parameters, while the newer Google Flex models might be much larger.
  • Meta Llama 3.2 is an open-source model with 90 billion parameters, but its results are not as good as those of contemporary OpenAI and Google models.

Most LLMs store each parameter as a 2- or 4-byte floating-point number. The full GPT-4 model, with an estimated 1.8 trillion parameters, requires at least 3.6 TB of RAM just to run! And that is only the theoretical minimum; additional memory is needed for the inputs, the output stored at each neuron, and other technical overhead.
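The arithmetic behind that number is simple: parameter count times bytes per parameter. A quick back-of-the-envelope check in Python (the parameter count is the estimate quoted above; the helper function is just for illustration):

def weight_memory_tb(parameters, bytes_per_parameter):
    # Memory needed just to hold the weights in RAM, ignoring activations,
    # caches, and other runtime overhead.
    return parameters * bytes_per_parameter / 1e12

# GPT-4, estimated 1.8 trillion parameters stored as 2-byte floats
print(weight_memory_tb(1.8e12, 2))   # 3.6 (terabytes)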

Just think of it. The model requires a cluster of expensive GPUs with lots of RAM not only to train it but also just to produce a response at runtime.

Calculating this massive number of neuron outputs is slow, with many steps repeated over and over. GPT-4 produces output at about 30 words per second, and the o1 reasoning model is even slower. So when you send a request to ChatGPT, a massive GPU cluster behind the scenes processes just that one input. Imagine how many clusters OpenAI, Google, Microsoft, xAI, Anthropic, and the other AI leaders have to own or lease to serve requests globally.

What about DeepSeek?

DeepSeek made three noteworthy transformations in the AI world:

  • Each parameter is stored in 1 byte, cutting the main memory requirement in half while still producing exceptional results.
  • The model has 671 billion parameters, but only 37 billion of them, 1/18th of the whole network, are used to process a given request. The network decides which path and which groups of neurons to activate, which results in faster responses. Think of a real-life analogy: depending on the question, a doctor, a pilot, or a physicist answers it, not all of them at once (see the sketch after this list).
  • The model is fully open-sourced.
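
The second point describes a "mixture of experts" design. Here is a toy Python sketch of the routing idea; the expert names, the keyword-based router, and the pick-one-expert rule are made-up illustrations, not DeepSeek's actual architecture:

EXPERTS = {
    "doctor":    lambda prompt: f"[doctor's answer to] {prompt}",
    "pilot":     lambda prompt: f"[pilot's answer to] {prompt}",
    "physicist": lambda prompt: f"[physicist's answer to] {prompt}",
}

def router_scores(prompt):
    # In a real mixture-of-experts model, a small learned "router" network scores
    # every expert; a simple keyword count stands in for it here.
    keywords = {
        "doctor":    ["symptom", "dose", "patient"],
        "pilot":     ["flight", "runway", "altitude"],
        "physicist": ["energy", "quantum", "velocity"],
    }
    return {name: sum(word in prompt.lower() for word in words)
            for name, words in keywords.items()}

def answer(prompt):
    scores = router_scores(prompt)
    best = max(scores, key=scores.get)   # activate only the best-scoring expert
    # Only the chosen expert's parameters take part in the computation, which is
    # how roughly 37 of DeepSeek's 671 billion parameters handle a given request.
    return EXPERTS[best](prompt)

print(answer("What dose is safe for this patient?"))

The payoff is that each request touches only a small slice of the network, so it needs roughly 1/18th of the compute a dense model of the same size would.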

With 671 billion parameters, DeepSeek still requires a lot of memory, more than 700 GB of RAM, to run, but it is the first open-source model to produce results almost as good as the most expensive, sophisticated, and proprietary LLMs. Some people have already started experimenting with hosting DeepSeek locally, for example, by grouping Mac Minis into a home cluster.

Conclusion

Hearing about the claimed lower training cost, were people right to start the stock-market selling panic, thinking there would be no need for so many GPU chips and clusters? Not at all.

First, now that enthusiasts and businesses know that running a local, private ChatGPT analog is possible, they will likely order new, small Nvidia clusters built specifically for this purpose.

Second, more sophisticated patterns require more model parameters, which, as shown above, demand enormous computing resources. A human brain has about 100 trillion connections between neurons, while the largest AI models today operate with only a couple of trillion parameters. Future models will require much more RAM to run. At the same time, individuals and businesses keep embracing the benefits of AI.

That is not going to change, and the AI chip market is only going to grow.
