Hello, everyone!
Recently, OpenAI launched its new model, o3-mini. With so many options emerging, the big question for every developer is: which model should I use?
To answer this question, I spent the last few hours testing o3-mini and DeepSeek R1 on common tasks that we developers perform daily. These tasks are:
- Building a program from scratch;
- Adding a feature to existing code;
- Refactoring code and generating tests.
In this article, I will share my recommendations and insights. My goal is for all of us to become better developers by leveraging AI to our advantage.
Performance, Price, and Context Window
Before diving into practical tests, it is essential to understand the specifications of each model, as they are crucial in determining which one aligns best with your project's needs.
1. Performance
- o3-mini and DeepSeek R1 lead on SWE-bench (a benchmark that evaluates the ability to resolve GitHub issues), both scoring above 49%.
- Claude 3.5 Sonnet initially showed good scores, but as the tests below revealed, it demonstrated significant limitations in executing complex tasks.
2. Cost per Million Tokens
- DeepSeek R1: $0.55 input / $2.19 output (the most economical);
- o3-mini: $1.10 input / $4.40 output;
- Claude 3.5 Sonnet: $3.00 input / $15.00 output.
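To make these prices concrete, here is a small sketch that estimates the dollar cost of a single request under each model's published rates. The token counts in the example are hypothetical, chosen only for illustration:

```python
# Price per million tokens (input, output), taken from the list above.
PRICING = {
    "DeepSeek R1": (0.55, 2.19),
    "o3-mini": (1.10, 4.40),
    "Claude 3.5 Sonnet": (3.00, 15.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in dollars of one request for a given model."""
    in_price, out_price = PRICING[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Hypothetical example: a refactoring request with 20k input and 5k output tokens.
for model in PRICING:
    print(f"{model}: ${request_cost(model, 20_000, 5_000):.4f}")
```

Even at small request sizes, the gap compounds quickly over a day of iterative prompting.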
3. Context Window
- o3-mini and Claude 3.5: Up to 200k tokens (better for larger and more complex requests).
- DeepSeek R1: Up to 128k tokens.
Practical Test 1: Building a Project from Scratch
Task: Create an interface to chat with local LLMs via Ollama, with chat functionalities, conversation history, and model selection.
Results:
| Model | Files Generated | Functional Features | Observations |
| --- | --- | --- | --- |
| o3-mini using Cursor | 3 (HTML, CSS, and JS separated) | All | Code organized, but UI and styling very basic |
| DeepSeek R1 on the Web | 1 (HTML, CSS, and JS condensed) | Chat and model selection | No conversation history; UI and styling were better |
| DeepSeek R1 using Cursor | 0 | - | Failed to create multiple files; many manual adjustments |
| Claude 3.5 using Cursor | 0 | - | Completely failed |
Winner: o3-mini, for its consistency and ability to generate complex projects in a single request.
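For reference, the core of the interface all models were asked to build is a single call to Ollama's local `/api/chat` endpoint. Below is a minimal Python sketch of that logic; the endpoint and payload shape follow Ollama's documented API, while the history handling and model name are simplifying assumptions:

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/chat"

def build_payload(model: str, history: list[dict], user_message: str) -> dict:
    """Append the user's message to the conversation history and build a chat request."""
    messages = history + [{"role": "user", "content": user_message}]
    return {"model": model, "messages": messages, "stream": False}

def chat(model: str, history: list[dict], user_message: str) -> str:
    """Send one chat turn to a locally running Ollama server (requires Ollama)."""
    payload = build_payload(model, history, user_message)
    req = request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        # Ollama returns the assistant reply under "message" -> "content".
        return json.loads(resp.read())["message"]["content"]

if __name__ == "__main__":
    history: list[dict] = []  # hypothetical empty conversation
    print(chat("llama3", history, "Hello!"))
```

Everything beyond this call — rendering the history, switching models — is UI work, which is where the models' outputs diverged most.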
Practical Test 2: Adding a Feature to Existing Code
Task: Integrate a user interface (UI) into an existing CLI to interact with AI agents.
Results:
- o3-mini using Cursor:
  - Generated new files and added the feature, but only after more than 20 iterations.
  - Struggled to understand UI state management, requiring prompt adjustments and manual fixes to the generated result.
- DeepSeek R1 using Cursor:
  - Generated new files and added the feature in just 9 iterations, with cleaner, more organized code than o3-mini's.
  - Needed guidance to adjust some integrations, but was faster than o3-mini at understanding the requirements.
Winner: DeepSeek R1. Although o3-mini is more "autonomous," it struggled to understand key functionality for the integration. DeepSeek R1 required more "supervision," but it grasped the requirements better and delivered the new feature quickly.
Practical Test 3: Refactoring Code and Generating Tests
Task: Refactor functions in a React/TypeScript web application and add unit tests.
Results:
- o3-mini using Cursor:
  - Refactored the code, followed best practices, and generated working tests (with minor adjustments needed).
- DeepSeek R1 using Cursor:
  - Introduced critical bugs by removing essential functions.
  - Generated valid tests but failed at the refactoring itself.
Winner: o3-mini, for its precision and lower risk of breaking existing code.
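The article doesn't show the actual React/TypeScript code, but to make the task concrete, here is a small illustrative sketch (in Python, with hypothetical names) of the pattern being evaluated: refactor a function without changing its behaviour, then cover it with a unit test:

```python
def total_price(items: list[dict]) -> float:
    """Refactored version: a single comprehension replaces a manual loop,
    and a missing quantity defaults to 1 instead of raising KeyError."""
    return sum(item["price"] * item.get("quantity", 1) for item in items)

# A unit test of the kind the models were asked to generate:
def test_total_price() -> None:
    items = [
        {"price": 2.0, "quantity": 3},
        {"price": 5.0},  # missing quantity defaults to 1
    ]
    assert total_price(items) == 11.0

test_total_price()
```

This is exactly where DeepSeek R1 stumbled: its refactor changed behaviour (removing functions the rest of the app relied on), which is the one thing a refactor must never do.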
Final Recommendations
- For New Projects: Use o3-mini in Cursor. Its ability to generate structured code in a single pass is unmatched.
- For Complex Features: Combine o3-mini (for architecture) with DeepSeek R1 (for specific snippets).
- For Tight Budgets: DeepSeek R1 is the most economical choice but requires more attention and supervision during development.
What About Claude 3.5?
With costs up to roughly 7x higher than DeepSeek R1's and inferior performance from the very first practical test, Claude 3.5 Sonnet is not a viable option for daily development. I recommend focusing on o3-mini and DeepSeek R1, which offer a better balance of cost and performance.
How to Use Both Models Together
- Planning Phase: Use o3-mini to outline the overall project structure. Its ability to handle large context windows allows for comprehensive planning.
- Optimization and Final Adjustments: After structuring the project, use DeepSeek R1 with continuous "supervision" to fine-tune specific functions, improve code efficiency, and reduce costs in specific tasks.
Final Considerations
The integration of AI models like o3-mini and DeepSeek R1 into the development workflow can completely transform the way we create and maintain projects.
While o3-mini stands out for its consistency and ability to handle complex tasks, DeepSeek R1 offers an economical solution for fine-tuning and specific tasks.
So, which model will you test first? 👨💻
Did you like it? Share your experiences in the comments! 🚀