mehmet akar

OpenAI o3-mini(high): Pushing the Frontier of Cost-Effective Reasoning

OpenAI o3-mini has finally been released!

Let's take a deep look at OpenAI's official announcement.

OpenAI is releasing o3-mini, the newest and most cost-efficient model in its reasoning series, available in both ChatGPT and the API today.

o3-mini (high), even though it is the mini version, surpasses o1 in benchmarks. That is quite a surprise, and I wonder what the full, non-mini o3 can achieve in the coming weeks.

Previewed in December 2024, this powerful and fast model advances the boundaries of what small models can achieve, delivering exceptional STEM capabilities—with particular strength in science, math, and coding—all while maintaining the low cost and reduced latency of OpenAI o1-mini.

Key Features of OpenAI o3-mini

OpenAI o3-mini is the first small reasoning model that supports highly requested developer features, including function calling, Structured Outputs, and developer messages.

Additionally, developers can choose between three reasoning effort levels—low, medium, and high—to optimize for their specific use cases. This flexibility allows o3-mini to “think harder” when tackling complex challenges or prioritize speed when latency is a concern.

Note: o3-mini does not support vision capabilities. Developers should continue using OpenAI o1 for visual reasoning tasks.
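
To make the reasoning effort option concrete, here is a minimal sketch of a chat completion call with the OpenAI Python SDK. The developer message, the prompt, and the chosen effort level are illustrative, and parameter support can vary by SDK version, so treat this as a sketch rather than a definitive snippet.

```python
from openai import OpenAI

# Assumes OPENAI_API_KEY is set in the environment and the account has o3-mini access.
client = OpenAI()

response = client.chat.completions.create(
    model="o3-mini",
    reasoning_effort="high",  # one of "low", "medium", "high"
    messages=[
        # Developer messages are among the newly supported developer features.
        {"role": "developer", "content": "You are a concise math tutor."},
        {"role": "user", "content": "Prove that the square root of 2 is irrational."},
    ],
)

print(response.choices[0].message.content)
```

Dropping `reasoning_effort` to "low" trades some reasoning depth for lower latency, which is exactly the speed-versus-thinking trade-off described above.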

Availability

  • ChatGPT Plus, Team, and Pro users can access OpenAI o3-mini starting today.
  • Enterprise access will be available in a week.
  • API users in tiers 3-5 can start integrating o3-mini into their applications (a quick availability check is sketched below).
  • Free users can try OpenAI o3-mini by selecting ‘Reason’ in the message composer or regenerating a response.

As part of this upgrade, rate limits for Plus and Team users will triple from 50 messages per day with o1-mini to 150 messages per day with o3-mini.
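
For API users who are unsure whether their tier already has access, one simple, illustrative check is to ask the models endpoint for o3-mini; a not-found response just means the model is not enabled for that account yet. This is a sketch assuming the OpenAI Python SDK, not an official gating check.

```python
from openai import NotFoundError, OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

try:
    model = client.models.retrieve("o3-mini")
    print(f"{model.id} is available on this account")
except NotFoundError:
    print("o3-mini is not enabled for this account/tier yet")
```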

Performance Improvements

Fast, Powerful, and Optimized for STEM Reasoning

Similar to its OpenAI o1 predecessor, OpenAI o3-mini has been optimized for STEM reasoning. With medium reasoning effort, it matches o1’s performance in math, coding, and science while delivering faster responses.

Performance Highlights:

  • Expert testers preferred o3-mini's responses over o1-mini's 56% of the time.
  • 39% reduction in major errors on difficult real-world questions.
  • Matches performance of o1 on AIME and GPQA evaluations with medium reasoning effort.

Competition Math (AIME 2024)

Mathematics: With low reasoning effort, OpenAI o3-mini achieves performance comparable to OpenAI o1-mini. With high reasoning effort, it outperforms both OpenAI o1-mini and OpenAI o1.

AIME 2024 Competition Math Performance


PhD-Level Science Questions (GPQA Diamond)

On PhD-level biology, chemistry, and physics questions, OpenAI o3-mini with high reasoning effort achieves performance comparable to OpenAI o1.

GPQA Science Questions Performance


FrontierMath

OpenAI o3-mini with high reasoning effort performs better than its predecessor on FrontierMath. It solves over 32% of problems on the first attempt, including more than 28% of the challenging (T3) problems.

FrontierMath Performance


Competition Coding (Codeforces)

On Codeforces competitive programming, OpenAI o3-mini achieves progressively higher Elo scores with increased reasoning effort. With medium reasoning effort, it matches OpenAI o1’s performance.

Codeforces Competition Coding


Software Engineering (SWE-Bench Verified)

OpenAI o3-mini is the highest-performing released model on SWE-bench Verified:

  • 39% accuracy with the open-source Agentless scaffold.
  • 61% accuracy with an internal tools scaffold.

Software Engineering Performance


LiveBench Coding

OpenAI o3-mini demonstrates strong coding performance in LiveBench Coding evaluations.

LiveBench Coding Performance


OpenAI o3-mini Release: Last but not Least

OpenAI o3-mini marks a significant step forward in cost-efficient reasoning while maintaining exceptional performance in STEM disciplines, coding, and logical problem-solving. With faster response times, improved accuracy, and enhanced flexibility, it can be a good choice for developers and users looking for a high-performing small model.

Let's see new benchmarks comparing DeepSeek, Qwen, and o3-mini in the coming days.

Top comments (2)

programORdie

For coding, I prefer DeepSeek R1 over o3-mini. Somehow, ChatGPT always forgets its context mid-question.

mehmet akar

Good observation! There is still a long way to go for LLMs in terms of coding.