HarshVardhan Jain

Posted on Feb 3

Benchmarking ChatGPT, Qwen, and DeepSeek on Real-World AI Tasks

#ai #machinelearning #chatgpt #deepseek

This article was originally published here: https://decodebuzzing.medium.com/qbenchmarking-chatgpt-qwen-and-deepseek-on-real-world-ai-tasks-75b4d7040742

The wealthy tech giants in the U.S. once dominated the AI market, but DeepSeek’s release caused waves in the industry, sparking massive hype. As if that wasn’t enough, Qwen 2.5 then emerged, surpassing DeepSeek in multiple areas. Unlike reasoning models such as DeepSeek-R1 and OpenAI’s o1, Qwen 2.5-Max does not show its thinking process, making it harder to trace its decision-making logic.

This article puts ChatGPT, Qwen, and DeepSeek through their paces with a series of key challenges ranging from solving calculus problems to debugging code. Whether you’re a developer hunting for the perfect AI coding assistant, a researcher tackling quantum mechanics, or a business professional, today I will try to reveal which model is the smartest choice for your needs (and budget).

Comparative Analysis of AI Model Capabilities:-

1. ChatGPT

ChatGPT, developed by OpenAI, remains a dominant force in the AI space. It is built on OpenAI’s GPT family of models and fine-tuned using Reinforcement Learning from Human Feedback (RLHF). It’s a reliable go-to for a range of tasks, from creative writing to technical documentation, making it a top choice for content creators, educators, and startups. However, it’s not perfect. When it comes to specialized fields, like advanced mathematics or niche legal domains, it can struggle. On top of that, its high infrastructure costs make it tough for smaller businesses or individual developers to access it easily.

ChatGPT is a massive infrastructure model backed by a hefty $3B+ investment, while DeepSeek reportedly achieved similar performance with just $5.6M. Classic China moment, huh?

2. DeepSeek

Out of nowhere, DeepSeek emerged as a dark horse in the AI race challenging established giants with its focus on computational precision and efficiency.

Unlike its competitors, it’s tailored for scientific and mathematical tasks and is trained on top datasets like arXiv and Wolfram Alpha, which helps it perform well in areas like optimization, physics simulations, and complex math problems. DeepSeek’s real strength is how cheap it is (no China pun intended 😤). While models like ChatGPT and Qwen require massive resources, DeepSeek does the job at a far lower cost. So yeah, you don't need to shell out $1000 for a ChatGPT subscription.

DeepSeek’s way of responding feels a bit robotic, and it doesn’t adapt well to tasks that aren’t too technical. It’s great for specific fields like math and research but not for casual or creative conversation

3. Qwen

After DeepSeek, who would’ve thought another Chinese AI would pop up and start taking over? Classic China move: spread something, and this time it’s AI lol.

Qwen is dominating the business game with its multilingual setup, excelling in places like Asia, especially with Mandarin and Arabic. It’s the go-to for legal and financial tasks, and it is not a reasoning model like DeepSeek R1, meaning you can’t see its thinking process. But just like DeepSeek, it’s got that robotic vibe, making it less fun for casual or creative work. If you want something more flexible, Qwen might not be the best hang.

Under the hood, Qwen 2.5-Max is a Mixture-of-Experts (MoE) model. You can think of it like a team of specialists: if you ask a complex question about physics, only the experts in physics respond, while the rest of the team stays inactive.
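The team-of-specialists picture is what is usually called Mixture-of-Experts routing. Here is a minimal toy sketch of the idea in plain Python. The `route` function, the four lambda "experts", and the gate scores are all made up for illustration; this is not Qwen's actual router, only the general top-k pattern.

```python
import math

def softmax(scores):
    """Turn raw router scores into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route(gate_scores, experts, x, top_k=2):
    """Run only the top_k highest-scoring experts and blend their outputs."""
    probs = softmax(gate_scores)
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    output = sum(probs[i] / norm * experts[i](x) for i in chosen)
    return output, chosen

# Hypothetical "experts": tiny functions standing in for expert sub-networks.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x ** 2, lambda x: -x]
gate_scores = [0.1, 2.0, 1.5, -1.0]  # made-up router scores for one input

output, chosen = route(gate_scores, experts, x=3.0)
print(chosen)  # indices of the experts that actually ran
```

Only two of the four experts are evaluated for this input; the other two never run, which is where the compute savings come from.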

Testing Time: Comparing the 3 AIs with Real-World Issues

To ensure a fair and thorough evaluation, let’s throw some of the most hyped challenges at them: tough math problems, wild physics stuff, coding tasks, and tricky real-world questions.

— — — — — — — — — — — —

1. Physics: The Rotating Ball Problem

To kick things off, let’s dive into the classic “rotating ball in a box” problem, which has become a popular benchmark for testing how well different AI models handle complex tasks.

Challenge: Simulate a ball bouncing inside a rotating box while obeying the laws of physics

Picture a 2D shape rotating in space. Inside, a ball bounces off the walls and stays within the boundaries. At first glance it might seem simple, but accounting for gravity, constant rotation, and precise collision dynamics makes it a challenging simulation. You’d be surprised at how differently AI models tackle it.

Prompt:-

Write a Python script that simulates a yellow ball bouncing inside a rotating square.
The ball should bounce realistically off the square’s edges, with the square rotating slowly over time
The ball must stay within the square's boundaries as the box rotates.

Box Rotation: The box should rotate continuously.
Ball Physics: The ball reacts to gravity and bounces off the box’s walls.
Ball Inside Boundaries: Make sure the ball doesn’t escape the box's boundaries, even as the box rotates.
Realistic Physics: Include proper collision detection and smooth animation
Use Python 3.x with Pygame or any similar library for rendering

Results:

1. ChatGPT’s Output: Fast but Flawed

With ChatGPT I had high expectations. But the results? Let’s just say they were… underwhelming. While DeepSeek took its time for accuracy, ChatGPT instantly spat out a clean-looking script. The ball didn’t bounce realistically. Instead, it glitched around the edges of the box, sometimes getting stuck in the corners or phasing through the walls. It is clear that ChatGPT prefers speed over depth: it delivers a solution that works, but only in the most basic sense.

ChatGPT’s Code:

import pygame
import math

# Initialize Pygame
pygame.init()

# Constants
WIDTH, HEIGHT = 600, 600
FPS = 60
GRAVITY = 0.2
BOUNCE_DAMPING = 0.8
ROTATION_SPEED = 0.5  # Degrees per frame
BALL_RADIUS = 10

# Colors
YELLOW = (255, 255, 0)
WHITE = (255, 255, 255)
BLACK = (0, 0, 0)

# Screen setup
screen = pygame.display.set_mode((WIDTH, HEIGHT))
clock = pygame.time.Clock()

# Square properties
square_center = (WIDTH // 2, HEIGHT // 2)
square_size = 300
angle = 0

# Ball properties
ball_pos = [WIDTH // 2, HEIGHT // 2 - 50]
ball_velocity = [2, 0]

# Function to rotate a point around a center
def rotate_point(point, center, angle):
    x, y = point
    cx, cy = center
    radians = math.radians(angle)
    cos_a, sin_a = math.cos(radians), math.sin(radians)
    new_x = cos_a * (x - cx) - sin_a * (y - cy) + cx
    new_y = sin_a * (x - cx) + cos_a * (y - cy) + cy
    return new_x, new_y

# Main loop
running = True
while running:
    screen.fill(BLACK)

    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False

    # Apply gravity
    ball_velocity[1] += GRAVITY

    # Move ball
    ball_pos[0] += ball_velocity[0]
    ball_pos[1] += ball_velocity[1]

    # Compute rotated square corners
    half_size = square_size // 2
    square_corners = [
        (-half_size, -half_size), (half_size, -half_size),
        (half_size, half_size), (-half_size, half_size)
    ]
    rotated_corners = [rotate_point((square_center[0] + x, square_center[1] + y), square_center, angle)
                        for x, y in square_corners]

    # Check for collisions
    for i in range(4):
        p1, p2 = rotated_corners[i], rotated_corners[(i + 1) % 4]
        dx, dy = p2[0] - p1[0], p2[1] - p1[1]
        length = math.hypot(dx, dy)
        normal = (-dy / length, dx / length)  # Perpendicular normal

        # Distance from ball to edge
        projection = (ball_pos[0] - p1[0]) * normal[0] + (ball_pos[1] - p1[1]) * normal[1]
        if projection < BALL_RADIUS:  # Ball is colliding with edge
            # Reflect velocity
            dot = ball_velocity[0] * normal[0] + ball_velocity[1] * normal[1]
            ball_velocity[0] -= 2 * dot * normal[0]
            ball_velocity[1] -= 2 * dot * normal[1]

            # Apply bounce damping
            ball_velocity[0] *= BOUNCE_DAMPING
            ball_velocity[1] *= BOUNCE_DAMPING

            # Move ball out of collision
            ball_pos[0] += normal[0] * (BALL_RADIUS - projection)
            ball_pos[1] += normal[1] * (BALL_RADIUS - projection)

    # Rotate square
    angle += ROTATION_SPEED

    # Draw rotated square
    pygame.draw.polygon(screen, WHITE, rotated_corners, 2)

    # Draw ball
    pygame.draw.circle(screen, YELLOW, (int(ball_pos[0]), int(ball_pos[1])), BALL_RADIUS)

    # Update display
    pygame.display.flip()
    clock.tick(FPS)

pygame.quit()
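One plausible contributor to the sticking: the script reflects the velocity whenever the ball is within `BALL_RADIUS` of an edge, even if the ball is already moving away from that edge. With a rotating wall that can catch up to the ball, the velocity can get flipped back into the wall on the next frame. A minimal sketch of a guard is shown here; the helper name `bounce` is mine, and it assumes `normal` is the inward-pointing unit normal, as in the script.

```python
def bounce(velocity, normal, damping=0.8):
    """Reflect velocity off a wall with inward unit normal, only if approaching."""
    dot = velocity[0] * normal[0] + velocity[1] * normal[1]
    if dot >= 0:
        return velocity  # already moving away from the wall: leave it alone
    vx = velocity[0] - 2 * dot * normal[0]
    vy = velocity[1] - 2 * dot * normal[1]
    return (vx * damping, vy * damping)

# Floor of the box in screen coordinates (y grows downward), so inward is "up".
inward_normal = (0.0, -1.0)
print(bounce((2.0, 3.0), inward_normal))   # heading into the floor: reflected and damped
print(bounce((2.0, -3.0), inward_normal))  # already leaving: unchanged
```

This does not account for the speed of the moving wall itself, so treat it as a partial fix rather than a full physics model.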

Output:

2. DeepSeek’s Output

DeepSeek’s output left me genuinely amazed. While ChatGPT was quick to generate code, DeepSeek took 200 seconds just to think about the problem. DeepSeek didn’t just write a functional script; it crafted a highly optimized, physics-accurate simulation that handled every edge case flawlessly

DeepSeek’s Code:

import pygame
import math

# Initialize Pygame
pygame.init()

# Constants
WIDTH, HEIGHT = 600, 600
SQUARE_SIZE = 400
SQUARE_HALF = SQUARE_SIZE // 2
BALL_RADIUS = 10
GRAVITY = 0.5
RESTITUTION = 0.8
ANGULAR_VELOCITY = 1  # Degrees per frame

# Setup display
screen = pygame.display.set_mode((WIDTH, HEIGHT))
pygame.display.set_caption("Bouncing Ball in Rotating Square")
clock = pygame.time.Clock()

# Initialize variables
square_center = (WIDTH//2, HEIGHT//2)
angle = 0

ball_pos = [WIDTH//2, HEIGHT//2]
ball_vel = [0, 0]

def rotate_point(point, center, angle):
    """Rotate a point around a center by given angle (degrees)"""
    theta = math.radians(-angle)
    ox, oy = center
    px, py = point

    qx = ox + math.cos(theta) * (px - ox) - math.sin(theta) * (py - oy)
    qy = oy + math.sin(theta) * (px - ox) + math.cos(theta) * (py - oy)
    return qx, qy

running = True
while running:
    dt = clock.tick(60) / 1000  # Delta time in seconds

    # Event handling
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False

    # Update square rotation
    angle = (angle + ANGULAR_VELOCITY) % 360

    # Apply physics
    ball_vel[1] += GRAVITY
    ball_pos[0] += ball_vel[0] * dt
    ball_pos[1] += ball_vel[1] * dt

    # Collision detection and response
    # Convert ball position to square's local coordinate system
    translated = (ball_pos[0] - square_center[0], 
                 ball_pos[1] - square_center[1])
    rotated = rotate_point((translated[0], translated[1]), (0, 0), -angle)

    collision = False
    new_local_x, new_local_y = rotated[0], rotated[1]

    # Check boundaries in local coordinates
    if rotated[0] < -SQUARE_HALF + BALL_RADIUS:
        new_local_x = -SQUARE_HALF + BALL_RADIUS
        collision = True
    elif rotated[0] > SQUARE_HALF - BALL_RADIUS:
        new_local_x = SQUARE_HALF - BALL_RADIUS
        collision = True

    if rotated[1] < -SQUARE_HALF + BALL_RADIUS:
        new_local_y = -SQUARE_HALF + BALL_RADIUS
        collision = True
    elif rotated[1] > SQUARE_HALF - BALL_RADIUS:
        new_local_y = SQUARE_HALF - BALL_RADIUS
        collision = True

    if collision:
        # Convert corrected position back to global coordinates
        corrected = rotate_point((new_local_x, new_local_y), (0, 0), angle)
        ball_pos[0] = corrected[0] + square_center[0]
        ball_pos[1] = corrected[1] + square_center[1]

        # Calculate reflection in local coordinates
        local_vel = rotate_point(ball_vel, (0, 0), -angle)

        if new_local_x != rotated[0]:  # X-axis collision
            local_vel = (-local_vel[0] * RESTITUTION, local_vel[1])
        if new_local_y != rotated[1]:  # Y-axis collision
            local_vel = (local_vel[0], -local_vel[1] * RESTITUTION)

        # Convert velocity back to global coordinates
        global_vel = rotate_point(local_vel, (0, 0), angle)
        ball_vel[0], ball_vel[1] = global_vel[0], global_vel[1]

    # Drawing
    screen.fill((0, 0, 0))

    # Draw rotating square
    square = pygame.Surface((SQUARE_SIZE, SQUARE_SIZE), pygame.SRCALPHA)
    pygame.draw.rect(square, (255, 0, 0), (0, 0, SQUARE_SIZE, SQUARE_SIZE), 2)
    rotated_square = pygame.transform.rotate(square, angle)
    screen.blit(rotated_square, rotated_square.get_rect(center=square_center))

    # Draw ball
    pygame.draw.circle(screen, (255, 255, 0), 
                      (int(ball_pos[0]), int(ball_pos[1])), BALL_RADIUS)

    pygame.display.flip()

pygame.quit()
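The key idea in this script is to do the collision check in the box's own coordinate frame, where the walls are axis-aligned, and then convert back. The sketch below isolates that idea without Pygame so it can be checked on its own. The function names and the rotation sign convention are my assumptions, and like the script above it ignores the velocity of the moving wall.

```python
import math

def rotate(vec, degrees):
    """Rotate a 2D vector about the origin by the given angle."""
    t = math.radians(degrees)
    x, y = vec
    return (x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) + y * math.cos(t))

def collide_in_box_frame(pos, vel, angle, half, radius, restitution=0.8):
    """pos and vel are relative to the box centre, expressed in world axes."""
    lx, ly = rotate(pos, -angle)    # world -> box frame
    lvx, lvy = rotate(vel, -angle)
    limit = half - radius
    if abs(lx) > limit:             # past a left/right wall
        lx = math.copysign(limit, lx)
        lvx = -lvx * restitution
    if abs(ly) > limit:             # past a top/bottom wall
        ly = math.copysign(limit, ly)
        lvy = -lvy * restitution
    # box frame -> world
    return rotate((lx, ly), angle), rotate((lvx, lvy), angle)

pos, vel = collide_in_box_frame((195.0, 0.0), (5.0, 0.0), angle=0, half=200, radius=10)
print(pos, vel)  # ball is pushed back inside and its x velocity is reversed
```

Because the clamp happens in the box frame, the same four comparisons work at any rotation angle.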

Output:

3. Qwen’s Output: A Disappointing Attempt

If ChatGPT’s output was underwhelming, Qwen’s was downright disappointing. Given Qwen’s strong reputation for handling complex tasks, I really had high expectations for its performance. But when I ran its code for the rotating ball simulation, the results were far from what I expected. Like ChatGPT, Qwen generated code almost instantly — no deep thinking.

The ball was outside the box for most of the simulation, completely defying the laws of physics. The box itself was half out of frame, so only a portion of it was visible on the canvas

Qwen’s Code

import pygame
import math

# Initialize Pygame
pygame.init()

# Screen dimensions
WIDTH, HEIGHT = 600, 600
screen = pygame.display.set_mode((WIDTH, HEIGHT))
pygame.display.set_caption("Bouncing Ball in Rotating Square")

# Colors
YELLOW = (255, 255, 0)
WHITE = (255, 255, 255)
BLACK = (0, 0, 0)

# Clock for controlling frame rate
clock = pygame.time.Clock()
FPS = 60

# Box properties
box_size = 300
box_center = (WIDTH // 2, HEIGHT // 2)
angle = 0  # Initial rotation angle
rotation_speed = 1  # Degrees per frame

# Ball properties
ball_radius = 10
ball_pos = [box_center[0], box_center[1] - box_size // 4]  # Start near the top center of the box
ball_velocity = [4, 0]  # Initial velocity (x, y)
gravity = 0.2  # Gravity effect

# Function to rotate a point around another point
def rotate_point(point, angle, origin):
    ox, oy = origin
    px, py = point
    qx = ox + math.cos(math.radians(angle)) * (px - ox) - math.sin(math.radians(angle)) * (py - oy)
    qy = oy + math.sin(math.radians(angle)) * (px - ox) + math.cos(math.radians(angle)) * (py - oy)
    return qx, qy

# Main game loop
running = True
while running:
    screen.fill(BLACK)

    # Handle events
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False

    # Update ball physics
    ball_velocity[1] += gravity  # Apply gravity
    ball_pos[0] += ball_velocity[0]
    ball_pos[1] += ball_velocity[1]

    # Rotate the box
    angle += rotation_speed
    if angle >= 360:
        angle -= 360

    # Define the four corners of the square
    corners = [
        (-box_size / 2, -box_size / 2),
        (box_size / 2, -box_size / 2),
        (box_size / 2, box_size / 2),
        (-box_size / 2, box_size / 2)
    ]

    # Rotate the corners around the center
    rotated_corners = [rotate_point(corner, angle, box_center) for corner in corners]

    # Check collision with the edges of the box
    collided = False
    for i in range(4):
        p1 = rotated_corners[i]
        p2 = rotated_corners[(i + 1) % 4]

        # Line segment between p1 and p2
        x1, y1 = p1
        x2, y2 = p2

        # Closest point on the line to the ball
        dx = x2 - x1
        dy = y2 - y1
        t = max(0, min(1, ((ball_pos[0] - x1) * dx + (ball_pos[1] - y1) * dy) / (dx * dx + dy * dy)))
        closest_x = x1 + t * dx
        closest_y = y1 + t * dy

        # Distance between ball and closest point
        distance_sq = (closest_x - ball_pos[0]) ** 2 + (closest_y - ball_pos[1]) ** 2
        if distance_sq <= ball_radius ** 2:
            # Collision detected
            collided = True

            # Normal vector of the edge
            edge_length = math.hypot(dx, dy)
            nx = -dy / edge_length
            ny = dx / edge_length

            # Reflect the velocity vector
            dot_product = ball_velocity[0] * nx + ball_velocity[1] * ny
            ball_velocity[0] -= 2 * dot_product * nx
            ball_velocity[1] -= 2 * dot_product * ny

    # Ensure the ball stays inside the box
    if not collided:
        # Clamp the ball position within the box
        ball_pos[0] = max(box_center[0] - box_size / 2 + ball_radius, min(ball_pos[0], box_center[0] + box_size / 2 - ball_radius))
        ball_pos[1] = max(box_center[1] - box_size / 2 + ball_radius, min(ball_pos[1], box_center[1] + box_size / 2 - ball_radius))

    # Draw the rotating square
    pygame.draw.polygon(screen, WHITE, rotated_corners, 2)

    # Draw the ball
    pygame.draw.circle(screen, YELLOW, (int(ball_pos[0]), int(ball_pos[1])), ball_radius)

    # Update the display
    pygame.display.flip()

    # Cap the frame rate
    clock.tick(FPS)

# Quit Pygame
pygame.quit()
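A likely cause of the off-screen box: the corners are defined as offsets from (0, 0), but `rotate_point` treats them as absolute screen positions and rotates them around `box_center`. The small check below reproduces that with the script's own numbers; the "possible fix" is my suggestion, not something Qwen produced.

```python
import math

def rotate_point(point, angle, origin):
    """Rotate point around origin by angle in degrees (same maths as the script)."""
    ox, oy = origin
    px, py = point
    c, s = math.cos(math.radians(angle)), math.sin(math.radians(angle))
    return (ox + c * (px - ox) - s * (py - oy),
            oy + s * (px - ox) + c * (py - oy))

box_center = (300, 300)
half = 150  # box_size / 2
local_corners = [(-half, -half), (half, -half), (half, half), (-half, half)]

# As written in the script: local offsets are rotated around box_center directly.
as_written = [rotate_point(c, 0, box_center) for c in local_corners]

# Possible fix: shift the offsets to the box centre first, then rotate.
shifted = [(box_center[0] + x, box_center[1] + y) for x, y in local_corners]
fixed = [rotate_point(c, 0, box_center) for c in shifted]

print(as_written[0])  # top-left corner lands off-screen
print(fixed[0])       # top-left corner sits inside the 600x600 window
```

Separately, the script's final clamp uses an axis-aligned box around `box_center`, which only matches the drawn square when the rotation angle is a multiple of 90 degrees. That may explain why the ball does not stay inside the rotated outline.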

Output:

2. Comparing ChatGPT, Qwen, and DeepSeek’s Responses to a Classic Pursuit Puzzle

When it comes to solving real-world problems, not all AI models are created equal. To test their capabilities, I presented a classic pursuit problem:

“A valuable artifact was stolen. The owner began pursuit after the thief had already fled 45 km. After traveling 160 km, the owner discovered the thief remained 18 km ahead. How many additional kilometers must the owner travel to catch the thief?”
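For reference, the arithmetic can be checked in a few lines, assuming both the owner and the thief move at constant speeds (the puzzle does not say so explicitly). The variable names are mine.

```python
from fractions import Fraction

head_start = 45      # km the thief was ahead when the chase began
owner_so_far = 160   # km the owner has travelled
gap_now = 18         # km the thief is still ahead

gap_closed = head_start - gap_now           # km gained on the thief so far
thief_so_far = owner_so_far - gap_closed    # km the thief covered in the same time
gain_per_km = Fraction(gap_closed, owner_so_far)  # km gained per km the owner travels

additional = gap_now / gain_per_km          # km still needed to close the gap
print(additional, float(additional))
```

The owner gained 27 km over 160 km of travel, so closing the remaining 18 km takes 18 × 160 / 27 = 320/3, or about 106.67 km.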

1. ChatGPT’s Response

ChatGPT took 3 attempts to arrive at the correct answer. Initially, it misinterpreted the problem but eventually corrected itself, demonstrating persistence though lacking efficiency in its first tries

2. DeepSeek’s Response

DeepSeek answered correctly on the first try but took slightly longer than Qwen. It delivered a detailed, step-by-step solution with clear reasoning, proving its strength in deep thinking and accuracy.

3. Qwen’s Response

Qwen answered correctly on the first try and did so faster than DeepSeek. It provided a concise and accurate solution without unnecessary steps, showcasing strong problem-solving speed and precision.

Conclusion

While all three AIs eventually answered correctly, Qwen stood out for its speed and efficiency, while DeepSeek showcased its methodical approach. ChatGPT required multiple attempts

Humanizing AI Content: The Human Side of AI

While speed and efficiency are often celebrated in AI, the real game-changer is emotional intelligence: the ability to understand, interpret, and respond to human emotions. Models like DeepSeek excel in precision and logic, and ChatGPT shines in creativity, but can any of them sound genuinely human? Let’s test it out.

— — — — — — — —
Prompt: Write a messy emotional love letter
— — — — — — — —

ChatGPT:

DeepSeek:

Qwen:

Interestingly, when tested for human-like originality, all three models (ChatGPT, DeepSeek, and Qwen) struggled to break free from their AI-generated patterns. Note that all three began their responses with the same robotic line: “I don’t even know where to start”. Anyhow, I had high expectations for ChatGPT, but Qwen won!

Key Takeaways:
DeepSeek: The go-to for research and critical thinking, outperforming others in precision and depth.
Qwen: Matched DeepSeek in solving the classic riddle on the first try and won in humanized content, making it a strong all-rounder.
ChatGPT: Took multiple tries to solve the riddle but remains a top choice for creative tasks and human-like interactions.

Final Verdict: Who Should Use Which AI?
Researchers: DeepSeek
Engineers: DeepSeek
Writers: ChatGPT or Qwen
Lawyers: Qwen with ChatGPT
Educators: ChatGPT
Content Creators: Qwen, plus deep thinking from DeepSeek

Conclusion:

I would love to hear your take in the comments, and please correct me if I’m wrong. If you found this article helpful, clap, share, and tell me your views. Feel free to support me here or via UPI: jainhvj@fam

I’m excited to keep learning and exploring this vast field. I appreciate your feedback and look forward to insightful discussions
