Denis Bratchikov

Posted on Mar 4

AI-Powered Code Refactoring: A Case Study Using Cursor with GPT-4o and Claude 3.7 Sonnet

#ai #refactoring #javascript #typescript

Introduction

Hey there! Code refactoring is one of those necessary parts of keeping software clean and maintainable. But let’s be honest—it can get pretty tedious, especially when dealing with repetitive tasks across a ton of files. In this case study, I’ll walk you through how I used Cursor IDE, powered by GPT-4o and Claude 3.7 Sonnet, to automate a refactoring task across 64 Playwright test specification files.

The goal? Remove deprecated arguments from function calls with minimal manual intervention. Let’s dive in!

Preconditions

Before we dive into the technical details, it's worth mentioning that Cursor was configured with specific rules from this GitHub repository. These rules helped ensure that the AI would divide tasks into smaller subtasks and perform them one by one.

Problem Statement

During a recent redesign, several test helper functions had been temporarily modified to include additional parameters. Specifically, our Playwright test suite contained:

Project initialization functions using:

page.goto(path, config)
page.createProject(config)
page.loadProject(project_name, config)

IConfig {
  featureFlags: {
    'feature-1': true,
    'feature-2': true,
    'new-ui-design': true // <= our first target
  },
  ...some_other_params
}

Snapshot verification using:

page.toHaveSnapshot(optional_name, params);

IParams {
  shouldClip: true, // <= our second target
  ...some_other_props
}

After the redesign, the default behavior of page.toHaveSnapshot() was restored, so the shouldClip: true property was no longer needed. Similarly, the 'new-ui-design': true feature flag was no longer relevant. Both had to be removed from all test files.

Approach: AI-Assisted Batch Refactoring

To automate the changes, I used Cursor IDE with the Composer (Agent) mode, leveraging both GPT-4o and Claude 3.7 Sonnet for code modifications.

Initial Prompt (Single-Step Approach)

My first attempt was a single prompt to process all .spec.ts files at once:

In `application\e2e`, find all `.spec.ts` files and do the following:
1. Remove object with `shouldClip: true`
2. Remove the empty string before the removed object (if any)
3. Remove `'new-ui-design': true,` from the corresponding object
4. If the corresponding object (e.g., `featureFlags`) becomes empty, remove it
5. If the parent object of the corresponding object becomes empty, remove it

While the AI handled a lot of cases correctly, I ran into some hiccups:

Sometimes the model made unintended modifications, like changing snapshot names or adding extra checks.
It applied changes inconsistently across files, for example fixing one method call but skipping another in the same file.
The results were non-deterministic and varied between runs.

Refining the Approach: Two-Step Refactoring

To make things smoother, I decided to split the refactoring into two separate tasks.

Prompt 1 (Snapshot Cleanup)

For each file `application\e2e\**\*.spec.ts`, do the following:
1. Remove object with `shouldClip: true`
2. If the corresponding object becomes empty, remove it
3. Remove the empty string before the removed object (if any)
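The three Prompt 1 rules can be made concrete with a minimal sketch that applies the same logic to values rather than source text. This is not what was run in this case study, where the model edited the files directly. It assumes each `page.toHaveSnapshot()` call's arguments are modeled as a plain array `[name, params]`, and `cleanSnapshotArgs` and `isPlainObject` are hypothetical helper names. Treating `undefined` like an empty string is an assumption taken from Refactoring Example 2, where `toHaveSnapshot(undefined, { shouldClip: true })` became `toHaveSnapshot()`.

```typescript
// Hypothetical model: the arguments of one page.toHaveSnapshot() call,
// i.e. [name?, params?]. This works on values, not on source text.
type SnapshotArgs = unknown[];

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Applies the three Prompt 1 rules to a single call's argument list and
// returns a new list (the input array and objects are not mutated).
function cleanSnapshotArgs(args: SnapshotArgs): SnapshotArgs {
  const result = [...args];
  const params = result[result.length - 1];

  // Only touch calls whose trailing argument contains shouldClip: true.
  if (!isPlainObject(params) || params['shouldClip'] !== true) {
    return result;
  }

  // Rule 1: remove shouldClip: true from a copy of the params object.
  const remaining: Record<string, unknown> = { ...params };
  delete remaining['shouldClip'];

  // Rule 2: keep the object if anything is left, otherwise drop it.
  if (Object.keys(remaining).length > 0) {
    result[result.length - 1] = remaining;
    return result;
  }
  result.pop();

  // Rule 3: drop the now-trailing empty name ('' or, by assumption, undefined).
  const name = result[result.length - 1];
  if (result.length > 0 && (name === '' || name === undefined)) {
    result.pop();
  }
  return result;
}
```

For instance, `cleanSnapshotArgs(['my-snapshot', { shouldClip: true }])` yields `['my-snapshot']`, and `['', { threshold: 0.6, shouldClip: true }]` keeps both the empty name and `{ threshold: 0.6 }`, matching the "After" code in Example 2.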

Prompt 2 (Feature Flags Cleanup)

For each file `application\e2e\**\*.spec.ts`, do the following:
1. Remove `'new-ui-design': true,` from the corresponding object
2. If the corresponding object (e.g., `configureCustomFeatureFlags`) becomes empty, remove it
3. If the parent of the corresponding object becomes empty, remove it
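The Prompt 2 rules can be sketched the same way, again on values rather than on source files and not as the method used here. The sketch assumes the flags live under `featureFlags`, as in Refactoring Example 1 (Prompt 2 itself mentions `configureCustomFeatureFlags`). The `Config` type and `removeNewUiFlag` function are hypothetical names.

```typescript
// Hypothetical config type, loosely following the IConfig shape above.
interface Config {
  featureFlags?: Record<string, boolean>;
  [key: string]: unknown;
}

// Applies the Prompt 2 rules to one config argument. Returns undefined when
// the whole argument should be dropped, e.g. createProject({...}) -> createProject().
function removeNewUiFlag(config: Config): Config | undefined {
  const flags = config.featureFlags;
  // Only touch configs that actually set 'new-ui-design': true.
  if (!flags || flags['new-ui-design'] !== true) {
    return config;
  }

  // Rule 1: remove the flag from a copy of featureFlags.
  const remainingFlags: Record<string, boolean> = { ...flags };
  delete remainingFlags['new-ui-design'];

  // Rule 2: drop featureFlags entirely if it became empty.
  const result: Config = { ...config };
  if (Object.keys(remainingFlags).length === 0) {
    delete result.featureFlags;
  } else {
    result.featureFlags = remainingFlags;
  }

  // Rule 3: drop the parent (the whole config argument) if it became empty.
  return Object.keys(result).length === 0 ? undefined : result;
}
```

Given the three config objects from Refactoring Example 1, it returns `undefined` (so `createProject()` loses its argument), `{ featureFlags: { 'some-feature': true } }`, and `{ foo: { bar: 'baz' } }`, which matches the "After" code.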

Why This Worked Better:

Avoided unintended side effects from overlapping edits.
Ensured consistency across all files.
Reduced hallucinations, since each task was clearer and more focused.

Code Update Examples

Refactoring Example 1, Before:

page.createProject({
  featureFlags: {
    'new-ui-design': true
  }
})

page.loadProject('my-project', {
  featureFlags: {
    'new-ui-design': true,
    'some-feature': true,
  }
})

page.goto('/', {
  featureFlags: {
    'new-ui-design': true
  },
  foo: {
    bar: 'baz'
  }
})

Refactoring Example 1, After:

page.createProject()

page.loadProject('my-project', {
  featureFlags: {
    'some-feature': true,
  }
})

page.goto('/', {
  foo: {
    bar: 'baz'
  }
})

Refactoring Example 2, Before:

const snapshotParams = {
  threshold: 0.6,
  shouldClip: true
};

page.toHaveSnapshot('', snapshotParams);

page.toHaveSnapshot('my-snapshot', { shouldClip: true });

page.toHaveSnapshot(undefined, { shouldClip: true });

Refactoring Example 2, After:

const snapshotParams = {
  threshold: 0.6
};

page.toHaveSnapshot('', snapshotParams);

page.toHaveSnapshot('my-snapshot');

page.toHaveSnapshot();

Model Comparison: GPT-4o vs. Claude 3.7 Sonnet

When using both GPT-4o and Claude 3.7 Sonnet for these specific refactoring tasks, I found no significant difference in performance or accuracy between them. Both models struggled with the single-step approach, and with the two-step approach both processed the files and made the changes I needed, with comparable output quality.

Challenges and AI Limitations

Using AI for this task was pretty great overall, but it wasn’t without its quirks:

Task Execution Limit - Cursor’s Composer has a default task limit of 25, after which manual confirmation (proceed / continue) is required.
AI Deviations & Over-Eagerness - in some runs, the model attempted to generate a Python script to automate the task despite explicit instructions to modify the files directly.

Key Takeaways

💡 AI is already a powerful tool for large-scale code modifications, allowing for efficient batch refactoring with minimal manual oversight.

✅ What worked well:

Using Cursor IDE’s Composer mode with well-defined prompts.
Splitting complex refactoring into smaller, focused tasks.
Leveraging both GPT-4o and Claude 3.7 Sonnet to compare performance.

❌ Challenges:

AI still struggles with perfect consistency across multiple files.
AI still struggles to follow all instructions consistently.
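Because results varied between runs and some calls were skipped, one cheap safety net is a deterministic check after each AI pass. The following is a minimal sketch, assuming a Node.js project with `@types/node` available. `findLeftovers`, `collectSpecFiles`, and `LEFTOVER_PATTERNS` are hypothetical names, and the regexes are a line-based heuristic, not a parser. It only reports leftovers and does not edit anything.

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// Patterns the two prompts were supposed to eliminate (line-based heuristic).
const LEFTOVER_PATTERNS: RegExp[] = [
  /shouldClip\s*:\s*true/,
  /['"]new-ui-design['"]\s*:\s*true/,
];

// Recursively collects every *.spec.ts file under a directory.
function collectSpecFiles(dir: string): string[] {
  const files: string[] = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      files.push(...collectSpecFiles(fullPath));
    } else if (entry.isFile() && entry.name.endsWith('.spec.ts')) {
      files.push(fullPath);
    }
  }
  return files;
}

// Returns "file:line: text" for every line that still matches a leftover pattern.
function findLeftovers(root: string): string[] {
  const hits: string[] = [];
  for (const file of collectSpecFiles(root)) {
    const lines = fs.readFileSync(file, 'utf8').split(/\r?\n/);
    lines.forEach((line, index) => {
      if (LEFTOVER_PATTERNS.some((pattern) => pattern.test(line))) {
        hits.push(`${file}:${index + 1}: ${line.trim()}`);
      }
    });
  }
  return hits;
}
```

Running `findLeftovers(path.join('application', 'e2e'))` after each prompt and treating a non-empty result as a failure would flag skipped edits. Because it matches line by line, a `shouldClip` key split across lines from its `true` value would slip through.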

👨‍💻 Would I use AI for similar tasks again? Absolutely. While AI isn't perfect, it's already a valuable assistant for handling tedious, repetitive refactoring tasks—giving developers more time to focus on higher-level problem-solving.

Final Thoughts

This was my first adventure with AI-assisted batch refactoring, and I’d love to hear what you think! Have you ever used AI for similar tasks? Let’s chat about it in the comments. 👇

Feel free to connect with me on LinkedIn

🚀 More content coming soon - stay tuned!

P.S. This is my first article on development, so I’d really appreciate any feedback or suggestions! Thanks for reading! 😊
