While writing an MVP seems easy with modern AI tools, working on a real-world project remains a highly senior task.
Let's see how thinking models like O1 and Gemini-thinking can be useful for complex tasks such as the analysis and refactoring of a real-world project.
Why do refactoring with AI?
For AI-developed projects: While Cursor Composer can create whole features, it tends to forget about the bigger picture, such as architecture and system design. This article shows how thinking models can be employed to address this weakness by splitting refactoring into a list of smaller tasks that Composer should be able to handle.
For regular projects: AI can do a full project code review in seconds and provide meaningful improvement suggestions from a third-person perspective. It can act as a senior team member who will explain their reasoning, answer your questions, and propose solutions.
Let's do it
1. Selecting a Model to Use:
What are we looking for from our model?
- Large context window, so you can feed your entire code base or a big chunk of it
- Thinking capabilities. Ideally, we want it to be able to understand our project
- It is a one-time job, so we can employ the best models available, and it should not be that expensive
In my case, I would use O1 and Gemini-2.0-flash-thinking-exp. They are the best available on the market and are also integrated into Cursor.
2. Selecting project
I’ve randomly selected screenshot-to-code. While it is definitely not the Linux repo, it is old enough to have already accumulated some tech debt, and it is far more complex than the MVP projects people are building with AI.
git clone https://github.com/abi/screenshot-to-code
3. Generating context for model:
We want to feed the entire project to our model as context. The simplest way is to concatenate everything into one file, and there is already a tool for that: code2prompt.
pip install code2prompt
code2prompt --path ./screenshot-to-code --output project_summary.md
The output will look like project_summary.md
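If you prefer not to install another tool, here is a minimal sketch that does roughly the same thing (the extension list and skipped directories are my assumptions for this repo, not something code2prompt prescribes):

```python
# Minimal sketch: concatenate source files into one markdown context file.
# Extensions and skipped directories below are assumptions for this repo.
from pathlib import Path

REPO = Path("./screenshot-to-code")
EXTS = {".py", ".ts", ".tsx", ".js", ".css", ".html", ".md", ".toml", ".json", ".yaml"}
SKIP = {"node_modules", ".git", "dist", "build", "__pycache__"}

sections = []
for path in sorted(REPO.rglob("*")):
    if path.is_file() and path.suffix in EXTS and not SKIP.intersection(path.parts):
        body = path.read_text(encoding="utf-8", errors="ignore")
        sections.append(f"## {path.relative_to(REPO)}\n\n{body}\n")

Path("project_summary.md").write_text("\n".join(sections), encoding="utf-8")
print(f"Wrote {len(sections)} files into project_summary.md")
```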
4. Checking context size
We want to verify if our model can read such a large context file.
pip install token-count
token-count --file project_summary.md
For me, it reports 332,387 tokens, while O1 can read 200k tokens and Gemini 1M tokens.
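If you want to double-check the number without another CLI, here is a quick sketch with tiktoken (using o200k_base is my assumption for O1's tokenizer; Gemini tokenizes differently, so treat the result as an estimate):

```python
# Rough token count for the context file (pip install tiktoken).
# o200k_base is an assumption about O1's tokenizer; Gemini will differ.
import tiktoken

enc = tiktoken.get_encoding("o200k_base")
text = open("project_summary.md", encoding="utf-8").read()
print(f"{len(enc.encode(text))} tokens")
```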
5. What if code base is too big?
I would not say that it is easy to work with big projects using AI, and I haven’t seen it done by anyone else, but there are at least 2 things I would try:
Summarize code first: A very well-known approach in data analysis that works when applied correctly. Why not replace implementation with just API documentation or employ another AI to give a short explanation?
Per-module analysis: As engineers, we already invented many ways to work with complex systems: application layers, feature modules, libraries, etc. How about defining the expected system design and checking if a module fits it? A rough sketch of this idea follows below.
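As a sketch of the per-module idea, you could reuse the same code2prompt invocation from step 3 and build one context file per top-level directory, so each chunk fits the context window on its own (the directory layout is whatever the repo happens to contain, e.g. frontend and backend here):

```python
# Sketch: build one context file per top-level module so each chunk
# fits into the model's context window, reusing code2prompt from step 3.
import subprocess
from pathlib import Path

REPO = Path("./screenshot-to-code")
SKIP = {".git", ".github", "node_modules"}

for module in sorted(p for p in REPO.iterdir() if p.is_dir() and p.name not in SKIP):
    out = f"{module.name}_summary.md"
    subprocess.run(["code2prompt", "--path", str(module), "--output", out], check=True)
    print(f"{module.name} -> {out}")
```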
6. Prompt engineering
Prompt engineering is another beast. You can check some open-source prompt collections, but I would suggest using your 🧠 to think through and explain what you want as a result and what specific context you know about the project. AI can do a lot, but it cannot read your mind.
After playing a bit, I've come up with this one:
Here's our current code base @project_summary.md. Can you propose improvements or a refactoring plan? Give me a bullet list of such improvements with priorities (High/Medium/Low) and a short explanation of why this improvement is needed and what has to be done for each item.
7. Generating refactoring plan
I've used a new Cursor chat for each model with the same prompt and saved responses to files like *_suggestions.md.
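If you would rather script this step than paste into Cursor chats, here is a minimal sketch using the OpenAI Python SDK (the model name and file names are my assumptions; the Gemini side would need its own SDK):

```python
# Sketch: send the refactoring prompt to one model via the OpenAI SDK
# and save the answer as <model>_suggestions.md. Model access is assumed.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    "Here's our current code base:\n\n"
    + open("project_summary.md", encoding="utf-8").read()
    + "\n\nCan you propose improvements or a refactoring plan? "
    "Give me a bullet list of such improvements with priorities (High/Medium/Low) "
    "and a short explanation of why this improvement is needed and what has to be "
    "done for each item."
)

response = client.chat.completions.create(
    model="o1",  # assumption: your account has access to O1
    messages=[{"role": "user", "content": prompt}],
)

open("o1_suggestions.md", "w", encoding="utf-8").write(response.choices[0].message.content)
```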
Reviewing Results:
Below is a table of the summarized improvements and my subjective ranking.
| Improvements | O1 (Priority) | Gemini-2.0-flash-thinking-exp (Priority) | Subjective Ranking |
| --- | --- | --- | --- |
| Centralize Configuration Files | ❌ | ✅ (High) | (Medium) Makes sense; configs will be easier to support in the future |
| Improve Component Organization in Frontend | ✅ (High) | ✅ (High) | (High) Definitely worth doing; it will only get worse in the future |
| Standardize Naming Conventions / Linter Rules | ✅ (Medium) | ✅ (Medium) | (High) A good linter should be able to solve it |
| Group Backend Routes | ❌ | ✅ (Medium) | (Medium-Low) Only if we are planning to add more routes |
| Review Utility and Helper Functions | ✅ (Medium) | ✅ (Low) | (Medium) Why are there 2 places for utility functions at all? |
| Consolidate Test Directories | ✅ (Low) | ✅ (Low) | (Medium) Definitely yes |
| Add Consistent Error Handling | ✅ (High) | ❌ | (Medium) Good point, especially if we are aiming for good code quality and not just a demo |
| Improve TypeScript Strictness | ✅ (High) | ❌ | (Low) Up to the taste of the developers; TypeScript strictness is already kind of painful for some people |
| Extract Reusable Layout Components | ✅ (Medium) | ❌ | (Medium) Sounds good, but we need to take a look at what can actually be reused |
| Improve Comments and Documentation | ✅ (Low) | ❌ | (Medium) Always a trade-off between development speed and documentation; ideally AI would be employed to do it |
| Optimize for Performance Where Relevant | ✅ (Low) | ❌ | (Low) Only spend time on it if there is a clear bottleneck |
| Enhance Testing Coverage | ✅ (Low) | ❌ | (Medium) Will help the product reach maturity. Maybe let AI do this too; I've never seen a developer eager to add tests once a feature is already working |
How viable are those suggestions?
Almost all of them make sense (at least to me) and are worth considering depending on the project's needs.
Was it able to understand the project?
Hard to say for sure, since even the number of suggestions differs. Still, both models highlighted the same 4 areas of improvement:
- (High) Component Organization in Frontend
- (Medium/High) Standardize Naming Conventions / Linter Rules
- (Medium/Low) Review Utility and Helper Functions
- (Low) Consolidate Test Directories
So there is definitely some consistency and depth of understanding of the project.
Also, I guarantee you that if two developers were asked the same question, they would have two quite different lists. In any case, you will need to consolidate those answers and prioritize them based on team priorities, workload, and upcoming goals.
Conclusion
So, did I get what I needed? The answer is largely yes: I gained a clear, prioritized refactoring roadmap and saved time scoping out big changes.
Refactoring remains an iterative process grounded in your team’s experience, code reviews, and business goals. In that sense, AI isn’t a silver bullet—it’s a catalyst.
Like an experienced team member, 🤖✨ can offer an outside perspective, explain its reasoning, and turn it into action plans.
Use it wisely, and you’ll find yourself more in control of your codebase than ever before.