This article covers how GenAI can be leveraged for code reviews.
Prerequisites ✅
- Basic understanding of Python
- Access to a GPU-based machine (*optional)
- Trauma from endless Code reviews (*helps to appreciate the idea)
🤔 Understanding the Problem
Code reviews: the necessary evil we all love to hate. Every developer has experienced the pain of endlessly checking code for inconsistencies, missed standards, or overlooked best practices.
Having worked at more than six tech companies, I've started to see patterns. One common problem I've noticed is:
"Code reviews are time-consuming".
We can all agree that even if code compiles, generates the desired output, and passes all the test cases, it still isn't enough to push to production. If it were, CI/CD pipelines alone would suffice.
So even after the various stages of a pipeline, there is an inherent need for human intervention: a developer who comments the holier-than-thou words of ✨"LGTM ✅"✨ on an MR, a person who will be held even more accountable for these changes than the developer who made them!
As we all know, human intervention means a chance of human error.
What if AI could take care of it at some level?
AI can't replace developers, but it can surely assist them!
💡 Solution
In my experience, there are two types of coding standards and rules that a Developer follows:
- Rules written in books
- Rules your Seniors follow (and thus the team follows)
What if we document all of the rules the entire team decides to follow? (takes what, 30 minutes max?)
Now, whenever new code is written, it is checked by AI against all of these rules.
A few examples of such rules:
- Logging structure to be consistent across the repos.
- Fail-over approach for network calls
- Even naming of variables (camelCase or kebab-case)
- Error codes
- Cases of panic
I guess this will be enough for you to get a sense of why these rules need to be written down (even with no AI involved).
In my project, the AI sends something like this as a Code Review: See Logs here
Now, if you feel the need, let's dig into the implementation!
🛠️ Implementation
Talk is cheap, show me the code: Here you go!
In the project link shared above, I tried to implement the idea as a stage of the CI/CD pipeline itself. There are other ways to implement the idea, which will be discussed later.
Let's go step by step:
1. RULES.md 📖
As mentioned before, you need to write down, in one place, all of the rules against which you want your code to be checked. It could be a Markdown file or a .txt file; the format doesn't matter as long as it's easily accessible to everyone.
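For illustration only, a hypothetical RULES.md built from the example rules above could look something like this (the exact wording and specifics are made up, not taken from my project):

```md
# Team Coding Rules

1. Logging must follow the shared structured-logging format across all repos.
2. Every network call must have a timeout and a documented fail-over/retry policy.
3. Variable names use camelCase; file and branch names use kebab-case.
4. Only return error codes defined in the shared errors module.
5. Never panic in library code; return an error to the caller instead.
```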
2. Right Machine 🖥️
(* Optional step)
Ensure you have a GPU-based machine, or any machine with enough RAM to run an LLM well! Since I am broke, you can see I'm stealing the GPU resources provided by GitLab CI/CD runners.
3. Choose a LLM provider 🔌
There are multiple ways to interact with an LLM in 2024!
API Method
There are APIs available from Google, like AI Studio or Vertex AI; OpenAI has its API; and Microsoft has a few offerings too.
If you decide to use an externally hosted LLM, then the right machine is not necessary! At the end of the day, it's an API call.
Ollama
Coming from a DevOps background, I'm trained to think of cloud-agnostic solutions. So if you have a machine (or VM) at your disposal, you can look into Ollama. It has been one of my favorite dev tools for the past few months. It can run an LLM on your machine and expose an API endpoint for interacting with it as well!
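To give a feel for the Ollama side, here is a minimal sketch of calling its local REST API from Python. It assumes Ollama is already serving on the default port 11434 and that a model has been pulled (`codellama` here is just an example); the helper name is mine, not from the project.

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def ask_ollama(prompt: str, model: str = "codellama") -> str:
    """Send a single prompt to a locally served Ollama model and return its reply."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,                      # one complete JSON response instead of a stream
        "options": {"temperature": 0.2},      # lower temperature -> more deterministic review
    }
    response = requests.post(OLLAMA_URL, json=payload, timeout=300)
    response.raise_for_status()
    return response.json()["response"]

if __name__ == "__main__":
    print(ask_ollama("Reply with OK if you can read this."))
```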
4. Choose the Right LLM 🧐
The response from every LLM can be different. Other factors to consider are response time, context length, the size of the model, and more. If you are looking at fine-tuning LLMs, be my guest and go crazy!
Mainly, you've gotta hit and try.
For the Ollama approach, you can check the available LLMs: Here
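If you go the Ollama route, you can also check programmatically which models are already pulled on the machine. This is a small sketch against Ollama's default local endpoint (the tag-listing route is `/api/tags`); it assumes the server is running locally.

```python
import requests

# /api/tags lists the models that have been pulled onto this machine
resp = requests.get("http://localhost:11434/api/tags", timeout=10)
resp.raise_for_status()
for model in resp.json().get("models", []):
    print(model["name"])
```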
5. Perfect Prompt 🪄
The pain of working with an LLM, in any capacity, is writing the perfect prompt. It's hit-and-miss, to be honest. More importantly, you need to prioritize the bare minimum you need from the response.
In my experience, make sure you escape the string you pass to the LLM, and don't forget to play with the temperature to get clearer answers. A rough sketch of what that looks like is shown below.
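As an illustration of the "escape the string" advice, this sketch builds a review prompt from the rules and a diff; the template wording is my own example, not the exact prompt used in the project.

```python
import json

def build_prompt(rules: str, diff: str) -> str:
    """Combine the team rules and a code diff into a single review prompt."""
    # json.dumps escapes quotes, newlines, and other characters that could
    # otherwise break the JSON payload sent to the LLM API.
    escaped_diff = json.dumps(diff)
    return (
        "You are a code reviewer. Review the following diff ONLY against these rules.\n"
        f"Rules:\n{rules}\n\n"
        f"Diff (JSON-escaped):\n{escaped_diff}\n\n"
        "List every rule that is violated and the offending line. "
        "If nothing is violated, reply with exactly: LGTM"
    )
```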
6. Script 🐍
I tend to use Python for scripting anything on the application level. You can use any other language as well.
We write a script to:
- Read the rules from RULES.md file
- Check the changes made to the files of your interest
- Send a prompt to the LLM over an API call (Ollama or cloud)
- Print the response in case of success
To quench your thirst for curiosity, kindly see this Python File
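If you just want the shape of such a script without opening the repo, here is a minimal sketch under the same assumptions as above (local Ollama on port 11434, a RULES.md in the repo root, and `git diff` for the changes). The real file in the project differs in the details, and the model name and base branch here are just placeholders.

```python
import subprocess
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "codellama"  # example model; use whichever one you pulled

def read_rules(path: str = "RULES.md") -> str:
    with open(path, encoding="utf-8") as f:
        return f.read()

def changed_code(base: str = "origin/master") -> str:
    """Return the diff of the current branch against the base branch."""
    return subprocess.run(
        ["git", "diff", base, "--", "*.py"],
        capture_output=True, text=True, check=True,
    ).stdout

def review(rules: str, diff: str) -> str:
    prompt = (
        "You are a code reviewer. Check this diff against the rules below.\n"
        f"Rules:\n{rules}\n\nDiff:\n{diff}\n\n"
        "Report every violation with the file and line; otherwise reply LGTM."
    )
    resp = requests.post(
        OLLAMA_URL,
        json={"model": MODEL, "prompt": prompt, "stream": False},
        timeout=600,
    )
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    diff = changed_code()
    if diff.strip():
        print(review(read_rules(), diff))
    else:
        print("No changes to review.")
```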
Scope of the idea 🔍
I have kept it as a stage in the CI/CD pipeline so that GPU-based machines are easily utilized. For better optimization, you can change the point at which the workflow is triggered; for example, run it only when the MR targets the master branch instead of on each commit (a small sketch of that check follows below).
It could be a cron job as well.
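For example, if you keep it in GitLab CI, the script itself can bail out early unless the pipeline belongs to an MR targeting master. This sketch uses GitLab's predefined `CI_MERGE_REQUEST_TARGET_BRANCH_NAME` variable, which is only populated in merge request pipelines; in practice you could also restrict the stage with `rules:` in `.gitlab-ci.yml`, but the Python check keeps the sketch self-contained.

```python
import os
import sys

# CI_MERGE_REQUEST_TARGET_BRANCH_NAME is a GitLab-predefined variable that is
# only set in merge request pipelines.
target = os.environ.get("CI_MERGE_REQUEST_TARGET_BRANCH_NAME")
if target != "master":
    print(f"Skipping AI review (target branch: {target})")
    sys.exit(0)  # exit successfully so the rest of the pipeline is not blocked
```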
AI Code review on your local machine
It would be great if this idea were executed on local machines rather than in CI/CD pipelines.
Nowadays, almost all new laptops are capable enough to run an LLM with Ollama. Taking the advice from Arnav: why not utilize the power sitting literally at your fingertips?
You can run Ollama on your machine and serve the model on a port.
Whenever you build your project locally, trigger the Python script against that port and get a review once the code compiles!
Limitations 😞
NO, it can't abolish human code reviews. It will just help developers do code reviews faster.
The right LLM, Prompt, and set of rules will be refined only after multiple iterations.
Everything we work on as engineers is a tradeoff! Resources and time usually pull in opposite directions, but both indirectly translate to money.
Reliable answers might not always be achieved. Hallucination is a topic beyond the scope of this article.
If you use a third-party API, please understand that you will be sending your proprietary code to that service! (Ollama FTW!)
My Take on the Issue 😸
Code reviews are the most important job a developer does! AI or no AI, that will remain the same. With this idea, you can assist not only the reviewer but also the developer making the changes. Developers can see for themselves the scope for improvement in their code.
Something like this helps ensure the repo follows a common set of rules across the team, which makes it easier for a new dev to get on board.
If you liked this content you can follow me here or on Twitter at kitarp29 for more!
Thanks for reading my article :)