Code Reviews with AI: a Developer Guide

Jonathan Vila

Code reviews are a cornerstone of software development. They're where we share knowledge, catch bugs early, and ensure our code meets the highest standards.

But let's be honest...

Traditional code reviews can be time-consuming and tedious, and they sometimes miss subtle yet critical issues. Enter the age of AI-powered code review, a game-changer that addresses these challenges and elevates code quality to new heights.

This article dives into the common pitfalls of code reviews and explores how AI tools can revolutionize each phase of the development lifecycle.

I will discuss the use and impact of AI on the different phases of the SDLC from the two main perspectives of the developer and the reviewer. I will also talk specifically about the use of AI in code review and how to implement it productively. I will provide examples of tools used in the different phases: GitHub Copilot, SonarQube, Qodo, and IntelliJ.

Code generated by AI code assistants

AI-powered generative code assistants take the power of AI even further by automatically generating code based on your inputs. This can dramatically reduce the time and effort required to write code, especially for repetitive or boilerplate tasks.
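To make this concrete, here is a small, hypothetical example (not the output of any particular assistant) of the kind of boilerplate a code assistant can complete from a short comment prompt:

public record Customer(String id, String name, String email) {

    // Prompt given to the assistant: "add a factory method that validates the email"
    public static Customer of(String id, String name, String email) {
        if (email == null || !email.contains("@")) {
            throw new IllegalArgumentException("Invalid email: " + email);
        }
        return new Customer(id, name, email);
    }
}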


Generative code assistants can also help you explore different design options and identify potential problems before you start coding. By leveraging these tools, you can focus on the creative and strategic aspects of software development, while the AI handles the tedious and mechanical tasks.

AI adoption


There’s a long list of AI code assistants providing different features, each ranked across five different categories:


https://research.aimultiple.com/ai-coding-benchmark/

While these tools are powerful and feature-rich, they rely on remotely hosted models, and some of their features come at a price.

The local, free, and open-source approach

Completely open-source options are also available. This approach involves hosting the code-generation model locally or on your own network. There are plenty of free and open-source models to choose from, and you can serve them simply by installing the free tool Ollama on your machine or on another machine in your network.


I’ve tried the IntelliJ plugin “Continue” with Ollama and the models “codellama” and “deepseek-coder”, and the experience was not bad at all. With this solution you can also be sure that your code and your prompts never leave your own domain.
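As a minimal sketch of how this works under the hood (assuming Ollama is running locally on its default port, 11434, and the codellama model has already been pulled), a plugin like Continue essentially sends prompts to Ollama’s local REST API, which you can also call yourself:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class LocalModelDemo {

    public static void main(String[] args) throws Exception {
        // Hypothetical prompt; the model name and endpoint are Ollama defaults.
        String body = """
                {"model": "codellama",
                 "prompt": "Write a Java method that reverses a String",
                 "stream": false}
                """;

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:11434/api/generate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // The generated code comes back in the "response" field of the JSON payload.
        System.out.println(response.body());
    }
}

Since both the prompt and the generated code stay on your own infrastructure, nothing leaves your network.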

But every bit of magic comes with a price.

While generative AI holds immense promise, it is not without its pitfalls. One major concern is the potential for introducing bugs and vulnerabilities into code. AI models are trained on vast amounts of data, and if this data contains errors or malicious code, the generated code may inherit these flaws.

AI-generated code correctness


Additionally, generative AI systems may not fully understand the context or intent of the code they generate, leading to nonsensical or even harmful output. Furthermore, there is a risk that generative AI does not use the full codebase context and therefore fails to generate code that is well aligned with our existing code. It is crucial for developers to carefully review and test code generated by AI and to employ robust security measures to mitigate these risks.

Given this panorama, it’s clear that using AI to generate code has a positive impact on speed, but it can also have a negative impact on the full SDLC and, more importantly, on the code review process.

Let’s focus on the traditional code review process and its pain points.

The Traditional Code Review Struggle: Familiar Pain Points

We've all been there. Traditional code reviews, while valuable, often suffer from:
Time Consumption: Manually reviewing every line of code is a significant time investment, especially for large projects.
Subjectivity: "Good code" can be subjective, leading to inconsistencies in feedback and potential disagreements.
Missed Issues: Even the most experienced human reviewers can miss subtle bugs, security vulnerabilities, or performance bottlenecks. We’ve seen above how code assistants can have an impact here.
Focus on Style: Too much emphasis on minor stylistic issues can distract from more critical problems.
Lack of Context: Reviewers may lack the full context of the code changes, making it harder to provide effective feedback.
High Cognitive Load: Reviewing large pull requests with hundreds of lines of code can overwhelm even the most experienced developers.
Delayed Feedback: Waiting for a code review can slow the development pipeline, impacting delivery timelines.
Team Friction: Code reviews can lead to friction when simple issues are overlooked due to a lack of context, when feedback is subjective, or when features are poorly tested, potentially escalating into disagreements.

AI to the Rescue: Enhancing Code Reviews

AI-powered tools are transforming code reviews by automating tedious tasks, providing objective feedback, and uncovering hidden issues. Let's explore how these tools can assist throughout the development lifecycle:

1. Development Phase (IDE Integration)

We’ve seen that several code assistant plugins and IDEs can help us generate code. Several benchmarks (Hugging Face, StackEval, Mike Ravkine’s) can help us choose the model to use in those assistants.

So, we have the proper tools for auto-completion and code generation, but what about verifying that the generated code does not introduce issues, vulnerabilities, or solutions that are not well suited to the language version of the codebase?

For this task, there are static analyzers in the form of IDE linters, like SonarQube for IDE, that will provide instant feedback as you write code, or CI/CD code analyzers, like SonarQube Server/Cloud, which will do a full analysis of your code and prevent or allow it to be merged.

Imagine catching potential bugs and best practices before they even make it to a code review.


These linters use static analysis and other techniques to detect code smells, bugs, and security vulnerabilities directly in your IDE, empowering you to write cleaner and safer code from the start.

However, this includes not only bugs and vulnerabilities but also best practices for using certain frameworks or language versions. AI code assistants are sometimes not well aware of the language version you are using in your codebase (e.g., Java 21), and the solutions they suggest do not consider the latest improvements in the language, just the most commonly used approaches.

In this case, GitHub Copilot didn’t suggest an approach based on a language feature that has been available for years.


Generated by GitHub Copilot

public double calculateAverage(Collection<Integer> collection) { 
   int sum = 0; 
   for (Integer num : collection) { 
       sum += num; 
   } 
   return (double) sum / collection.size(); 
} 

Manual approach using Java's Teeing collector, introduced in Java 12, following the consistent, language-level approach of iterating a collection with streams and lazily computing values from it.

public double calculateAverageManual(Collection<Integer> collection) {
    return collection.stream().collect(
            Collectors.teeing(
                    Collectors.summingDouble(i -> i),
                    Collectors.counting(),
                    (sum, count) -> sum / count)
    );
}

Assistants may also fail to use the latest features of a language. In this case, Virtual Threads, introduced in Java 21 a year and a half ago.


Code generated by GitHub Copilot, using platform threads

new Thread(() -> {
    try {
        var url = new URI("http://localhost:4000").toURL();
        var connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("GET");
        int responseCode = connection.getResponseCode();
    } catch (Exception e) {
        // URI and HttpURLConnection throw checked exceptions; handle them inside the lambda
        e.printStackTrace();
    }
}).start();

Manual approach using Virtual Threads, which allow creating thousands of threads and dramatically improve throughput in blocking operations.

Thread.ofVirtual().start(() -> {
    try {
        var url = new URI("http://localhost:4000").toURL();
        var connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("GET");
        int responseCode = connection.getResponseCode();
    } catch (Exception e) {
        // URI and HttpURLConnection throw checked exceptions; handle them inside the lambda
        e.printStackTrace();
    }
}); // ofVirtual().start(...) already starts the virtual thread; no extra start() call is needed

Luckily, these linters will also warn us about missing best practices while we code and during the full CI analysis.


A particular benefit of some linters over others (like SonarQube for IDE) is that they can analyze multiple types of files at the same time in the same project. This is not restricted to programming languages like Java, Python, JavaScript, Kotlin, etc., but also covers cloud deployment files like Docker, Kubernetes, Ansible, Terraform, and CloudFormation, and even leaked secrets.

2. Test Generation Phase

AI can analyze existing code, try to understand its purpose, and generate test methods that aim for high coverage. This helps ensure that your code is thoroughly tested before it is released.

In this area, we can find tools like Qodo Gen, among others, that specialize in test generation.

I’ve installed it in my IntelliJ IDE and tried it on my AI project. The result is impressive, covering several test cases for both the happy path and edge cases. As with most code assistants, we can select which remotely hosted model we want to use.


Tools like Qodo will take a class method and create its tests. We also get a dashboard to see the tests and their executions, as well as a plan for the test generation in the Qodo plugin.
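To give an idea of the output, here is the kind of JUnit 5 test such a tool might produce for the calculateAverage method shown earlier; the class under test (AverageCalculator) and the test names are hypothetical, not actual Qodo Gen output:

import static org.junit.jupiter.api.Assertions.assertEquals;

import java.util.List;
import org.junit.jupiter.api.Test;

class AverageCalculatorTest {

    private final AverageCalculator calculator = new AverageCalculator();

    @Test
    void averageOfSeveralValues_happyPath() {
        assertEquals(2.0, calculator.calculateAverage(List.of(1, 2, 3)), 0.0001);
    }

    @Test
    void averageOfSingleValue_returnsThatValue() {
        assertEquals(5.0, calculator.calculateAverage(List.of(5)), 0.0001);
    }

    @Test
    void averageWithNegativeValues_handledCorrectly() {
        assertEquals(0.0, calculator.calculateAverage(List.of(-2, 2)), 0.0001);
    }
}

Real output will of course vary with the selected model and the context the tool manages to gather.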


3. Pull Request creation

The process of creating a Pull Request is also important in order to give the proper context and details to those who will review it.

A typical workflow usually implies:

● Follow the initial process of joining/signing in to a team
● Read the contribution guidelines
● Make the commits following a convention
● Sign all the commits (please!)
● Create a draft pull request with a good and complete description (sometimes following a template e.g. JKube project)
● Wait for all the checks to pass
● Change the PR status to ready to review.

Several guides (GitHub, pull request) can help you write good descriptions, but we can also leverage AI for this. One such tool is GitHub Copilot, which analyzes the code in the PR to provide a more detailed description.


We can ask GitHub Copilot to generate a summary for the PR.

We can also expand this with all the details we think can add more context and value and help reviewers in their tasks.

4. Pre-Code Review (Automated Analysis)

Static analyzers (like SonarQube) perform in-depth static analysis on your codebase, identifying issues that might be missed by human reviewers. This goes beyond style checks and delves into:

Bug Detection: Identifying potential NullPointerExceptions, logic errors, and other bugs (a small example follows this list).
Security Vulnerabilities: Detecting potential injection attacks, cross-site scripting (XSS) vulnerabilities, and other security risks.
Code Smells: Highlighting code that is difficult to read, maintain, or understand. For example, overly complex methods or duplicated code.
Code Coverage: Measuring the percentage of code covered by unit tests, helping to ensure comprehensive testing.
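For illustration, here is a hypothetical snippet (the class and the lookup are made up) showing the kind of issue such analyzers typically flag before any human reviews the code:

public class ReportService {

    public int reportLength(String id) {
        String report = findReport(id);
        // An analyzer would flag a possible NullPointerException here:
        // findReport() can return null, and the result is dereferenced without a check.
        return report.length();
    }

    private String findReport(String id) {
        // Hypothetical lookup that legitimately returns null when nothing is found
        return "42".equals(id) ? "The answer" : null;
    }
}

An analyzer would report this with a severity level, which feeds directly into the prioritization described below.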

These analyzers present these issues clearly and actionably, prioritizing them based on severity.

This allows developers to focus on the most critical problems first, making the review process more efficient and effective.

Connecting this step with the previous PR workflow, the tools we connect to our repository can check for all the scenarios that could make our code fail before anyone invests time in reviewing it, so reviewers can focus on working changes that need experienced eyes.


5. Pull Request changes explanation

Now it’s the reviewers' turn. They should start by reading the ticket that defines the PR's goal. After that, a careful review of the checks' status will give them an idea of whether the changes are ready to be merged.

For this, we can use several tools. I’ve tried GitHub Copilot and Qodo PR-Agent (you can find a comparison here):


While Copilot has an explanation feature per changed file, Qodo can create a description for the entire PR. This helps reviewers understand the changes applied to each file and focus on those that require more attention. It’s important to reduce the time a PR needs to be merged, and the usage of AI tools can definitely help with that.

6. Change suggestions

In some PRs, reviewers make suggestions to improve or fix parts of the changes. This is a crucial point that, if not handled correctly, can add anxiety and friction among team members.

There are guidelines for keeping communication between the author and the reviewers safe and productive.

Some AI tools, like Qodo PR-Agent, can also apply the improvements and changes they suggest directly from the PR review to the code.

Like all changes, these need to be analyzed and checked with tools like SonarQube. If there are any issues, these tools will fail the quality gate, preventing the changes from being merged into the main branch.


Addressing the Challenges with AI

Here's how AI tackles the traditional code review challenges:

Reduced Time: Automation frees up developers to focus on more complex and creative tasks. It also helps reduce the cognitive load required to understand the scope of all the changes during review.
Increased Objectivity: Static analysis provides objective, consistent, and deterministic feedback based on predefined rules and best practices.
Focus on Critical Issues: Prioritization helps reviewers focus on the most important problems.
Enhanced Context: AI can read the changes and generate meaningful PR descriptions along with detailed explanations of the changes for the reviewers.

Conclusion

AI is not replacing human reviewers; it's empowering them. By automating tedious tasks and providing valuable insights, AI tools improve both the speed of development and the comprehension of the generated code.

Tools like static analyzers will automatically check code compliance and help guarantee code quality throughout the SDLC, allowing developers to focus on what they do best: designing, building, and innovating.

By leveraging AI-powered tools, we can shift our focus from simply finding bugs to building high-quality, maintainable, and secure software.
