An enormous amount of server-side code is written in Java. So, web applications written in this language must be resistant to certain security vulnerabilities. This short article is about one of the ways to fight them, which is SAST. It's also about what taint analysis is and what role it plays here.
What the article is about
In our Java analyzer, we've implemented mechanisms that enable us to create diagnostic rules for detecting tainted data that has entered the program from an external source. The PVS-Studio analyzers for C++ and C# have been capable of doing this for a long time, and now it's time for Java.
In this article, without getting too technical, I'll explain what detecting tainted (potentially dangerous) data is and why it's cool to have this feature. However, if you want to explore this topic from a theoretical point of view, our Java Team Lead has written an article about it. If you're interested, I recommend reading it.
Here, though, we'll talk about potential vulnerabilities. We'll focus on the ten most common web application vulnerability types.
About vulnerabilities
A vulnerability in an application is a flaw that can be exploited to disrupt its operation.
Developers use many different tests to detect them. Leveraging such tests to secure an application is the right and necessary thing to do, but devs usually do it at the testing stage. But what if I told you it's possible to find vulnerabilities earlier?
This is where we come to the concept of SAST.
SAST
SAST (Static Application Security Testing) is a testing methodology where program code is analyzed for potential vulnerabilities.
Potential vulnerabilities are defects in code. They're called so because, under certain circumstances, attackers can use them to disrupt the software's operation.
If attackers find a way to exploit a potential vulnerability and do so, it becomes a real vulnerability.
SAST tools focus on detecting potential vulnerabilities. One of the key benefits of using SAST is that it detects them at the development stage.
Let me tell you why this is a good thing.
SAST and making development cheaper
We often attend different conferences on the matter. And if the talk topic concerns static analysis, we discuss an important benefit of using it—a significant reduction in development costs, or rather, the cost of fixing an error. However, since the two are closely related, I took the liberty of generalizing them.
What makes the cost go down?
The cost of fixing an error has been studied by NIST (National Institute of Standards and Technology). Their study looked at how the cost of fixing a vulnerability increases depending on the stage of software development at which it is discovered.
The growth is exponential, which makes sense.
Fixing vulnerabilities after the release is expensive—it requires developers to get distracted from their current tasks and go back to old code, which takes time and resources.
It's even worse if the vulnerability lets attackers manipulate internal software data, for example delete it. Imagine the damage to the company if they just took the database down or made it public. In addition to financial losses, one should never forget about reputational risks.
At the development stage, troubleshooting takes less time and reduces costs.
There are so many different vulnerabilities these days. Which ones should a SAST solution detect? "The ones that applications are most often exposed to," sounds like a pretty good point. I hope you feel the same way.
OWASP and its Top Ten
OWASP (Open Worldwide Application Security Project) is a non-profit foundation dedicated to improving software security. Since 2004, it has been registered in the United States as a non-profit charitable organization.
The foundation releases a great deal of content on the topic of secure software development. It's all publicly available. The foundation also holds conferences and meetings and supports open-source development.
The OWASP Top Ten is an OWASP project that reports on the most critical vulnerabilities in web applications. It's a ranking, where each position combines a group of similar vulnerabilities. The order in which they are listed depends on a number of factors: potential damage, frequency of occurrence, and how easy it is for attackers to exploit them.
The OWASP Top Ten is unique, as it's based on actually detected vulnerabilities. Information on them is collected every few years and comes from the following sources:
- information security specialists and consultants;
- bug bounty platforms, which provide financial rewards for catching vulnerabilities in real-world applications;
- web application development companies.
Since the ranking is based on real vulnerabilities, it reflects information security trends as accurately as possible. Over 500,000 real-world projects have been analyzed to compile the currently relevant OWASP Top Ten 2021. This is why companies interested in fighting vulnerabilities in their web applications put a lot of trust in it.
According to the OWASP Top Ten 2021 press release, over the years, the ranking has become a "pseudo-standard" representing a basic norm that must be met. It's hard to argue with that: the vast majority of SAST solutions are centered around it.
We have a page where we regularly note which of our diagnostic rules meet the OWASP Top Ten categories.
Let's take a closer look at vulnerabilities
We'll examine some vulnerabilities from the OWASP Top Ten list.
For example, here's an SQL injection. After all, the diagnostic rule in our Java analyzer pointed it out :)
SQL injection is a vulnerability that enables an attacker to inject their own code into a query sent to a database. This enables the attacker to manipulate the data it contains and, in some cases, to compromise its confidentiality.
This vulnerability can be exploited if user input is used directly in a query and isn't preprocessed or validated in any way.
Let's look at an example. We have a website that contains articles. It has a search form that allows users to easily search for publications by title. Searching by title goes like this:
- a user enters a title for an article;
- the title is sent to the server;
- the title is merged with the database query;
- the query is sent to the database;
- the result based on the query is returned from there;
- the server returns the result to the user.
If external data is incorrectly merged with a query, malicious code can be injected into it.
If you'd like to see an example where an SQL injection is possible, the Java code with explanatory notes is in the next section.
A small example of an SQL injection
Let's take a look at the following code:
@Controller("/demo")
public class DemoController {

    @Autowired
    private JdbcTemplate jdbcTemplate;

    @GetMapping("/demo_get")
    public Optional<DemoObject> demoEndpoint(
            @RequestParam("param") String param)
    {
        Optional<DemoObject> demoObj = demoExecute(param);
        return demoObj;
    }

    private Optional<DemoObject> demoExecute(String name) {
        String sql = "SELECT * FROM DEMO_TABLE WHERE field = '" + name + "'";
        DemoObject result = jdbcTemplate.queryForObject(sql, DemoObject.class);
        return Optional.ofNullable(result);
    }
}
The example is synthetic, but it reflects a very real problem: unverified and unprocessed external data is used in the query. This can happen in real-world projects as well.
Now, let's get into details.
We get the external param string. It's passed to the demoExecute method. The database query is formed there, in the sql string: the externally received parameter is concatenated into it. The sql string is then passed to the queryForObject method, which executes it.
If the external data comes in the way developers want it to, there won't be any issues. However, once a user passes on something specific, things get a bit sad.
What kind of "specific" are we talking about? Well, here's an example:
'; drop table DEMO_TABLE; --
In this case, the sql string will look like this:
SELECT * FROM DEMO_TABLE WHERE field = ''; drop table DEMO_TABLE; --'
What will happen? If the database executes both statements, our DEMO_TABLE table will be deleted from the database.
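To make the mechanics easier to see, here is a minimal sketch that only builds the query text and never touches a database. The class QueryConcatDemo and the method buildQuery are hypothetical names introduced for illustration, and the payload is another classic one, chosen as an assumption: instead of dropping a table, it turns the WHERE condition into something that is always true.

```java
// Illustration only: no database is involved, we just build the query text.
final class QueryConcatDemo {

    // Mirrors the string concatenation used in demoExecute.
    static String buildQuery(String name) {
        return "SELECT * FROM DEMO_TABLE WHERE field = '" + name + "'";
    }

    public static void main(String[] args) {
        // The kind of input the developer expects: a plain title.
        System.out.println(buildQuery("Taint analysis"));

        // Hypothetical malicious input: it closes the quote and adds a condition.
        System.out.println(buildQuery("' OR '1'='1"));
    }
}
```

Whether the outcome is a dropped table or a query that returns every row, the point is the same: the input has changed the structure of the query, not just the value being searched for.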
To avoid such a big loss, we need to validate/clean the external data before creating the query.
How can we do that? For example, we can remove potentially dangerous characters from the string:
name = name.replaceAll("[;'%\"\\-]", "");
However, one should keep in mind that it's impossible to completely prevent SQL injections this way. It's better to use parameterized queries:
private Optional<DemoObject> demoExecute(String name) {
    // The ? placeholder keeps user data out of the SQL text itself
    String sql = "SELECT * FROM DEMO_TABLE WHERE field = ?";
    DemoObject result = jdbcTemplate.queryForObject(
        sql,
        DemoObject.class,
        name  // bound as a query parameter
    );
    return Optional.ofNullable(result);
}
In this case, the data passed in the request is handled as a separate parameter. So, no matter what the user passes, it won't be interpreted as an SQL command.
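To illustrate why stripping characters is not enough by itself, here is a small sketch. Everything in it is an assumption made for the example: the class BlacklistFilterDemo, the helper stripSuspiciousChars, and a hypothetical numeric id column that is compared without quotes.

```java
import java.util.regex.Pattern;

// Illustration only: builds query text, no database involved.
final class BlacklistFilterDemo {

    // Hypothetical filter: removes semicolons, quotes, percent signs and hyphens.
    private static final Pattern SUSPICIOUS = Pattern.compile("[;'%\"\\-]");

    static String stripSuspiciousChars(String input) {
        return SUSPICIOUS.matcher(input).replaceAll("");
    }

    // Hypothetical query on a numeric column, so the value is not quoted.
    static String buildQueryById(String id) {
        return "SELECT * FROM DEMO_TABLE WHERE id = " + stripSuspiciousChars(id);
    }

    public static void main(String[] args) {
        // The quote, semicolon and hyphens are removed from this payload...
        System.out.println(stripSuspiciousChars("' drop table DEMO_TABLE; --"));

        // ...but this one contains none of the filtered characters
        // and still changes the meaning of the WHERE clause.
        System.out.println(buildQueryById("1 OR 1=1"));
    }
}
```

The filter only knows about the characters someone thought to list. A parameterized query avoids the guessing game, because the value is never parsed as part of the SQL text.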
I used common vocabulary and wording to explain this example. However, there's specialized terminology for the topic:
- The param variable is tainted data.
- The demoEndpoint method is a source of tainted data.
- A method where tainted data can do damage is a sink. In our case, this is the queryForObject method.
- Data validation/cleaning is sanitization.
By the way, the analyzer issues the following for the given code fragment:
V5309 Possible SQL injection. Potentially tainted data in the 'sql' variable is used to create SQL command. DemoController.java 28, 20
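To tie the terminology to code, here is a minimal sketch in plain Java, with no Spring and no database. The names readRequestParam, sanitizeTitle, and runQuery are hypothetical stand-ins, and the allow-list pattern is only an assumption about what a valid title might look like. The comments mark which role each piece plays.

```java
import java.util.ArrayList;
import java.util.List;

final class TaintTermsDemo {

    // Stand-in for a database API: it only records the queries it receives.
    static final List<String> executedQueries = new ArrayList<>();

    // SOURCE: the place where external data enters the program.
    static String readRequestParam(String rawValue) {
        return rawValue;
    }

    // SANITIZATION: an allow-list check that rejects unexpected characters.
    static String sanitizeTitle(String title) {
        if (!title.matches("[A-Za-z0-9 ]{1,100}")) {
            throw new IllegalArgumentException("Unexpected characters in title");
        }
        return title;
    }

    // SINK: the place where tainted data could do damage.
    static void runQuery(String sql) {
        executedQueries.add(sql);
    }

    static void handleRequest(String rawValue) {
        String param = readRequestParam(rawValue); // tainted data
        String title = sanitizeTitle(param);       // checked, no longer tainted
        runQuery("SELECT * FROM DEMO_TABLE WHERE field = '" + title + "'");
    }
}
```

The sketch keeps the string concatenation only so that the source, the sanitization step, and the sink are all visible in one place. In real code, a parameterized query remains the better choice.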
From code to conclusions
If we summarize the vulnerability concept, it looks like this: when unverified external data enters a certain part of the program, it can disrupt its execution.
Does only an SQL injection work this way? No, there are many vulnerabilities that follow this pattern.
Many of them are part of the OWASP Top Ten.
How to catch them?
A static analyzer must learn to detect them. How can this be implemented? I'll highlight the main points we're interested in:
- Sources indicate where the external data comes from;
- Sinks are key points where tainted data can do damage;
- Sanitization is the process where external data is cleaned or validated;
- Interprocedural analysis helps see what happens to the data as it moves from one method to another, as in the example I showed above.
We need to trace how the data coming from the sources travels through the program. If it reaches a sink unsanitized, the analyzer should report it.
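The idea can be shown with a toy sketch. This is not how PVS-Studio implements it: it is a deliberately simplified model that assumes straight-line code with no branches, loops, or method calls, and all names in it (ToyTaintAnalysis, Stmt, Kind) are hypothetical. It keeps a set of tainted variables, propagates taint through assignments, clears it on sanitization, and reports sinks that receive tainted data.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ToyTaintAnalysis {

    enum Kind { SOURCE, ASSIGN, SANITIZE, SINK }

    // One simplified statement of the analyzed program.
    static final class Stmt {
        final Kind kind;
        final String target;   // variable being written (unused for SINK)
        final String argument; // variable being read (unused for SOURCE)

        Stmt(Kind kind, String target, String argument) {
            this.kind = kind;
            this.target = target;
            this.argument = argument;
        }
    }

    // Walks the statements in order and returns one warning per tainted sink.
    static List<String> analyze(List<Stmt> program) {
        Set<String> tainted = new HashSet<>();
        List<String> warnings = new ArrayList<>();
        for (int i = 0; i < program.size(); i++) {
            Stmt s = program.get(i);
            switch (s.kind) {
                case SOURCE:
                    // target = <external data>
                    tainted.add(s.target);
                    break;
                case ASSIGN:
                    // target = f(argument): taint follows the data.
                    if (tainted.contains(s.argument)) {
                        tainted.add(s.target);
                    } else {
                        tainted.remove(s.target);
                    }
                    break;
                case SANITIZE:
                    // target = sanitize(argument): the result is clean.
                    tainted.remove(s.target);
                    break;
                case SINK:
                    // sink(argument): report if the argument is still tainted.
                    if (tainted.contains(s.argument)) {
                        warnings.add("statement " + i + ": tainted '"
                                + s.argument + "' reaches a sink");
                    }
                    break;
            }
        }
        return warnings;
    }
}
```

A real analyzer also has to deal with branches, loops, object fields, and calls between methods, which is where interprocedural analysis comes in.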
This is where taint analysis comes into play.
Want to delve deeper into the topic?
We have an article that goes into more detail about the implementation of the issues described above. I encourage you to read it if you find it interesting.
Hello, Taint
The process I described above is called taint analysis.
Our big brothers, the C# and C++ analyzers, have been doing this for a long time. So, we decided to keep up with them. To make it possible, we fixed the old mechanisms in addition to implementing new ones. To add new diagnostic rules, we still need to tinker with our data-flow analysis and many other things. Anyway, the foundation has been laid, and we're excited about it.
Taint analysis will enable us to move forward in covering OWASP vulnerabilities, which is what we'll be doing over the next few releases.
Summary
The implementation of taint analysis in the PVS-Studio Java analyzer has brought us closer to being a full-fledged SAST solution. We've already started developing diagnostic rules aimed at covering vulnerabilities from the OWASP Top Ten classification (and beyond). Each release will bring more and more.
Here's the link where you can get the PVS-Studio analyzer to try it out.
It's time to say goodbye. See you soon!