Mohamed AboElKheir

Posted on Jan 22

How Reachability Analysis 🔎 can help with open source vulnerabilities mess (Coana as an example)

#appsec #security #cybersecurity #design

If you are a security engineer or a developer, you probably already know the pain of dealing with the vulnerabilities affecting the open-source packages (e.g. npm, pip, maven, etc.) used by your application. In today's story, we discuss "Reachability analysis", a feature that promises to ease this pain by eliminating 70-80% of these alerts, using Coana as an example. But first, let's dig into what is wrong with open-source vulnerability scanning.

The open-source vulnerabilities mess

Software is built using many building blocks, and for any application to be secure, all the building blocks need to be secure. Among the most important building blocks are open-source packages (e.g. npm, pip, maven, etc.), which are estimated to constitute 70-90% of modern applications according to some studies.

That is why it is important to continuously scan open source packages to check if the used versions are affected by known vulnerabilities, as these vulnerabilities could potentially lead to exploits even if the code of the application doesn't have any security issues.

Most organizations use SCA tools (e.g. Snyk, Dependabot, etc.) to perform such scans. However, they usually run into multiple issues in practice:

The number of findings is huge and unmanageable. As mentioned earlier, open source can make up as much as 90% of the actual codebase. For example, a simple hello world application that uses Express can have more than 100 npm packages once you count the child dependencies, as shown below, and this number grows quickly as the application gets more complex.

$ cat package.json
{
  "name": "test",
  "version": "1.0.0",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC",
  "description": "",
  "dependencies": {
    "escape-html": "^1.0.3",
    "express": "^4.19.2",
    "lodash": "^4.17.20"
  }
}

$ npm list --all
test@1.0.0
├── escape-html@1.0.3
├─┬ express@4.19.2
│ ├─┬ accepts@1.3.8
│ │ ├─┬ mime-types@2.1.35
│ │ │ └── mime-db@1.52.0
│ │ └── negotiator@0.6.3
...

# This simple application has 120 npm packages
$ npm list --all | wc -l
     120
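The same kind of count can be reproduced programmatically. The sketch below is only an illustration, assuming a dependency tree shaped like the output of npm ls --all --json. The tree object is a hand-trimmed, hypothetical sample built from the first few entries of the listing above (not real command output), and collectPackages is an illustrative helper name.

```javascript
// Hypothetical, trimmed-down tree in the shape printed by `npm ls --all --json`.
const tree = {
  name: 'test',
  version: '1.0.0',
  dependencies: {
    'escape-html': { version: '1.0.3' },
    express: {
      version: '4.19.2',
      dependencies: {
        accepts: {
          version: '1.3.8',
          dependencies: {
            'mime-types': {
              version: '2.1.35',
              dependencies: { 'mime-db': { version: '1.52.0' } },
            },
            negotiator: { version: '0.6.3' },
          },
        },
      },
    },
  },
};

// Collect every unique name@version pair below the root, at any depth.
function collectPackages(node, seen = new Set()) {
  for (const [name, dep] of Object.entries(node.dependencies ?? {})) {
    seen.add(`${name}@${dep.version}`);
    collectPackages(dep, seen);
  }
  return seen;
}

const packages = collectPackages(tree);
const direct = Object.keys(tree.dependencies).length;
console.log(`${direct} direct, ${packages.size} total`); // prints "2 direct, 6 total"
```

Even in this tiny sample, two direct dependencies pull in three times as many packages, and every one of them is a potential source of findings.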

Fixing the vulnerabilities is not as straightforward as it seems for multiple reasons, e.g.:
1. The package with the vulnerability may not have a patched version yet.
2. The vulnerability affects a child package (dependency of a dependency), and the parent package doesn't have a version that uses the patched version of the child package yet.
3. The patched version of the affected package could introduce some breaking changes that need some code refactoring.
4. Even if all these issues are not present, in many cases the application usually doesn't have enough testing coverage, which means manual testing is needed to apply the fix.
These issues, along with the huge number of findings, leave the development team with a difficult choice: either spend an unreasonable amount of time fixing and testing the findings, which slows down the development process, or ignore the findings (or fix them in batches over long periods of time), which is what many teams end up doing.
As a result of this mess, some teams decide to only prioritize findings of high or critical severities, but this is usually not enough to bring down the load to a reasonable level.

Severity is not everything

Now, let's look at things from a different angle. A developer could think: "If my application has 100+ critical and high open source vulnerabilities but hasn't been hacked yet, the severity of these vulnerabilities is probably not really critical or high." And they wouldn't be entirely wrong.

This discrepancy between finding severity and actual impact stems from the fact that open-source vulnerability scanners don't answer the question "Are these vulnerabilities exploitable in the context of my application?". As we are going to see shortly the answer for the vast majority of findings is "no", and this means two things:

We spend a lot of time fixing and testing vulnerabilities classified as Critical or High, but they are actually not exploitable, so they don't have any real impact.
We don't know which of these vulnerabilities are actually exploitable, which means that they may end up being ignored or at least the fix would take too long, which significantly increases the risk for our application.

Is this vulnerability exploitable?

Let's take the below application as an example:

const express = require('express');
const _ = require('lodash');

const app = express();
const port = 3000;

// Sample data
let users = [
    { id: 1, name: 'Alice', age: 25 },
    { id: 2, name: 'Bob', age: 30 },
    { id: 3, name: 'Charlie', age: 35 },
];

// Middleware to parse JSON bodies
app.use(express.json());

// Route to get all users
app.get('/users', (req, res) => {
    res.json(users);
});

// Route to get a user by id
app.get('/users/:id', (req, res) => {
    const userId = parseInt(req.params.id);
    const user = _.find(users, { id: userId });

    if (user) {
        res.json(user);
    } else {
        res.status(404).json({ error: 'User not found' });
    }
});

// Route to add a new user
app.post('/users', (req, res) => {
    const newUser = req.body;

    // Use lodash to assign an incremental id and add to users list
    newUser.id = _.maxBy(users, 'id').id + 1;
    users = _.concat(users, newUser);

    res.status(201).json(newUser);
});

// Route to update a user
app.put('/users/:id', (req, res) => {
    const userId = parseInt(req.params.id);
    const updatedData = req.body;
    const userIndex = _.findIndex(users, { id: userId });

    if (userIndex >= 0) {
        // Use lodash to merge the updated data with the existing user data
        users[userIndex] = _.merge(users[userIndex], updatedData);
        res.json(users[userIndex]);
    } else {
        res.status(404).json({ error: 'User not found' });
    }
});

// Route to delete a user
app.delete('/users/:id', (req, res) => {
    const userId = parseInt(req.params.id);
    const user = _.remove(users, { id: userId });

    if (user.length) {
        res.json({ message: 'User deleted', user });
    } else {
        res.status(404).json({ error: 'User not found' });
    }
});

// Start the server
app.listen(port, () => {
    console.log(`Server running at http://localhost:${port}`);
});

This application has two main dependencies: express and lodash (a very popular npm package with a variety of useful functions). In this case, we are using lodash to query and update the mock user database.

Let's assume this uses the below versions of these 2 packages as shown in the below package.json file:

{
  "name": "test",
  "version": "1.0.0",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC",
  "description": "",
  "dependencies": {
    "express": "^4.19.2",
    "lodash": "^4.17.20"
  }
}

Let's run the Snyk scanner on this application to check the findings:

$ snyk test

Tested 66 dependencies for known issues, found 3 issues, 3 vulnerable paths.

Issues to fix by upgrading:

  Upgrade express@4.21.1 to express@4.21.2 to fix
  ✗ Regular Expression Denial of Service (ReDoS) [Medium Severity][https://security.snyk.io/vuln/SNYK-JS-PATHTOREGEXP-8482416] in path-to-regexp@0.1.10
    introduced by express@4.21.1 > path-to-regexp@0.1.10

  Upgrade lodash@4.17.20 to lodash@4.17.21 to fix
  ✗ Regular Expression Denial of Service (ReDoS) [Medium Severity][https://security.snyk.io/vuln/SNYK-JS-LODASH-1018905] in lodash@4.17.20
    introduced by lodash@4.17.20
  ✗ Code Injection [High Severity][https://security.snyk.io/vuln/SNYK-JS-LODASH-1040724] in lodash@4.17.20
    introduced by lodash@4.17.20

Let's focus on the High severity finding, CVE-2021-23337, affecting the lodash package (Code Injection [High Severity][https://security.snyk.io/vuln/SNYK-JS-LODASH-1040724] in lodash@4.17.20). If this is really a “High” severity vulnerability that causes Code Injection, developers should leave everything and fix it as soon as possible. However, as security engineers, it is part of our job to be sure the issue actually needs such urgent action before asking the developers to leave everything.

This takes us to the question we need to answer "Is this vulnerability exploitable in the context of my application?". To be able to answer this question, let's have a closer look at the vulnerability. If we open the link mentioned in the Snyk scan https://security.snyk.io/vuln/SNYK-JS-LODASH-1040724 we will find the PoC (Proof of concept) code showing the payload to exploit the vulnerability:

var _ = require('lodash');

_.template('', { variable: '){console.log(process.env)}; with(obj' })()

From this PoC and the overview, it is clear that the payload works when passed to the template() function, specifically through the variable option (the templateOptions.variable argument). A quick look at our code shows that we are not using the template() function anywhere (we only use the find, maxBy, concat, findIndex, merge, and remove lodash functions). This means that this vulnerability, although initially classified as "High", is not exploitable in our case and can be safely ignored or de-prioritized. We definitely shouldn't be asking developers to leave everything to fix it.
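To see why a payload of this shape works, here is a toy sketch of the underlying mechanism. It is not lodash's actual implementation: naiveTemplate and saferTemplate are hypothetical helpers that only assume the general idea of a template compiler splicing the variable option into generated function source as a parameter name. The payload body is replaced with a harmless return value.

```javascript
// Toy compiler (NOT lodash's code): it splices the `variable` option into
// generated source as a parameter name, which is the root of the problem.
function naiveTemplate(text, options = {}) {
  const variable = options.variable || 'obj';
  const source = `return function(${variable}) { return ${JSON.stringify(text)}; }`;
  return new Function(source)();
}

// Same idea, but the option must look like a plain identifier first.
function saferTemplate(text, options = {}) {
  const variable = options.variable || 'obj';
  if (!/^[A-Za-z_$][\w$]*$/.test(variable)) {
    throw new Error('Invalid `variable` option');
  }
  return naiveTemplate(text, { variable });
}

// Payload shaped like the PoC, with a harmless body.
const payload = '){ return "injected code ran" }; with(obj';

// A normal identifier compiles to a function that returns the template text.
const benign = naiveTemplate('hello', { variable: 'data' })({});

// The payload closes the parameter list early, so the returned function
// is the attacker's code and the real template body becomes dead code.
const hijacked = naiveTemplate('hello', { variable: payload })();

let rejected = false;
try {
  saferTemplate('hello', { variable: payload });
} catch (err) {
  rejected = true;
}

console.log({ benign, hijacked, rejected });
// prints { benign: 'hello', hijacked: 'injected code ran', rejected: true }
```

The takeaway for triage is that the injection needs two things: a call to the template function and an attacker-influenced variable option. If either is missing, the payload has nowhere to go.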

Reachability analysis

The above was an example of how we can manually review and triage a finding to determine whether it is exploitable. However, this doesn't scale well: as the number of findings and the application complexity increase, it is not feasible to perform the same analysis for hundreds of findings. This is where automation could come to the rescue!

In the above example the answer to the question "Is the finding exploitable?" depended on another question "Is the vulnerable function used?". This question is easier to answer as we will see shortly, and this is basically what "Reachability" is about. If the vulnerable function is used then the vulnerability is considered "Reachable", otherwise it is not.
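As a rough illustration of how the "Is the vulnerable function used?" question can be automated, the sketch below scans source text for lodash calls written as _.name(...). It is a deliberately naive, regex-based approximation and not how Coana or any real SCA tool works; findLodashCalls is a hypothetical helper, and the source string is condensed from the lodash calls in the sample application.

```javascript
// Sample source, condensed from the lodash calls in the application above.
const source = `
const _ = require('lodash');
const user = _.find(users, { id: userId });
newUser.id = _.maxBy(users, 'id').id + 1;
users = _.concat(users, newUser);
const userIndex = _.findIndex(users, { id: userId });
users[userIndex] = _.merge(users[userIndex], updatedData);
const removed = _.remove(users, { id: userId });
`;

// Map each lodash function called as `_.name(` to the lines where it appears.
function findLodashCalls(code) {
  const calls = new Map();
  code.split('\n').forEach((line, index) => {
    for (const match of line.matchAll(/\b_\.([A-Za-z]\w*)\s*\(/g)) {
      const lines = calls.get(match[1]) ?? [];
      lines.push(index + 1);
      calls.set(match[1], lines);
    }
  });
  return calls;
}

const calls = findLodashCalls(source);
console.log([...calls.keys()]);
// prints [ 'find', 'maxBy', 'concat', 'findIndex', 'merge', 'remove' ]
console.log(calls.has('template')); // prints false
```

A text scan like this misses aliased imports, destructured functions, dynamic calls, and anything called from inside other dependencies, which is why real tools rely on a graph of the program rather than on pattern matching.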

For example, consider an application in which packages 2 and 3 both have vulnerabilities. As the application only uses the vulnerable function in package 2, only package 2's vulnerability is reachable. Package 3's vulnerability is unreachable and can be safely ignored or de-prioritized.
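This scenario can be sketched as a graph search. The example below is a minimal illustration under stated assumptions: the callGraph object and all function names in it are hypothetical, and a plain breadth-first search stands in for whatever analysis a real tool performs.

```javascript
// Hypothetical call graph: each key is a function, each value lists its callees.
const callGraph = {
  'app.main': ['package1.format', 'package2.parse', 'package3.safeFn'],
  'package1.format': [],
  'package2.parse': ['package2.vulnerableFn'],
  'package2.vulnerableFn': [],
  'package3.safeFn': [],
  'package3.vulnerableFn': [], // shipped with package 3 but never called
};

// Breadth-first search from the entry point over call edges.
function reachableFrom(graph, entry) {
  const visited = new Set([entry]);
  const queue = [entry];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const callee of graph[current] ?? []) {
      if (!visited.has(callee)) {
        visited.add(callee);
        queue.push(callee);
      }
    }
  }
  return visited;
}

const reachable = reachableFrom(callGraph, 'app.main');
const vulnerable = ['package2.vulnerableFn', 'package3.vulnerableFn'];
const report = vulnerable.map((fn) => ({ fn, reachable: reachable.has(fn) }));
console.log(report);
// prints
// [
//   { fn: 'package2.vulnerableFn', reachable: true },
//   { fn: 'package3.vulnerableFn', reachable: false }
// ]
```

Note that package 3 is still used by the application here. What matters is not whether a vulnerable package is installed or imported, but whether a path of calls leads from the application to the vulnerable function.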

Coana as an example

Let's take Coana as an example. It is an SCA tool that performs Reachability analysis and, at the time of writing this article, is free to use on open-source projects. Coana creates a code property graph of your application and uses this graph to determine which functions in your dependencies (and child dependencies) are actually being called, hence automating the analysis we performed earlier.

Let's run a Coana scan on the sample application we analyzed earlier by following the steps in their documentation. The scan reaches the same conclusion we did: the lodash Code Injection vulnerability is not reachable. In the analysis details, you can see the vulnerable functions it was looking for to determine reachability.

Let's try it for an exploitable vulnerability

Let's try that again but with the below application where the same lodash vulnerability is actually exploitable.

const express = require('express');
const _ = require('lodash');
const bodyParser = require('body-parser');

const app = express();
app.use(bodyParser.urlencoded({ extended: true }));

app.post('/generate-story', (req, res) => {
    const { name, meal, place, car, options } = req.body;

    const templateString = `
        <h2>Here is your random story:</h2>
        <p><%= name %> went to <%= place %> in their <%= car %> for a nice <%= meal %>.</p>
    `;

    let templateOptions = {};
    try {
        // Parse the hidden options field (this is the vulnerable part)
        if (options) {
            templateOptions = JSON.parse(options);
        }
    } catch (error) {
        return res.status(400).send('Invalid options JSON');
    }

    try {
        // Compile the template with the user-supplied options (vulnerable)
        const compiled = _.template(templateString, templateOptions);
        const story = compiled({ name: _.escape(name), meal: _.escape(meal), place: _.escape(place), car: _.escape(car) });

        res.send(story);
    } catch (error) {
        res.status(500).send('Error generating story');
    }
});

app.listen(3000, () => {
    console.log('Server running on http://localhost:3000');
});

This code uses the template() function and also takes the templateOptions from the request body of the /generate-story route. Hence, it can be exploited with the payload from the PoC, e.g. we can inject code to expose all environment variables as shown below:

$ curl -X POST http://localhost:3000/generate-story \
      -H "Content-Type: application/x-www-form-urlencoded" \
      --data-urlencode "name=Alice" \
      --data-urlencode "meal=Pizza" \
      --data-urlencode "place=New York" \
      --data-urlencode "car=Tesla" \
      --data-urlencode "options={\"variable\":\"){return JSON.stringify(process.env)}; with(obj\"}"

{
  "CLICOLOR": "1",
  "COLORFGBG": "7;0",
  "COLORTERM": "truecolor",
  "COMMAND_MODE": "unix2003",
  "EDITOR": "vim",
  "HISTFILESIZE": "2000000",
  "HISTSIZE": "1000000",
  "HISTTIMEFORMAT": "%F %T ",
  ....
}

Now let's use Coana to scan this vulnerable example. As expected, the same vulnerability is now shown as "Reachable". Moreover, Coana shows us the lines of code where the vulnerable function is being used, which helps us plan and test the fix.

Reachability analysis in practice

In practice, Coana's reachability analysis is usually able to discard 70-80% of the vulnerabilities as unreachable. This significantly reduces the load on security and development teams and helps them focus on the vulnerabilities that are more likely to have an actual impact on the application, which removes a lot of the mess we explained earlier in this story.

Reachable != Exploitable

One thing to note is that if a finding is reachable, it doesn't necessarily mean it is exploitable, as sometimes there are other conditions that need to be met for exploitation besides the vulnerable function being called.

For example, in the vulnerable code we used above, besides using the template() function, the templateOptions argument needed to be controlled by user input and passed to the function. Hence, if we removed the options parameter from the request body in the example and didn't pass it to template(), the example would no longer be exploitable. In this case, Coana will still mark the finding as "Reachable", and manual triage is needed to complete the analysis and decide that it is not exploitable.

The below code is an updated version where the vulnerability is "Reachable" but not "Exploitable":

app.post('/generate-story', (req, res) => {
    const { name, meal, place, car } = req.body;

    const templateString = `
        <h2>Here is your random story:</h2>
        <p><%= name %> went to <%= place %> in their <%= car %> for a nice <%= meal %>.</p>
    `;

    try {
        // compile template without user-supplied options
        const compiled = _.template(templateString);
        const story = compiled({ name: _.escape(name), meal: _.escape(meal), place: _.escape(place), car: _.escape(car) });

        res.send(story);
    } catch (error) {
        res.status(500).send('Error generating story');
    }
});
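The distinction between the three sample applications in this story can be summarized as a small triage rule. This is a hypothetical sketch, not Coana output: the field names are illustrative, reachable is what a reachability scan would report, and attackerControlsInput is the part that still has to be filled in by manual review.

```javascript
// Hypothetical triage rule; the field names are illustrative, not Coana output.
function triage(finding) {
  if (!finding.reachable) return 'de-prioritize';
  if (!finding.attackerControlsInput) return 'reachable, not exploitable';
  return 'fix now';
}

// The same lodash finding, as it applies to each sample application.
const findings = [
  { app: 'users API', reachable: false, attackerControlsInput: false },
  { app: 'story generator (options from request)', reachable: true, attackerControlsInput: true },
  { app: 'story generator (no options)', reachable: true, attackerControlsInput: false },
];

const results = findings.map((f) => `${f.app}: ${triage(f)}`);
console.log(results.join('\n'));
// prints
// users API: de-prioritize
// story generator (options from request): fix now
// story generator (no options): reachable, not exploitable
```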

That being said, Reachability analysis still adds a lot of value by excluding the unreachable findings. The point here is that some manual triage could help discard even more vulnerabilities that won't have an impact.

Conclusion

When using security tools such as SCA scanners, it is important to remember the initial goal we are using the tool for, which is eliminating risk and preventing negative impact on our application. Hence, the number of findings we get from these tools doesn't matter if we don't have enough confidence in the quality of these findings and in how much risk and impact they actually represent. That is why features like reachability analysis are useful: they are able to eliminate a lot of the noise and give us much more confidence that the findings we are focusing on are the ones that represent probable risk and impact.
