With the announcement of Anthropic’s Claude 3.7 Sonnet model, we, as developers and cybersecurity practitioners, find ourselves wondering – is the new model any better at generating secure code? We commission the model to generate a classic CRUD application with the following prompt:
The model generates several files of code in one artifact, which the user can manually copy and organize according to the file tree suggested by Claude alongside the main artifact.
// DIRECTORY STRUCTURE
// secure-notes-app/
// ├── config/
// │   └── db.js
// ├── controllers/
// │   ├── authController.js
// │   └── noteController.js
// ├── middleware/
// │   ├── auth.js
// │   ├── errorHandler.js
// │   ├── rateLimiter.js
// │   └── validator.js
// ├── models/
// │   ├── Note.js
// │   └── User.js
// ├── routes/
// │   ├── authRoutes.js
// │   └── noteRoutes.js
// ├── .env.example
// ├── .gitignore
// ├── app.js
// ├── package.json
// └── server.js
We can examine the model's outputs as anecdotal evidence of its capacity for generating secure code when asked. Naturally, methods like prompt engineering, fine-tuning, or providing examples to the model might enhance its output, but many users will prefer the less cumbersome approach of simply asking the model for what they need in one shot. The experiment was conducted with extended thinking enabled, which allows the model to reflect as it plans out the code.
Scanning the generated code with Snyk’s IDE extension turns up no vulnerabilities at all: none of the open source dependencies match known issues in Snyk’s vulnerability database, and Snyk’s SAST catches no obvious vulnerabilities in the application code itself.
However, when cybersecurity professionals examine the output, a few issues still surface. In particular, the vulnerability we spotted concerns the email validation that Claude 3.7 Sonnet generated on the user model for MongoDB:
/^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$/,
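This regex presumably sits in the Mongoose User model as a match validator on the email field. A minimal sketch of what that typically looks like is below; the surrounding field names, options, and error message are assumed for illustration, and only the regex itself comes from Claude's output.

// models/User.js – illustrative sketch; schema details are assumed,
// only the email regex is taken from the generated code.
const mongoose = require('mongoose');

const userSchema = new mongoose.Schema({
  email: {
    type: String,
    required: [true, 'Email is required'],
    unique: true,
    match: [
      /^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$/,
      'Please provide a valid email address',
    ],
  },
});

module.exports = mongoose.model('User', userSchema);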
This pattern contains nested quantifiers: the leading \w+ and the repeated group ([\.-]?\w+)* can divide the same run of characters between them in many different ways. In a regex engine that uses backtracking – as is the case in JavaScript, Python, and other languages – an expression like this can take non-linear time to resolve when a match fails. Polynomial or even exponential algorithmic complexity like this can be exploited, a class of attack known as regular expression denial of service (ReDoS).
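The slowdown can be reproduced with a short Node.js script along these lines; the input string and its length of 30 characters are our own choices for illustration, not part of the generated app, and exact timings will vary by machine.

// redos-demo.js – forces catastrophic backtracking in the generated email regex.
const emailRegex = /^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$/;

// A long run of word characters followed by a character that can never match,
// so the backtracking engine tries every possible partition of the "a" run.
const maliciousInput = 'a'.repeat(30) + '!';

console.time('email validation');
emailRegex.test(maliciousInput);
console.timeEnd('email validation');
// Each additional "a" roughly doubles the work, so slightly longer input
// can pin the CPU for minutes.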
In this example, we can see by timing the check that execution took 3 seconds. But if we keep adding more “a” characters, it will take catastrophically longer to compute. Because regex matching is CPU-bound and Node.js is single-threaded, each slow match blocks the event loop for every other request, so what we have here is a denial-of-service issue.
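There is more than one reasonable fix. One option, sketched below as our own suggestion rather than anything from the generated code, is to cap the input length and use a pattern with no nested quantifiers, so the engine never has an ambiguous repetition to backtrack over.

// A possible mitigation (our suggestion, not part of Claude's output).
const MAX_EMAIL_LENGTH = 254; // common practical upper bound for email addresses

// Intentionally loose pattern: one "@", at least one ".", no whitespace,
// and no nested quantifiers that a backtracking engine could blow up on.
const safeEmailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function isValidEmail(input) {
  return typeof input === 'string'
    && input.length <= MAX_EMAIL_LENGTH
    && safeEmailRegex.test(input);
}

Alternatively, validation can be handed off to a well-maintained library such as validator.js and its isEmail check rather than a hand-rolled regex; definitive verification ultimately belongs to a confirmation email in any case.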
Still, the model performs well compared to earlier versions of Anthropic’s Sonnet series. We have conducted this experiment in the past and have seen an array of vulnerabilities in the code generated by other models, and Claude 3.7 Sonnet outperforms several of its major competitors in this case. In its first release, GitHub Copilot generated front-end code with an XSS vulnerability, and the same was true of code generated by ChatGPT 4o when given the same prompt. For example, here is an XSS vulnerability generated within Cursor when following the same prompt:
All in all, when it comes to AI-generated code, it is better to be safe than sorry. Snyk offers several tools to help companies shift left – that is to say, help developers put their best foot forward when they write code the first time, not when it is too late to fix. Independent, open-source projects can apply for free access to enterprise-level tools with Snyk here.