I was listening to Episode 575 of the Syntax podcast last week when I heard Wes talk about an idea he had for the show: writing a script that fetches 8000 repos, analyzes the most common directories, and then explains the purpose of each one. I thought it would be interesting to try this with NPM packages and share the result, so here it is.
Collecting the data
🏗️ Building a list of the most popular NPM packages
The first step is to gather data from a significant number of NPM packages. This seemed easy at first because I was planning to use the NPM Registry API to get this information, but it doesn't seem to allow searching by popularity alone. So I'm going to have to use a manually curated list of popular packages for this purpose 🤷‍♂️.
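For reference, the registry does expose a search endpoint with a popularity weight, but that weight only adjusts the ranking of results for a text query. A minimal sketch of building such a request URL, assuming the public `/-/v1/search` endpoint and its `text`, `size` and `popularity` parameters; `buildSearchUrl` is a hypothetical helper for illustration only:

```javascript
// Hypothetical helper: builds a search URL for the NPM registry.
// Assumes the public /-/v1/search endpoint and its `text`, `size`
// and `popularity` query parameters.
const SEARCH_URL = "https://registry.npmjs.org/-/v1/search";

function buildSearchUrl(text, { size = 20, popularity = 1.0 } = {}) {
  const url = new URL(SEARCH_URL);
  url.searchParams.set("text", text);
  url.searchParams.set("size", String(size));
  // Weight given to the popularity score when ranking results.
  url.searchParams.set("popularity", String(popularity));
  return url.toString();
}

console.log(buildSearchUrl("http client", { size: 50 }));
```

Since the query text still drives which packages are returned, this doesn't replace a curated list.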
📁 Fetching the folder structure of each package
Now that we have the list of packages, we need the repository URL of each one to get its directory structure. To accomplish this, I'm going to hit the NPM Registry API to retrieve the metadata of each package, which should contain the repository URL, then call the GitHub REST API to get the contents of the root folder:
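// Curated list of package names.
// Import attributes: older Node versions use `assert` instead of `with`.
import packages from "./packages.json" with { type: "json" };

const NPM_REGISTRY_API_URL = "https://registry.npmjs.com";
const GITHUB_API_URL = "https://api.github.com";

// Fetch the registry metadata of every package in parallel.
const pkgs = await Promise.all(
  packages.map(name =>
    fetch(`${NPM_REGISTRY_API_URL}/${name}`)
      .then(res => res.json())
  )
);

// Fetch the contents of each repository's root folder from GitHub.
const repos = await Promise.all(
  pkgs.map(pkg => {
    const githubUrl = new URL(pkg.repository.url);
    // The pathname looks like "/owner/repo.git"; drop the ".git" suffix.
    const repoNamespace = githubUrl.pathname.replace(/\.git$/i, "");
    return fetch(`${GITHUB_API_URL}/repos${repoNamespace}/contents`)
      .then(res => res.json());
  })
);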
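The `repository` field isn't always a tidy HTTPS URL: it can be missing, be a plain string, or carry a prefix such as `git+https://`. A more defensive sketch of the URL handling above, where `toRepoSlug` is a hypothetical helper that returns `owner/repo`, or `null` so that malformed or non-GitHub entries can be skipped:

```javascript
// Hypothetical helper: extracts "owner/repo" from a package's registry metadata.
// Returns null when the repository is missing, malformed, or not on GitHub.
function toRepoSlug(pkg) {
  // `repository` can be an object ({ type, url }) or a plain string.
  const raw = typeof pkg.repository === "string"
    ? pkg.repository
    : pkg.repository?.url;
  if (!raw) return null;

  let url;
  try {
    // Drop a leading "git+" so "git+https://..." parses as a regular URL.
    url = new URL(raw.replace(/^git\+/, ""));
  } catch {
    return null;
  }
  if (url.hostname !== "github.com") return null;

  const [owner, repo] = url.pathname
    .replace(/^\//, "")
    .replace(/\.git$/i, "")
    .split("/");
  return owner && repo ? `${owner}/${repo}` : null;
}

console.log(toRepoSlug({ repository: { url: "git+https://github.com/expressjs/express.git" } }));
console.log(toRepoSlug({ repository: { url: "git://github.com/axios/axios.git" } }));
console.log(toRepoSlug({ name: "no-repo" }));
```

Packages for which the helper returns `null` can then be filtered out before calling the GitHub API.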
📊 Counting the directories
We now have a two-dimensional array representing the contents of every package. The last step is to filter out individual files and aggregate the result to count the occurrences of every directory:
// Map of directory name -> number of repositories that contain it.
const directories = new Map();

repos.flat()
  .filter(file => file.type === "dir")
  .forEach(dir => {
    const count = directories.get(dir.name) ?? 0;
    directories.set(dir.name, count + 1);
  });
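To turn the resulting Map into a ranking, the entries can be sorted by count. A small sketch on made-up sample data; `countDirectories` and `sampleRepos` are illustrative names and are not part of the original script:

```javascript
// Made-up sample shaped like the GitHub "contents" responses.
const sampleRepos = [
  [{ name: "lib", type: "dir" }, { name: "test", type: "dir" }, { name: "README.md", type: "file" }],
  [{ name: "lib", type: "dir" }, { name: ".github", type: "dir" }],
  [{ name: ".github", type: "dir" }, { name: "lib", type: "dir" }],
];

// Counts how many times each directory name appears across all repos.
function countDirectories(repos) {
  const counts = new Map();
  for (const entry of repos.flat()) {
    if (entry.type !== "dir") continue;
    counts.set(entry.name, (counts.get(entry.name) ?? 0) + 1);
  }
  return counts;
}

// Sort the [name, count] pairs by count, highest first.
const ranked = [...countDirectories(sampleRepos)].sort((a, b) => b[1] - a[1]);
console.log(ranked);
```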
I've made the full code available in a gist.
It's time to take a look at the result!
Analyzing the data
The above chart is enough to give you an idea about the most common directories, but we're interested in more than just numbers, so let's dive deep and explain what's in each folder:
.github: With more than 60 occurrences, it's not surprising to see this folder in the top spot given that all of the analyzed packages are hosted on GitHub. .github is a special folder that can contain a variety of config files and templates related to GitHub Actions workflows, as well as other organizational files for a GitHub repository such as CONTRIBUTING.md, CODE_OF_CONDUCT.md, SECURITY.md and more. Example from axios.
tests/test/__tests__: Writing tests is an important practice in software development, and in NPM packages you'll often see them stored in one of these folders. These folders can also hold testing helpers and utility functions. Example from express.
docs: Documentation is an essential part of any package, as it provides users with the information they need to understand how to use it and how it works. The documentation usually includes usage instructions, API documentation, and more. It can also be included directly in the repository's README.md file, but it's often split into multiple files and stored in this folder for ease of navigation and maintenance. Although the documentation files can be in any format, the most common one is Markdown. Example from node-fetch.
lib: The lib folder, short for "library", is mostly used to store the actual source code of the package, but it can also be used to store third-party code, utilities and helpers. Example from passport.
examples: Good documentation goes well with good examples. Not only do they provide a practical demonstration of how to use the package, but they also allow developers to quickly get up to speed and start using the package in their own projects. Example from express.
src: Similar to lib, the src folder is also used to organize code, allowing for easy access to the main codebase. Example from yaml.
scripts: Maintaining a package can be a lot of work. There are lots of repetitive tasks that need to be done often, such as building the package for different targets, preparing a new release, etc. This is where automation scripts can help, and if a package has any, there's a good chance you'll find them in this folder. Example from history.
packages: If you see a directory with this name, you're most likely looking at a monolithic repository (monorepo). Monorepos contain the code for different projects or sub-components within a single repository. This offers several benefits, such as simplified dependency management and improved code reusability, to name a few. Example from react.
bin: Sometimes it may be desirable or even crucial for a package to provide a command line interface; take a testing framework like jest as an example. NPM allows packages to publish executable binaries for this purpose, and as a convention they're usually placed in this directory. Example from nanoid.
benchmarks: This directory contains benchmark tests that help measure the performance of the package's code. These tests can be very useful when experimenting with performance optimizations, and to ensure no slowdowns are introduced between releases. Example from graphql.
.husky: Git hooks are custom scripts that run in response to some event (e.g. before a commit is created), and they can choose to abort that event under certain conditions. One of their main drawbacks though is that they live inside the .git folder, which means they cannot be directly versioned like the rest of the project. This folder is used by the popular Husky package, which makes it possible to include Git hooks with your project and takes care of installing them to their appropriate location so they can be detected by Git. Example from uuid.
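To make the bin convention a bit more concrete: a package declares its executables through the `bin` field of its `package.json`, either as a single path or as a map of command names to paths. A rough sketch of reading that field, assuming a hypothetical package named `my-cli` with made-up file paths; `listCommands` is an illustrative helper, not an NPM API:

```javascript
// Illustrative helper: lists the commands a package.json exposes via `bin`.
function listCommands(pkg) {
  if (!pkg.bin) return {};
  // String form: a single executable named after the package.
  if (typeof pkg.bin === "string") return { [pkg.name]: pkg.bin };
  // Object form: command name -> path of the script.
  return { ...pkg.bin };
}

// Hypothetical package.json contents.
const examplePackage = {
  name: "my-cli",
  bin: { "my-cli": "./bin/my-cli.js", "my-cli-init": "./bin/init.js" },
};

console.log(listCommands(examplePackage));
console.log(listCommands({ name: "my-cli", bin: "./bin/cli.js" }));
```

The scripts referenced there typically start with a `#!/usr/bin/env node` line so that they can be executed directly once installed.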
Conclusion
I hope this exploration helped you gain insight into the common naming practices and conventions used in the NPM package ecosystem, and highlighted their importance in improving the accessibility and readability of your project. You can use this knowledge if you plan to build your own package, contribute to an existing one, or simply find your way around when browsing the source code.