DEV Community

Idris Gadi

Building Efficient Node.js Workflows in GitHub Actions: Leveraging Caching and Modular Job Structures

As software developers, we love automation, whether we are building it or using it. CI/CD pipelines are one such automation, and GitHub Actions is one of the most widely used platforms for running them.

It is crucial to learn the platform that we are using and the features that it offers to build efficient and reliable automation.

One of the most common automation workflows in Git-based software development is PR/MR checks. These checks have become an essential part of modern CI pipelines because they help maintain the source code and improve PR/MR reviews.

In this article, we will create a PR checks workflow for a Node.js application using GitHub Actions, then improve its DX and make it more efficient using modular jobs and caching.

Prerequisite

Task

We want to create a GitHub Actions workflow that runs whenever a PR is opened or updated. The workflow should check three things:

  1. Check if the linting rules are followed.
  2. Check the formatting of the code.
  3. Check all the test cases.

This helps maintain the quality of the source code and of PR reviews, as the reviewer doesn't have to worry about whether the PR will break any existing (tested) functionality or introduce lint and formatting issues.

Creating a PR-Check Workflow

Let's create a workflow file called pr-check.yaml that triggers a pipeline whenever a PR is opened or updated against the main branch.

Note: For this article we will use the ubuntu-latest runner but you can use whichever runner you want.

Define GitHub Actions Workflow
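
The trigger described above can be sketched roughly as follows. This is only the top of the file, not a complete workflow: the jobs come next. The workflow name and file path are assumptions for illustration; the `types` listed are the defaults for `pull_request` and are spelled out only for clarity.

```yaml
# .github/workflows/pr-check.yaml (path assumed for illustration)
name: PR Check  # assumed name

on:
  pull_request:
    # Only PRs that target main trigger the workflow.
    branches:
      - main
    # Default activity types for pull_request, written out explicitly.
    types: [opened, synchronize, reopened]
```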

Now, let's check out the repository, set up Node.js, and install dependencies:
Setup project and install dependencies in a workflow
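
A minimal sketch of these setup steps might look like the fragment below. The job id `pr-check`, the `@v4` action tags, and Node.js 20 are assumptions, not values from the original workflow. `npm ci` is used here because it installs exactly what the lock file specifies, which assumes the repository commits a `package-lock.json`.

```yaml
jobs:
  pr-check:            # assumed job id
    runs-on: ubuntu-latest
    steps:
      - name: Check out the repository
        uses: actions/checkout@v4

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: 20   # assumed version

      - name: Install dependencies
        run: npm ci          # requires a committed lock file
```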

After the project has been set up and all the dependencies are installed, let's run our PR-check scripts for lint, formatting, and tests:

First Workflow
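
Put together, the first version of the workflow could look something like this sketch. The script names `lint`, `format:check`, and `test` are hypothetical and stand in for whatever your `package.json` defines; the action tags and Node.js version are likewise assumptions.

```yaml
name: PR Check

on:
  pull_request:
    branches:
      - main

jobs:
  pr-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: 20

      - name: Install dependencies
        run: npm ci

      # All three checks run one after another in the same job.
      - name: Lint
        run: npm run lint           # hypothetical script name

      - name: Check formatting
        run: npm run format:check   # hypothetical script name

      - name: Run tests
        run: npm test
```
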
Voilà! We have a nice workflow automation that will benefit our development process.

Or is it?

The Issues

This automation looks fine at first, but it has some issues:

  • All our tasks are crammed into one job.
  • On the GitHub Actions dashboard, the workflow diagram will look something like this: PR-Check GitHub Actions Diagram
  • If one of our tasks fails, we have to look into the logs to determine which one failed. This might not look like an issue, but logs expire after some time, and if we have more tasks or long-running tasks, it can be hard to read logs. (I mean c'mon, don't pretend you love reading logs)
  • As a reviewer, whenever a pipeline fails, you have to look into the logs and then tell the PR owner which check is failing in their PR. (Yeah, I don't like that either)

A Better Approach

One of our main issues is that all tasks are crammed into a single job, with no clear separation between them. So, let’s start by defining a separate job for each task. This will help keep things organized and make it easier to track each task individually!

Modular workflow with each task in a separate job
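
One way to split the checks is sketched below, with one job per task. The job ids and script names are hypothetical, as are the action tags and Node.js version. Because none of the jobs declares `needs`, they run in parallel, and each one shows up as its own box in the workflow diagram.

```yaml
name: PR Check

on:
  pull_request:
    branches:
      - main

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npm run lint           # hypothetical script name

  format:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npm run format:check   # hypothetical script name

  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npm test
```

Note that the checkout, Node.js setup, and install steps are repeated in every job, since jobs do not share a filesystem.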

This solves all our issues! Each task runs in a separate job, so we don’t have to dig through logs to figure out where things went wrong. Plus, we get a nice diagram on the dashboard showing exactly which task failed. From there, we can dive right into the logs of the specific task to investigate further.

PR-Check GitHub Actions Diagram, with each task in separate job

However, it has a huge issue.

The Issue

Since each job runs in its own isolated environment, every job has to repeat the same setup and make its own network calls for it: actions/checkout, actions/setup-node, and installing dependencies.

Caching

The issue of extra network calls can be solved by caching the actions and their outputs.

actions/checkout

actions/checkout is cached on GitHub-hosted runners, so setting up the action itself is cheap. The repository, however, is not shared between jobs: each job starts on a fresh runner and fetches the code again. By default, the action fetches only a single commit (fetch-depth: 1), so this step is usually fast and rarely the bottleneck.

actions/setup-node

actions/setup-node is also cached on GitHub-hosted runners, so setting it up is cheap. The action first looks for the requested Node.js version in the runner's local tool cache. If it can't find that version there, it falls back to downloading it and adds it to the tool cache.

GitHub-hosted runners ship with locally cached Node.js versions, which depend on the runner image. A self-hosted runner can download these versions too if it has access to github.com, or you can set up the tool cache on the self-hosted runner so the required Node.js versions are available locally.

Installing Dependencies

To install dependencies, we need to download packages from the npm registry. Since GitHub can't cache every package available on npm, each job has to download these packages directly from the registry. This results in network calls in every job, which not only increases the execution time of each job (and the overall workflow runtime) but also eats into your Actions minutes, especially for private repositories.

Yikes! That's a bummer, isn't it?

We can use another amazing action provided by GitHub, actions/cache, to save our dependencies once and restore them in later jobs and workflow runs.
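
As a rough sketch, caching npm's global package data with actions/cache could look like the steps below. The `@v4` tag and the key prefix are assumptions; `~/.npm` is npm's default cache directory on Linux and macOS runners. The lock file hash is part of the key, so the cache is invalidated whenever dependencies change.

```yaml
steps:
  - uses: actions/checkout@v4

  - name: Cache npm's global package data
    uses: actions/cache@v4
    with:
      path: ~/.npm
      # A new key (and a fresh cache) whenever the lock file changes.
      key: ${{ runner.os }}-npm-${{ hashFiles('**/package-lock.json') }}
      # Fall back to the most recent cache for this OS if there is no exact match.
      restore-keys: |
        ${{ runner.os }}-npm-

  - name: Install dependencies
    run: npm ci   # still runs, but resolves packages from the restored cache
```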

However, we have something even better. actions/setup-node has built-in support for caching global packages data and restoring dependencies from the cache (if available) using actions/cache under the hood. All we have to do is use the optional cache option and pass in the name of the package manager (supported package managers are npm, yarn, and pnpm (v6.10+)).

How to setup cache in actions/setup-node
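
In its simplest form, enabling the built-in cache is a one-line change, as in this sketch. The action tags and Node.js version are assumptions; `cache: npm` is the option described above.

```yaml
steps:
  - uses: actions/checkout@v4

  - uses: actions/setup-node@v4
    with:
      node-version: 20   # assumed version
      cache: npm         # or yarn / pnpm

  # The cache holds the package manager's global data, not node_modules,
  # so the install step is still needed.
  - run: npm ci
```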

It gets even better: actions/setup-node defaults to searching for the dependency file (package-lock.json, npm-shrinkwrap.json, or yarn.lock) in the repository root and uses its hash as part of the cache key. This means subsequent workflow runs will reuse the same global packages cache as long as the lock file is unchanged (i.e. no changes in the dependencies) and the cache entry has not been evicted.

The Better Workflow

Now, let's apply our caching knowledge to the workflow, keeping a separate job for each task (check).

Final workflow with modular jobs for each task and caching
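
A sketch of what the final workflow might look like is shown below, combining one job per check with the built-in cache of actions/setup-node. As before, the job ids, script names, action tags, and Node.js version are assumptions for illustration.

```yaml
name: PR Check

on:
  pull_request:
    branches:
      - main

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm                # restore/save npm's global package data
      - run: npm ci
      - run: npm run lint           # hypothetical script name

  format:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci
      - run: npm run format:check   # hypothetical script name

  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci
      - run: npm test
```

Since the three jobs run in parallel and the cache is saved at the end of a job, the first run after a lock file change will likely miss the cache in all three jobs. Later runs with the same lock file should restore it.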

Conclusion

Using actions, modular jobs, and caching in GitHub Actions, we built an effective and efficient workflow to automate our PR checks. The same features can be applied to other CI/CD workflows to improve a project's development process.

If you enjoyed this article and want to connect, feel free to reach out on LinkedIn and X/Twitter.
