Sol Lee

Optimizing 200+ Pipelines of a Monorepo

The Frontend Monorepo

What's a monorepo? Mono + repository: a monorepo is a single repository that manages multiple projects or libraries in one place. This makes code sharing and dependency management easier and gives many developers a consistent development experience. It's also convenient to keep CI/CD settings together, because every project lives in a single repository.

The Toss frontend chapter manages over 200 services in a single monorepo. About 50-60 developers contribute to this repo, and more than 60 PRs are merged on an average day!

With so many services under management, the monorepo grew bigger and bigger, until eventually a plain clone was no longer practical. That's why we often clone with `--filter=blob:none`, so blobs aren't downloaded until they're actually needed.
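For example, a blobless clone looks like this (the URL is a placeholder):

git clone --filter=blob:none git@github.com:your-org/your-monorepo.git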

At Toss, even in such a big monorepo, it takes only about 5 minutes from git push to deployment. Here's how we achieved it.

Secret 1: Parallel CI/CD

In a monorepo, multiple services often change at the same time, so a single push triggers many builds instead of one. At the Toss frontend chapter, the development, staging, and live environments are each built separately. Therefore, we need (number of changed services) × (number of environments) builds.

Here's a real example. If there are changes to three services (shopping, transfer, and pedometer), we build a total of nine targets (3 × 3). Assuming each build takes 5 minutes on CI/CD, building them sequentially takes 45 minutes in total.

3 services x 3 environments x 5 minutes build time = 45 minutes

The more services change, the longer the build takes; and the longer the developer waits or works on something else in the meantime, the more context switching costs. 😢

If you're using Yarn, you can run builds in parallel with the `--jobs` option of `yarn workspaces foreach`. But if every service is built on a single machine, they all share the same computing resources, namely CPU and memory. In an environment with limited CPU and memory, raising the parallelism only makes each process slower. Eventually, with enough build targets, it takes about as long as running sequentially.
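For example, with Yarn's workspace-tools plugin (exact flags vary across Yarn versions):

yarn workspaces foreach --parallel --topological --jobs 4 run build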

If you set up the pipeline so that every build runs in an independent environment, you can eliminate these problems entirely! In the frontend chapter, we used CircleCI's Dynamic Configuration.
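As a rough sketch, a dynamic configuration setup has a small "setup" pipeline that detects the changed services and hands a generated config to CircleCI's continuation orb. The generate-pipelines.js script below is hypothetical:

version: 2.1
setup: true

orbs:
  continuation: circleci/continuation@1.0.0

jobs:
  generate-config:
    docker:
      - image: cimg/node:20.14.0
    steps:
      - checkout
      # Hypothetical script: emits one build job per (service, environment) pair
      - run: node ./scripts/generate-pipelines.js > generated_config.yml
      - continuation/continue:
          configuration_path: generated_config.yml

workflows:
  setup-workflow:
    jobs:
      - generate-config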

Ideally, no matter how many services there are, everything is built in about 6 minutes:

2 services: 1 minute (trigger pipeline) + 5 minutes (parallel pipeline)
40 services: 1 minute (trigger pipeline) + 5 minutes (parallel pipeline)
200 services: 1 minute (trigger pipeline) + 5 minutes (parallel pipeline)

With an unlimited budget you could save time indefinitely, but don't forget that realistically you need to cap the maximum number of runners and structure the pipeline to meet your financial requirements.

In the end, a change to two services becomes about 5 times faster:

AS-IS: 2 services × 3 environments (6 builds): 6 × 5 = 30 minutes
TO-BE: 2 services × 3 environments (6 builds): 1 + 5 = 6 minutes

You can dynamically configure pipelines through Jenkins as well as CircleCI.

Whichever CI/CD you use, the most important thing is to run each pipeline in an independent computing environment.

Secret 2: Daily Docker Base Image

Currently, the monorepo is well over 40GB. Checking this repository out from scratch on a runner either takes a very long time or fails with a timeout error.

How can we reduce checkout time without breaking CI?

The way to do it is to **pre-replicate the monorepo**. If you start building in a pre-replicated environment, you don't have to download the git history from scratch, so there's no long wait! The contents of the monorepo are baked into a Docker image in advance, and only the changed parts are fetched fresh.

This can be done by writing a Dockerfile as follows.

FROM docker.io/cimg/node:20.14.0
SHELL ["/bin/bash", "-c"]

# The repository URL is passed in at image build time
ARG CIRCLE_REPOSITORY_URL

WORKDIR ${HOME}/project

# `git clone` the most recent 50 commits first
RUN git clone --depth 50 "$CIRCLE_REPOSITORY_URL" .

# Download up to 1000 commits of history, enough to calculate code changes
RUN git fetch --depth 1000 --force origin main

# Set the current environment to the main branch
RUN git checkout --force -B main

# Yarn install (optional)
RUN yarn

We've scheduled this Dockerfile build to run every day at 7 a.m. It takes about 36 minutes for the runner to complete. That shows how much time we'd lose if every build had to check out from scratch: 36 extra minutes each time!
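For reference, a minimal sketch of such a daily trigger as a CircleCI scheduled workflow (the build-base-image job name is hypothetical):

workflows:
  daily-base-image:
    triggers:
      - schedule:
          # 07:00 Asia/Seoul is 22:00 UTC the previous day
          cron: "0 22 * * *"
          filters:
            branches:
              only: main
    jobs:
      - build-base-image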

The Docker image created this way can be used through a CircleCI executor. An executor is a setting that lets you run a job on a specific image, as below.

version: 2.1
executors:
  my-executor:
    docker:
      - image: cimg/ruby:3.0.3-browsers
jobs:
  my-job:
    executor: my-executor
    steps:
      - run: echo "Hello executor!"

At Toss, we manage Docker images with AWS ECR. To use such an image as a CircleCI executor, you can write the following.

executors:
  toss_frontend_executor:
    docker:
      - image: xxxxxx.dkr.ecr.ap-northeast-2.amazonaws.com/ci-base-image:latest
        aws_auth:
          aws_access_key_id: $AWS_ACCESS_KEY_ID
          aws_secret_access_key: $AWS_SECRET_ACCESS_KEY
        environment:
          TZ: 'Asia/Seoul'
          AWS_DEFAULT_REGION: ap-northeast-2
          ECR_REGISTRY: xxxxxx.dkr.ecr.ap-northeast-2.amazonaws.com
          ECR_REPOSITORY: ci-base-image
          ECR_LATEST_TAG: 'latest'

Now let's use this executor in the jobs:

jobs:
  trigger-publish:
    executor: toss_frontend_executor
    steps:
      - trigger-publish-service

With the contents already baked into the image, downloading only the changed parts takes just 22 seconds.
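Instead of a full checkout step, the job only has to bring the repository that's already in the image up to date. A minimal sketch, assuming CircleCI's built-in $CIRCLE_BRANCH and $CIRCLE_SHA1 variables (our actual steps differ):

steps:
  - run:
      name: Sync pre-cloned repository
      command: |
        # The image already contains the repo; fetch only what changed
        git fetch --force origin "$CIRCLE_BRANCH"
        git checkout --force "$CIRCLE_SHA1"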

In this way, we cut the time required from about 36 minutes to 22 seconds. Every job that needs a git checkout saves roughly 36 minutes!

Secret 3: SSR Standalone Docker Image

Lastly, standalone mode, which dramatically reduces SSR deployment time.

Node File Trace lets you extract only the dependencies needed at application runtime. By gathering just the minimum required JavaScript files into the build output, the result becomes dramatically lighter, and Docker builds and K8s deployments get much faster.
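Node File Trace is published as the @vercel/nft package. A minimal sketch of the idea (the entrypoint path is just an example):

import { nodeFileTrace } from '@vercel/nft';

// Trace every file the server entrypoint can reach at runtime
const { fileList } = await nodeFileTrace(['./dist/server.js']);

// fileList is a Set of paths such as 'node_modules/react/index.js':
// exactly the files you need to copy into the runtime image
console.log(fileList);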

By the way, Next.js also offers this as the output: 'standalone' option in next.config.js.
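In next.config.js, that is a one-line option:

// next.config.js
module.exports = {
  output: 'standalone',
};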

However, this functionality depends on the node_modules directory. Since we use Yarn PnP at Toss, which stores dependencies in .yarn/cache instead of node_modules, we couldn't use it.

So we used the Yarn PnP API to create a function that plays a role similar to Next.js's standalone feature.

In reality, it's more complicated, but here's a sketch of the idea:

async function createSSRBundle(options: Options): Promise<SSRBundle> {
  const context = getSSRBundleContext(options);

  // Collect the app's own build output and its runtime dependencies
  const [files, depFiles] = await Promise.all([
    getFilesForSSRBundle(context),
    getDependencyFiles(context),
  ]);

  const fileEntries = createFileEntries([...files, ...depFiles], context);
  const pnpLoaderEntries = createPnPLoaderEntries(context);
  const pnpEntries = createPnPEntries(depFiles, context);

  // Pack everything into a single zip archive
  const zip = new Zip();
  await addEntriesToZip(zip, [
    ...fileEntries,
    ...pnpLoaderEntries,
    ...pnpEntries,
  ]);

  return zip.toStream();
}

The bundle.zip file created by the function above can then be used in the SSR Dockerfile.

FROM node:20.11.1-alpine3.19

WORKDIR /app
COPY ./bundle.zip ./bundle.zip

# Extract the bundle (BusyBox unzip is available in the Alpine base image)
RUN unzip -q ./bundle.zip -d workspace

WORKDIR /app/workspace
# Register the PnP runtime and loader, then start the SSR server
CMD node -r ./.pnp.cjs --experimental-loader ./.pnp.loader.mjs ./server.js

Now, with bundle.zip, you can run the SSR server anywhere; it runs without any problems even without git or yarn.

The SSR Docker image also shrank from 4GB to about 200MB (a 20x reduction).

The smaller SSR Docker image also deploys faster, which reduces PodInitialization time in Kubernetes. (PodInitialization refers to the process of pulling the Docker image required to start the K8s pod.)
