If you're an organisation using Azure DevOps, pipelines are obviously key to an effective CI/CD strategy, and there's a skill to getting the steps right for your context. Some tasks you still need to run every single time, even though their output might not change (much) between builds.
We build .NET solutions which often have framework-based front-ends (or at least some form of front-end packaging process), so we've always got NuGet references and node_modules folders. It's rare for these to change between runs of a pipeline, but it can happen, so we need to fetch updates or run npm install each time.
Within pipelines, these steps can occasionally take quite some time - possibly several minutes depending on agent count and capability, the size of the payloads, or the number of packages/references you have. And it really starts to become a factor if you're running multiple pipelines for multiple projects simultaneously throughout the day.
Alas, "Cache" tasks to the rescue...
Cache tasks allow you to store blobs against a key. Adding a cache step to your pipeline checks against that key and sets a variable indicating whether there has been a hit or not. If there is a cache hit, the blob is downloaded to the pipeline working directory, and the value of the hit variable can be used as a condition on running later steps in the pipeline.
Worked Example - node_modules
Let's work through how exactly we use Cache steps for a NodeJS build (which would ordinarily create a node_modules folder). Since we may add new dependencies during the development of our application, we do need to run the npm install command on each execution of the pipeline.
- Take a pipeline with a traditional "npm install" step.
- Add a new "Cache" task above that step, and fill it in something like this: This initialises a compound cache key containing the static string "npm", the Agent OS name and the unique hash of the "package-lock.json" file and tells Azure to store the contents of the specified "node_modules" folder as the blob attached to the cache key.
- Modify the "Control Options" of the "npm install" task to use the cache hit variable value defined above. This tells Azure to only run this step if the "NodeJsApp" cache hit variable is not equal to 'true' (i.e. a cache miss).
And, um, that's it.
On the first run, the pipeline will log a cache miss and run the "npm install" step. At the end of the run, the referenced cache blob will be automatically uploaded to storage. On subsequent runs of the pipeline (provided the scope criteria match), the matching blob will be downloaded and the "npm install" step will be skipped entirely.
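The same pattern covers the NuGet side of our builds mentioned at the start. Here's a sketch, assuming you point the NUGET_PACKAGES folder inside the pipeline workspace so it can be cached, and that your projects generate a packages.lock.json to key on - both of those, along with the "NuGetRestored" variable name, are illustrative choices rather than anything from the pipeline above.

```yaml
variables:
  # Keep the NuGet global packages folder inside the workspace so the Cache task can reach it.
  NUGET_PACKAGES: '$(Pipeline.Workspace)/.nuget/packages'

steps:
  # Key on the agent OS plus the lock file(s); any dependency change produces a new key.
  - task: Cache@2
    displayName: 'Cache NuGet packages'
    inputs:
      key: 'nuget | "$(Agent.OS)" | **/packages.lock.json'
      path: '$(NUGET_PACKAGES)'
      cacheHitVar: 'NuGetRestored'

  # Skip the restore when the cached packages folder was downloaded successfully.
  - script: dotnet restore --locked-mode
    displayName: 'dotnet restore'
    condition: ne(variables.NuGetRestored, 'true')
```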
Some Points of Note
- Azure uses generalised, self-managed blob storage for cache entries. You have no control over where exactly your cache goes, nor any easy way to access it manually.
- Cache blobs are stored in a scope - essentially mapping to a branch. So if you're working on a development branch which triggers the pipeline with each commit, your first run will create the cache blob and all subsequent runs (provided they don't alter the key) will re-use that blob.
- The size of your blob can be a factor in whether or not caching is actually useful. If it takes as long (or potentially longer) to download the cached blob to your build agent as it does to run the command, you've a dilemma. Well, not really - you choose whichever option is faster in a given context.
Cache Effectiveness
The original driver for looking into this was an attempt to reduce build times for pipelines with NuGet or Node steps which took significant chunks of time and were run often. On one project we've applied this to, a pipeline with two distinct "npm install" steps has gone from those steps taking an average of over 7mins per run to approximately 2mins per run.
...can you spot when we might have enabled the caching of these 2 steps?
A thoroughly useful technique for speeding up the execution of pipelines with required but potentially time-consuming steps. Just take care to get the cache key right, and don't fall foul of creating a cache blob that takes longer to download than simply running the command afresh on each execution.