Introduction
As commercial software products scale, an important consideration that often gets overlooked is how best to structure the codebase as the applications grow. This is not a question of how to structure the code within an application, but the broader question of how to structure our applications so they work together effectively. Failing to plan for this can harm cross-team collaboration, disrupt agile workflows, and lower developer satisfaction as the existing codebase becomes harder to build on. The right structure depends heavily on the tech stack, the nature of the application, and the agile workflows the team has in place.
The what
When considering repository structures for an organisation, there are two primary schools of thought: monorepos and polyrepos. A monorepo houses all of the organisation's code within a single repository, typically assisted by tooling to handle the obstacles that arise at scale. A polyrepo is the opposite: the organisation keeps separate repositories for the parts of a system that handle different concerns. Large companies like Google, Facebook, Microsoft, and Uber utilise monorepos to store their codebases (Google reportedly has the largest monorepo in the world, containing an estimated two billion lines of code).
Simply put, a monorepo is a single repository containing multiple distinct projects with well-defined relationships, while a polyrepo is a set of separate repositories, each containing a distinct project, with hard boundaries between them. With this said, it is important to remember that a monorepo is not a monolith and should not be thought of as one. A good monorepo should be the opposite of a monolith, and the same goes for the projects within it. A monorepo should be considered a group of applications that are developed and deployed separately, with reused code shared locally within the repository as packages.
The why
Polyrepos
Polyrepos are the current industry standard for organisations building products, and for several reasons. The thinking behind them is that multiple teams work on multiple projects, each with build and deployment configuration managed independently of the others. From an agile perspective, this lets teams work independently, determining their own release cycles, tooling, and standards within their project. Although this structure works, it creates significant inconsistencies between projects in tooling, code standards, and versioning, and it requires constant communication between teams to keep related projects compatible and to repair breaking changes. It also makes code sharing cumbersome: shared code needs its own repository, its own CI process, and version management in every project that consumes the library. The common alternative is for teams to write their own implementation or copy the code across, which creates heavy duplication between projects. Team autonomy is the main benefit of a polyrepo for modern software teams, giving each team full control over the standards, CI, and release cycles of its project.
Monorepo
Monorepos attempt to solve some of these problems. It’s important to understand the actual repository structure of many modern monorepos in order to grasp the benefits and solutions they provide as an alternative to polyrepos.
A typical monorepo structure sorts codebases into two groups: applications or projects (independent codebases that function standalone), and packages or shared libraries that are consumed by those codebases. Managing shared packages inside the same repository as the codebases that consume them leads to increased code reuse, better maintainability, and faster, less cumbersome resolution of breaking changes. This is a clear advantage over polyrepos, providing a sleeker, more readable, and ultimately more maintainable solution.
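To make the applications/packages split concrete, here is a minimal sketch of such a layout modelled as plain data. Every name in it (web, admin, ui, utils) is hypothetical and not taken from any real repository; the point is only that, with apps and shared packages side by side, it is easy to ask which apps a change to a shared package could touch.

```python
# Hypothetical monorepo layout; all app and package names are illustrative.
# Each entry maps a codebase to the shared packages it consumes directly.
LAYOUT = {
    "apps": {
        "web": ["ui", "utils"],
        "admin": ["ui"],
    },
    "packages": {
        "ui": ["utils"],
        "utils": [],
    },
}


def consumers_of(package: str) -> list:
    """Return the apps that list the given shared package as a direct dependency."""
    return sorted(
        app for app, deps in LAYOUT["apps"].items() if package in deps
    )


if __name__ == "__main__":
    # Which apps should be checked after a breaking change to "ui"?
    print(consumers_of("ui"))
```

In a polyrepo, answering the same question usually means searching several repositories and their pinned versions of the library.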
One big advantage of polyrepos is the ability to manage build pipelines and deployments independently of one another, which matters when working in large teams on applications with a lot of moving parts. However, modern CI tooling can scope pipelines to folder-level changes, which allows monorepos to retain the ability to test, build, preview, and release code independently of other interconnected projects or shared libraries. This is demonstrated by large-scale software teams such as Google and Uber, which utilise build tools such as Bazel to provide build caching, allowing them to achieve hundreds to thousands of builds and releases daily. (Could you imagine rebuilding and redeploying every project and shared library that these companies maintain 😰)
A monorepo is best considered as a group of applications that, while housed in the same repository and possibly interacting to form a larger software application, are completely isolated from one another from a development and deployment perspective. As a result, they retain the same separation of concerns, autonomy, and maintainability as a polyrepo.
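The idea of scoping pipelines to folder-level changes can be sketched in a few lines. This is a simplified illustration and not how Bazel or any particular CI product works internally: the dependency map and directory names are hypothetical, and a real setup would read them from the workspace configuration. Given the files changed in a commit, it works out which projects need to be rebuilt, including anything that depends on a changed shared package.

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional, Set

# Hypothetical dependency map: project directory -> project directories it depends on.
DEPENDS_ON = {
    "apps/web": ["packages/ui", "packages/utils"],
    "apps/admin": ["packages/ui"],
    "packages/ui": ["packages/utils"],
    "packages/utils": [],
}


def owning_project(changed_file: str) -> Optional[str]:
    """Map a changed file to its project, assuming a two-level <group>/<name> layout."""
    parts = PurePosixPath(changed_file).parts
    candidate = "/".join(parts[:2])
    return candidate if candidate in DEPENDS_ON else None


def affected_projects(changed_files: Iterable[str]) -> Set[str]:
    """Return the projects changed directly plus every project that depends on them."""
    affected = set()
    for changed_file in changed_files:
        project = owning_project(changed_file)
        if project is not None:
            affected.add(project)

    # Keep sweeping the map until no new dependents are found (transitive closure).
    grew = True
    while grew:
        grew = False
        for project, deps in DEPENDS_ON.items():
            if project not in affected and any(dep in affected for dep in deps):
                affected.add(project)
                grew = True
    return affected


if __name__ == "__main__":
    # A change to the shared UI package triggers both apps that consume it.
    print(sorted(affected_projects(["packages/ui/src/button.tsx"])))
```

Projects outside the returned set can be skipped for that commit, which is what keeps build and release times reasonable as the repository grows.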
The how
You've decided to implement a monorepo, awesome! But now you're a bit overwhelmed with all the options for tooling, the conversion process, and the impact it will have on full project rebuilds. It can be intimidating, especially for a JAM stack, as there are a number of quality tools to choose from, many of which achieve very similar goals.
When I migrated the stack for the company I work at, I took the following steps to ensure a smooth transition.
First, I researched the crap out of monorepos, monorepo tooling, existing monorepos (and their structures, open source in particular), and other companies' experience in migrating to a monorepo. It's important to consider what benefits you're trying to achieve in this migration, as not all companies and organisation/repository structures are going to benefit from a monorepo; this is especially important to keep in mind when looking at tooling and investigating other companies' experiences.
Next I talked to our team; honestly, this is arguably the most important step. If this structure differs significantly from what your organisation has in place, there should be a strong business case as to why the change is necessary. One of the biggest driving forces for us was the previous structure's impact on our team's ability to remain 'agile'; with changes required across multiple repos for some features, it became tedious and time-consuming to release them.
Then I considered which repositories we needed to merge first. If your business case is development speed, which repositories cause the biggest slowdowns when changes are required across them? Identify any roadblocks between them: do dependencies need to be updated? Are there any conflicts in versions, folder structures, or references? Is there repeated code that needs to be refactored out? While you may not be able to identify all roadblocks, having a few in mind will help in estimating the scale and complexity of the change.
Finally, all you have left to do is merge them. This is the most daunting step, and it can span a relatively large window of time depending on the roadblocks discussed above. Don't get disheartened: as you get further into the migration, you should start to notice the benefits outlined earlier, and the long-term payoff is worthwhile.
I hope this article has helped in your journey into monorepos. If you have any questions, feel free to drop them in the comments, along with any feedback (it's my first article, so I would love to hear your thoughts). Stay groovy.
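One of the roadblocks from the third step, conflicting dependency versions, can be surfaced before you start merging. The sketch below is a rough illustration under a few assumptions: the repository names and manifests are made up, each manifest is a simple name-to-version mapping (similar in spirit to the dependencies section of a package.json), and versions are compared as exact strings instead of being resolved as semver ranges.

```python
from collections import defaultdict
from typing import Dict, List


def find_version_conflicts(
    manifests: Dict[str, Dict[str, str]]
) -> Dict[str, Dict[str, List[str]]]:
    """Return dependencies pinned to more than one version across repositories.

    The result maps dependency name -> version string -> repositories using it.
    """
    usage = defaultdict(lambda: defaultdict(list))
    for repo, deps in manifests.items():
        for name, version in deps.items():
            usage[name][version].append(repo)

    # Only dependencies that appear with two or more distinct versions are conflicts.
    return {
        name: {version: sorted(repos) for version, repos in by_version.items()}
        for name, by_version in usage.items()
        if len(by_version) > 1
    }


if __name__ == "__main__":
    # Hypothetical manifests from two repositories about to be merged.
    manifests = {
        "storefront": {"react": "17.0.2", "lodash": "4.17.21"},
        "dashboard": {"react": "18.2.0", "lodash": "4.17.21"},
    }
    print(find_version_conflicts(manifests))
```

A report like this gives you a rough list of upgrades to line up before the merge, which helps when estimating how long the migration will take.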