From "spaghetti code" to "spaghetti architecture" 🍝
Originally written at pooyan.info
Who is the author? Check out my profile on LinkedIn.
"Serverless" with some 100x delicious dependencies in different layers 😋
Note: All the names and events in this story are fictional, and any resemblance to real people or events is purely coincidental. The story is completely made-up but is inspired by a mix of a few common patterns I have either seen myself or heard from others in the past few years. The purpose of this story is to share a common trending scenario and a few mistakes that can cause IT-based startups to fail.
Spoiler alert: This article might refer to some 100x delicious dependencies in different layers. 😋
OK, here the story begins ✨
Seven years ago, during the IT hype, Mark took a few courses to learn web development using HTML, CSS, JavaScript, PHP, and MySQL. Then he was hired by a startup as a web developer. His main task was building the application's frontend using jQuery and later with React. Gradually, he also learned how to build backend APIs using Node.js and Express. He was the company's full-stack developer at this point. 🥳
Back then, they had one extensive "old" application that no one dared to touch and a new one built with a cleaner approach. The old code had files containing thousands of lines each: a mix of backend logic, database queries, jQuery, HTML, and CSS. A small fix or a single change in one part of the code could break 10 more things! 🍝 He could easily see how difficult it was to maintain and extend the old application, and he always wanted to avoid making the same mistakes as the previous developers had made. It is always "others" that make mistakes, right? 😉 He studied and practiced design patterns to write cleaner code.
A few years later, he was approached by a recruiter on LinkedIn for a CTO position at a retail startup. He was living in a small country and knew that companies in that region were desperate to find technical staff, but he was not expecting to be offered a CTO position. He was unsure if he was ready for that, but after a few interviews, he was hired for the position. He was surprised and excited at the same time! The company was in its initial stage and did not have enough money to hire more developers, so he had to do everything by himself.
He knew that he had to learn a lot to build the product. Mark had a friend named Alex, a technical manager at another startup. He asked him for advice. Alex told him that in their company, they were using TypeScript for frontend, backend, and even for writing their Infrastructure as Code (IaC). To Mark, this sounded great because then he could do everything by just learning one programming language that was not even very different from the JavaScript that he already knew! This felt like "the answer" to everything he was expected to do in this new position. 🎉 Alex also said that recently in a technical meetup, he had heard about a new thing called "serverless computing" and "serverless architecture" that could apparently simplify things. He also mentioned things like lambda functions and API Gateway that Mark had never used before.
Soon after, Mark learned TypeScript basics and tried AWS CDK (Cloud Development Kit) to build a few simple lambda functions. He was amazed by how easy it was to build and deploy a lambda function on AWS. He then learned to use API Gateway to create REST APIs and connect them together. He also learned how to start an RDS instance (MySQL) and connect it to these lambda functions using CDK. He could build his first simple CRUD application in a very short time!
Time passed, and the company grew. They hired more developers, and Mark taught them how to use these technologies. One of the new seniors also showed the team how to write unit tests, and they added a check to their pipeline to run all these tests before deploying the code. The company did not hire architects, DevOps engineers, or SysAdmins because they thought a) they were too expensive and b) a small company like theirs did not need such skills.
In the beginning, everything was fine. They managed to build the MVP in a few years and successfully deliver it to their first 1,000 customers. Developers were happy because they could build things quickly, and the management was happy about the overall price and development speed.
After a while, because all lambda functions shared one database, they realized that even a small change in one of the services could sometimes break everything else! They were also facing some random connection issues with the database. After consulting with a few people, they decided to make these services less dependent on each other and split the database. By then, they had hundreds of lambda functions and a couple of API Gateways, so having one RDS instance for each service was not an option. Mark spoke with his friend Alex again, and Alex suggested DynamoDB. Mark didn't know much about the difference between relational and NoSQL databases, or even that there were different types of such databases, but it was quite easy for him to create DynamoDB tables using CDK and use the new query syntax in his code. He mainly liked DynamoDB because he could easily create new tables without having to instantiate a database engine, define the schema, or deal with VPC, Security Groups, etc. Everything seemed very simple. To simplify things a bit more, he even used non-VPC lambda functions. At this point, they were still in the early stages and had no strict compliance requirements. At least, that's what they thought.
Even though it took a while to migrate all the data and change the queries in the code, after the migration, everyone was happy not having to deal with database-level relations of data, VPC, Security Groups, and other networking and security stuff anymore.
Meanwhile, they had hired a data scientist, Anna, to help the business get a better overview. Unlike the dev team, she was mainly using Python, and she could not use AWS the way the team was using it through CDK and TypeScript. She had to ask the dev team to create the databases and other tools that she needed. Still, she had to do everything from her laptop, because the dev team didn't want to see Python as an option and wanted everyone to use the same programming language and promote the "lovely framework" they had already built (lambda functions, TypeScript, and their smart set of customizations to enable logging, etc.).
When the pile of poo started to smell 💩
The team was small, so they could not afford to separate the codebase for every single lambda function, so they decided to put all the code in one repository and reuse different parts of the code and libraries for convenience. They were using TypeScript, so they could easily import and reuse different parts of the code (modules) in different lambda functions.
To reduce the size of lambda functions, they decided to use standard Step Functions (State Machines) to split the logic into smaller parts and move some of the decision-making out of the lambda functions. Soon they realized that different people on the same team needed a lot of coordination and communication to make sure they were not breaking each other's code, because in designing a workflow, the concept of "interface" simply doesn't exist. That means a simple change in the data structure in one layer could easily break the next ones in the chain.
After a while, they realized that different parts of their system had silently become dependent on each other. At this point, they had a few lambda layers that almost every lambda function was using and CloudFormation stacks that were dependent on each other. Long-lasting state-aware Step Functions could not be easily modified or removed. Even though Mark wanted to avoid the maintenance hell of a tightly-coupled monolith that he had experienced in his previous jobs, soon, a single change in one lambda function or a lambda layer could break the whole thing!
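The missing "interface" between workflow steps can be approximated by writing the contract down and checking it at every step boundary. The sketch below is only an illustration: the `OrderPriced` shape, `parseOrderPriced`, and `chargeStep` are hypothetical names, not part of the story or of any AWS API. It relies on the fact that TypeScript types are erased at runtime, so the JSON handed from one state to the next has to be validated explicitly.

```typescript
// Hypothetical payload that one workflow step hands to the next.
interface OrderPriced {
  orderId: string;
  totalCents: number;
  currency: string;
}

// Runtime guard: TypeScript types vanish at runtime, so the JSON passed
// between states has to be checked explicitly at each step boundary.
function parseOrderPriced(input: unknown): OrderPriced {
  if (typeof input !== "object" || input === null) {
    throw new Error("OrderPriced: expected an object");
  }
  const record = input as Record<string, unknown>;
  const { orderId, totalCents, currency } = record;
  if (typeof orderId !== "string" || orderId.length === 0) {
    throw new Error("OrderPriced: orderId must be a non-empty string");
  }
  if (typeof totalCents !== "number" || !Number.isInteger(totalCents)) {
    throw new Error("OrderPriced: totalCents must be an integer");
  }
  if (typeof currency !== "string") {
    throw new Error("OrderPriced: currency must be a string");
  }
  return { orderId, totalCents, currency };
}

// Hypothetical downstream step: it validates its input before using it.
function chargeStep(event: unknown): string {
  const order = parseOrderPriced(event);
  return `charge ${order.totalCents} ${order.currency} for ${order.orderId}`;
}
```

With a guard like this, a renamed or retyped field in an upstream step fails loudly at the boundary with a clear message instead of surfacing as a confusing error several steps later.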
Mark was not happy with the situation, but he didn't want to admit the mistake and find a solution for it, so he simply quit his job and went to another company. Later, the company, now a scaleup, hired a new CTO named Juliet.
At this time, the company had enough money to hire a manager who had worked for a few large companies and a few more managers to help her lead their IT. None of them had hands-on IT experience, but most had a good reputation in the industry and were quite good at people management.
The new CTO, Juliet, was a nice person but had no hands-on background in software development and was not aware of the practical and architectural challenges in this field. She was not experienced enough to see the existing problems and not confident enough to guide the team through the strategic changes that were necessary at this point. The company continued with the same practices and habits, and the solution became increasingly slow and complex. For each task, multiple services had to call each other and wait synchronously for the response. The system was very slow, and sometimes requests timed out. Getting an overview of the services, their dependencies, and the data flow was almost impossible. Error handling was very complex, and debugging was a nightmare... To fix errors, they sometimes manually invoked failed lambda functions again to see if the error would go away, but because they had not thought about the idempotency of those functions, they soon ended up with unwanted double payments and double item shipments! Fortunately, the damage was not too big, but it was still a big problem in the eyes of customers and the business. At this point, it was becoming more and more difficult to tell potentially fraudulent activity apart from thousands of common system errors. They also saw some weird scenarios where a dynamically typed language like JavaScript could cause random bugs in financial calculations that were sometimes difficult to identify or reproduce.
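To make the idempotency problem concrete, here is a minimal sketch of one common mitigation: recording the result of an operation under an idempotency key so that a repeated invocation returns the stored result instead of repeating the side effect. Everything in it is an assumption for illustration: `IdempotentExecutor`, `chargeCustomer`, and `handlePayment` are hypothetical names, and the in-memory `Map` stands in for a durable store with an atomic "insert if absent" operation, which a real system would need.

```typescript
// Result of a hypothetical payment step.
interface PaymentResult {
  paymentId: string;
  amountCents: number;
}

// In-memory stand-in for a durable store (assumption: a real system would
// use a database with an atomic "insert if absent" operation instead).
class IdempotentExecutor<T> {
  private readonly results = new Map<string, T>();

  // Runs `action` at most once per key; repeated calls return the stored result.
  run(key: string, action: () => T): T {
    if (this.results.has(key)) {
      return this.results.get(key) as T;
    }
    const result = action();
    this.results.set(key, result);
    return result;
  }
}

let chargesMade = 0;

// Hypothetical side effect that must not happen twice for the same order.
function chargeCustomer(orderId: string, amountCents: number): PaymentResult {
  chargesMade += 1;
  return { paymentId: `pay-${orderId}`, amountCents };
}

const executor = new IdempotentExecutor<PaymentResult>();

// The key is derived from the business operation, not from the invocation.
function handlePayment(orderId: string, amountCents: number): PaymentResult {
  return executor.run(`charge:${orderId}`, () => chargeCustomer(orderId, amountCents));
}

handlePayment("order-42", 1999); // first invocation charges the customer
handlePayment("order-42", 1999); // manual re-invocation returns the stored result
console.log(chargesMade); // prints 1
```

The important design choice is that the key identifies the business operation (here, charging a given order), so re-running a failed function by hand cannot trigger a second charge for the same order.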
This led to frustration in the management and technical staff. The company started to lose its best developers. They had difficulties hiring new senior staff willing to continue building on top of this. Even though developers mostly had to work overtime, unlike the good old days, they could not meet their deadlines. Also, the quality of their product was so bad that they started to lose key customers. The business was not going well, and the owners decided to sell the company for cheap.
To save the business, the new owner had to hire an expensive team of external consultants to redesign and re-architect the solution and even build some parts of it from scratch. Juliet also left the company and soon joined another medium-sized company as a VP of engineering.
Key takeaways
It is good to remember and remind each other that:
✅ Communication over the network is slower, less reliable, and is also considered to be less secure.
✅ Serverless computing does not necessarily mean FaaS or "Lambda functions"! It is a much broader concept.
✅ Thanks to the initial misunderstanding around AWS SAM (Serverless Application Model), these days, "extensively using Lambda functions for building everything" is commonly thought to be the "serverless architecture".
✅ There are other serverless options in AWS, like AWS Fargate, that let you use your software frameworks of choice (instead of ending up building your own framework) with more standard tooling like containers (ECS, with the least effort, or EKS, with more customization and the wider CNCF ecosystem).
✅ In a microservice architecture, the size (of each service) matters! You might want to call them micro-service, nano-service, or whatever, but that has a huge impact on data flow, the amount of duplicated data, network traffic, service communication rates, etc.
✅ It is important to remember that you can split complexity to make it easier for different teams to work on them independently (only if you can manage to reduce dependency), but it never dies! It might even grow 10x somewhere else, so then you need to hire people to deal with that.
✅ Remember, debugging a distributed system is usually much more complex than looking at a single stack trace.
✅ Remember, achieving data consistency is a big challenge in distributed systems.
✅ Remember, things look great when you only hear about the benefits and success stories. Let's avoid marketing traps and fancy buzzwords when we are supposed to build important systems for our customers and societies.
✅ Remember, handling "distributed transactions" can be quite challenging. Using a single transactional database, you can simply roll back the whole transaction if something fails or is invalid, right? But what if you have 10+ services writing into their own databases, and then the 11th fails? What pattern should you use? Saga? Two-phase commit? Compensating transactions? ... or just a retry mechanism with exponential backoff?
✅ Remember, microservice architecture was introduced to solve a problem for large companies and corporations with hundreds, if not thousands, of developers having to work on a single giant monolith. It is not necessarily the best option for small startups with a few junior developers.
✅ Remember, microservice architecture is supposed to help you reduce the complexity by reducing the size and dependencies and moving it to another layer for someone else in your team/organization with a different set of skills to deal with, so if you still have a single shared database or a single codebase where all the services are dependent on each other, you are probably only getting the disadvantages of both worlds!
✅ Remember, hundreds of smart developers might have spent years to build secure software frameworks with a lot of tooling out of the box for you, and you are probably not going to build something better than that in a few months.
✅ In order to build a skyscraper or even a tall building, you first need to design the architecture, and then you should consult with construction engineers to verify that, and then begin by building a strong foundation. You can't start the project from the penthouse! The same applies to software.
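As a rough illustration of the saga and compensating-transaction idea raised in the takeaways above, the sketch below runs a list of steps in order and, when one fails, undoes the already completed steps in reverse order. The `SagaStep` shape, `runSaga`, and the step names are hypothetical and chosen only for this example. It deliberately ignores the hard parts (compensations that fail themselves, crashes in the middle of a run, durable state), which is exactly where much of the real complexity lives.

```typescript
// One step of a hypothetical saga: an action plus the compensation that undoes it.
interface SagaStep {
  name: string;
  execute: () => Promise<void>;
  compensate: () => Promise<void>;
}

// Runs steps in order; on failure, compensates the completed steps in reverse order.
async function runSaga(steps: SagaStep[]): Promise<{ ok: boolean; log: string[] }> {
  const log: string[] = [];
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      await step.execute();
      completed.push(step);
      log.push(`done:${step.name}`);
    } catch {
      log.push(`failed:${step.name}`);
      for (const done of completed.reverse()) {
        await done.compensate();
        log.push(`undone:${done.name}`);
      }
      return { ok: false, log };
    }
  }
  return { ok: true, log };
}

// Usage: the third service fails, so the first two are rolled back.
const noop = async (): Promise<void> => {};
const steps: SagaStep[] = [
  { name: "reserve-stock", execute: noop, compensate: noop },
  { name: "charge-card", execute: noop, compensate: noop },
  {
    name: "book-shipping",
    execute: async () => {
      throw new Error("carrier unavailable");
    },
    compensate: noop,
  },
];

runSaga(steps).then((result) => console.log(result.log));
```

Compare this with a single transactional database, where the same rollback is one statement: every service added to the chain adds another compensation that someone has to write, test, and keep correct.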
Let's use the tools for what they are built for. Lambda and any other FaaS solutions are great for many use cases, but they are not "the answer to everything!" and are not the only serverless options.
You can leverage the power of serverless by choosing the right tools based on the use-case. Lambda, ECS (+Fargate), EKS (+Fargate), ... depending on the size of your organization, the size and complexity of your solutions, etc.
If you liked the article and want to keep me motivated to provide more content, you can share this article with your friends and colleagues and follow me here on Medium or LinkedIn.
Copyright & Disclaimer
- All content provided on this article is for informational and educational purposes only. The author makes no representations as to the accuracy or completeness of any information on this site or found by following any link on this site.
- All the content is copyrighted, except the assets and content I have referenced to other people's work, and may not be reproduced on other websites, blogs, or social media. You are not allowed to reproduce, summarize to create derivative work, or use any content from this website under your name. This includes creating a similar article or summary based on AI/GenAI. For educational purposes, you may refer to parts of the content, and only refer, but you must provide a link back to the original article on this website. This is allowed only if your content is less than 10% similar to the original article.
- While every care has been taken to ensure the accuracy of the content of this website, I make no representation as to the accuracy, correctness, or fitness for any purpose of the site content, nor do I accept any liability for loss or damage (including consequential loss or damage), however caused, which may be incurred by any person or organization from reliance on or use of information on this site.
- The contents of this article should not be construed as legal advice.
- Opinions are my own and not the views of my employer.
- English is not my mother tongue, so even though I try my best to express myself correctly, there might be a chance of miscommunication.
- Links or references to other websites, including the use of information from 3rd-parties, are provided for the benefit of people who use this website. I am not responsible for the accuracy of the content on the websites that I have put a link to and I do not endorse any of those organizations or their contents.
- If you have any queries or if you believe any information on this article is inaccurate, or if you think any of the assets used in this article are in violation of copyright, please contact me and let me know.