Anyone who has been following me on social media knows that I am a huge advocate of the public cloud.
We have just wrapped up the biggest cloud conferences of the year, Microsoft Ignite 2024 and AWS re:Invent 2024, and the end of 2024 is just around the corner.
As we head into 2025, I thought it would be interesting to share my wishes for the public cloud providers in the coming year.
Resiliency and Availability
The public cloud has existed for more than a decade, and, at least according to the cloud service providers' (CSPs') documentation, it is designed to survive major or global outages impacting customers all over the world.
And yet, in 2024, each of the CSPs suffered major outages. To name a few:
- Summary of the Amazon Kinesis Data Streams Service Event in Northern Virginia (US-EAST-1) Region
- Azure Incident Retrospective: Storage issues in Central US
- Incident affecting Cloud Firestore, Google App Engine, Google Cloud Functions
In most cases, outages originate from unverified code or configuration changes, or from a lack of resources caused by a spike in, or unexpected use of, specific resources.
The result is customer impact in a specific region or, worse, in multiple regions.
Although CSPs implement separate regions and availability zones (AZs) to limit the blast radius and reduce the chance of major customer impact, in many cases we discover that critical services have their control plane (the central management system that provides orchestration, configuration, and monitoring capabilities) deployed in a single central region (usually in East US data centers), so the blast radius extends to customers all over the world.
My wish for 2025 from CSPs: improve testing and observability for every code or configuration change (whether made by engineers or by automated systems).
In the long term, CSPs should design service control planes to be synchronized and distributed across multiple regions (at least one copy on each continent), to limit the blast radius of global outages.
Secure by Default
Reading new service announcements and the official service documentation, we can see that the CSPs understand the importance of "secure by default", i.e., services and capabilities whose security configuration was designed in from day 1.
And yet, in 2024, each of the CSPs faced security incidents resulting from misconfiguration. To name a few:
- AWS Security Bulletin AWS-2024-003
- Microsoft Power Pages: Data Exposure Reviewed
- Exploring Google Cloud Default Service Accounts: Deep Dive and Real-World Adoption Trends
It is always best practice to read the vendor's documentation and understand the default settings and behavior of every service or capability we enable. However, even under the shared responsibility model, we as customers expect the CSPs to design everything to be secure by default.
I understand that some CSP product groups aim to release new services to the market as quickly as possible, allowing customers to experience and adopt new capabilities, but security must be job zero.
My wish for 2025 from CSPs is to put security higher on your list of priorities. This applies to both the product groups and the development teams of each product.
Invest in threat modeling, from the design phase until each service/capability is deployed to production, and try to anticipate what could go wrong.
Choose secure (closed) by default, and provide documentation that allows customers to change the default settings if they wish, instead of leaving services exposed and forcing customers to make changes after the fact, once their data has already been exposed to unauthorized parties.
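To make "secure (closed) by default" concrete, here is a minimal Python sketch of the idea as a configuration model. The `BucketConfig` type, its fields, and the `exposure_findings` helper are hypothetical and do not represent any CSP's real API; the point is only that every default is the closed option, so exposure requires an explicit, visible opt-in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BucketConfig:
    """Hypothetical storage-bucket settings; not any CSP's real API."""
    name: str
    # Secure (closed) defaults: nothing is exposed unless explicitly requested.
    public_access: bool = False
    encryption_at_rest: bool = True
    allowed_principals: tuple[str, ...] = ()


def exposure_findings(cfg: BucketConfig) -> list[str]:
    """Return warnings for settings that deviate from the secure defaults."""
    findings = []
    if cfg.public_access:
        findings.append(f"{cfg.name}: public access enabled")
    if not cfg.encryption_at_rest:
        findings.append(f"{cfg.name}: encryption at rest disabled")
    if "*" in cfg.allowed_principals:
        findings.append(f"{cfg.name}: wildcard principal grants access to anyone")
    return findings


# A bucket created with defaults produces no findings.
print(exposure_findings(BucketConfig(name="reports")))  # []
# Opening it up requires explicit choices, each of which is reported.
opened = BucketConfig(name="public-site", public_access=True, allowed_principals=("*",))
print(exposure_findings(opened))
```

The design choice here is that the customer who wants exposure pays the (small) cost of changing a default, rather than every customer paying the cost of discovering an open default after the fact.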
Service Retirements
I understand that from time to time a product group, or even a CSP's business leadership, reviews the list of currently available services and decides to retire a service, leaving its customers with no alternative or migration path.
In 2024, we saw several service retirement announcements. To name a few:
- AWS to discontinue Cloud9, CodeCommit, CloudSearch, and several other services
- Azure Media Services retirement guide
- Google Cloud Platform (GCP) has announced the end-of-sale for Cloud Source Repositories
When it comes to service retirements and deprecations, GCP leads, followed by Azure.
In some cases, customers receive (short) notice asking them to migrate their data and find an alternative solution. From a customer's point of view, it does not look good (to put it politely) when services we have been using for a while stop working and we need to find alternatives for our production environments.
Although AWS was far from ideal when decommissioning services such as Cloud9 and CodeCommit, its approach differs from that of the other cloud providers, thanks to its "working backwards" development methodology.
My wish for 2025 from CSPs is to put customers first and do market research beforehand. Check with your customers which capabilities they are looking for before you begin developing a new service.
Even if the market changes over time, remember that you have production customers using your services. Prepare alternatives in advance, along with a documented migration path to those alternatives. Do everything you can to support services for a very long time. If there is no other alternative, keep supporting your services, even without new capabilities, so that your customers at least know that in case of production issues or newly discovered security vulnerabilities, they will have support and an SLA.
Cost and Economies of Scale
When organizations began migrating their on-prem workloads to the public cloud, the notion was that, thanks to economies of scale, the CSPs would be able to offer their customers cost-effective alternatives for consuming services and infrastructure, compared to traditional data centers.
Many customers got the equation wrong: they compared the cost of hardware (such as VMs and storage) in their data center with the public cloud alternative, without factoring in the cost of maintenance, licensing, manpower, etc., and the result was higher costs for "lift & shift" migrations to the public cloud. In the long run, after a decade of organizations working with the public cloud, re-architecting workloads has proven to deliver much better, more cost-effective results.
Although we have not seen CSPs publicly announce increases in service prices, there are charges that, from a customer's point of view, simply do not make sense.
A good example is egress data cost. If none of the CSPs charge customers for ingress data, there is no reason to charge for egress data. It is the same hardware, so I really cannot understand the logic behind high (or any) egress charges. Customers should have the option to pull data from their cloud accounts (sometimes to keep data on-prem in hybrid environments, and sometimes to allow migration to other CSPs) without being charged.
The same rule applies to inter-zone traffic charges (see AWS and GCP documentation), and to charges for enabling private traffic inside the CSPs' backbone (see AWS, Azure, and GCP documentation).
My wish for 2025 from CSPs is to put customers first. CSPs already encourage customers to build highly available infrastructure spanning multiple AZs, and to keep the services that hold customers' data private (not exposed to the public Internet), yet they charge for the very traffic these recommendations generate. Although the public cloud is a business that seeks revenue, CSPs should think about their customers and offer more capabilities at lower prices, to make the public cloud the better, more cost-effective alternative to traditional data centers.
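To show how these transfer charges add up, here is a back-of-the-envelope Python sketch. The per-GB rates below are hypothetical placeholders chosen for illustration only; they are NOT any CSP's actual price list, so check each provider's pricing pages for real numbers.

```python
# Hypothetical per-GB rates in USD, for illustration only.
# These are NOT any CSP's actual prices.
HYPOTHETICAL_RATES_USD_PER_GB = {
    "ingress": 0.00,     # inbound transfer, which CSPs do not charge for
    "egress": 0.09,      # outbound transfer to the Internet or on-prem
    "inter_zone": 0.01,  # traffic between availability zones in one region
}


def monthly_transfer_cost(
    gb_by_type: dict[str, float],
    rates: dict[str, float] = HYPOTHETICAL_RATES_USD_PER_GB,
) -> float:
    """Sum transfer charges: volume (GB) per traffic type times its per-GB rate."""
    return round(sum(gb * rates[kind] for kind, gb in gb_by_type.items()), 2)


# Example: pulling 50,000 GB back on-prem, while replicating 20,000 GB between AZs.
usage = {"ingress": 50_000, "egress": 50_000, "inter_zone": 20_000}
print(monthly_transfer_cost(usage))  # 4700.0 under these hypothetical rates
```

Even with made-up rates, the asymmetry is visible: moving the same 50,000 GB in costs nothing, while moving it out (or replicating across AZs, as the CSPs themselves recommend) is where the bill accumulates.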
Vendor Lock-In
This has been a challenge since the early days of the public cloud. Each CSP offered its own set of services, with different capabilities and, naturally, different APIs.
From an architectural point of view, customers should first understand the business demands, before choosing a technology (or specific services from a specific CSP).
Each CSP offers its own services, and that is not necessarily a negative thing. If in doubt, I highly recommend watching the lecture "Do modern cloud applications lock you in?" by Gregor Hohpe, from AWS re:Invent 2023.
In the past, there was a notion that packaging our applications inside containers, and perhaps using Kubernetes (in its various managed alternatives), would enable customers to switch between cloud providers or deploy highly available workloads on top of multiple CSPs. This notion turned out to be false, since containers do not live in a vacuum, and customers do not pack their entire application inside a single container or microservice. Cloud-native applications are deployed inside a cloud ecosystem and consume data from other services such as storage, networking, databases, message queuing, etc., so migrating between CSPs still requires a lot of effort to connect to different sets of APIs.
My wish for 2025 from CSPs, and I know it is a lot to ask, is this: could you invest in standardizing your APIs?
Instead of customers having to add abstraction layers on top of cloud services, forcing them to settle for the lowest common denominator, why not offer the same APIs, and hopefully the same (or mostly the same) capabilities?
Kubernetes and its Container Storage Interface (CSI) are two examples: they allow customers to consume container orchestration and backend storage through similar APIs, and both are supported by the CNCF, giving customers an easy way to deploy and maintain cloud resources, even on top of different CSPs.
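The "lowest common denominator" problem with customer-built abstraction layers can be sketched in a few lines of Python. Everything here is hypothetical: `ObjectStore`, `ProviderAStore`, `ProviderBStore`, and `archive_report` are illustrative names backed by in-memory dictionaries, not real CSP SDKs. The portable interface can only include operations that every backend supports, so a capability unique to one provider (tagging, in this sketch) is unreachable from portable code.

```python
from typing import Protocol


class ObjectStore(Protocol):
    """Portable interface: only operations every backend supports make it in."""
    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class ProviderAStore:
    """Hypothetical in-memory adapter for provider A, which also supports tagging."""
    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}
        self._tags: dict[str, dict[str, str]] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def set_tags(self, key: str, tags: dict[str, str]) -> None:
        # Provider-specific capability: not part of ObjectStore,
        # so portable application code cannot rely on it.
        self._tags[key] = tags


class ProviderBStore:
    """Hypothetical in-memory adapter for provider B, which has no tagging."""
    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def archive_report(store: ObjectStore, name: str, body: bytes) -> bytes:
    """Portable application code: works with any backend, but only via put/get."""
    store.put(f"reports/{name}", body)
    return store.get(f"reports/{name}")


for backend in (ProviderAStore(), ProviderBStore()):
    print(type(backend).__name__, archive_report(backend, "q4.txt", b"ok"))
```

Standardized APIs, as Kubernetes and CSI provide for orchestration and storage, would remove the need for each customer to maintain adapters like these and to give up provider-specific capabilities in the process.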
Summary
There are a lot more things I wish Santa Claus could bring me in 2025, but as it relates to the public cloud, I truly wish each CSP's product groups would read my blog post and begin making the changes required to give customers a better experience in all the areas I have mentioned.
To my readers: feel free to contact me on social media and share your thoughts about this blog post.
About the author
Eyal Estrin is a cloud and information security architect, an AWS Community Builder, and the author of the books Cloud Security Handbook and Security for Cloud Native Applications, with more than 20 years in the IT industry.
You can connect with him on social media (https://linktr.ee/eyalestrin).
Opinions are his own and not the views of his employer.