Vercel recently announced Fluid Compute, presenting it as “The power of servers, in serverless form”. I won’t be digging into the details of what goes on behind the scenes with this new feature; there are plenty of more capable and knowledgeable folks out there with great content explaining exactly what it is and how it works (Theo Browne’s video is a great place to start).
Servers vs Serverless
Serverless was put in place to deal with some of the challenges and shortcomings of traditional servers. Unless you have a very predictable traffic pattern from your users, figuring out how much compute power your backend needs at any given time is far from an exact science.
For example, let’s say a single server has the capacity to handle 1000 users at most. You’re under this threshold 80% of the time, and you can spin up a second server to handle that heavier 20%. The problem is, you can’t exactly predict when these spikes happen, and spinning up a server is slow. You’re now stuck with a dilemma:
- Spin up (and pay for) the extra server just in case, so that you’re ready for when that spike comes, or
- Start spinning up the second server only when you detect the traffic increase, and risk a window of time where some users might run into an unavailable service.
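To put rough, purely illustrative numbers on it: if a server costs, say, $100 a month, keeping the second one running permanently means paying the full $100 even though you only need it about 20% of the time, so roughly $80 of that buys capacity that sits idle. Skipping it saves the money but gambles on spin-up time beating the spike.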
Serverless functions take this dilemma off your hands by spinning up a function for each request that comes in, which is what allows them to scale so effectively. 1 request, 1 function. 1 million requests the next minute, 1 million functions spun up.
This works great in terms of scaling, but also means each function is an isolated, ephemeral server that needs to remain stateless, and in-function concurrency doesn’t exist.
The problem with idle compute time
If user 1 makes a request that spins up Lambda A, a request from user 2 will spin up Lambda B. This is all fine and dandy, but Lambda A had just made a 3rd party API call and was awaiting a response when the request from user 2 came in. Lambda B is now in the same state, and we’re stuck paying for compute time across two lambdas when in reality, they’re both just waiting.
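To make that concrete, here’s a minimal sketch of the kind of handler I’m describing. The endpoint URL, payload and timings are made up, but the shape is the familiar one: almost the entire billed duration is spent awaiting a network response.

```typescript
// Hypothetical handler: nearly all of its billed duration is idle time
// spent awaiting a slow third-party API (URL and payload are made up).
export async function handler(event: { userId: string }) {
  const started = Date.now();

  const response = await fetch("https://third-party.example.com/data", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ userId: event.userId }),
  });
  const data = await response.json();

  // If the API takes ~2s to answer, we're billed for ~2s of duration
  // even though the CPU did next to nothing during that window.
  console.log(`Handler took ~${Date.now() - started}ms, mostly waiting`);

  return { statusCode: 200, body: JSON.stringify(data) };
}
```

With one isolated function per request, two users waiting on that API means two copies of this sitting idle in parallel, both on the clock.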
At small scales (meaning when that waiting time isn’t huge) the money you spend on idle compute might be negligible. As an industry, we’ve been optimising for this model for a long time, with serverless compute like Lambda and Vercel’s functions bringing a ton of value in terms of DX and performance, and enabling some pretty neat architecture patterns.
As idle time increases, however, things start becoming a problem. Serverless functions are billed for duration, meaning you’re paying for both the active and idle parts of your function execution.
Take a use case where your downstream service is a slow responder, either by nature (LLMs, anyone?) or because of performance issues, and you’re suddenly spending a ton of money on compute resources that are sitting around, doing nothing.
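To put made-up numbers on it: if an invocation does 200ms of actual work and then spends 4.8 seconds waiting on an LLM to respond, you’re billed for the full 5 seconds, so roughly 96% of that invocation’s cost is pure waiting.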
In-function concurrency
Vercel came up with a way to enable their serverless functions (aka the Vercel Lambda) to handle concurrent requests by taking advantage of idle time during IO operations.
In-function concurrency is at the core of this update. Again, there are smarter folks out there explaining this in detail, but in short, it allows a single function to handle multiple requests by taking advantage of that idle time. You make a request to your function, which in turn calls that 3rd party API and waits for a response. During the window of time that used to be spent idle (i.e. waiting), the function is now effectively treated as available, and another request can be routed to it.
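The mechanism underneath is ordinary Node.js event-loop behaviour; what Fluid Compute adds is routing concurrent invocations to the same warm instance so that behaviour can actually be exploited. A rough sketch of the idea, with a hypothetical slow upstream:

```typescript
// Sketch: while one request is awaiting I/O, the same single process is
// free to pick up and start working on another request. (The upstream URL
// is hypothetical; routing requests to a shared instance is Vercel's job,
// not something you do in your own code.)
import { createServer } from "node:http";

let inFlight = 0;

createServer(async (_req, res) => {
  inFlight += 1;
  console.log(`Requests in flight on this instance: ${inFlight}`);

  // While this await is pending, the event loop is idle, which is exactly
  // the window in which another incoming request can be served concurrently.
  const upstream = await fetch("https://slow-api.example.com/answer");
  const body = await upstream.text();

  inFlight -= 1;
  res.writeHead(200, { "content-type": "text/plain" });
  res.end(body);
}).listen(3000);
```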
The 2 obvious benefits here are simple:
- You’re no longer paying for idle compute time.
- You no longer need to spin up a function for each incoming request, greatly reducing the likelihood of hitting a cold start.
Functions stay alive
Another benefit of Fluid Compute is that functions no longer shut down connections immediately after responding, which enables streaming for things like dynamic rendering of components or those “thinking out loud” features you see in ChatGPT, where the response to your prompt arrives in what seems to be one character at a time.
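As a sketch of what that enables, here’s a handler in the web-standard Request/Response style that streams chunks as they’re produced instead of buffering the whole body. The token list and delay are fake stand-ins for an LLM stream, and the exact wiring on Vercel depends on your framework.

```typescript
// Sketch of a streaming handler using the web-standard Response/ReadableStream
// APIs. The "tokens" and delay are fake; a real app would pipe an LLM stream.
export async function GET(_request: Request): Promise<Response> {
  const encoder = new TextEncoder();

  const stream = new ReadableStream<Uint8Array>({
    async start(controller) {
      const tokens = ["Thinking", " out", " loud,", " one", " chunk", " at", " a", " time."];
      for (const token of tokens) {
        controller.enqueue(encoder.encode(token));
        // Simulate an upstream that produces output gradually.
        await new Promise((resolve) => setTimeout(resolve, 200));
      }
      controller.close();
    },
  });

  return new Response(stream, {
    headers: { "content-type": "text/plain; charset=utf-8" },
  });
}
```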
Serverless Servers but, you know… better
Vercel presents Fluid Compute as the thing that brings together the best of both worlds, with a paradigm that aims to introduce real concurrency, minimise cold starts and maximise CPU efficiency the way a traditional server would, while also providing the main advantages of serverless:
- Auto-scaling
- Automatic provisioning and optimisation
- Pay-per-use pricing
- The developer experience so many of us know and love
Tailor-made for Gen AI
It’s hard to overlook the fact that this improvement couldn’t have come at a more relevant moment. At a time when everybody is jumping on the AI bandwagon and exploring new ways to build on top of the mainstream GenAI models, it isn’t uncommon for requests to those models to take many seconds or even minutes to respond. Seconds and minutes that you ultimately pay for while your traditional serverless functions sit idle.
AWS Lambda introduced the world to the concept of true serverless computing in 2014 and while many great options have popped up since, I don’t think I’m alone in the camp that still considers Lambda the king of serverless.
Vercel’s Fluid Compute poses some very interesting questions, however.
I’m in the middle of writing an AI-based application myself and was just in the process of figuring out the best hosting approach. I’m a serverless-first (almost serverless-only) dev these days and was a bit torn between a few options that all had their trade-offs.
My DX with AWS has always been amazing, especially since CDK came out with v2. But for a project that relies so heavily on GenAI and the slow nature of the APIs that come with it, Vercel just made themselves the obvious choice.
How will AWS answer?
Whether AWS Lambda has yet to figure out in-function concurrency, or already has and simply chose not to “pass the savings on”, Fluid Compute provides a competitive edge in terms of cost efficiency, and it’s hard for me to imagine Lambda won’t keep up.
The ball is now in AWS’s court: will they respond? My hope and expectation is that they will.
As someone who relies heavily on Bun during development, I’ll tell you it’s no coincidence that Node.js started churning out features like TypeScript support, a built-in ability to read environment variables, and so on. Bun was putting the pressure on.
Huge props to the folks over at Vercel for being to AWS Lambda what Bun is to Node. The game is on, and I’m excited to see what AWS comes up with in the coming months.