Note: while this post refers to Vue SPAs, the concept is the same for React apps.
GitHub: divporter/ssr-lambda-edge
Serverless Side Rendering with Lambda@Edge
Single Page Apps (SPAs) are great. I'm a big fan. You can offload all that HTML generation to the client, and SPAs make up the 'J' and 'M' in JAM stack. An important distinction of sites built on the JAM stack is that the SPA is served from a CDN rather than a traditional web server. The client and server should be completely decoupled.
In AWS world you simply upload your SPA to S3 and serve it with CloudFront. But what do we do about SEO? Well, when GoogleBot crawls the page it will run any synchronous JavaScript (within a time limit) and then crawl the resulting page. Note the synchronous there, which means GoogleBot won't see any data that is fetched asynchronously when a regular client loads the page.
Enter Server Side Rendering (SSR). For the unfamiliar, here's a quick summary. When a user makes a page request, instead of serving an empty index.html and main.js, the server looks at the route, fetches any required data, renders the HTML from your SPA according to the SPA routing (eg Vue Router), and serves up nicely rendered HTML. So now when GoogleBot sees your page, all of your dynamic content is there.
Oh but wait... we don't have a server. So we turn to Lambda. Before that, let's look at our options.
SSR everything
One option is to do SSR for all page requests that CloudFront receives. A problem there is that SSR isn't fast, and when there's data fetching involved it's only as fast as the API it's pulling from. So instead of loading your index.html quickly and showing your users a nice loading screen, they just see a blank page for a few seconds. We can easily implement caching so that only the first unlucky user has to wait a few seconds, and every subsequent user gets it lightning fast from the CDN.
SSR for SEO only
This is the option I will focus on. A "regular" user gets your index.html with the standard SPA client-side rendering. GoogleBot, on the other hand, is treated to a server(less) side rendered HTML page with all of our dynamic content. Likewise, we can implement caching so we don't waste Lambda resources rendering the same page over and over.
Architecture Decisions
There are a couple of ways to do SSR for SEO only: using a run-of-the-mill Lambda, or using Lambda@Edge.
Lambda
In this model, a Lambda is configured as a CloudFront origin and handles any path that is not an API route or a static asset and doesn't have an extension other than .html. The Lambda determines whether the user is a web crawler, using es6-crawler-detect for example. If it is a bot, it proceeds with SSR. If it's not a bot, it serves up index.html. This is pretty straightforward, but to handle requests for things such as favicon.ico or manifest.json, which typically live at the root level, we need to either configure the cache behaviors to serve them from S3, or serve them from our Lambda (which is a little trickier).
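As a rough sketch of what that Lambda could look like (assuming an API Gateway proxy integration in front of it, and that es6-crawler-detect exposes a Crawler class; renderToHtml and fetchIndexFromS3 are hypothetical placeholders for your own SSR and static-file code):

const { Crawler } = require('es6-crawler-detect')

const CrawlerDetector = new Crawler()

module.exports.handler = async (event) => {
  // API Gateway proxy events carry the headers and path of the original request
  const userAgent = event.headers['User-Agent'] || event.headers['user-agent'] || ''

  if (CrawlerDetector.isCrawler(userAgent)) {
    // Bot: server side render the requested route
    const html = await renderToHtml(event.path) // hypothetical SSR helper
    return { statusCode: 200, headers: { 'Content-Type': 'text/html' }, body: html }
  }

  // Human: hand back the plain SPA shell and let the client render
  const indexHtml = await fetchIndexFromS3() // hypothetical S3 getObject wrapper
  return { statusCode: 200, headers: { 'Content-Type': 'text/html' }, body: indexHtml }
}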
Lambda@Edge
Here we leverage the power of Lambda@Edge. Lambda@Edge is a special type of Lambda: unlike "regular" Lambda functions, which run in the data centre of your specified region, Lambda@Edge runs at the CloudFront edge location where the request is made. In principle it should be faster, because it's closer to your user.
In this scenario we are going to tell CloudFront whether or not to look in the S3 bucket in response to the request, based on the request path and the User-Agent header. Firstly, if the path points to a file (eg manifest.json) then we tell CloudFront to get it from our S3 origin. If it's a request for a page (eg example.com/page) then we need to see whether it's a bot or not. If it is a bot, we perform SSR and return the rendered HTML. If it's not a bot, we serve up index.html from our S3 origin. In comparison to the Lambda model, this Lambda doesn't serve up things like manifest.json; it only does SSR.
Lambda@Edge implementation
OK I hear you. Enough is enough, I've set the scene. Show me some code I can use. Let's start with the Lambda@Edge handler.
WARNING: the response object is very, very delicate. For another example, refer to the AWS docs.
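The full handler lives in the divporter/ssr-lambda-edge repo linked above; here's a minimal sketch of its overall shape, assuming an origin-request trigger and that es6-crawler-detect exposes a Crawler class. The renderPage helper is just a placeholder for the SSR block expanded further down.

const path = require('path')
const { Crawler } = require('es6-crawler-detect')

const CrawlerDetector = new Crawler()

module.exports.handler = async (event) => {
  const request = event.Records[0].cf.request
  const userAgent = request.headers['user-agent'][0].value

  // Only page routes (no file extension, not the API) and index.html itself are SSR candidates
  if ((!path.extname(request.uri) && !request.uri.startsWith('/api')) || (request.uri === '/index.html')) {
    if (CrawlerDetector.isCrawler(userAgent)) {
      if (request.uri === '/index.html') {
        request.uri = '/' // so Vue Router treats index.html as the '/' route
      }
      // Render, minify and gzip the page (hypothetical helper wrapping the block shown below)
      const ssrResponse = await renderPage(request)
      return ssrResponse // returning a response object stops CloudFront from going to S3
    }
  }

  // Real users and requests for real files: pass the request through to the origin untouched
  return request
}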
So what is happening? Let's say a request has been made to https://example.com/page and CloudFront has been configured to look in our S3 bucket to fulfil this request. Now let's consider two User-Agent scenarios.
Scenario 1. User-Agent is GoogleBot
Looking at the if statement:
if ((!path.extname(request.uri) && !request.uri.startsWith('/api')) || (request.uri === '/index.html'))
This will evaluate to (true && true) || false, which is true.
Then the next one is obviously true:
if (CrawlerDetector.isCrawler(userAgent))
So we're going to be doing some SSR.
if (request.uri === '/index.html')
This line exists so that Vue Router in our SPA treats index.html as the '/' route. Although it doesn't apply in this case, it's worth pointing out.
Alright now to do some SSR.
// Assumes the usual requires at the top of the handler file: zlib (Node built-in),
// createBundleRenderer from 'vue-server-renderer', minify from 'html-minifier',
// plus the serverBundle, template and clientManifest produced by your SSR build.
const ssrResponse = await new Promise((resolve, reject) => {
  const renderer = createBundleRenderer(serverBundle, {
    runInNewContext: false, // recommended
    template,
    clientManifest
  })
  renderer.renderToString({}, (err, html) => {
    if (err) return reject(err)
    // Minify the rendered HTML to help keep the response under CloudFront's size limit
    const minified = minify(html, {
      caseSensitive: true,
      collapseWhitespace: true,
      preserveLineBreaks: true,
      removeAttributeQuotes: true,
      removeComments: true
    })
    // Lambda@Edge generated response: gzip the body and base64 encode it
    const response = {
      status: '200',
      statusDescription: 'OK',
      headers: {
        'content-type': [{
          key: 'Content-Type',
          value: 'text/html; charset=utf-8'
        }],
        'content-encoding': [{
          key: 'Content-Encoding',
          value: 'gzip'
        }]
      },
      body: zlib.gzipSync(minified).toString('base64'),
      bodyEncoding: 'base64'
    }
    resolve(response)
  })
})
The first part is standard SSR according to the Vue.js SSR Guide. For more information check it out, it's pretty cool. Skipping over that, let's get down to the response object: it has to be exactly right or CloudFront will error out. It's important to compress the HTML returned in the response body, because the generated response is limited to 1 MB. Check out the CloudFront quotas for more information. If your compressed response is still over 1 MB, we can handle that another way, which I'll cover later.
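If you want to guard against that quota explicitly, here's a sketch (not part of the original handler) that would slot into the renderToString callback above, just before the response object is built:

// Hypothetical guard: CloudFront caps Lambda@Edge generated responses at 1 MB,
// so check the gzipped body and bail out if it's too big.
const gzipped = zlib.gzipSync(minified)
if (gzipped.length > 1000 * 1000) {
  // Too big to return from Lambda@Edge; use the dedicated SSR origin described
  // at the end of this post instead.
  return reject(new Error('SSR response exceeds the 1 MB generated response quota'))
}
// otherwise use gzipped.toString('base64') as the response body, as above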
Getting back to it, now that the SSR has rendered the HTML and we've generated the response object, we simply return it.
CloudFront will then cache the response against the URL https://example.com/page plus the User-Agent. So next time GoogleBot comes along it will serve the SSR rendered HTML straight from the cache. Noice!
Scenario 2. User-Agent is Mozilla/5.0 etc etc
Now a real user is coming to look at https://example.com/page. Although the request URL is the same, the User-Agent is different, so CloudFront won't serve from the cache. It will make a request to the origin, where our Lambda@Edge will intercept it. Looking at the logic:
if ((!path.extname(request.uri) && !request.uri.startsWith('/api')) || (request.uri === '/index.html'))
This is true again.
if (CrawlerDetector.isCrawler(userAgent))
This, however, is false, as we are not crawlers. So there's nothing left to do but proceed with the request untouched. This means it will continue with its original intention and look in S3 for the page. As this is an SPA there is no /page object in the bucket, so S3 will send back an error (a 404, or a 403 if the bucket doesn't allow listing). Typically when hosting SPAs on CloudFront you convert those errors to 200s and serve up index.html, so for this request the user gets the standard index.html, and the HTML rendering and data fetching happen on the client side as we intended.
Scenario 3. Request is for manifest.json
As this file has an extension, it fails at the first hurdle, so we continue with the request and the file is happily retrieved from S3.
Serverless Implementation
That's great, but how do I set all this up in CloudFront? This section assumes you have the following good to go:
- An S3 bucket with your static website files
- An API (optional)
Oof! Alright, I'll point out some of the key lines in the serverless.yml. First up, in the function definition we have a lambdaAtEdge key. While serverless.com now supports Lambda@Edge as a function event, the @silvermine/serverless-plugin-cloudfront-lambda-edge plugin has been around much longer, and I was using it long before Serverless rolled out native support for Lambda@Edge functions. And to be honest, despite my efforts I couldn't get the CloudFront event to work with multiple origins. So vive le Silvermine plugin. Anyhoo, this plugin connects the Lambda@Edge function to our CloudFront distribution.
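For reference, a function definition using the plugin looks roughly like this. It's a sketch based on the plugin's documented lambdaAtEdge shape; the function name, handler path and the WebsiteDistribution logical ID are placeholders for your own.

functions:
  ssrOriginRequest:
    handler: handler.handler        # the Lambda@Edge handler shown earlier
    memorySize: 128
    timeout: 30                     # origin-request triggers allow up to 30 seconds
    lambdaAtEdge:
      distribution: 'WebsiteDistribution'   # logical ID of the distribution defined in resources
      eventType: 'origin-request'

Keep in mind that Lambda@Edge functions must be deployed to us-east-1.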
Which is a great segue to... our CloudFront distribution, which we define in the resources section. Skipping ahead to CacheBehaviors, which is a list of paths and instructions for how CloudFront should handle them. Note that these are applied in the order they're defined. First up is the /api path. This allows our API to be called under the same CloudFront domain as our front end. If you don't have an API, or you don't need/want it living under the same domain, you can delete this block. Last up is the * path, which points to our S3 bucket. Note this section:
ForwardedValues:
  Headers:
    - 'User-Agent'
This tells CloudFront to forward the User-Agent header and use it as part of the cache key. If we miss this, we can't determine whether we're dealing with users or bots.
Then the Origins section is where we give CloudFront the details of our API (delete if not required) and our S3 bucket (required). Pieced together, the relevant parts look roughly like the sketch below.
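Domain names, IDs and path patterns here are placeholders, and the rest of the DistributionConfig (Enabled, DefaultCacheBehavior, aliases, certificates and so on) is omitted for brevity.

resources:
  Resources:
    WebsiteDistribution:                       # referenced by the lambdaAtEdge config
      Type: 'AWS::CloudFront::Distribution'
      Properties:
        DistributionConfig:
          CacheBehaviors:
            - PathPattern: '/api/*'            # optional: route API calls to API Gateway
              TargetOriginId: 'ApiOrigin'
              ViewerProtocolPolicy: 'https-only'
              ForwardedValues:
                QueryString: true
            - PathPattern: '*'                 # everything else goes to the S3 origin
              TargetOriginId: 'S3Origin'
              ViewerProtocolPolicy: 'redirect-to-https'
              ForwardedValues:
                QueryString: false
                Headers:
                  - 'User-Agent'               # make the User-Agent part of the cache key
          Origins:
            - Id: 'ApiOrigin'
              DomainName: 'abcde12345.execute-api.ap-southeast-2.amazonaws.com'
              CustomOriginConfig:
                OriginProtocolPolicy: 'https-only'
            - Id: 'S3Origin'
              DomainName: 'my-spa-bucket.s3.amazonaws.com'
              S3OriginConfig:
                OriginAccessIdentity: ''       # or an origin access identity if the bucket is private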
Finally, the last thing of note is the custom error response:
CustomErrorResponses:
  - ErrorCode: 403
    ResponseCode: 200
    ResponsePagePath: /index.html
    ErrorCachingMinTTL: 5
This is standard SPA configuration stuff, so that when we request paths like https://example.com/page which aren't actual files (because we've built an SPA), CloudFront serves up index.html and Vue Router handles the internal routing.
So that's it, easy-peasy! OK, it's actually very fiddly and delicate, with lots of moving parts, but when you get it working it's magical.
Now to tidy up some loose ends.
Can I SSR everything with Lambda@Edge?
In this article I focused on doing SSR only when the User-Agent is a web crawler. However, if you want to use Lambda@Edge for all page requests, simply remove the es6-crawler-detect parts and all requests will be handled by Lambda@Edge. It would be a good idea to reduce the MaxTTL and DefaultTTL in the CacheBehaviors entry for the '*' PathPattern, so the data on the dynamic pages isn't potentially two days old. That's no big deal for crawlers, but for users it's a good idea to serve nice fresh data.
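For example, the '*' behaviour could look something like this (the TTL values are illustrative, not a recommendation):

- PathPattern: '*'
  TargetOriginId: 'S3Origin'
  ViewerProtocolPolicy: 'redirect-to-https'
  MinTTL: 0
  DefaultTTL: 300                    # 5 minutes
  MaxTTL: 3600                       # cap cached SSR pages at an hour
  ForwardedValues:
    QueryString: false
    Headers:
      - 'User-Agent'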
My SSR rendered HTML is over 1 MB even after compression
No problemo. First, create a Lambda with an API Gateway proxy and put the SSR code in it. Next, add it as an origin in your CloudFront distribution with a path like /ssr. Note that your newly created Lambda needs to have a matching stage so that it responds to requests at /ssr (eg abcde12345.execute-api.ap-southeast-2.amazonaws.com/ssr). Then in your Lambda@Edge function, when you want to do SSR, instead of generating the HTML in the @Edge function you change the origin to the Lambda you just created. Instead of generating a response, you modify the request like so.
// Point the request at the SSR Lambda's API Gateway domain instead of S3
const ssrDomainName = 'abcde12345.execute-api.ap-southeast-2.amazonaws.com'

// Let Vue Router treat index.html as the '/' route
if (request.uri === '/index.html') {
  request.uri = '/'
}

request.origin = {
  custom: {
    customHeaders: {},
    domainName: ssrDomainName,
    keepaliveTimeout: 5,
    path: '/ssr', // matches the API Gateway stage
    port: 443,
    protocol: 'https',
    readTimeout: 30,
    sslProtocols: ['TLSv1', 'SSLv3']
  }
}

// The host header must match the new origin or API Gateway will reject the request
request.headers['host'] = [{ key: 'host', value: ssrDomainName }]
Just like the response object, the request object is equally fragile, so be careful. In the solution in this article we returned the response; this time we return the request instead, which diverts it to our SSR Lambda instead of the S3 bucket.
Top comments (3)
Have you ever tried to run SSR on Cloudflare Workers? (The problem being that it's not Node.js, but a kind of web worker running server-side.)
I managed to run Nuxt on them (it is blazingly fast... way faster than a Lambda), but it implied using a lot of dark magic and hacks that I don't like.
I was wondering if others have had a more sustainable solution?
I hadn't tried it, sorry. I've actually never heard of them, so I'll have to take a look.
Have you seen serverless-demo.nuxt.workers.dev?
It is ridiculously fast... and hopefully will be released soon: community.cloudflare.com/t/vue-js-...