DEV Community


Web Scraping with AWS Lambda in Rust

Recently, I have been working on delivering best-in-class observability products for AWS Lambda, and they happen to be built in Rust. Yet I hadn't had the chance to build an actual Lambda function using the language.

As a first project, I decided to write a web scraper that parses the AWS docs and alerts me when an AWS Lambda runtime becomes supported or deprecated.


It might feel intimidating to work with a language for which Lambda does not provide a managed runtime (unlike Node.js or Python), but the Rust community is growing rapidly, and AWS itself maintains packages beyond its SDKs, such as the lambda_runtime and lambda_http crates.

A notable advantage of using Rust for microservices is its very short initialization time compared to other popular runtimes. According to the latest benchmarks on Maxime David's popular site lambda-perf, the average cold start for a hello-world function sits at ~14ms.

Hello World

First, let me show you how to easily set up an AWS Lambda function in Rust using Cargo Lambda, a well-established tool.

After installing the tool in your environment, just run the following command and you should be good to go!

cargo lambda new hello-world

You should now have a new directory containing a regular Cargo project, and your main file should look like this:

// main.rs
use lambda_runtime::{service_fn, Error, LambdaEvent};
use serde_json::{json, Value};

#[tokio::main]
async fn main() -> Result<(), Error> {
    let func = service_fn(func);
    lambda_runtime::run(func).await?;
    Ok(())
}

async fn func(event: LambdaEvent<Value>) -> Result<Value, Error> {
    let (_event, _context) = event.into_parts();

    Ok(json!({ "message": "Hello Ferris!" }))
}

Test locally by running the emulator with the watch command.

cargo lambda watch

Then invoke the function from another terminal with the invoke command.

cargo lambda invoke --data-ascii "{ \"hello\": \"world\" }"
# {"message":"Hello Ferris!"}

Simple enough, right? We will dive into deploying to AWS later.

Web Scraping

There are multiple ways of scraping the web with Rust. I chose the scraper crate because my use case is simple.

I just need to query a page, extract some tables, and see if there have been any changes since my last snapshot.
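For the "changes since my last snapshot" part, one simple approach is a set difference over the scraped rows. The following is a minimal, std-only sketch, assuming each snapshot is a list of rows where a row is the cleaned text of its cells (the shape the scraper produces later in this post). `Row`, `SnapshotDiff`, `diff_snapshots` and `row` are hypothetical names, and this is not the author's implementation (the repository diffs data with the similar crate).

```rust
use std::collections::BTreeSet;

/// Hypothetical alias: one scraped table row, i.e. the text of each cell.
type Row = Vec<String>;

/// Hypothetical result of comparing two snapshots of the runtimes table.
#[derive(Debug, Default, PartialEq)]
struct SnapshotDiff {
    added: Vec<Row>,
    removed: Vec<Row>,
}

/// Compares the previous and current snapshots as sets of rows.
/// A row whose cells changed (e.g. a new deprecation date) shows up
/// as one removed row plus one added row.
fn diff_snapshots(previous: &[Row], current: &[Row]) -> SnapshotDiff {
    // BTreeSet keeps the output in a deterministic (sorted) order
    let before: BTreeSet<&Row> = previous.iter().collect();
    let after: BTreeSet<&Row> = current.iter().collect();

    SnapshotDiff {
        added: after.difference(&before).map(|r| (*r).clone()).collect(),
        removed: before.difference(&after).map(|r| (*r).clone()).collect(),
    }
}

/// Hypothetical helper to build a row from string literals.
fn row(cells: &[&str]) -> Row {
    cells.iter().map(|s| s.to_string()).collect()
}

fn main() {
    let nodejs = row(&["Node.js 22", "nodejs22.x", "Amazon Linux 2023", "Not scheduled", "Not scheduled", "Not scheduled"]);
    // Hypothetical new runtime, for illustration only
    let example = row(&["Example 1", "example1.x", "Amazon Linux 2023", "Not scheduled", "Not scheduled", "Not scheduled"]);

    let previous = vec![nodejs.clone()];
    let current = vec![nodejs, example];

    let diff = diff_snapshots(&previous, &current);
    println!("added: {:?}", diff.added);
    println!("removed: {:?}", diff.removed);
}
```

A set-based comparison ignores row order, which is usually what you want for a docs table; a line-based diff library gives richer output if you need it.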


AWS Lambda with HTTP adapter

I will modify the previous example because I want to trigger this function through a Lambda function URL, which requires the HTTP adapter.

cargo add lambda_http

I’ll separate the handler from the boilerplate needed to execute our HTTP adapter.

// main.rs
use lambda_http::{run, service_fn, tracing, Error};
mod http_handler;
use http_handler::handler;

#[tokio::main]
async fn main() -> Result<(), Error> {
    tracing::init_default_subscriber();

    run(service_fn(handler)).await
}

Now our handler looks much more like a typical Rust HTTP handler: it takes a Request and returns a Response<Body>.

// http_handler.rs
use lambda_http::{Body, Error, Request, Response};

pub(crate) async fn handler(_event: Request) -> Result<Response<Body>, Error> {
    // Build a 200 response; the string literal is converted into a lambda_http::Body
    let resp = Response::builder()
        .status(200)
        .header("content-type", "text/html")
        .body("hello world".into())
        .map_err(Box::new)?;
    Ok(resp)
}

Using scraper

Next, we need to add an HTTP client and the scraper crate.

cargo add reqwest scraper

To start scraping, we simply make a request to the site we are extracting data from, read the response body as text, and then navigate through the parsed document!

// http_handler.rs
use lambda_http::{Body, Error, Request, Response};
use reqwest::{header::HeaderMap, Client};
use scraper::{Html, Selector};
// … rest of code

// Returns a Result so that `?` can propagate reqwest errors,
// which convert into lambda_http::Error
async fn scrape(client: Client) -> Result<(), Error> {
    let response = client.get(AWS_LAMBDA_RUNTIMES_URL).send().await?;

    // Get the page as text
    let body = response.text().await?;

    // Parse the document
    let document = Html::parse_document(&body);

    // … table extraction continues in the following snippets
    Ok(())
}

In my case, the elements I care about in the AWS Lambda docs page are the HTML tables, so I need to create a selector for them and then find the matching elements in the document.

HTML table to scrape
This is what a row in the table looks like.

// scrape method

// Create a selector for <table> elements
let table_selector = Selector::parse("table").unwrap();

// Get the reference to the first table element in the document
let table_ref = document.select(&table_selector).next().unwrap();

I will repeat the same process of creating a selector and then using it to get to every row of the table.

// scrape method

// Get the table body
let tbody_selector = Selector::parse("tbody").unwrap();
let tbody_ref = table_ref.select(&tbody_selector).next().unwrap();

// Get the table rows from the table body
let tr_selector = Selector::parse("tr").unwrap();
let tr_elements = tbody_ref.select(&tr_selector);

// On every row, trim each text node and drop the ones that were only
// spaces or line breaks, then collect all rows into a Vec<Vec<String>>
let rows: Vec<Vec<String>> = tr_elements
    .map(|e| {
        e.text()
            .map(|t| t.trim().to_string())
            .filter(|t| !t.is_empty())
            .collect()
    })
    .collect();
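Since the trimming and filtering depend only on the text nodes, one way to unit test that step without fetching the page is to move it into a plain function. This is a minimal std-only sketch; `clean_row` is a hypothetical name, and in the scraper code its input would be the `&str` iterator returned by `ElementRef::text()`, e.g. `clean_row(e.text())`.

```rust
/// Hypothetical helper: turns the text nodes of one table row into cell values.
/// Accepts any iterator of &str, such as scraper's ElementRef::text().
fn clean_row<'a, I>(text_nodes: I) -> Vec<String>
where
    I: IntoIterator<Item = &'a str>,
{
    text_nodes
        .into_iter()
        // Strip leading/trailing whitespace, including line breaks
        .map(str::trim)
        // Drop nodes that contained only whitespace
        .filter(|t| !t.is_empty())
        .map(String::from)
        .collect()
}

fn main() {
    // Text nodes roughly as a parser might yield them for one <tr>,
    // including whitespace-only nodes between the <td> elements
    let nodes = ["\n  ", "Node.js 22", "\n  ", " nodejs22.x ", "\n"];
    let row = clean_row(nodes);
    println!("{:?}", row); // prints ["Node.js 22", "nodejs22.x"]
}
```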

Collecting every row this way gives me a vector of rows, a Vec<Vec<String>>, which looks like this:

[
   [
      "Node.js 22", "nodejs22.x", "Amazon Linux 2023", "Not scheduled", "Not scheduled", "Not scheduled"
   ],
   …
]
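To work with named fields instead of positional strings, each cleaned row could be mapped onto a struct. This is a minimal std-only sketch under two assumptions: the column order matches the sample row (name, identifier, operating system, deprecation date, block function create, block function update), and "Not scheduled" is the placeholder when no date is set. `RuntimeRow`, `from_cells` and `deprecation_scheduled` are hypothetical names.

```rust
/// Hypothetical typed view of one row of the runtimes table.
/// Column order is an assumption based on the sample row.
#[derive(Debug)]
struct RuntimeRow {
    name: String,
    identifier: String,
    operating_system: String,
    deprecation_date: String,
    block_create: String,
    block_update: String,
}

impl RuntimeRow {
    /// Builds a row from cleaned cell values; returns None unless there are
    /// exactly six cells (for example, if the page layout changed).
    fn from_cells(cells: &[String]) -> Option<Self> {
        match cells {
            [name, identifier, os, deprecation, create, update] => Some(RuntimeRow {
                name: name.clone(),
                identifier: identifier.clone(),
                operating_system: os.clone(),
                deprecation_date: deprecation.clone(),
                block_create: create.clone(),
                block_update: update.clone(),
            }),
            _ => None,
        }
    }

    /// True when the deprecation column holds something other than the
    /// assumed "Not scheduled" placeholder.
    fn deprecation_scheduled(&self) -> bool {
        self.deprecation_date != "Not scheduled"
    }
}

fn main() {
    let cells: Vec<String> = [
        "Node.js 22", "nodejs22.x", "Amazon Linux 2023",
        "Not scheduled", "Not scheduled", "Not scheduled",
    ]
    .iter()
    .map(|s| s.to_string())
    .collect();

    let row = RuntimeRow::from_cells(&cells).expect("expected six cells");
    // prints "nodejs22.x -> deprecation scheduled: false"
    println!("{} -> deprecation scheduled: {}", row.identifier, row.deprecation_scheduled());
}
```

Returning `Option` instead of panicking makes a changed page layout show up as a skipped row rather than a crashed invocation.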

Scraped data printed in the terminal

Deploying with AWS CDK

It is good practice to keep our cloud resources under control through Infrastructure as Code (IaC). AWS provides the Cloud Development Kit (CDK) so developers can manage their resources more efficiently.

The team behind Cargo Lambda also provides a CDK construct (an abstraction that represents AWS CloudFormation resources and their configuration) for AWS Lambda functions written in Rust.

In your CDK project, install the cargo-lambda-cdk construct. It is available for multiple languages, but here we will install it with npm.

npm install cargo-lambda-cdk

Then use it in your stack as you would any other Lambda function construct.

import { RustFunction } from 'cargo-lambda-cdk';

new RustFunction(stack, 'Rust function', {
  manifestPath: 'path/to/package/directory/with/Cargo.toml',
});

That’s pretty much it. My average initialization time, dependencies included, ended up at ~40ms, while a processing invocation took ~300ms on average. Pretty good numbers overall, considering I did not invest much in performance here.

I open-sourced the project so you can use it as a guide for your own needs. It includes additional processing to compare snapshots of the scraped data, so please check it out!

[rust] add runtime scraper #4

What?

Adds a function that scrapes the AWS Lambda docs to check the supported and deprecated runtimes.

How?

Used the following crates:

  • scraper: web scraping
  • similar: to diff data
  • prettytable-rs: to create easy-to-print tables
  • lambda_http: adapter for HTTP triggered AWS Lambdas
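The PR uses prettytable-rs to print tables. To illustrate the underlying idea without that crate, here is a hedged std-only sketch that pads each column to its widest cell, which could serve as a plain-text response body. `render_table` is a hypothetical name, and this is not the prettytable-rs API.

```rust
/// Hypothetical helper: renders rows as a plain-text table with padded columns.
fn render_table(rows: &[Vec<String>]) -> String {
    // Column width = longest cell in that column, counted in chars
    let columns = rows.iter().map(|r| r.len()).max().unwrap_or(0);
    let mut widths = vec![0; columns];
    for row in rows {
        for (i, cell) in row.iter().enumerate() {
            widths[i] = widths[i].max(cell.chars().count());
        }
    }

    let mut out = String::new();
    for row in rows {
        // Left-align each cell to its column width (width$ pads by chars)
        let cells: Vec<String> = row
            .iter()
            .enumerate()
            .map(|(i, cell)| format!("{:<width$}", cell, width = widths[i]))
            .collect();
        // Join with " | " and drop the trailing padding of the last column
        out.push_str(cells.join(" | ").trim_end());
        out.push('\n');
    }
    out
}

fn main() {
    let rows = vec![
        vec!["Node.js 22".to_string(), "nodejs22.x".to_string()],
        // Hypothetical row, for illustration only
        vec!["Example".to_string(), "ex.x".to_string()],
    ];
    print!("{}", render_table(&rows));
}
```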

Summary

Setting up Rust on AWS Lambda is far simpler than one might think. Combined with its remarkably fast startup times, I would say that makes it a powerful contender for microservices. This does not mean you have to rewrite everything in Rust, though.

Web scraping can be far more complex than this; in this guide I only cover a simple use case targeting one element on a page. Always read further and do your own research, but I hope this encourages you to explore scraping in Rust.


🇲🇽 This post is also available in Spanish on my personal blog:

Web Scraping con AWS Lambda en Rust | Jordan González – Blog

Here you will learn how to do web scraping with Rust; the package will be deployed to an AWS Lambda, making use of the cargo-lambda tooling.

jordangonzalez.dev

References

[1] David, M. (2025). Lambda Cold Start benchmarks. Lambda-perf.

[2] Cargo Lambda. (2025). Rust functions on AWS Lambda made simple. Cargo Lambda.

[3] Rust Scraper. (2024). HTML parsing and querying with CSS selectors. scraper.

[4] Cargo Lambda. (2024). About CDK Construct to build Rust functions with Cargo Lambda. Cargo Lambda CDK construct.
