Faruq Abdulsalam

Posted on Jan 16 • Edited on Jan 30

How to Build and Test a PDF Generator Lambda Using LocalStack on Your Local Machine

#softwareengineering #cloud #aws #docker

Have you ever struggled with adding a PDF generation feature to your application? I’ve certainly faced that pain more than a few times. Whether it’s generating receipts, weekly or monthly reports, or just about anything else, PDF generation is a feature that seems to pop up in almost every project I’ve worked on. Technology is fascinating, and users are always pushing for features that make their lives easier—and let’s face it, users are always right 😉.

In this part of the series, I’ll walk you through how to easily create a Lambda function that handles PDF generation. We’ll build the function as a Docker image, push it to the local Elastic Container Registry (ECR) service within LocalStack, and then test it locally. To validate the generated PDF, we’ll also integrate the local S3 service in LocalStack. This setup enables you to develop, test, and debug your Lambda function entirely on your local machine before deploying it to the AWS cloud.

If you haven’t already read my previous article on working with AWS S3 locally using LocalStack, I highly recommend giving it a read. It covers the setup of both the LocalStack environment and your development environment, along with easy steps for working with AWS S3 in LocalStack. In this article, we'll be building on that foundation as we create a Lambda function for PDF generation and integrate it with LocalStack’s S3 service. You can find that previous article here.

Buckle up and enjoy the ride!

Prerequisites

Docker
Python 3.9+
LocalStack account
LocalStack CLI
LocalStack Desktop

FILES and CODE

Let's Start Coding! In this section, we will create a few essential files that will allow us to build and test our Lambda function for generating PDFs and uploading them to S3 using LocalStack.

We'll create a new directory called pdf-generator-lambda inside the localstack directory, where you previously set up your virtual environment. This new directory will contain the following files:

requirements.txt
Dockerfile
logger.py
pdf_generator.py
lambda_function.py
event.json
templates/test.html
upload_file.py

Let's walk through each of these files and their role in the project.

1. requirements.txt
This file lists the Python dependencies required for our Lambda function. These dependencies include:

Jinja2: A templating engine that allows you to create dynamic templates.
pdfkit: A Python wrapper for the wkhtmltopdf tool that converts HTML to PDFs.
boto3: The AWS SDK for Python, which is used to interact with AWS services such as S3.
botocore: A low-level, foundational library for interacting with AWS APIs, required by boto3.

Here’s the content for requirements.txt:

Jinja2==3.1.4
pdfkit==1.0.0
boto3==1.26.146
botocore==1.29.146

2. Dockerfile
The Dockerfile defines how the Lambda function will be packaged as a Docker container. It sets up the environment, installs dependencies, and copies the necessary files for your Lambda function. Here’s a breakdown of the Dockerfile:

Base Image: We’re using the official public.ecr.aws/lambda/python:3.10 base image for Lambda functions with Python 3.10. This image is pre-configured to run AWS Lambda functions.
Installing Dependencies: We install a few system tools, the Python packages from requirements.txt, and the wkhtmltopdf binary that pdfkit needs to render PDFs. For wkhtmltopdf, we download the RPM built for the container's architecture (x86_64 or aarch64) so the binary is compatible.
Copying Code: We copy the Python files from the local environment into the Docker container.
Lambda Handler: The entry point of the Lambda function is defined as lambda_function.lambda_handler.

Here’s the content for the Dockerfile:

FROM public.ecr.aws/lambda/python:3.10

# Install necessary tools and clean up
RUN yum -y install gcc gcc-c++ unzip which \
    && yum clean all \
    && rm -rf /var/cache/yum

# Set the Lambda task root explicitly
ENV LAMBDA_TASK_ROOT=/var/task

# Copy and install Python dependencies
COPY ./requirements.txt ${LAMBDA_TASK_ROOT}
RUN pip3 install --upgrade pip \
    && pip3 install --no-cache-dir -r requirements.txt

# Install wkhtmltopdf with architecture-specific binaries
RUN ARCH=$(uname -m) && \
    if [ "$ARCH" = "x86_64" ]; then \
        curl -L -o wkhtmltopdf.rpm "https://github.com/wkhtmltopdf/packaging/releases/download/0.12.6-1/wkhtmltox-0.12.6-1.amazonlinux2.x86_64.rpm"; \
    elif [ "$ARCH" = "aarch64" ]; then \
        curl -L -o wkhtmltopdf.rpm "https://github.com/wkhtmltopdf/packaging/releases/download/0.12.6-1/wkhtmltox-0.12.6-1.amazonlinux2.aarch64.rpm"; \
    fi && yum -y install wkhtmltopdf.rpm && rm -f wkhtmltopdf.rpm

# Move wkhtmltopdf to /opt/bin (the default path pdf_generator.py expects) and make it executable
RUN mkdir -p /opt/bin && mv /usr/local/bin/wkhtmltopdf /opt/bin/wkhtmltopdf && chmod +x /opt/bin/wkhtmltopdf

# Add app files
COPY . ${LAMBDA_TASK_ROOT}

# Set the command for Lambda runtime
CMD ["lambda_function.lambda_handler"]
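The `if`/`elif` above has no `else` branch. On any other architecture, nothing is downloaded and the build fails at `yum install` with a fairly indirect error. Also, pdf_generator.py only discovers a missing binary at invocation time. For both reasons, it can help to check wkhtmltopdf inside the built image before pushing it.

Below is a minimal sketch. It assumes you save it as a hypothetical `check_wkhtmltopdf.py` in the pdf-generator-lambda directory, so that `COPY .` includes it. You would then run it with the image's own Python, for example `docker run --rm --entrypoint python3 lambda-container-image check_wkhtmltopdf.py`.

```python
import os
import subprocess

# Same default path that pdf_generator.py falls back to
DEFAULT_PATH = os.getenv("PATH_WKHTMLTOPDF", "/opt/bin/wkhtmltopdf")


def check_wkhtmltopdf(path=DEFAULT_PATH):
    """Return (ok, message) describing whether wkhtmltopdf at `path` is usable."""
    if not os.path.isfile(path):
        return False, f"wkhtmltopdf not found at {path}"
    if not os.access(path, os.X_OK):
        return False, f"wkhtmltopdf at {path} is not executable"
    try:
        # --version returns quickly and exercises the shared-library loading
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, f"could not run wkhtmltopdf: {exc}"
    if result.returncode != 0:
        return False, f"wkhtmltopdf exited with {result.returncode}: {result.stderr.strip()}"
    return True, result.stdout.strip()


if __name__ == "__main__":
    ok, message = check_wkhtmltopdf()
    print("OK:" if ok else "FAILED:", message)
```

A missing system library usually shows up here as a non-zero exit code with the loader error in stderr.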

3. logger.py
Here, we initialize the logger module, making it reusable across all scripts in the directory. The get_logger function sets up basic logging with a log level of INFO and a log format that includes the timestamp, logger name, log level, and message. The returned logger can then be used in other scripts to keep logging consistent throughout the project.

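import logging

def get_logger():
    logging.basicConfig(
        level=logging.INFO,  # Set the minimum log level (DEBUG, INFO, WARNING, etc.)
        format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",  # Log format
    )

    logger = logging.getLogger(__name__)
    # basicConfig() does nothing if the root logger already has handlers (the Lambda
    # Python runtime installs one), so set the level explicitly to keep INFO messages
    logger.setLevel(logging.INFO)
    return logger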
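If INFO messages from your function never show up in the Lambda logs, a likely cause is how `logging.basicConfig()` behaves. It is documented to do nothing when the root logger already has handlers. The Lambda Python runtime is commonly reported to attach its own handler to the root logger before your code runs.

The self-contained demo below simulates that situation and shows the usual workaround, which is to call `setLevel` on your own logger. The function and logger names are hypothetical, and the demo restores the root logger afterwards.

```python
import logging


def demo_basicconfig_noop():
    """Show that basicConfig() is ignored once the root logger has a handler."""
    root = logging.getLogger()
    saved_handlers, saved_level = root.handlers[:], root.level
    try:
        # Simulate a runtime that attached a handler to the root logger up front
        root.handlers = [logging.StreamHandler()]
        root.setLevel(logging.WARNING)

        logging.basicConfig(level=logging.INFO)  # no-op: root already has handlers
        basicconfig_ignored = root.level == logging.WARNING

        # Setting the level on our own logger works regardless of root's config
        app_logger = logging.getLogger("pdf-generator-demo")
        app_logger.setLevel(logging.INFO)
        return basicconfig_ignored, app_logger.isEnabledFor(logging.INFO)
    finally:
        root.handlers = saved_handlers
        root.setLevel(saved_level)
```

Another option is `logging.basicConfig(..., force=True)`, which replaces the existing root handlers. The trade-off is that you also drop whatever formatting the runtime's handler adds.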

4. pdf_generator.py
This script will handle the generation of PDFs using the pdfkit library, which works in conjunction with the wkhtmltopdf binary to render HTML templates into PDFs.

import pdfkit
import os
import jinja2
from logger import get_logger

TEMPLATES_DIR = os.path.join(os.path.dirname(__file__), "templates")
PATH_WKHTMLTOPDF = os.getenv("PATH_WKHTMLTOPDF", "/opt/bin/wkhtmltopdf")
KITOPTIONS = {
    "enable-local-file-access": None,
    "disable-smart-shrinking": "",
    "page-width": "157mm",
    "page-height": "222.77mm",
    "dpi": 400,
    "encoding": "UTF-8",
    "margin-top": "0cm",
    "margin-right": "0cm",
    "margin-bottom": "0cm",
    "margin-left": "0cm",
    "custom-header": [("Accept-Encoding", "gzip")],
    "no-outline": None,
}
TEMPLATE = os.path.join(TEMPLATES_DIR, "test.html")

logger = get_logger()


def handle_pdf_generation(body):
    """
    Handle PDF generation

    :param body: dict

    :return: binary_pdf
    """

    try:
        logger.info("Starting PDF generation")

        # Ensure templates directory exists
        if not os.path.exists(TEMPLATES_DIR):
            raise Exception(f"Templates directory not found at {TEMPLATES_DIR}")

        # Ensure template file exists
        if not os.path.exists(TEMPLATE):
            raise Exception(f"Template file {TEMPLATE} not found")

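        # Load and render template (the context manager closes the file handle)
        with open(TEMPLATE, encoding="utf-8") as template_file:
            template = jinja2.Template(template_file.read())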
        rendered_template = template.render(data=body)

        # Configure pdfkit
        configuration = pdfkit.configuration(wkhtmltopdf=PATH_WKHTMLTOPDF)

        logger.info("Generating PDF")
        pdf_file = pdfkit.from_string(
            rendered_template,
            output_path=False,
            options=KITOPTIONS,
            configuration=configuration,
        )

        logger.info("PDF generated successfully")
        return pdf_file
    except OSError as e:
        logger.error(f"Error with wkhtmltopdf binary: {e}")
        # "from e" keeps the original traceback attached for debugging
        raise Exception(f"Error with wkhtmltopdf binary: {e}") from e
    except Exception as e:
        logger.error(f"Error generating PDF: {e}")
        raise Exception(f"Error generating PDF: {e}") from e
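pdfkit doesn't render anything itself. It turns KITOPTIONS into command-line flags and runs the wkhtmltopdf binary. When a PDF looks wrong, running the binary by hand with the same flags can narrow things down.

The helper below is my approximation of that translation, not pdfkit's own code:

- `None` or an empty string becomes a bare flag.
- A list of tuples (as in `custom-header`) repeats the flag once per tuple.
- Everything else becomes `--key value`.

```python
def options_to_args(options):
    """Approximate how pdfkit expands an options dict into wkhtmltopdf flags."""
    args = []
    for key, value in options.items():
        flag = key if key.startswith("--") else f"--{key}"
        if isinstance(value, (list, tuple)):
            # Repeatable options take (name, value) pairs, one flag per pair
            for pair in value:
                args.extend([flag, *map(str, pair)])
        elif value is None or value == "":
            # No argument: the option is a plain on/off switch
            args.append(flag)
        else:
            args.extend([flag, str(value)])
    return args


sample = {
    "enable-local-file-access": None,
    "dpi": 400,
    "custom-header": [("Accept-Encoding", "gzip")],
}
print(" ".join(["wkhtmltopdf", *options_to_args(sample), "page.html", "out.pdf"]))
```

Passing the full KITOPTIONS dict gives a command you can run inside the container against a saved copy of the rendered HTML.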

5. lambda_function.py
This file will contain the logic for generating PDFs and uploading them to S3. We will use the handle_pdf_generation function to generate a PDF and upload_pdf to upload it to a local S3 bucket (using LocalStack).

import json
import random
from pdf_generator import handle_pdf_generation
from upload_file import upload_pdf
from logger import get_logger


logger = get_logger()


def lambda_handler(event, context):
    logger.info(f"Received event: {event}")
    try:
        body = event.get("body")
        if not body:
            raise Exception("No body found in the request")
        logger.info(f"Received body: {body}")

        # If body is a string, load it as JSON
        if isinstance(body, str):
            body = json.loads(body)

        logger.info("Generating PDF")
        pdf_result_data = handle_pdf_generation(body)

        file_name = "test_" + str(random.randint(1, 1000)) + ".pdf"

        logger.info(f"Uploading PDF to S3 with file name: {file_name}")
        upload_pdf(pdf_result_data, file_name)
        logger.info(f"PDF uploaded successfully to S3 as {file_name}!")

        return {
            "statusCode": 200,
            "body": json.dumps({"message": "PDF generated and uploaded successfully!"}),
        }

    except Exception as e:
        logger.error("Error: %s", e, exc_info=True)
        return {
            "statusCode": 500,
            "body": json.dumps({"error": f"Internal server error: {e}"}),
        }
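Two parts of the handler are easy to factor out and unit-test without Docker or LocalStack: parsing the body, and naming the file. `random.randint(1, 1000)` will eventually reuse a name and silently overwrite an earlier PDF. With only 1,000 possible names, the odds of a repeat are roughly even within about 40 invocations.

The sketch below swaps in `uuid4`, which is my substitution rather than the article's approach. `extract_body` and `make_object_key` are hypothetical helper names.

```python
import json
import uuid


def extract_body(event):
    """Return the request body as a dict, accepting either a dict or a JSON string."""
    body = event.get("body")
    if not body:
        raise ValueError("No body found in the request")
    if isinstance(body, str):
        # API Gateway proxy events deliver the body as a JSON-encoded string
        body = json.loads(body)
    return body


def make_object_key(prefix="test_"):
    """Build an S3 key that is practically unique, e.g. test_<32 hex chars>.pdf."""
    return f"{prefix}{uuid.uuid4().hex}.pdf"
```

Inside lambda_handler, these would replace the inline parsing and the `file_name` line.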

6. event.json
This file contains a sample event that will be used to test our Lambda function.

{
  "body" : {"first_name": "John", "last_name": "Doe"}
}
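lambda_handler also accepts a body that is a JSON string. That is the shape an API Gateway proxy integration delivers. To exercise that branch too, you can generate a second event file, for example a hypothetical event_string.json, with a small helper like this (names are illustrative):

```python
import json


def build_event(body, as_string=False):
    """Build a test event; as_string=True mimics an API Gateway proxy event."""
    # Proxy integrations carry the body as a JSON-encoded string, not an object
    return {"body": json.dumps(body) if as_string else body}


def write_event(path, body, as_string=False):
    """Write the event to `path` so it can be passed with --payload file://..."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(build_event(body, as_string), fh, indent=2)
```

For example, `write_event("event_string.json", {"first_name": "John", "last_name": "Doe"}, as_string=True)`.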

7. templates/test.html
Create a test.html file inside the templates directory within the pdf-generator-lambda directory. This file will define a simple HTML structure that displays the first_name and last_name values passed to the Lambda function. The HTML will later be rendered and converted to a PDF.

Here’s the code to include in test.html:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>Test Page</title>
    <link
      href="https://fonts.googleapis.com/css2?family=Poppins:wght@100;200;300;400;500;600;700&display=swap"
      rel="stylesheet"
    />
    <link
      href="https://api.fontshare.com/css?f[]=clash-display@400,700&display=swap"
      rel="stylesheet"
    />
    <link
      href="https://fonts.googleapis.com/css2?family=Roboto:wght@100;300;400;500;700&display=swap"
      rel="stylesheet"
    />
    <style>
      body {
        width: 595px;
        margin: 0 auto;
        font-family: "Poppins", sans-serif;
        font-size: 14px;
        line-height: 1.6;
        color: #333;
      }

      main {
        width: 100%;
        height: 100%;
      }

      a {
        color: #26639b;
        text-decoration: none;
      }

      h1 {
        font-size: 28px;
        font-weight: 600;
        color: #166082;
        margin-bottom: 20px;
        text-align: center;
      }

      .page-content {
        width: 100%;
        height: 100%;
      }

      .label {
        font-weight: 600;
        color: #166082;
        font-size: 14px;
      }

      .value {
        font-size: 14px;
        white-space: pre-line;
      }

      .detail {
        margin-bottom: 15px;
        padding-left: 55px;
        padding-right: 55px;
        padding-top: 5px;
      }

      .footer {
        margin-top: 20px;
        text-align: center;
        font-size: 12px;
        color: #888;
      }
    </style>
  </head>
  <body>
    <main>
      <div class="page-content">
        <h1>PDF GENERATOR</h1>
        <div class="detail">
          <span class="label">First Name:</span>
          <span class="value">- {{data.first_name}}</span>
        </div>
        <div class="detail">
          <span class="label">Last Name:</span>
          <span class="value">- {{data.last_name}}</span>
        </div>
      </div>
    </main>
    <div class="footer">
      <p>Generated by PDF Generator</p>
    </div>
  </body>
</html>
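The `595px` body width in this template lines up with the page size in KITOPTIONS. 595 × 842 is the A4 page size in PostScript points. At the CSS reference density of 96 px per inch, 595 × 842 CSS pixels come out to roughly the `157mm` × `222.77mm` set in `page-width` and `page-height`.

That correspondence is my reading of the numbers, not something the article states. The arithmetic is easy to check:

```python
CSS_PX_PER_INCH = 96  # CSS reference pixel density
MM_PER_INCH = 25.4


def px_to_mm(px):
    """Convert CSS pixels to millimetres at 96 px per inch."""
    return px * MM_PER_INCH / CSS_PX_PER_INCH


for px in (595, 842):
    print(f"{px}px -> {px_to_mm(px):.2f}mm")
```

If you change the template's width, recomputing the page size this way is a reasonable starting point. The exact scaling may still depend on wkhtmltopdf options such as `dpi` and `disable-smart-shrinking`.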

8. upload_file.py
This script handles uploading a generated PDF to an S3 bucket. It sets up a boto3 S3 client with mock AWS credentials and a LocalStack endpoint, specifically using the http://host.docker.internal:4566 URL to ensure communication between the Docker container and LocalStack. The upload_pdf function takes the PDF in byte format, wraps it in a BytesIO object, and calls the upload_to_s3 function, which handles the file upload.

The script now checks if the S3 bucket (my-pdf-bucket) exists before attempting the upload. If the bucket does not exist, an error message is logged, and an exception is raised. This ensures that the upload only occurs if the bucket is present. The check_bucket_exists function checks for the bucket's existence by calling the head_bucket operation. If the bucket is not found, the error is caught, and the appropriate message is logged.

If S3_BUCKET_NAME is empty, upload_pdf logs an error and raises an exception before attempting the upload, so the operation never proceeds without a bucket name.

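import os
import boto3
from logger import get_logger
from io import BytesIO
from botocore.exceptions import ClientError

logger = get_logger()

# Mock credentials accepted by LocalStack
AWS_ACCESS_KEY_ID = "test"
AWS_SECRET_ACCESS_KEY = "test"
# Read from the S3_BUCKET_NAME environment variable, defaulting to the bucket created below
S3_BUCKET_NAME = os.getenv("S3_BUCKET_NAME", "my-pdf-bucket")
BUCKET_REGION = "eu-west-1"
LOCALSTACK_HOST = "http://host.docker.internal:4566"  # LocalStack endpoint as seen from the Lambda container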


def upload_to_s3(file_bytes, filename, mimetype, object_name=None):
    """
    Uploads a file to an S3 bucket

    :param file_bytes: Bytes object of the file to be uploaded
    :param filename: Name of the file
    :param mimetype: MIME type of the file
    :param object_name: Name of the object in the bucket

    :return: None; raises an Exception if the bucket is missing or the upload fails
    """

    if object_name is None:
        object_name = filename

    try:
        s3_client = create_S3_client()

        if not check_bucket_exists(s3_client, S3_BUCKET_NAME):
            logger.error(f"Bucket {S3_BUCKET_NAME} does not exist.")
            raise Exception(f"Bucket {S3_BUCKET_NAME} does not exist.")

        # Wrap the bytes object in a BytesIO object
        file_obj = BytesIO(file_bytes)

        # Upload the file object to S3 bucket
        s3_client.upload_fileobj(
            file_obj, S3_BUCKET_NAME, object_name, ExtraArgs={"ContentType": mimetype}
        )

        logger.info(f"{object_name} uploaded to {S3_BUCKET_NAME} bucket")
    except ClientError as e:
        logger.error(e)
        raise Exception(f"Error uploading file: {e}")
    except Exception as e:
        logger.error(e)
        raise Exception(f"Error uploading file: {e}")


def create_S3_client():
    """Create an S3 client."""

    s3_client = boto3.client(
        "s3",
        aws_access_key_id=AWS_ACCESS_KEY_ID,
        aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
        region_name=BUCKET_REGION,
        endpoint_url=LOCALSTACK_HOST,  # Point to LocalStack endpoint
    )

    return s3_client

def check_bucket_exists(s3_client, bucket_name):
    """Check if the S3 bucket exists."""
    try:
        s3_client.head_bucket(Bucket=bucket_name)
        return True  # Bucket exists
    except ClientError as e:
        # If the error is a 404, the bucket doesn't exist
        if e.response["Error"]["Code"] == "404":
            return False
        else:
            # Other errors, log and re-raise
            logger.error(f"Error checking if bucket exists: {e}")
            raise

def upload_pdf(file_bytes, filename):
    # Check if the S3 bucket name is valid
    if not S3_BUCKET_NAME:
        logger.error("S3_BUCKET_NAME is not set properly. The file cannot be uploaded.")
        raise Exception(
            "Invalid S3_BUCKET_NAME environment variable. File upload failed."
        )
    try:
        upload_to_s3(file_bytes, filename, "application/pdf")
        logger.info("File uploaded successfully!")
    except Exception as e:
        logger.error(f"Error: {e}")
        raise Exception(f"Error uploading file: {e}")
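The hard-coded `host.docker.internal` endpoint works out of the box with Docker Desktop on macOS and Windows. On Linux, that hostname usually isn't defined unless the container is started with an extra host mapping. One option is to read the endpoint from the environment and fall back to the current value.

The sketch below makes two assumptions:

- Recent LocalStack versions inject `AWS_ENDPOINT_URL` into Lambda containers.
- Older versions provide `LOCALSTACK_HOSTNAME` (plus `EDGE_PORT`) instead.

Check the LocalStack docs for your version before relying on either. `resolve_endpoint` is a hypothetical helper.

```python
import os

DEFAULT_ENDPOINT = "http://host.docker.internal:4566"


def resolve_endpoint(env=None):
    """Pick the S3 endpoint for code running inside the Lambda container."""
    env = os.environ if env is None else env
    # Read AWS_ENDPOINT_URL explicitly; as far as I know, the pinned boto3 1.26.146
    # predates botocore's automatic support for this variable
    if env.get("AWS_ENDPOINT_URL"):
        return env["AWS_ENDPOINT_URL"]
    # Assumed legacy LocalStack variables: hostname and port supplied separately
    if env.get("LOCALSTACK_HOSTNAME"):
        port = env.get("EDGE_PORT", "4566")
        return f"http://{env['LOCALSTACK_HOSTNAME']}:{port}"
    return DEFAULT_ENDPOINT
```

In create_S3_client, you would then pass `endpoint_url=resolve_endpoint()` instead of `LOCALSTACK_HOST`.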

Now that we've covered all the files and code required, let's move on to setting up the services.

SERVICES SETUP AND TESTING

Before proceeding, ensure that your LocalStack container is up and running. If it isn’t already running, export your LOCALSTACK_AUTH_TOKEN and start a new container using the following command:

DEBUG=1 localstack start
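Before creating any resources, you can confirm that the services this article uses are up. LocalStack exposes a health endpoint at `http://localhost:4566/_localstack/health` that reports the state of each service.

The sketch below assumes the default port 4566 and treats `available` and `running` as ready. The helper names are hypothetical.

```python
import json
import urllib.request

HEALTH_URL = "http://localhost:4566/_localstack/health"
REQUIRED = ("s3", "ecr", "lambda")


def missing_services(health, required=REQUIRED):
    """Return the required services not reported as available or running."""
    services = health.get("services", {})
    return [name for name in required if services.get(name) not in ("available", "running")]


def fetch_health(url=HEALTH_URL, timeout=5):
    """Fetch and decode LocalStack's health document."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `print(missing_services(fetch_health()) or "all services ready")`. An empty list means S3, ECR and Lambda all reported a usable state.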

S3 Bucket
First, let's create the S3 bucket where we'll store the generated PDFs. Run the following command to create a bucket named my-pdf-bucket:

awslocal s3 mb s3://my-pdf-bucket

Then verify the bucket creation via the CLI using awslocal s3 ls or by checking your LocalStack Desktop application.

Elastic Container Registry (ECR)
Now, let's create a repository in the Elastic Container Registry (ECR) to store the Docker image for the function. Run the command below to create a new repository:

awslocal ecr create-repository --repository-name pdf-generator-image

You should see a response similar to:

{
    "repository": {
        "repositoryArn": "arn:aws:ecr:eu-west-1:000000000000:repository/pdf-generator-image",
        "registryId": "000000000000",
        "repositoryName": "pdf-generator-image",
        "repositoryUri": "000000000000.dkr.ecr.eu-west-1.localhost.localstack.cloud:4566/pdf-generator-image",
        "createdAt": 1736994302.0,
        "imageTagMutability": "MUTABLE",
        "imageScanningConfiguration": {
            "scanOnPush": false
        },
        "encryptionConfiguration": {
            "encryptionType": "AES256"
        }
    }
}

Please note down your repositoryUri as we'll need it in subsequent steps. If you clear your terminal after creating the repository, you can also easily retrieve these details later by running this command:

awslocal ecr describe-repositories
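Rather than copying the repositoryUri by hand, you can pull it out of the describe-repositories output, which lists repositories under a top-level `repositories` key. The sketch below runs the same awslocal command and looks the repository up by name. The helper names are hypothetical.

```python
import json
import subprocess


def repository_uri(describe_output, name):
    """Return the repositoryUri for `name` from describe-repositories JSON."""
    for repo in describe_output.get("repositories", []):
        if repo.get("repositoryName") == name:
            return repo["repositoryUri"]
    raise LookupError(f"repository {name!r} not found")


def fetch_repositories():
    """Run `awslocal ecr describe-repositories` and parse its JSON output."""
    out = subprocess.run(
        ["awslocal", "ecr", "describe-repositories"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

For example, `repository_uri(fetch_repositories(), "pdf-generator-image")` returns the URI used in the tag, push and create-function commands.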

Docker Image
Now it’s time to build our Docker image, tag it with the repositoryUri from earlier, and push it to the Elastic Container Registry (ECR).

First, run the following command to build the Docker image:

docker build --no-cache -t lambda-container-image .

Once the build is complete, tag the image with your repository URI:

docker tag lambda-container-image 000000000000.dkr.ecr.eu-west-1.localhost.localstack.cloud:4566/pdf-generator-image

Note: Replace the repositoryUri in the command above with your own URI if it's different from the one provided in this example.

Finally, push the tagged image to ECR:

docker push 000000000000.dkr.ecr.eu-west-1.localhost.localstack.cloud:4566/pdf-generator-image

Note: Be sure to use the correct repositoryUri when tagging and pushing the image. Using the wrong repositoryUri or tag will result in errors.

Lambda Function
Next, let's create the Lambda function using the image we uploaded to the Elastic Container Registry.

Run the following command to create the Lambda function:

awslocal lambda create-function \
--function-name pdf-generator-lambda \
--package-type Image \
--code ImageUri="000000000000.dkr.ecr.eu-west-1.localhost.localstack.cloud:4566/pdf-generator-image" \
--role arn:aws:iam::000000000000:role/lambda-role \
--handler lambda_function.lambda_handler \
--timeout 60 \
--architectures arm64

In this command:

We set the Lambda function name to pdf-generator-lambda.
The ImageUri refers to the repositoryUri of the image we uploaded earlier to ECR.
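The handler specifies the entry point of the Lambda function: the lambda_handler function in lambda_function.py, which runs our entire process. For image-based functions, Lambda takes the entry point from the CMD in the Dockerfile (or an ImageConfig override), so this value simply mirrors it.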
We set the timeout to 60 seconds to allow sufficient time for both PDF generation and the upload process. You can adjust this based on your system and the expected size of the PDF. Larger PDFs will take more time to generate and upload.
The architectures parameter is set to arm64 to match my local processor, and therefore the image built on it. If you're on an x86_64 machine, set it to x86_64 instead so it matches your image.
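Because the architecture must match the image you built, you can derive it from the host instead of editing the command by hand. The mapping is:

- `uname -m` reports `x86_64` or `aarch64` (`platform.machine()` may also return `AMD64` or `arm64`).
- Lambda expects `x86_64` or `arm64`.

The sketch below assumes the image was built for the host's own architecture, which is true unless you passed `--platform` to `docker build`. `create_function_cmd` is a hypothetical helper that rebuilds the same command shown above.

```python
import platform

# Host machine names mapped to Lambda's --architectures values
LAMBDA_ARCH = {"x86_64": "x86_64", "amd64": "x86_64", "aarch64": "arm64", "arm64": "arm64"}


def create_function_cmd(image_uri, machine=None, name="pdf-generator-lambda"):
    """Build the awslocal create-function command with a matching architecture."""
    machine = (machine or platform.machine()).lower()
    if machine not in LAMBDA_ARCH:
        raise ValueError(f"unsupported architecture: {machine}")
    return [
        "awslocal", "lambda", "create-function",
        "--function-name", name,
        "--package-type", "Image",
        "--code", f"ImageUri={image_uri}",
        "--role", "arn:aws:iam::000000000000:role/lambda-role",
        "--handler", "lambda_function.lambda_handler",
        "--timeout", "60",
        "--architectures", LAMBDA_ARCH[machine],
    ]
```

You can pass the result to `subprocess.run(cmd, check=True)`.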

For a quick sanity check, run the following command to confirm that your Lambda function has been created:

awslocal lambda list-functions

This will list all Lambda functions deployed in your LocalStack environment. If successful, you should see your pdf-generator-lambda function in the output.

Finally, let's invoke our Lambda function to validate everything we've done. Run the following command:

awslocal lambda invoke --function-name pdf-generator-lambda --payload file://event.json /tmp/lambda.out

In this command:

--function-name specifies the name of the Lambda function to invoke, which is pdf-generator-lambda in this case.
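--payload passes an input event to the function, simulating a real-world invocation. Here, the payload is sourced from the event.json file. If your awslocal wraps AWS CLI v2, add --cli-binary-format raw-in-base64-out (or use fileb://event.json). Otherwise, the CLI expects the payload to be base64-encoded and rejects the raw JSON.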
/tmp/lambda.out is the output file where the function's response payload (the dictionary returned by lambda_handler) is written. The function's logs are not stored there; they go to LocalStack's CloudWatch Logs.
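The file passed as the last argument holds whatever lambda_handler returned. Its `body` is itself a JSON string, so it has to be decoded twice. Here is a small sketch for reading it; the helper names are hypothetical.

```python
import json


def parse_invoke_output(raw):
    """Return (statusCode, decoded body) from the handler's response payload."""
    payload = json.loads(raw)
    # lambda_handler json.dumps()es its body, so decode it a second time
    body = json.loads(payload.get("body") or "{}")
    return payload.get("statusCode"), body


def read_invoke_output(path="/tmp/lambda.out"):
    """Read and parse the file written by `awslocal lambda invoke`."""
    with open(path, encoding="utf-8") as fh:
        return parse_invoke_output(fh.read())
```

A successful run should give a status of 200 and the success message. A 500 carries the error text from the handler's `except` branch.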

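After running the command, you should see a response like the following in your terminal. A StatusCode of 200 means the invocation itself succeeded. Because lambda_handler catches exceptions and returns its own statusCode, also check /tmp/lambda.out to confirm it reports 200 rather than 500: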

{
    "StatusCode": 200,
    "ExecutedVersion": "$LATEST"
}

To confirm that the function executed successfully, I'll be using the LocalStack Desktop application, as it simplifies downloading the generated PDF file from the S3 bucket. Navigate to the S3 resource in the resource list within your active container in the LocalStack Desktop interface. You should see your PDF file listed there. Click on it, and you’ll be prompted to download the file. Download the file and verify that it contains the expected content similar to the screenshot below.
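If you'd rather stay in the terminal than use LocalStack Desktop, you can use two commands:

- `awslocal s3 ls s3://my-pdf-bucket/` lists the uploaded files.
- `awslocal s3 cp s3://my-pdf-bucket/<file name> .` downloads one.

To confirm the download is a real PDF rather than an error response, check its first bytes: PDF files normally begin with `%PDF-`. The helper names below are hypothetical.

```python
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data):
    """Cheap sanity check on the leading bytes of a file."""
    return data[: len(PDF_MAGIC)] == PDF_MAGIC


def check_pdf_file(path):
    """Return True if the file at `path` starts with the PDF header."""
    with open(path, "rb") as fh:
        return looks_like_pdf(fh.read(len(PDF_MAGIC)))
```

This only checks the header. Opening the file is still the way to verify the rendered names.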

If you can see the generated PDF, that means we have successfully created a Lambda function locally, built it as a Docker image, created an Elastic Container Registry repository, pushed the image to it, set up an S3 bucket, invoked our Lambda function, and generated a PDF that was stored in the S3 bucket. And the best part? We accomplished all of this without needing an actual AWS account. Pretty impressive, right?

Now, if you need to make changes to your Lambda function, you’ll have to repeat the process: rebuild the image, retag it, push it to the ECR, and then update the Lambda function with the new image. To update your Lambda function, you can use the following command:

awslocal lambda update-function-code \
--function-name pdf-generator-lambda \
--image-uri "000000000000.dkr.ecr.eu-west-1.localhost.localstack.cloud:4566/pdf-generator-image"
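Since every change means rebuilding, retagging, pushing and updating, the sequence is worth scripting. Below is a minimal sketch that assumes:

- the image and function names used in this article,
- an explicit `:latest` tag (the same tag Docker applies when none is given),
- a build without `--no-cache`, so unchanged layers are reused.

It defaults to a dry run that only prints the commands. The final step uses the AWS CLI `lambda wait function-updated` waiter to block until the update finishes; I'm assuming LocalStack reports the update status that waiter polls.

```python
import subprocess

IMAGE = "lambda-container-image"
FUNCTION = "pdf-generator-lambda"


def redeploy_commands(repo_uri, tag="latest"):
    """Return the commands that rebuild, push and roll out a new image."""
    target = f"{repo_uri}:{tag}"
    return [
        ["docker", "build", "-t", IMAGE, "."],
        ["docker", "tag", IMAGE, target],
        ["docker", "push", target],
        ["awslocal", "lambda", "update-function-code",
         "--function-name", FUNCTION, "--image-uri", target],
        # Wait for the update to complete before invoking the function again
        ["awslocal", "lambda", "wait", "function-updated", "--function-name", FUNCTION],
    ]


def redeploy(repo_uri, dry_run=True):
    """Print each command, and run it (stopping on failure) unless dry_run is set."""
    for cmd in redeploy_commands(repo_uri):
        print("$", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

For example, call `redeploy("000000000000.dkr.ecr.eu-west-1.localhost.localstack.cloud:4566/pdf-generator-image", dry_run=False)` from the pdf-generator-lambda directory.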

Congratulations! 🎉
You’ve reached the end of this article—thank you for reading! 😊 I hope to extend this series with more articles, so stay tuned.

If you have any questions, feel free to leave a comment or send me a message on LinkedIn. I’ll make sure to respond as quickly as I can. Ciao 👋