Enri Peters

Building a Cost-Effective, Serverless Virus Scanner on AWS: A Step-by-Step Guide

Introduction

In this blog post, you will learn how to build a serverless virus scanner on AWS, step by step.

Virus scanners are important for detecting and removing malicious software (viruses, malware, etc.). They work by scanning for patterns of code that match known malware signatures or suspicious behaviours, and they help safeguard our digital environments from potential security threats.

Without a reliable virus scanner in place, our digital environments are left vulnerable to attacks, which could result in serious consequences. Therefore, it's essential to have a trusted virus scanner to protect our systems and networks.

Why a serverless virus scanner?

A serverless virus scanner offers several benefits over traditional server-based solutions. Firstly, it requires less maintenance, which reduces costs significantly 🤑. Secondly, it is highly scalable, which makes it easy to increase or decrease the number of scans you run as needed. Together, these make it a cost-effective solution for protecting your digital environment from potential security threats.

Choosing the Right Services

Choosing the right services and tools matters, because the selection directly affects cost, performance, and scalability. We will be using the following AWS services to build our serverless virus scanner.

  1. S3: a reliable, scalable, and cheap object storage service that can store large amounts of data.
  2. EventBridge Scheduler: a service that enables us to schedule events based on time intervals or cron expressions, which we will be using to automate the virus definitions updating process.
  3. Lambda: the perfect solution for executing our ClamAV virus scanner.

By using Lambda, we don't have to worry about provisioning or managing servers, and it automatically scales. Our Lambda function will be called Clambda, and it will execute the ClamAV virus scanner. We will be writing the code for Clambda in Python, making use of the libraries available for Python.

Designing the Virus Scanner Architecture

Architecture

To run a serverless virus scanner we need some serverless services from AWS. As mentioned in the previous section, we will be using S3, EventBridge Scheduler, and Lambda. The diagram above shows how these components work together.

Setting up part 1 - Update virus definitions

It's essential to keep your virus scanner up to date by updating its definitions daily. ClamAV offers two ways to update its virus definitions: freshclam and cvdupdate. Freshclam can be run as a daemon or through a cron job, configured with the freshclam.conf file. However, it may get you blocked by the CDN, which makes development difficult. A better solution is cvdupdate, a tool that downloads and updates ClamAV databases and database patch files so that you can host your own database mirror.

Developing the cvdupdate Lambda

We opted to update virus definitions daily, but you can adjust the frequency as needed. To run the cvdupdate Lambda, we've set up a daily EventBridge schedule. It invokes the cvdupdate Lambda, which downloads the latest virus definitions and uploads them to the ClamAV virus definitions bucket.
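
The daily schedule described here could be created with boto3 roughly as follows. This is a minimal sketch, not the setup used in this post: the helper names, the 03:00 UTC run time, and the ARNs passed in are all placeholders, and the role is assumed to allow EventBridge Scheduler to invoke the function.

```python
"""Sketch: create a daily EventBridge Scheduler schedule for the cvdupdate Lambda."""


def build_daily_schedule(name: str, lambda_arn: str, role_arn: str) -> dict:
    """Return the keyword arguments for the Scheduler create_schedule call."""
    return {
        "Name": name,
        # Assumed run time: every day at 03:00 UTC; adjust as needed
        "ScheduleExpression": "cron(0 3 * * ? *)",
        "FlexibleTimeWindow": {"Mode": "OFF"},  # run at the scheduled time
        "Target": {
            "Arn": lambda_arn,  # the function to invoke
            "RoleArn": role_arn,  # role that Scheduler assumes to invoke it
        },
    }


def create_daily_schedule(name: str, lambda_arn: str, role_arn: str) -> dict:
    """Create the schedule; needs boto3 and AWS credentials at call time."""
    import boto3  # imported lazily so the builder above works without boto3

    scheduler = boto3.client("scheduler")
    return scheduler.create_schedule(**build_daily_schedule(name, lambda_arn, role_arn))
```

Keeping the argument builder separate from the API call makes it easy to inspect the schedule definition before anything is created.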

handlers/cvdupdate/requirements.txt:

cvdupdate

handlers/cvdupdate/virus_definitions_updater.py:

"""cvdupdate Lambda"""
# Standard imports
import os
import logging

# Non standard imports
import boto3

s3 = boto3.client("s3")
logger = logging.getLogger()

logger.setLevel(logging.INFO)


def download_all_virus_definitions(definitions_folder, definitions_bucket, prefix=""):
    """Download latest virus definitions from S3."""
    logger.info("Download virus definitions from S3")
    os.makedirs(definitions_folder, exist_ok=True)
    # Without a Delimiter the listing already includes keys under nested
    # prefixes, so no recursion is needed. An empty bucket downloads nothing;
    # the handler runs the update afterwards either way.
    response = s3.list_objects(Bucket=definitions_bucket, Prefix=prefix)
    for obj in response.get("Contents", []):
        if obj["Key"].endswith("/"):
            continue  # Skip folder placeholder objects
        local_path = os.path.join(definitions_folder, obj["Key"])
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        s3.download_file(definitions_bucket, obj["Key"], local_path)


def update_virus_definitions(definitions_folder):
    """Update local copy of DBs."""
    logger.info("Configure cvd and run cvd update")

    os.system(f"python3 /var/task/dependencies/bin/cvd config set --dbdir {definitions_folder}")
    os.system("python3 /var/task/dependencies/bin/cvd update")


def upload_all_virus_definitions(definitions_bucket, folder_name):
    """Upload virus definitions to S3."""
    logger.info("Upload latest virus definitions to S3")
    # Traverse the directory tree and upload each file to the bucket
    for root, _, files in os.walk(folder_name):
        for file in files:
            # Construct the full file path
            local_file_path = os.path.join(root, file)
            s3_file_path = file
            # Use the S3 client to upload the file to the bucket
            s3.upload_file(local_file_path, definitions_bucket, s3_file_path)


def lambda_handler(event, context):  # pylint: disable=unused-argument
    """Lambda Handler which runs scheduled."""
    definitions_bucket = os.getenv("DEFINITIONS_BUCKET")
    definitions_folder = "/tmp/virus-definitions/"

    # Download all files to /tmp/virus-definitions
    download_all_virus_definitions(definitions_folder, definitions_bucket, "")

    # Update the local definitions with cvdupdate
    update_virus_definitions(definitions_folder)

    # Upload virus definitions back to S3
    upload_all_virus_definitions(definitions_bucket, definitions_folder)
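
The handler above runs the two cvd commands through os.system and ignores the returned exit status, so a failed update would go unnoticed. As a hedged alternative, the same commands could be run with subprocess so that a non-zero exit raises an exception. The helper names below are illustrative; only the cvd path and the two commands come from the handler.

```python
"""Sketch: run the cvd commands so that failures raise instead of passing silently."""
import subprocess

CVD_BIN = "/var/task/dependencies/bin/cvd"  # path used in the handler above


def build_cvd_commands(definitions_folder: str) -> list:
    """Return the two cvd invocations as argument lists (no shell involved)."""
    return [
        ["python3", CVD_BIN, "config", "set", "--dbdir", definitions_folder],
        ["python3", CVD_BIN, "update"],
    ]


def run_cvd_update(definitions_folder: str) -> None:
    """Run each command; check=True raises CalledProcessError on a non-zero exit."""
    for command in build_cvd_commands(definitions_folder):
        subprocess.run(command, check=True)
```

With this approach, a failing update makes the Lambda invocation fail visibly instead of uploading stale definitions.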

AWS Login:

aws sso login --profile playground-admin
export AWS_PROFILE=playground-admin

AWS CLI:

aws s3api create-bucket \
    --bucket clamav-virus-definitions-$(openssl rand -hex 4) \
    --region eu-west-1 \
    --create-bucket-configuration LocationConstraint=eu-west-1

aws iam create-role \
    --role-name cvdupdate-lambda-role \
    --assume-role-policy-document '{"Version": "2012-10-17","Statement": [{"Effect": "Allow","Principal": {"Service": "lambda.amazonaws.com"},"Action": "sts:AssumeRole"}]}'

aws iam put-role-policy \
    --role-name cvdupdate-lambda-role \
    --policy-name s3-access-policy \
    --policy-document '{"Version": "2012-10-17", "Statement": [{"Effect": "Allow", "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"], "Resource": ["arn:aws:s3:::clamav-virus-definitions-*/*", "arn:aws:s3:::clamav-virus-definitions-*"]}]}'

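
Because the bucket is created with a random suffix, the resources in the role policy have to match the actual bucket name. One way to avoid editing the JSON by hand is to render the policy document from the name. The helper and the example bucket name below are hypothetical; the actions are the ones from the put-role-policy command.

```python
"""Sketch: build the S3 access policy for a bucket whose name has a random suffix."""
import json


def build_s3_access_policy(bucket_name: str) -> str:
    """Return the inline policy JSON for the given definitions bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
                # ListBucket applies to the bucket, Get/PutObject to its objects
                "Resource": [f"{bucket_arn}/*", bucket_arn],
            }
        ],
    }
    return json.dumps(policy)


# Hypothetical bucket name; use the one returned by create-bucket
print(build_s3_access_policy("clamav-virus-definitions-1a2b3c4d"))
```

The printed JSON can be passed to --policy-document as is.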

Makefile:

# SHELL:=/bin/bash
WORKLOAD_NAME ?= my-project

clean:
    rm -rf artifacts
    mkdir -p artifacts

clean_cvdupdate_dependencies:
    rm -rf artifacts/cvdupdate.zip
    rm -rf handlers/cvdupdate/dependencies

package_cvdupdate:  clean_cvdupdate_dependencies
    pip install --target handlers/cvdupdate/dependencies -r handlers/cvdupdate/requirements.txt --upgrade
    cd handlers/cvdupdate && zip -r9 ../../artifacts/cvdupdate.zip *

deploy_cvdupdate: clean_cvdupdate_dependencies package_cvdupdate
    aws s3 cp artifacts/cvdupdate.zip s3://${WORKLOAD_NAME}-deploy/functions/cvdupdate.zip
    aws lambda update-function-code --function-name ${WORKLOAD_NAME}-cvdupdate --s3-bucket=${WORKLOAD_NAME}-deploy --s3-key=functions/cvdupdate.zip

Setting up part 2 - The virus scanner

We chose ClamAV as our virus scanner because it is open source, lightweight, very customisable, and well supported, and because it has a high detection rate.

Developing Clambda (ClamAV+Lambda combined 😎)

handlers/clambda/requirements.txt:

filetype

handlers/clambda/clambda.py:

"""ClamAV Lambda"""
# pylint: disable=logging-fstring-interpolation
# pylint: disable=line-too-long

# Standard imports
import os
import json
import logging
import subprocess

# Non standard imports
import boto3
import filetype  # pylint: disable=import-error

# Constants
DEFINITIONS_BUCKET = os.getenv("DEFINITIONS_BUCKET")
DEFINITIONS_FOLDER = os.getenv("DEFINITIONS_FOLDER")
TMP_FOLDER = os.getenv("TMP_FOLDER")
DEFAULT_STATUS = "FAILURE"

# Clients
s3 = boto3.client("s3")
transfer = boto3.client("transfer")

# Logger
logger = logging.getLogger()
logger.setLevel(logging.INFO)


class VirusFoundException(Exception):
    """VirusFoundException for when a virus is found."""


def download_all_virus_definitions(definitions_folder: str, bucket_name: str):
    """Download latest ClamAV virus definitions.
    Provide a bucket name on which the ClamAV virus definitions can be found.
    Provide a target folder to download them to.
    """
    # Code to download the virus definitions from the S3 bucket
    logger.info("Downloading latest virus definitions from S3!")
    os.makedirs(definitions_folder, exist_ok=True)
    response = s3.list_objects(Bucket=bucket_name)
    for obj in response.get("Contents", []):
        s3.download_file(
            bucket_name, obj["Key"], os.path.join(definitions_folder, obj["Key"])
        )


def download_file_from_s3(bucket: str, object_key: str, local_file_path: str):
    """Download object from S3 based on bucket name and object key.
    File is written to a given local file path.
    """
    logger.info(f"Downloading object: arn:aws:s3:::{bucket}/{object_key}...")
    os.makedirs(os.path.dirname(local_file_path), exist_ok=True)
    s3.download_file(bucket, object_key, local_file_path)

    return local_file_path


def guess_file_type(file_path: str):
    """Guess the filetype by using the filetype library."""
    file_type = filetype.guess(file_path)
    if file_type is None:
        logger.info("Cannot guess file type!")
        return
    logger.info(f"File MIME type is: {file_type.mime}")


def start_clamscan(definitions_folder: str, local_file_path: str):
    """Run the clamscan binary."""
    logger.info("Start scanning!")
    try:
        # Start scan and capture output; passing an argument list (no shell)
        # keeps special characters in the object key from being interpreted.
        scan_output = subprocess.check_output(
            [
                "/var/task/dependencies/bin/clamscan",
                f"--database={definitions_folder}",
                local_file_path,
            ]
        )
        # Log scan output
        logger.info(scan_output.decode())
        logger.info("Scanning done, no viruses found!")
        return "SUCCESS"  # Overwrite default status
    except subprocess.CalledProcessError as exc:
        # clamscan exits with 1 when a virus is found and 2 on errors
        if exc.returncode == 1:
            logger.info(f"{exc.output.decode()}")
            raise VirusFoundException("Virus Found!") from exc
        logger.error(f"Exception: {exc}")
        raise


def error_exit(
    metadata: dict,
    msg="",
    exception=None,
):
    """
    Exit function. Always reports the step state back to the calling AWS Transfer Family workflow.
    """
    workflow_step_state(**metadata)
    raise SystemExit(f"Error {msg}") from exception


def workflow_step_state(
    workflow_id: str, execution_id: str, token_id: str, status: str
) -> dict:
    """
    Reports the step state back to calling AWS Transfer Workflow
    """
    response = transfer.send_workflow_step_state(
        WorkflowId=workflow_id, ExecutionId=execution_id, Token=token_id, Status=status
    )

    return response


def lambda_handler(event, context):  # pylint: disable=unused-argument
    """Lambda Handler which gets an event from AWS Transfer Service.
    Example event:
    {
        "token":"secret_token",
        "serviceMetadata":{
            "executionDetails":{
                "workflowId":"w-abcdefghikjl123",
                "executionId":"423c2412-15d5-4786-a731-05d4b5f79ba6"
            },
            "transferDetails":{
                "sessionId":"abcdefghikjl123",
                "userName":"epeters",
                "serverId":"s-abcdefghikjl123"
            }
        },
        "fileLocation":{
            "domain":"S3",
            "bucket":"sftp-poc-1",
            "key":"epeters/test-file.txt",
            "eTag":"089c2c18cf2fe8979143223faeb5298e",
            "versionId":"None"
        }
    }
    This Lambda makes use of: fileLocation.bucket and fileLocation.key.
    We use these values to download the file that needs to be scanned by ClamAV.
    Before the scan starts, the latest virus definitions are downloaded from S3.
    """
    logger.info(event)

    # Extract values from the event
    sftp_bucket = event["fileLocation"]["bucket"]
    object_key = event["fileLocation"]["key"]
    workflow_id = event["serviceMetadata"]["executionDetails"]["workflowId"]
    execution_id = event["serviceMetadata"]["executionDetails"]["executionId"]
    token_id = event["token"]

    # Prepare metadata dictionary
    metadata = {
        "workflow_id": workflow_id,
        "execution_id": execution_id,
        "token_id": token_id,
        "status": DEFAULT_STATUS,
    }

    # Download file that needs to be scanned from S3
    s3_object = download_file_from_s3(
        sftp_bucket, object_key, os.path.join(TMP_FOLDER, object_key)
    )

    # Guess the file MIME type
    # TODO: Do something with this when for example the file type is unsupported
    guess_file_type(s3_object)

    # Download all virus definitions to /tmp/virus-definitions
    download_all_virus_definitions(DEFINITIONS_FOLDER, DEFINITIONS_BUCKET)

    # Run the scan
    try:
        scan_result = start_clamscan(DEFINITIONS_FOLDER, s3_object)
        metadata["status"] = scan_result
    except Exception as exc:  # pylint: disable=broad-except
        error_exit(
            metadata=metadata,
            msg="Failure during scanning file, sending error back to Transfer Service",
            exception=exc,
        )

    # Return the state to the transferservice workflow
    response = workflow_step_state(**metadata)
    logger.info(json.dumps(response))
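
The scan logic above relies on the exit codes of clamscan: 0 means no virus was found, 1 means a virus was found, and 2 means an error occurred. The mapping to the workflow statuses can be sketched as a small function. The function name and the reason strings are illustrative; the SUCCESS and FAILURE values are the statuses used by the handler.

```python
"""Sketch: translate clamscan exit codes into workflow step statuses."""

# Exit codes as documented in the clamscan man page
CLAMSCAN_CLEAN = 0
CLAMSCAN_VIRUS_FOUND = 1


def interpret_returncode(returncode: int) -> tuple:
    """Return (status, reason) for a clamscan exit code."""
    if returncode == CLAMSCAN_CLEAN:
        return ("SUCCESS", "no virus found")
    if returncode == CLAMSCAN_VIRUS_FOUND:
        return ("FAILURE", "virus found")
    # Any other code (such as 2) is a scan error; report FAILURE so the
    # file is never treated as clean when the scan did not complete.
    return ("FAILURE", f"scan error (exit code {returncode})")
```

Treating scan errors as FAILURE is the conservative choice: a file only passes when clamscan explicitly reports it as clean.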

NOTE: When I have the time I will improve this by using libclamav directly instead of the Python subprocess module, because it is not ideal to use Python to spawn other programs. However, since libclamav is written in C, integrating it with Python on AWS Lambda requires the use of ctypes.

ctypes is a foreign function library for Python. It provides C compatible data types, and allows calling functions in DLLs or shared libraries. It can be used to wrap libclamav in pure Python.
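
To give an idea of what ctypes looks like in practice, here is a minimal sketch that calls strlen from the C standard library. It is a generic illustration, not a libclamav wrapper, and it assumes a Linux or macOS system; wrapping libclamav would follow the same pattern of loading the shared library and declaring the argument and return types of each function.

```python
"""Sketch: call a C function from a shared library with ctypes."""
import ctypes
import ctypes.util

# Locate the C standard library; fall back to the symbols already loaded
# into the current process when it cannot be found by name.
libc_path = ctypes.util.find_library("c")
libc = ctypes.CDLL(libc_path) if libc_path else ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

# Python bytes are passed to C as a null-terminated char pointer
length = libc.strlen(b"clamav")
print(length)
```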

An example of how this can be done can be found on this GitHub repository.

Makefile:

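# bash is needed for the brace expansion and [[ ]] test used below
SHELL := /bin/bash
WORKLOAD_NAME ?= my-project
CLAMAV_VERSION ?= 1.0.0
UNAME := $(shell uname)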

clean:
    rm -rf artifacts
    mkdir -p artifacts

clean_clambda_dependencies:
    rm -rf ${PWD}/usr
    rm -rf artifacts/clambda.zip
    rm -rf handlers/clambda/dependencies

package_clambda: clean_clambda_dependencies
    mkdir -p handlers/clambda/dependencies/{bin,lib}
    pip install --target handlers/clambda/dependencies -r handlers/clambda/requirements.txt --upgrade
    curl -L https://github.com/Cisco-Talos/clamav/releases/download/clamav-${CLAMAV_VERSION}/clamav-${CLAMAV_VERSION}.linux.x86_64.rpm \
        --output artifacts/clamav-${CLAMAV_VERSION}.linux.x86_64.rpm
    @if [[ ${UNAME} == 'Darwin' ]]; then \
        echo "Run macOS commands for package_clambda"; \
        tar xvf artifacts/clamav-${CLAMAV_VERSION}.linux.x86_64.rpm \
        -C handlers/clambda/dependencies/bin/ \
        --strip-components=4 \
        usr/local/bin/clamscan; \
        tar xvf artifacts/clamav-${CLAMAV_VERSION}.linux.x86_64.rpm \
        -C handlers/clambda/dependencies/lib/ \
        --strip-components=4 \
        usr/local/lib64/*.so.*; \
    else \
        echo "Run Linux commands for package_clambda"; \
        rpm2cpio artifacts/clamav-${CLAMAV_VERSION}.linux.x86_64.rpm | cpio -idmv; \
        mv ${PWD}/usr/local/bin/clamscan handlers/clambda/dependencies/bin/; \
        mv ${PWD}/usr/local/lib64/*.so.* handlers/clambda/dependencies/lib/; \
    fi
    cd handlers/clambda && zip -r9 ../../artifacts/clambda.zip *

deploy_clambda: clean_clambda_dependencies package_clambda
    aws s3 cp artifacts/clambda.zip s3://${WORKLOAD_NAME}-deploy/functions/clambda.zip
    aws lambda update-function-code --function-name ${WORKLOAD_NAME}-clambda --s3-bucket=${WORKLOAD_NAME}-deploy --s3-key=functions/clambda.zip

Part 2?

You might have noticed references to AWS Transfer Family. This is because Clambda has been integrated into an SFTP solution, making it part of a secure file transfer workflow. If you’re interested in a step-by-step guide on this integration, let me know in the comments. I might write a part 2 for this.
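
For readers who want to experiment before a part 2 exists, the fields Clambda reads from a Transfer Family workflow event can be pulled out with a small helper. This is a sketch with hypothetical names (ScanRequest, parse_transfer_event); the event layout is the example from the handler docstring, trimmed to the fields that are used.

```python
"""Sketch: pull the fields Clambda needs out of a Transfer Family workflow event."""
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanRequest:
    """The values the handler extracts from the incoming event."""

    bucket: str
    key: str
    workflow_id: str
    execution_id: str
    token: str


def parse_transfer_event(event: dict) -> ScanRequest:
    """Extract file location and workflow details; raises KeyError if missing."""
    details = event["serviceMetadata"]["executionDetails"]
    location = event["fileLocation"]
    return ScanRequest(
        bucket=location["bucket"],
        key=location["key"],
        workflow_id=details["workflowId"],
        execution_id=details["executionId"],
        token=event["token"],
    )


# Trimmed version of the example event from the handler docstring
sample_event = {
    "token": "secret_token",
    "serviceMetadata": {
        "executionDetails": {
            "workflowId": "w-abcdefghikjl123",
            "executionId": "423c2412-15d5-4786-a731-05d4b5f79ba6",
        }
    },
    "fileLocation": {"bucket": "sftp-poc-1", "key": "epeters/test-file.txt"},
}

request = parse_transfer_event(sample_event)
print(request.bucket, request.key)
```

Parsing the event up front means a malformed event fails immediately with a KeyError, before any file is downloaded.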

Conclusion

In this guide, we explored how to build a serverless virus scanner on AWS using S3 for storage, EventBridge Scheduler for automation, and Lambda for scanning with ClamAV. This solution is cost-effective, scalable, and requires minimal maintenance, perfect for keeping your systems secure.

Try following the steps to build your own serverless virus scanner, and feel free to share your feedback or ideas for improvements in the comments.
