Introduction
In this blog post, you will learn how to create a Serverless virus scanner on AWS in a step-by-step guide.
Virus scanners are important for detecting and removing malicious software (viruses, malware, etc.). They work by scanning for patterns of code that match known malware signatures or suspicious behaviours, and they help safeguard our digital environments from potential security threats.
Without a reliable virus scanner in place, our digital environments are left vulnerable to attacks, which could result in serious consequences. Therefore, it's essential to have a trusted virus scanner to protect our systems and networks.
Why a serverless virus scanner?
A serverless virus scanner offers several benefits over traditional server-based solutions. Firstly, it requires less maintenance, which reduces costs significantly 🤑. Secondly, it is highly scalable, so the number of scans you run can go up or down as needed and you only pay for the scans you actually run. Together, this makes it a cost-effective solution for protecting your digital environment from potential security threats.
Choosing the Right Services
Choosing the right services and tools matters, because the selection directly affects cost, performance, and scalability. We will be using the following AWS services to build our serverless virus scanner.
- S3: a reliable, scalable, and cheap object storage service that can store large amounts of data.
- EventBridge Scheduler: a service that enables us to schedule events based on time intervals or cron expressions, which we will be using to automate the virus definitions updating process.
- Lambda: the perfect solution for executing our ClamAV virus scanner.
By using Lambda, we don't have to worry about provisioning or managing servers, and it automatically scales. Our Lambda function will be called Clambda, and it will execute the ClamAV virus scanner. We will be writing the code for Clambda in Python, making use of the libraries available for Python.
Designing the Virus Scanner Architecture
To run a serverless virus scanner we need some serverless services from AWS. As mentioned in the previous section, we will be using S3, EventBridge Scheduler and Lambda. In the diagram above you can see how these different components work together.
Setting up part 1 - Update virus definitions
It's essential to keep your virus scanner up to date by updating its definitions daily. ClamAV offers two ways to update its virus definitions: freshclam and cvdupdate. Freshclam can be run as a daemon or through a cronjob using the freshclam.conf config file. However, repeated downloads can get you blacklisted by the CDN, which makes development difficult. A better solution is cvdupdate, a tool that allows you to download and update ClamAV databases and database patch files to host your own database mirror.
Developing the cvdupdate Lambda
We opted to update virus definitions daily, but you can adjust the frequency as needed. To run the cvdupdate Lambda, we’ve set up a daily EventBridge schedule. It executes the cvdupdate Lambda, which downloads the latest virus definitions and uploads them to the ClamAV virus definitions bucket.
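The post does not show how the daily schedule itself is created, so here is a minimal sketch of what that could look like. It only builds the arguments for EventBridge Scheduler's CreateSchedule call. The helper name `build_daily_schedule`, the schedule name, the 03:00 UTC run time and both ARNs are assumptions for illustration, not values from this setup.

```python
import json


def build_daily_schedule(function_arn: str, role_arn: str, name: str = "cvdupdate-daily") -> dict:
    """Build keyword arguments for EventBridge Scheduler's CreateSchedule call."""
    return {
        "Name": name,
        # cron(minutes hours day-of-month month day-of-week year): every day at 03:00 UTC
        # (the time of day is an assumption, pick whatever suits you)
        "ScheduleExpression": "cron(0 3 * * ? *)",
        # OFF means the target is invoked at the scheduled time, not within a window
        "FlexibleTimeWindow": {"Mode": "OFF"},
        "Target": {
            "Arn": function_arn,  # ARN of the cvdupdate Lambda function
            "RoleArn": role_arn,  # role that Scheduler assumes to invoke the function
            "Input": json.dumps({}),  # the handler ignores its event, so send an empty object
        },
    }


if __name__ == "__main__":
    # Placeholder ARNs, replace them with your own
    kwargs = build_daily_schedule(
        "arn:aws:lambda:eu-west-1:123456789012:function:my-project-cvdupdate",
        "arn:aws:iam::123456789012:role/scheduler-invoke-cvdupdate",
    )
    print(json.dumps(kwargs, indent=2))
```

With credentials configured, the resulting dictionary could be passed on as `boto3.client("scheduler").create_schedule(**kwargs)`. The role in `RoleArn` needs permission to invoke the function.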
handlers/cvdupdate/requirements.txt:
cvdupdate
handlers/cvdupdate/virus_definitions_updater.py:
"""cvdupdate Lambda"""
# Standard imports
import os
import logging

# Non standard imports
import boto3

s3 = boto3.client("s3")
logger = logging.getLogger()
logger.setLevel(logging.INFO)


def download_all_virus_definitions(definitions_folder, definitions_bucket, prefix=""):
    """Download latest virus definitions from S3."""
    logger.info("Download virus definitions from S3")
    os.makedirs(definitions_folder, exist_ok=True)
    response = s3.list_objects(Bucket=definitions_bucket, Prefix=prefix)
    if not response.get("Contents"):
        # Empty bucket (first run): fetch the definitions with cvd instead
        update_virus_definitions(definitions_folder)
    else:
        for obj in response["Contents"]:
            # Skip folder placeholder keys; list_objects already returns nested keys
            if obj["Key"].endswith("/"):
                continue
            local_path = os.path.join(definitions_folder, obj["Key"])
            os.makedirs(os.path.dirname(local_path), exist_ok=True)
            s3.download_file(definitions_bucket, obj["Key"], local_path)


def update_virus_definitions(definitions_folder):
    """Update local copy of DBs."""
    logger.info("Configure cvd and run cvd update")
    os.system(f"python3 /var/task/dependencies/bin/cvd config set --dbdir {definitions_folder}")
    os.system("python3 /var/task/dependencies/bin/cvd update")


def upload_all_virus_definitions(definitions_bucket, folder_name):
    """Upload virus definitions to S3."""
    logger.info("Upload latest virus definitions to S3")
    # Traverse the directory tree and upload each file to the bucket
    for root, _, files in os.walk(folder_name):
        for file in files:
            # Construct the full file path
            local_file_path = os.path.join(root, file)
            # The bare file name is used as the object key
            s3_file_path = file
            # Use the S3 client to upload the file to the bucket
            s3.upload_file(local_file_path, definitions_bucket, s3_file_path)


def lambda_handler(event, context):  # pylint: disable=unused-argument
    """Lambda Handler which runs scheduled."""
    definitions_bucket = os.getenv("DEFINITIONS_BUCKET")
    definitions_folder = "/tmp/virus-definitions/"
    # Download all files to /tmp/virus-definitions
    download_all_virus_definitions(definitions_folder, definitions_bucket, "")
    # Run cvd update to fetch the latest definitions
    update_virus_definitions(definitions_folder)
    # Upload virus definitions back to S3
    upload_all_virus_definitions(definitions_bucket, definitions_folder)
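The upload step uses the bare file name as the object key, so any files in subfolders would end up flattened in the bucket. If you would rather keep the folder structure, a possible variation is sketched below. `iter_upload_targets` and its `prefix` parameter are hypothetical names, and the file names in the demo are made up.

```python
import os
import tempfile


def iter_upload_targets(folder_name: str, prefix: str = ""):
    """Yield (local_path, s3_key) pairs for every file under folder_name."""
    for root, _, files in os.walk(folder_name):
        for file in files:
            local_path = os.path.join(root, file)
            # Path relative to the definitions folder, with forward slashes for S3
            relative = os.path.relpath(local_path, folder_name).replace(os.sep, "/")
            yield local_path, prefix + relative


if __name__ == "__main__":
    # Demo with a throwaway folder and made-up file names
    with tempfile.TemporaryDirectory() as folder:
        os.makedirs(os.path.join(folder, "nested"))
        for name in ("daily.cvd", os.path.join("nested", "extra.cdiff")):
            with open(os.path.join(folder, name), "w") as handle:
                handle.write("placeholder")
        for local_path, key in iter_upload_targets(folder):
            print(f"{local_path} -> {key}")
```

Each pair could then be handed to `s3.upload_file(local_path, definitions_bucket, key)` inside the upload function.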
AWS Login:
aws sso login --profile playground-admin
export AWS_PROFILE=playground-admin
AWS CLI:
# Generate the bucket name once, so the IAM policy below refers to the same bucket
BUCKET_NAME=clamav-virus-definitions-$(openssl rand -hex 4)
aws s3api create-bucket \
  --bucket "${BUCKET_NAME}" \
  --region eu-west-1 \
  --create-bucket-configuration LocationConstraint=eu-west-1
aws iam create-role \
  --role-name cvdupdate-lambda-role \
  --assume-role-policy-document '{"Version": "2012-10-17","Statement": [{"Effect": "Allow","Principal": {"Service": "lambda.amazonaws.com"},"Action": "sts:AssumeRole"}]}'
aws iam put-role-policy \
  --role-name cvdupdate-lambda-role \
  --policy-name s3-access-policy \
  --policy-document '{"Version": "2012-10-17", "Statement": [{"Effect": "Allow", "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"], "Resource": ["arn:aws:s3:::'"${BUCKET_NAME}"'/*", "arn:aws:s3:::'"${BUCKET_NAME}"'"]}]}'
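Because the bucket name gets a random suffix, the policy has to reference the exact generated name. If you prefer to generate the policy document in code rather than hand-editing JSON, a small sketch could look like this. `build_definitions_policy` is a hypothetical helper, and splitting object-level and bucket-level actions into two statements is a design choice of this sketch, not something the setup above requires.

```python
import json


def build_definitions_policy(bucket_name: str) -> str:
    """Return an IAM policy document (as JSON) scoped to one definitions bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level actions apply to the keys inside the bucket
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
            {
                # ListBucket applies to the bucket itself
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
        ],
    }
    return json.dumps(policy)


if __name__ == "__main__":
    # Placeholder bucket name with a made-up suffix
    print(build_definitions_policy("clamav-virus-definitions-1a2b3c4d"))
```

The output can be passed to `aws iam put-role-policy --policy-document`.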
Makefile:
# SHELL:=/bin/bash
WORKLOAD_NAME ?= my-project

clean:
	rm -rf artifacts
	mkdir -p artifacts

clean_cvdupdate_dependencies:
	rm -rf artifacts/cvdupdate.zip
	rm -rf handlers/cvdupdate/dependencies

package_cvdupdate: clean_cvdupdate_dependencies
	# Make sure the output folder exists even when "make clean" was not run first
	mkdir -p artifacts
	pip install --target handlers/cvdupdate/dependencies -r handlers/cvdupdate/requirements.txt --upgrade
	cd handlers/cvdupdate && zip -r9 ../../artifacts/cvdupdate.zip *

deploy_cvdupdate: clean_cvdupdate_dependencies package_cvdupdate
	aws s3 cp artifacts/cvdupdate.zip s3://${WORKLOAD_NAME}-deploy/functions/cvdupdate.zip
	aws lambda update-function-code --function-name ${WORKLOAD_NAME}-cvdupdate --s3-bucket=${WORKLOAD_NAME}-deploy --s3-key=functions/cvdupdate.zip
Setting up part 2 - The virus scanner
We chose ClamAV as our virus scanner because it is open source, lightweight, very customisable and well supported, and it has a high detection rate.
Developing Clambda (ClamAV+Lambda combined 😎)
handlers/clambda/requirements.txt:
filetype
handlers/clambda/clambda.py:
"""ClamAV Lambda"""
# pylint: disable=logging-fstring-interpolation
# pylint: disable=line-too-long
# Standard imports
import os
import json
import logging
import subprocess
# Non standard imports
import boto3
import filetype # pylint: disable=import-error
# Constants
DEFINITIONS_BUCKET = os.getenv("DEFINITIONS_BUCKET")
DEFINITIONS_FOLDER = os.getenv("DEFINITIONS_FOLDER")
TMP_FOLDER = os.getenv("TMP_FOLDER")
DEFAULT_STATUS = "FAILURE"
# Clients
s3 = boto3.client("s3")
transfer = boto3.client("transfer")
# Logger
logger = logging.getLogger()
logger.setLevel(logging.INFO)
class VirusFoundException(Exception):
    """VirusFoundException for when a virus is found."""


def download_all_virus_definitions(definitions_folder: str, bucket_name: str):
    """Download latest ClamAV virus definitions.

    Provide a bucket name on which the ClamAV virus definitions can be found.
    Provide a target folder to download them to.
    """
    # Code to download the virus definitions from the S3 bucket
    logger.info("Downloading latest virus definitions from S3!")
    os.makedirs(definitions_folder, exist_ok=True)
    response = s3.list_objects(Bucket=bucket_name)
    for obj in response.get("Contents", []):
        s3.download_file(
            bucket_name, obj["Key"], os.path.join(definitions_folder, obj["Key"])
        )


def download_file_from_s3(bucket: str, object_key: str, local_file_path: str):
    """Download object from S3 based on bucket name and object key.

    File is written to a given local file path.
    """
    logger.info(f"Downloading object: arn:aws:s3:::{bucket}/{object_key}...")
    os.makedirs(os.path.dirname(local_file_path), exist_ok=True)
    s3.download_file(bucket, object_key, local_file_path)
    return local_file_path


def guess_file_type(file_path: str):
    """Guess the filetype by using the filetype library."""
    file_type = filetype.guess(file_path)
    if file_type is None:
        logger.info("Cannot guess file type!")
        return
    logger.info(f"File MIME type is: {file_type.mime}")
def start_clamscan(definitions_folder: str, local_file_path: str):
    """Run the clamscan binary."""
    logger.info("Start scanning!")
    try:
        # Start scan and capture output. The command is passed as an argument
        # list (no shell), so quotes or other special characters in the object
        # key cannot be interpreted as shell syntax.
        scan_output = subprocess.check_output(
            [
                "/var/task/dependencies/bin/clamscan",
                f"--database={definitions_folder}",
                local_file_path,
            ]
        )
        # Log scan output
        logger.info(scan_output.decode())
        logger.info("Scanning done, no viruses found!")
        return "SUCCESS"  # Overwrite default status
    except subprocess.CalledProcessError as exc:
        # clamscan exits with 1 when it found a virus
        if exc.returncode == 1:
            logger.info(f"{exc.output.decode()}")
            raise VirusFoundException(  # pylint: disable=raise-missing-from
                "Virus Found!"
            )
        logger.error(f"Exception: {exc}")
        raise exc
def error_exit(
    metadata: dict,
    msg="",
    exception=None,
):
    """
    Exit function. Always reports the step state back to the calling
    AWS Transfer Family workflow before exiting.
    """
    workflow_step_state(**metadata)
    raise SystemExit(f"Error {msg}") from exception


def workflow_step_state(
    workflow_id: str, execution_id: str, token_id: str, status: str
) -> dict:
    """
    Reports the step state back to the calling AWS Transfer workflow
    and returns the API response.
    """
    response = transfer.send_workflow_step_state(
        WorkflowId=workflow_id, ExecutionId=execution_id, Token=token_id, Status=status
    )
    return response
def lambda_handler(event, context):  # pylint: disable=unused-argument
    """Lambda Handler which gets an event from AWS Transfer Service.

    Example event:
    {
        "token": "secret_token",
        "serviceMetadata": {
            "executionDetails": {
                "workflowId": "w-abcdefghikjl123",
                "executionId": "423c2412-15d5-4786-a731-05d4b5f79ba6"
            },
            "transferDetails": {
                "sessionId": "abcdefghikjl123",
                "userName": "epeters",
                "serverId": "s-abcdefghikjl123"
            }
        },
        "fileLocation": {
            "domain": "S3",
            "bucket": "sftp-poc-1",
            "key": "epeters/test-file.txt",
            "eTag": "089c2c18cf2fe8979143223faeb5298e",
            "versionId": "None"
        }
    }

    This Lambda makes use of: fileLocation.bucket and fileLocation.key.
    We use these values to download the file that needs to be scanned by ClamAV.
    Before the scan starts, the latest virus definitions are downloaded from S3.
    """
    logger.info(event)

    # Extract values from the event
    sftp_bucket = event["fileLocation"]["bucket"]
    object_key = event["fileLocation"]["key"]
    workflow_id = event["serviceMetadata"]["executionDetails"]["workflowId"]
    execution_id = event["serviceMetadata"]["executionDetails"]["executionId"]
    token_id = event["token"]

    # Prepare metadata dictionary
    metadata = {
        "workflow_id": workflow_id,
        "execution_id": execution_id,
        "token_id": token_id,
        "status": DEFAULT_STATUS,
    }

    # Download file that needs to be scanned from S3
    s3_object = download_file_from_s3(
        sftp_bucket, object_key, os.path.join(TMP_FOLDER, object_key)
    )

    # Guess the file MIME type
    # TODO: Do something with this when for example the file type is unsupported
    guess_file_type(s3_object)

    # Download all virus definitions to /tmp/virus-definitions
    download_all_virus_definitions(DEFINITIONS_FOLDER, DEFINITIONS_BUCKET)

    # Run the scan
    try:
        scan_result = start_clamscan(DEFINITIONS_FOLDER, s3_object)
        metadata["status"] = scan_result
    except Exception as exc:  # pylint: disable=broad-except
        error_exit(
            metadata=metadata,
            msg="Failure during scanning file, sending error back to Transfer Service",
            exception=exc,
        )

    # Return the state to the Transfer Family workflow
    response = workflow_step_state(**metadata)
    logger.info(json.dumps(response, default=str))
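The scan logic above leans on clamscan's exit codes: 0 means no virus was found, 1 means a virus was found, and 2 means an error occurred. As a rough illustration of how those codes relate to the SUCCESS/FAILURE status sent back to the workflow, here is a small sketch. The names `CLAMSCAN_VERDICTS` and `interpret_clamscan` are hypothetical and not part of Clambda.

```python
# clamscan exit codes: 0 = no virus found, 1 = virus(es) found, 2 = an error occurred
CLAMSCAN_VERDICTS = {0: "clean", 1: "infected", 2: "error"}


def interpret_clamscan(returncode: int) -> tuple:
    """Map a clamscan exit code to (verdict, workflow step status)."""
    # Treat any unexpected exit code as an error rather than as a clean file
    verdict = CLAMSCAN_VERDICTS.get(returncode, "error")
    # Only a clean scan lets the workflow step succeed
    status = "SUCCESS" if verdict == "clean" else "FAILURE"
    return verdict, status


if __name__ == "__main__":
    for code in (0, 1, 2):
        print(code, interpret_clamscan(code))
```

Failing closed on unknown exit codes means a file is never reported as clean when the scanner did not actually say so.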
NOTE: When I have the time I will improve this by using libclamav directly instead of the Python subprocess module, because it is not ideal to use Python to spawn other programs. However, since libclamav is written in C, integrating it with Python on AWS Lambda requires the use of ctypes. ctypes is a foreign function library for Python. It provides C compatible data types, and allows calling functions in DLLs or shared libraries. It can be used to wrap libclamav in pure Python.
An example of how this can be done can be found on this GitHub repository.
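To give a feel for the ctypes mechanics without needing libclamav installed, here is a minimal sketch that wraps `strlen` from the C standard library instead. This is not the libclamav API. It only shows the general pattern of loading a shared library, declaring a C signature, and calling it from Python, and it assumes a POSIX system such as the Lambda runtime.

```python
import ctypes
import ctypes.util

# Locate and load the C standard library. If find_library returns None,
# CDLL(None) falls back to the symbols of the running process on POSIX.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text: bytes) -> int:
    """Return the length of a byte string as computed by the C library."""
    return libc.strlen(text)


if __name__ == "__main__":
    print(c_strlen(b"clamav"))
```

Wrapping libclamav would follow the same steps, with its shared object loaded in place of libc and its functions declared one by one.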
Makefile:
# bash is needed for the [[ ... ]] test and the {bin,lib} brace expansion below
SHELL:=/bin/bash
WORKLOAD_NAME ?= my-project
CLAMAV_VERSION ?= 1.0.0
UNAME := $(shell uname)

clean:
	rm -rf artifacts
	mkdir -p artifacts

clean_clambda_dependencies:
	rm -rf ${PWD}/usr
	rm -rf artifacts/clambda.zip
	rm -rf handlers/clambda/dependencies

package_clambda: clean_clambda_dependencies
	mkdir -p artifacts
	mkdir -p handlers/clambda/dependencies/{bin,lib}
	pip install --target handlers/clambda/dependencies -r handlers/clambda/requirements.txt --upgrade
	curl -L https://github.com/Cisco-Talos/clamav/releases/download/clamav-${CLAMAV_VERSION}/clamav-${CLAMAV_VERSION}.linux.x86_64.rpm \
		--output artifacts/clamav-${CLAMAV_VERSION}.linux.x86_64.rpm
	@if [[ ${UNAME} == 'Darwin' ]]; then \
		echo "Run macOS commands for package_clambda"; \
		tar xvf artifacts/clamav-${CLAMAV_VERSION}.linux.x86_64.rpm \
			-C handlers/clambda/dependencies/bin/ \
			--strip-components=4 \
			usr/local/bin/clamscan; \
		tar xvf artifacts/clamav-${CLAMAV_VERSION}.linux.x86_64.rpm \
			-C handlers/clambda/dependencies/lib/ \
			--strip-components=4 \
			usr/local/lib64/*.so.*; \
	else \
		echo "Run Linux commands for package_clambda"; \
		rpm2cpio artifacts/clamav-${CLAMAV_VERSION}.linux.x86_64.rpm | cpio -idmv; \
		mv ${PWD}/usr/local/bin/clamscan handlers/clambda/dependencies/bin/; \
		mv ${PWD}/usr/local/lib64/*.so.* handlers/clambda/dependencies/lib/; \
	fi
	cd handlers/clambda && zip -r9 ../../artifacts/clambda.zip *

deploy_clambda: clean_clambda_dependencies package_clambda
	aws s3 cp artifacts/clambda.zip s3://${WORKLOAD_NAME}-deploy/functions/clambda.zip
	aws lambda update-function-code --function-name ${WORKLOAD_NAME}-clambda --s3-bucket=${WORKLOAD_NAME}-deploy --s3-key=functions/clambda.zip
Part 2?
You might have noticed references to AWS Transfer Family. This is because Clambda has been integrated into an SFTP solution, making it part of a secure file transfer workflow. If you’re interested in a step-by-step guide on this integration, let me know in the comments. I might write a part 2 for this.
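Until then, if you want to experiment with the event shape that the workflow step sends, a sketch like the following can help. It builds a sample event modelled on the example in the Clambda docstring and pulls out the fields the handler reads. `build_transfer_event` and `extract_scan_request` are hypothetical helpers, and every ID and token in them is a placeholder.

```python
def build_transfer_event(bucket: str, key: str) -> dict:
    """Return a sample Transfer Family workflow event for local experiments."""
    return {
        "token": "secret_token",  # placeholder token
        "serviceMetadata": {
            "executionDetails": {
                "workflowId": "w-abcdefghikjl123",  # placeholder workflow ID
                "executionId": "423c2412-15d5-4786-a731-05d4b5f79ba6",
            },
        },
        "fileLocation": {"domain": "S3", "bucket": bucket, "key": key},
    }


def extract_scan_request(event: dict) -> dict:
    """Pull out the fields Clambda reads from the event."""
    details = event["serviceMetadata"]["executionDetails"]
    return {
        "bucket": event["fileLocation"]["bucket"],
        "key": event["fileLocation"]["key"],
        "workflow_id": details["workflowId"],
        "execution_id": details["executionId"],
        "token_id": event["token"],
    }


if __name__ == "__main__":
    event = build_transfer_event("sftp-poc-1", "epeters/test-file.txt")
    print(extract_scan_request(event))
```

An event built this way can also serve as a test payload for the deployed function, although the callback to the workflow will fail because the IDs are not real.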
Conclusion
In this guide, we explored how to build a serverless virus scanner on AWS using S3 for storage, EventBridge Scheduler for automation, and Lambda for scanning with ClamAV. This solution is cost-effective, scalable, and requires minimal maintenance, perfect for keeping your systems secure.
Try following the steps to build your own serverless virus scanner, and feel free to share your feedback or ideas for improvements in the comments.