DEV Community

Vikas Arora


Say Goodbye to Orphaned Snapshots: Automate Cleanup with Serverless, Terraform, and AWS EventBridge!

Over time, AWS accounts can accumulate resources that are no longer necessary but continue to incur costs. One common example is orphaned EBS snapshots left behind after volumes are deleted. Managing these snapshots manually can be tedious and costly.

This guide shows how to automate the cleanup of orphaned EBS snapshots with a Python (Boto3) AWS Lambda function that AWS EventBridge invokes on a schedule, with all of the infrastructure deployed by Terraform.

By the end, you’ll have a complete serverless solution to keep your AWS environment clean and cost-effective.

Prerequisites

Installing AWS CLI and Terraform

First, let’s ensure the essential tools are installed.

AWS CLI
The AWS CLI allows command-line access to AWS services. Install it according to your operating system:

macOS: brew install awscli
Windows: AWS CLI Installer
Linux: Use the package manager (e.g., sudo apt install awscli for Ubuntu).
Verify installation:

aws --version

Terraform
Terraform is a popular Infrastructure as Code (IaC) tool for defining and managing AWS resources.

macOS: brew install terraform
Windows: Terraform Installer
Linux: Download the binary and move it to /usr/local/bin.

Verify installation:

terraform -version

Configuring AWS Access

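Configure the AWS CLI with access keys so that Terraform (and the CLI itself) can authenticate with AWS. The Lambda function does not use these keys; at runtime it uses the IAM execution role that Terraform creates later.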

Create an access key for your IAM user in the AWS IAM console.
Configure AWS CLI:

aws configure

Follow the prompts to enter your Access Key, Secret Access Key, default region (e.g., us-east-1), and output format (e.g., json).

Next, since we are going to build the entire stack with Terraform, please fork the repository located here, which contains the full code for the project.

Clone it to your local machine and open it in a code editor.

I used Visual Studio Code as my editor.

Delete the following two files from the project; Terraform recreates them when you run it (terraform init regenerates the lock file, and the archive_file data source rebuilds the zip):

  • orphan-snapshots-delete.zip
  • .terraform.lock.hcl

Next, let's configure the S3 backend:

Create an S3 Bucket for Terraform State

1. Go to the S3 Console:

  • Sign in to your AWS account and navigate to the S3 service.

2. Create a New Bucket:

  • Click Create bucket.
  • Give the bucket a globally unique name, such as my-terraform-state-bucket.
  • Choose an AWS Region that matches your infrastructure region for latency reasons.

3. Configure Bucket Settings:

  • Keep Block Public Access settings enabled to restrict access to the bucket.
  • Versioning: Enable versioning to maintain a history of changes to the state file. This is useful for disaster recovery or rollbacks.
  • Leave other settings as default.

4. Create the Bucket:

  • Click Create bucket to finalize the setup.
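If you prefer to script the bucket setup instead of clicking through the console, the same settings map onto three S3 API calls. The sketch below is a minimal illustration and not part of the project repository: state_bucket_requests and apply_requests are hypothetical helper names, and it assumes a boto3 S3 client. It builds the requests as plain data first, so they can be inspected or tested before anything is sent to AWS.

```python
def state_bucket_requests(bucket_name, region):
    """Return (boto3 S3 method name, kwargs) pairs that set up a Terraform state bucket."""
    create_kwargs = {"Bucket": bucket_name}
    # us-east-1 is S3's default location and is rejected as an explicit
    # LocationConstraint; every other Region has to be passed explicitly.
    if region != "us-east-1":
        create_kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}

    return [
        ("create_bucket", create_kwargs),
        # Keep all four Block Public Access settings enabled.
        (
            "put_public_access_block",
            {
                "Bucket": bucket_name,
                "PublicAccessBlockConfiguration": {
                    "BlockPublicAcls": True,
                    "IgnorePublicAcls": True,
                    "BlockPublicPolicy": True,
                    "RestrictPublicBuckets": True,
                },
            },
        ),
        # Versioning keeps earlier state files around for recovery and rollback.
        (
            "put_bucket_versioning",
            {
                "Bucket": bucket_name,
                "VersioningConfiguration": {"Status": "Enabled"},
            },
        ),
    ]


def apply_requests(client, requests):
    """Call each named method on a client such as boto3.client("s3")."""
    for method_name, kwargs in requests:
        getattr(client, method_name)(**kwargs)
```

For example, apply_requests(boto3.client("s3", region_name="us-east-1"), state_bucket_requests("my-terraform-state-bucket", "us-east-1")) would create a private, versioned bucket in us-east-1.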

Create a DynamoDB Table for State Locking (Optional but Recommended)

Using a DynamoDB table for state locking ensures that only one Terraform process can modify the state at a time, preventing conflicts.

1. Go to the DynamoDB Console:

  • In your AWS Console, go to DynamoDB.

2. Create a New Table:

  • Click Create table.
  • Name your table, e.g., terraform-state-locking.
  • Partition Key: Set the partition key to LockID and use the String data type.

3. Configure Settings:

  • Leave default settings (such as read and write capacity) unless you have specific requirements.
  • Create the table by clicking Create table.
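The table can also be created programmatically. A minimal sketch, assuming a boto3 DynamoDB client; lock_table_request is a hypothetical helper, and on-demand billing (PAY_PER_REQUEST) is my assumption rather than the console defaults mentioned above. The one hard requirement from Terraform is the String partition key named LockID.

```python
def lock_table_request(table_name):
    """Keyword arguments for boto3.client("dynamodb").create_table(**kwargs)."""
    return {
        "TableName": table_name,
        # Terraform's S3 backend requires a String partition key named exactly LockID.
        "AttributeDefinitions": [{"AttributeName": "LockID", "AttributeType": "S"}],
        "KeySchema": [{"AttributeName": "LockID", "KeyType": "HASH"}],
        # Assumption: on-demand billing, since lock reads and writes are rare and small.
        "BillingMode": "PAY_PER_REQUEST",
    }
```

With a client from boto3.client("dynamodb"), you would call client.create_table(**lock_table_request("terraform-state-locking")) and then client.get_waiter("table_exists").wait(TableName="terraform-state-locking") before running terraform init.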

Configure IAM Permissions for Terraform

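Terraform needs permissions to read and write its state in S3 and, if you use locking, in DynamoDB. Note that the policy below covers only the state backend; to deploy this project, Terraform also needs permissions to manage IAM roles and policies, Lambda functions, and EventBridge rules.

This step is necessary only if you follow the principle of least privilege. If you already have administrator access, you can skip it.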

1. Create or Use an IAM User:

  • If you don't have an IAM user for Terraform, create one in the IAM console. Alternatively, attach these policies to your own IAM user.
  • Attach policies that grant permissions to access S3 and DynamoDB.

2. Attach S3 and DynamoDB Policies:

Use an inline or managed policy that grants the following permissions:

  • Access to the S3 bucket.
  • Access to the DynamoDB table (if using locking).

Example IAM Policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::my-terraform-state-bucket"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Resource": "arn:aws:s3:::my-terraform-state-bucket/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:PutItem",
                "dynamodb:GetItem",
                "dynamodb:DeleteItem",
                "dynamodb:DescribeTable"
            ],
            "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/terraform-state-locking"
        }
    ]
}
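Because the bucket name, Region, account ID, and table name differ per account, generating this policy can be less error-prone than editing the ARNs by hand. The sketch below uses a hypothetical helper, backend_policy, that is not part of the project. Note that s3:ListBucket is a bucket-level action and must target the bucket ARN, while the object actions target arn:aws:s3:::BUCKET/*.

```python
def backend_policy(bucket, region, account_id, table):
    """Build an IAM policy document for Terraform's S3 backend with DynamoDB locking."""
    table_arn = f"arn:aws:dynamodb:{region}:{account_id}:table/{table}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level action: must target the bucket ARN itself.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Object-level actions: target the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                # The state-locking table referenced by the backend's dynamodb_table setting.
                "Effect": "Allow",
                "Action": [
                    "dynamodb:DescribeTable",
                    "dynamodb:GetItem",
                    "dynamodb:PutItem",
                    "dynamodb:DeleteItem",
                ],
                "Resource": table_arn,
            },
        ],
    }
```

Passing the result to json.dumps(..., indent=4) gives JSON you can paste into the IAM console.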

After completing all the prerequisites, let's examine the Python and Terraform code that will perform the actual magic.


Step 1: Python Code for Orphaned Snapshot Cleanup

In the code editor, open the python/orphan-snapshots-delete.py file.

The complete function code is as follows:

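import logging

import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def lambda_handler(event, context):
    ec2_cli = boto3.client("ec2")

    # List all EBS snapshots owned by this account.
    response = ec2_cli.describe_snapshots(OwnerIds=["self"])

    orphaned_snapshot_ids = []
    for each_snapshot in response["Snapshots"]:
        # Caution: snapshots created by CopySnapshot carry an arbitrary volume ID,
        # so they are likely to be reported as orphaned as well.
        try:
            ec2_cli.describe_volume_status(VolumeIds=[each_snapshot["VolumeId"]])
        except ClientError as e:
            if e.response["Error"]["Code"] == "InvalidVolume.NotFound":
                # The source volume no longer exists, so the snapshot is orphaned.
                orphaned_snapshot_ids.append(each_snapshot["SnapshotId"])
            else:
                raise

    failed = []
    for snapshot_id in orphaned_snapshot_ids:
        try:
            ec2_cli.delete_snapshot(SnapshotId=snapshot_id)
            logger.info(f"Deleted SnapshotId {snapshot_id}")
        except ClientError as e:
            # For example, InvalidSnapshot.InUse when an AMI still uses the snapshot.
            # Log it and continue instead of aborting the remaining deletions.
            logger.error(f"Error deleting snapshot {snapshot_id}: {e}")
            failed.append(snapshot_id)

    if failed:
        return {"statusCode": 500, "body": f"Failed to delete snapshots: {failed}"}
    return {"statusCode": 200}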

This Lambda function uses Boto3, the AWS SDK for Python, to list the account's EBS snapshots, check whether each snapshot's source volume still exists (describe_volume_status raises an InvalidVolume.NotFound error when it does not), and delete the snapshots whose volume is gone.
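The core decision ("is this snapshot's source volume gone?") can also be expressed as a pure function, which makes it easy to unit test without touching AWS. The sketch below is an illustration rather than the repository's code: find_orphaned_snapshots is a hypothetical name, and it assumes you have already gathered the IDs of volumes that still exist (for instance with describe_volumes, which would additionally require ec2:DescribeVolumes in the Lambda's IAM policy).

```python
def find_orphaned_snapshots(snapshots, existing_volume_ids):
    """Return the IDs of snapshots whose source volume no longer exists.

    `snapshots` has the shape of describe_snapshots()["Snapshots"];
    `existing_volume_ids` is any iterable of volume IDs that still exist.
    """
    existing = set(existing_volume_ids)
    orphaned = []
    for snapshot in snapshots:
        volume_id = snapshot.get("VolumeId")
        # Conservative choice: never flag a snapshot that reports no volume at all.
        if volume_id is not None and volume_id not in existing:
            orphaned.append(snapshot["SnapshotId"])
    return orphaned
```

Compared with calling describe_volume_status once per snapshot, collecting the existing volume IDs up front replaces one API call per snapshot with a few paginated calls; the trade-off is the extra IAM permission.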

Step 2: Terraform Configuration for Serverless Infrastructure

Using Terraform, we’ll create a Lambda function, IAM role, and policy to deploy this script to AWS. Additionally, we’ll set up an EventBridge rule to trigger Lambda on a regular schedule.

Open main.tf in your code editor and review the code section by section.

Terraform Setup and Provider Configuration

This section configures Terraform, including setting up remote state management in S3.

Note:

  • Set the required_version value to match the output of terraform -version.
  • Update the bucket, key, and dynamodb_table values for the S3 backend to match what you have created in the previous steps.
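terraform {
  required_version = ">=1.5.6"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.72.0"
    }
    # Used by the archive_file data source that zips the Python code.
    archive = {
      source  = "hashicorp/archive"
      version = "~> 2.0"
    }
  }
  # Backend settings must be literal values; variables are not allowed here.
  backend "s3" {
    bucket         = "terraform-state-files-0110"
    key            = "delete-orphan-snapshots/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "tf_state_file_locking"
  }
}

provider "aws" {
  # var.aws_region must be declared in a variable block (not shown in this excerpt).
  region = var.aws_region
}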
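For the required_version note above, the constraint can be derived from the installed Terraform binary. A small sketch, assuming you pass it the text printed by terraform -version (or the JSON printed by terraform version -json); required_version_constraint is a hypothetical helper that mirrors the >=X.Y.Z style used in main.tf.

```python
import json
import re


def required_version_constraint(version_output):
    """Build a required_version constraint from Terraform's version output.

    Accepts either the JSON from `terraform version -json` or plain text whose
    first line looks like "Terraform v1.9.8".
    """
    text = version_output.strip()
    if text.startswith("{"):
        version = json.loads(text)["terraform_version"]
    else:
        match = re.match(r"Terraform v(\d+\.\d+\.\d+)", text)
        if match is None:
            raise ValueError(f"unrecognised terraform version output: {text!r}")
        version = match.group(1)
    return f">={version}"
```

Pinning to the version you actually run keeps teammates on older binaries from writing state with a Terraform version the configuration was never tested against.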

IAM Role and Policy for Lambda
This IAM configuration grants the Lambda function the EC2 permissions it needs to find and delete snapshots, plus the CloudWatch Logs permissions it needs to write logs.

resource "aws_iam_role" "lambda_role" {
  name               = "terraform_orphan_snapshots_delete_role"
  assume_role_policy = <<EOF
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Action": "sts:AssumeRole",
          "Principal": { "Service": "lambda.amazonaws.com" },
          "Effect": "Allow"
        }
      ]
    }
EOF
}

resource "aws_iam_policy" "iam_policy_for_lambda" {
  name   = "terraform_orphan_snapshots_delete_policy"
  policy = <<EOF
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
              "logs:CreateLogGroup",
              "logs:CreateLogStream",
              "logs:PutLogEvents"
          ],
          "Resource": "arn:aws:logs:*:*:*"
        },
        {
          "Effect": "Allow",
          "Action": [
              "ec2:DescribeVolumeStatus",
              "ec2:DescribeSnapshots",
              "ec2:DeleteSnapshot"
          ],
          "Resource": "*"
        }
      ]
    }
EOF
}

resource "aws_iam_role_policy_attachment" "attach_iam_policy_to_iam_role" {
  role       = aws_iam_role.lambda_role.name
  policy_arn = aws_iam_policy.iam_policy_for_lambda.arn
}
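A quick way to double-check that the policy covers every EC2 call the Lambda function makes is to compare the two. The sketch below is a rough, hypothetical checker: iam_action_for converts a boto3 method name such as describe_volume_status to ec2:DescribeVolumeStatus, which holds for the three calls used here but is not a general rule for every AWS API, and missing_actions only considers Allow statements (it ignores Deny statements, resources, and conditions).

```python
import json
from fnmatch import fnmatchcase


def iam_action_for(service, method_name):
    """'describe_volume_status' -> 'ec2:DescribeVolumeStatus' (simple cases only)."""
    return f"{service}:" + "".join(part.capitalize() for part in method_name.split("_"))


def missing_actions(policy_json, required_actions):
    """Return the required actions that no Allow statement in the policy grants."""
    policy = json.loads(policy_json)
    patterns = []
    for statement in policy["Statement"]:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        # "Action" may be a single string or a list of strings.
        patterns.extend([actions] if isinstance(actions, str) else actions)
    # IAM action names are case-insensitive and may contain * and ? wildcards.
    return [
        action
        for action in required_actions
        if not any(fnmatchcase(action.lower(), p.lower()) for p in patterns)
    ]


# The EC2 client methods called by lambda_handler.
LAMBDA_EC2_CALLS = ["describe_snapshots", "describe_volume_status", "delete_snapshot"]
```

Pasting the JSON from the iam_policy_for_lambda heredoc into missing_actions(policy_text, [iam_action_for("ec2", m) for m in LAMBDA_EC2_CALLS]) should return an empty list.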

Packaging and Deploying the Lambda Function
Here, we package the Python code and deploy it as a Lambda function.

data "archive_file" "lambda_zip" {
  type        = "zip"
  source_file = "${path.module}/python/orphan-snapshots-delete.py"
  output_path = "${path.module}/python/orphan-snapshots-delete.zip"
}

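resource "aws_lambda_function" "lambda_function" {
  filename         = data.archive_file.lambda_zip.output_path
  # Redeploy when the zipped code changes; without this, edits to the Python
  # file may not trigger an update on terraform apply.
  source_code_hash = data.archive_file.lambda_zip.output_base64sha256
  function_name    = "orphan-snapshots-delete"
  role             = aws_iam_role.lambda_role.arn
  handler          = "orphan-snapshots-delete.lambda_handler"
  runtime          = "python3.12"
  timeout          = 30
}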
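The archive_file data source and the handler setting are easy to get subtly wrong, so here is a rough Python equivalent of what they do. It is a sketch for understanding, not a replacement for archive_file: the helper names are hypothetical, and the zip it produces will not be byte-identical to Terraform's (timestamps and metadata differ), so the hashes will differ too. The handler string is the source file name without .py, a dot, and the function name. If you set source_code_hash on the Lambda resource, it expects a base64-encoded SHA-256 of the zip, which is the format base64_sha256 produces.

```python
import base64
import hashlib
import io
import zipfile
from pathlib import Path


def build_lambda_zip(source_file):
    """Return zip bytes containing source_file at the archive root."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Store the file under its base name so Lambda can import it as a top-level module.
        archive.write(source_file, arcname=Path(source_file).name)
    return buffer.getvalue()


def base64_sha256(data):
    """Base64-encoded SHA-256 digest, the format Lambda uses for CodeSha256."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


def handler_string(source_file, function_name="lambda_handler"):
    """Lambda handler setting: '<file name without .py>.<function name>'."""
    return f"{Path(source_file).stem}.{function_name}"
```

For example, handler_string("python/orphan-snapshots-delete.py") returns "orphan-snapshots-delete.lambda_handler", which matches the handler value in main.tf.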

EventBridge Rule for Lambda Invocation
AWS EventBridge lets you create scheduled or event-based triggers for Lambda functions. Here, we configure EventBridge to invoke the Lambda function on a schedule: every 24 hours.

You can learn more about EventBridge and scheduled events in AWS documentation here.

resource "aws_cloudwatch_event_rule" "schedule_rule" {
  name                = "orphan-snapshots-schedule-rule"
  description         = "Trigger Lambda every day to delete orphaned snapshots"
  schedule_expression = "rate(24 hours)"
}

resource "aws_cloudwatch_event_target" "target" {
  rule = aws_cloudwatch_event_rule.schedule_rule.name
  arn  = aws_lambda_function.lambda_function.arn
}

resource "aws_lambda_permission" "allow_eventbridge" {
  statement_id  = "AllowExecutionFromEventBridge"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.lambda_function.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.schedule_rule.arn
}
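schedule_expression accepts EventBridge rate expressions (as well as cron expressions). For rate expressions, the unit must be singular when the value is 1 and plural otherwise, so rate(1 day) is valid but rate(1 days) is not. The hypothetical validator below checks that rule locally and converts the expression to a timedelta; it is a sketch that covers rate expressions only, not cron.

```python
import re
from datetime import timedelta

# Singular unit -> the matching timedelta keyword argument.
_UNITS = {"minute": "minutes", "hour": "hours", "day": "days"}
_RATE = re.compile(r"rate\((\d+) (minute|minutes|hour|hours|day|days)\)")


def parse_rate_expression(expression):
    """Convert an EventBridge rate expression such as 'rate(24 hours)' to a timedelta."""
    match = _RATE.fullmatch(expression.strip())
    if match is None:
        raise ValueError(f"not a rate expression: {expression!r}")
    value, unit = int(match.group(1)), match.group(2)
    if value <= 0:
        raise ValueError("the rate value must be a positive integer")
    singular = unit.rstrip("s")
    # Singular unit only for a value of 1, plural unit for anything larger.
    if (value == 1) != (unit == singular):
        raise ValueError(f"use '{singular}' for a value of 1 and '{_UNITS[singular]}' otherwise")
    return timedelta(**{_UNITS[singular]: value})
```

Both parse_rate_expression("rate(24 hours)") and parse_rate_expression("rate(1 day)") return timedelta(days=1), so either expression gives the daily schedule used here.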

Step 3: Applying the Terraform Configuration

After defining the infrastructure, initialize and apply the Terraform configuration:

terraform init
terraform apply

Step 4: Testing and Monitoring the Lambda Function

To verify that the solution works:

  1. Manually Trigger the Event (optional): For initial testing, trigger the Lambda function manually from the AWS Lambda console.
  2. Monitor CloudWatch Logs: The Lambda function writes logs to CloudWatch, where you can review entries to verify snapshot deletions.
  3. Adjust the Schedule as Needed: Modify the schedule_expression to set a custom frequency for snapshot cleanup.
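Steps 1 and 2 can also be scripted. The sketch below assumes boto3 clients and the function name orphan-snapshots-delete from main.tf; invoke_cleanup and deleted_snapshot_ids are hypothetical helpers. By default, Lambda writes to the log group /aws/lambda/FUNCTION_NAME, and the handler logs one "Deleted SnapshotId ..." line per deleted snapshot. Keep in mind that a manual invocation deletes snapshots exactly as a scheduled run would.

```python
import json
import re

# function_name from main.tf; Lambda logs to /aws/lambda/<function name> by default.
FUNCTION_NAME = "orphan-snapshots-delete"
LOG_GROUP = f"/aws/lambda/{FUNCTION_NAME}"

# Matches the message the handler logs after each successful delete_snapshot call.
_DELETED = re.compile(r"Deleted SnapshotId (snap-[0-9a-f]+)")


def invoke_cleanup(lambda_client, function_name=FUNCTION_NAME):
    """Invoke the function synchronously and return the handler's return value.

    `lambda_client` is assumed to behave like boto3.client("lambda").
    """
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",
        Payload=json.dumps({}).encode("utf-8"),
    )
    # "Payload" is a readable stream holding the handler's JSON result.
    result = json.loads(response["Payload"].read())
    # "FunctionError" is present only when the invocation failed inside the function.
    if "FunctionError" in response:
        raise RuntimeError(f"Lambda function error: {result}")
    return result


def deleted_snapshot_ids(log_messages):
    """Extract snapshot IDs from the function's "Deleted SnapshotId ..." log lines."""
    return [m.group(1) for message in log_messages for m in _DELETED.finditer(message)]
```

For example, after invoke_cleanup(boto3.client("lambda")), you could fetch the first page of matching log events with boto3.client("logs").filter_log_events(logGroupName=LOG_GROUP, filterPattern='"Deleted SnapshotId"')["events"] and pass each event's "message" to deleted_snapshot_ids.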

Enhancements

The following enhancements could be implemented in this project:

  1. Instead of running on a schedule, the Lambda function could be triggered by an EventBridge rule that detects EBS volume deletions, and then delete the snapshots of the deleted volume.

  2. Pagination could be added to the Python function to handle accounts with a large number of snapshots.
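Both enhancements can be sketched in Python. The code below is a hypothetical starting point, not a tested implementation: list_own_snapshots and snapshots_for_volume use boto3's describe_snapshots paginator (the latter with the volume-id filter), and deleted_volume_id assumes the EventBridge "EBS Volume Notification" event shape shown in its docstring, which you should verify against a real event before relying on it. An event-driven trigger would also need its own aws_cloudwatch_event_rule with an event_pattern (source aws.ec2, detail-type EBS Volume Notification) instead of a schedule_expression.

```python
def list_own_snapshots(ec2_client):
    """Yield every snapshot owned by this account, page by page.

    `ec2_client` is assumed to behave like boto3.client("ec2").
    """
    paginator = ec2_client.get_paginator("describe_snapshots")
    for page in paginator.paginate(OwnerIds=["self"]):
        yield from page["Snapshots"]


def snapshots_for_volume(ec2_client, volume_id):
    """Return the IDs of this account's snapshots that were taken from volume_id."""
    paginator = ec2_client.get_paginator("describe_snapshots")
    snapshot_ids = []
    for page in paginator.paginate(
        OwnerIds=["self"],
        Filters=[{"Name": "volume-id", "Values": [volume_id]}],
    ):
        snapshot_ids.extend(snapshot["SnapshotId"] for snapshot in page["Snapshots"])
    return snapshot_ids


def deleted_volume_id(event):
    """Return the volume ID from an EBS deleteVolume notification, else None.

    Assumed (abridged) event shape:
        {"source": "aws.ec2",
         "detail-type": "EBS Volume Notification",
         "resources": ["arn:aws:ec2:us-east-1:123456789012:volume/vol-0abc"],
         "detail": {"event": "deleteVolume", "result": "deleted"}}
    """
    if event.get("detail-type") != "EBS Volume Notification":
        return None
    detail = event.get("detail", {})
    if detail.get("event") != "deleteVolume" or detail.get("result") != "deleted":
        return None
    resources = event.get("resources") or []
    if not resources:
        return None
    # The volume ID is the last path segment of the volume ARN.
    return resources[0].rsplit("/", 1)[-1]
```

An event-driven handler could pass the ID returned by deleted_volume_id(event) to snapshots_for_volume and call delete_snapshot for each result.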

Wrapping Up
By combining Python (Boto3), Lambda, AWS EventBridge, and Terraform, we've created a fully automated, serverless solution to clean up orphaned EBS snapshots. This setup reduces cloud costs and keeps your AWS environment tidy. With scheduled invocations, orphaned snapshots are removed regularly without manual effort.

Try this solution in your own AWS account and experience the benefits of automation in cloud resource management!

Please feel free to share your thoughts on this article in the comments section. Thank you for reading.
