Atsushi Suzuki

Automating Monthly Data Deletion for Aurora MySQL with ECS, EventBridge, and DynamoDB

Manually deleting large volumes of unnecessary data from Aurora MySQL is highly inefficient. I therefore implemented a batch process that runs every day and automatically deletes the data one month at a time.

Optimizing Aurora MySQL Storage by Deleting Unnecessary Data

Initially, I considered handling the batch process with EventBridge and Lambda. However, Lambda's maximum timeout of 15 minutes wasn't enough to complete the task, so I ruled Lambda out.

Instead, I switched to using ECS on Fargate, scheduling daily tasks via EventBridge. This change ensures that the process runs regularly and without timeout issues.

Additionally, I chose DynamoDB to track which data has been deleted. Since DynamoDB offers a free tier, it seemed like a cost-effective solution for storing the deletion progress.

Amazon DynamoDB pricing

Steps to Implement

1. Create a DynamoDB Table

To manage the progress of the data deletion, I used DynamoDB. The table was created with the following settings, leaving everything else at the defaults (an equivalent AWS CLI command is sketched after the list):

  • Table Name: deletion_progress
  • Partition Key: id (String)
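
For reference, here is a minimal AWS CLI sketch that creates the same table (the on-demand billing mode is my assumption; any mode works):

aws dynamodb create-table \
    --table-name deletion_progress \
    --attribute-definitions AttributeName=id,AttributeType=S \
    --key-schema AttributeName=id,KeyType=HASH \
    --billing-mode PAY_PER_REQUEST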


2. Create an IAM Policy for DynamoDB Access

Since the application in the ECS task needs to retrieve, store, and update data in DynamoDB, I created an IAM policy for those DynamoDB operations. One point to be careful about: the policy belongs on the ECS task role (ecsTaskRole, referenced as taskRoleArn in the task definition later in this post), because that is the role the application code assumes at runtime. The task execution role (ecsTaskExecutionRole) is only used by the ECS agent for things like pulling the image and writing logs.

Below is the policy. Make sure to replace the Resource section with the ARN of the DynamoDB table created earlier.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:GetItem",
                "dynamodb:PutItem",
                "dynamodb:UpdateItem"
            ],
            "Resource": "arn:aws:dynamodb:your-region:your-account-id:table/your-table-name"
        }
    ]
}
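
If you prefer the CLI, the policy can be attached as an inline policy. A sketch, assuming the JSON above is saved as dynamodb-policy.json and the role name matches the task definition later in this post:

aws iam put-role-policy \
    --role-name ecsTaskRole \
    --policy-name deletion-progress-dynamodb \
    --policy-document file://dynamodb-policy.json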

3. Set Up Security Groups for ECS

To allow the ECS task to reach both Aurora and DynamoDB, I configured security group rules: egress on port 3306 toward the Aurora security group, and egress on port 443 for DynamoDB, which is accessed over HTTPS.

Here's an example using Terraform:

resource "aws_security_group_rule" "ecs_sg_egress_rds" {
  type                     = "egress"
  from_port                = 3306
  to_port                  = 3306
  protocol                 = "tcp"
  security_group_id        = aws_security_group.ecs_sg.id
  source_security_group_id = aws_security_group.rds_sg.id
}

resource "aws_security_group_rule" "ecs_sg_egress_https" {
  type              = "egress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  security_group_id = aws_security_group.ecs_sg.id
  cidr_blocks       = ["0.0.0.0/0"]
}
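
Egress from the ECS task is only half of the pair: the Aurora security group also needs a matching inbound rule on port 3306. A sketch assuming the same resource names as above:

resource "aws_security_group_rule" "rds_sg_ingress_ecs" {
  type                     = "ingress"
  from_port                = 3306
  to_port                  = 3306
  protocol                 = "tcp"
  security_group_id        = aws_security_group.rds_sg.id
  source_security_group_id = aws_security_group.ecs_sg.id
}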

4. Create an ECR Repository

I created an ECR repository called daily-deletion to store the Docker image used by the ECS task.
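
The repository can also be created from the CLI (region assumed to be ap-northeast-1, as elsewhere in this post):

aws ecr create-repository --repository-name daily-deletion --region ap-northeast-1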


5. Application Code (Python)

Here is the Python script that deletes records from the operations table in Aurora MySQL, one month per run, covering April 2022 through May 2024. The deletion progress (last_year, last_month) is stored in DynamoDB and read back on the next execution; for example, if the last run processed February 2024, the next run deletes rows with created_at >= '2024-03-01' and created_at < '2024-04-01'.

import os
import traceback
import pymysql
import boto3
from datetime import datetime, timedelta
from logging import INFO, Formatter, StreamHandler, getLogger

RDS_HOST = os.getenv('RDS_HOST')
DB_USER = os.getenv('DB_USER')
DB_PASSWORD = os.getenv('DB_PASSWORD')
DB_NAME = os.getenv('DB_NAME')
DYNAMODB_TABLE = os.getenv('DYNAMODB_TABLE')

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table(DYNAMODB_TABLE)

logger = getLogger()
logger.setLevel(INFO)

if not logger.hasHandlers():
    handler = StreamHandler()
    logger.addHandler(handler)

    formatter = Formatter(
        '[%(levelname)s]\t%(asctime)s\t%(message)s\n',
        '%Y-%m-%d %H:%M:%S'
    )

    for handler in logger.handlers:
        handler.setFormatter(formatter)

def connect_to_db():  # Function to connect to MySQL
    try:
        logger.info("Attempting to connect to the database")
        connection = pymysql.connect(host=RDS_HOST,
                                     port=3306,
                                     user=DB_USER,
                                     password=DB_PASSWORD,
                                     db=DB_NAME,
                                     cursorclass=pymysql.cursors.DictCursor)
        logger.info("Database connection established successfully")
        return connection
    except pymysql.MySQLError as e:
        logger.error(f"Database connection failed: {e}")
        raise

def get_deletion_progress():  # Get progress from DynamoDB
    response = table.get_item(Key={'id': 'progress'})
    if 'Item' in response:
        last_year = int(response['Item']['last_year'])
        last_month = int(response['Item']['last_month'])

        logger.info(f"Retrieved from DynamoDB - last_year: {last_year}, last_month: {last_month}")
        return last_year, last_month
    else:
        # No progress stored yet; return 2022-03 so the first run deletes April 2022
        return 2022, 3

def update_deletion_progress(year, month):  # Save progress to DynamoDB
    table.put_item(Item={
        'id': 'progress',
        'last_year': year,
        'last_month': month
    })

def delete_records(year, month):  # Delete records for a specific month
    connection = None  # Defined up front so the finally block works even if connecting fails
    try:
        logger.info(f"Starting record deletion for {year}-{month:02d}")
        connection = connect_to_db()
        with connection.cursor() as cursor:
            # Calculate start and end dates
            start_date = f'{year}-{month:02d}-01'
            end_date = (datetime(year, month, 1) + timedelta(days=32)).replace(day=1).strftime('%Y-%m-%d')

            logger.info(f"Deleting records from {start_date} to {end_date}")

            # The table name can't be bound as a query parameter, so it is interpolated;
            # the date range is passed as bound parameters
            delete_query = f"""
                DELETE FROM {DB_NAME}.operations
                WHERE created_at >= %s AND created_at < %s;
            """
            cursor.execute(delete_query, (start_date, end_date))
            connection.commit()
            logger.info(f"Deleted records for {year}-{month:02d}")
    except Exception as e:
        logger.error(f"Error deleting records: {e}")
        raise
    finally:
        if connection:
            connection.close()

def main():
    last_year, last_month = get_deletion_progress()

    # Move to the next month
    if last_month == 12:
        next_year = last_year + 1
        next_month = 1
    else:
        next_year = last_year
        next_month = last_month + 1

    # Stop after May 2024
    if next_year > 2024 or (next_year == 2024 and next_month > 5):
        logger.info("Deletion process has completed. No more records to delete.")
        return

    # Delete records and update progress
    try:
        delete_records(next_year, next_month)
        update_deletion_progress(next_year, next_month)
        logger.info(f"Deletion progress updated to {next_year}-{next_month:02d}")
    except Exception as e:
        logger.error(f"Error during deletion process: {e}")
        logger.error(traceback.format_exc())

if __name__ == "__main__":
    main()

For security, the MySQL connection details are read from environment variables rather than hard-coded (in production, injecting DB_PASSWORD through the task definition's secrets field, backed by Secrets Manager or SSM Parameter Store, would be safer still than a plain environment variable):

connection = pymysql.connect(host=RDS_HOST,
                             port=3306,
                             user=DB_USER,
                             password=DB_PASSWORD,
                             db=DB_NAME)

Also, since DynamoDB returns numeric values as decimal.Decimal, they are converted with int() so the rest of the script can treat them as plain integers:

last_year = int(response['Item']['last_year'])
last_month = int(response['Item']['last_month'])

6. Build and Push the Docker Image

Next, the Docker image for the ECS task needs to be built locally and pushed to the previously created ECR repository.

First, create a Dockerfile and a requirements.txt in the same directory as main.py. The Dockerfile:

FROM python:3.12-slim

WORKDIR /usr/src/app

COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["python", "./main.py"]

And the requirements.txt:

boto3
pymysql

Build the image locally. Since I'm working on an M2 Mac (Arm architecture), I used buildx with --platform linux/amd64 so the image matches Fargate's default x86_64 runtime:

docker buildx build --platform linux/amd64 -f Dockerfile -t <account_id>.dkr.ecr.ap-northeast-1.amazonaws.com/daily-deletion:latest .

Log in to ECR:

aws ecr get-login-password --region ap-northeast-1 | docker login --username AWS --password-stdin <account_id>.dkr.ecr.ap-northeast-1.amazonaws.com

Push the built image to ECR:

docker push <account_id>.dkr.ecr.ap-northeast-1.amazonaws.com/daily-deletion:latest

Once pushed, verify that the image is in the ECR repository.
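
You can check this in the console or with a quick CLI call:

aws ecr describe-images --repository-name daily-deletion --region ap-northeast-1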


7. Create the ECS Task Definition

Create the ECS task definition in JSON format. Environment variables required by the application are defined under environment. Depending on the data volume, memory and cpu values can be adjusted accordingly.

Also, ensure that "awslogs-create-group": "true" is specified so the CloudWatch log group is created automatically on the first task execution; without it, the first run fails because the log group does not exist yet.
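
One caveat: creating the log group on the fly requires the task execution role to be allowed to call logs:CreateLogGroup. A minimal statement for that, assuming the log group name used below:

{
    "Effect": "Allow",
    "Action": "logs:CreateLogGroup",
    "Resource": "arn:aws:logs:ap-northeast-1:<account_id>:log-group:/ecs/daily-deletion*"
}

With that in place, here is the task definition: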

{
  "family": "daily-deletion-task",
  "networkMode": "awsvpc",
  "containerDefinitions": [
    {
      "name": "daily-deletion-container",
      "image": "<account_id>.dkr.ecr.ap-northeast-1.amazonaws.com/daily-deletion:latest",
      "essential": true,
      "memory": 512,
      "cpu": 256,
      "environment": [
        { "name": "RDS_HOST", "value": "<cluster_endpoint_name>" },
        { "name": "DB_USER", "value": "<mysql_user_name>" },
        { "name": "DB_PASSWORD", "value": "<mysql_user_password>" },
        { "name": "DB_NAME", "value": "<db_schema_name>" },
        { "name": "DYNAMODB_TABLE", "value": "deletion_progress" }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/daily-deletion",
          "awslogs-create-group": "true",
          "awslogs-region": "ap-northeast-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ],
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "taskRoleArn": "arn:aws:iam::<account_id>:role/ecsTaskRole",
  "executionRoleArn": "arn:aws:iam::<account_id>:role/ecsTaskExecutionRole"
}
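
Saving the JSON as task-definition.json, it can be registered with:

aws ecs register-task-definition --cli-input-json file://task-definition.json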


8. Create an EventBridge Schedule

Previously, I had assumed that an ECS service was required before tasks could run, but I learned that for batch processing, EventBridge can trigger ECS tasks directly, with no service needed.

Using Amazon EventBridge Scheduler to schedule Amazon ECS tasks

In the EventBridge > Schedules section, create a new schedule (in this case, for daily execution at 3 AM) and specify the task definition, subnets, security groups, and so on.
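
The equivalent schedule can be sketched from the CLI. Every ARN and ID below is a placeholder, the timezone assumes JST, and the RoleArn must point to a scheduler execution role allowed to call ecs:RunTask (and iam:PassRole for the task's roles):

aws scheduler create-schedule \
    --name daily-deletion-schedule \
    --schedule-expression "cron(0 3 * * ? *)" \
    --schedule-expression-timezone "Asia/Tokyo" \
    --flexible-time-window '{"Mode": "OFF"}' \
    --target '{
        "Arn": "arn:aws:ecs:ap-northeast-1:<account_id>:cluster/<cluster_name>",
        "RoleArn": "arn:aws:iam::<account_id>:role/<scheduler_role>",
        "EcsParameters": {
            "TaskDefinitionArn": "arn:aws:ecs:ap-northeast-1:<account_id>:task-definition/daily-deletion-task",
            "LaunchType": "FARGATE",
            "NetworkConfiguration": {
                "awsvpcConfiguration": {
                    "Subnets": ["<subnet_id>"],
                    "SecurityGroups": ["<security_group_id>"],
                    "AssignPublicIp": "ENABLED"
                }
            }
        }
    }'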



Verifying the Process

I confirmed that the records were successfully deleted by checking the CloudWatch logs.


Conclusion

In this project, I successfully automated the deletion of unnecessary data from Aurora MySQL using ECS, EventBridge, and DynamoDB. By leveraging these AWS services, I was able to overcome the limitations of Lambda and ensure a scalable and efficient process. Additionally, DynamoDB's free tier allowed for cost-effective progress tracking. This solution can be adapted for various use cases where large-scale data management is required, providing both flexibility and control over the deletion process.
