Introduction
In this technical blog, we will explore the SportsDataBackup project, which automates fetching sports highlights, storing data in Amazon S3 and DynamoDB, processing videos, and running on a schedule using AWS ECS Fargate and EventBridge. This guide will walk you through the setup, configuration, and deployment of this cloud-native automation.
Project Overview
The SportsDataBackup system is designed to:
- Retrieve sports highlights from RapidAPI.
- Store metadata in Amazon DynamoDB.
- Save highlight videos in Amazon S3.
- Process videos using AWS MediaConvert.
- Schedule execution using AWS EventBridge and ECS Fargate.
- Monitor logs via Amazon CloudWatch.
Prerequisites
Before we begin, make sure the following accounts and tools are set up:
1. Create a RapidAPI Account
- Register on RapidAPI.
- Get your API key for accessing sports highlights.
2. Install Required Tools
- Docker (pre-installed in most environments)
docker --version
- AWS CLI (pre-installed in AWS CloudShell)
aws --version
- Python 3
python3 --version
- gettext (for environment variable substitution)
Install on Ubuntu/Debian: sudo apt install gettext
Install on macOS (Homebrew): brew install gettext
Install on Windows (Chocolatey): choco install gettext
Alternatively, follow the official installation instructions for your platform.
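To confirm that envsubst (the gettext tool this project relies on) is working, you can try a quick one-liner; it should print value: hello:
export GREETING=hello
echo 'value: ${GREETING}' | envsubst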
3. Retrieve AWS Account ID
- Run the following command:
aws sts get-caller-identity --query "Account" --output text
- Save your AWS Account ID for later.
4. Retrieve AWS Access Keys
- Navigate to IAM Dashboard > Users > Security Credentials.
- Create and save Access Key and Secret Access Key.
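If you are working outside AWS CloudShell, you can wire these credentials into the AWS CLI. A minimal sketch (replace the placeholders with your own values):
aws configure set aws_access_key_id your-access-key
aws configure set aws_secret_access_key your-secret-access-key
aws configure set region us-east-1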
Step-by-Step Setup
Step 1: Clone the Repository
git clone https://github.com/princemaxi/SportsDataBackup
cd SportsDataBackup/src
Step 2: Configure Environment Variables
Modify the .env file with the relevant values:
AWS_ACCOUNT_ID=your-account-id
AWS_ACCESS_KEY=your-access-key
AWS_SECRET_ACCESS_KEY=your-secret-access-key
AWS_REGION=us-east-1
S3_BUCKET_NAME=your-s3-bucket
RAPIDAPI_KEY=your-rapidapi-key
MEDIA_CONVERT_ENDPOINT=your-mediaconvert-endpoint
SUBNET_ID=subnet-xxx
SECURITY_GROUP_ID=sg-xxx
To retrieve your MediaConvert endpoint, run the following command and paste the returned URL into MEDIA_CONVERT_ENDPOINT:
aws mediaconvert describe-endpoints --query "Endpoints[0].Url" --output text
Steps for getting the Subnet ID and Security Group ID:
- In the GitHub repo, there is a resources folder; copy its entire contents.
- In AWS CloudShell or the VS Code terminal, create a file named vpc_setup.sh and paste the script into it.
- Run the script:
bash vpc_setup.sh
- The output prints the subnet and security group IDs; paste these values into SUBNET_ID and SECURITY_GROUP_ID.
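Before updating .env, you can verify that the IDs exist in your account (replace the placeholders with the values the script printed):
aws ec2 describe-subnets --subnet-ids subnet-xxx --query "Subnets[0].AvailabilityZone" --output text
aws ec2 describe-security-groups --group-ids sg-xxx --query "SecurityGroups[0].GroupName" --output text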
Step 3: Load Environment Variables
set -a
source .env
set +a
Verify the variables (AWS_LOGS_GROUP and TASK_FAMILY are defined in the repository's .env file):
echo $AWS_LOGS_GROUP
echo $TASK_FAMILY
echo $AWS_ACCOUNT_ID
Step 4: Generate JSON Configuration Files
Use envsubst to replace placeholders in template files:
envsubst < taskdef.template.json > taskdef.json
envsubst < s3_dynamodb_policy.template.json > s3_dynamodb_policy.json
envsubst < ecsTarget.template.json > ecsTarget.json
envsubst < ecseventsrole-policy.template.json > ecseventsrole-policy.json
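A quick sanity check is to confirm no unsubstituted placeholders remain in the generated files; if substitution succeeded, the grep finds nothing and the fallback message prints:
grep -Fn '${' taskdef.json s3_dynamodb_policy.json ecsTarget.json ecseventsrole-policy.json || echo "All placeholders substituted"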
Step 5: Build and Push Docker Image
- Create an ECR Repository
aws ecr create-repository --repository-name sports-backup
- Login to AWS ECR
aws ecr get-login-password --region ${AWS_REGION} | docker login --username AWS --password-stdin ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com
- Build Docker Image
docker build -t sports-backup .
- Tag and Push the Image
docker tag sports-backup:latest ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/sports-backup:latest
docker push ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/sports-backup:latest
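As a quick verification, you can list the image tags now stored in the ECR repository:
aws ecr describe-images --repository-name sports-backup --query "imageDetails[].imageTags" --output table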
Step 6: Create AWS Resources
- Register the ECS Task Definition
aws ecs register-task-definition --cli-input-json file://taskdef.json --region ${AWS_REGION}
- Create CloudWatch Log Group
aws logs create-log-group --log-group-name "${AWS_LOGS_GROUP}" --region ${AWS_REGION}
- Attach S3/DynamoDB Policy to ECS Task Execution Role
aws iam put-role-policy --role-name ecsTaskExecutionRole --policy-name S3DynamoDBAccessPolicy --policy-document file://s3_dynamodb_policy.json
- Create ECS Events Role (uses the trust policy file ecsEventsRole-trust.json from the repo; a sketch of its contents follows this list)
aws iam create-role --role-name ecsEventsRole --assume-role-policy-document file://ecsEventsRole-trust.json
- Attach ECS Events Role Policy
aws iam put-role-policy --role-name ecsEventsRole --policy-name ecsEventsPolicy --policy-document file://ecseventsrole-policy.json
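The create-role command above expects ecsEventsRole-trust.json, which ships with the repository. If you need to recreate it, a minimal trust policy that lets EventBridge assume the role looks like this (a sketch; check the repo's file for the authoritative version):
cat > ecsEventsRole-trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "events.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF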
Step 7: Schedule the Task with AWS EventBridge
- Create the EventBridge Rule
aws events put-rule --name SportsBackupScheduleRule --schedule-expression "rate(1 day)" --region ${AWS_REGION}
- Add Target to the Rule
aws events put-targets --rule SportsBackupScheduleRule --targets file://ecsTarget.json --region ${AWS_REGION}
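To confirm the schedule is wired up, you can list the targets attached to the rule:
aws events list-targets-by-rule --rule SportsBackupScheduleRule --region ${AWS_REGION}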
Step 8: Manually Test the ECS Task
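The run-task command below assumes a cluster named sports-backup-cluster. If it does not exist yet, create it first:
aws ecs create-cluster --cluster-name sports-backup-cluster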
aws ecs run-task \
--cluster sports-backup-cluster \
--launch-type FARGATE \
--task-definition ${TASK_FAMILY} \
--network-configuration "awsvpcConfiguration={subnets=[\"${SUBNET_ID}\"],securityGroups=[\"${SECURITY_GROUP_ID}\"],assignPublicIp=\"ENABLED\"}" \
--region ${AWS_REGION}
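Once the task is running, you can stream its output from CloudWatch (this assumes AWS CLI v2, which provides the logs tail command):
aws logs tail "${AWS_LOGS_GROUP}" --follow --region ${AWS_REGION}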
Key Learnings
- Using templated JSON files for automated AWS configurations.
- Storing and backing up sports highlight data in DynamoDB and S3.
- Deploying event-driven ECS tasks with AWS Fargate and EventBridge.
- Monitoring logs and task execution in CloudWatch.
Future Enhancements
- Automated backup of DynamoDB tables to S3.
- Batch processing of JSON files to handle multiple videos per execution.
Conclusion
The SportsDataBackup project demonstrates how AWS services can be combined to automate data ingestion, storage, processing, and scheduling. With a fully automated setup, this solution ensures reliable backup and processing of sports highlights using AWS cloud-native tools.
If you're interested in contributing, feel free to fork the repository and submit pull requests! 🚀