This post explores the SportsDataBackup project, which fetches sports highlights, stores the data in Amazon S3 and DynamoDB, processes the videos, and runs on a schedule using AWS ECS Fargate and EventBridge. The guide walks through the setup, configuration, and deployment of this cloud-native automation.
Project Overview
The SportsDataBackup system is designed to:
- Retrieve sports highlights from RapidAPI.
- Store metadata in Amazon DynamoDB.
- Save highlight videos in Amazon S3.
- Process videos using AWS Elemental MediaConvert.
- Schedule execution using Amazon EventBridge and ECS Fargate.
- Monitor logs via Amazon CloudWatch.
Prerequisites
Before we begin, complete the following prerequisites:
1. Create a RapidAPI Account
Register on RapidAPI.
Get your API key to access sports highlights.
2. Install Required Tools
Docker (Pre-installed in most environments)
docker --version
AWS CLI (Pre-installed in AWS CloudShell)
aws --version
Python 3
python3 --version
- gettext (For environment variable substitution)
Install on Ubuntu/Debian: sudo apt install gettext
Install on macOS (Homebrew): brew install gettext
Install on Windows (Chocolatey): choco install gettext
Alternatively, follow the gettext installation instructions for your platform.
3. Retrieve AWS Account ID
- Run the following command:
aws sts get-caller-identity --query "Account" --output text
- Save your AWS Account ID for later.
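Because the account ID is reused in later commands (the ECR registry address, for example), it can be convenient to capture it in a shell variable instead of copying it by hand. A minimal sketch, assuming the variable name AWS_ACCOUNT_ID used later in this guide; the is_valid_account_id helper is hypothetical and only checks the 12-digit format:

```shell
# Hypothetical helper: succeeds only if the argument is exactly 12 digits.
is_valid_account_id() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "${#1}" -eq 12 ]
}

# Only call AWS when the CLI is available, so this is safe to source anywhere.
if command -v aws >/dev/null 2>&1; then
  AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query "Account" --output text 2>/dev/null || true)"
  if is_valid_account_id "${AWS_ACCOUNT_ID}"; then
    export AWS_ACCOUNT_ID
    echo "AWS_ACCOUNT_ID=${AWS_ACCOUNT_ID}"
  else
    echo "Could not determine the account ID; check your AWS credentials." >&2
  fi
fi
```

If the lookup fails, the variable is left unexported and the message points at the most likely cause, which is missing or expired credentials.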
4. Retrieve AWS Access Keys
- In the IAM console, go to Users, select your user, and open the Security credentials tab.
- Create an access key, then save the Access Key ID and Secret Access Key somewhere secure.
Step-by-Step Setup
Step 1: Clone the Repository
git clone
cd SportsDataBackup/src
Step 2: Configure Environment Variables
Modify the .env file with the relevant values. To look up the MediaConvert endpoint for your account, run:
aws mediaconvert describe-endpoints --query "Endpoints[0].Url" --output text
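The exact contents of the project's .env file are not reproduced in this guide, so the following is only a sketch of its shape. It lists the variable names that later commands in this guide reference. All values are placeholders, and the real file likely contains more entries (for example the RapidAPI key and the MediaConvert endpoint returned by the command above) under names defined by the repository.

```shell
# Sketch of a .env file: placeholder values only, replace each with your own.
# Only variables referenced later in this guide are listed here.
AWS_ACCOUNT_ID="123456789012"             # from `aws sts get-caller-identity`
AWS_REGION="us-east-1"                    # the region you deploy to
AWS_LOGS_GROUP="/ecs/sports-backup"       # hypothetical log group name
TASK_FAMILY="sports-backup-task"          # hypothetical ECS task family name
SUBNET_ID="subnet-0123456789abcdef0"      # placeholder subnet ID
SECURITY_GROUP_ID="sg-0123456789abcdef0"  # placeholder security group ID
```

Quoting every value keeps the file safe to load with `source`, even when a value contains characters such as `/`.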
Steps for getting the Subnet ID and Security Group ID:
- In the GitHub repo, open the resources folder and copy the entire contents of the script in it.
- In AWS CloudShell or the VS Code terminal, create a file and paste the script into it.
- Run the script.
- The output lists the resulting values; paste them into SUBNET_ID and SECURITY_GROUP_ID in the .env file.
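If you would rather reuse existing networking than run the repository's script, the IDs can also be read from the region's default VPC. This is an alternative sketch, not what the repository's script does. It assumes the default VPC has not been deleted, and the three function names are hypothetical:

```shell
# Alternative sketch (not the repository's script): read IDs from the default VPC.
# Each function prints a single ID; the first argument is the AWS region.

default_vpc_id() {
  aws ec2 describe-vpcs --region "$1" \
    --filters "Name=is-default,Values=true" \
    --query "Vpcs[0].VpcId" --output text
}

# Second argument: the VPC ID to search in.
first_subnet_id() {
  aws ec2 describe-subnets --region "$1" \
    --filters "Name=vpc-id,Values=$2" \
    --query "Subnets[0].SubnetId" --output text
}

# Second argument: the VPC ID whose "default" security group is wanted.
default_security_group_id() {
  aws ec2 describe-security-groups --region "$1" \
    --filters "Name=vpc-id,Values=$2" "Name=group-name,Values=default" \
    --query "SecurityGroups[0].GroupId" --output text
}

# Usage (commented out so the functions can be sourced without calling AWS):
#   VPC_ID="$(default_vpc_id "${AWS_REGION}")"
#   SUBNET_ID="$(first_subnet_id "${AWS_REGION}" "${VPC_ID}")"
#   SECURITY_GROUP_ID="$(default_security_group_id "${AWS_REGION}" "${VPC_ID}")"
```

Filtering both lookups by the same VPC ID keeps the subnet and the security group in one VPC, which a Fargate task's network configuration requires.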
Step 3: Load Environment Variables
set -a
source .env
set +a
Verify the variables:
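One way to verify them is to print each variable the later steps depend on and flag any that are empty. A small sketch; check_required_vars is a hypothetical helper, and the list of names is taken from the commands used later in this guide:

```shell
# Print each named variable and return non-zero if any is unset or empty.
# Note: values are echoed, so avoid passing the names of secrets.
check_required_vars() {
  local name value missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${${name}:-}"
    if [ -z "${value}" ]; then
      echo "MISSING: ${name}" >&2
      missing=1
    else
      echo "${name}=${value}"
    fi
  done
  return "${missing}"
}

check_required_vars AWS_ACCOUNT_ID AWS_REGION AWS_LOGS_GROUP TASK_FAMILY \
  SUBNET_ID SECURITY_GROUP_ID \
  || echo "Fix the .env file and load it again before continuing." >&2
```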
Step 4: Generate JSON Configuration Files
Use envsubst to replace placeholders in template files:
envsubst < taskdef.template.json > taskdef.json
envsubst < s3_dynamodb_policy.template.json > s3_dynamodb_policy.json
envsubst < ecsTarget.template.json > ecsTarget.json
envsubst < ecseventsrole-policy.template.json > ecseventsrole-policy.json
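Note that envsubst replaces any unset variable with an empty string without warning, which can produce JSON that looks complete but contains blank values. The sketch below checks each template's variables before rendering it. It assumes the templates use the braced ${VAR} placeholder form; template_vars and check_template_vars are hypothetical helpers:

```shell
# List the distinct ${VAR} names referenced in a template file.
# Assumes placeholders use the braced ${VAR} form.
template_vars() {
  grep -o '\${[A-Za-z_][A-Za-z0-9_]*}' "$1" | tr -d '${}' | sort -u
}

# Return non-zero if any variable referenced by the template is unset or empty.
check_template_vars() {
  local template="$1" name value status=0
  for name in $(template_vars "${template}"); do
    eval "value=\${${name}:-}"
    if [ -z "${value}" ]; then
      echo "${template}: ${name} is unset or empty" >&2
      status=1
    fi
  done
  return "${status}"
}

# Render each template only when it exists and all of its variables are set.
for template in taskdef.template.json s3_dynamodb_policy.template.json \
                ecsTarget.template.json ecseventsrole-policy.template.json; do
  if [ -f "${template}" ] && check_template_vars "${template}"; then
    envsubst < "${template}" > "${template%.template.json}.json"
  fi
done
```

The `${template%.template.json}.json` expansion strips the template suffix, so taskdef.template.json renders to taskdef.json, matching the file names used in the commands above.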
Step 5: Build and Push Docker Image
- Create an ECR Repository
aws ecr create-repository --repository-name sports-backup
- Login to AWS ECR
aws ecr get-login-password --region ${AWS_REGION} | docker login --username AWS --password-stdin ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com
- Build Docker Image
docker build -t sports-backup .
- Tag and Push the Image
docker tag sports-backup:latest ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/sports-backup:latest
docker push ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/sports-backup:latest
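The login, tag, and push commands all depend on the same registry address, so a typo in one of them is easy to make. A small sketch that builds the address in one place; ecr_registry and ecr_image_uri are hypothetical helpers, and the format shown is the one used in standard commercial AWS regions:

```shell
# Print the ECR registry host for an account ID and region.
ecr_registry() {
  printf '%s.dkr.ecr.%s.amazonaws.com\n' "$1" "$2"
}

# Print the full image URI: account ID, region, repository, optional tag.
# The tag defaults to "latest" when omitted.
ecr_image_uri() {
  printf '%s/%s:%s\n' "$(ecr_registry "$1" "$2")" "$3" "${4:-latest}"
}

# Usage (commented out so nothing is pushed by accident):
#   IMAGE_URI="$(ecr_image_uri "${AWS_ACCOUNT_ID}" "${AWS_REGION}" sports-backup)"
#   docker tag sports-backup:latest "${IMAGE_URI}"
#   docker push "${IMAGE_URI}"
```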
Step 6: Create AWS Resources
- Register the ECS Task Definition
aws ecs register-task-definition --cli-input-json file://taskdef.json --region ${AWS_REGION}
- Create CloudWatch Log Group
aws logs create-log-group --log-group-name "${AWS_LOGS_GROUP}" --region ${AWS_REGION}
- Attach S3/DynamoDB Policy to ECS Task Execution Role
aws iam put-role-policy --role-name ecsTaskExecutionRole --policy-name S3DynamoDBAccessPolicy --policy-document file://s3_dynamodb_policy.json
- Create ECS Events Role
aws iam create-role --role-name ecsEventsRole --assume-role-policy-document file://ecsEventsRole-trust.json
- Attach ECS Events Role Policy
aws iam put-role-policy --role-name ecsEventsRole --policy-name ecsEventsPolicy --policy-document file://ecseventsrole-policy.json
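The ecsEventsRole-trust.json file referenced above is not shown in this guide. A trust policy for this purpose typically allows the EventBridge service principal (events.amazonaws.com) to assume the role. The sketch below writes such a policy to a path you choose; write_events_trust_policy is a hypothetical helper, and the file shipped in the repository may differ:

```shell
# Write a trust policy that lets EventBridge assume the role.
# Argument: the path of the JSON file to create.
write_events_trust_policy() {
  cat > "$1" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "events.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# Usage (commented out to avoid overwriting the repository's file):
#   write_events_trust_policy ecsEventsRole-trust.json
```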
Step 7: Schedule the Task with AWS EventBridge
- Create the EventBridge Rule
aws events put-rule --name SportsBackupScheduleRule --schedule-expression "rate(1 day)" --region ${AWS_REGION}
- Add Target to the Rule
aws events put-targets --rule SportsBackupScheduleRule --targets file://ecsTarget.json --region ${AWS_REGION}
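If you change the schedule, keep in mind that EventBridge rate expressions use a singular unit when the value is 1 (rate(1 day)) and a plural unit otherwise (rate(12 hours)); mixing these up causes the rule to be rejected. A small sketch that builds a valid expression; rate_expression is a hypothetical helper:

```shell
# Build an EventBridge rate expression from a positive integer and a unit.
# The unit is given in singular form: minute, hour, or day.
rate_expression() {
  local value="$1" unit="$2"
  case "${unit}" in
    minute|hour|day) ;;
    *) echo "unit must be minute, hour, or day" >&2; return 1 ;;
  esac
  case "${value}" in
    ''|*[!0-9]*|0*) echo "value must be a positive integer" >&2; return 1 ;;
  esac
  if [ "${value}" -eq 1 ]; then
    printf 'rate(%s %s)\n' "${value}" "${unit}"
  else
    printf 'rate(%s %ss)\n' "${value}" "${unit}"
  fi
}

# Usage (commented out so no rule is modified by accident):
#   aws events put-rule --name SportsBackupScheduleRule \
#     --schedule-expression "$(rate_expression 12 hour)" --region "${AWS_REGION}"
```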
Step 8: Manually Test the ECS Task
aws ecs run-task \
--cluster sports-backup-cluster \
--launch-type FARGATE \
--task-definition ${TASK_FAMILY} \
--network-configuration "awsvpcConfiguration={subnets=[\"${SUBNET_ID}\"],securityGroups=[\"${SECURITY_GROUP_ID}\"],assignPublicIp=\"ENABLED\"}" \
--region ${AWS_REGION}
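The run-task command above targets a cluster named sports-backup-cluster, but none of the commands shown in this guide creates one (the repository's setup script may do so). If the cluster does not exist yet in your account, it has to be created first. A minimal sketch, assuming the cluster name from the command above; ensure_cluster is a hypothetical helper:

```shell
# Create the ECS cluster only if it is not already active.
# Arguments: cluster name, AWS region.
ensure_cluster() {
  local cluster="$1" region="$2" status
  status="$(aws ecs describe-clusters --clusters "${cluster}" --region "${region}" \
    --query "clusters[0].status" --output text 2>/dev/null || true)"
  if [ "${status}" = "ACTIVE" ]; then
    echo "Cluster ${cluster} already exists."
  else
    aws ecs create-cluster --cluster-name "${cluster}" --region "${region}" >/dev/null
    echo "Created cluster ${cluster}."
  fi
}

# Usage (commented out so no resources are created by accident):
#   ensure_cluster sports-backup-cluster "${AWS_REGION}"
```

Checking the status first makes the helper safe to rerun, since an existing active cluster is left untouched.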
Key Learnings
- Using templated JSON files for automated AWS configurations.
- Storing and backing up sports highlight data in DynamoDB and S3.
- Deploying event-driven ECS tasks with AWS Fargate and EventBridge.
- Monitoring logs and task execution in CloudWatch.
Future Enhancements
- Automated backup of DynamoDB tables to S3.
- Batch processing of JSON files to handle multiple videos per execution.
The SportsDataBackup project demonstrates how AWS services can be combined to automate data ingestion, storage, processing, and scheduling. With a fully automated setup, this solution ensures reliable backup and processing of sports highlights using AWS cloud-native tools.
If you're interested in contributing, feel free to fork the repository and submit pull requests! 🚀
