Introduction
Managing and processing sports data efficiently requires a robust and scalable infrastructure. This project uses Amazon DynamoDB to store game data fetched from RapidAPI, providing persistent, scalable storage that can be queried efficiently for processing and analytics. API keys are managed securely in AWS Systems Manager (SSM) Parameter Store rather than in code or configuration files, which improves both security and maintainability. Finally, AWS MediaConvert transcodes the game highlight videos stored in an S3 bucket so that they play back consistently across multiple platforms.
Solution Architecture
Data Ingestion from RapidAPI:
• NCAA game data is fetched from RapidAPI.
• The raw JSON sports data is stored in an S3 bucket.
• Structured sports data is stored in Amazon DynamoDB for quick retrieval and long-term storage.
Containerized Processing with ECS:
• ECS (Fargate) hosts a containerized service that processes game data.
• The service retrieves secured API keys from AWS Systems Manager Parameter Store.
• AWS ECR stores and manages the Docker images.
Video Transcoding with AWS MediaConvert:
• AWS MediaConvert transcodes the video clips and saves the output back to S3.
Monitoring and Alerting with CloudWatch:
• CloudWatch Logs capture application and infrastructure logs.
• CloudWatch Alarms notify on failures or performance degradation.
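To make the transcoding step in the architecture more concrete, here is a minimal sketch of how a MediaConvert job could be described and submitted with boto3. It is not the repository's mediaconvert_process.py: the function names, the default resolution and bitrates, and the S3 paths are assumptions chosen for illustration. The client is passed in as an argument so the code can be exercised without AWS credentials.

```python
def build_job_settings(input_uri, output_prefix, width=1920, height=1080,
                       video_bitrate=5_000_000, audio_bitrate=96_000):
    """Build the 'Settings' payload for one MP4 output (H.264 video, AAC audio).

    All default values here are illustrative assumptions, not the project's settings.
    """
    return {
        "Inputs": [{
            "FileInput": input_uri,
            # Use the default audio track of the source file.
            "AudioSelectors": {"Audio Selector 1": {"DefaultSelection": "DEFAULT"}},
            "VideoSelector": {},
        }],
        "OutputGroups": [{
            "Name": "File Group",
            "OutputGroupSettings": {
                "Type": "FILE_GROUP_SETTINGS",
                "FileGroupSettings": {"Destination": output_prefix},
            },
            "Outputs": [{
                "ContainerSettings": {"Container": "MP4", "Mp4Settings": {}},
                "VideoDescription": {
                    "Width": width,
                    "Height": height,
                    "CodecSettings": {
                        "Codec": "H_264",
                        "H264Settings": {
                            "RateControlMode": "CBR",
                            "Bitrate": video_bitrate,
                        },
                    },
                },
                "AudioDescriptions": [{
                    "CodecSettings": {
                        "Codec": "AAC",
                        "AacSettings": {
                            "Bitrate": audio_bitrate,
                            "CodingMode": "CODING_MODE_2_0",
                            "SampleRate": 48000,
                        },
                    },
                }],
            }],
        }],
    }


def submit_job(mediaconvert_client, role_arn, input_uri, output_prefix):
    """Submit the job and return its ID. The role must allow MediaConvert to read and write the bucket."""
    response = mediaconvert_client.create_job(
        Role=role_arn,
        Settings=build_job_settings(input_uri, output_prefix),
    )
    return response["Job"]["Id"]


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   job_id = submit_job(boto3.client("mediaconvert"), "<role-arn>",
#                       "s3://<bucket>/videos/clip.mp4", "s3://<bucket>/processed_videos/")
```

Keeping the settings in a plain function makes it easy to review the codec, resolution, and bitrate choices before any job is actually submitted.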
Prerequisites
Before we get started, ensure you have the following:
- RapidAPI Account: Sign up for a RapidAPI account and subscribe to an API that provides NCAA game highlights.
- AWS Account: An AWS account with the permissions required to create and use the services listed in this guide.
- Docker Installed: Install Docker on your system to run the containerized workflow.
- Terraform Installed: Ensure Terraform is installed for infrastructure deployment.
- Basic CLI Knowledge: Familiarity with using command-line tools for API requests, AWS configurations, and Terraform commands.
Tech Stack
- Python
- AWS ECR, ECS, DynamoDB, & Elemental MediaConvert
- Docker
- Terraform
Step 1: Clone the Project Repository
Use the git clone command to clone the project repository to your local machine:
git clone https://github.com/OjoOluwagbenga700/sports-data-backup.git
Ensure that you have Git installed and configured on your system.
Navigate to the repository: cd sports-data-backup
Step 2: Review the Contents of the Code
Let's take a closer look at the file structure and its contents.
app folder: Contains the main application files; each Python script serves a specific function.
config.py: Loads environment variables, assigning defaults where needed.
fetch.py: Retrieves highlights from the API and stores them as a JSON file in S3 and as structured data in DynamoDB.
process_videos.py: Extracts video URLs from the JSON data file in S3, downloads the videos, and saves them to S3 under the videos/ folder. Logs each step.
mediaconvert_process.py: Processes videos using MediaConvert, configuring codec, resolution, bitrate, and audio settings before storing them back in S3.
run_all.py: Executes the scripts sequentially, with buffer time between them so that each task can complete before the next one starts.
Dockerfile: Provides the step-by-step instructions for building the Docker image.
container_definitions.tpl: A template file for defining ECS container definitions in Terraform.
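As an illustration of the fetch.py flow described above (raw JSON to S3, structured records to DynamoDB), the following is a minimal sketch rather than the repository's actual code. The function names, the "data" key in the payload, and the "id" attribute are assumptions; the real API response and table key schema may differ. One detail worth knowing is that the boto3 DynamoDB resource rejects Python floats, so numeric values are converted to Decimal first.

```python
import json
from decimal import Decimal


def to_dynamodb_item(record):
    """Return a copy of the record with every float replaced by a Decimal.

    Round-tripping through JSON with parse_float=Decimal also handles nested values.
    """
    return json.loads(json.dumps(record), parse_float=Decimal)


def store_highlights(payload, s3_client, table, bucket, key):
    """Save the raw payload to S3 and each record to DynamoDB; return the record count.

    Assumes the payload holds its records under a 'data' key (hypothetical) and
    that each record contains the table's key attribute.
    """
    # Raw copy: the untouched API response as a JSON object in S3.
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(payload).encode("utf-8"),
        ContentType="application/json",
    )
    # Structured copy: one DynamoDB item per record.
    items = [to_dynamodb_item(record) for record in payload.get("data", [])]
    for item in items:
        table.put_item(Item=item)
    return len(items)


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   table = boto3.resource("dynamodb").Table("<table-name>")
#   store_highlights(api_response, boto3.client("s3"), table,
#                    "<bucket>", "highlights/basketball_highlights.json")
```

For larger payloads, the Table resource's batch_writer() context manager would reduce the number of requests; put_item is used here only to keep the sketch short.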
Terraform Infrastructure Files: These .tf files define the AWS infrastructure using Terraform.
- dynamodb.tf: Configures the Amazon DynamoDB table to store sports data.
- ecr.tf: Defines the AWS Elastic Container Registry (ECR) for storing Docker images.
- ecs.tf: Configures AWS Elastic Container Service (ECS) to run the application.
- iam.tf: Defines IAM roles and permissions required for AWS services.
- networking.tf: Sets up networking components (VPC, subnets, security groups).
- provider.tf: Specifies the AWS provider configuration for Terraform.
- s3.tf: Manages S3 buckets for storing raw and processed videos.
- scheduler.tf: Defines scheduled tasks (potentially using CloudWatch Events or EventBridge Scheduler).
- ssm.tf: Configures AWS Systems Manager (SSM) Parameter Store for secret management.
- terraform.tfvars: Contains variable values used in Terraform.
- variable.tf: Declares input variables for Terraform configurations.
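Once the SSM parameter exists, the application can read the API key at runtime instead of baking it into the image. The snippet below is a small sketch of that lookup, not the project's own code: the environment variable RAPIDAPI_KEY_PARAM and the default parameter name are hypothetical, and the SSM client is injected so the function can be tested without AWS access.

```python
import os


def get_api_key(ssm_client, parameter_name=None):
    """Read a (possibly encrypted) parameter value from SSM Parameter Store."""
    if parameter_name is None:
        # Hypothetical environment variable and default name, for illustration only.
        parameter_name = os.environ.get("RAPIDAPI_KEY_PARAM", "/sports/rapidapi_key")
    response = ssm_client.get_parameter(
        Name=parameter_name,
        WithDecryption=True,  # needed to get the plaintext of a SecureString
    )
    return response["Parameter"]["Value"]


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   api_key = get_api_key(boto3.client("ssm"))
```

The task role attached to the ECS task needs permission to call ssm:GetParameter on that parameter for this lookup to succeed.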
Step 3: Running Terraform Commands
Initialize Terraform: Run terraform init to initialize the Terraform working directory and download the necessary plugins.
Plan the Deployment: Run terraform plan to preview the resources that Terraform will create.
Apply the Configuration: Run terraform apply -auto-approve to deploy the infrastructure.
Step 4: Verify the Deployment by Confirming the Resources Created on AWS
SSM Parameter Store
EventBridge Scheduler
S3 Bucket
AWS ECR
AWS ECS
AWS Elemental MediaConvert
DynamoDB
AWS CloudWatch Logs
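Besides checking each service in the AWS console, some of these resources can be confirmed from a script. The following is a rough sketch under stated assumptions: the helper names are invented for this example, the resource names must be replaced with the values from your terraform.tfvars, and only four of the services are covered. Each boto3 call used here raises an error when the resource is missing or inaccessible, which is what the check relies on.

```python
def build_checks(s3, dynamodb, ssm, ecr, bucket, table, parameter, repository):
    """Map a readable label to a zero-argument call that fails if the resource is absent."""
    return {
        "S3 bucket": lambda: s3.head_bucket(Bucket=bucket),
        "DynamoDB table": lambda: dynamodb.describe_table(TableName=table),
        "SSM parameter": lambda: ssm.get_parameter(Name=parameter),
        "ECR repository": lambda: ecr.describe_repositories(repositoryNames=[repository]),
    }


def run_checks(checks):
    """Run every check and return {label: True/False} without stopping at the first failure."""
    results = {}
    for label, check in checks.items():
        try:
            check()
            results[label] = True
        except Exception as exc:  # with boto3 this is typically botocore's ClientError
            print(f"{label}: not found or not accessible ({exc})")
            results[label] = False
    return results


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   checks = build_checks(boto3.client("s3"), boto3.client("dynamodb"),
#                         boto3.client("ssm"), boto3.client("ecr"),
#                         "<bucket>", "<table>", "<parameter>", "<repository>")
#   print(run_checks(checks))
```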
Conclusion
This solution enhances reliability by fetching data from RapidAPI and storing it in Amazon DynamoDB, improving processing with ECS, refining video workflows using MediaConvert, and adding robust monitoring with CloudWatch. Additionally, AWS SSM Parameter Store ensures secure management of API keys and secrets, reducing exposure risks and improving security. By using Terraform, we ensure a repeatable and scalable infrastructure.