Ameh Mathias Ejeh

Posted on Jan 23

Building an NBA Sport Data Lake Analytic using AWS Services

#devops #programming #aws #cloud

Overview

The NBA Sport Data Lake Analytic project is a cloud-native solution that builds a scalable data lake for NBA analytics. By leveraging AWS services, this project automates data ingestion, cataloging, and querying, enabling efficient storage and analysis of NBA-related data.

Architecture

The architecture of the project is designed to process and analyze NBA data efficiently. The main components are:

Amazon S3: Stores raw and processed data.
AWS Glue: Automates data cataloging and schema creation.
Amazon Athena: Enables SQL querying of the data stored in S3.

Architecture Diagram

Workflow

Data Ingestion: Fetch data from SportsData.io's NBA API.
Data Storage: Store the raw data in Amazon S3.
Data Cataloging: Use AWS Glue to create a database and table schema.
Data Querying: Query the data using Amazon Athena for analytics.
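
The ingestion step of this workflow can be sketched roughly as follows. This is a minimal illustration rather than the repository's script: the helper names build_request, to_json_lines, and fetch_players are hypothetical, and sending the key in an Ocp-Apim-Subscription-Key header is an assumption about how the SportsData.io API authenticates. The records are written one JSON object per line because that is the layout Athena's JSON SerDes read.

```python
import json
import urllib.request


def build_request(endpoint: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for the NBA endpoint (the header name is an assumption)."""
    return urllib.request.Request(
        endpoint,
        headers={"Ocp-Apim-Subscription-Key": api_key},
        method="GET",
    )


def to_json_lines(records: list) -> str:
    """Serialize records as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(record) for record in records)


def fetch_players(endpoint: str, api_key: str) -> list:
    """Call the API and decode the JSON array it is assumed to return."""
    with urllib.request.urlopen(build_request(endpoint, api_key), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The string returned by to_json_lines is what would be uploaded to the raw-data folder in S3.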

Prerequisites

Required Accounts and Tools

SportsData.io API Key: Sign up at SportsData.io to get access to the NBA API.
AWS Account: An active AWS account with permissions to use S3, Glue, and Athena.
Python Environment: Python 3 installed locally, with a virtual environment for dependency management.

Permissions

Ensure the IAM user or role has the following AWS permissions:

S3: s3:CreateBucket, s3:PutObject, s3:DeleteBucket, s3:ListBucket
Glue: glue:CreateDatabase, glue:CreateTable, glue:DeleteDatabase, glue:DeleteTable
Athena: athena:StartQueryExecution, athena:GetQueryResults
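
One way to express the permissions listed above is as an IAM policy document. The sketch below builds it as a Python dictionary and prints the JSON; the function name build_policy and the Sid values are hypothetical, and "Resource": "*" is a simplifying assumption that you would normally narrow to the specific bucket and Glue resources.

```python
import json


def build_policy() -> dict:
    """Return an IAM policy document granting the actions listed in this section."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "S3Access",
                "Effect": "Allow",
                "Action": [
                    "s3:CreateBucket",
                    "s3:PutObject",
                    "s3:DeleteBucket",
                    "s3:ListBucket",
                ],
                "Resource": "*",  # assumption: narrow this in real use
            },
            {
                "Sid": "GlueAccess",
                "Effect": "Allow",
                "Action": [
                    "glue:CreateDatabase",
                    "glue:CreateTable",
                    "glue:DeleteDatabase",
                    "glue:DeleteTable",
                ],
                "Resource": "*",
            },
            {
                "Sid": "AthenaAccess",
                "Effect": "Allow",
                "Action": [
                    "athena:StartQueryExecution",
                    "athena:GetQueryResults",
                ],
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_policy(), indent=2))
```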

Setup Guide

Step 1: Clone the Repository

git clone https://github.com/ameh0429/ameh0429-NBA-Sport-Data-Lake-Analytic.git
cd ameh0429-NBA-Sport-Data-Lake-Analytic

Step 2: Install Dependencies

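Create and activate a virtual environment, then install the dependencies (on Windows, activate with venv\Scripts\activate instead):

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt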

Step 3: Configure Environment Variables

Create a .env file with your API key and endpoint:

echo "SPORTS_DATA_API_KEY=your_api_key" >> .env
echo "NBA_ENDPOINT=https://api.sportsdata.io/v3/nba/scores/json/Players" >> .env
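
To show how a script might pick these values up, here is a minimal sketch of a .env reader that uses only the standard library. The function name load_env_file is hypothetical, and the repository's script may well use a package such as python-dotenv instead; this version handles only simple KEY=VALUE lines.

```python
import os


def load_env_file(path: str = ".env") -> dict:
    """Parse KEY=VALUE lines from a .env file and export them to os.environ."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            # Skip blank lines, comments, and anything that is not KEY=VALUE.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    os.environ.update(values)
    return values
```

After calling load_env_file(), the key and endpoint are available as os.environ["SPORTS_DATA_API_KEY"] and os.environ["NBA_ENDPOINT"].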

Step 4: Run the Data Lake Setup Script

Make sure setup_nba_data_lake.py is in the project directory (if it is missing, create the file and paste in the script), then run it from the terminal:

python setup_nba_data_lake.py

The script performs the following actions:

Creates an S3 bucket named sports-analytics-data-lake-0429.
Uploads NBA player data to the raw-data folder.
Configures a Glue database and table.
Sets up Athena for querying.
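
As a rough illustration of the Glue step, the table definition passed to boto3's Glue create_table call could look like the sketch below. The database name glue_nba_data_lake, the column list (taken from the columns queried in Step 5), and the choice of the OpenX JSON SerDe are assumptions for illustration; the repository's script may define them differently. The functions take an already-created boto3 Glue client so that the definition itself can be inspected without AWS access.

```python
BUCKET_NAME = "sports-analytics-data-lake-0429"  # bucket name from this guide
TABLE_NAME = "nba_players"  # table queried in Step 5
DATABASE_NAME = "glue_nba_data_lake"  # hypothetical database name


def build_table_input(bucket: str = BUCKET_NAME, table: str = TABLE_NAME) -> dict:
    """Build the TableInput argument for glue.create_table (columns are assumed)."""
    return {
        "Name": table,
        "TableType": "EXTERNAL_TABLE",
        "StorageDescriptor": {
            "Columns": [
                {"Name": "PlayerID", "Type": "int"},
                {"Name": "FirstName", "Type": "string"},
                {"Name": "LastName", "Type": "string"},
                {"Name": "Team", "Type": "string"},
                {"Name": "Position", "Type": "string"},
            ],
            "Location": f"s3://{bucket}/raw-data/",
            "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe"
            },
        },
    }


def create_glue_table(glue_client) -> None:
    """Create the database and table; expects a boto3 Glue client."""
    glue_client.create_database(DatabaseInput={"Name": DATABASE_NAME})
    glue_client.create_table(
        DatabaseName=DATABASE_NAME, TableInput=build_table_input()
    )
```

With boto3 installed and credentials configured, this would be called as create_glue_table(boto3.client("glue")).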

Step 5: Validate Setup

S3: Verify the bucket and data file in the AWS Management Console.

Athena: Run test queries such as the following:

Query 1

SELECT FirstName, LastName, Position, Team
FROM nba_players
WHERE Position = 'PG';

Query 2

SELECT PlayerID, FirstName, LastName, Team, Position
FROM nba_players
WHERE Team = 'LAL';
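
The same queries can also be run from Python instead of the console. The sketch below is one possible approach, assuming a boto3 Athena client is passed in; the function name run_athena_query, the database name, and the S3 output location are placeholders you would supply. Note that polling with get_query_execution needs the athena:GetQueryExecution permission, and that only the first page of results is read here.

```python
import time


def run_athena_query(athena_client, query: str, database: str,
                     output_location: str, poll_seconds: float = 1.0,
                     max_polls: int = 60) -> list:
    """Start a query, wait for it to finish, and return rows as lists of strings."""
    started = athena_client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until the query reaches a terminal state or we give up.
    for _ in range(max_polls):
        status = athena_client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"Athena query ended in state {state}")
        time.sleep(poll_seconds)
    else:
        raise TimeoutError("Athena query did not finish in time")

    # For SELECT queries the first row of the first page holds the column names.
    results = athena_client.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue", "") for cell in row["Data"]]
        for row in results["ResultSet"]["Rows"]
    ]
```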

Cleanup

To delete all the resources created by the project, run the cleanup script:

python delete_resources.py

This will:

Remove the S3 bucket and its contents.
Delete the Glue database and table.
Clean up Athena configurations.
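
For the S3 part of the cleanup, keep in mind that a bucket must be empty before it can be deleted. The sketch below shows one way to do that with a boto3 S3 client passed in as an argument; the function name empty_and_delete_bucket is hypothetical and this is not necessarily how delete_resources.py works. It assumes an unversioned bucket, and deleting objects requires the s3:DeleteObject permission in addition to those listed under Permissions.

```python
def empty_and_delete_bucket(s3_client, bucket: str) -> int:
    """Delete every object in the bucket, then the bucket itself.

    Returns the number of objects removed.
    """
    removed = 0
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # Each page lists at most 1000 keys, which fits one delete_objects call.
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            s3_client.delete_objects(Bucket=bucket, Delete={"Objects": keys})
            removed += len(keys)
    s3_client.delete_bucket(Bucket=bucket)
    return removed
```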
