Building a Weather Data Analytics Pipeline with AWS and OpenWeatherMap API

Hey coders! In this blog post, I will walk you through the process of building a weather data analytics pipeline using the OpenWeatherMap API and various AWS services. This project involves fetching weather data, storing it in an S3 bucket, cataloging it using AWS Glue, and querying it with Amazon Athena.

Project Overview

The goal of this project is to create a scalable and efficient data pipeline that can fetch weather data for multiple cities, store it in AWS S3, catalog the data using AWS Glue, and perform queries using Amazon Athena.

Initial architecture design

Architecture

Project Structure

Prerequisites

Before you begin, ensure you have the following:

  1. Docker: Installed on your machine.
  2. AWS Account: With permissions to create S3 buckets, Glue databases, and Glue crawlers.
  3. OpenWeatherMap API Key: Obtain an API key from OpenWeatherMap.

Setup Instructions

Step 1: Clone the Repository

First, clone the repository and navigate to the project directory:

git clone https://github.com/Rene-Mayhrem/weather-insights.git
cd weather-insights

Step 2: Create a .env File

Create a .env file in the root directory with your AWS credentials and OpenWeatherMap API key:

AWS_ACCESS_KEY_ID=<your-access-key-id>
AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
AWS_REGION=us-east-1
S3_BUCKET_NAME=<your-s3-bucket-name>
OPENWEATHER_API_KEY=<your-openweather-api-key>
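Before running the pipeline, it helps to confirm all of these variables are actually set. Here is a minimal sketch of how the Python side could load and validate them, assuming the `python-dotenv` package (the validation helper `missing_vars` is hypothetical, not part of the repo):

```python
# Sketch: load the .env file and fail fast if any variable is missing.
# Variable names match the .env file above; missing_vars is a
# hypothetical helper, not from the repo.
import os

REQUIRED_VARS = [
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION",
    "S3_BUCKET_NAME",
    "OPENWEATHER_API_KEY",
]

def missing_vars(env) -> list:
    """Return the names of required variables absent or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

if __name__ == "__main__":
    from dotenv import load_dotenv  # pip install python-dotenv
    load_dotenv()  # reads .env from the current directory
    absent = missing_vars(os.environ)
    if absent:
        raise SystemExit(f"Missing required variables: {absent}")
```

Failing fast here is much cheaper than debugging a half-provisioned stack later.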

Step 3: Create a cities.json File

Create a cities.json file in the root directory with a list of cities to study:

{
  "cities": [
    "London",
    "New York",
    "Tokyo",
    "Paris",
    "Berlin"
  ]
}

Step 4: Using Docker Compose

Build and run the services using Docker Compose:

docker compose run terraform init
docker compose run python

Usage

After running the Docker containers, follow these steps:

Verify Infrastructure Setup

Ensure that Terraform has successfully created the necessary AWS resources (S3 bucket, Glue database, and Glue crawler). You can verify this in the AWS Management Console.

Verify Data Upload

Check that the Python script has fetched weather data for the specified cities and uploaded the data to the S3 bucket. Verify the JSON files in the S3 bucket via the AWS Management Console.
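Besides clicking through the console, you can verify the upload programmatically. A quick sketch, assuming `boto3` and the bucket name from the `.env` file (the `json_keys` filter is a hypothetical helper):

```python
# Sketch: list the bucket and keep only the .json objects, to confirm
# the per-city files landed. json_keys is a hypothetical helper.
import os

def json_keys(keys) -> list:
    """Keep only the .json object keys from a bucket listing."""
    return [k for k in keys if k.endswith(".json")]

if __name__ == "__main__":
    import boto3  # pip install boto3
    s3 = boto3.client("s3")
    resp = s3.list_objects_v2(Bucket=os.environ["S3_BUCKET_NAME"])
    keys = [obj["Key"] for obj in resp.get("Contents", [])]
    print(json_keys(keys))
```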

Run the Glue Crawler

The Glue crawler should automatically run if set up correctly. This will catalog the data in the S3 bucket. Verify the crawler's run and data cataloging in the Glue console.
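If you prefer to trigger the crawler yourself, a minimal sketch with `boto3` looks like this. The crawler name `weather-crawler` is an assumption; use whatever name your Terraform configuration assigned:

```python
# Sketch: start the Glue crawler and poll until it returns to READY.
# CRAWLER_NAME is an assumed value, not taken from the repo.
import time

CRAWLER_NAME = "weather-crawler"  # hypothetical; match your Terraform name

def is_finished(state: str) -> bool:
    """A crawler is idle again once it reports READY."""
    return state == "READY"

if __name__ == "__main__":
    import boto3  # pip install boto3
    glue = boto3.client("glue")
    glue.start_crawler(Name=CRAWLER_NAME)
    while not is_finished(glue.get_crawler(Name=CRAWLER_NAME)["Crawler"]["State"]):
        time.sleep(10)  # crawler runs typically take a minute or two
```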

Query Data with Athena

Use Amazon Athena to query the data cataloged by Glue. Access Athena through the AWS Management Console and run SQL queries on the data.
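Queries can also be submitted from Python. The sketch below assumes `boto3`, a Glue database named `weather_db`, and a table named `weather_data`; the names your crawler actually created may differ, and the results prefix is a hypothetical choice:

```python
# Sketch: run an Athena query programmatically and wait for it to
# finish. Database, table, and output prefix are assumed names.
QUERY = """
SELECT name AS city, main.temp AS temp_c
FROM weather_data
ORDER BY main.temp DESC
"""

def main():
    import os
    import time
    import boto3  # pip install boto3
    athena = boto3.client("athena")
    run = athena.start_query_execution(
        QueryString=QUERY,
        QueryExecutionContext={"Database": "weather_db"},  # hypothetical
        ResultConfiguration={
            "OutputLocation": f"s3://{os.environ['S3_BUCKET_NAME']}/athena-results/"
        },
    )
    qid = run["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=qid)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(2)

if __name__ == "__main__":
    main()
```

Because the crawler infers a struct column from the nested JSON, fields like the temperature are addressed with dotted paths such as `main.temp`.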

Key Interactions

  1. Docker: Provides two consistent environments, one for the Python scripts and one for the Terraform IaC.
  2. Terraform: Provisions and manages the AWS resources (S3, AWS Glue, AWS Athena, etc.).
  3. Python: Fetches weather data, writes it to JSON files, and uploads them to the S3 bucket created by Terraform.
  4. Glue: Crawls the S3 data to catalog it and infer table schemas.
  5. Athena: Queries the cataloged data for insights.

Conclusion

By following these steps, you can set up a robust weather data analytics pipeline using AWS services and the OpenWeatherMap API. This pipeline can be extended to include more cities or additional data sources as needed.