Building an ETL Pipeline with Python Using CoinGecko API

#data #etl #cryptocurrency #programming

Extract, Transform, Load (ETL) is a fundamental process in data engineering used to collect data from various sources, process it, and store it in a structured format for analysis. In this tutorial, we will build a simple ETL pipeline using Python and the CoinGecko API to extract cryptocurrency market data, transform it into a structured format, and load it into a SQLite file for further use.

Prerequisites

To follow along, you need to have:

Python installed (>=3.7)
requests and pandas libraries installed
A CoinGecko API key (optional but recommended for higher request limits)

You can install the required libraries using:

pip install requests

Step 1: Extract Data from CoinGecko API

The extraction phase involves fetching cryptocurrency market data from the CoinGecko API.

import requests
import pandas as pd

def extract_data_from_api():
    url = "https://api.coingecko.com/api/v3/coins/markets?vs_currency=usd"

    headers = {
        "accept": "application/json",
        "x-cg-demo-api-key": "YOUR_API_KEY_HERE",  # Replace with your API key
    }

    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        data = response.json()
        return pd.json_normalize(data)  # Convert JSON response to DataFrame
    else:
        raise Exception("Error fetching data from API")

Step 2: Transform the Data

Transformation is necessary to clean and structure the data before loading it. We'll select relevant columns and rename them for clarity.

def transform_data(df):
    df_transformed = df[["id", "symbol", "name", "current_price", "market_cap", "total_volume", "price_change_percentage_24h", "last_updated"]].copy()
    df_transformed.columns = ["id", "symbol", "name", "price_usd", "market_cap_usd", "volume_24h_usd", "price_change_24h_percent", "date"]
    df_transformed["date"] = pd.to_datetime(df_transformed["date"]).dt.date
    df_transformed = df_transformed.fillna(0)
    return df_transformed

Step 3: Load Data into SQLite Database

The final step is to store the processed data into an SQLite database for further analysis.

import sqlite3

def load_data_to_sqlite(df, db_file, table_name):
    # Connect to the database
    conn = sqlite3.connect(db_file)
    cursor = conn.cursor()

    # Create the table if it does not exist
    cursor.execute(f"""
        CREATE TABLE IF NOT EXISTS {table_name} (
            id TEXT PRIMARY KEY,
            symbol TEXT, 
            name TEXT, 
            price_usd REAL, 
            market_cap_usd REAL, 
            volume_24h_usd REAL, 
            price_change_24h_percent REAL
        )
    """)

    # Load the data into the table
    df.to_sql(table_name, conn, if_exists='replace', index=False)

    # Commit and close the connection
    conn.commit()
    conn.close()
    print(f"Data successfully loaded into table '{table_name}' in database {db_file}")

Step 4: Putting It All Together

Now, we can orchestrate the entire ETL process using a main function.

def etl_pipeline():
    # Extraction 
    df_crypto = extract_data_from_api()
    print("Extraction successful")

    # Transformation
    df_transformed = transform_data(df_crypto)
    print("Transformation successful")

    # Loading
    db_file = "database.db"
    table_name = "crypto_data"
    load_data_to_sqlite(df_transformed, db_file, table_name)
    print("Loading successful")

if __name__ == "__main__":
    etl_pipeline()

This tutorial demonstrated how to build a simple ETL pipeline in Python using the CoinGecko API. We covered extracting data from the API, transforming it into a structured format, and loading it into an SQLite database. This pipeline can be extended to store data in a cloud database, automate execution using cron jobs, or integrate with data visualization tools.

Happy coding!

DEV Community

Building an ETL Pipeline with Python Using CoinGecko API

Prerequisites

Step 1: Extract Data from CoinGecko API

Step 2: Transform the Data

Step 3: Load Data into SQLite Database

Step 4: Putting It All Together

Top comments (0)

Read next

Understanding Objects in JavaScript

Developer diary #25. Communication

Get Started with Visual Studio Code: The Perfect Editor for Beginners!

1800. Maximum Ascending Subarray Sum