Mubarak Mohamed

Building an ETL Pipeline with Python Using CoinGecko API

Extract, Transform, Load (ETL) is a fundamental process in data engineering used to collect data from various sources, process it, and store it in a structured format for analysis. In this tutorial, we will build a simple ETL pipeline in Python using the CoinGecko API to extract cryptocurrency market data, transform it into a structured format, and load it into a SQLite database for further use.

Prerequisites

To follow along, you need to have:

  • Python installed (>=3.7)
  • requests and pandas libraries installed
  • A CoinGecko API key (optional but recommended for higher request limits)

You can install the required libraries using:

pip install requests pandas

Step 1: Extract Data from CoinGecko API

The extraction phase involves fetching cryptocurrency market data from the CoinGecko API.

import requests
import pandas as pd

def extract_data_from_api():
    url = "https://api.coingecko.com/api/v3/coins/markets?vs_currency=usd"

    headers = {
        "accept": "application/json",
        "x-cg-demo-api-key": "YOUR_API_KEY_HERE",  # Replace with your API key
    }

    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        data = response.json()
        return pd.json_normalize(data)  # Convert JSON response to DataFrame
    else:
        raise Exception(f"Error fetching data from API (status code {response.status_code})")
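
As a quick sanity check, you can call the function and inspect the first few rows. By default this endpoint returns one page of coins sorted by market cap; the snippet below simply looks at the resulting DataFrame and assumes a valid API key is set in the headers above.

df_crypto = extract_data_from_api()
print(df_crypto.shape)  # one page of coins, with many columns of market data
print(df_crypto[["id", "symbol", "current_price", "market_cap"]].head())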

Step 2: Transform the Data

Transformation is necessary to clean and structure the data before loading it. We'll select relevant columns and rename them for clarity.

def transform_data(df):
    df_transformed = df[["id", "symbol", "name", "current_price", "market_cap", "total_volume", "price_change_percentage_24h", "last_updated"]].copy()
    df_transformed.columns = ["id", "symbol", "name", "price_usd", "market_cap_usd", "volume_24h_usd", "price_change_24h_percent", "date"]
    df_transformed["date"] = pd.to_datetime(df_transformed["date"]).dt.date
    df_transformed = df_transformed.fillna(0)
    return df_transformed

Step 3: Load Data into SQLite Database

The final step is to store the processed data into an SQLite database for further analysis.

import sqlite3

def load_data_to_sqlite(df, db_file, table_name):
    # Connect to the database
    conn = sqlite3.connect(db_file)
    cursor = conn.cursor()

    # Create the table if it does not exist
    # (the schema mirrors the columns produced by transform_data)
    cursor.execute(f"""
        CREATE TABLE IF NOT EXISTS {table_name} (
            id TEXT PRIMARY KEY,
            symbol TEXT,
            name TEXT,
            price_usd REAL,
            market_cap_usd REAL,
            volume_24h_usd REAL,
            price_change_24h_percent REAL,
            date TEXT
        )
    """)

    # Load the data into the table (if_exists='replace' overwrites the table on each run)
    df.to_sql(table_name, conn, if_exists='replace', index=False)

    # Commit and close the connection
    conn.commit()
    conn.close()
    print(f"Data successfully loaded into table '{table_name}' in database {db_file}")
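
To verify the load, you can read the table back with pandas. This sketch assumes the database.db file and crypto_data table names used in the next step.

import sqlite3
import pandas as pd

# Read back the top coins by market cap as a sanity check
conn = sqlite3.connect("database.db")
check = pd.read_sql("SELECT name, price_usd, market_cap_usd FROM crypto_data ORDER BY market_cap_usd DESC LIMIT 5", conn)
conn.close()
print(check)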

Step 4: Putting It All Together

Now, we can orchestrate the entire ETL process using a main function.

def etl_pipeline():
    # Extraction 
    df_crypto = extract_data_from_api()
    print("Extraction successful")

    # Transformation
    df_transformed = transform_data(df_crypto)
    print("Transformation successful")

    # Loading
    db_file = "database.db"
    table_name = "crypto_data"
    load_data_to_sqlite(df_transformed, db_file, table_name)
    print("Loading successful")

if __name__ == "__main__":
    etl_pipeline()

This tutorial demonstrated how to build a simple ETL pipeline in Python using the CoinGecko API. We covered extracting data from the API, transforming it into a structured format, and loading it into an SQLite database. This pipeline can be extended to store data in a cloud database, automate execution using cron jobs, or integrate with data visualization tools.
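
If you don't want to set up a scheduler right away, a minimal approach is to re-run the pipeline from Python on a fixed interval. The sketch below is only illustrative; a cron job or a workflow orchestrator is the more robust option in production.

import time

while True:
    etl_pipeline()    # run one extract-transform-load cycle
    time.sleep(3600)  # wait one hour before the next run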

Happy coding!
