DEV Community

Jian
Jian

Posted on

#012 | Extract, Transform & Load FX rates: Build

Overview

I wrote a script in Flask Python that runs a simple Extract, Transform & Load ("ETL") process on foreign exchange rate ("FX rate") data from yfinance.

To recap, yfinance is a Python library that sources FX rates from Yahoo Finance.

Background to FX rates

We first understand how FX rates are quoted and their relationships to each other. This influences the data transformation and storage logics.

An FX rate is conventionally quoted in the following format: [Base Currency]/[Target Currency]. This shows how much of the Target Currency one unit of the Base Currency buys.

Using the theory of Cross Currency Triangulation, we don't need to download and store every currency permutation. All we need are the FX rates for one Base Currency from yfinance, and other currency permutations can be indirectly calculated (if needed).

For example, assume we are given these quotes where the Malaysian Ringgit is the base currency:

  1. MYR/USD=0.23 means one Malaysian Ringgit buys 0.23 US Dollars
  2. MYR/SGD=0.30 means one Malaysian Ringgit buys 0.30 Singapore Dollars

We can indirectly calculate the USD/SGD rate (or how many Singapore Dollars one US Dollar buys) using the previously mentioned rates. If we are also given the MYR/HKD rate, we can calculate USD/HKD or SGD/HKD rates.

cross currency rates

Database Table Setup

I set up two tables called reference_currencies and yfinance_fx_rates in Xano.

The former stores IBAN currency codes for 179 global currencies. The latter stores yfinance-sourced historical month-end FX rates for a list of selected Target Currencies relative to one Base Currency, the Malaysian Ringgit.

ETL Process: Extract

I used yfinance's download() function to extract the data, specifying the following parameters:

parameters

function

The raw, extracted dataframe of the MYR/USD and MYR/SGD quotes from November and December 2024 look like this:

raw_data

Some characteristics:

  1. Date is returned as a dataframe Index, not column
  2. The dataframe's columns are "Adj Close", "Close", "High", "Low", "Open" and "Volume"
  3. yfinance's FX rate convention is [Target Currency][Base Currency].

We transform the raw data to get the MVP's desired columns:

  1. date: In "yyyymmdd" format
  2. base_currency: 3-digit IBAN currency code
  3. target_currency: 3-digit IBAN currency code
  4. fx_rate: Extracted from yfinance

ETL Process: Transform

I first dropped irrelevant columns and filtered for the last FX rate quote of each month.

drop

I then used pandas' melt() function to rename and reshape the dataframe. I also converted the Date index into a column.

reshape

I split each currency pair into two columns: target_currency & base_currency. I anticipate this makes the data more flexible for data manipulation.

split

This is the final dataframe, after re-ordering and dropping columns.

final

ETL Process: Load

To load the data into Xano, I created an API endpoint that accepts a JSON of the final dataframe. This process is quite straightforward and closely follow the steps in my earlier post.

The API endpoint has the following business logic: Based on the JSON input, retrieve the Primary Key of the Target Currency & Base Currency (from reference_currencies). If found, store the results in the yfinance_fx_rates table. I added a check that skips duplicate records.

xano_reference

xano_fx

Next Steps

We do basic User Acceptance Testing ("UAT") to evaluate the success of the ETL process, before proceeding to the next phase of the MVP

--Ends

Top comments (0)