Pandas is one of the most widely used libraries in Python for data manipulation and analysis. Whether you are working on small datasets or large-scale data, Pandas makes it easy to clean, transform, and analyze data quickly and efficiently. In this blog, I will walk through an example of how I use Pandas to fetch and manipulate data and then transform the raw information into a graph for data visualization.
To start, you don't need to worry about local installation, an easy way to jump into using Pandas is through Google Colab, a cloud-based Jupyter notebook environment that includes Pandas by default.
Importing Necessary Libraries
import requests # importing the requests library in order to do fetch
import pandas as pd # importing the pandas library for data visualization, data manipulation, and analysis
Before working with APIs and Pandas, we need the import the required libraries.
-
requests
: This is a popular Python library used to send HTTP requests. We will be using it to fetch data from an API -
pandas
: Pandas is the library we are showcasing today! It provides powerful tools to clean, transform, and visualize structured datasets.
By importing the two libraries, we will be using them to retrieve data from the Kraken API and process it through a structured Pandas DataFrame.
Creating the fetch function - Kraken API & Pandas
def get_historic_price(symbol, after='2024-12-01'): # fetch function - parameters are the ticker symbol, and date,
url = 'https://api.kraken.com/0/public/OHLC' # api
pair = f"{symbol.upper()}USD"
# XBTUSD when symbol='xbt' - used to dynamically constructor the symbol in uppercase and USD. f stands for f-string, formatted string literal
resp = requests.get(url, params={ # this is the fetch request
"pair": pair, # the pair that was created
'interval': 60, # time interval for data in minutes (60 for 1 hr)
'since': str(int(pd.Timestamp(after).timestamp()))
# pd.Timestamp() - Pandas function that converts a date into timestamp object, after is a string -
# .timestamp() - Pandas function that converts a Timestamp to UNIX format, which is a float
# 1733011200.0 (This is the UNIX timestamp for December 1, 2024, 00:00:00 UTC.)
# int() - Python function that converts data type to int
# str() - Python function that converts datattype to string
})
resp.raise_for_status() # error if response is a failure
data = resp.json() # convert JSON response
results_key = [k for k in data['result'].keys() if k != 'last'][0] # iterate over the data.result to get the keys and filter out "last"
results = [
(close_time, float(open), float(high), float(low), float(close), float(volume)) # formatting
for (close_time, open, high, low, close, vwap, volume, count) # looping over each row
in data['result'][results_key] # source of data
]
# Before
# [1680105600, '27000.0', '27500.0', '26000.0', '26500.0', '26750.0', '120.5', 300]
#
# After
# (1680105600, 27000.0, 27500.0, 26000.0, 26500.0, 120.5)
df = pd.DataFrame(results, columns=[ # Pandas function - creating a DataFrame with params of result (list of tuples) & assign column names
'CloseTime', 'OpenPrice', 'HighPrice', 'LowPrice', 'ClosePrice', 'Volume'
])
df['CloseTime'] = pd.to_datetime(df['CloseTime'], unit='s') # Pandas function to convert from UNIX back to readable time format - on CloseTime
df.set_index('CloseTime', inplace=True) # setting the index of DataFrame as CloseTime
return df # return DataFrame
To demonstrate Panda's integration with external data sources. let's explore a function that fetches historical cryptocurrency prices from the Kraken exchange API.
The get_historic_price
function retrieves hourly price data with the parameters of the cryptocurrency symbol and the date.
How The Function Works
The API Request: A call is made to Kraken's OHLC(Open-High-Low-Close) API. The
get_historic_price
params are manipulated to be the options/params of the request.Data Processing: The response to the request has the relevant price data extracted and converted into floats.
Create DataFrame: The organized data is transformed into a DataFrame through the method of
.DataFrame()
The fieldCloseTime
is converted and set and the DataFrame is returned.
Creating Fetch Parameters
Now that the fetch function is created, it requires the parameters of the symbol and data.
When working with time-series data, you may need to fetch data from a specific time range. Pandas provide convenient functions to manipulate dates and times.
last_week = (pd.Timestamp.now() - pd.offsets.Day(7)) # Pandas function to get the current date and time - 7 days
Pandas methods
-
.Timestamp.now()
- Gets the current date and time. -
.offsets.Days(7)
- Represents an offset of 7 days
The variable last_week
reflects the current date minus seven days, making it exactly one week ago.
Calling the fetch
After the last_week
variable is set, we can call the fetch on the historical Bitcoin (BTC). The returned data is formatted into a Pandas DataFrame, making it easier for future analysis and visualization.
btc = get_historic_price('btc', after=last_week) # running fetch
Data Inspection & Exploration
Before diving into data analysis, it's crucial to inspect and explore the dataset. Pandas provide powerful functions to quickly understand the structure and contents of a data frame.
Viewing the First Few Rows
btc.head() # Displays the first few rows of the DataFrame
-
btc
is the variable storing the DataFrame. -
.head()
is a Pandas function that returns the first five rows by default, providing a quick snapshot of the dataset.
Why Use It?
✅ Ensures the data has loaded correctly.
✅ Provides an overview of the column names and structure.
✅ Helps preview data without printing the entire DataFrame.
Getting a Summary of the Data
btc.info() # Displays a concise summary of the DataFrame
-
.info()
is a Pandas function that provides a high-level overview of the dataset, including column names, non-null counts, and data types.
Why Use It?
✅ Understands the dataset’s structure and column types.
✅ Identifies missing values that may need handling.
✅ Confirms data types for proper analysis and visualization.
By using these simple Pandas functions, you can quickly inspect your dataset and prepare it for deeper analysis. Stay tuned as we dive into data cleaning and transformation in the next sections!
Data Visualization
Lastly, we can visualize how Bitcoin's closing price has changed over time through data visualization.
btc['ClosePrice'].plot(figsize=(15, 7)) # Pandas function to plot the ClosePrice column of the btc Data Frame. FigSize refers to size of figure in inches, 15 width, 7 height
-
.plot()
is a Pandas function that generates a line plot for a DataFrame column. -
btc['ClosePrice'] selects the
ClosePrice` column, which contains the Bitcoin's closing prices over time. -
figsize(15,7)
specifies the figure size in inches (15 inches wide, and 7 inches tall)
Why Use It?
✅ Quickly see trends, spikes, or dips in Bitcoin prices.
✅ Help analyze historical price movements.
Conclusion
Pandas is an incredibly powerful tool for data manipulation, making it easy to fetch, clean, and visualize data with just a few lines of code. In this blog, we explored how to:
- Retrieve historical cryptocurrency prices from the Kraken API
- Transform raw JSON data into a structured Pandas DataFrame
- Inspect and analyze datasets using built-in Pandas functions
- Create a simple visualization to track price trends over time
By leveraging Pandas, you can efficiently work with large datasets and gain insights quickly. Whether you're analyzing financial data, working with APIs, or building data-driven applications, Pandas provides the flexibility and ease of use needed to streamline your workflow.
If you’re interested in further exploration, try experimenting with different cryptocurrencies, timeframes, or visualization techniques. Data analysis is all about iteration and discovery—keep exploring!
Top comments (0)