DEV Community

Andii Zhu
Andii Zhu

Posted on

Learning Pandas, A Powerful Library For Data Visualization, Data manipulation, and Analysis

Pandas is one of the most widely used libraries in Python for data manipulation and analysis. Whether you are working on small datasets or large-scale data, Pandas makes it easy to clean, transform, and analyze data quickly and efficiently. In this blog, I will walk through an example of how I use Pandas to fetch and manipulate data and then transform the raw information into a graph for data visualization.

To start, you don't need to worry about local installation, an easy way to jump into using Pandas is through Google Colab, a cloud-based Jupyter notebook environment that includes Pandas by default.

Importing Necessary Libraries

import requests # importing the requests library in order to do fetch
import pandas as pd # importing the pandas library for data visualization, data manipulation, and analysis
Enter fullscreen mode Exit fullscreen mode

Before working with APIs and Pandas, we need the import the required libraries.

  1. requests: This is a popular Python library used to send HTTP requests. We will be using it to fetch data from an API
  2. pandas: Pandas is the library we are showcasing today! It provides powerful tools to clean, transform, and visualize structured datasets.

By importing the two libraries, we will be using them to retrieve data from the Kraken API and process it through a structured Pandas DataFrame.

Creating the fetch function - Kraken API & Pandas

def get_historic_price(symbol, after='2024-12-01'): # fetch function - parameters are the ticker symbol, and date,

    url = 'https://api.kraken.com/0/public/OHLC' # api
    pair = f"{symbol.upper()}USD"
    # XBTUSD when symbol='xbt' - used to dynamically constructor the symbol in uppercase and USD. f stands for f-string, formatted string literal

    resp = requests.get(url, params={ # this is the fetch request
        "pair": pair, # the pair that was created
        'interval': 60, # time interval for data in minutes (60 for 1 hr)
        'since': str(int(pd.Timestamp(after).timestamp()))
        # pd.Timestamp() - Pandas function that converts a date into timestamp object, after is a string -
        # .timestamp() - Pandas function that converts a Timestamp to UNIX format, which is a float
        # 1733011200.0 (This is the UNIX timestamp for December 1, 2024, 00:00:00 UTC.)
        # int() - Python function that converts data type to int
        # str() - Python function that converts datattype to string
    })
    resp.raise_for_status() # error if response is a failure

    data = resp.json() # convert JSON response

    results_key = [k for k in data['result'].keys() if k != 'last'][0] # iterate over the data.result to get the keys and filter out "last"
    results = [
        (close_time, float(open), float(high), float(low), float(close), float(volume)) # formatting
        for (close_time, open, high, low, close, vwap, volume, count) # looping over each row
        in data['result'][results_key] # source of data
    ]
    # Before
    # [1680105600, '27000.0', '27500.0', '26000.0', '26500.0', '26750.0', '120.5', 300]
    #
    # After
    # (1680105600, 27000.0, 27500.0, 26000.0, 26500.0, 120.5)

    df = pd.DataFrame(results, columns=[ # Pandas function - creating a DataFrame with params of result (list of tuples) & assign column names
        'CloseTime', 'OpenPrice', 'HighPrice', 'LowPrice', 'ClosePrice', 'Volume'
    ])
    df['CloseTime'] = pd.to_datetime(df['CloseTime'], unit='s') # Pandas function to convert from UNIX back to readable time format - on CloseTime
    df.set_index('CloseTime', inplace=True) # setting the index of DataFrame as CloseTime
    return df # return DataFrame
Enter fullscreen mode Exit fullscreen mode

To demonstrate Panda's integration with external data sources. let's explore a function that fetches historical cryptocurrency prices from the Kraken exchange API.

The get_historic_price function retrieves hourly price data with the parameters of the cryptocurrency symbol and the date.

How The Function Works

  1. The API Request: A call is made to Kraken's OHLC(Open-High-Low-Close) API. The get_historic_price params are manipulated to be the options/params of the request.

  2. Data Processing: The response to the request has the relevant price data extracted and converted into floats.

  3. Create DataFrame: The organized data is transformed into a DataFrame through the method of .DataFrame()The field CloseTime is converted and set and the DataFrame is returned.

Creating Fetch Parameters

Now that the fetch function is created, it requires the parameters of the symbol and data.

When working with time-series data, you may need to fetch data from a specific time range. Pandas provide convenient functions to manipulate dates and times.

last_week = (pd.Timestamp.now() - pd.offsets.Day(7)) # Pandas function to get the current date and time - 7 days
Enter fullscreen mode Exit fullscreen mode

Pandas methods

  1. .Timestamp.now() - Gets the current date and time.
  2. .offsets.Days(7) - Represents an offset of 7 days

The variable last_week reflects the current date minus seven days, making it exactly one week ago.

Calling the fetch

After the last_week variable is set, we can call the fetch on the historical Bitcoin (BTC). The returned data is formatted into a Pandas DataFrame, making it easier for future analysis and visualization.

btc = get_historic_price('btc', after=last_week) # running fetch
Enter fullscreen mode Exit fullscreen mode

Data Inspection & Exploration

Before diving into data analysis, it's crucial to inspect and explore the dataset. Pandas provide powerful functions to quickly understand the structure and contents of a data frame.

Viewing the First Few Rows

btc.head()  # Displays the first few rows of the DataFrame
Enter fullscreen mode Exit fullscreen mode

Image description

  1. btc is the variable storing the DataFrame.
  2. .head() is a Pandas function that returns the first five rows by default, providing a quick snapshot of the dataset.

Why Use It?

✅ Ensures the data has loaded correctly.
✅ Provides an overview of the column names and structure.
✅ Helps preview data without printing the entire DataFrame.

Getting a Summary of the Data

btc.info()  # Displays a concise summary of the DataFrame
Enter fullscreen mode Exit fullscreen mode

Image description

  1. .info() is a Pandas function that provides a high-level overview of the dataset, including column names, non-null counts, and data types.

Why Use It?

✅ Understands the dataset’s structure and column types.
✅ Identifies missing values that may need handling.
✅ Confirms data types for proper analysis and visualization.

By using these simple Pandas functions, you can quickly inspect your dataset and prepare it for deeper analysis. Stay tuned as we dive into data cleaning and transformation in the next sections!

Data Visualization

Lastly, we can visualize how Bitcoin's closing price has changed over time through data visualization.

btc['ClosePrice'].plot(figsize=(15, 7)) # Pandas function to plot the ClosePrice column of the btc Data Frame. FigSize refers to size of figure in inches, 15 width, 7 height
Enter fullscreen mode Exit fullscreen mode

Image description

  1. .plot() is a Pandas function that generates a line plot for a DataFrame column.
  2. btc['ClosePrice'] selects theClosePrice` column, which contains the Bitcoin's closing prices over time.
  3. figsize(15,7) specifies the figure size in inches (15 inches wide, and 7 inches tall)

Why Use It?

✅ Quickly see trends, spikes, or dips in Bitcoin prices.
✅ Help analyze historical price movements.

Conclusion

Pandas is an incredibly powerful tool for data manipulation, making it easy to fetch, clean, and visualize data with just a few lines of code. In this blog, we explored how to:

  • Retrieve historical cryptocurrency prices from the Kraken API
  • Transform raw JSON data into a structured Pandas DataFrame
  • Inspect and analyze datasets using built-in Pandas functions
  • Create a simple visualization to track price trends over time

By leveraging Pandas, you can efficiently work with large datasets and gain insights quickly. Whether you're analyzing financial data, working with APIs, or building data-driven applications, Pandas provides the flexibility and ease of use needed to streamline your workflow.

If you’re interested in further exploration, try experimenting with different cryptocurrencies, timeframes, or visualization techniques. Data analysis is all about iteration and discovery—keep exploring!

Top comments (0)