DEV Community

Data Stories

Visualizing Temperature Variation: A Climate Spiral

Introduction

Photo by Brett Jordan on Unsplash

Welcome to the first article of my 52-week blog challenge. I will be covering technical and descriptive articles in the fields of data science and artificial intelligence.

Let's jump right into the definitions first.

  • Temperature is a physical quantity that expresses how hot or cold something is. It is measured on a scale such as Celsius or Fahrenheit.

  • Variation is the extent to which something differs from something else.

So...

  • Temperature variation is a measure of how much the temperature in a specific area changes over a particular period of time.

Image showing variation in temperature across the world

Goals

The goal of this project is to create an animated spiral of Kenya's variation in temperature from 1991 to 2016.

By the end of this blog post, you will have learned about:

  • Exploratory data analysis and ETL (Extraction, Transformation and Loading of data)

  • Data Visualization

  • Generation of a GIF

  • Reporting and presenting the data's story after transforming it from data to information and insights.

Why?

  1. Descriptive analysis: it describes the current situation on the ground.

  2. Informed decision-making: the insights can support informed decisions in climate policy-making.

  3. Disaster preparedness: the visualization can reveal early signs of unusual temperature spikes, helping people prepare for them.

Background

Ed Hawkins, a climate scientist, unveiled an animated visualization in 2017 that captivated the world. This visualization showed the deviations in the global average temperature from 1850 to 2017. It was re-shared millions of times over Twitter and Facebook and a version of it was even shown at the opening ceremony for the Rio Olympics.

This animation was created with the help of Srini Kadamati's tutorial at https://www.dataquest.io/blog/climate-temperature-spirals-python/

Historical weather data was retrieved from Africa Open Data: https://africaopendata.org/dataset/kenya-climate-data-1991-2016

The data was collected by the World Bank for its climate knowledge portal.

Building the spiral visualization.

1. ETL (Extraction, Transformation and Loading of data)

# Import the libraries we'll use
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
import matplotlib.animation as animation

# Read the temperature file into a pandas DataFrame.
# Judging by the output below, each line splits on whitespace into two columns,
# e.g. "1991,Jan" and "Average,25.1631". With header=None the file's own header
# line is read in as an ordinary data row (row 0).
# sep=r"\s+" replaces delim_whitespace=True, which is deprecated in recent pandas.
temp_data = pd.read_csv(
    "temp data.csv",
    sep=r"\s+",
    usecols=[0, 1],
    header=None)

Let's take a quick look at the data frame and some properties of the data.

temp_data

Result:


0   1
0   Year,Month  Average,Temperature
1   1991,Jan    Average,25.1631
2   1991,Feb    Average,26.0839
3   1991,Mar    Average,26.2236
4   1991,Apr    Average,25.5812
... ... ...
308 2016,Aug    Average,24.0942
309 2016,Sep    Average,24.437
310 2016,Oct    Average,26.0317
311 2016,Nov    Average,25.5692
312 2016,Dec    Average,25.7401
temp_data.describe()

Result:


0   1
count   313 313
unique  313 313
top Year,Month  Average,Temperature
freq    1   1

From these results, check whether the data needs to be reshaped before it is usable.

In this particular case, you need to separate the year, month and average temperature. Because the file was read with header=None, the two columns are labeled 0 and 1, so split those:

# Split "1991,Jan" into Year and Month, and "Average,25.1631" into a
# constant "Average" label and the temperature value
temp_data[['Year', 'Month']] = temp_data[0].str.split(',', expand=True)
temp_data[['Average', 'Temparature']] = temp_data[1].str.split(',', expand=True)
temp_data.head()

Result:


    0   1   Year    Month   Average Temparature
0   Year,Month  Average,Temperature Year    Month   Average Temperature
1   1991,Jan    Average,25.1631 1991    Jan Average 25.1631
2   1991,Feb    Average,26.0839 1991    Feb Average 26.0839
3   1991,Mar    Average,26.2236 1991    Mar Average 26.2236
4   1991,Apr    Average,25.5812 1991    Apr Average 25.5812

It is best practice to drop columns that are repetitive. Here that means the two raw columns and the constant "Average" label:

temp_data_1 = temp_data.drop(columns=[0, 1, 'Average'])
temp_data_1

Result:

Year    Month   Temparature
0   Year    Month   Temperature
1   1991    Jan 25.1631
2   1991    Feb 26.0839
3   1991    Mar 26.2236
4   1991    Apr 25.5812
... ... ... ...
308 2016    Aug 24.0942
309 2016    Sep 24.437
310 2016    Oct 26.0317
311 2016    Nov 25.5692
312 2016    Dec 25.7401
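
Now let's get to know the data types in the data. First, note that row 0 still holds the file's header text ("Year", "Month", "Temperature") because the file was read with header=None. Drop it, and call the cleaned frame temp_data_2:

# Drop row 0, which holds the header text read in as data
temp_data_2 = temp_data_1.drop(index=0)

# Check which data types the data frame has
temp_data_2.dtypes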

Result:

Year           object
Month          object
Temparature    object
dtype: object

All the data is in object (string) form.
You need to convert the temperature column from object to float, because that is the only way you can perform mathematical operations on it and plot it on a scale. Convert the year column to int as well, so that comparisons such as temp_data_2['Year'] == 1991 later on actually match.

temp_data_2['Temparature'] = temp_data_2['Temparature'].astype(float)
temp_data_2['Year'] = temp_data_2['Year'].astype(int)

# View the data types of each column
temp_data_2.dtypes

Result

Year             int64
Month           object
Temparature    float64
dtype: object

Now you will write a function that converts month names to numbers, using Python's datetime library.

# Convert an abbreviated month name (e.g. "Jan") to its number (e.g. 1)
def month_string_to_number(string):
    dt = datetime.strptime(string, "%b")
    return dt.month

# Apply the function to the Month column to add a numeric month column
temp_data_2['month_number'] = temp_data_2['Month'].apply(month_string_to_number)

temp_data_2.head(20)

Result:

    Year    Month   Temparature month_number
1   1991    Jan 25.1631 1
2   1991    Feb 26.0839 2
3   1991    Mar 26.2236 3
4   1991    Apr 25.5812 4
5   1991    May 24.6618 5
6   1991    Jun 23.9439 6
7   1991    Jul 22.9982 7
8   1991    Aug 23.0391 8
9   1991    Sep 23.9423 9
10  1991    Oct 25.5236 10
11  1991    Nov 24.5875 11
12  1991    Dec 24.7398 12
13  1992    Jan 24.4359 1
14  1992    Feb 26.2892 2
15  1992    Mar 26.5409 3
16  1992    Apr 26.0819 4
17  1992    May 24.7852 5
18  1992    Jun 24.0563 6
19  1992    Jul 22.8377 7
20  1992    Aug 22.7902 8
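If you'd rather not call a Python function once per row, pandas can parse the abbreviations in one vectorized call. This is a minimal sketch, assuming English three-letter month names like the ones above; the small `months` Series is a stand-in for `temp_data_2['Month']`.

```python
import pandas as pd

# Stand-in for temp_data_2['Month']: English three-letter month abbreviations
months = pd.Series(["Jan", "Feb", "Mar", "Dec"])

# "%b" parses abbreviated month names; .dt.month extracts the month number (1-12)
month_numbers = pd.to_datetime(months, format="%b").dt.month

print(month_numbers.tolist())  # [1, 2, 3, 12]
```

On the real data this would be `temp_data_2['month_number'] = pd.to_datetime(temp_data_2['Month'], format='%b').dt.month`. Both approaches rely on the same `%b` format code; the vectorized one simply avoids the per-row `apply`.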

It is best practice to drop the unnecessary month name column.

temp_data_2 = temp_data_2.drop('Month', axis=1)

Checking for null or missing values is very important in the ETL process.

temp_data_2.isnull().sum()

Result:

Year            0
Temparature     0
month_number    0
dtype: int64

There are no missing values in this data.

Next, find the mean of the temperature column and subtract it from each individual value in the column. Each value then becomes that month's temperature variation (its anomaly) relative to the average temperature of the whole 1991-2016 period. This is a simple form of normalization known as centering.
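The post doesn't show the code for this centering step, so here is a minimal sketch of it. The four-row `sample` frame is a stand-in for `temp_data_2`, built from the first four months of 1991 in the table above; the column name `Temparature` matches the one used throughout.

```python
import pandas as pd

# Stand-in for temp_data_2: January to April 1991 from the table above
sample = pd.DataFrame({
    "Year": [1991, 1991, 1991, 1991],
    "month_number": [1, 2, 3, 4],
    "Temparature": [25.1631, 26.0839, 26.2236, 25.5812],
})

# Mean of the whole temperature column (every row in the frame)
overall_mean = sample["Temparature"].mean()

# Overwrite each value with its deviation (anomaly) from that mean
sample["Temparature"] = sample["Temparature"] - overall_mean

print(sample)
```

On the full data set the same step would be a single line, `temp_data_2['Temparature'] = temp_data_2['Temparature'] - temp_data_2['Temparature'].mean()`, which would explain why the minimum temperature reported further down is negative.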

2. Visualizing the data.

Cartesian versus polar coordinate system
There are a few key phases to recreating Ed's GIF:

  • learning how to plot on a polar coordinate system

  • transforming the data for polar visualization

  • customizing the aesthetics of the plot

  • stepping through the visualization year by year and turning the plot into a GIF

Preparing data for polar plotting

You need to subset the data by year and use the following coordinates:

  • r: the temperature value for a given month, adjusted to contain no negative values. Matplotlib supports plotting negative values, but not in the way you might expect: you want -0.1 to be closer to the center than 0.1, which isn't the default matplotlib behavior. You also want to leave some space around the origin of the plot for displaying the year as text.

  • theta: 12 equally spaced angles, one per month, spanning the full circle from 0 to 2*pi.
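Here is a sketch of both coordinates for a single year. It uses the 1991 values from the table above and, as a stand-in for the centering step, subtracts that year's own mean; the offset of 2 mirrors the value used later in this post.

```python
import numpy as np

# Monthly temperatures for 1991 (Jan..Dec), taken from the table above
temps_1991 = np.array([25.1631, 26.0839, 26.2236, 25.5812, 24.6618, 23.9439,
                       22.9982, 23.0391, 23.9423, 25.5236, 24.5875, 24.7398])

# Stand-in for the centering step: deviations from this year's own mean
anomalies = temps_1991 - temps_1991.mean()

# theta: 13 evenly spaced angles from 0 to 2*pi, minus the last one,
# because 2*pi points the same way as 0 and would stack Dec on top of Jan
theta = np.linspace(0, 2 * np.pi, 13)[:-1]

# r: add a constant offset so radii are positive and leave room at the center
# (2.0 is the tutorial's value; check it against your own minimum)
offset = 2.0
r = anomalies + offset
```

`np.linspace(0, 2 * np.pi, 12, endpoint=False)` produces the same 12 angles.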

You'll start by plotting just the data for the year 1991 in matplotlib, then scale up to all years.

To generate a matplotlib Axes object that uses the polar system, you need to set the projection parameter to "polar" when creating it.

fig = plt.figure(figsize=(8,8))
ax1 = plt.subplot(111, projection='polar')

Image showing a matplotlib Axes object that uses the polar system

To adjust the data to contain no negative temperature values, you need to first calculate the minimum temperature value:

temp_data_2['Temparature'].min()

Result:

-2.3378881410256405

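You'll add 2 to all temperature values so that they move away from the center, while some space is still reserved around the origin for displaying text.

Note: adjust this offset to your own data's minimum temperature. With a minimum of about -2.34, adding 2 still leaves the lowest values slightly below zero (about -0.34); any offset larger than 2.34, such as 3, would make every value positive.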
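If you'd like the offset to follow from the data instead of being picked by hand, a small helper can compute it. `radial_offset` and its `margin` parameter are hypothetical names for this sketch: it returns the smallest multiple of 0.5 that keeps every shifted value at least `margin` away from the origin.

```python
import math

def radial_offset(min_value, margin=0.5):
    """Hypothetical helper: smallest multiple of 0.5 such that
    min_value + offset >= margin."""
    # Need min_value + offset >= margin, i.e. offset >= margin - min_value;
    # multiplying by 2 before ceil() rounds up to the next 0.5
    return math.ceil((margin - min_value) * 2) / 2

print(radial_offset(-2.3378881410256405))  # 3.0
```

With this post's minimum of about -2.34 it suggests 3.0 rather than 2.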

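You'll also generate 13 evenly spaced values from 0 to 2*pi and use the first 12 as the theta values. The 13th value, 2*pi, points in the same direction as 0 and would place December on top of January: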

# Select only the rows where the Year column equals 1991
hc_1991 = temp_data_2[temp_data_2['Year'] == 1991]

# Create an 8 x 8 inch figure with a polar Axes
fig = plt.figure(figsize=(8,8))
ax1 = plt.subplot(111, projection='polar')

# Shift the temperatures outward from the center
r = hc_1991['Temparature'] + 2
# 13 angles from 0 to 2*pi, keeping the first 12 (one per month)
theta = np.linspace(0, 2*np.pi, 13)[:-1]

# Plot the data on the polar axes
ax1.plot(theta, r)

# Hide all of the tick labels for both axes
ax1.axes.get_yaxis().set_ticklabels([])
ax1.axes.get_xaxis().set_ticklabels([])

# fig.set_facecolor() sets the gray color surrounding the polar plot;
# ax1.set_facecolor() sets the black background inside it
fig.set_facecolor("#323331")
ax1.set_facecolor('#000100')

# Add the title and label
ax1.set_ylabel('Temperature')
ax1.set_title("Kenya's Temperature Change (1991-2016)", color='white', fontdict={'fontsize': 30})

# Display the plot
plt.show()

Plotting the remaining years
To plot the spirals for the remaining years, you need to repeat what you just did, but for every year in the dataset. The one tweak you should make here is to manually set the axis limit for r (y in matplotlib). This is because matplotlib scales the size of the plot automatically based on the data that's used, which is why, in the last step, the data for 1991 alone was displayed at the edge of the plotting area. You'll calculate the maximum temperature value in the entire dataset and add a generous amount of padding (to match what Ed did).
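The limit calculation can be sketched as a small helper. `radial_limits` and its `padding` default are hypothetical choices for illustration; the idea is simply to compute one pair of limits from the whole data set and reuse it for every plot.

```python
import numpy as np

def radial_limits(values, offset, padding=1.0):
    """Hypothetical helper: one (rmin, rmax) pair shared by every year's curve,
    so matplotlib's autoscaling cannot change the scale between plots."""
    shifted = np.asarray(values, dtype=float) + offset
    # Keep the origin at 0 unless some shifted value dips below it
    r_min = min(0.0, shifted.min())
    # Generous head-room above the largest shifted value
    r_max = shifted.max() + padding
    return r_min, r_max
```

On a polar Axes you would then call something like `ax1.set_ylim(*radial_limits(temp_data_2['Temparature'], offset=2))`; for polar axes, `set_ylim` sets the radial limits.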

Now you can use a for loop to plot the rest of the years. You'll leave out the code that generates the center text for now (otherwise each year will draw its text at the same point and it'll be very messy).

You will use the color (or c) parameter when calling the Axes.plot() method and draw the colors from a colormap such as plt.cm.viridis.
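One detail worth knowing: when a Matplotlib colormap is called with an integer, the integer is treated as an index into its lookup table (256 entries for viridis), so indices past the end all return the last color. Passing a float between 0 and 1 samples the whole colormap instead. A minimal sketch, assuming the 26 years of this data set:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; only colors are computed here
import matplotlib.pyplot as plt

years = np.arange(1991, 2017)  # 26 years, 1991-2016

# Integer arguments index viridis' 256-entry lookup table, so something like
# viridis(i * 30) runs past the end after a few years and repeats the last color.
# A float in [0, 1] samples the whole colormap evenly, one color per year.
colors = [plt.cm.viridis(i / (len(years) - 1)) for i in range(len(years))]
```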

fig = plt.figure(figsize=(14,14))
ax1 = plt.subplot(111, projection='polar')

# Hide all of the tick labels for both axes
ax1.axes.get_yaxis().set_ticklabels([])
ax1.axes.get_xaxis().set_ticklabels([])

# Gray around the polar plot, black inside it
fig.set_facecolor("#323331")
ax1.set_facecolor('#000100')

# Fix the r limits for every year: start at 0 (or lower if a shifted value is
# still negative) and pad above the largest shifted value; 1 is an adjustable choice
r_all = temp_data_2['Temparature'] + 2
ax1.set_ylim(min(0, r_all.min()), r_all.max() + 1)

# 13 angles from 0 to 2*pi, keeping the first 12 (one per month)
theta = np.linspace(0, 2*np.pi, 13)[:-1]

ax1.set_title("Kenya's Temperature Change (1991-2016)", color='white', fontdict={'fontsize': 30})

years = temp_data_2['Year'].unique()

# One curve per year; index / (len(years) - 1) runs from 0 to 1 across the colormap
for index, Year in enumerate(years):
    r = temp_data_2.loc[temp_data_2["Year"] == Year, "Temparature"] + 2
    ax1.plot(theta, r, c=plt.cm.viridis(index / (len(years) - 1)))
plt.show()

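Adding Temperature Rings
At this stage, the viewer can't actually understand the underlying data at all, because there is no indication of temperature values in the visualization.
Next, you will add temperature rings at 0.0, 0.5, 1.0, 1.5 and 2.0 degrees Celsius of variation. Because every value was shifted by 2, the ring for a variation of d degrees is a full circle at radius d + 2.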
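Each ring is just a full circle of constant radius. A minimal sketch, assuming the +2 offset used for the data; the `ring` helper is a hypothetical name, not part of Matplotlib:

```python
import numpy as np

def ring(deviation, offset=2.0, n_points=1000):
    """Hypothetical helper: thetas and radii for a full circle marking
    `deviation` degrees C on the shifted radial scale (radius = deviation + offset)."""
    thetas = np.linspace(0, 2 * np.pi, n_points)  # includes 2*pi, so the circle closes
    radii = np.full(n_points, deviation + offset)
    return thetas, radii

# One ring per label used in the final plot
rings = {d: ring(d) for d in (0.0, 0.5, 1.0, 1.5, 2.0)}
# On a polar Axes: ax1.plot(*rings[0.0], c="blue"); ax1.plot(*rings[1.5], c="red")
```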
Generating the GIF Animation
Now you're ready to generate a GIF animation from the plot. An animation is a series of images displayed in rapid succession. You'll use matplotlib.animation.FuncAnimation to help with this. To take advantage of it, you need to write code that:

  • defines the base plot appearance and properties

  • updates the plot between frames with new data

You'll use the following parameters when calling FuncAnimation():

  • fig: the matplotlib Figure object

  • func: the update function that's called for each frame

  • frames: the number of frames (you want one for each year)

  • interval: the number of milliseconds each frame is displayed (there are 1000 milliseconds in a second)

FuncAnimation() returns a matplotlib.animation.FuncAnimation object, which has a save() method you can use to write the animation to a GIF file.
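To see the fig/func/frames/interval pattern in isolation, here is a minimal sketch that animates three made-up "years" of constant radii and saves them with Matplotlib's Pillow writer. The data, figure size and output path are placeholders, not the Kenyan data.

```python
import os
import tempfile

import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen; we only save a file
import matplotlib.pyplot as plt
from matplotlib import animation

# Placeholder data: three "years" of 12 radii each
theta = np.linspace(0, 2 * np.pi, 13)[:-1]
yearly_r = [np.full(12, 2.0), np.full(12, 2.5), np.full(12, 3.0)]

fig = plt.figure(figsize=(4, 4))
ax = plt.subplot(111, projection="polar")
ax.set_ylim(0, 3.5)  # fixed limits so the frames don't rescale

def update(i):
    # Called with i = 0, 1, 2, ...: add that year's curve to the plot
    line, = ax.plot(theta, yearly_r[i], c=plt.cm.viridis(i / (len(yearly_r) - 1)))
    return [line]

anim = animation.FuncAnimation(fig, update, frames=len(yearly_r), interval=200)

# The Pillow writer needs no external encoder such as ffmpeg
out_path = os.path.join(tempfile.mkdtemp(), "spiral_sketch.gif")
anim.save(out_path, writer="pillow", fps=5)
plt.close(fig)
```

Because FuncAnimation calls `update` with 0, 1, 2, ..., the frame number doubles as an index into the per-year data, which is the same idea the full code uses with `years[i]`.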

The code block below combines all of the steps above, including the temperature rings, to produce the GIF.

# Month labels in calendar order
months = ["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"]
fig = plt.figure(figsize=(15,15))
ax1 = plt.subplot(111, projection="polar")

# 13 angles from 0 to 2*pi, keeping the first 12 (one per month)
theta = np.linspace(0, 2*np.pi, 13)[:-1]

# Temperature rings: every value was shifted by 2, so a variation of d C sits at radius d + 2
full_circle_thetas = np.linspace(0, 2*np.pi, 1000)
blue_one_radii = [0.0 + 2] * 1000
red_one_radii = [0.5 + 2] * 1000
red_two_radii = [1.0 + 2] * 1000
red_three_radii = [1.5 + 2] * 1000
red_four_radii = [2.0 + 2] * 1000

ax1.plot(full_circle_thetas, blue_one_radii, c='blue')
ax1.plot(full_circle_thetas, red_one_radii, c='red')
ax1.plot(full_circle_thetas, red_two_radii, c='red')
ax1.plot(full_circle_thetas, red_three_radii, c='red')
ax1.plot(full_circle_thetas, red_four_radii, c='red')

# Gray around the polar plot, black inside it
fig.set_facecolor("#323331")
ax1.set_facecolor('#000100')

# Fix the r limits so the view doesn't rescale between frames; the upper limit
# of 5.5 assumes the shifted data stays inside the month labels at r = 5.0
ax1.set_ylim(min(0, temp_data_2['Temparature'].min() + 2), 5.5)

# Label each ring at the top of the plot (theta = pi/2), at the ring's radius
ax1.text(np.pi/2, 2.0, "0.0 C", color="blue", ha='center', fontdict={'fontsize': 20})
ax1.text(np.pi/2, 2.5, "0.5 C", color="red", ha='center', fontdict={'fontsize': 20})
ax1.text(np.pi/2, 3.0, "1.0 C", color="red", ha='center', fontdict={'fontsize': 20})
ax1.text(np.pi/2, 3.5, "1.5 C", color="red", ha='center', fontdict={'fontsize': 20})
ax1.text(np.pi/2, 4.0, "2.0 C", color="red", ha='center', fontdict={'fontsize': 20})

# Remove the ticks (and with them the tick labels)
ax1.set_xticks([])
ax1.set_yticks([])

ax1.set_title("Kenya's Temperature Change Spiral (1991-2016)", color='white', fontdict={'fontsize': 30})

years = temp_data_2['Year'].unique()

fig.text(0.78,0,"Kenya Temperature data",color="white",fontsize=20)
fig.text(0.05,0.02,"Everlynn Muthoni; Data Stories",color="white",fontsize=20)
fig.text(0.05,0,"Inspired by Ed Hawkins's 2017 Visualization",color="white",fontsize=15)

# Add the months ring, each label at the same angle as its month's data point
for i, month in enumerate(months):
    ax1.text(theta[i], 5.0, month, color="white", fontsize=15, ha="center")

# One text object at the center of the plot, updated with the year on every frame
year_text = ax1.text(0.5, 0.5, "", transform=ax1.transAxes,
                     fontsize=20, color="white", ha="center", va="center")

def update(i):
    # Draw the curve for year number i (colors run from 0 to 1 across viridis)
    Year = years[i]
    r = temp_data_2[temp_data_2['Year'] == Year]['Temparature'] + 2
    ax1.plot(theta, r, c=plt.cm.viridis(i / (len(years) - 1)))
    # Replace the previous year shown at the center
    year_text.set_text(str(Year))
    return ax1

# One frame per year; when saving, fps below sets the GIF's speed
anim = animation.FuncAnimation(fig, update, frames=len(years), interval=10)

# The Pillow writer produces GIFs without needing ffmpeg
anim.save("Spiral.gif", writer='pillow', fps=5, dpi=100)

Final result:

final gif visualization of Kenya's temperature data

3. The story our data visualization tells.

So... from the analysis and visualization, the following insights can be drawn:

  • Since 1991, the temperature variation has been gradually increasing between February and June, with the highest variation occurring mostly between June and July.

  • High temperature variation occurs during most of the first half of the year.
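To put numbers on observations like these, you could summarize each calendar month across years. A minimal sketch, assuming the layout of `temp_data_2` (Year, month_number, Temparature); the sample frame only contains the 1991 and 1992 rows shown earlier in this post, so its output says nothing about the full 1991-2016 record.

```python
import pandas as pd

# Stand-in for temp_data_2: January to August of 1991 and 1992 from the table above
sample = pd.DataFrame({
    "Year": [1991] * 8 + [1992] * 8,
    "month_number": list(range(1, 9)) * 2,
    "Temparature": [25.1631, 26.0839, 26.2236, 25.5812, 24.6618, 23.9439, 22.9982, 23.0391,
                    24.4359, 26.2892, 26.5409, 26.0819, 24.7852, 24.0563, 22.8377, 22.7902],
})

# For each calendar month: average level and spread (sample std) across years
by_month = sample.groupby("month_number")["Temparature"].agg(["mean", "std"])
print(by_month)
```

Run on the full data, a larger `std` for a month would indicate that its temperature varies more from year to year.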

And that's it. Congrats, you have successfully visualized temperature data using a climate spiral!

Click here if you'd like to check out the source code.

4. Recommendations

  • For a 3D version of the visualization, try exploring the project in MATLAB.

  • For a more up-to-date descriptive analysis, try to find data that extends to the most recent years.

Like, subscribe and share your thoughts with me. Bye, and happy coding!
