As a data scientist, I not only have to find the data I need, I also have to think about how to present it in a helpful, meaningful way, i.e. data visualization. Though the two fields are distinct, they often depend on each other: if I cannot visualize the data I have obtained in a helpful way, that can be a sign that the data is incomplete or even incorrect.
One database company published a whitepaper to promote its product, and this is an excerpt from the paper about time series data:
A line graph is the simplest way to represent time series data. It helps the viewer get a quick sense of how something has changed over time:
This statement is hard to deny, but the placeholder graph that accompanies it could represent a very broad range of data, not time series data in particular. Because the graph is so generic, I feel it actually conveys little sense of how things changed.
Presenting data is a skill quite distinct from isolating and compiling it from raw sources. Just as I use dedicated Python libraries for building data sets, I feel one should turn to dedicated presentation tools when thinking about how to present data.
Here is a graph, modeled after one of the famous charts presented by the statistician Hans Rosling in his TED talks:
The format Rosling chose here manages to let you "get a sense" of a very large dataset very quickly and in an unprecedentedly dramatic fashion.
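As a rough sketch of that format, a bubble chart can be approximated with a Seaborn scatterplot: income on a logarithmic x-axis, life expectancy on the y-axis, and bubble size for population. Every value and name below is hypothetical and chosen only for illustration; this is not Rosling's data or his tooling.

```python
import pandas as pd
import seaborn as sns

# Hypothetical values for illustration only; not real country statistics
countries = pd.DataFrame({
    "country": ["A", "B", "C", "D", "E", "F"],
    "region": ["North", "North", "South", "South", "East", "East"],
    "income_per_person": [800, 2500, 6000, 15000, 32000, 50000],
    "life_expectancy": [55, 62, 70, 74, 79, 82],
    "population_millions": [30, 200, 1300, 50, 80, 320],
})

# One bubble per country: position shows income and health, size shows population
ax = sns.scatterplot(
    data=countries,
    x="income_per_person",
    y="life_expectancy",
    size="population_millions",
    hue="region",
    sizes=(50, 1000),
    alpha=0.6,
)

# A log scale keeps low-income and high-income countries readable on one axis
ax.set_xscale("log")
ax.set(xlabel="Income per person (log scale)", ylabel="Life expectancy (years)")
```

Encoding four variables in one view (two axes, size, and color) is what lets a chart like this summarize a large dataset at a glance.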
One popular tool for data visualization is Seaborn, a Python library. Dedicated tools let me quickly try out different views and formats for my data, which gives me quick feedback on whether I am on the right track. One advantage of Seaborn is that its code is very succinct: you can create plots and grids in just a few lines of code, and sometimes a single line will suffice.
Here are a few plots made with Seaborn:
# import seaborn library into memory
import seaborn as sns
# load in a built-in dataset from seaborn to use for plots
df = sns.load_dataset('iris')
# check dataset to see column names
df.info()
# plot a histogram of sepal length according to species
sns.histplot(data=df, x="sepal_length", hue="species", element="step");
# plot a pairplot for all features
sns.pairplot(df);
# plot a scatterplot for petal information
sns.scatterplot(data=df, x="petal_width", y="petal_length", hue="species");
# plot a barplot of sepal information on x and y
sns.barplot(data=df, x="sepal_width", y="sepal_length");
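Apart from the pairplot, the examples above each draw a single plot. As a sketch of the grids mentioned earlier, `sns.relplot` can split one dataset into side-by-side panels with a single call. The DataFrame below is made up (the `group`, `width`, and `length` columns are my own placeholders) so that the snippet runs without downloading a dataset.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up measurements for three hypothetical groups
rng = np.random.default_rng(42)
measurements = pd.DataFrame({
    "group": np.repeat(["a", "b", "c"], 50),
    "width": rng.normal(3.0, 0.4, 150),
})
measurements["length"] = 1.5 * measurements["width"] + rng.normal(0, 0.3, 150)

# col="group" creates one panel per group, arranged in a single row
grid = sns.relplot(data=measurements, x="width", y="length", col="group", height=3)
```

With the iris data loaded earlier, the same idea would presumably be `sns.relplot(data=df, x="petal_width", y="petal_length", col="species")`.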
These are but a few examples of the awesome things Seaborn can help you accomplish. To learn about Seaborn, check out the docs.