Silvia-nyawira

Week 2 Article: Exploratory Data Analysis using Data Visualization Techniques

Introduction

Exploratory data analysis (EDA) is the process of getting to know your data before any further preprocessing. In practice, it means exploring the data with plots and charts to identify trends, patterns, and outliers.

Data visualization represents text or numerical data in a visual format, which makes the information easier to grasp. Python provides several libraries for data visualization, such as Matplotlib, seaborn, and Plotly.

Exploratory Data Analysis using Data Visualization Techniques
There are various tools and techniques for understanding your data. This article covers two types of analysis: univariate and bivariate, which becomes multivariate analysis when more than two variables are involved.

Univariate Analysis
Univariate analysis is the simplest form of analysis: we explore a single variable at a time. Numerical and categorical variables are analyzed differently because each type calls for different plots.

  • Categorical variables: these hold text-based information, such as labels or group names. Let's look at the plots used to visualize categorical data.

  1. Count plot: a count plot is a frequency plot in the form of a bar graph. It draws one bar per category, with the bar height equal to the number of rows in that category. It is the visual equivalent of calling the pandas value_counts function on a column.
  2. Pie chart: a pie chart shows the same counts as the count plot, but as slices of a whole, so it also tells you what percentage of the data each category makes up.
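
The two plots above can be sketched side by side. This is a minimal example, assuming seaborn, pandas, and Matplotlib are installed; the `embarked` column and its values are made up for illustration and are not from the original article.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a plot window
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up illustrative data: a single categorical column.
df = pd.DataFrame({"embarked": ["S", "C", "S", "Q", "S", "C", "S", "Q", "S", "C"]})

# value_counts returns the numbers that the count plot draws as bars.
counts = df["embarked"].value_counts()

fig, (ax_count, ax_pie) = plt.subplots(1, 2, figsize=(9, 4))

# Count plot: one bar per category, bar height = number of rows.
sns.countplot(data=df, x="embarked", ax=ax_count)
ax_count.set_title("Count plot")

# Pie chart: the same counts, labeled with each category's percentage share.
ax_pie.pie(counts.values, labels=counts.index.tolist(), autopct="%1.1f%%")
ax_pie.set_title("Pie chart")

fig.tight_layout()
```

Calling `fig.savefig(...)` or `plt.show()` afterwards writes or displays the figure.
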
  • Numerical variables: analyzing numerical data is important because understanding how each variable is distributed guides further processing of the data.
  • Histogram: a histogram shows the frequency of numerical data using rectangles. The width of a rectangle (horizontal axis) covers a range of values of the variable, and its height (vertical axis) shows how often values in that range appear.
  1. Distplot
    Distplot is a slightly improved version of the histogram. It draws a KDE (Kernel Density Estimation) curve over the histogram. The curve estimates the PDF (Probability Density Function) of the column, that is, how densely the values are concentrated around each point. Note that seaborn has deprecated distplot since version 0.11 in favor of histplot and displot.

  2. Boxplot
    A boxplot displays the five-number summary of a set of data: the minimum, first quartile, median, third quartile, and maximum.
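
The five-number summary can be computed directly and compared with what the plot draws. The sketch below uses made-up values. One detail worth knowing: by default Matplotlib ends each whisker at the last point within 1.5 times the interquartile range of the box, so an extreme minimum or maximum is drawn as a separate outlier point rather than as the whisker end.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a plot window
import matplotlib.pyplot as plt
import numpy as np

# Made-up data; 30 is deliberately far from the rest.
values = np.array([2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 30])

# Five-number summary: minimum, Q1, median, Q3, maximum.
minimum, q1, median, q3, maximum = np.percentile(values, [0, 25, 50, 75, 100])
print(minimum, q1, median, q3, maximum)

fig, ax = plt.subplots()
bp = ax.boxplot(values)
ax.set_title("Boxplot")

# Points beyond the whiskers are drawn individually as outliers.
outliers = bp["fliers"][0].get_ydata()
# The upper whisker stops at the largest value within Q3 + 1.5 * IQR.
upper_whisker_end = bp["whiskers"][1].get_ydata()[1]
```
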

Bivariate Analysis
Bivariate analysis explores the relationship between two different variables. When more than two variables are analyzed together, it is known as multivariate analysis.

  • Numerical and Numerical
  1. Scatter plot: a scatter plot uses dots to represent the values of two numerical variables. The position of each dot on the horizontal and vertical axes gives the values of the two variables for an individual data point.
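
A basic scatter plot of two numerical variables could be sketched as follows. The `total_bill` and `tip` arrays are synthetic and built to be positively related, purely for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a plot window
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data: tip grows with total_bill, plus some noise.
rng = np.random.default_rng(1)
total_bill = rng.uniform(5, 50, size=100)
tip = 0.15 * total_bill + rng.normal(0, 1, size=100)

fig, ax = plt.subplots()
# Each dot is one observation: x = total_bill, y = tip.
ax.scatter(total_bill, tip)
ax.set_xlabel("total_bill")
ax.set_ylabel("tip")

# The correlation coefficient summarizes the linear trend the dots show.
r = np.corrcoef(total_bill, tip)[0, 1]
```
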

• Multivariate analysis with scatter plot
A scatter plot can also show relationships among three or four variables. In seaborn, the hue argument colors the dots by a third variable, and the style argument changes the marker shape by a fourth.
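
As a rough sketch of this idea, seaborn's `scatterplot` accepts `hue` and `style` arguments (and also `size`) that map extra columns onto color, marker shape, and dot size. The DataFrame below, including the `party_size` column name, is made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a plot window
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data with two numerical and three extra columns.
df = pd.DataFrame({
    "total_bill": [10.3, 21.0, 23.7, 24.6, 25.3, 8.8, 26.9, 15.0, 14.8, 35.3],
    "tip": [1.7, 3.5, 3.3, 3.6, 4.7, 2.0, 3.1, 2.0, 3.2, 5.0],
    "sex": ["F", "M", "M", "F", "M", "M", "M", "F", "M", "F"],
    "smoker": ["no", "no", "yes", "no", "yes", "no", "yes", "yes", "no", "no"],
    "party_size": [2, 3, 3, 2, 4, 2, 4, 2, 2, 4],
})

fig, ax = plt.subplots()
sns.scatterplot(
    data=df,
    x="total_bill",
    y="tip",
    hue="sex",          # third variable: dot color
    style="smoker",     # fourth variable: marker shape
    size="party_size",  # optional fifth variable: dot size
    ax=ax,
)
```
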

  • Numerical and Categorical
    If one variable is numerical and the other is categorical, several plots can be used for bivariate and multivariate analysis.

• Bar plot
A bar plot places the categorical variable on the x-axis and the numerical variable on the y-axis to explore the relationship between the two. In seaborn, each bar shows the mean of the numerical variable for that category by default, and the black line on top of each bar shows the confidence interval.
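
The sketch below draws such a bar plot with seaborn and checks the bar heights against a pandas `groupby` mean. The `pclass` and `age` values are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a plot window
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data: one categorical and one numerical column.
df = pd.DataFrame({
    "pclass": ["first"] * 3 + ["second"] * 3 + ["third"] * 4,
    "age": [38, 35, 54, 27, 14, 34, 22, 26, 4, 20],
})

# The per-category means that the bars represent.
means = df.groupby("pclass")["age"].mean()

fig, ax = plt.subplots()
# By default the bar height is the mean of y within each x category,
# and the line on each bar is a bootstrapped confidence interval.
sns.barplot(data=df, x="pclass", y="age", ax=ax)
```
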

• Distplot
Distplot shows the estimated PDF of a numerical variable using kernel density estimation. It has no hue parameter, but the same effect can be achieved by drawing one distplot per category on the same axes.

  • Categorical and Categorical

• Heatmap
A heatmap is a visual representation of the table produced by the pandas crosstab function. It shows how often each category of one variable occurs together with each category of the other.
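
A minimal sketch of this: build the table with `pd.crosstab`, then pass it to seaborn's `heatmap`. The `pclass` and `survived` values are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a plot window
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data: two categorical columns.
df = pd.DataFrame({
    "pclass": ["first"] * 4 + ["second"] * 3 + ["third"] * 5,
    "survived": ["yes", "yes", "yes", "no",
                 "yes", "no", "no",
                 "no", "no", "no", "no", "yes"],
})

# Rows: pclass, columns: survived, cells: number of co-occurrences.
table = pd.crosstab(df["pclass"], df["survived"])
print(table)

fig, ax = plt.subplots()
# annot=True writes each count inside its cell; fmt="d" formats it as an integer.
sns.heatmap(table, annot=True, fmt="d", cmap="Blues", ax=ax)
```
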

• Cluster map
We can also use a cluster map to understand the relationship between two categorical variables. A cluster map draws a heatmap together with dendrograms that group categories with similar behavior next to each other.
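
A cluster map of a crosstab could be sketched as below, assuming SciPy is installed (seaborn's `clustermap` relies on it for the hierarchical clustering). The counts are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a plot window
import pandas as pd
import seaborn as sns

# Made-up co-occurrence counts for two categorical variables.
pair_counts = {
    ("first", "Cherbourg"): 8, ("first", "Queenstown"): 1, ("first", "Southampton"): 9,
    ("second", "Cherbourg"): 2, ("second", "Queenstown"): 1, ("second", "Southampton"): 12,
    ("third", "Cherbourg"): 5, ("third", "Queenstown"): 6, ("third", "Southampton"): 20,
}
# Expand the counts into one row per observation, as raw data would look.
rows = [pair for pair, n in pair_counts.items() for _ in range(n)]
df = pd.DataFrame(rows, columns=["pclass", "embarked"])

table = pd.crosstab(df["pclass"], df["embarked"])

# clustermap creates its own figure: a heatmap whose rows and columns are
# reordered by hierarchical clustering, with dendrograms on the margins.
g = sns.clustermap(table, annot=True, figsize=(5, 5))

# Row order chosen by the clustering (positions into table.index).
row_order = g.dendrogram_row.reordered_ind
```
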

**Conclusion**
Exploratory data analysis is key to understanding and representing your data well, which in turn helps you build stronger and more generalized models. Data visualization is a powerful tool for revealing the stories buried in data, and it goes beyond creating attractive charts and graphs. By combining the art and science of data visualization, we can improve communication, uncover new information, and make better-informed judgments, in addition to improving our capacity for interpreting data.
