DEV Community

DoreenNangira

SALES DATA ANALYSIS

INTRODUCTION

Dear readers, in this article I am going to share the findings of a data analysis project that I recently undertook during my HNG internship program. HNG is a fast-paced program that helps developers and other people in the tech field practice their skills in their own domain. In this task, we were given sample datasets from Kaggle and asked to analyze them. The main aim of the project was to produce a detailed data analysis report. Without wasting more time, let's dive into the details.

PREREQUISITES AND PREPARATION

Install necessary tools
You can use Excel, SQL, Python or any tool you prefer. In my case, I used Python and installed its data analysis libraries: Pandas for data manipulation and Matplotlib for visualization.
Extract your data
I extracted my data from Kaggle using Pandas. The following is the link:
https://www.kaggle.com/datasets/kyanyoga/sample-sales-data

Perform your data cleaning and analysis
After extracting your data, study it, then start working on it.
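As a minimal sketch of the extract-and-study step, the snippet below loads a CSV with Pandas and prints a first overview. To stay self-contained it reads a tiny made-up sample from a string. The file name sales_data_sample.csv, the latin1 encoding and every column name other than PRODUCTLINE are my assumptions about the Kaggle download, so adjust them to match your copy.

```python
import io
import pandas as pd

# Tiny made-up sample that mimics a few columns of the dataset (illustrative only).
sample_csv = io.StringIO(
    "ORDERNUMBER,QUANTITYORDERED,PRICEEACH,SALES,ORDERDATE,STATUS,PRODUCTLINE\n"
    "10107,30,95.70,2871.00,2/24/2003 0:00,Shipped,Motorcycles\n"
    "10121,34,81.35,2765.90,5/7/2003 0:00,Shipped,Motorcycles\n"
    "10134,41,94.74,3884.34,7/1/2003 0:00,Cancelled,Classic Cars\n"
)

# With the real file, this might instead be (assumed file name and encoding):
# df = pd.read_csv("sales_data_sample.csv", encoding="latin1")
df = pd.read_csv(sample_csv)

print(df.head())  # first rows of the data
df.info()         # column names, non-null counts and dtypes
```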

KEY VARIABLES AND DATATYPES

  • Numeric (Integers and Floats):
    I. Sales
    II. Price Each
    III. Quantity Ordered
    IV. Order Number

  • Categorical:
    I. Country
    II. City
    III. Customer Name
    IV. Product Line
    V. Deal Size
    VI. Status

  • Date:
    I. Order Date
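One way to check these datatypes yourself is sketched below, on made-up rows. The column names (SALES, QUANTITYORDERED, COUNTRY, ORDERDATE) and the date format are assumptions about how the CSV is laid out. A date column read from a CSV usually arrives as plain text, so the sketch also converts it to a real datetime.

```python
import pandas as pd

# Made-up rows for illustration; ORDERDATE starts out as plain text.
df = pd.DataFrame({
    "SALES": [2871.00, 2765.90],
    "QUANTITYORDERED": [30, 34],
    "COUNTRY": ["USA", "France"],
    "ORDERDATE": ["2/24/2003 0:00", "5/7/2003 0:00"],
})

print(df.dtypes)  # ORDERDATE is reported as a text (object/string) column

# Convert the text column to a datetime so it can be grouped by month or year.
df["ORDERDATE"] = pd.to_datetime(df["ORDERDATE"], format="%m/%d/%Y %H:%M")
print(df["ORDERDATE"].dt.month.tolist())
```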

INITIAL INSIGHTS

  1. The dataset has 2823 rows and 25 columns. You can get this from the .shape attribute of your DataFrame. For example, if your DataFrame is named df: print(df.shape)
  2. There are 7 unique products in the dataset, that is, 7 unique names in the PRODUCTLINE column.
  3. The dataset has null values, found in 3 columns: State, Address Line 2 and Postal Code.
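The three checks above can be sketched as follows. The rows are made up so that the snippet runs on its own, and the STATE column name is an assumption; only PRODUCTLINE is named in this article.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "PRODUCTLINE": ["Classic Cars", "Trains", "Classic Cars", "Ships"],
    "STATE": ["NY", None, "CA", None],
    "SALES": [2871.00, 2765.90, 3884.34, 1500.00],
})

print(df.shape)                      # (rows, columns)
print(df["PRODUCTLINE"].nunique())   # number of distinct product lines

null_counts = df.isnull().sum()      # missing values per column
print(null_counts[null_counts > 0])  # keep only columns that have nulls
```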

SALES PERFORMANCE BY PRODUCT
Among the 7 products in the dataset, Classic Cars have the highest total sales at 3.27 million, while Trains have the lowest at 201 thousand. Classic Cars also have the highest number of orders at 28,547, while Trains have the fewest at 2,395.
Below is a pie chart showing each product's share of sales as a percentage:
[Image: pie chart of sales share by product]
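A minimal sketch of how per-product totals and percentage shares like these could be computed is shown below. It uses made-up rows, and it assumes that the order figures come from summing a quantity column, which I call QUANTITYORDERED here.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "PRODUCTLINE": ["Classic Cars", "Trains", "Classic Cars", "Ships"],
    "SALES": [3000.0, 500.0, 2000.0, 1500.0],
    "QUANTITYORDERED": [30, 10, 25, 20],
})

# Total sales and total quantity per product line, largest sales first.
by_product = (
    df.groupby("PRODUCTLINE")[["SALES", "QUANTITYORDERED"]]
    .sum()
    .sort_values("SALES", ascending=False)
)

# Share of overall sales, in percent.
by_product["SALES_PCT"] = (by_product["SALES"] / by_product["SALES"].sum() * 100).round(1)
print(by_product)
```

With Matplotlib installed, calling by_product["SALES"].plot.pie(autopct="%1.1f%%") on this result is one way to draw the pie chart.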
SALES PERFORMANCE BY TERRITORY
EMEA has the highest sales, while Japan has the lowest.
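The territory ranking can be sketched with the same group-and-sum idea. One thing to watch for: if your copy of the data labels a territory with the text NA, Pandas reads that as a missing value by default and groupby then leaves it out. The TERRITORY column name and the sample rows below are assumptions for illustration.

```python
import io
import pandas as pd

# Made-up rows; one territory is deliberately labelled with the text "NA".
csv_text = (
    "TERRITORY,SALES\n"
    "EMEA,3000.0\n"
    "NA,2500.0\n"
    "Japan,400.0\n"
    "EMEA,1000.0\n"
)

# Default parsing: the text "NA" becomes NaN and groupby silently drops it.
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default.groupby("TERRITORY")["SALES"].sum())

# Keep "NA" as a literal label by turning off the default missing-value markers.
df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
by_territory = df.groupby("TERRITORY")["SALES"].sum().sort_values(ascending=False)
print(by_territory)
```

With keep_default_na=False, empty cells in text columns are read as empty strings rather than nulls, so apply it with care if you also want to count missing values.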
SALES PERFORMANCE BY MONTH
November has the highest sales at 2.1 million, while June has the lowest at 454,756.78 (about 455 thousand).
Below is a figure that shows this data:
[Image: chart of sales by month]
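Monthly totals like these can be sketched by grouping on the month part of the order date. The rows are made up, and the ORDERDATE column name is an assumption. Note that this pools the same month across different years.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "ORDERDATE": pd.to_datetime(["2003-11-05", "2004-11-20", "2003-06-15", "2004-01-10"]),
    "SALES": [3000.0, 2000.0, 400.0, 900.0],
})

# Sum sales per calendar month (1 = January ... 12 = December).
by_month = df.groupby(df["ORDERDATE"].dt.month)["SALES"].sum()
print(by_month)

# Months with the highest and the lowest total sales.
print(by_month.idxmax(), by_month.idxmin())
```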
PERFORMANCE BY STATUS
2617 products were successfully shipped.
60 products that were initially ordered got cancelled.
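Counts per status can be sketched with value_counts. The rows are made up, and the STATUS column name is an assumption.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "STATUS": ["Shipped", "Shipped", "Cancelled", "Shipped", "On Hold"],
})

# Number of rows for each status, most frequent first.
status_counts = df["STATUS"].value_counts()
print(status_counts)
```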

CONCLUSION

Initial observation of the sales dataset reveals how sales are distributed across key dimensions such as year, month, territory and product. Further insights that could still be explored include:
a. Customers with the highest sales
b. The countries with the lowest and the highest sales
c. Customers with the highest number of cancelled orders
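As a starting point for items a and c, here is a hedged sketch on made-up rows. The customer names are invented, and the column names (CUSTOMERNAME, ORDERNUMBER, STATUS, SALES) are assumptions. It treats one order number as one order, even when it spans several rows.

```python
import pandas as pd

# Made-up rows for illustration only; customer names are invented.
df = pd.DataFrame({
    "CUSTOMERNAME": ["Acme Toys", "Acme Toys", "Mini Gifts", "Mini Gifts", "Mini Gifts", "Toy Depot"],
    "ORDERNUMBER": [1, 2, 3, 3, 5, 4],
    "STATUS": ["Shipped", "Cancelled", "Cancelled", "Cancelled", "Cancelled", "Shipped"],
    "SALES": [3000.0, 500.0, 700.0, 300.0, 200.0, 1500.0],
})

# a. Customers ranked by total sales.
top_customers = df.groupby("CUSTOMERNAME")["SALES"].sum().sort_values(ascending=False)
print(top_customers)

# c. Customers ranked by number of distinct cancelled orders.
cancelled = df[df["STATUS"] == "Cancelled"]
cancelled_orders = (
    cancelled.groupby("CUSTOMERNAME")["ORDERNUMBER"]
    .nunique()
    .sort_values(ascending=False)
)
print(cancelled_orders)
```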

To view a detailed analysis of this project, check my repository on GitHub below:
https://github.com/Doreen970/HNG_data

To learn about the HNG internship, see the following links:
https://hng.tech/internship
https://hng.tech/hire
