Introduction
Have you ever wondered how big companies understand your opinions and emotions about a certain thing? Have you ever asked yourself how these companies read the reviews of their millions of customers? Imagine you are working for a company and you are given a dataset comprising customer reviews. How do you conclude what those reviews and sentiments are about? If your concern is how to analyse the sentiment of any product or app review, then this article is for you. In this article, I will take you through the task of T-Shirt Reviews Sentiment Analysis using Python.
What is Sentiment Analysis?
Sentiment analysis is the task of evaluating text for the positive or negative views and feelings it expresses. It is helpful and rewarding in many circumstances, such as product reviews, comment or rating systems, or any industry where customer attitude matters.
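As a toy illustration of the idea (a hypothetical mini-lexicon, not the library we will actually use later), a review can be scored by counting positive and negative words:

```python
# Toy lexicon-based sentiment scorer (illustrative only; real tools
# such as TextBlob or VADER use much richer lexicons and rules)
POSITIVE = {"great", "love", "perfect", "comfortable", "soft"}
NEGATIVE = {"bad", "hate", "faded", "shrunk", "terrible"}

def toy_sentiment(review: str) -> str:
    words = review.lower().split()
    # Positive words add to the score, negative words subtract
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"

print(toy_sentiment("I love this soft shirt"))          # Positive
print(toy_sentiment("The print faded after one wash"))  # Negative
```

Real libraries handle negation, intensifiers, and far larger vocabularies, but the core input/output is the same: text in, sentiment label out.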
T-Shirt Review Sentiment Analysis
T-Shirt Review Sentiment Analysis means evaluating and understanding the sentiments expressed in customer reviews of T-shirts they ordered. It involves using data analysis techniques to determine whether the sentiments in these reviews are positive, negative, or neutral.
Steps to follow
- Gather a dataset of app or product reviews (here we have T-shirt reviews).
- Perform Exploratory Data Analysis (EDA)
- Label the sentiment data using an NLP tool, e.g. TextBlob, Stanza, VADER, Pattern, or Flair.
- Understand the overall distribution of sentiments (positive, negative, neutral) in the dataset.
- Explore the relationship between the sentiments and the ratings given.
- Analyze the text of the reviews in different sentiment categories.
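The steps above can be sketched end to end on a tiny inline sample (the `label` function here is a hypothetical stand-in for the TextBlob-based one we build later):

```python
import pandas as pd

# Tiny inline sample standing in for the real review dataset
sample = pd.DataFrame({
    "review": ["Great quality shirt", "Arrived late and damaged", "It is a shirt"],
    "rating": [5, 1, 3],
})

def label(review: str) -> str:
    # Hypothetical stand-in for a real sentiment tool
    if "great" in review.lower():
        return "Positive"
    if "damaged" in review.lower():
        return "Negative"
    return "Neutral"

sample["Sentiments"] = sample["review"].apply(label)
# Overall distribution of sentiments (step 4)
print(sample["Sentiments"].value_counts())
# Relationship between sentiment and rating (step 5)
print(sample.groupby("Sentiments")["rating"].mean())
```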
T-Shirt Reviews Sentiment Analysis using Python
Now, we are going to follow the steps one by one. Here I have a dataset of T-shirt reviews; you can download it here.
We will begin by importing the necessary Python libraries and reading the dataset:
Importing important libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
Reading the dataset
dataSet=pd.read_csv("TeePublic_review.csv", encoding="latin-1")
Get the first five rows
print(dataSet.head())
dataSet.info()
Result>>
RangeIndex: 278100 entries, 0 to 278099
Data columns (total 10 columns):
# | Column | Non-Null Count | Dtype |
---|---|---|---|
0 | reviewer_id | 278099 non-null | float64 |
1 | store_location | 278100 non-null | object |
2 | latitude | 278100 non-null | float64 |
3 | longitude | 278100 non-null | float64 |
4 | date | 278100 non-null | int64 |
5 | month | 278100 non-null | int64 |
6 | year | 278100 non-null | object |
7 | title | 278088 non-null | object |
8 | review | 247597 non-null | object |
9 | review-label | 278100 non-null | int64 |
dtypes: float64(3), int64(3), object(4)
Comment: As you can see above, the dataset comprises 10 variables, where title and review look similar and review-label specifies the rating. Now let us clean the dataset.
Check for empty and null cells
dataSet.isnull().sum()
Result
reviewer_id 1
store_location 0
latitude 0
longitude 0
date 0
month 0
year 0
title 12
review 30503
review-label 0
dtype: int64
Remove rows with empty or null cells
dataSet=dataSet.dropna()
Comment: There are some empty cells under the title and review columns, which is why we remove those rows from the data using the dropna() method.
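Note that dropna() with no arguments drops a row if any column is missing, including the single row whose reviewer_id is null. If you only want to drop rows whose text fields are missing, you can pass a subset, sketched here on a tiny hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({
    "reviewer_id": [1.0, None, 3.0],
    "title": ["Nice", "Okay", None],
    "review": ["Fits well", None, "Runs small"],
})

# Drop only rows missing title or review; a missing reviewer_id is kept
cleaned = df.dropna(subset=["title", "review"])
print(len(cleaned))  # 1
```

On our data the difference is tiny (one row), but on datasets with many sparse metadata columns it matters.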
Get the descriptive statistics of the data
print(dataSet.describe())
Result
reviewer_id latitude longitude date \
count 247587.000000 247587.000000 247587.000000 247587.000000
mean 138902.686849 37.210091 -88.254362 2020.890281
std 80076.904234 10.204186 36.903583 1.386106
min 0.000000 -40.900557 -172.104629 2018.000000
25% 69330.500000 37.090240 -95.712891 2020.000000
50% 139217.000000 37.090240 -95.712891 2021.000000
75% 207521.500000 37.090240 -95.712891 2022.000000
max 278098.000000 64.963051 174.885971 2023.000000
month review-label
count 247587.000000 247587.000000
mean 7.221966 4.379612
std 3.682415 1.197636
min 1.000000 1.000000
25% 4.000000 4.000000
50% 7.000000 5.000000
75% 11.000000 5.000000
max 12.000000 5.000000
2. Performing Exploratory Data Analysis (EDA)
Plotting a graph showing the distribution of ratings
# Plotting the distribution of ratings
sns.set(style="whitegrid")
plt.figure(figsize=(9, 5))
sns.countplot(data=dataSet, x='review-label',color='#7a4499')
plt.title('Distribution of Ratings')
plt.xlabel('Rating')
plt.ylabel('Count')
plt.show()
Comment
In general, it can be stated that the T-shirt ratings are impressive, as almost all the ratings are five. So, let us have a look at the distribution of ratings by year.
# Plotting the distribution of ratings by year
sns.set(style="whitegrid")
plt.figure(figsize=(9, 5))
sns.countplot(data=dataSet, x='date', hue='review-label')
plt.title('Distribution of Ratings by Year')
plt.xlabel('Year')
plt.ylabel('Count')
plt.show()
Looking at the visual, we see that the rating distribution looks similar each year. Now let us check the length of the reviews; perhaps we will see something interesting. We will do that by adding a review-length column to our dataSet and then visualizing it.
Distribution of review lengths
# Calculating the length of each review
dataSet['Review Length'] = dataSet['review'].apply(str).apply(len)
# Plotting the distribution of review lengths
plt.figure(figsize=(10, 8))
plt.subplot(1,2,1)
sns.histplot(dataSet['Review Length'],kde=True)
plt.xlabel('Length of Review')
plt.ylabel('Count')
plt.subplot(1,2,2)
sns.boxplot(dataSet['Review Length'])
plt.subplots_adjust(wspace=0.7)
plt.ylabel('Review Length')
plt.suptitle('Distribution of Review Lengths')
plt.show()
- Understand the overall distribution of sentiments (positive, negative, neutral) in the dataset.
Creating a function that evaluates the sentiment
from textblob import TextBlob
def sentiment_Evaluation(review):
    # Analyzing the sentiment of the review
    sentiment = TextBlob(review).sentiment
    # Classifying based on polarity
    if sentiment.polarity > 0.1:
        return 'Positive'
    elif sentiment.polarity < -0.1:
        return 'Negative'
    else:
        return 'Neutral'
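The ±0.1 cutoffs leave a small neutral band around zero rather than splitting exactly at it. The classification logic on its own (shown here with hypothetical polarity values, so TextBlob is not needed to run it) behaves like this:

```python
def classify(polarity: float) -> str:
    # Same thresholds as sentiment_Evaluation above
    if polarity > 0.1:
        return 'Positive'
    elif polarity < -0.1:
        return 'Negative'
    else:
        return 'Neutral'

for p in (0.8, 0.05, -0.05, -0.6):
    print(p, classify(p))
# 0.8 and -0.6 are confidently labeled; 0.05 and -0.05 fall
# in the neutral band and are labeled Neutral
```

You can widen or narrow the band depending on how strict you want the Positive/Negative labels to be.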
Add a Sentiments column to the dataSet
dataSet['Sentiments']=dataSet['review'].apply(sentiment_Evaluation)
Creating a chart showing the distribution of sentiments
plt.figure(figsize=(10, 5))
sns.countplot(data=dataSet, x='Sentiments')
plt.title('Sentiment Distribution')
plt.xlabel('Sentiments')
plt.ylabel('Count')
plt.show()
Distribution of sentiments by year
plt.figure(figsize=(10, 5))
sns.countplot(data=dataSet, x='date', hue='Sentiments')
plt.title('Sentiment Distribution by Year')
plt.xlabel('Year')
plt.ylabel('Count')
plt.show()
Comment: As we can see above, most of the reviews have positive sentiment. Now let us see the distribution across the ratings; maybe there is a relationship.
- Exploring the relationship between the sentiments and the ratings
plt.figure(figsize=(10, 5))
sns.countplot(data=dataSet, x='review-label', hue='Sentiments')
plt.title('Sentiment Distribution Across Ratings')
plt.xlabel('Rating')
plt.ylabel('Count')
plt.legend(title='Sentiment')
plt.show()
Comment: There seems to be a relationship between the rating and the sentiment. The proportion of positive sentiment increases as the rating increases, while the proportion of negative sentiment decreases as the rating increases. Now let us analyse the review text using word clouds.
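To quantify this beyond the count plot, the sentiment share within each rating can be computed with pd.crosstab and normalize='index'. This is sketched here on a tiny hypothetical frame; on the real data you would pass dataSet['review-label'] and dataSet['Sentiments']:

```python
import pandas as pd

toy = pd.DataFrame({
    "review-label": [5, 5, 5, 1, 1, 3],
    "Sentiments": ["Positive", "Positive", "Neutral",
                   "Negative", "Negative", "Neutral"],
})

# Row-normalized: each rating's row sums to 1, giving the
# sentiment share per rating level
share = pd.crosstab(toy["review-label"], toy["Sentiments"], normalize="index")
print(share)
```

A table like this makes the "positive share rises with rating" claim checkable at a glance.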
- Analyze the text of the reviews in different sentiment categories.
from wordcloud import WordCloud
# Function to generate word cloud for each sentiment
def generate_word_cloud(sentiment):
    SelectedOne = dataSet[dataSet['Sentiments'] == sentiment]
    text = ' '.join(str(review) for review in SelectedOne['review'])
    wordcloud = WordCloud(width=800, height=400).generate(text)
    plt.figure(figsize=(10, 5))
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.title('Word Cloud for ' + sentiment + ' Reviews')
    plt.axis('off')
    plt.show()
Calling the function for each sentiment
generate_word_cloud('Negative')
generate_word_cloud('Positive')
generate_word_cloud('Neutral')
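If you also want the numbers behind the word clouds, a quick count with collections.Counter works. This is sketched on a small hypothetical list; on the real data you would join SelectedOne['review'] as in the function above:

```python
from collections import Counter

reviews = ["great shirt great print", "shirt runs small", "great fit"]
# Split all reviews into words and count occurrences
words = " ".join(reviews).split()
top = Counter(words).most_common(3)
print(top)  # e.g. [('great', 3), ('shirt', 2), ...]
```

In practice you would also lowercase the text and drop stopwords (WordCloud does some of this for you via its STOPWORDS set) before counting.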
Comment: As you can see, the word clouds above already summarize the review text for us. I hope you find this work helpful; drop your comment below.
If you have any questions, don't hesitate to ask. Chat me up on WhatsApp or mail me. Don't forget to follow me on Twitter so that you don't miss any of my articles.