DEV Community

oladejo abdullahi


How to do Review Sentiment Analysis using Python

Introduction

Have you ever wondered how big companies understand your opinions and emotions about a product? Have you asked yourself how they read the reviews of millions of customers? Imagine you work for a company and are given a dataset of customer reviews. How would you work out what those reviews say and what sentiment they express? If you want to analyse the sentiment of product or app reviews, this article is for you. In it, I will take you through the task of T-shirt review sentiment analysis using Python.

What is Sentiment Analysis

Sentiment analysis means evaluating text for positive or negative views and feelings. It can be helpful and rewarding in settings such as product reviews, comment or review systems, or an industry that expects a certain attitude.

T-Shirt Review Sentiment Analysis

T-shirt review sentiment analysis means evaluating and understanding the sentiments expressed in customers' reviews of the T-shirts they ordered. It involves using data analysis techniques to determine whether the sentiments in these reviews are positive, negative, or neutral.
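
To make the idea of positive, negative, and neutral labels concrete, here is a minimal sketch of a word-counting classifier. The word lists and the function name `toy_sentiment` are my own assumptions for illustration; real tools such as TextBlob use much larger lexicons and more careful scoring.

```python
# Toy word lists chosen for illustration only (an assumption, not a real lexicon).
POSITIVE_WORDS = {"love", "great", "soft", "perfect", "happy"}
NEGATIVE_WORDS = {"bad", "poor", "faded", "late", "disappointed"}


def toy_sentiment(text):
    """Label a text by counting positive and negative words."""
    # Lowercase each word and strip common punctuation from its ends.
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


for review in ["Love this shirt, great fit!",
               "The print faded and shipping was late.",
               "It is a shirt."]:
    print(review, "->", toy_sentiment(review))
```

A review with more positive than negative words is labelled Positive, the reverse is Negative, and a tie (including no matches at all) is Neutral.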

Steps to follow

  1. Gather a dataset of reviews for your app or product (here, T-shirt reviews).
  2. Perform Exploratory Data Analysis (EDA).
  3. Label the sentiment data using NLP tools such as TextBlob, Stanza, VADER, Pattern, or Flair.
  4. Understand the overall distribution of sentiments (positive, negative, neutral) in the dataset.
  5. Explore the relationship between the sentiments and the ratings given.
  6. Analyze the text of the reviews in different sentiment categories.

T-Shirt Reviews Sentiment Analysis using Python

Now we are going to follow the steps one by one. Here I have a dataset of T-shirt reviews; you can download it here.
We will begin by importing the necessary Python libraries and the dataset:

Importing important libraries

import pandas as pd
import seaborn as sns
import matplotlib as mt
import matplotlib.pyplot as plt

Reading the dataset

dataSet=pd.read_csv("TeePublic_review.csv", encoding="latin-1")

Get the first five rows and a summary of the columns

print(dataSet.head())
dataSet.info()

Result

RangeIndex: 278100 entries, 0 to 278099
Data columns (total 10 columns):

 #   Column          Non-Null Count   Dtype
 0   reviewer_id     278099 non-null  float64
 1   store_location  278100 non-null  object
 2   latitude        278100 non-null  float64
 3   longitude       278100 non-null  float64
 4   date            278100 non-null  int64
 5   month           278100 non-null  int64
 6   year            278100 non-null  object
 7   title           278088 non-null  object
 8   review          247597 non-null  object
 9   review-label    278100 non-null  int64

dtypes: float64(3), int64(3), object(4)

Comment: As you can see above, the dataset comprises 10 variables. The title and review columns look similar, and review-label holds the rating. Now let us clean the dataset.

Check for empty and null cells

dataSet.isnull().sum()

Result

reviewer_id           1
store_location        0
latitude              0
longitude             0
date                  0
month                 0
year                  0
title                12
review            30503
review-label          0
dtype: int64

Remove rows with empty or null cells

dataSet=dataSet.dropna()

Comment: Of course, there are some empty cells under the reviewer_id, title, and review columns. That is why we remove those rows from the data using the dropna() method.
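
One thing worth knowing is that dropna() with no arguments drops a row if any column is null. If you only care about missing review text, the subset parameter limits the check to the columns you name. Here is a minimal sketch on a hypothetical three-row sample (the sample values are made up; only the column names come from the dataset):

```python
import pandas as pd

# Hypothetical three-row sample that mimics a few columns of the dataset.
sample = pd.DataFrame({
    "reviewer_id": [1.0, None, 3.0],
    "title": ["Great", "Okay", None],
    "review": ["Love it", "Fits fine", None],
    "review-label": [5, 3, 1],
})

# Drops every row that has a null in any column.
all_columns = sample.dropna()

# Drops a row only when its 'review' cell is null.
review_only = sample.dropna(subset=["review"])

print(len(all_columns), len(review_only))
```

Which variant is right depends on the analysis; this article drops every incomplete row.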

Get the descriptive statistics of the data

print(dataSet.describe())

Result

         reviewer_id       latitude      longitude           date  \
count  247587.000000  247587.000000  247587.000000  247587.000000   
mean   138902.686849      37.210091     -88.254362    2020.890281   
std     80076.904234      10.204186      36.903583       1.386106   
min         0.000000     -40.900557    -172.104629    2018.000000   
25%     69330.500000      37.090240     -95.712891    2020.000000   
50%    139217.000000      37.090240     -95.712891    2021.000000   
75%    207521.500000      37.090240     -95.712891    2022.000000   
max    278098.000000      64.963051     174.885971    2023.000000   

               month   review-label  
count  247587.000000  247587.000000  
mean        7.221966       4.379612  
std         3.682415       1.197636  
min         1.000000       1.000000  
25%         4.000000       4.000000  
50%         7.000000       5.000000  
75%        11.000000       5.000000  
max        12.000000       5.000000  

2. Performing Exploratory Data Analysis (EDA)

Plotting a graph showing the distribution of ratings

# Plotting the distribution of ratings
sns.set(style="whitegrid")
plt.figure(figsize=(9, 5))
sns.countplot(data=dataSet, x='review-label',color='#7a4499')
plt.title('Distribution of Ratings')
plt.xlabel('Rating')
plt.ylabel('Count')
plt.show()

[Figure: count plot of the ratings]
Comment: In general, the T-shirt ratings are impressive, as most of them are five. So, let us have a look at the distribution of ratings by year.
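
Before moving on, a claim like "most ratings are five" can be checked with numbers as well as with a chart. Here is a minimal sketch using value_counts(normalize=True); the ratings below are hypothetical stand-ins for dataSet['review-label'], so the printed shares are not the real ones.

```python
import pandas as pd

# Hypothetical ratings standing in for dataSet['review-label'].
ratings = pd.Series([5, 5, 5, 4, 1, 5, 3, 5], name="review-label")

# normalize=True returns fractions of the total instead of raw counts.
shares = ratings.value_counts(normalize=True).sort_index()

# Show each rating's share as a percentage.
print((shares * 100).round(1))
```

On the real data, the same call on dataSet['review-label'] gives the share of each star rating.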

# Plotting the distribution of ratings by year
# (the 'date' column holds the year, 2018 to 2023)
sns.set(style="whitegrid")
plt.figure(figsize=(9, 5))
sns.countplot(data=dataSet, x='date', hue='review-label')
plt.title('Distribution of Ratings by Year')
plt.xlabel('Year')
plt.ylabel('Count')
plt.legend(title='Rating')
plt.show()

[Figure: count plot of the ratings for each year]
Comment: Looking at the visual, we see that the rating distribution looks the same each year. Now let us check the length of the reviews. Perhaps we will see something interesting there. We will add a Review Length column to our dataSet and then visualize it.

Distribution of Review Lengths

# Calculating the length of each review
dataSet['Review Length'] = dataSet['review'].astype(str).str.len()
# Plotting the distribution of review lengths
plt.figure(figsize=(10, 8))
# Left panel: histogram with a density curve
plt.subplot(1, 2, 1)
sns.histplot(dataSet['Review Length'], kde=True)
plt.xlabel('Length of Review')
plt.ylabel('Count')

# Right panel: box plot, drawn vertically so the y-axis is the length
plt.subplot(1, 2, 2)
sns.boxplot(y=dataSet['Review Length'])
plt.ylabel('Review Length')
plt.subplots_adjust(wspace=0.7)
plt.suptitle('Distribution of Review Lengths')
plt.show()

[Figure: histogram and box plot of the review lengths]
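
If you want to see whether review length differs between ratings, a groupby can summarise it per rating. This is only a sketch on a hypothetical four-row sample; the column names match the dataset but the rows are made up.

```python
import pandas as pd

# Hypothetical sample standing in for dataSet[['review', 'review-label']].
sample = pd.DataFrame({
    "review": ["Great",
               "Too small and the print faded",
               "Nice shirt",
               "Arrived late, poor quality"],
    "review-label": [5, 1, 5, 2],
})

# Character count of each review.
sample["Review Length"] = sample["review"].str.len()

# Median review length for each rating.
median_by_rating = sample.groupby("review-label")["Review Length"].median()
print(median_by_rating)
```

I use the median here because a few very long reviews can pull the mean upwards.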

4. Understand the overall distribution of sentiments (positive, negative, neutral) in the dataset.

Creating a function that evaluates the sentiment

from textblob import TextBlob
def sentiment_Evaluation(review):
    # Analyzing the sentiment of the review
    sentiment = TextBlob(review).sentiment
    # Classifying based on polarity
    if sentiment.polarity > 0.1:
        return 'Positive'
    elif sentiment.polarity < -0.1:
        return 'Negative'
    else:
        return 'Neutral'
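
TextBlob's polarity is a number between -1.0 and 1.0, and the function above turns it into a label with a cut-off of 0.1 on each side. The sketch below isolates just that cut-off logic so you can see what happens at the boundaries. The helper name `label_from_polarity` and its `threshold` parameter are my own, added for illustration.

```python
def label_from_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1, 1] to a sentiment label."""
    if polarity > threshold:
        return "Positive"
    if polarity < -threshold:
        return "Negative"
    # Everything from -threshold to +threshold, inclusive, is Neutral.
    return "Neutral"


for score in (0.8, 0.1, 0.0, -0.1, -0.5):
    print(score, label_from_polarity(score))
```

Because the comparisons are strict, a score of exactly 0.1 or -0.1 counts as Neutral. A wider threshold gives more Neutral reviews and a narrower one gives fewer.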

Add a Sentiments column to the dataSet

dataSet['Sentiments']=dataSet['review'].apply(sentiment_Evaluation)

Creating a chart showing the distribution of the sentiments

plt.figure(figsize=(10, 5))
sns.countplot(data=dataSet, x='Sentiments')
plt.title('Sentiment Distribution')
plt.xlabel('Sentiments')
plt.ylabel('Count')
plt.show()

[Figure: count plot of the sentiments]

Distribution of Sentiment by Year

plt.figure(figsize=(10, 5))
sns.countplot(data=dataSet, x='date', hue='Sentiments')
plt.title('Sentiment Distribution by Year')
plt.xlabel('Year')
plt.ylabel('Count')
plt.show()

[Figure: count plot of the sentiments for each year]
Comment: As we can see above, most of the reviews have positive sentiment. Now let us look at the distribution across the ratings; maybe there is a relationship.

5. Exploring the relationship between the sentiments and the ratings

plt.figure(figsize=(10, 5))
sns.countplot(data=dataSet, x='review-label', hue='Sentiments')
plt.title('Sentiment Distribution Across Ratings')
plt.xlabel('Rating')
plt.ylabel('Count')
plt.legend(title='Sentiment')
plt.show()

[Figure: count plot of the sentiments for each rating]
Comment: There seems to be a relationship between the rating and the sentiment. The proportion of positive sentiment increases as the rating increases, while the proportion of negative sentiment decreases as the rating increases. Now let us analyse the review text using word clouds.
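
The relationship described above can also be put into numbers with a cross-tabulation. Here is a minimal sketch using pd.crosstab with normalize='index', which turns each rating's row into proportions. The rows below are hypothetical stand-ins for dataSet[['review-label', 'Sentiments']], so the values are not the real ones.

```python
import pandas as pd

# Hypothetical rows standing in for dataSet[['review-label', 'Sentiments']].
sample = pd.DataFrame({
    "review-label": [1, 1, 3, 3, 5, 5, 5, 5],
    "Sentiments": ["Negative", "Neutral", "Neutral", "Positive",
                   "Positive", "Positive", "Positive", "Neutral"],
})

# normalize='index' makes the proportions in each rating row sum to 1.
proportions = pd.crosstab(sample["review-label"], sample["Sentiments"],
                          normalize="index")
print(proportions.round(2))
```

Reading down the Positive column of such a table shows directly whether the positive share rises with the rating.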

6. Analyze the text of the reviews in different sentiment categories.

from wordcloud import WordCloud
# Function to generate word cloud for each sentiment
def generate_word_cloud(sentiment):
    # Keep only the reviews labelled with the requested sentiment
    SelectedOne = dataSet[dataSet['Sentiments'] == sentiment]
    # Join all selected reviews into one long string
    text = ' '.join(str(review) for review in SelectedOne['review'])
    wordcloud = WordCloud(width=800, height=400).generate(text)
    plt.figure(figsize=(10, 5))
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.title('Word Cloud for ' + sentiment + ' Reviews')
    plt.axis('off')
    plt.show()

Calling the function for each sentiment

generate_word_cloud('Negative')
generate_word_cloud('Positive')
generate_word_cloud('Neutral')


[Figure: word clouds for the Negative, Positive, and Neutral reviews]
Comment: As you can see, the word clouds above summarize the text of the reviews for us. I hope you find this work helpful; drop your comment below.
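
A word cloud shows which words stand out, but not how often they occur. If you also want the counts, a plain frequency table is one option. The sketch below uses collections.Counter from the standard library; the function name `top_words`, the tiny stopword set, and the sample reviews are my own assumptions for illustration.

```python
from collections import Counter
import re

# A very small stopword set, chosen only for this example.
STOPWORDS = frozenset({"the", "a", "and", "is", "it", "this", "was"})


def top_words(reviews, n=3, stopwords=STOPWORDS):
    """Return the n most common words across reviews, ignoring stopwords."""
    counts = Counter()
    for review in reviews:
        # Lowercase the text and keep runs of letters and apostrophes.
        words = re.findall(r"[a-z']+", review.lower())
        counts.update(w for w in words if w not in stopwords)
    return counts.most_common(n)


sample_reviews = ["Love the shirt", "The shirt is soft", "Soft and comfy shirt"]
print(top_words(sample_reviews))
```

On the real data you could pass the reviews of one sentiment at a time, for example dataSet[dataSet['Sentiments'] == 'Negative']['review'].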

If you have any questions, don't hesitate to ask. Chat me up on WhatsApp or by mail. Don't forget to follow me on Twitter so that you don't miss any of my articles.
