oladejo abdullahi

Posted on Dec 28, 2024

How to do Review Sentiment Analysis using Python

#datascience #beginners #analyst #database

Introduction

Have you ever wondered how big companies understand your opinions and emotions about a product? Have you asked yourself how these companies read the reviews of their millions of customers? Imagine you work for a company and are given a dataset of customer reviews. How do you work out what the reviews say and what sentiment they express? If you want to analyse the sentiment of any product or app reviews, this article is for you. In this article I will take you through the task of T-shirt review sentiment analysis using Python.

What is Sentiment Analysis

Sentiment analysis means evaluating text for positive or negative views and feelings. It can be helpful and rewarding in certain circumstances, such as product reviews, comment or review systems, or an industry that expects a certain attitude.

T-Shirt Review Sentiment Analysis

T-shirt review sentiment analysis means evaluating and understanding the sentiments expressed in customers' reviews of the T-shirts they ordered. It involves using data analysis techniques to determine whether the sentiments in these reviews are positive, negative, or neutral.

Steps to follow

Gather a dataset of app or product reviews (here we have T-shirt reviews).
Perform Exploratory Data Analysis (EDA).
Label the sentiment of each review using an NLP tool, e.g. TextBlob, Stanza, VADER, Pattern or Flair.
Understand the overall distribution of sentiments (positive, negative, neutral) in the dataset.
Explore the relationship between the sentiments and the ratings given.
Analyze the text of the reviews in different sentiment categories.

T-Shirt Reviews Sentiment Analysis using Python

Now we are going to follow the steps one by one. Here I have a dataset of T-shirt reviews; you can download it here.
We will begin by importing the necessary Python libraries and the dataset:

Importing important libraries

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

Reading the dataset

dataSet=pd.read_csv("TeePublic_review.csv", encoding="latin-1")

Get the first five rows (the default for head()) and a summary of the columns

print(dataSet.head())
dataSet.info()

Result

RangeIndex: 278100 entries, 0 to 278099
Data columns (total 10 columns):

#	Column	Non-Null Count	Dtype
0	reviewer_id	278099 non-null	float64
1	store_location	278100 non-null	object
2	latitude	278100 non-null	float64
3	longitude	278100 non-null	float64
4	date	278100 non-null	int64
5	month	278100 non-null	int64
6	year	278100 non-null	object
7	title	278088 non-null	object
8	review	247597 non-null	object
9	review-label	278100 non-null	int64

dtypes: float64(3), int64(3), object(4)

Comment: As you can see above, the dataset comprises 10 variables. The title and review columns look similar, and review-label holds the rating. Now let us clean the dataset.

Check for empty and null cells

dataSet.isnull().sum()

Result

reviewer_id           1
store_location        0
latitude              0
longitude             0
date                  0
month                 0
year                  0
title                12
review            30503
review-label          0
dtype: int64

Remove rows with empty or null cells

dataSet=dataSet.dropna()

Comment: There are some empty cells in the reviewer_id, title and review columns, so we remove those rows from the data using the dropna() method.

Get the descriptive statistics of the data
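Called with no arguments, dropna() drops a row when any column is missing, so it also discards rows whose only gap is reviewer_id or title. If only the review text matters for the sentiment step, one alternative is to pass subset. The sketch below uses a small made-up DataFrame (the name toy and its values are assumptions, not the TeePublic data) to show the difference:

```python
import pandas as pd

# Toy data standing in for the real dataset (values are made up for illustration)
toy = pd.DataFrame({
    "reviewer_id": [1.0, None, 3.0, 4.0],
    "title": ["Great", "Nice", None, "Bad fit"],
    "review": ["Love this shirt", "Soft fabric", "Arrived late", None],
})

# dropna() removes every row that has a missing value in any column
all_columns = toy.dropna()

# dropna(subset=[...]) only checks the listed columns
review_only = toy.dropna(subset=["review"])

print(len(all_columns))  # rows with no missing values at all
print(len(review_only))  # rows that have review text
```

Which variant is appropriate depends on whether the later steps need the title or reviewer_id columns.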

print(dataSet.describe())

Result

         reviewer_id       latitude      longitude           date  \
count  247587.000000  247587.000000  247587.000000  247587.000000   
mean   138902.686849      37.210091     -88.254362    2020.890281   
std     80076.904234      10.204186      36.903583       1.386106   
min         0.000000     -40.900557    -172.104629    2018.000000   
25%     69330.500000      37.090240     -95.712891    2020.000000   
50%    139217.000000      37.090240     -95.712891    2021.000000   
75%    207521.500000      37.090240     -95.712891    2022.000000   
max    278098.000000      64.963051     174.885971    2023.000000   

               month   review-label  
count  247587.000000  247587.000000  
mean        7.221966       4.379612  
std         3.682415       1.197636  
min         1.000000       1.000000  
25%         4.000000       4.000000  
50%         7.000000       5.000000  
75%        11.000000       5.000000  
max        12.000000       5.000000

2. Performing Exploratory Data Analysis (EDA)

Plotting a graph showing the distribution of ratings

# Plotting the distribution of ratings
sns.set(style="whitegrid")
plt.figure(figsize=(9, 5))
sns.countplot(data=dataSet, x='review-label',color='#7a4499')
plt.title('Distribution of Ratings')
plt.xlabel('Rating')
plt.ylabel('Count')
plt.show()

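Comment: In general, the T-shirt ratings are impressive: most ratings are five (the median is 5 and the mean is about 4.38 in the describe() output above). So, let us look at the distribution of ratings by year. In this dataset the date column holds the review year (its values run from 2018 to 2023), so it is used for the x-axis.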

# Plotting the distribution of ratings by year
# (the 'date' column holds the review year in this dataset)
sns.set(style="whitegrid")
plt.figure(figsize=(9, 5))
sns.countplot(data=dataSet, x='date', hue='review-label')
plt.title('Distribution of Ratings by Year')
plt.xlabel('Year')
plt.ylabel('Count')
plt.show()

Comment: Looking at the visual, we see that the rating distribution looks much the same each year. Now let us check the length of the reviews. Perhaps we might see something interesting there. We will do that by adding a Review Length column to our dataSet and then visualizing it.

Distribution of Review Lengths

# Calculating the length of each review
dataSet['Review Length'] = dataSet['review'].apply(str).apply(len)
# Plotting the distribution of review lengths
plt.figure(figsize=(10, 8))
plt.subplot(1,2,1)
sns.histplot(dataSet['Review Length'],kde=True)
plt.xlabel('Length of Review')
plt.ylabel('Count')

plt.subplot(1,2,2)
# Pass the column as y so the box is vertical and matches the y-axis label
sns.boxplot(y=dataSet['Review Length'])
plt.subplots_adjust(wspace=0.7)
plt.ylabel('Review Length')
plt.suptitle('Distribution of Review Lengths') 
plt.show()
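One way to look for something interesting in the lengths is to summarize them per rating instead of reading it off the plots. The sketch below is only an illustration on made-up reviews (the DataFrame toy and its values are assumptions); it makes no claim about what the real dataset shows:

```python
import pandas as pd

# Toy reviews (made up for illustration), using the article's column names
toy = pd.DataFrame({
    "review-label": [5, 5, 1, 1],
    "review": [
        "Love it",
        "Great shirt",
        "The print faded after one wash",
        "Too small and the fabric feels cheap",
    ],
})

# Character count of each review
toy["Review Length"] = toy["review"].str.len()

# Median and maximum review length for each rating
summary = toy.groupby("review-label")["Review Length"].agg(["median", "max"])
print(summary)
```

The same groupby call could be run on dataSet once the Review Length column has been added.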

Understand the overall distribution of sentiments (positive, negative, neutral) in the dataset.

Creating a function that evaluates the sentiment

from textblob import TextBlob
def sentiment_Evaluation(review):
    # Analyzing the sentiment of the review
    sentiment = TextBlob(review).sentiment
    # Classifying based on polarity
    if sentiment.polarity > 0.1:
        return 'Positive'
    elif sentiment.polarity < -0.1:
        return 'Negative'
    else:
        return 'Neutral'
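TextBlob reports polarity as a number between -1 and 1, and the function above turns it into a label with cut-offs at 0.1 and -0.1. The thresholding step can be tried on its own, without TextBlob, to see how boundary values are treated. This is a minimal sketch; the name classify_polarity and the threshold parameter are hypothetical and not part of TextBlob:

```python
def classify_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1, 1] to a sentiment label."""
    if polarity > threshold:
        return "Positive"
    if polarity < -threshold:
        return "Negative"
    # Scores within [-threshold, threshold] count as neutral
    return "Neutral"

for score in (0.8, 0.1, 0.0, -0.1, -0.5):
    print(score, classify_polarity(score))
```

Scores of exactly 0.1 or -0.1 fall in the neutral band because the comparisons are strict. Widening or narrowing the threshold changes how many reviews end up labelled Neutral.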

Add a Sentiments column to the dataSet

dataSet['Sentiments']=dataSet['review'].apply(sentiment_Evaluation)

Creating a chart showing the distribution of the sentiments

plt.figure(figsize=(10, 5))
sns.countplot(data=dataSet, x='Sentiments')
plt.title('Sentiment Distribution')
plt.xlabel('Sentiments')
plt.ylabel('Count')
plt.show()

Distribution of Sentiment by Year

# The 'date' column holds the review year in this dataset
plt.figure(figsize=(10, 5))
sns.countplot(data=dataSet, x='date', hue='Sentiments')
plt.title('Sentiment Distribution by Year')
plt.xlabel('Year')
plt.ylabel('Count')
plt.show()

Comment: As we can see above, most of the reviews have positive sentiment. Now let us see the distribution across the ratings; maybe there is a relationship.
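To put a number on "most", the share of each sentiment label can be computed with value_counts(normalize=True). A minimal sketch on made-up labels (the Series below is an assumption for illustration, not the article's data):

```python
import pandas as pd

# Toy sentiment labels (made up for illustration)
labels = pd.Series(
    ["Positive", "Positive", "Neutral", "Positive", "Negative"],
    name="Sentiments",
)

# normalize=True returns fractions of the total instead of raw counts
shares = labels.value_counts(normalize=True)
print(shares)
```

On the real data the same call would be dataSet['Sentiments'].value_counts(normalize=True).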

Exploring the relationship between the sentiments and the ratings

plt.figure(figsize=(10, 5))
sns.countplot(data=dataSet, x='review-label', hue='Sentiments')
plt.title('Sentiment Distribution Across Ratings')
plt.xlabel('Rating')
plt.ylabel('Count')
plt.legend(title='Sentiment')
plt.show()

Comment: There seems to be a relationship between the rating and the sentiment. The proportion of positive sentiment increases as the rating increases, while the proportion of negative sentiment decreases as the rating increases. Now let us analyse the review text using word clouds.
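The count plot shows raw counts, so the proportions are easier to read from a table that is normalized within each rating. One way to build it is pd.crosstab with normalize="index". The sketch below uses made-up ratings and labels (the DataFrame toy is an assumption for illustration):

```python
import pandas as pd

# Toy ratings and sentiment labels (made up for illustration)
toy = pd.DataFrame({
    "review-label": [1, 1, 3, 3, 5, 5, 5, 5],
    "Sentiments": ["Negative", "Neutral", "Neutral", "Positive",
                   "Positive", "Positive", "Positive", "Neutral"],
})

# normalize="index" divides each row by its total, so every rating sums to 1
shares = pd.crosstab(toy["review-label"], toy["Sentiments"], normalize="index")
print(shares.round(2))
```

Each row then gives the share of Negative, Neutral and Positive reviews for one rating, which makes the trend across ratings directly comparable even though five-star reviews dominate the counts.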

Analyze the text of the reviews in different sentiment categories.

from wordcloud import WordCloud

# Function to generate a word cloud for one sentiment category
def generate_word_cloud(sentiment):
    # Keep only the reviews labelled with the requested sentiment
    SelectedOne = dataSet[dataSet['Sentiments'] == sentiment]
    # Join all selected reviews into one long string
    text = ' '.join(str(review) for review in SelectedOne['review'])
    wordcloud = WordCloud(width=800, height=400).generate(text)
    plt.figure(figsize=(10, 5))
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.title('Word Cloud for ' + sentiment + ' Reviews')
    plt.axis('off')
    plt.show()

Calling the function for each sentiment

generate_word_cloud('Negative')
generate_word_cloud('Positive')
generate_word_cloud('Neutral')
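A word cloud is a picture of word frequencies, so a plain frequency table can serve as a text complement to it. The sketch below counts words with collections.Counter on a few made-up reviews. The function top_words and the small STOP_WORDS set are hypothetical names for illustration; WordCloud uses its own, larger default stop-word list:

```python
import re
from collections import Counter

# Hypothetical mini stop-word list, for illustration only
STOP_WORDS = {"the", "a", "and", "is", "it", "this", "i", "was", "to", "of"}

def top_words(reviews, n=3):
    """Return the n most common non-stop words across the given reviews."""
    counts = Counter()
    for review in reviews:
        # Lowercase the text and keep runs of letters and apostrophes
        words = re.findall(r"[a-z']+", str(review).lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)

sample = [
    "The shirt is great and the print is great",
    "Great shirt, fast shipping",
    "Shipping was slow",
]
print(top_words(sample))
```

On the real data, the reviews argument could be the review column filtered to one sentiment, the same selection that generate_word_cloud makes.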

Comment: As you can see, the word clouds above summarize the text of the reviews in each sentiment category. I hope you find this work helpful; drop your comments below.

If you have any questions, don't hesitate to ask. Chat me up on WhatsApp or by mail. Don't forget to follow me on Twitter so that you don't miss any of my articles.
