In the ever-evolving world of Natural Language Processing (NLP), sentiment analysis remains a crucial task. Today, we'll fine-tune BERT (Bidirectional Encoder Representations from Transformers) for sentiment analysis on the IMDB movie reviews dataset. This blog walks you through building a model that classifies movie reviews as positive or negative.
The Dataset
We'll be using the IMDB dataset, which contains 50,000 labeled movie reviews split into 25,000 training and 25,000 test reviews, each split containing equal numbers of positive and negative reviews. This dataset is widely used in the NLP community and is a good starting point for sentiment analysis tasks.
Setting Up the Environment
Before we begin, make sure you have the necessary libraries installed:
```bash
pip install pandas matplotlib seaborn datasets scikit-learn transformers torch
```
Loading and Preprocessing the Data
First, let's load the IMDB dataset using the Hugging Face datasets library:
```python
from datasets import load_dataset
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load IMDB dataset
dataset = load_dataset('imdb')

# Convert to pandas DataFrames
train_dataframe = pd.DataFrame(dataset['train'])
test_dataframe = pd.DataFrame(dataset['test'])

# Display basic info (info() prints directly and returns None)
train_dataframe.info()
print(train_dataframe['label'].value_counts(normalize=True))

# Plot the label distribution (0 = negative, 1 = positive)
plt.figure(figsize=(8, 6))
sns.countplot(x='label', data=train_dataframe)
plt.title('Distribution of Sentiment Labels')
plt.xlabel('Sentiment')
plt.ylabel('Count')
plt.show()
```
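Because the DataFrame has one row per review, value_counts(normalize=True) returns the fraction of reviews per label rather than raw counts; on the IMDB training split both fractions come out to 0.5. As a minimal stdlib sketch of that same computation, assuming a small hypothetical label list in place of the real column:

```python
from collections import Counter

def label_proportions(labels):
    """Return {label: fraction of examples}, the same fractions as value_counts(normalize=True)."""
    counts = Counter(labels)
    total = len(labels)
    # Sort by label so 0 (negative) comes before 1 (positive)
    return {label: counts[label] / total for label in sorted(counts)}

# Hypothetical toy labels standing in for train_dataframe['label'] (0 = negative, 1 = positive)
toy_labels = [0, 1, 1, 0, 1, 0]
print(label_proportions(toy_labels))  # {0: 0.5, 1: 0.5}
```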
Preprocessing with BERT Tokenizer
Next, we'll preprocess the text data using BERT's tokenizer:
```python
from transformers import BertTokenizer
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

def preprocess_data(texts, labels, max_length=256):
    # Tokenize, add [CLS]/[SEP], truncate long reviews and pad short ones to max_length
    encoded = tokenizer(
        texts,
        add_special_tokens=True,
        max_length=max_length,
        padding='max_length',
        truncation=True,
        return_attention_mask=True,
        return_tensors='pt'
    )
    return {
        'input_ids': encoded['input_ids'],
        'attention_mask': encoded['attention_mask'],
        'labels': torch.tensor(labels)
    }

# Preprocess training and testing data
train_data = preprocess_data(train_dataframe['text'].tolist(), train_dataframe['label'].tolist())
test_data = preprocess_data(test_dataframe['text'].tolist(), test_dataframe['label'].tolist())
```
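To make padding='max_length', truncation=True, and the attention mask concrete, here is a pure-Python sketch of what happens to one review after it has been split into word-piece IDs. It assumes the bert-base-uncased special-token IDs ([CLS] = 101, [SEP] = 102, [PAD] = 0) and uses a hypothetical helper, pad_and_mask; it does not reproduce the WordPiece tokenization step itself.

```python
# Special-token IDs assumed from the bert-base-uncased vocabulary
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0

def pad_and_mask(token_ids, max_length=256):
    """Hypothetical helper mimicking add_special_tokens + truncation + padding='max_length'."""
    # Keep the first max_length - 2 tokens, reserving two slots for [CLS] and [SEP]
    body = token_ids[: max_length - 2]
    input_ids = [CLS_ID] + body + [SEP_ID]
    # 1 marks a real token; 0 marks padding the model should ignore
    attention_mask = [1] * len(input_ids)
    pad = max_length - len(input_ids)
    return input_ids + [PAD_ID] * pad, attention_mask + [0] * pad

# Two example word-piece IDs, padded to a tiny max_length for readability
ids, mask = pad_and_mask([7592, 2088], max_length=6)
print(ids)   # [101, 7592, 2088, 102, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0]
```

With max_length=256, any review longer than 254 word pieces loses its tail, which is a trade-off between training speed and how much of each review the model sees.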
Setting Up the Model
We'll use the BertForSequenceClassification model from Hugging Face:
```python
from torch.optim import AdamW
from torch.utils.data import DataLoader, TensorDataset
from transformers import BertForSequenceClassification

# Initialize model with a 2-way classification head (negative/positive)
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

# Optimizer: PyTorch's AdamW (transformers.AdamW is deprecated);
# eps and weight_decay match the old transformers.AdamW defaults
optimizer = AdamW(model.parameters(), lr=2e-5, eps=1e-6, weight_decay=0.0)

# Create DataLoader
train_dataset = TensorDataset(train_data['input_ids'], train_data['attention_mask'], train_data['labels'])
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
```
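TensorDataset pairs the i-th input IDs, attention mask, and label, and DataLoader(..., batch_size=32, shuffle=True) hands them to the training loop in shuffled mini-batches. The hypothetical iterate_batches generator below sketches that batching over example indices using only the standard library; it is an illustration, not PyTorch's implementation.

```python
import math
import random

def iterate_batches(n_examples, batch_size=32, shuffle=True, seed=0):
    """Yield lists of example indices, keeping the final partial batch (like drop_last=False)."""
    indices = list(range(n_examples))
    if shuffle:
        # A real DataLoader reshuffles every epoch; a fixed seed keeps this sketch repeatable
        random.Random(seed).shuffle(indices)
    for start in range(0, n_examples, batch_size):
        yield indices[start:start + batch_size]

batches = list(iterate_batches(25_000, batch_size=32))
print(len(batches), len(batches[-1]))  # 782 8
print(math.ceil(25_000 / 32) * 3)      # 2346
```

With the 25,000 training reviews this gives 782 batches per epoch (the last one holds only 8 reviews, since DataLoader keeps partial batches by default), so three epochs amount to 2,346 optimizer steps.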
Training the Model
Now, let's train our model:
```python
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

num_epochs = 3
for epoch in range(num_epochs):
    model.train()
    total_loss = 0.0
    for batch in train_loader:
        # Move input IDs, attention masks, and labels to the GPU (or CPU)
        input_ids, attention_mask, labels = [b.to(device) for b in batch]
        optimizer.zero_grad()
        # Passing labels makes the model return the classification loss
        outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
        loss = outputs.loss
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    print(f"Epoch {epoch + 1}/{num_epochs} complete, average training loss: {total_loss / len(train_loader):.4f}")

# Save the fine-tuned weights
torch.save(model.state_dict(), 'bert_sentiment_v1_model.pth')
```
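Because labels is passed to the model and num_labels=2, BertForSequenceClassification returns outputs.loss as the cross-entropy between the two logits and the true label, averaged over the batch; loss.backward() then computes gradients of that value. A stdlib sketch of the computation, with hypothetical function names and made-up logits:

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one example: -log p(label | logits)."""
    # Subtract the max logit before exponentiating for numerical stability
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def batch_loss(batch_logits, labels):
    """Mean loss over a batch, matching CrossEntropyLoss's default 'mean' reduction."""
    losses = [cross_entropy(z, y) for z, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)

print(round(cross_entropy([0.0, 0.0], 1), 4))   # 0.6931 (undecided model)
print(round(cross_entropy([-2.0, 3.0], 1), 4))  # 0.0067 (confident and correct)
```

An untrained head starts near log 2 ≈ 0.693 per review, so the average training loss printed each epoch should fall well below that as fine-tuning progresses.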
Training Environment Details
I trained the model on Google Colab with the runtime configured to use a GPU. Even with GPU acceleration, training took approximately 49 minutes.
Evaluating the Model
After training, let's evaluate our model's performance:
```python
from sklearn.metrics import accuracy_score, classification_report

model.eval()
test_dataset = TensorDataset(test_data['input_ids'], test_data['attention_mask'], test_data['labels'])
test_loader = DataLoader(test_dataset, batch_size=32)

all_preds = []
all_labels = []
with torch.no_grad():
    for batch in test_loader:
        input_ids, attention_mask, labels = [b.to(device) for b in batch]
        outputs = model(input_ids, attention_mask=attention_mask)
        # The predicted class is the index of the larger logit
        preds = torch.argmax(outputs.logits, dim=1)
        all_preds.extend(preds.cpu().numpy())
        all_labels.extend(labels.cpu().numpy())

accuracy = accuracy_score(all_labels, all_preds)
print(f"Accuracy: {accuracy}")
print(classification_report(all_labels, all_preds))
```
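Each row of outputs.logits holds two scores (negative, positive), and torch.argmax(..., dim=1) picks the larger one per review; accuracy_score then reports the fraction of predictions that match the labels. A pure-Python sketch of those two steps on made-up logits (argmax and accuracy here are hypothetical stand-ins, not the torch or scikit-learn functions):

```python
def argmax(values):
    """Index of the largest value in one row of logits."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(labels, preds):
    """Fraction of predictions that match the labels."""
    return sum(y == p for y, p in zip(labels, preds)) / len(labels)

# Made-up logits for four reviews: column 0 = negative, column 1 = positive
toy_logits = [[2.1, -1.3], [-0.4, 0.9], [0.3, 0.1], [-1.5, 2.2]]
toy_labels = [0, 1, 1, 1]
preds = [argmax(row) for row in toy_logits]
print(preds)                        # [0, 1, 0, 1]
print(accuracy(toy_labels, preds))  # 0.75
```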
Interpreting the Model's Performance
Let's break down the results of our sentiment analysis model:
Overall Accuracy
The model achieved an overall accuracy of 92.128% on the test set, meaning it correctly classified 23,032 of the 25,000 test reviews as either positive or negative.
Class-specific Metrics
| Class | Precision | Recall | F1-score |
|---|---|---|---|
| Negative reviews (class 0) | 0.93 | 0.91 | 0.92 |
| Positive reviews (class 1) | 0.91 | 0.93 | 0.92 |
Interpretation
Balanced Performance: The model performs consistently on positive and negative reviews, with identical F1-scores of 0.92 for both classes, so it doesn't strongly favor either sentiment. The mirrored precision and recall (0.93/0.91 for negative, 0.91/0.93 for positive) do suggest it predicts "positive" slightly more often than "negative".
Precision:
For negative reviews, the precision of 0.93 means that when the model predicts a review is negative, it's correct 93% of the time.
For positive reviews, the precision of 0.91 indicates that when the model predicts a review is positive, it's correct 91% of the time.
Recall:
For negative reviews, the recall of 0.91 means the model correctly identifies 91% of all actual negative reviews.
For positive reviews, the recall of 0.93 shows the model correctly identifies 93% of all actual positive reviews.
F1-Score:
F1 is the harmonic mean of precision and recall; for negative reviews, 2 × 0.93 × 0.91 / (0.93 + 0.91) ≈ 0.92. The F1-score of 0.92 for both classes reflects a strong balance between precision and recall.
Support:
The test set contains an equal number of positive and negative reviews (12,500 each), ensuring a balanced evaluation.
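The definitions above can be checked by hand: precision divides true positives by everything predicted as that class, recall divides them by everything that actually belongs to the class (its support), and F1 = 2 × precision × recall / (precision + recall). A hypothetical per_class_report helper that mirrors these numbers from classification_report, run on a small toy example:

```python
def per_class_report(labels, preds, classes=(0, 1)):
    """Precision, recall, F1 and support per class; returns 0.0 when a denominator is zero."""
    report = {}
    for c in classes:
        tp = sum(1 for y, p in zip(labels, preds) if y == c and p == c)
        predicted = sum(1 for p in preds if p == c)  # TP + FP
        actual = sum(1 for y in labels if y == c)    # TP + FN, i.e. the support
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        # F1 is the harmonic mean of precision and recall
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = {"precision": precision, "recall": recall, "f1": f1, "support": actual}
    return report

# Toy data: 4 negative and 4 positive reviews, with two negatives misread as positive
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_preds = [0, 0, 1, 1, 1, 1, 1, 0]
for c, metrics in per_class_report(toy_labels, toy_preds).items():
    print(c, {k: round(v, 2) for k, v in metrics.items()})
```

In the toy data the model predicts class 1 more often than it occurs, which shows up as higher recall and lower precision for class 1, the same direction as the precision/recall values reported for the IMDB test set.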
In conclusion, this model demonstrates strong, balanced sentiment classification on the IMDB test set. Its high accuracy and consistent metrics across both positive and negative reviews make it a reliable tool for classifying movie reviews similar to those in IMDB.
Making Predictions
Finally, let's create a function to predict sentiment for new reviews:
```python
def predict_sentiment(text):
    # Tokenize a single review the same way as during training
    encoded = tokenizer(
        text,
        add_special_tokens=True,
        max_length=256,
        padding='max_length',
        truncation=True,
        return_attention_mask=True,
        return_tensors='pt'
    )
    input_ids = encoded['input_ids'].to(device)
    attention_mask = encoded['attention_mask'].to(device)
    model.eval()  # disable dropout for inference
    with torch.no_grad():
        outputs = model(input_ids, attention_mask=attention_mask)
    pred = torch.argmax(outputs.logits, dim=1)
    return "Positive" if pred.item() == 1 else "Negative"

# Example usage: test with new sentences
print(predict_sentiment("I love this movie!"))
print(predict_sentiment("This movie was terrible."))
```
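predict_sentiment returns only a label. If you also want a confidence score, one option is to apply a softmax to outputs.logits (in PyTorch, torch.softmax(outputs.logits, dim=1)) and report the winning class's probability. The stdlib sketch below shows that conversion on made-up logits; label_with_confidence is a hypothetical helper, not part of the original code.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # shift by the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def label_with_confidence(logits, names=("Negative", "Positive")):
    """Hypothetical helper: return (label, probability) for one review's logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return names[best], probs[best]

label, confidence = label_with_confidence([-1.2, 2.3])
print(label, round(confidence, 3))  # Positive 0.971
```

Softmax preserves the ordering of the logits, so the predicted label is the same as with argmax; only the probability is new.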
Here is my Google Colab notebook if you want to copy and run the code:
https://colab.research.google.com/drive/138HZtdJib-aOoldL4pvl3rLwx8_YsQ1D?usp=drive_link
Conclusion
In this blog, we've walked through the process of building a sentiment analysis model using BERT on the IMDB movie reviews dataset. This approach fine-tunes the pre-trained BERT model for our specific task, reaching 92.1% accuracy on the IMDB test set.
By following these steps, you can create your own sentiment analysis model and apply it to various text classification tasks. Remember that you can further improve the model by experimenting with different hyperparameters, using more advanced BERT variants, or incorporating additional features into your analysis.
Happy coding and sentiment analyzing!
Thanks
Sreeni Ramadorai