BERT: The NLP Rockstar Revolutionizing Language Understanding

Hey there, language enthusiasts! 🌟 Let’s dive into the world of BERT, the superstar of Natural Language Processing (NLP) that’s been shaking things up since its debut in 2018. If you’re a fan of language models, you’ve probably heard of BERT—Bidirectional Encoder Representations from Transformers. It’s Google's brainchild that’s changing the game in how machines understand language. But what makes BERT so special compared to older models like Word2Vec and GloVe? Let’s break it down in a fun and easy-to-understand way!

[Figures: word-embedding visualizations using Word2Vec, FastText, and GloVe]

Context is King 👑

Here’s the thing: BERT isn’t like your standard word model that treats every word the same, no matter the situation. Imagine you're talking about "Apple." Are you craving a juicy fruit or eyeing the latest iPhone? Word2Vec and GloVe would give you the same representation of the word apple every time, regardless of its context. But BERT? It gets the context. So when you say:

  • "I like Apple phones."
  • "I like apple fruits."

BERT understands that "Apple" in the first sentence is tech-related, and in the second, it’s all about the fruit. BERT dynamically creates different embeddings based on how the word is used in a sentence. It's context-aware, and that’s what sets it apart from static models.
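
Want to see that difference for yourself? Here is a minimal sketch (not from the original post, and the helper name apple_embedding is just my own label) that pulls the contextual vector for "apple" out of bert-base-uncased in both sentences and compares them. A static model like Word2Vec would hand you the identical vector both times, so the cosine similarity would be exactly 1.0; with BERT it lands below 1.0 because each vector carries its own context.

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
model.eval()

def apple_embedding(sentence):
    # Return the hidden state of the "apple" token in this sentence
    inputs = tokenizer(sentence, return_tensors="pt")
    apple_id = tokenizer.convert_tokens_to_ids("apple")
    position = (inputs.input_ids[0] == apple_id).nonzero()[0]
    with torch.no_grad():
        hidden_states = model(**inputs).last_hidden_state
    return hidden_states[0, position].squeeze(0)

tech = apple_embedding("I like Apple phones.")
fruit = apple_embedding("I like apple fruits.")
similarity = torch.cosine_similarity(tech, fruit, dim=0)
print(f"Cosine similarity between the two 'apple' vectors: {similarity.item():.3f}")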

The Secret Sauce: Bidirectional Learning 🔄

Here’s where BERT gets really cool. While most earlier models read a sentence in one direction (left to right or right to left), BERT does something a bit more magical: its Transformer self-attention lets every word look at the words on both its left and its right at the same time. Yes, you heard that right: BERT uses the left AND right context of every word, in a single pass.

Think of it like having eyes in the back of your head, but for words. This bidirectional view helps BERT capture the true relationships between words in a sentence and understand the context much more deeply. No more one-dimensional thinking! 🌐
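
Curious what that looks like under the hood? Below is a hedged little sketch (again, not from the original post; the example sentence is mine) that asks bert-base-uncased to return its attention weights and prints how strongly the token "bank" attends to every other token. You should see non-zero weights on both its left and its right, which is exactly the bidirectional view described above.

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
model.eval()

sentence = "The bank approved my loan yesterday."
inputs = tokenizer(sentence, return_tensors="pt")
tokens = tokenizer.convert_ids_to_tokens(inputs.input_ids[0])

with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# outputs.attentions holds one tensor per layer, shaped (batch, heads, seq_len, seq_len)
# Average the heads of the last layer and look at the row for the token "bank"
last_layer = outputs.attentions[-1][0].mean(dim=0)
bank_index = tokens.index("bank")
for token, weight in zip(tokens, last_layer[bank_index]):
    print(f"{token:>10}  {weight.item():.3f}")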

MLM: BERT's Training Gym 💪

Now, let’s talk about MLM (Masked Language Modeling), which is BERT's workout routine. During training, BERT is given sentences with some of the words masked out, and its job is to guess the missing words. It’s like playing a game of Mad Libs, but the stakes are higher. By doing this, BERT learns the surrounding context and the relationships between words, steadily improving its language skills.

Getting Tensors in PyTorch format

import torch
from transformers import BertForMaskedLM, BertTokenizer
# Load the pre-trained BERT model for MLM and its tokenizer
# 'uncased' means the model lowercases its input, so Sreeni == sreeni and Apple == apple :)
model_mlm = BertForMaskedLM.from_pretrained('bert-base-uncased')
tokenizer_mlm = BertTokenizer.from_pretrained('bert-base-uncased')

# Here is the list of sentences, each containing a [MASK] token
sentences = [
    "[MASK] is the best way to learn a new language.",
    "She [MASK] the piano beautifully during the recital.",
    "They bought a new [MASK] for their living room.",
    "I love hiking in the [MASK] mountains during the summer.",
    "We plan to meet at [MASK] PM for dinner.",
    "The [MASK] dog ran through the park chasing after the ball.",
    "I enjoy skiing in the [MASK] during the holidays.",
    "It’s [MASK] book that was left on the table.",
    "[MASK] do you like to do on the weekend?",
    "She wore a [MASK] dress to the party."
]
# Tokenize each sentence, find the index of the [MASK] token, and predict it
for sentence in sentences:
    inputs = tokenizer_mlm(sentence, return_tensors="pt")
    mask_token_index = torch.where(inputs.input_ids == tokenizer_mlm.mask_token_id)[1]

    # Forward pass to get predictions (no gradients needed at inference time)
    with torch.no_grad():
        outputs = model_mlm(**inputs)
    predictions = outputs.logits

    # Get the predicted token (highest probability) at the masked position
    predicted_token_id = predictions[0, mask_token_index, :].argmax(dim=-1)
    predicted_token = tokenizer_mlm.decode(predicted_token_id)

    # Print the masked sentence along with the predicted word
    print(f"{sentence} and Predicted word: {predicted_token}")

Getting Tensors in TensorFlow format


from transformers import TFBertForMaskedLM, BertTokenizer
import tensorflow as tf

# Load pre-trained BERT model for MLM and tokenizer
model_mlm = TFBertForMaskedLM.from_pretrained('bert-base-uncased')
tokenizer_mlm = BertTokenizer.from_pretrained('bert-base-uncased')

# Here is the list of sentences with a masked word
sentences = [
    "[MASK] is the best way to learn a new language.",
    "She [MASK] the piano beautifully during the recital.",
    "They bought a new [MASK] for their living room.",
    "I love hiking in the [MASK] mountains during the summer.",
    "We plan to meet at [MASK] PM for dinner.",
    "The [MASK] dog ran through the park chasing after the ball.",
    "I enjoy skiing in the [MASK] during the holidays.",
    "It’s [MASK] book that was left on the table.",
    "[MASK] do you like to do on the weekend?",
    "She wore a [MASK] dress to the party."
]

# Tokenize input and find the index of the masked token
for sentence in sentences:
    # Tokenizing the sentence
    inputs = tokenizer_mlm(sentence, return_tensors="tf")

    # Get the index of the masked token
    mask_token_index = tf.where(inputs['input_ids'] == tokenizer_mlm.mask_token_id)[0][1]

    # Forward pass to get predictions
    outputs = model_mlm(inputs)
    predictions = outputs.logits

    # Get the predicted token (highest probability)
    predicted_token_id = tf.argmax(predictions[0, mask_token_index, :], axis=-1)
    predicted_token = tokenizer_mlm.decode(predicted_token_id.numpy())

    # Print both the original sentence and the predicted word
    print(f"{sentence} and Predicted word: {predicted_token}", end="\n")

Note: Calling tokenizer_mlm(sentence, return_tensors="tf") tokenizes the input sentence and returns TensorFlow tensors instead of PyTorch tensors.

Output

[Output: each masked sentence printed alongside the word BERT predicts for the [MASK] position]

NSP: BERT’s Social Skills Training 🗣️

NSP stands for Next Sentence Prediction, and it’s another cool trick BERT uses to enhance its understanding of language. In NSP, BERT is trained to predict whether one sentence logically follows another. This helps BERT understand how sentences flow and how they’re connected to each other—basically, it’s BERT’s crash course in conversation skills. Imagine BERT learning to understand not just individual words, but how sentences fit together like pieces of a puzzle!
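
Here is a minimal sketch of NSP in action (not from the original post; the sentence pairs are made up for illustration) using the Hugging Face BertForNextSentencePrediction head. For this head, logit index 0 means "sentence B really follows sentence A" and index 1 means "sentence B is random."

import torch
from transformers import BertForNextSentencePrediction, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model_nsp = BertForNextSentencePrediction.from_pretrained('bert-base-uncased')
model_nsp.eval()

first = "I walked into the kitchen to make breakfast."
follows = "I cracked two eggs into the pan."           # a natural continuation
random_next = "The stock market closed higher today."  # an unrelated sentence

for second in (follows, random_next):
    inputs = tokenizer(first, second, return_tensors="pt")
    with torch.no_grad():
        logits = model_nsp(**inputs).logits
    # Softmax over the two classes: index 0 = IsNext, index 1 = NotNext
    probs = torch.softmax(logits, dim=-1)[0]
    print(f"'{second}' -> P(is next sentence) = {probs[0].item():.3f}")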

Why BERT is the NLP Rockstar 🌟

BERT’s context-aware abilities make it an absolute rockstar in various NLP tasks like question answering, text classification, and named entity recognition. While older models might get tripped up by the nuances of language, BERT understands the full context. That means it can generate more accurate, nuanced responses and insights in a way that feels much closer to how humans think.
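
As a quick taste of one of those downstream tasks, here is a hedged sketch of extractive question answering with the transformers pipeline API. The checkpoint below is the commonly used BERT-large model fine-tuned on SQuAD; swap in whichever fine-tuned BERT you prefer.

from transformers import pipeline

# A BERT model fine-tuned on SQuAD for extractive question answering
qa = pipeline(
    "question-answering",
    model="bert-large-uncased-whole-word-masking-finetuned-squad",
)

context = (
    "BERT (Bidirectional Encoder Representations from Transformers) was "
    "released by Google in 2018. It is pre-trained with masked language "
    "modeling and next sentence prediction."
)

result = qa(question="When was BERT released?", context=context)
print(result["answer"], result["score"])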

So, there you have it! BERT is not just another language model—it’s a game-changer. Its ability to understand the full context and its bidirectional approach make it incredibly effective for solving complex NLP challenges. Thanks to BERT, we’re one step closer to machines that really understand language, and maybe, just maybe, one day, you’ll have a deep conversation with your AI assistant about whether to pick an apple (the fruit) or an Apple (the tech).

Until then, keep playing those NLP tracks.

Thanks
Sreeni Ramadorai
