During a two-week sprint, I teamed up with two other developers to build an MVP for a journaling web app. One of our core features required analyzing users' journal entries to track emotional patterns over time, presenting this data through an interactive dashboard where users could visualize their emotional journey and gain insights into their mental well-being.
What is data normalization?
Before reviewing our implementation, let's understand data normalization and why it matters. Data normalization is the process of transforming numeric values to a standard scale while preserving the relative relationships between the original values. Consider it like converting measurements from different units (inches, centimeters, meters) into a standardized unit for fair comparison.
In our journaling application, we needed normalization because:
- Raw sentiment scores (-1 to 1) aren't intuitive for users
- Magnitude scores range from 0 to Infinity
- Different users express emotions with varying intensities
- We wanted to present a simple 0-100 scale for emotional tracking
For our normalization needs, we chose min-max scaling, which transforms values to fit within a specified range using this formula:
normalized_value = (x - min) / (max - min)
Min-max scaling is particularly suitable for sentiment analysis because it:
- Preserves zero values
- Maintains relationships between data points
- Handles negative values effectively
- Creates a range that's easy to understand
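To make the formula concrete, here is a minimal min-max scaler (a standalone sketch; `minMaxScale` is our illustrative name, not a function from the project code):

```javascript
// Min-max scaling: maps x from [min, max] onto [0, 1]
const minMaxScale = (x, min, max) => {
  // Guard against a degenerate range (all historical values identical)
  if (max === min) return 0;
  return (x - min) / (max - min);
};

minMaxScale(0.3, -0.7, 0.9);  // ≈ 0.625
minMaxScale(-0.7, -0.7, 0.9); // 0 (the minimum maps to 0)
minMaxScale(0.9, -0.7, 0.9);  // 1 (the maximum maps to 1)
```

Note that the endpoints of the range always map to exactly 0 and 1, which is why the relative ordering of values is preserved.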
Through my initial research and documentation review, I knew we'd need to normalize the sentiment data from Google's Natural Language API to create meaningful visualizations. The API returns sentiment scores from -1 to 1 and a magnitude score. Here's how I implemented data normalization to create personalized emotional tracking.
Google's Natural Language API provides two key metrics:
- Sentiment score: ranges from -1 (negative) to 1 (positive)
- Magnitude: measures the strength of the emotion, ranging from 0 to Infinity
We needed both metrics to paint a complete picture of a user's emotional state. Here's why:
// Initialize weights for our normalization
const sentimentWeight = 0.70;
const magnitudeWeight = 0.30;
We chose a 70/30 split between sentiment and magnitude because the sentiment score directly indicates the emotional direction (positive/negative), while magnitude acts as an intensity modifier. This weighting prioritizes the emotion's direction while still accounting for its strength.
One key decision was to personalize the normalization based on each user's emotional expression patterns:
const getRanges = async (userId) => {
  // Default values from test data
  const defaults = {
    sentimentMin: -0.7,
    sentimentMax: 0.9,
    magnitudeMin: 0.5,
    magnitudeMax: 13,
  };

  // Initialize ranges with sentinel values outside any real score
  const ranges = {
    sentimentMin: 2,
    sentimentMax: -2,
    magnitudeMin: Infinity,
    magnitudeMax: 0,
  };

  try {
    const journals = await Journal.find({ userId });

    // Require a minimum of 10 entries for personalization
    if (journals.length < 10) {
      return defaults;
    }

    // Find min/max values from the user's past entries
    journals.forEach((journal) => {
      if (journal.sentimentScore < ranges.sentimentMin) {
        ranges.sentimentMin = journal.sentimentScore;
      }
      if (journal.sentimentScore > ranges.sentimentMax) {
        ranges.sentimentMax = journal.sentimentScore;
      }
      if (journal.sentimentMagnitude < ranges.magnitudeMin) {
        ranges.magnitudeMin = journal.sentimentMagnitude;
      }
      if (journal.sentimentMagnitude > ranges.magnitudeMax) {
        ranges.magnitudeMax = journal.sentimentMagnitude;
      }
    });

    return ranges;
  } catch (error) {
    return defaults;
  }
};
We require at least 10 journal entries before using personalized ranges. This threshold ensures:
- Statistical significance in calculating ranges
- Protection against outlier entries skewing the normalization
- Enough data points to establish meaningful patterns
Let's look at a concrete example using our code:
const normalizedSentiment = (sentiment - ranges.sentimentMin) /
(ranges.sentimentMax - ranges.sentimentMin);
If a user's journal entry has:
- Sentiment score: 0.3
- User's minimum historical sentiment: -0.7
- User's maximum historical sentiment: 0.9
The calculation would be:
normalizedSentiment = (0.3 - (-0.7)) / (0.9 - (-0.7))
= 1.0 / 1.6
= 0.625
This transforms the original score of 0.3 to 0.625 on a 0-1 scale, which we then scale to our 0-100 range and combine with the normalized magnitude.
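Continuing the example with a hypothetical magnitude of 4.0 (using the default magnitude range of 0.5 to 13), the full calculation down to a 0-100 score looks like this:

```javascript
const ranges = { sentimentMin: -0.7, sentimentMax: 0.9, magnitudeMin: 0.5, magnitudeMax: 13 };

const normalizedSentiment = (0.3 - ranges.sentimentMin) / (ranges.sentimentMax - ranges.sentimentMin); // ≈ 0.625
const normalizedMagnitude = (4.0 - ranges.magnitudeMin) / (ranges.magnitudeMax - ranges.magnitudeMin); // 3.5 / 12.5 = 0.28

// Weighted 70/30 combination, scaled to 0-100
const score = Math.round((normalizedSentiment * 0.70 + normalizedMagnitude * 0.30) * 100); // → 52
```

A mildly positive entry with moderate intensity therefore lands just above the midpoint of the scale.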
Here's our core normalization function:
const sentimentConverter = async (sentiment, magnitude, userId) => {
  try {
    // Get ranges for this user's sentiment scores from the DB; if the user
    // hasn't submitted enough journal entries, the default ranges are used.
    const ranges = await getRanges(userId);

    // Normalize data using min-max scaling: (x - min) / (max - min)
    const normalizedSentiment = (sentiment - ranges.sentimentMin) / (ranges.sentimentMax - ranges.sentimentMin);
    const normalizedMagnitude = (magnitude - ranges.magnitudeMin) / (ranges.magnitudeMax - ranges.magnitudeMin);

    // Apply weights and scale to 0-100
    const sentimentWeight = 0.70;
    const magnitudeWeight = 0.30;
    return Math.round((normalizedSentiment * sentimentWeight + normalizedMagnitude * magnitudeWeight) * 100);
  } catch (error) {
    console.error('Error converting sentiment values', error);
    // Fall back to the default ranges on error - these values are based on
    // our analysis of test data
    const normalizedSentiment = (sentiment - (-0.7)) / (0.9 - (-0.7));
    const normalizedMagnitude = (magnitude - 0.5) / (13 - 0.5);
    return Math.round((normalizedSentiment * 0.70 + normalizedMagnitude * 0.30) * 100);
  }
};
This function:
- Retrieves personalized or default ranges
- Applies min-max scaling to both sentiment and magnitude
- Combines them using our weighted formula
- Scales to a 0-100 range for intuitive understanding
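One edge case worth noting with personalized ranges: a new entry that is more extreme than anything in the user's history normalizes to a value below 0 or above 1, which can push the final score outside 0-100. A small clamp (our suggested addition, not part of the original implementation) keeps the output in range:

```javascript
// Clamp a normalized value into [0, 1] before applying the weights
const clamp01 = (x) => Math.min(1, Math.max(0, x));

// Example: a new entry with sentiment 0.95 against a historical max of 0.9
const raw = (0.95 - (-0.7)) / (0.9 - (-0.7)); // ≈ 1.03, outside [0, 1]
const clamped = clamp01(raw); // 1
```

Applying `clamp01` to both normalized values inside `sentimentConverter` would guarantee the returned score always stays within 0-100.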
We made the deliberate choice to store both raw and normalized scores in our database:
const newEntry = new Journal({
  userId,
  normalizedSentiment: await sentimentConverter(
    sentiment.score,
    sentiment.magnitude,
    userId
  ),
  sentimentScore: sentiment.score,
  sentimentMagnitude: sentiment.magnitude,
});
This decision enables:
- Future algorithm improvements
- Data analysis and optimization
- Potential new features based on raw data analysis
After our two-week sprint, I'm particularly proud of the data normalization implementation and how it enables meaningful emotional tracking for our users. The decision to store raw and normalized sentiment data positions us well for future optimizations, and requiring 10+ entries before personalizing ranges ensures reliable insights as users build their journaling habit.
Our team built a solid foundation for emotional pattern recognition and visualization. While the code is functioning well, there's always room for refinement. As we prepare to launch and grow our user base, I'm excited about the potential improvements we could make:
- Fine-tuning our weighting algorithm based on user feedback
- Adding more sophisticated pattern recognition
- Expanding our emotional analytics features
The journey from raw sentiment scores to meaningful emotional insights was challenging but rewarding. I look forward to seeing how users interact with these features and using their experiences to guide our future development. If you're working on similar data normalization challenges, I hope this walkthrough of our implementation helps inform your approach.