DEV Community

coltongraygg

Normalizing Sentiment Data: Google's Natural Language API

During a two-week sprint, I teamed up with two other developers to build an MVP for a journaling web app. One of our core features required analyzing users' journal entries to track emotional patterns over time, presenting this data through an interactive dashboard where users could visualize their emotional journey and gain insights into their mental well-being.

What is data normalization?

Before reviewing our implementation, let's understand data normalization and why it matters. Data normalization is the process of transforming numeric values to a standard scale while preserving the relative relationships between the original values. Consider it like converting measurements from different units (inches, centimeters, meters) into a standardized unit for fair comparison.

In our journaling application, we needed normalization because:

  1. Raw sentiment scores (-1 to 1) aren't intuitive for users
  2. Magnitude scores are unbounded, ranging from 0 to infinity
  3. Different users express emotions with varying intensities
  4. We wanted to present a consistent 0-100 scale for emotional tracking

For our normalization needs, we chose min-max scaling, which transforms values to fit within a specified range using this formula:

normalized_value = (x - min) / (max - min)

Min-max scaling is particularly suitable for sentiment analysis because it:

  • Maps the minimum value to 0 and the maximum to 1
  • Maintains the relative relationships between data points
  • Handles negative values effectively
  • Produces a bounded range that's easy to understand
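
As a quick sketch, the formula can be written as a one-line helper (the name `minMaxScale` is mine, for illustration):

```javascript
// Min-max scaling: maps x from [min, max] onto [0, 1]
const minMaxScale = (x, min, max) => (x - min) / (max - min);

minMaxScale(0, -1, 1);   // 0.5 — a neutral sentiment lands mid-range
minMaxScale(-1, -1, 1);  // 0   — the minimum maps to 0
minMaxScale(1, -1, 1);   // 1   — the maximum maps to 1
```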

Through my initial research and documentation review, I knew we'd need to normalize the sentiment data from Google's Natural Language API to create meaningful visualizations. The API returns sentiment scores from -1 to 1 and a magnitude score. Here's how I implemented data normalization to create personalized emotional tracking.

Google's Natural Language API provides two key metrics:

  • Sentiment Score: Ranges from -1 (negative) to 1 (positive)

  • Magnitude: Measures the strength of emotion, ranging from 0 upward with no upper bound

We needed both metrics to paint a complete picture of a user's emotional state. Here's why:

// Initialize weights for our normalization
const sentimentWeight = 0.70;
const magnitudeWeight = 0.30;

We chose a 70/30 split between sentiment and magnitude because the sentiment score directly indicates the emotional direction (positive/negative), while magnitude acts as an intensity modifier. This weighting prioritizes the emotion's direction while still accounting for its strength.
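
As a rough sketch of how the weighted blend works (both inputs are assumed to already be normalized to 0-1; `blend` is an illustrative name, not from our codebase):

```javascript
// Weighted blend of two normalized (0-1) values, scaled to 0-100.
// The 70/30 split matches the weights above.
const blend = (normSentiment, normMagnitude) =>
  Math.round((normSentiment * 0.70 + normMagnitude * 0.30) * 100);

blend(0.625, 0.5); // 59 — the sentiment term dominates the result
blend(1, 1);       // 100 — both metrics at their maximum
```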

One key decision was to personalize the normalization based on each user's emotional expression patterns:

const getRanges = async (userId) => {
  // Default ranges derived from our test data
  const defaults = {
    sentimentMin: -0.7,
    sentimentMax: 0.9,
    magnitudeMin: 0.5,
    magnitudeMax: 13,
  };
  // Seed ranges so any real value replaces them on first comparison
  const ranges = {
    sentimentMin: Infinity,
    sentimentMax: -Infinity,
    magnitudeMin: Infinity,
    magnitudeMax: -Infinity,
  };

  try {
    const journals = await Journal.find({ userId });
    // Require a minimum of 10 entries before personalizing
    if (journals.length < 10) {
      return defaults;
    }
    // Find min/max values from the user's past entries
    journals.forEach((journal) => {
      ranges.sentimentMin = Math.min(ranges.sentimentMin, journal.sentimentScore);
      ranges.sentimentMax = Math.max(ranges.sentimentMax, journal.sentimentScore);
      ranges.magnitudeMin = Math.min(ranges.magnitudeMin, journal.sentimentMagnitude);
      ranges.magnitudeMax = Math.max(ranges.magnitudeMax, journal.sentimentMagnitude);
    });
    // Guard against a zero-width range (all entries identical),
    // which would cause division by zero during normalization
    if (ranges.sentimentMax === ranges.sentimentMin || ranges.magnitudeMax === ranges.magnitudeMin) {
      return defaults;
    }
    return ranges;
  } catch (error) {
    return defaults;
  }
};


We require at least 10 journal entries before using personalized ranges. This threshold ensures:

  • Statistical significance in calculating ranges

  • Protection against outlier entries skewing the normalization

  • Enough data points to establish meaningful patterns

Let's look at a concrete example using our code:

const normalizedSentiment = (sentiment - ranges.sentimentMin) / 
  (ranges.sentimentMax - ranges.sentimentMin);

If a user's journal entry has:

  • Sentiment score: 0.3
  • User's minimum historical sentiment: -0.7
  • User's maximum historical sentiment: 0.9

The calculation would be:

normalizedSentiment = (0.3 - (-0.7)) / (0.9 - (-0.7))
                   = 1.0 / 1.6
                   = 0.625

This transforms the original score of 0.3 to 0.625 on a 0-1 scale, which we then scale to our 0-100 range and combine with the normalized magnitude.
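
You can verify this in a console (variable names mirror the snippet above):

```javascript
const sentiment = 0.3;
const ranges = { sentimentMin: -0.7, sentimentMax: 0.9 };

// Min-max scaling against the user's historical range
const normalizedSentiment =
  (sentiment - ranges.sentimentMin) / (ranges.sentimentMax - ranges.sentimentMin);

console.log(normalizedSentiment);                   // 0.625
console.log(Math.round(normalizedSentiment * 100)); // 63 on the 0-100 scale, before weighting
```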

Here's our core normalization function:

const sentimentConverter = async (sentiment, magnitude, userId) => {
  // Clamp a normalized value into [0, 1] so entries outside the
  // historical range can't push the final score below 0 or above 100
  const clamp = (value) => Math.min(Math.max(value, 0), 1);

  try {
    // Get this user's ranges; getRanges falls back to defaults when
    // the user hasn't submitted enough journal entries
    const ranges = await getRanges(userId);

    // Normalize using min-max scaling: (x - min) / (max - min)
    const normalizedSentiment = clamp((sentiment - ranges.sentimentMin) / (ranges.sentimentMax - ranges.sentimentMin));
    const normalizedMagnitude = clamp((magnitude - ranges.magnitudeMin) / (ranges.magnitudeMax - ranges.magnitudeMin));

    // Apply weights and scale to 0-100
    const sentimentWeight = 0.70;
    const magnitudeWeight = 0.30;
    return Math.round((normalizedSentiment * sentimentWeight + normalizedMagnitude * magnitudeWeight) * 100);
  } catch (error) {
    console.error('Error converting sentiment values', error);

    // Fall back to default ranges on error - these values are based
    // on our analysis of test data
    const normalizedSentiment = clamp((sentiment - (-0.7)) / (0.9 - (-0.7)));
    const normalizedMagnitude = clamp((magnitude - 0.5) / (13 - 0.5));
    return Math.round((normalizedSentiment * 0.70 + normalizedMagnitude * 0.30) * 100);
  }
}

This function:

  1. Retrieves personalized or default ranges

  2. Applies min-max scaling to both sentiment and magnitude

  3. Combines them using our weighted formula

  4. Scales to a 0-100 range for intuitive understanding

We made the deliberate choice to store both raw and normalized scores in our database:

const newEntry = new Journal({
  userId,
  normalizedSentiment: await sentimentConverter(
    sentiment.score,
    sentiment.magnitude,
    userId
  ),
  sentimentScore: sentiment.score,
  sentimentMagnitude: sentiment.magnitude,
});

This decision enables:

  • Future algorithm improvements

  • Data analysis and optimization

  • Potential new features based on raw data analysis


After our two-week sprint, I'm particularly proud of the data normalization implementation and how it enables meaningful emotional tracking for our users. The decision to store raw and normalized sentiment data positions us well for future optimizations, and requiring 10+ entries before personalizing ranges ensures reliable insights as users build their journaling habit.

Our team built a solid foundation for emotional pattern recognition and visualization. While the code is functioning well, there's always room for refinement. As we prepare to launch and grow our user base, I'm excited about the potential improvements we could make:

  • Fine-tuning our weighting algorithm based on user feedback

  • Adding more sophisticated pattern recognition

  • Expanding our emotional analytics features

The journey from raw sentiment scores to meaningful emotional insights was challenging but rewarding. I look forward to seeing how users interact with these features and using their experiences to guide our future development. If you're working on similar data normalization challenges, I hope this walkthrough of our implementation helps inform your approach.
