During a two-week sprint, I teamed up with two other developers to build an MVP for a journaling web app. One of our core features required analyzing users' journal entries to track emotional patterns over time, presenting this data through an interactive dashboard where users could visualize their emotional journey and gain insights into their mental well-being.
What is data normalization?
Before reviewing our implementation, let's understand data normalization and why it matters. Data normalization is the process of transforming numeric values to a standard scale while preserving the relative relationships between the original values. Consider it like converting measurements from different units (inches, centimeters, meters) into a standardized unit for fair comparison.
In our journaling application, we needed normalization because:
- Raw sentiment scores (-1 to 1) aren't intuitive for users
- Magnitude scores range from 0 to Infinity
- Different users express emotions with varying intensities
- We wanted to present a simple 0-100 scale for emotional tracking
For our normalization needs, we chose min-max scaling, which transforms values to fit within a specified range using this formula:
normalized_value = (x - min) / (max - min)
Min-max scaling is particularly suitable for sentiment analysis because it:
- Preserves zero values
- Maintains relationships between data points
- Handles negative values effectively
- Creates a range that's easy to understand
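To make the formula concrete, here is a minimal min-max scaler (a standalone sketch; `minMaxScale` is our illustrative name, not a function from the project code):

```javascript
// Min-max scaling: maps x from [min, max] onto [0, 1]
const minMaxScale = (x, min, max) => {
  // Guard against a degenerate range (all historical values identical)
  if (max === min) return 0;
  return (x - min) / (max - min);
};

minMaxScale(0.3, -0.7, 0.9);  // ≈ 0.625
minMaxScale(-0.7, -0.7, 0.9); // 0 (the minimum maps to 0)
minMaxScale(0.9, -0.7, 0.9);  // 1 (the maximum maps to 1)
```

Note that the endpoints of the range always map to exactly 0 and 1, which is why the relative ordering of values is preserved.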
Through my initial research and documentation review, I knew we'd need to normalize the sentiment data from Google's Natural Language API to create meaningful visualizations. The API returns sentiment scores from -1 to 1 and a magnitude score. Here's how I implemented data normalization to create personalized emotional tracking.
Google's Natural Language API provides two key metrics:
- Sentiment score: ranges from -1 (negative) to 1 (positive)
- Magnitude: measures the strength of the emotion, ranging from 0 to Infinity
We needed both metrics to paint a complete picture of a user's emotional state. Here's why:
// Initialize weights for our normalization
const sentimentWeight = 0.70;
const magnitudeWeight = 0.30;
We chose a 70/30 split between sentiment and magnitude because the sentiment score directly indicates the emotional direction (positive/negative), while magnitude acts as an intensity modifier. This weighting prioritizes the emotion's direction while still accounting for its strength.
One key decision was to personalize the normalization based on each user's emotional expression patterns:
const getRanges = async (userId) => {
  // Default values from test data
  const defaults = {
    sentimentMin: -0.7,
    sentimentMax: 0.9,
    magnitudeMin: 0.5,
    magnitudeMax: 13,
  };

  // Initialize ranges with sentinel values outside any real score
  const ranges = {
    sentimentMin: 2,
    sentimentMax: -2,
    magnitudeMin: Infinity,
    magnitudeMax: 0,
  };

  try {
    const journals = await Journal.find({ userId });

    // Require a minimum of 10 entries for personalization
    if (journals.length < 10) {
      return defaults;
    }

    // Find min/max values from the user's past entries
    journals.forEach((journal) => {
      if (journal.sentimentScore < ranges.sentimentMin) {
        ranges.sentimentMin = journal.sentimentScore;
      }
      if (journal.sentimentScore > ranges.sentimentMax) {
        ranges.sentimentMax = journal.sentimentScore;
      }
      if (journal.sentimentMagnitude < ranges.magnitudeMin) {
        ranges.magnitudeMin = journal.sentimentMagnitude;
      }
      if (journal.sentimentMagnitude > ranges.magnitudeMax) {
        ranges.magnitudeMax = journal.sentimentMagnitude;
      }
    });

    return ranges;
  } catch (error) {
    return defaults;
  }
};
We require at least 10 journal entries before using personalized ranges. This threshold ensures:
- Statistical significance in calculating ranges
- Protection against outlier entries skewing the normalization
- Enough data points to establish meaningful patterns
Let's look at a concrete example using our code:
const normalizedSentiment = (sentiment - ranges.sentimentMin) /
(ranges.sentimentMax - ranges.sentimentMin);
If a user's journal entry has:
- Sentiment score: 0.3
- User's minimum historical sentiment: -0.7
- User's maximum historical sentiment: 0.9
The calculation would be:
normalizedSentiment = (0.3 - (-0.7)) / (0.9 - (-0.7))
= 1.0 / 1.6
= 0.625
This transforms the original score of 0.3 to 0.625 on a 0-1 scale, which we then scale to our 0-100 range and combine with the normalized magnitude.
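Continuing the example with a hypothetical magnitude of 4.0 (using the default magnitude range of 0.5 to 13), the full calculation down to a 0-100 score looks like this:

```javascript
const ranges = { sentimentMin: -0.7, sentimentMax: 0.9, magnitudeMin: 0.5, magnitudeMax: 13 };

const normalizedSentiment = (0.3 - ranges.sentimentMin) / (ranges.sentimentMax - ranges.sentimentMin); // ≈ 0.625
const normalizedMagnitude = (4.0 - ranges.magnitudeMin) / (ranges.magnitudeMax - ranges.magnitudeMin); // 3.5 / 12.5 = 0.28

// Weighted 70/30 combination, scaled to 0-100
const score = Math.round((normalizedSentiment * 0.70 + normalizedMagnitude * 0.30) * 100); // → 52
```

A mildly positive entry with moderate intensity therefore lands just above the midpoint of the scale.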
Here's our core normalization function:
const sentimentConverter = async (sentiment, magnitude, userId) => {
  try {
    // Get ranges for this user's sentiment scores from the DB; if the user
    // hasn't submitted enough journal entries, the default ranges are used.
    const ranges = await getRanges(userId);

    // Normalize data using min-max scaling: (x - min) / (max - min)
    const normalizedSentiment = (sentiment - ranges.sentimentMin) / (ranges.sentimentMax - ranges.sentimentMin);
    const normalizedMagnitude = (magnitude - ranges.magnitudeMin) / (ranges.magnitudeMax - ranges.magnitudeMin);

    // Apply weights and scale to 0-100
    const sentimentWeight = 0.70;
    const magnitudeWeight = 0.30;
    return Math.round((normalizedSentiment * sentimentWeight + normalizedMagnitude * magnitudeWeight) * 100);
  } catch (error) {
    console.error('Error converting sentiment values', error);
    // Fall back to the default ranges on error - these values are based on
    // our analysis of test data
    const normalizedSentiment = (sentiment - (-0.7)) / (0.9 - (-0.7));
    const normalizedMagnitude = (magnitude - 0.5) / (13 - 0.5);
    return Math.round((normalizedSentiment * 0.70 + normalizedMagnitude * 0.30) * 100);
  }
};
This function:
- Retrieves personalized or default ranges
- Applies min-max scaling to both sentiment and magnitude
- Combines them using our weighted formula
- Scales to a 0-100 range for intuitive understanding
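One edge case worth noting with personalized ranges: a new entry that is more extreme than anything in the user's history normalizes to a value below 0 or above 1, which can push the final score outside 0-100. A small clamp (our suggested addition, not part of the original implementation) keeps the output in range:

```javascript
// Clamp a normalized value into [0, 1] before applying the weights
const clamp01 = (x) => Math.min(1, Math.max(0, x));

// Example: a new entry with sentiment 0.95 against a historical max of 0.9
const raw = (0.95 - (-0.7)) / (0.9 - (-0.7)); // ≈ 1.03, outside [0, 1]
const clamped = clamp01(raw); // 1
```

Applying `clamp01` to both normalized values inside `sentimentConverter` would guarantee the returned score always stays within 0-100.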
We made the deliberate choice to store both raw and normalized scores in our database:
const newEntry = new Journal({
  userId,
  normalizedSentiment: await sentimentConverter(
    sentiment.score,
    sentiment.magnitude,
    userId
  ),
  sentimentScore: sentiment.score,
  sentimentMagnitude: sentiment.magnitude,
});
This decision enables:
- Future algorithm improvements
- Data analysis and optimization
- Potential new features based on raw data analysis
After our two-week sprint, I'm particularly proud of the data normalization implementation and how it enables meaningful emotional tracking for our users. The decision to store raw and normalized sentiment data positions us well for future optimizations, and requiring 10+ entries before personalizing ranges ensures reliable insights as users build their journaling habit.
Our team built a solid foundation for emotional pattern recognition and visualization. While the code is functioning well, there's always room for refinement. As we prepare to launch and grow our user base, I'm excited about the potential improvements we could make:
- Fine-tuning our weighting algorithm based on user feedback
- Adding more sophisticated pattern recognition
- Expanding our emotional analytics features
The journey from raw sentiment scores to meaningful emotional insights was challenging but rewarding. I look forward to seeing how users interact with these features and using their experiences to guide our future development. If you're working on similar data normalization challenges, I hope this walkthrough of our implementation helps inform your approach.