ka li

From Pain to Gain: Building a Browser-Based Log Analysis Tool

After countless frozen editors and crashed terminals while analyzing mobile app logs, I built something different. Here's the journey.

The Challenge

As mobile developers, we know these scenarios all too well:

  • 500MB log files freezing our editors
  • Nested zip files from QA teams
  • Multiple keywords to track simultaneously
  • What should be quick analysis turning into hours of frustration

Browser to the Rescue

Instead of fighting with traditional tools, I explored modern browser capabilities:

  • File System Access API for large files
  • Web Workers for background processing
  • IndexedDB for efficient data storage
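To make the large-file point concrete, here is a minimal sketch of chunked reading. It is not LogDog's actual code: `createLineAssembler` and `forEachLine` are hypothetical names, and the 1 MB default chunk size is an assumption. It relies only on the standard `Blob.slice()`, `Blob.arrayBuffer()`, and `TextDecoder` APIs, so only one chunk needs to be in memory at a time.

```javascript
// Hypothetical helper: reassembles complete lines from arbitrary byte chunks.
function createLineAssembler(onLine) {
  const decoder = new TextDecoder('utf-8');
  let carry = ''; // partial line left over from the previous chunk
  return {
    push(bytes) {
      // stream: true keeps multi-byte characters intact when a chunk
      // boundary falls in the middle of one
      carry += decoder.decode(bytes, { stream: true });
      const lines = carry.split('\n');
      carry = lines.pop(); // the last piece may be an incomplete line
      for (const line of lines) onLine(line);
    },
    flush() {
      carry += decoder.decode(); // flush any bytes still buffered
      if (carry) onLine(carry);
      carry = '';
    },
  };
}

// Reads a File/Blob slice by slice instead of loading it all at once.
// chunkSize is an assumed default, not a measured optimum.
async function forEachLine(blob, onLine, chunkSize = 1024 * 1024) {
  const assembler = createLineAssembler(onLine);
  for (let offset = 0; offset < blob.size; offset += chunkSize) {
    const buffer = await blob.slice(offset, offset + chunkSize).arrayBuffer();
    assembler.push(new Uint8Array(buffer));
  }
  assembler.flush();
}
```

In a browser, the `blob` argument could be a `File` from a file input or from `FileSystemFileHandle.getFile()`, and the loop could run inside a Web Worker so the UI thread stays free.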

Smart Multi-Pattern Highlighting

One of the key challenges in log analysis is highlighting multiple patterns simultaneously. Traditional approaches often fail when dealing with overlapping patterns or nested highlights. Here's how we solved it:

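// Smart highlighting with style stacking
// Note: each `pattern` is compiled as a regular expression, so escape
// literal keywords that contain regex metacharacters before passing them in.

// Escape text so log content cannot inject markup into the output
function escapeHtml(text) {
  return text.replace(/[&<>"']/g, ch => `&#${ch.charCodeAt(0)};`);
}

// styleToString was not shown in the original snippet; this minimal version
// is an assumption. It turns { color: 'red' } into "color:red" and expects
// property names in CSS form (e.g. 'background-color').
function styleToString(styles) {
  return Object.entries(styles)
    .map(([prop, value]) => `${prop}:${value}`)
    .join(';');
}

function highlightText(content, patterns) {
  const events = [];

  // Collect all matches and their styles
  // Traditional approach of replacing patterns one by one would break
  // when patterns overlap, as earlier highlights would be treated as content
  patterns.forEach(({pattern, style}) => {
    const regex = new RegExp(pattern, 'gi');
    let match;
    while ((match = regex.exec(content)) !== null) {
      if (match[0].length === 0) {
        // Empty matches never advance lastIndex; skip them to avoid looping forever
        regex.lastIndex++;
        continue;
      }
      events.push(
        { pos: match.index, type: 'start', style },
        { pos: regex.lastIndex, type: 'end', style }
      );
    }
  });

  // Sort events by position and type
  // Critical: end events must come before start events at same position
  // to properly handle cases like: |end1|end2|start3|start4|
  events.sort((a, b) => {
    if (a.pos !== b.pos) return a.pos - b.pos;
    if (a.type === b.type) return 0; // keep the comparator consistent
    return a.type === 'end' ? -1 : 1;
  });

  // Apply styles with stacking support
  // activeStyles tracks every style active at the current position
  const activeStyles = [];
  let result = '';
  let lastPos = 0;

  events.forEach(event => {
    const segment = content.slice(lastPos, event.pos);
    if (segment) {
      // Merge all active styles for overlapping regions
      // (on conflicting properties, the most recently opened style wins)
      const styles = activeStyles.reduce((acc, style) => ({...acc, ...style}), {});
      result += activeStyles.length
        ? `<span style="${escapeHtml(styleToString(styles))}">${escapeHtml(segment)}</span>`
        : escapeHtml(segment);
    }

    // Maintain the stack of active styles; remove only one instance on 'end'
    // so two patterns sharing the same style object do not cancel each other
    if (event.type === 'start') {
      activeStyles.push(event.style);
    } else {
      const index = activeStyles.lastIndexOf(event.style);
      if (index !== -1) activeStyles.splice(index, 1);
    }

    lastPos = event.pos;
  });

  // Append the text that follows the final match
  result += escapeHtml(content.slice(lastPos));

  return result;
}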

Why This Approach?

Traditional highlighting methods have several limitations:

  1. Sequential Processing Problem

    • Traditional: Process patterns one after another, replacing text with HTML
    • Issue: Earlier highlights become part of the content, breaking later patterns
    • Our Solution: Track all matches first, then process them together
  2. Style Stacking Challenge

    • Traditional: Can't properly handle overlapping highlights
    • Issue: Later highlights override earlier ones
    • Our Solution: Stack styles in overlapping regions
  3. Performance Concerns

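    • Traditional: Multiple passes through the text, each rewriting the whole string
    • Issue: Each of the m patterns rescans and rebuilds a string that keeps growing as markup is inserted
    • Our Solution: Matching still scans the text once per pattern, but the original string is never modified; the k match events are sorted once (O(k log k)) and the output is built in a single pass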
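The sequential processing problem is easy to reproduce. The snippet below is an illustrative sketch of the naive approach (the `naiveHighlight` name and the `cls` field are made up for this example), where the second pattern happens to match the markup inserted by the first:

```javascript
// Naive approach: wrap each pattern by rewriting the string in place.
function naiveHighlight(content, patterns) {
  return patterns.reduce(
    (html, { pattern, cls }) =>
      html.replace(new RegExp(pattern, 'gi'), m => `<span class="${cls}">${m}</span>`),
    content
  );
}

const broken = naiveHighlight('span error', [
  { pattern: 'error', cls: 'red' },
  { pattern: 'span', cls: 'blue' },
]);

// The second pass also matched the word "span" inside the tags
// produced by the first pass, so the markup is now corrupted.
console.log(broken.includes('<<span')); // true
```

Collecting every match against the untouched original text first avoids this, because inserted markup never becomes searchable content.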

This implementation ensures:

  • Correct handling of overlapping patterns
  • Proper style inheritance in overlapped regions
  • Maintainable and efficient processing
  • Highlight markup that is never re-matched or broken by later patterns
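To see the stacking on a small input, the sketch below runs the same sort-and-sweep idea but returns plain segments instead of HTML. `computeSegments` and its `label` field are hypothetical names used only for this illustration, and every range is assumed to satisfy `start < end`.

```javascript
// Hypothetical helper: turns possibly overlapping ranges into flat,
// non-overlapping segments, each listing the labels active inside it.
function computeSegments(ranges) {
  const events = [];
  for (const { start, end, label } of ranges) {
    events.push({ pos: start, type: 'start', label }, { pos: end, type: 'end', label });
  }
  // At equal positions, 'end' events sort before 'start' events
  events.sort((a, b) =>
    a.pos - b.pos || (a.type === b.type ? 0 : a.type === 'end' ? -1 : 1));

  const active = [];
  const segments = [];
  let lastPos = 0;
  for (const event of events) {
    // Emit the stretch since the previous event if anything is active in it
    if (event.pos > lastPos && active.length > 0) {
      segments.push({ start: lastPos, end: event.pos, labels: [...active] });
    }
    if (event.type === 'start') {
      active.push(event.label);
    } else {
      const index = active.lastIndexOf(event.label);
      if (index !== -1) active.splice(index, 1);
    }
    lastPos = event.pos;
  }
  return segments;
}

// For the text 'network error': one pattern covers the whole phrase
// (offsets 0-13), another covers only 'error' (offsets 8-13).
const segments = computeSegments([
  { start: 0, end: 13, label: 'bold' },
  { start: 8, end: 13, label: 'red' },
]);
console.log(segments);
// [ { start: 0, end: 8, labels: ['bold'] },
//   { start: 8, end: 13, labels: ['bold', 'red'] } ]
```

The overlapping region ends up as a single segment carrying both labels, which is what allows one `<span>` with merged styles rather than nested or broken tags.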

The Virtual List Challenge

While developing LogDog, the tool this post describes, I encountered an interesting limitation in existing virtual list solutions. After testing more than 10 popular virtual list components, I found they all hit a wall when dealing with truly massive datasets - typically around 2^24 rows (about 16.7 million entries).

The root cause? Most virtual list implementations rely on the browser's native scrollbar and require the entire dataset upfront as an array. This approach fails spectacularly when analyzing gigabyte-sized log files.

I've implemented a different approach that breaks through this limitation. While the complete solution deserves its own article (coming soon!), the key insight was reimagining how virtual lists handle scrolling and data sourcing.

Stay tuned for a deep dive into:

  • Breaking the 2^24 limit
  • Custom scrolling implementation
  • Efficient data streaming
  • Memory management techniques
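Until that article is out, here is one possible way to decouple scroll position from row count. This is not LogDog's implementation, only a sketch of a common technique: cap the scrollable height and map the scroll fraction to a row index. `visibleRange` and `ASSUMED_MAX_SCROLL_PX` are hypothetical, and the cap value is an arbitrary assumption rather than a real browser limit.

```javascript
// Assumed cap on the scrollable height; real browser limits differ.
const ASSUMED_MAX_SCROLL_PX = 10_000_000;

// Maps a scroll offset to the window of rows that should be rendered,
// without requiring rowCount * rowHeight pixels of real scroll height.
function visibleRange(scrollTop, viewportPx, rowCount, rowHeight) {
  const naturalHeight = rowCount * rowHeight;
  const scrollHeight = Math.min(naturalHeight, ASSUMED_MAX_SCROLL_PX);
  const maxScrollTop = Math.max(scrollHeight - viewportPx, 0);
  const rowsPerView = Math.ceil(viewportPx / rowHeight);
  const maxFirstRow = Math.max(rowCount - rowsPerView, 0);

  // Position within the scrollable range, clamped to [0, 1]
  const fraction = maxScrollTop === 0
    ? 0
    : Math.min(Math.max(scrollTop / maxScrollTop, 0), 1);

  const first = Math.round(fraction * maxFirstRow);
  const last = Math.min(first + rowsPerView, rowCount); // exclusive
  return { scrollHeight, first, last };
}

// 100 million rows of 20px each in a 600px viewport
console.log(visibleRange(0, 600, 100_000_000, 20));
// { scrollHeight: 10000000, first: 0, last: 30 }
console.log(visibleRange(9_999_400, 600, 100_000_000, 20));
// { scrollHeight: 10000000, first: 99999970, last: 100000000 }
```

With this mapping the list only needs a row count and a way to fetch rows `first` to `last` on demand, not the whole dataset as an array. The trade-off is that one pixel of scrollbar travel can skip many rows, so fine-grained scrolling (wheel, keyboard) has to be handled separately.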

Results

In real-world usage:

  • Load time: <1s for most files
  • Memory usage: ~100MB for 1GB log file
  • Search speed: Real-time for most patterns

Key Learnings

The development process taught me that:

  1. Modern browsers are surprisingly capable
  2. Proper chunking is crucial for large file handling
  3. UI responsiveness requires careful architecture
  4. Smart memory management makes all the difference

Try It Yourself

I've made this tool available online:

What's Next?

I'm working on:

  • More built-in coloring rules for various analysis tasks
  • Performance optimization patterns
  • Latency analysis templates

Share your log analysis challenges in the comments! What patterns would you find most useful for your analysis tasks?

If you found this helpful, give LogDog a try and let me know your thoughts!
