DEV Community

Karol O

SpeechCraft: AI-Powered Speech Analysis for Better Communication

This is a submission for the AssemblyAI Challenge: Really Rad Real-Time.

What I Built

SpeechCraft 🎙️ - Real-time Speech Analytics Platform

Overview

SpeechCraft is a real-time speech analytics platform that turns spoken words into actionable feedback. It uses AssemblyAI's real-time transcription to convert speech to text as it is spoken, then analyzes the transcript along several dimensions of speech performance: pace, clarity, fluency, rhythm, and vocabulary.

Key Features

1. Real-Time Transcription 📝

  • Instant speech-to-text conversion
  • High-accuracy transcription
  • Support for natural conversation flow

2. Advanced Speech Metrics 📊

Speaking Pace Analysis

  • Real-time words-per-minute tracking
  • Optimal pace guidance
  • Speed variation detection
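
The post does not show how pace is computed. As a minimal sketch, words per minute can be derived from the word count of a transcript and the elapsed speaking time. The name `wordsPerMinute` and its parameters are hypothetical and not taken from SpeechCraft's source.

```javascript
// Hypothetical helper: words per minute from transcript text and elapsed time.
function wordsPerMinute(transcript, elapsedMs) {
    if (elapsedMs <= 0) return 0;
    // Split on whitespace and drop the empty string an empty transcript produces
    const words = transcript.trim().split(/\s+/).filter(Boolean);
    // 60000 ms per minute; multiply first to avoid a fractional divisor
    return Math.round((words.length * 60000) / elapsedMs);
}

// 9 words spoken in 4 seconds
console.log(wordsPerMinute('the quick brown fox jumps over the lazy dog', 4000)); // 135
```

Calling this on a sliding window of recent transcripts, rather than on the whole session, would be one way to surface speed variation.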

Clarity Measurement

  • Filler word detection
  • Sentence structure analysis
  • Pronunciation clarity scoring
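
Filler word detection can be approximated directly on the transcript text. The sketch below is an assumption about how such a check might look, not SpeechCraft's actual code; the word list `FILLER_WORDS` and the function `countFillerWords` are hypothetical.

```javascript
// Hypothetical filler list; a real deployment would tune it per language and speaker.
const FILLER_WORDS = new Set(['um', 'uh', 'er', 'like', 'basically', 'actually']);

function countFillerWords(transcript) {
    // Lowercase and keep only letter/apostrophe runs so punctuation does not hide a match
    const words = transcript.toLowerCase().match(/[a-z']+/g) ?? [];
    const counts = {};
    for (const word of words) {
        if (FILLER_WORDS.has(word)) {
            counts[word] = (counts[word] ?? 0) + 1;
        }
    }
    return counts;
}

console.log(countFillerWords('Um, so I was, like, basically done. Um, yes.'));
// { um: 2, like: 1, basically: 1 }
```

A plain word match counts legitimate uses too ("I like it"), so a score built on it is a rough signal rather than an exact count.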

Fluency Assessment

  • Speech flow analysis
  • Transition word usage tracking
  • Pause pattern analysis

Speech Rhythm

  • Sentence length variation
  • Speaking pattern analysis
  • Rhythm consistency scoring
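
One plausible way to quantify sentence length variation is the coefficient of variation (standard deviation divided by mean) of sentence lengths in words. This is a sketch under that assumption, and it also assumes the transcript carries sentence punctuation; `sentenceLengthVariation` is a hypothetical name.

```javascript
// Hypothetical metric: coefficient of variation of sentence lengths, in words.
function sentenceLengthVariation(transcript) {
    const lengths = transcript
        .split(/[.!?]+/)                                             // split on sentence-ending punctuation
        .map((s) => s.trim().split(/\s+/).filter(Boolean).length)    // words per sentence
        .filter((n) => n > 0);                                       // ignore empty fragments
    if (lengths.length < 2) return 0;                                // variation needs at least two sentences
    const mean = lengths.reduce((a, b) => a + b, 0) / lengths.length;
    const variance = lengths.reduce((acc, n) => acc + (n - mean) ** 2, 0) / lengths.length;
    return Math.sqrt(variance) / mean;
}

// Sentence lengths 2 and 4: mean 3, standard deviation 1
console.log(sentenceLengthVariation('One two. One two three four.'));
```

A value near 0 means every sentence has about the same length; larger values indicate a more varied rhythm.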

Vocabulary Analysis

  • Word variety measurement
  • Complex word usage tracking
  • Vocabulary richness scoring
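
Word variety is commonly measured with the type-token ratio: the number of distinct words divided by the total number of words. The post does not say which measure SpeechCraft uses, so the sketch below is an illustration only, and `typeTokenRatio` is a hypothetical name.

```javascript
// Hypothetical metric: type-token ratio (distinct words / total words).
function typeTokenRatio(transcript) {
    // Lowercase so "The" and "the" count as the same word
    const words = transcript.toLowerCase().match(/[a-z']+/g) ?? [];
    if (words.length === 0) return 0;
    return new Set(words).size / words.length;
}

// 4 distinct words out of 5
console.log(typeTokenRatio('The cat and the dog')); // 0.8
```

The ratio drops naturally as a transcript grows, so comparing it across sessions is only meaningful for samples of similar length.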

3. Visual Analytics 📈

  • Real-time metric visualization
  • Progress tracking
  • Performance trend analysis

Applications

Public Speaking

  • Speech practice and improvement
  • Real-time feedback
  • Performance analytics

Education

  • Language learning assistance
  • Speaking skill development
  • Pronunciation training

Professional Development

  • Presentation skills enhancement
  • Communication training
  • Interview preparation

Content Creation

  • Podcast transcription
  • Video content analysis
  • Speech quality improvement

Benefits

For Users

  • Instant feedback on speaking performance
  • Comprehensive speech analytics
  • Objective performance metrics
  • Personal development tracking

For Organizations

  • Communication skills training
  • Quality assurance for speakers
  • Standardized assessment tools
  • Data-driven improvement strategies

Future Enhancements

  1. Advanced sentiment analysis
  2. Multi-language support
  3. Custom metric configuration
  4. Speech pattern recognition
  5. Integration with learning management systems

Impact

SpeechCraft gives users real-time feedback and a comprehensive analysis of their speech, so they can work on their communication skills with objective metrics rather than guesswork.

Demo

https://speechcraft.onrender.com/

Journey

Core Implementation 🚀

  1. Server-Side Token Management
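// Secure proxy server for token generation; the API key never reaches the browser
app.get('/get-token', async (req, res) => {
    try {
        const response = await fetch('https://api.assemblyai.com/v2/realtime/token', {
            method: 'POST',
            headers: { 'Authorization': ASSEMBLY_AI_TOKEN, 'Content-Type': 'application/json' },
            // Token lifetime in seconds; 3600 is an assumed value, not from the original post
            body: JSON.stringify({ expires_in: 3600 })
        });
        // Forward the upstream status so the client can detect a failed request
        res.status(response.status).json(await response.json());
    } catch (err) {
        res.status(502).json({ error: 'Token request failed' });
    }
});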
  2. Real-time Audio Processing Pipeline
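// Audio capture with optimized settings
const stream = await navigator.mediaDevices.getUserMedia({
    audio: {
        channelCount: 1,
        sampleRate: 16000,
        echoCancellation: true
    }
});

// WebSocket connection for real-time streaming
wsRef.current = new WebSocket(`wss://api.assemblyai.com/v2/realtime/ws?sample_rate=16000&token=${token}`);

// Recorder setup is assumed here; the excerpt uses mediaRecorder without showing its creation
const mediaRecorder = new MediaRecorder(stream);

// Send each audio chunk as base64, but only while the socket is open
mediaRecorder.ondataavailable = async (event) => {
    if (event.data.size > 0 && wsRef.current?.readyState === WebSocket.OPEN) {
        const base64Audio = await convertToBase64(event.data);
        wsRef.current.send(JSON.stringify({ audio_data: base64Audio }));
    }
};

// Emit a chunk every 250ms
mediaRecorder.start(250);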
  3. Real-time Transcript Processing
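// Handle messages from the real-time transcription socket
wsRef.current.onmessage = (message) => {
    const data = JSON.parse(message.data);
    // Only finalized transcripts with non-empty text update the display and the metrics
    if (data.message_type === 'FinalTranscript' && data.text) {
        updateTranscription(data.text);
        updateMetrics(data.text);
    }
};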
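
The audio pipeline calls `convertToBase64`, which the post does not show. A minimal sketch of such a helper, assuming it receives the `Blob` delivered by `ondataavailable`, could read the bytes and encode them with `btoa`. The body below is a guess at the helper's behavior, not the author's implementation.

```javascript
// Assumed helper: encode a Blob's bytes as a base64 string.
async function convertToBase64(blob) {
    const bytes = new Uint8Array(await blob.arrayBuffer());
    // btoa expects a "binary string" with one character per byte
    let binary = '';
    for (const byte of bytes) {
        binary += String.fromCharCode(byte);
    }
    return btoa(binary);
}

// The bytes of "hi" (0x68 0x69) encode to "aGk="
convertToBase64(new Blob(['hi'])).then((encoded) => console.log(encoded));
```

Building the string byte by byte avoids the argument-count limit that `String.fromCharCode(...bytes)` can hit on large chunks.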

Key Features ⚡

  • Real-time audio streaming with optimized chunk size (250ms)
  • Secure WebSocket connection with token authentication
  • Automatic audio format handling
  • Error recovery and reconnection logic
  • Resource cleanup and memory management
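
The reconnection logic itself is not shown in the post. One common approach is exponential backoff: wait longer after each failed attempt and reset once a connection succeeds. The sketch below illustrates that idea only; `reconnectDelay`, `connectWithRetry`, and the `connect` callback are hypothetical names, and the delays are assumed values.

```javascript
// Hypothetical backoff schedule: 500ms, 1s, 2s, ... capped at 10s.
function reconnectDelay(attempt, baseMs = 500, maxMs = 10000) {
    return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Hypothetical wiring: `connect` opens a WebSocket and returns it.
// `schedule` defaults to setTimeout and is injectable for testing.
function connectWithRetry(connect, schedule = setTimeout) {
    let attempt = 0;
    const open = () => {
        const socket = connect();
        socket.onopen = () => { attempt = 0; };        // reset after a successful connection
        socket.onclose = (event) => {
            if (event.code === 1000) return;           // normal closure: do not reconnect
            schedule(open, reconnectDelay(attempt));   // retry after the current delay
            attempt += 1;
        };
        return socket;
    };
    return open();
}
```

In SpeechCraft's setup, `connect` would likely need to request a fresh token from `/get-token` before opening each new socket, since a temporary token may no longer be valid after a disconnect.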

Technical Highlights 🔧

  • Sample rate: 16 kHz mono audio
  • WebSocket protocol for low-latency communication
  • Base64 encoding so binary audio can be sent inside JSON WebSocket messages
  • Automatic handling of partial and final transcripts
  • Integration with React state management

Credits:

This solution was built by binarygarage.dev using assemblyai.com. For further information, please contact contact@binarygarage.dev
