BinaryGarge.dev

Posted on Nov 24, 2024

SpeechCraft: AI-Powered Speech Analysis for Better Communication

#devchallenge #assemblyaichallenge #ai #api

This is a submission for the AssemblyAI Challenge: Really Rad Real-Time.

What I Built

SpeechCraft 🎙️ - Real-time Speech Analytics Platform

Overview

SpeechCraft is a real-time speech analytics platform that turns spoken words into actionable feedback. It uses AssemblyAI's real-time transcription to convert speech to text as it is spoken, then scores the transcript along several dimensions of speech performance: pace, clarity, fluency, rhythm, and vocabulary.

Key Features

1. Real-Time Transcription 📝

Instant speech-to-text conversion
High-accuracy transcription
Support for natural conversation flow

2. Advanced Speech Metrics 📊

Speaking Pace Analysis

Real-time words-per-minute tracking
Optimal pace guidance
Speed variation detection
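As a rough illustration of the pace metric, words per minute can be derived from a transcript and the time it took to speak it. This is a minimal sketch, not SpeechCraft's actual code: the helper names `wordsPerMinute` and `paceLabel` and the 110/170 WPM thresholds are assumptions chosen for the example.

```javascript
// Hypothetical helper: estimate speaking pace from transcript text and elapsed time.
function wordsPerMinute(text, durationMs) {
    if (durationMs <= 0) return 0;
    // Split on whitespace and drop empty strings so '' counts as zero words
    const words = text.trim().split(/\s+/).filter(Boolean);
    return Math.round((words.length * 60000) / durationMs);
}

// Hypothetical pace bands; the thresholds are illustrative, not taken from the project.
function paceLabel(wpm) {
    if (wpm < 110) return 'slow';
    if (wpm > 170) return 'fast';
    return 'comfortable';
}
```

Calling `wordsPerMinute` on each final transcript and keeping the recent values would give the data needed for speed variation detection.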

Clarity Measurement

Filler word detection
Sentence structure analysis
Pronunciation clarity scoring
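Filler word detection can be approximated with whole-word matching against a small list. The sketch below assumes a hypothetical `FILLER_WORDS` list and `countFillers` helper; the real list and scoring used by SpeechCraft are not shown in this post.

```javascript
// Hypothetical filler list; extend or replace it to suit the speaker's language.
const FILLER_WORDS = ['um', 'uh', 'like', 'basically', 'actually', 'you know'];

// Count whole-word occurrences of each filler in the transcript text.
function countFillers(text) {
    const normalized = text.toLowerCase();
    const counts = {};
    for (const filler of FILLER_WORDS) {
        // \b keeps 'like' from matching inside 'likely'
        const matches = normalized.match(new RegExp(`\\b${filler}\\b`, 'g'));
        if (matches) counts[filler] = matches.length;
    }
    return counts;
}
```

Note that simple matching cannot tell a filler "like" from a meaningful one, so the counts are an upper bound.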

Fluency Assessment

Speech flow analysis
Transition word usage tracking
Pause pattern analysis
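Pause patterns can be estimated from per-word timestamps. The sketch below assumes each word is an object with `text`, `start`, and `end` fields in milliseconds; that shape, the `findPauses` name, and the 500 ms threshold are assumptions for illustration.

```javascript
// Hypothetical helper: report gaps between consecutive words that exceed a threshold.
// Assumes words are ordered by time, with start/end given in milliseconds.
function findPauses(words, minPauseMs = 500) {
    const pauses = [];
    for (let i = 1; i < words.length; i++) {
        const gapMs = words[i].start - words[i - 1].end;
        if (gapMs >= minPauseMs) {
            pauses.push({ afterWord: words[i - 1].text, gapMs });
        }
    }
    return pauses;
}
```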

Speech Rhythm

Sentence length variation
Speaking pattern analysis
Rhythm consistency scoring
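One simple way to quantify sentence length variation is the mean and standard deviation of words per sentence. This is a sketch under that assumption; `sentenceLengths` and `lengthVariation` are hypothetical names, and the post does not say which statistic SpeechCraft uses.

```javascript
// Split text into sentences on ., ! or ? and return the word count of each.
function sentenceLengths(text) {
    return text
        .split(/[.!?]+/)
        .map((sentence) => sentence.trim())
        .filter(Boolean)
        .map((sentence) => sentence.split(/\s+/).length);
}

// Mean and population standard deviation of sentence lengths, in words.
function lengthVariation(text) {
    const lengths = sentenceLengths(text);
    if (lengths.length === 0) return { mean: 0, stdDev: 0 };
    const mean = lengths.reduce((a, b) => a + b, 0) / lengths.length;
    const variance = lengths.reduce((sum, n) => sum + (n - mean) ** 2, 0) / lengths.length;
    return { mean, stdDev: Math.sqrt(variance) };
}
```

A standard deviation near zero suggests monotonous sentence lengths; a larger value suggests more varied rhythm.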

Vocabulary Analysis

Word variety measurement
Complex word usage tracking
Vocabulary richness scoring
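Word variety is often measured with the type-token ratio: unique words divided by total words. The sketch below uses that ratio and treats words of eight or more letters as "complex"; both choices, and the `vocabularyStats` name, are assumptions made for the example.

```javascript
// Hypothetical helper: basic vocabulary statistics for a transcript.
function vocabularyStats(text, longWordLength = 8) {
    // Lowercase so 'The' and 'the' count as the same word
    const words = text.toLowerCase().match(/[a-z']+/g) || [];
    const unique = new Set(words);
    return {
        totalWords: words.length,
        uniqueWords: unique.size,
        typeTokenRatio: words.length ? unique.size / words.length : 0,
        longWordCount: words.filter((w) => w.length >= longWordLength).length,
    };
}
```

The type-token ratio tends to fall as a transcript grows, so comparing sessions of similar length gives fairer results.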

3. Visual Analytics 📈

Real-time metric visualization
Progress tracking
Performance trend analysis

Applications

Public Speaking

Speech practice and improvement
Real-time feedback
Performance analytics

Education

Language learning assistance
Speaking skill development
Pronunciation training

Professional Development

Presentation skills enhancement
Communication training
Interview preparation

Content Creation

Podcast transcription
Video content analysis
Speech quality improvement

Benefits

For Users

Instant feedback on speaking performance
Comprehensive speech analytics
Objective performance metrics
Personal development tracking

For Organizations

Communication skills training
Quality assurance for speakers
Standardized assessment tools
Data-driven improvement strategies

Future Enhancements

Advanced sentiment analysis
Multi-language support
Custom metric configuration
Speech pattern recognition
Integration with learning management systems

Impact

SpeechCraft gives users real-time feedback and detailed speech analysis that they can use to improve their communication skills.

Demo

https://speechcraft.onrender.com/

Journey

Core Implementation 🚀

Server-Side Token Management

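// Secure proxy: the API key stays on the server; the browser only receives a temporary token
app.get('/get-token', async (req, res) => {
    try {
        const response = await fetch('https://api.assemblyai.com/v2/realtime/token', {
            method: 'POST',
            headers: { 'Authorization': ASSEMBLY_AI_TOKEN, 'Content-Type': 'application/json' },
            // Token lifetime in seconds; 3600 is an example value
            body: JSON.stringify({ expires_in: 3600 })
        });
        res.status(response.status).json(await response.json());
    } catch (err) {
        // Network failure while reaching AssemblyAI
        res.status(502).json({ error: 'Token request failed' });
    }
});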

Real-time Audio Processing Pipeline

// Audio capture: mono, 16 kHz, with echo cancellation
const stream = await navigator.mediaDevices.getUserMedia({
    audio: {
        channelCount: 1,
        sampleRate: 16000,
        echoCancellation: true
    }
});

// WebSocket connection for real-time streaming
wsRef.current = new WebSocket(`wss://api.assemblyai.com/v2/realtime/ws?sample_rate=16000&token=${token}`);

// Record the stream and forward each chunk as it becomes available
const mediaRecorder = new MediaRecorder(stream);
mediaRecorder.ondataavailable = async (event) => {
    // Skip empty chunks and chunks that arrive before the socket is open
    if (event.data.size > 0 && wsRef.current?.readyState === WebSocket.OPEN) {
        const base64Audio = await convertToBase64(event.data);
        wsRef.current.send(JSON.stringify({ audio_data: base64Audio }));
    }
};

// Emit an audio chunk every 250ms
mediaRecorder.start(250);
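The snippet above calls `convertToBase64` without showing it. One possible implementation, given here as an assumption rather than the project's actual helper, reads the chunk's bytes and encodes them with `btoa`:

```javascript
// Hypothetical implementation of convertToBase64: turns a Blob into a Base64 string.
async function convertToBase64(blob) {
    const bytes = new Uint8Array(await blob.arrayBuffer());
    // btoa expects a "binary string" with one character per byte
    let binary = '';
    for (const byte of bytes) {
        binary += String.fromCharCode(byte);
    }
    return btoa(binary);
}
```

Building the string byte by byte avoids the argument-count limits that `String.fromCharCode(...bytes)` can hit on larger chunks.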

Real-time Transcript Processing

wsRef.current.onmessage = (message) => {
    const data = JSON.parse(message.data);
    if (data.message_type === 'FinalTranscript') {
        updateTranscription(data.text);
        updateMetrics(data.text);
    }
};
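The handler above reacts only to final transcripts. To also show text while the speaker is mid-sentence, partial results can be kept separately and replaced once the final version arrives. The sketch below assumes partial messages carry `message_type === 'PartialTranscript'`; the state shape and the names `applyTranscriptMessage` and `displayText` are hypothetical.

```javascript
// Hypothetical reducer: merge one parsed WebSocket message into transcript state.
// State shape (an assumption): { finals: string[], partial: string }
function applyTranscriptMessage(state, data) {
    if (data.message_type === 'PartialTranscript') {
        // Partials overwrite each other until a final transcript arrives
        return { ...state, partial: data.text };
    }
    if (data.message_type === 'FinalTranscript') {
        const finals = data.text ? [...state.finals, data.text] : state.finals;
        return { finals, partial: '' };
    }
    // Ignore other message types
    return state;
}

// Text to render: all finalized sentences plus the in-progress partial.
function displayText(state) {
    return [...state.finals, state.partial].filter(Boolean).join(' ');
}
```

Because the function returns a new object instead of mutating its input, it can be used directly inside a React state updater.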

Key Features ⚡

Real-time audio streaming with optimized chunk size (250ms)
Secure WebSocket connection with token authentication
Automatic audio format handling
Error recovery and reconnection logic
Resource cleanup and memory management
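The reconnection logic is not shown in this post. A common approach is exponential backoff, sketched below under stated assumptions: `createSocket` is a hypothetical caller-supplied function that returns a new WebSocket (fetching a fresh token if needed), and the delay values are illustrative.

```javascript
// Delay before retry number `attempt` (0-based): doubles each time, capped at maxMs.
function reconnectDelay(attempt, baseMs = 500, maxMs = 10000) {
    return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Hypothetical wrapper: open a socket and reopen it after unexpected closes.
function connectWithRetry(createSocket, onMessage, maxAttempts = 5, attempt = 0) {
    const socket = createSocket();
    socket.onmessage = onMessage;
    // A successful open resets the backoff
    socket.onopen = () => { attempt = 0; };
    socket.onclose = (event) => {
        // Stop on a clean close or once the retry budget is used up
        if (event.wasClean || attempt >= maxAttempts) return;
        setTimeout(
            () => connectWithRetry(createSocket, onMessage, maxAttempts, attempt + 1),
            reconnectDelay(attempt)
        );
    };
    return socket;
}
```

With the defaults, retries wait 500 ms, 1 s, 2 s, 4 s, and 8 s before giving up.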

Technical Highlights 🔧

Audio format: 16 kHz mono
WebSocket protocol for low-latency communication
Base64 encoding so audio chunks can be sent inside JSON messages
Automatic handling of partial and final transcripts
Integration with React state management

Credits:

This solution was built by binarygarage.dev using assemblyai.com. For further information, please contact contact@binarygarage.dev.
