DEV Community

Vicente G. Reyes

Posted on Nov 22, 2024 • Edited on Dec 25, 2024

Speech to Musical Notation with AssemblyAI

#devchallenge #assemblyaichallenge #ai #api

This is a submission for the AssemblyAI Challenge: Sophisticated Speech-to-Text.

What I Built

I built Speech-to-Note, a web application that combines speech recognition with musical note detection. Users record audio (either speech or singing), and the application processes it in two ways:

Converts spoken words into text using AssemblyAI's Speech-to-Text API
Analyzes the audio to detect musical notes, including their pitch, octave, and duration
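To make "pitch and octave" concrete, here is a minimal sketch of how a detected frequency can be mapped to a note name, assuming standard equal temperament with A4 = 440 Hz. The function name `frequency_to_note` is hypothetical and is not taken from the project's code; librosa also ships its own `hz_to_note` helper, which the backend may use instead.

```python
import math
from typing import Tuple

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def frequency_to_note(freq_hz: float, a4_hz: float = 440.0) -> Tuple[str, int]:
    """Map a frequency in Hz to the nearest equal-tempered (pitch name, octave).

    Hypothetical helper for illustration; not the project's actual function.
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # MIDI note number: A4 is 69, and each semitone is a factor of 2 ** (1 / 12).
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    name = NOTE_NAMES[midi % 12]
    # In scientific pitch notation, MIDI 60 is C4.
    octave = midi // 12 - 1
    return name, octave


print(frequency_to_note(440.0))   # ('A', 4)
print(frequency_to_note(261.63))  # ('C', 4)
```

Rounding to the nearest MIDI number means slightly sharp or flat singing still snaps to the closest semitone.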

The application features a modern, responsive UI built with React and TailwindCSS, and a robust backend powered by FastAPI. It's particularly useful for musicians, music teachers, and anyone interested in analyzing the musical properties of their voice or instruments.

Demo

Live site: https://speech.vicentereyes.org/
GitHub: reyesvicente/speech-to-music-note

Speech to Musical Notes Converter

This application converts spoken words into musical notes using FastAPI, React, and AssemblyAI.

Prerequisites

Python 3.8+
Node.js and npm
AssemblyAI API key

Setup

Clone the repository

Set up the backend:

```
# Install Python dependencies
pip install -r requirements.txt

# Set up your AssemblyAI API key in .env file
# Replace 'your_api_key_here' with your actual API key
```
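As a rough sketch of how the backend might pick up that key at startup, the helper below reads it from the environment and refuses the placeholder value. The variable name `ASSEMBLYAI_API_KEY` and the function `load_api_key` are assumptions for illustration; the repository's `.env` may use a different name. It also assumes the `.env` file has already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`).

```python
import os
from typing import Mapping, Optional

PLACEHOLDER = "your_api_key_here"


def load_api_key(
    env: Optional[Mapping[str, str]] = None,
    var_name: str = "ASSEMBLYAI_API_KEY",  # assumed name, check your .env file
) -> str:
    """Return the API key from the environment, or fail loudly if it is unset."""
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(f"Set {var_name} to a real AssemblyAI API key")
    return key


# Typical use in the backend (requires the assemblyai package):
#   import assemblyai as aai
#   aai.settings.api_key = load_api_key()
```

Failing at startup makes a missing key obvious before the first upload, rather than surfacing as a failed transcription later.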

Set up the frontend:
```
cd frontend
npm install
```

Running the Application

Start the backend server:
```
uvicorn main:app --reload
```
Start the frontend development server:
```
cd frontend
npm run dev
```
Open your browser and navigate to the URL shown in the frontend terminal output (usually http://localhost:5173)

Usage

Click the "Start Recording" button to begin recording audio
Speak into your microphone
Click "Stop Recording" when finished
Click "Process Audio" to send the recording to the server
The transcribed text will appear below

Features

Audio recording using the Web Audio API
Real-time…

Speech to note demo (Vidyard video): share.vidyard.com

Screenshots: Landing Page, Audio Processing, Result

Journey

AssemblyAI's Universal-2 Speech-to-Text Model was integrated into the application through their Python SDK. The implementation can be found in the upload_audio endpoint of our FastAPI backend:

When a user records audio, it's sent to our backend as a WAV file
The audio file is processed in parallel:
- Sent to AssemblyAI's API for transcription
- Analyzed locally using librosa for musical note detection
The transcribed text and detected musical notes are returned to the frontend
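The parallel step above can be sketched with a small thread pool. This is an illustration under assumptions, not the code from the `upload_audio` endpoint: `process_audio` and its two callable parameters are hypothetical stand-ins for the AssemblyAI call and the librosa analysis.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def process_audio(
    audio_path: str,
    transcribe: Callable[[str], str],
    detect_notes: Callable[[str], List[dict]],
) -> Dict[str, object]:
    """Run transcription and note detection concurrently and merge the results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        text_future = pool.submit(transcribe, audio_path)
        notes_future = pool.submit(detect_notes, audio_path)
        # .result() blocks until the task finishes and re-raises any exception.
        return {"text": text_future.result(), "notes": notes_future.result()}


# Usage with stand-in functions (the real ones would call AssemblyAI and librosa):
result = process_audio(
    "recording.wav",
    transcribe=lambda path: "hello world",
    detect_notes=lambda path: [{"note": "A", "octave": 4, "duration": 0.5}],
)
print(result["text"])
```

Threads are a reasonable fit here because the transcription request spends most of its time waiting on the network, so the local analysis can run in the meantime.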

The AssemblyAI integration was straightforward thanks to their well-documented SDK:

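```
import assemblyai as aai

# Assumes aai.settings.api_key has already been set (e.g. from the .env file)
transcriber = aai.Transcriber()
transcript = transcriber.transcribe(audio_file_path)
transcribed_text = transcript.text
```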
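In practice a transcription can fail (bad audio, invalid key), so it may be worth checking the result before returning text to the frontend. The wrapper below is a hedged sketch, not the project's code: `transcribe_text` is a hypothetical name, and it relies only on the transcript exposing `error` and `text` attributes, as the AssemblyAI Python SDK's transcript objects do.

```python
def transcribe_text(transcriber, audio_file_path: str) -> str:
    """Transcribe a file and return its text, raising if the transcript reports an error.

    `transcriber` is expected to behave like `aai.Transcriber()`.
    """
    transcript = transcriber.transcribe(audio_file_path)
    if transcript.error:
        raise RuntimeError(f"Transcription failed: {transcript.error}")
    # text can be None for silent recordings, so normalize to an empty string.
    return transcript.text or ""


# Typical use (requires the assemblyai package and a configured API key):
#   import assemblyai as aai
#   text = transcribe_text(aai.Transcriber(), "recording.wav")
```

Passing the transcriber in as an argument also makes the function easy to exercise with a fake object in tests, without network access.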

What makes this implementation sophisticated is the dual-processing approach:

Using AssemblyAI's advanced speech recognition for accurate text transcription
Complementing it with custom pitch detection algorithms to extract musical information
Providing a synchronized playback experience where users can hear the detected notes while seeing the transcribed text
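For playback, per-frame pitch estimates have to be turned into discrete notes with a start time and a duration. The sketch below shows one simple way to do that by merging consecutive frames that carry the same label. It is an assumption about the general approach, not the project's actual algorithm, and `group_frames_into_notes` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def group_frames_into_notes(
    frame_notes: List[Optional[str]], frame_seconds: float
) -> List[Tuple[str, float, float]]:
    """Collapse per-frame labels into (note, start_seconds, duration_seconds).

    A label of None marks an unvoiced or silent frame and ends the current note.
    """
    notes: List[Tuple[str, float, float]] = []
    current: Optional[str] = None
    start_index = 0
    # The trailing None acts as a sentinel that flushes the final run.
    for i, label in enumerate(list(frame_notes) + [None]):
        if label != current:
            if current is not None:
                duration = (i - start_index) * frame_seconds
                notes.append((current, start_index * frame_seconds, duration))
            current = label
            start_index = i
    return notes


frames = ["C4", "C4", None, "E4", "E4", "E4"]
print(group_frames_into_notes(frames, frame_seconds=0.5))
# [('C4', 0.0, 1.0), ('E4', 1.5, 1.5)]
```

A real pitch track is noisier than this, so a production version would likely also drop very short runs or smooth the labels before grouping.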

This creates a unique tool that bridges the gap between spoken word and musical notation, making it valuable for various musical applications, from education to composition.

The application qualifies for additional prompts as it implements:

Real-time audio processing
Custom pitch detection algorithms
Interactive audio playback
Modern, responsive UI with TailwindCSS
Full-stack implementation with React and FastAPI

The project demonstrates how AssemblyAI's technology can be combined with custom audio processing to create innovative applications that go beyond simple speech-to-text conversion.

Top comments (4)

Jess Lee • Dec 5 '24

This was a very very cool project.

Vicente G. Reyes • Dec 6 '24

Thanks, Jess!

Meredith Rauch • Jan 2

Awesome application! The Assembly team loved seeing the innovation in this project.

Vicente G. Reyes • Jan 3 • Edited

Awesome! Glad the AssemblyAI team liked it!
