This is a submission for the AssemblyAI Challenge: Sophisticated Speech-to-Text, No More Monkey Business
What I Built
I built RepodAI, an AI-powered podcasting platform designed to harness the capabilities of AssemblyAI’s Universal-2 Speech-to-Text Model. RepodAI is more than just a transcription tool—it integrates conversational intelligence, natural language processing, and sentiment analysis to enhance the podcast creation and consumption experience. From transcription to sentiment analysis, speaker identification, and translation, RepodAI empowers podcasters and listeners alike with rich features and seamless usability.
Demo
Screenshots:
Live Demo:
GitHub Repository:
CijeTheCreator / Repod
A power-packed podcasting platform built around AssemblyAI.
RepodAI: An AI-Powered Podcasting Platform with Transcription, Summarization, and Interactive Features 🎙️
🧰 Getting Started
- Make sure Git and Node.js are installed.
- Clone this repository to your local computer.
- Create a .env file in the root directory.
- Contents of .env:
# .env
# neon db uri
DATABASE_URL="postgresql://<user>:<password>@<host>:<port>/lingo?sslmode=require"
# openai api key
OPENAI_KEY="sk-###############################################################"
# pinata secrets
PINATA_JWT="###############################################################"
NEXT_PUBLIC_GATEWAY_URL="###############################################################"
📷 Screenshots
Journey
RepodAI began as a vision for a sophisticated podcasting platform that brings conversational intelligence to the forefront. Leveraging AssemblyAI’s Universal-2 model as the foundation, RepodAI transforms how users interact with audio content. Here’s how I incorporated AssemblyAI’s Speech-to-Text capabilities into this project:
Key Features
- Audio Upload and Transcription
- Profanity Filtering
- Speaker Identification and Sentiment Analysis
- Chapter Segmentation and Summarization
- Advanced Search and Navigation
- AI-Powered Interaction supercharged by LeMUR
- Multi-Language Translation
- Dynamic and Interactive Player
- Customizable Themes and Mobile Responsiveness
The Prompts I Worked On
Sophisticated Speech-to-Text
I utilized AssemblyAI’s transcription API for two main use cases:
Transcribing the Main Podcast
This step involved converting the uploaded audio file into text, ensuring that the podcast's spoken content was accurately captured and ready for processing by features such as summarization and sentiment analysis.
// Transcribe the uploaded podcast audio with AssemblyAI, save the transcript id,
// and return the transcript split into sentences for downstream features.
// `client`, `TOverallForm`, `PiiPolicy`, and `updatePodcastTranscriptionId` are
// defined elsewhere in the project; `client` is the AssemblyAI SDK instance.
async function getTranscript(
  audioUrl: string,
  podcastId: number,
  { basic_details, redaction, speakers }: TOverallForm,
): Promise<any> {
  // Keep only the PII policies the uploader toggled on in the upload form.
  const redactionKeys = Object.keys(redaction);
  const redactionList = redactionKeys.filter(
    (value) => redaction[value as keyof typeof redaction],
  ) as PiiPolicy[];

  // One transcription request enables speaker labels, chapters, PII redaction,
  // profanity filtering, and sentiment analysis in a single pass.
  const transcript = await client.transcripts.transcribe({
    audio: audioUrl,
    speaker_labels: true,
    auto_chapters: true,
    redact_pii_audio: true,
    filter_profanity: basic_details.filter_profanity,
    redact_pii_policies: redactionList,
    sentiment_analysis: true,
    format_text: true,
    speakers_expected: speakers.speakers.split(",").length,
  });

  // Persist the transcript id so other features can look it up later.
  await updatePodcastTranscriptionId(podcastId, transcript.id);

  // Return the transcript broken into sentences (each with timestamps).
  const sentencesResponse = await client.transcripts.sentences(transcript.id);
  return sentencesResponse.sentences;
}
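Because the request above enables auto_chapters and sentiment_analysis, the saved transcript id can later be used to pull chapter summaries and per-sentence sentiment. The sketch below only illustrates that lookup with the AssemblyAI SDK; it is not code from the repo, and getPodcastTranscriptionId is a hypothetical helper mirroring updatePodcastTranscriptionId.

// Illustrative sketch (not from the repo): read chapter and sentiment data
// produced by the transcription request above.
async function getPodcastInsights(podcastId: number) {
  // Hypothetical helper that returns the id saved by updatePodcastTranscriptionId.
  const transcriptId = await getPodcastTranscriptionId(podcastId);
  const transcript = await client.transcripts.get(transcriptId);

  // auto_chapters: true adds timestamped chapter headlines and summaries.
  const chapters = (transcript.chapters ?? []).map((chapter) => ({
    headline: chapter.headline,
    summary: chapter.summary,
    start: chapter.start,
    end: chapter.end,
  }));

  // sentiment_analysis: true labels each spoken sentence POSITIVE, NEUTRAL, or NEGATIVE.
  const sentiments = (transcript.sentiment_analysis_results ?? []).map((result) => ({
    text: result.text,
    sentiment: result.sentiment,
    speaker: result.speaker,
    start: result.start,
  }));

  return { chapters, sentiments };
}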
Converting Asked Questions to Text
Questions asked to RepodAI's chatbot (via voice input) are transcribed into text before being processed by LeMUR, enabling precise and context-aware responses.
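As a rough sketch (the function name and parameter are assumptions for illustration, not the exact code in the repo), this step amounts to a plain transcription call on the recorded question:

// Illustrative sketch: transcribe a recorded voice question before it goes to LeMUR.
async function transcribeQuestion(questionAudioUrl: string): Promise<string> {
  const transcript = await client.transcripts.transcribe({
    audio: questionAudioUrl,
  });
  // transcript.text holds the full text of the short question clip.
  return transcript.text ?? "";
}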
No More Monkey Business
I also employed LeMUR for the following key features:
RepodAI’s Chatbot
The chatbot generates insightful answers to user questions about the podcast by processing transcriptions of both the podcast and the user's query.
Creating the Initial Podcast Summary
During the upload process, RepodAI uses the transcribed content to generate an initial summary of the podcast, providing a quick overview for users.
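Both LeMUR uses boil down to two SDK calls. The sketch below is illustrative; the function names, prompt wording, and answer format are assumptions rather than the repo's exact code:

// Illustrative sketch: answer a listener's question about a podcast with LeMUR.
async function answerQuestion(transcriptId: string, question: string) {
  const { response } = await client.lemur.task({
    transcript_ids: [transcriptId],
    prompt: `Using only this podcast transcript, answer the question: ${question}`,
  });
  return response;
}

// Illustrative sketch: generate the initial summary shown right after upload.
async function summarizePodcast(transcriptId: string) {
  const { response } = await client.lemur.summary({
    transcript_ids: [transcriptId],
    answer_format: "A short paragraph followed by 3-5 bullet points",
  });
  return response;
}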
Tech Stack 🚀
- Next.js 🖥️: For building the UI and backend.
- ShadcnUI 🎨: Component library for consistent and elegant UI.
- Neon Postgres 🐘: To store user-generated podcasts.
- Three.js 🎧: For audio visualization when asking AI questions.
- Universal-2 🗣️: Powering sophisticated speech-to-text transcription.
- LeMUR 🤖: Intelligent LLM-powered interaction with spoken data.
- OpenAI TTS 🗨️: For text-to-speech conversion.
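To show how these pieces fit together, here is a minimal sketch of voicing a chatbot answer with OpenAI TTS; the model, voice, and function name are assumptions for illustration, not necessarily what RepodAI ships with:

import OpenAI from "openai";

// Illustrative sketch: convert a chatbot answer to speech with OpenAI TTS.
const openai = new OpenAI({ apiKey: process.env.OPENAI_KEY });

async function speakAnswer(answerText: string): Promise<Buffer> {
  const speech = await openai.audio.speech.create({
    model: "tts-1",   // assumed model
    voice: "alloy",   // assumed voice
    input: answerText,
  });
  // The SDK returns a fetch Response; convert it to raw audio bytes (MP3 by default).
  return Buffer.from(await speech.arrayBuffer());
}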
References
The algorithm for RepodAI's audio visualization (when recording) comes from Prakhar625's audio visualiser CodePen; I altered the source code slightly to suit this project's design and functionality.