DEV Community

Stan Ke


subtitleGenAI: A Subtitle Generation Platform

This is a submission for the AssemblyAI Challenge: Sophisticated Speech-to-Text.


What I Built

I developed subtitleGenAI, a Python app that leverages AssemblyAI’s Speech-to-Text (STT) capabilities to generate precise subtitles for video and audio files. Users can upload files in various formats (e.g., MP4, MKV, MOV) and download the subtitles either as an SRT file with timestamps or as plain text. The app also supports real-time transcription and translation, making it a versatile tool for accessibility and multilingual content creation.
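The post does not show how the SRT output is produced. Below is a minimal sketch of one way to do it, assuming the transcript provides word-level timestamps in milliseconds (AssemblyAI’s word objects carry `start` and `end` in ms). Words are grouped into caption cues and rendered in SRT’s `HH:MM:SS,mmm` timestamp format, and the plain-text output is the cue text joined together. `Segment`, `group_words`, `to_srt`, and `to_plain_text` are hypothetical names, and the 42-character cue limit is an assumed default.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Timed text in milliseconds: one word or one caption cue (hypothetical type)."""
    start_ms: int
    end_ms: int
    text: str


def format_srt_timestamp(ms: int) -> str:
    """Convert milliseconds to SRT's HH:MM:SS,mmm timestamp format."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def _cue_from(words: list[Segment]) -> Segment:
    """A cue spans from its first word's start to its last word's end."""
    return Segment(words[0].start_ms, words[-1].end_ms, " ".join(w.text for w in words))


def group_words(words: list[Segment], max_chars: int = 42) -> list[Segment]:
    """Merge word timestamps into cues of at most max_chars characters (assumed limit)."""
    cues: list[Segment] = []
    current: list[Segment] = []
    for word in words:
        candidate = " ".join(w.text for w in current + [word])
        # Start a new cue when adding this word would exceed the limit
        if current and len(candidate) > max_chars:
            cues.append(_cue_from(current))
            current = []
        current.append(word)
    if current:
        cues.append(_cue_from(current))
    return cues


def to_srt(cues: list[Segment]) -> str:
    """Render cues as numbered SRT blocks separated by blank lines."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        timing = f"{format_srt_timestamp(cue.start_ms)} --> {format_srt_timestamp(cue.end_ms)}"
        blocks.append(f"{index}\n{timing}\n{cue.text.strip()}\n")
    return "\n".join(blocks)


def to_plain_text(cues: list[Segment]) -> str:
    """Plain-text transcript: cue text without any timing information."""
    return " ".join(cue.text.strip() for cue in cues)
```

AssemblyAI can also return SRT captions directly for a completed transcript, so formatting by hand like this is mainly useful when you want full control over cue length.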

Key features include:

  • Real-Time Subtitling: Automatically generates subtitles for live audio streams.
  • Multi-Format Support: Accepts video and audio file uploads in common formats (MP4, MKV, MOV, etc.).
  • Customizable Outputs: Provides SRT files with timestamps and plain text transcriptions.
  • Integrated Translation: Offers translation of transcriptions into multiple languages using third-party APIs.
  • Accessibility Compliance: Designed with accessibility standards in mind to cater to educational institutions and organizations.
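For the real-time feature, a streaming client typically sends short, fixed-duration slices of raw audio rather than a whole file. The sketch below only computes those slices; the network connection itself is omitted. The defaults (16 kHz, 16-bit mono PCM, 100 ms chunks) and the name `pcm_chunks` are assumptions for illustration, so check AssemblyAI’s Streaming API documentation for the audio formats and chunk sizes it actually accepts.

```python
from typing import Iterator


def pcm_chunks(
    audio: bytes,
    sample_rate: int = 16_000,  # assumed sample rate in Hz
    sample_width: int = 2,      # bytes per sample (16-bit PCM)
    channels: int = 1,          # mono
    chunk_ms: int = 100,        # assumed chunk duration
) -> Iterator[bytes]:
    """Yield fixed-duration slices of raw PCM audio, as a streaming client would send them."""
    bytes_per_chunk = sample_rate * sample_width * channels * chunk_ms // 1000
    # Round down to whole frames so a sample is never split across two chunks
    frame_size = sample_width * channels
    bytes_per_chunk -= bytes_per_chunk % frame_size
    if bytes_per_chunk <= 0:
        raise ValueError("chunk_ms is too small for the given audio format")
    for offset in range(0, len(audio), bytes_per_chunk):
        # The final chunk may be shorter than the others
        yield audio[offset:offset + bytes_per_chunk]
```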

Demo

You can try out the app here: Live Demo



Journey

Integration with AssemblyAI’s Universal-2 Model

To create this application, I used AssemblyAI’s Universal-2 Speech-to-Text Model for its high accuracy and robust features. Here’s how it was incorporated:

  1. File Upload and Processing:

    • Used Streamlit to build the user interface for uploading video/audio files.
    • Processed uploads in memory with Python’s file-handling tools so they could be sent directly to AssemblyAI’s API.
  2. Real-Time Transcription:

    • Integrated AssemblyAI's Streaming API for live transcription, enabling real-time subtitling of audio streams.
    • Leveraged the API’s speaker diarization support to label which speaker is talking.
  3. Translation Layer:

    • Incorporated an additional translation API to convert transcriptions into multiple languages, broadening the app's utility for global users.
  4. Subtitles Export:

    • Used the MoviePy library to embed subtitles directly into videos for end-to-end subtitle creation.
    • Provided export options for SRT and plain text formats.
  5. Error Handling and Performance:

    • Built robust error handling to manage API throttling and upload limitations.
    • Optimized file processing using efficient memory management techniques.
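Step 5 mentions handling API throttling without showing the code. One common approach is retrying with exponential backoff plus random jitter, sketched below. `RateLimitError` is a hypothetical exception you would raise when the API answers with HTTP 429, and the retry count and base delay are assumed values; the `sleep` parameter is injectable so the delays can be inspected in tests.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical exception raised when the API responds with HTTP 429."""


def call_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on RateLimitError with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of retries: let the caller show an error to the user
            # Delay doubles each attempt (1s, 2s, 4s, ...) plus up to base_delay of jitter
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    raise AssertionError("unreachable: the loop always returns or raises")
```

The jitter spreads out retries when several uploads hit the limit at the same moment, so they do not all retry in lockstep.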

Qualifying for Additional Prompts

  • Accessibility Standards: subtitleGenAI supports compliance with accessibility regulations by providing accurate captions for video content, aiding the hearing-impaired community.
  • Multilingual Support: The app bridges language barriers by seamlessly translating subtitles into multiple languages.
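Accurate text is only part of caption accessibility; captions also need to be readable in the time they stay on screen. As a hedged sketch, a check like the one below could flag cues that are too long or too fast. `Cue` and `readability_issues` are hypothetical names, and the limits (42 characters per line, 20 characters per second) are assumed defaults rather than values taken from a specific standard, so adjust them to the guideline you target.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One subtitle cue with timing in milliseconds (hypothetical type)."""
    start_ms: int
    end_ms: int
    text: str


def readability_issues(cue: Cue, max_line_chars: int = 42, max_cps: float = 20.0) -> list[str]:
    """Return readability problems for a cue; an empty list means it passes."""
    issues = []
    # Each displayed line must fit within the assumed line-length limit
    for line in cue.text.splitlines():
        if len(line) > max_line_chars:
            issues.append(f"line too long ({len(line)} > {max_line_chars} chars)")
    duration_s = (cue.end_ms - cue.start_ms) / 1000
    if duration_s <= 0:
        issues.append("non-positive duration")
    else:
        # Reading speed in characters per second, ignoring line breaks
        cps = len(cue.text.replace("\n", "")) / duration_s
        if cps > max_cps:
            issues.append(f"reading speed too high ({cps:.1f} > {max_cps} chars/sec)")
    return issues
```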

Effective Use Cases

1. Hearing-Impaired Viewers

Scenario: A hearing-impaired individual wants to watch an educational video or movie that lacks subtitles.
Solution: The user uploads the video to the app, which generates accurate subtitles. The subtitles can be downloaded as an SRT file or embedded directly into the video.
Outcome: The individual can enjoy the video content with full understanding, improving inclusivity and accessibility.

2. Language Learners

Scenario: A student learning English watches a lecture or a movie in English but struggles to follow the spoken dialogue.
Solution: The app transcribes the audio and generates subtitles in both English and the student’s native language using translation features.
Outcome: The learner can follow the video while improving their language skills.

  3. Content Creators:

    • Scenario: YouTubers and podcasters automatically add subtitles to their content, increasing viewer engagement and reaching non-native speakers.
    • Outcome: Broader audience reach and improved SEO for their videos.
  4. Journalism:

    • Scenario: News agencies generate real-time transcriptions for live interviews and events, with the option to translate them into different languages.
    • Outcome: Faster reporting and better accessibility for international audiences.
  5. Government Agencies:

    • Scenario: Public service announcements are subtitled in multiple languages for diverse communities.
    • Outcome: Critical information becomes accessible to everyone.
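The language-learner scenario relies on showing a translation alongside the original subtitles. A minimal sketch, assuming a `translate` function that wraps whichever third-party translation API is used (the post does not name one), keeps every cue’s timing and either replaces the text or stacks the translation under it. `Cue` and `translate_cues` are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Cue:
    """One subtitle cue with timing in milliseconds (hypothetical type)."""
    start_ms: int
    end_ms: int
    text: str


def translate_cues(
    cues: list[Cue],
    translate: Callable[[str], str],
    bilingual: bool = False,
) -> list[Cue]:
    """Translate each cue's text while leaving its start/end times untouched.

    With bilingual=True the translation goes on a second line under the
    original, so a learner can compare both versions at once.
    """
    result = []
    for cue in cues:
        translated = translate(cue.text)
        text = f"{cue.text}\n{translated}" if bilingual else translated
        result.append(replace(cue, text=text))  # new Cue with same timing, new text
    return result
```

Translating cue by cue is simple but gives the translation engine little context; batching neighboring cues into one request is a common refinement.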



Thanks for the opportunity to participate in this exciting challenge!

