
Warner Bell

🎧 AWS Audio Transcription Automation with CloudFormation

Welcome to the AWS Audio Transcription Automation project! This CloudFormation stack automates transcription of audio files (MP4, MP3, and WAV) using Amazon Transcribe. Upload your audio files to S3, and the stack triggers the transcription job and stores the results in an output S3 bucket, all automatically! 🎉

Architecture Diagram

[Diagram: audio transcription architecture]

GitHub repo: https://github.com/Warner-Bell/audio-transcription-build

🚀 Features

  • Automatic Transcription: Supports MP4, MP3, and WAV audio files, using Amazon Transcribe.
  • Secure Storage: AES-256 encryption for S3 buckets.
  • Lifecycle Management: Automatically expires input files after 1 day and log files after 2 days.
  • Logging: Logs Lambda function execution and S3 access.
  • Event-Driven Transcription: Automatically triggers transcription jobs upon file upload.
  • Final HTML Output: Transcription results are processed into an HTML file, viewable on any browser for easy reading.

🛠️ Technology Stack

  • Amazon S3: Storage for audio files, transcription results, and access logs.
  • AWS Lambda: Event-driven function to trigger transcription jobs.
  • Amazon Transcribe: Speech-to-text transcription service.
  • Amazon CloudWatch: Logs Lambda function activity.
  • IAM Roles: Manages permissions for Lambda and S3.

🎯 How It Works

  1. Upload audio files (MP4, MP3, or WAV) to the designated S3 input bucket.
  2. Lambda Triggered: Upon upload, a Lambda function triggers an Amazon Transcribe job.
  3. Transcription Results: Transcription results are stored in the specified S3 output bucket, and formatted as an HTML file for easy viewing.
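To make step 2 concrete, here is a minimal sketch of the kind of Amazon Transcribe job the Lambda function starts, written as the equivalent AWS CLI call. The function names, the bucket arguments, and the en-US language code are assumptions for illustration; the Lambda code in the repository may name jobs and choose options differently.

```shell
#!/bin/sh
# Hypothetical helpers; not taken from the repository.

# Turn an S3 object key into a valid Transcribe job name.
# Job names may only contain letters, digits, '.', '_' and '-',
# so every other character is replaced with '_'.
job_name_for_key() {
  printf '%s' "$1" | tr -c 'A-Za-z0-9._-' '_'
}

# Start a transcription job for one uploaded object.
# $1 = input bucket, $2 = object key, $3 = output bucket (all placeholders).
start_job() {
  # A timestamp suffix keeps job names unique across repeated uploads.
  aws transcribe start-transcription-job \
    --transcription-job-name "$(job_name_for_key "$2")-$(date +%s)" \
    --language-code en-US \
    --media "MediaFileUri=s3://$1/$2" \
    --output-bucket-name "$3"
}
```

Calling `start_job my-input-bucket talk.mp3 my-output-bucket` would start one job by hand, which can be handy for checking permissions before relying on the trigger.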

🧩 CloudFormation Resources

This CloudFormation stack provisions the following resources:

  • Input S3 Bucket: For audio files awaiting transcription.
  • Output S3 Bucket: Stores completed transcription results.
  • Logging S3 Bucket: Logs access events for both input and output buckets.
  • AWS Lambda Function: Automatically triggers transcription jobs.
  • IAM Role: Provides necessary permissions for the Lambda function.
  • CloudWatch Log Group: Logs Lambda function execution.
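If you want to confirm what the stack actually created, the AWS CLI can list it. This is a small sketch with hypothetical helper names; the only input is the stack name you chose for STACK_NAME.

```shell
#!/bin/sh
# Hypothetical helpers; $1 = stack name (your STACK_NAME value).

# Print the current stack status, e.g. while a deployment is running.
stack_status() {
  aws cloudformation describe-stacks --stack-name "$1" \
    --query 'Stacks[0].StackStatus' --output text
}

# Print a table of logical ID, resource type, and physical ID
# for every resource in the stack.
list_stack_resources() {
  aws cloudformation describe-stack-resources --stack-name "$1" \
    --query 'StackResources[].[LogicalResourceId,ResourceType,PhysicalResourceId]' \
    --output table
}
```

The physical IDs in the table are the real bucket, function, and role names, which are the values the other scripts need.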

📦 Installation & Setup

Prerequisites

  1. AWS CLI: Installed and configured on your local machine.
  2. AWS Account: An account with permissions to create CloudFormation stacks, Lambda functions, and S3 buckets.
  3. Edit Deploy File: In deploy-transcription.sh, set the STACK_NAME constant to a unique name. If your region is not us-east-1, update the region in s3-trigger.sh.
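A quick preflight check can catch a missing CLI or missing credentials before you deploy. The `preflight` function below is a hypothetical helper, not part of the repository.

```shell
#!/bin/sh
# Hypothetical preflight check for the prerequisites above.
preflight() {
  # Stop early if the AWS CLI is not on PATH.
  command -v aws >/dev/null 2>&1 || { echo "AWS CLI not found on PATH" >&2; return 1; }
  # Prints the account ID; fails if no valid credentials are configured.
  aws sts get-caller-identity --query Account --output text || return 1
  # Prints the default region, or a note when none is set.
  aws configure get region || echo "no default region configured"
}
```

Run `preflight` once in your terminal. If the region it prints is not us-east-1, remember to update s3-trigger.sh as described above.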

Deploy the Stack

  1. Clone the repository:
   git clone https://github.com/Warner-Bell/audio-transcription-build.git
   cd audio-transcription-build

Update the s3-trigger.sh file with your stack name.

  2. Run the deployment script:
   ./bin/deploy-transcription.sh

This script deploys the CloudFormation stack using a template file, setting up the required S3 buckets, Lambda function, and permissions.
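For readers who want to see what a deployment like this boils down to, here is a minimal sketch built on `aws cloudformation deploy`. The stack name and region values are placeholders, and the real deploy-transcription.sh may use different flags or extra steps (such as notifications).

```shell
#!/bin/sh
# Assumed values for illustration; adjust to match your setup.
STACK_NAME="my-audio-transcription"          # placeholder, pick your own unique name
TEMPLATE_FILE="cfn/audio-transcription.yaml" # template path from the project structure
REGION="us-east-1"

# Create or update the stack from the template.
# CAPABILITY_NAMED_IAM acknowledges that the template creates an IAM role.
deploy_stack() {
  aws cloudformation deploy \
    --stack-name "$STACK_NAME" \
    --template-file "$TEMPLATE_FILE" \
    --capabilities CAPABILITY_NAMED_IAM \
    --region "$REGION"
}
```

`aws cloudformation deploy` waits for the change set to finish, so `deploy_stack` returns only once the stack has been created, updated, or has failed.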

  3. Deploy S3 Trigger for Lambda: After deploying the CloudFormation stack, set up the S3 bucket trigger using s3-trigger.sh:
   ./bin/s3-trigger.sh

Note: The S3 trigger is configured by a separate script (s3-trigger.sh) to avoid circular dependencies between the S3 bucket, Lambda function, and logging configuration.
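As a rough illustration of what wiring the trigger after deployment can look like, the sketch below grants S3 permission to invoke the function and then attaches a notification to the input bucket. The function names, the statement ID, and the per-extension suffix filters are assumptions; the actual s3-trigger.sh may be structured differently.

```shell
#!/bin/sh
# Hypothetical helpers; not taken from the repository.

# Print an S3 notification configuration that invokes one Lambda function
# for each supported suffix. S3 allows a single suffix rule per
# configuration entry, so each extension gets its own entry.
# $1 = Lambda function ARN.
notification_json() (
  printf '{"LambdaFunctionConfigurations":['
  sep=''
  for ext in mp3 mp4 wav; do
    printf '%s{"Id":"transcribe-%s","LambdaFunctionArn":"%s","Events":["s3:ObjectCreated:*"],"Filter":{"Key":{"FilterRules":[{"Name":"suffix","Value":".%s"}]}}}' \
      "$sep" "$ext" "$1" "$ext"
    sep=','
  done
  printf ']}\n'
)

# $1 = input bucket, $2 = Lambda function ARN (both placeholders).
configure_trigger() {
  # Allow S3 to invoke the function, limited to events from this bucket.
  aws lambda add-permission \
    --function-name "$2" \
    --statement-id s3-invoke-transcription \
    --action lambda:InvokeFunction \
    --principal s3.amazonaws.com \
    --source-arn "arn:aws:s3:::$1" || return 1
  # Attach the notification configuration to the bucket.
  aws s3api put-bucket-notification-configuration \
    --bucket "$1" \
    --notification-configuration "$(notification_json "$2")"
}
```

The permission is added first because S3 validates that it can invoke the function when the notification is saved.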

  4. Verify Deployment: Monitor the deployment progress in the AWS CloudFormation Console to ensure all resources are created successfully.

  5. Start Transcribing: With the S3 trigger in place, upload audio files to the input S3 bucket to start transcription. You can use upload-to-s3.sh for this; edit the variables in the file with your own values first.

   ./bin/upload-to-s3.sh

Note on Output: Transcription output files are stored in both JSON and HTML format in the output S3 bucket. The HTML format can be viewed directly in any web browser for easy reading, eliminating the need to manually copy text from JSON files.
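The round trip of uploading a file and pulling down the HTML results can be sketched with plain `aws s3 cp` calls. The helper names and bucket arguments are hypothetical, and the lowercase-only extension check is an assumption about how the trigger is filtered.

```shell
#!/bin/sh
# Hypothetical helpers; not taken from the repository.

# Succeed only for the file types the stack handles.
# Lowercase extensions only (an assumption for this sketch).
is_supported_audio() {
  case "$1" in
    *.mp3|*.mp4|*.wav) return 0 ;;
    *) return 1 ;;
  esac
}

# $1 = local file, $2 = input bucket (placeholder).
upload_audio() {
  is_supported_audio "$1" || { echo "unsupported file type: $1" >&2; return 1; }
  aws s3 cp "$1" "s3://$2/"
}

# Download only the HTML transcripts.
# $1 = output bucket (placeholder), $2 = local directory for the results.
fetch_html_results() {
  aws s3 cp "s3://$1/" "$2" --recursive --exclude '*' --include '*.html'
}
```

For example, `upload_audio talk.mp3 my-input-bucket` followed a few minutes later by `fetch_html_results my-output-bucket ./transcripts` would leave the readable transcript in a local folder.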

📂 Project Structure

.
├── audio-samples/                  # 3 sample audio files in MP3, MP4, and WAV format
├── bin/
│   ├── deploy-transcription.sh     # Deployment script
│   ├── empty-buckets.sh            # Empties all project buckets
│   ├── s3-trigger.sh               # Lambda role configuration and S3 bucket notifications
│   └── upload-to-s3.sh             # Uploads the selected audio file to S3
├── cfn/
│   └── audio-transcription.yaml    # CloudFormation template
└── README.md                       # Project documentation

📊 Monitoring & Logging

  • CloudWatch Logs: View Lambda execution logs in the AWS CloudWatch console.
  • S3 Access Logs: Access logs for input and output S3 buckets are stored in the logging bucket.
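Both log sources can also be checked from the terminal. In this sketch the helper names are hypothetical, and the log group name assumes Lambda's default `/aws/lambda/<function-name>` convention; the stack's log group may be named differently.

```shell
#!/bin/sh
# Hypothetical helpers; not taken from the repository.

# Stream recent Lambda logs. `aws logs tail` requires AWS CLI v2.
# $1 = Lambda function name (placeholder).
tail_lambda_logs() {
  aws logs tail "/aws/lambda/$1" --since 15m --follow
}

# List the S3 access log objects collected so far.
# $1 = logging bucket (placeholder).
list_access_logs() {
  aws s3 ls "s3://$1/" --recursive
}
```

Keep in mind that S3 access logs are delivered with a delay, so an empty listing right after an upload is normal.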

⚙️ Recommended Practices

  • Adjust Bucket Retention: Modify lifecycle policies to suit your data retention requirements.
  • Configure Notifications: Install and configure notify-send on Linux for deployment notifications (optional).
  • Review IAM Policies: Ensure permissions are as restrictive as possible for production use.
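Before changing retention or tightening permissions, it helps to look at what is currently in place. The read-only helpers below are a sketch with hypothetical names; any actual changes are best made in the CloudFormation template so the stack does not drift.

```shell
#!/bin/sh
# Hypothetical read-only helpers; not taken from the repository.

# Show the encryption and lifecycle settings of one bucket.
# $1 = bucket name (placeholder).
show_bucket_settings() {
  aws s3api get-bucket-encryption --bucket "$1"
  aws s3api get-bucket-lifecycle-configuration --bucket "$1"
}

# Show the inline and managed policies attached to the Lambda role.
# $1 = IAM role name (placeholder).
show_role_policies() {
  aws iam list-role-policies --role-name "$1"
  aws iam list-attached-role-policies --role-name "$1"
}
```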

🤝 Contributing

We welcome contributions! To contribute, follow these steps:

How to Contribute

  1. Fork the repo 🍴
  2. Create a new branch:
   git checkout -b feature/awesome-feature
  3. Commit your changes 💻
  4. Push your branch and submit a PR 🛠️

🌍 License

This project is licensed under the MIT License. See the LICENSE file for details.

🏆 Acknowledgements

  • Thanks to AWS for their robust services. 💪
  • Special thanks to OpenAI for inspiring innovation with AI-based tools. 🙌

📬 Contact

Warner Bell - Tap In!


Happy Transcribing!
