Building a Meeting Summarizer Backend with Python FastAPI, AWS Transcribe and AWS Bedrock

Introduction

In this tutorial, we’ll build a meeting summarizer backend using FastAPI, AWS Transcribe, and AWS Bedrock foundation models. The application transcribes audio recordings, extracts key discussion points, and produces structured summaries with sentiment analysis and issue detection.

Key Features

  • Audio Transcription – Uses AWS Transcribe to convert speech to text.
  • Speaker Labeling – Identifies different speakers in the conversation.
  • Summarization – AWS Bedrock’s Titan model extracts key insights.
  • Sentiment Analysis & Issue Detection – Provides a concise summary with tone detection.
  • FastAPI Backend – A lightweight, high-performance API for seamless integration.

Tech Stack

  • FastAPI – Lightweight web framework for Python
  • AWS Transcribe – Speech-to-text conversion
  • AWS Bedrock – Fully managed AI service providing LLM integration
  • Amazon S3 – Cloud storage for audio files and transcriptions
  • Jinja2 – Template engine for prompt formatting

Step 1: Project Setup

1. Install Prerequisites

  • Python 3.10+
  • Poetry 1.8+ – Dependency management tool
  • AWS CLI (Optional, for testing)

2. AWS S3 and Bedrock Setup

  • Create two S3 buckets and grant the necessary permissions (a minimal sketch follows this list).
    • AWS_BUCKET_NAME - Bucket for holding the audio files
    • OUTPUT_BUCKET_NAME - Bucket for holding the transcriptions
  • Request model access in the Bedrock console. In this example I'm using Titan Text G1 - Lite.
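
If you want to create the buckets programmatically, here's a minimal boto3 sketch. The bucket names are placeholders of my own choosing (S3 names must be globally unique), and the region should match your AWS_REGION:

import boto3

# Placeholder names; replace with your own globally unique bucket names
s3 = boto3.client("s3", region_name="us-east-1")
s3.create_bucket(Bucket="my-meeting-audio-bucket")
s3.create_bucket(Bucket="my-meeting-transcripts-bucket")
# Note: outside us-east-1, also pass
# CreateBucketConfiguration={"LocationConstraint": "<your-region>"}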


3. Clone the Repository

git clone https://github.com/bokal2/meeting-summarizer-backend.git
cd meeting-summarizer-backend

4. Install Dependencies

Activate the Poetry environment and install the dependencies:

poetry shell
poetry install

5. Configure AWS Credentials

Create a .env file with the following (python-decouple reads these values at startup):

AWS_REGION=your_aws_region
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_BUCKET_NAME=your_bucket_name
OUTPUT_BUCKET_NAME=your_output_bucket_name

Step 2: API Implementation

Main Components

The backend consists of:

  • Audio Upload & Transcription – Sends audio files to AWS S3 and triggers AWS Transcribe.
  • Text Processing – Converts transcribed text into a structured format.
  • Summarization with AWS Bedrock – Generates meeting summaries based on a prompt template.

FastAPI Implementation (main.py)

import asyncio
import json
import uuid
from fastapi import FastAPI, HTTPException, File, UploadFile
from fastapi.templating import Jinja2Templates
from fastapi.middleware.cors import CORSMiddleware
import boto3
from decouple import config

AWS_REGION = config("AWS_REGION")
AWS_ACCESS_KEY_ID = config("AWS_ACCESS_KEY_ID")
AWS_SECRET_ACCESS_KEY = config("AWS_SECRET_ACCESS_KEY")
BUCKET_NAME = config("AWS_BUCKET_NAME")
OUTPUT_BUCKET_NAME = config("OUTPUT_BUCKET_NAME")

app = FastAPI()

# Configure allowed origins
origins = [
    "http://localhost:3000", # Testing with a React App
]

app.add_middleware(
    CORSMiddleware,
    allow_origins=origins,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

templates = Jinja2Templates(directory="templates")

async def upload_file_to_s3(file_obj, file_name, s3_client, bucket_name):
    try:
        s3_client.upload_fileobj(file_obj, bucket_name, file_name)
    except Exception as e:
        raise HTTPException(
            status_code=400,
            detail=f"File uplaod failed: {e}",
        )

def process_transcription(transcript_json):
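    """Convert an AWS Transcribe result payload into speaker-labelled text."""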
    output_text = ""
    current_speaker = None

    items = transcript_json["results"]["items"]

    for item in items:

        speaker_label = item.get("speaker_label", None)
        content = item["alternatives"][0]["content"]

        if speaker_label is not None and speaker_label != current_speaker:
            current_speaker = speaker_label
            output_text += f"\n{current_speaker}: "

        if item["type"] == "punctuation":
            output_text = output_text.rstrip()

        output_text += f"{content} "

    return output_text

async def transcribe_audio(
    model_id,
    bucket_name,
    file_name,
    file_content,
    output_bucket,
):

    # Upload the audio file to the S3 bucket
    s3_client = boto3.client(
        "s3",
        region_name=AWS_REGION,
        aws_access_key_id=AWS_ACCESS_KEY_ID,
        aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    )
    await upload_file_to_s3(
        file_obj=file_content,
        file_name=file_name,
        s3_client=s3_client,
        bucket_name=bucket_name,
    )

    transcribe_client = boto3.client(
        "transcribe",
        region_name=AWS_REGION,
        aws_access_key_id=AWS_ACCESS_KEY_ID,
        aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    )

    job_name = f"transcription-job-{uuid.uuid4()}"

    transcribe_client.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": f"s3://{bucket_name}/{file_name}"},
        MediaFormat="mp3",  # NOTE: hardcoded; adjust if you upload other formats
        LanguageCode="en-US",
        OutputBucketName=output_bucket,
        Settings={"ShowSpeakerLabels": True, "MaxSpeakerLabels": 2},
    )

    while True:
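        # Poll the job status until it reports COMPLETED or FAILED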
        job_status = transcribe_client.get_transcription_job(
            TranscriptionJobName=job_name,
        )

        status = job_status["TranscriptionJob"]["TranscriptionJobStatus"]
        if status in ["COMPLETED", "FAILED"]:
            break
        await asyncio.sleep(2)  # yield control to the event loop between polls

    if status == "FAILED":
        raise HTTPException(status_code=400, detail="Transcription Job failed")

    # Transcribe writes its result to the output bucket as <job_name>.json
    transcript_key = f"{job_name}.json"
    transcript_obj = s3_client.get_object(
        Bucket=output_bucket,
        Key=transcript_key,
    )
    transcript_text = transcript_obj["Body"].read().decode("utf-8")
    transcript_json = json.loads(transcript_text)

    output_text = process_transcription(transcript_json)

    result = await summarize_transcription(
        model_id,
        transcript=output_text,
    )

    return result

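To see what process_transcription produces, here's a tiny hand-made payload in the shape AWS Transcribe returns (the real JSON carries many more fields):

sample = {
    "results": {
        "items": [
            {"type": "pronunciation", "speaker_label": "spk_0",
             "alternatives": [{"content": "Hello"}]},
            {"type": "punctuation",
             "alternatives": [{"content": ","}]},
            {"type": "pronunciation", "speaker_label": "spk_1",
             "alternatives": [{"content": "Hi"}]},
        ]
    }
}

print(process_transcription(sample))
# spk_0: Hello,
# spk_1: Hi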

Step 3: Summarization Using AWS Bedrock

Prompt Engineering

We use a Jinja2 template to format the transcript for the Bedrock model:

I need to analyze and summarize a conversation. The transcript of the
conversation is between the <data> XML-like tags.

<data>
{{transcript}}
</data>

Please do the following:
1. Identify the main topic being discussed.
2. Provide a concise summary of key points.
3. Include a one-word sentiment analysis.
4. List any issues, problems, or conflicts.

Format the output in JSON:
{
    "topic": "<main_topic>",
    "meeting_summary": "<summary>",
    "sentiment": "<one_word_sentiment>",
    "issues": [{"topic": "<issue>", "summary": "<description>"}]
}

AWS Bedrock Summarization

async def summarize_transcription(model_id: str, transcript: str):

    bedrock_runtime = boto3.client(
        "bedrock-runtime",
        region_name=AWS_REGION,
        aws_access_key_id=AWS_ACCESS_KEY_ID,
        aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    )

    template = templates.get_template("prompt_template.txt")

    rendered_prompt = template.render(transcript=transcript)

    try:
        kwargs = {
            "modelId": model_id,
            "contentType": "application/json",
            "accept": "*/*",
            "body": json.dumps(
                {
                    "inputText": rendered_prompt,
                    "textGenerationConfig": {
                        "maxTokenCount": 512,
                        "temperature": 0,
                        "topP": 0.9,
                    },
                }
            ),
        }
        # Call AWS Bedrock
        response = bedrock_runtime.invoke_model(**kwargs)
        # Parse response
        response_body = json.loads(response.get("body").read())
        result = response_body["results"][0]["outputText"]
        return {"response": result}
    except Exception as e:
        raise HTTPException(
            status_code=500,
            detail=f"Error invoking Bedrock: {str(e)}",
        )

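One caveat worth noting: Bedrock returns the model's answer as a plain string, even when the prompt asks for JSON, and Titan occasionally wraps the JSON in extra text. A small defensive parser (my own addition, not part of the code above) keeps the endpoint usable either way:

def parse_model_output(output_text: str) -> dict:
    """Best-effort parse of the model's JSON answer, falling back to raw text."""
    try:
        # Isolate the outermost braces in case the model added surrounding text
        start = output_text.find("{")
        end = output_text.rfind("}")
        if start != -1 and end != -1:
            return json.loads(output_text[start : end + 1])
        return json.loads(output_text)
    except json.JSONDecodeError:
        return {"raw_output": output_text}

With this in place, summarize_transcription could return {"response": parse_model_output(result)} instead of the raw string, which is what the sample response in Step 4 assumes.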

Summary API Endpoint

@app.post("/summary")
async def audio_summary_test(
    model_id: str = "amazon.titan-text-lite-v1",
    file: UploadFile = File(...),
):
    """An endpoint for generating meeting summary from audio file"""
    # But first, ensure the cursor is at position 0:
    file.file.seek(0)

    response = await transcribe_audio(
        model_id=model_id,
        bucket_name=BUCKET_NAME,
        file_name=file.filename,
        file_content=file.file,
        output_bucket=OUTPUT_BUCKET_NAME,
    )

    return response  # transcribe_audio already wraps the summary in {"response": ...}


Step 4: Running the Application

1. Start the FastAPI Server

uvicorn main:app --reload

2. Test the API Using cURL

Since model_id is declared as a query parameter in the endpoint, pass it in the URL; curl adds the multipart Content-Type header (with its boundary) automatically when you use -F:

curl -X POST "http://127.0.0.1:8000/summary?model_id=amazon.titan-text-lite-v1" \
-F "file=@meeting_audio.mp3"

3. Sample JSON Response

Titan returns its answer as plain text; once parsed into JSON (see the parsing sketch in Step 3), a successful response looks like this:

{
    "response": {
        "topic": "Project Updates",
        "meeting_summary": "The meeting discussed progress on Q1 deliverables...",
        "sentiment": "positive",
        "issues": [
            {"topic": "Timeline Delay", "summary": "The team noted delays in the design phase."}
        ]
    }
}

4. Test the API Using a Next.js Application

I created a simple Next.js app to test the API. You can find the code in this Git repository, along with detailed setup instructions in the README to help you get it up and running quickly.


Some Noticeable Challenges

  • Accuracy in Transcription – Issues with accents, low audio volume, overlapping speech, and background noise can lead to poor transcription results.
  • LLM Summarization Accuracy – May miss nuances or oversimplify complex discussions.
  • Processing Time and Latency – Large audio files lead to long transcription times and LLM response delays.
  • Scalability Issues – Handling multiple users and large audio files can put constraints on the underlying resources.
  • Prompt Engineering Complexity – Designing effective prompts for sequential or chat-based interactions is challenging, with limited reference resources currently available.

Conclusion

Exploring AWS Bedrock and experimenting with different foundation models was an exciting experience. It’s impressive how seamlessly developers can leverage these models to build LLM-powered applications with minimal hassle. The potential is immense, and I look forward to diving deeper, exploring advanced models, and uncovering new possibilities.

Next Steps:

  • Deploy the API using AWS Lambda or EKS
  • Enhance prompt engineering for better summarization accuracy
