Introduction
In this tutorial, we’ll build a meeting summarizer backend using FastAPI, AWS Transcribe, and AWS Bedrock foundation models. The application transcribes audio recordings, extracts key discussion points, and produces structured summaries with sentiment analysis and issue detection.
Key Features
- Audio Transcription – Uses AWS Transcribe to convert speech to text.
- Speaker Labeling – Identifies different speakers in the conversation.
- Summarization – AWS Bedrock’s Titan model extracts key insights.
- Sentiment Analysis & Issue Detection – Provides a concise summary with tone detection.
- FastAPI Backend – A lightweight, high-performance API for seamless integration.
Tech Stack
- FastAPI – Lightweight web framework for Python
- AWS Transcribe – Speech-to-text conversion
- AWS Bedrock – Fully managed AI service providing LLM integration
- Amazon S3 – Cloud storage for audio files and transcriptions
- Jinja2 – Template engine for prompt formatting
Step 1: Project Setup
1. Install Prerequisites
- Python 3.10+
- Poetry 1.8+ – Dependency management tool
- AWS CLI (Optional, for testing)
2. AWS S3 and Bedrock Setup
- Create two S3 buckets and grant the necessary permissions:
  - AWS_BUCKET_NAME – bucket for holding the audio files
  - OUTPUT_BUCKET_NAME – bucket for holding the transcriptions
- Request model access in the Bedrock console. In this example I'm using Titan Text G1 - Lite.
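If you prefer to script the bucket creation, here's a minimal boto3 sketch; the bucket names and region below are placeholders, and note that in us-east-1 the CreateBucketConfiguration argument must be omitted:
import boto3

region = "eu-west-1"  # placeholder: use your own region
s3 = boto3.client("s3", region_name=region)

# Hypothetical bucket names -- replace with your own (S3 names are globally unique)
for bucket in ("my-meeting-audio-bucket", "my-meeting-transcripts-bucket"):
    s3.create_bucket(
        Bucket=bucket,
        CreateBucketConfiguration={"LocationConstraint": region},
    )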
3. Clone the Repository
git clone https://github.com/bokal2/meeting-summarizer-backend.git
cd meeting-summarizer-backend
4. Install Dependencies
poetry shell
poetry install
5. Configure AWS Credentials
Create a .env file with the following:
AWS_REGION=your_aws_region
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_BUCKET_NAME=your_bucket_name
OUTPUT_BUCKET_NAME=your_output_bucket_name
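To confirm the credentials load correctly, here's an optional sanity check (a sketch, assuming you run it from the project root so python-decouple can find the .env file):
import boto3
from decouple import config

# Smoke test: list your buckets to confirm the credentials and region work
s3 = boto3.client(
    "s3",
    region_name=config("AWS_REGION"),
    aws_access_key_id=config("AWS_ACCESS_KEY_ID"),
    aws_secret_access_key=config("AWS_SECRET_ACCESS_KEY"),
)
print([bucket["Name"] for bucket in s3.list_buckets()["Buckets"]])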
Step 2: API Implementation
Main Components
The backend consists of:
- Audio Upload & Transcription – Sends audio files to AWS S3 and triggers AWS Transcribe.
- Text Processing – Converts transcribed text into a structured format.
- Summarization with AWS Bedrock – Generates meeting summaries based on a prompt template.
FastAPI Implementation (main.py)
import json
import time
import uuid

import boto3
from decouple import config
from fastapi import FastAPI, File, Form, HTTPException, UploadFile
from fastapi.middleware.cors import CORSMiddleware
from fastapi.templating import Jinja2Templates
AWS_REGION = config("AWS_REGION")
AWS_ACCESS_KEY_ID = config("AWS_ACCESS_KEY_ID")
AWS_SECRET_ACCESS_KEY = config("AWS_SECRET_ACCESS_KEY")
BUCKET_NAME = config("AWS_BUCKET_NAME")
OUTPUT_BUCKET_NAME = config("OUTPUT_BUCKET_NAME")

app = FastAPI()

# Configure allowed origins
origins = [
    "http://localhost:3000",  # Testing with a React app
]

app.add_middleware(
    CORSMiddleware,
    allow_origins=origins,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

templates = Jinja2Templates(directory="templates")
async def upload_file_to_s3(file_obj, file_name, s3_client, bucket_name):
    """Upload a file-like object to the given S3 bucket."""
    try:
        s3_client.upload_fileobj(file_obj, bucket_name, file_name)
    except Exception as e:
        raise HTTPException(
            status_code=400,
            detail=f"File upload failed: {e}",
        )
def process_transcription(transcript_json):
    """Flatten the Transcribe JSON into 'speaker: text' lines."""
    output_text = ""
    current_speaker = None
    items = transcript_json["results"]["items"]
    for item in items:
        speaker_label = item.get("speaker_label", None)
        content = item["alternatives"][0]["content"]
        # Start a new line whenever the speaker changes
        if speaker_label is not None and speaker_label != current_speaker:
            current_speaker = speaker_label
            output_text += f"\n{current_speaker}: "
        # Attach punctuation to the previous word instead of spacing it out
        if item["type"] == "punctuation":
            output_text = output_text.rstrip()
        output_text += f"{content} "
    return output_text
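To see what this produces, here is a minimal, hand-made fragment of Transcribe output (real responses carry extra fields such as timestamps and confidence scores):
sample = {
    "results": {
        "items": [
            {"type": "pronunciation", "speaker_label": "spk_0",
             "alternatives": [{"content": "Hello"}]},
            {"type": "punctuation", "alternatives": [{"content": ","}]},
            {"type": "pronunciation", "speaker_label": "spk_1",
             "alternatives": [{"content": "Hi"}]},
        ]
    }
}
print(process_transcription(sample))
# spk_0: Hello,
# spk_1: Hi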
async def transcribe_audio(
    model_id,
    bucket_name,
    file_name,
    file_content,
    output_bucket,
):
    # Upload the audio file to the S3 bucket
    s3_client = boto3.client(
        "s3",
        region_name=AWS_REGION,
        aws_access_key_id=AWS_ACCESS_KEY_ID,
        aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    )
    await upload_file_to_s3(
        file_obj=file_content,
        file_name=file_name,
        s3_client=s3_client,
        bucket_name=bucket_name,
    )

    # Start an asynchronous transcription job with speaker labels enabled
    transcribe_client = boto3.client(
        "transcribe",
        region_name=AWS_REGION,
        aws_access_key_id=AWS_ACCESS_KEY_ID,
        aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    )
    job_name = f"transcription-job-{uuid.uuid4()}"
    transcribe_client.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": f"s3://{bucket_name}/{file_name}"},
        MediaFormat="mp3",
        LanguageCode="en-US",
        OutputBucketName=output_bucket,
        Settings={"ShowSpeakerLabels": True, "MaxSpeakerLabels": 2},
    )

    # Poll every two seconds until the job finishes (blocking; fine for a demo)
    while True:
        job_status = transcribe_client.get_transcription_job(
            TranscriptionJobName=job_name,
        )
        status = job_status["TranscriptionJob"]["TranscriptionJobStatus"]
        if status in ["COMPLETED", "FAILED"]:
            break
        time.sleep(2)

    if status == "FAILED":
        raise HTTPException(status_code=400, detail="Transcription Job failed")

    # Fetch the transcript JSON that Transcribe wrote to the output bucket
    transcript_key = f"{job_name}.json"
    transcript_obj = s3_client.get_object(
        Bucket=output_bucket,
        Key=transcript_key,
    )
    transcript_text = transcript_obj["Body"].read().decode("utf-8")
    transcript_json = json.loads(transcript_text)
    output_text = process_transcription(transcript_json)

    result = await summarize_transcription(
        model_id,
        transcript=output_text,
    )
    return result
Step 3: Summarization Using AWS Bedrock
Prompt Engineering
We use a Jinja2 template to format the transcript for the Bedrock model:
I need to analyze and summarize a conversation. The transcript of the
conversation is between the <data> XML-like tags.
<data>
{{transcript}}
</data>
Please do the following:
1. Identify the main topic being discussed.
2. Provide a concise summary of key points.
3. Include a one-word sentiment analysis.
4. List any issues, problems, or conflicts.
Format the output in JSON:
{
  "topic": "<main_topic>",
  "meeting_summary": "<summary>",
  "sentiment": "<one_word_sentiment>",
  "issues": [{"topic": "<issue>", "summary": "<description>"}]
}
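As a quick sanity check, you can render the template outside the app (a sketch, assuming the template is saved as templates/prompt_template.txt, which is what the code below expects):
from fastapi.templating import Jinja2Templates

templates = Jinja2Templates(directory="templates")
template = templates.get_template("prompt_template.txt")
print(template.render(transcript="spk_0: Hello team, let's review the Q1 roadmap."))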
AWS Bedrock Summarization
async def summarize_transcription(model_id: str, transcript: str):
    bedrock_runtime = boto3.client(
        "bedrock-runtime",
        region_name=AWS_REGION,
        aws_access_key_id=AWS_ACCESS_KEY_ID,
        aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    )

    # Render the prompt template with the transcript
    template = templates.get_template("prompt_template.txt")
    rendered_prompt = template.render(transcript=transcript)

    try:
        kwargs = {
            "modelId": model_id,
            "contentType": "application/json",
            "accept": "*/*",
            "body": json.dumps(
                {
                    "inputText": rendered_prompt,
                    "textGenerationConfig": {
                        "maxTokenCount": 512,
                        "temperature": 0,
                        "topP": 0.9,
                    },
                }
            ),
        }
        # Call AWS Bedrock
        response = bedrock_runtime.invoke_model(**kwargs)
        # Parse response
        response_body = json.loads(response.get("body").read())
        result = response_body["results"][0]["outputText"]
        return {"response": result}
    except Exception as e:
        raise HTTPException(
            status_code=500,
            detail=f"Error invoking Bedrock: {str(e)}",
        )
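Note that this returns the model's outputText as a raw string. Since the prompt asks for JSON, a caller may want to parse it into a dict; here's a hypothetical helper (not part of the repository) with a fallback, since LLM output isn't guaranteed to be valid JSON:
def parse_summary(output_text: str) -> dict:
    """Best-effort parse of the model's JSON reply (hypothetical helper)."""
    try:
        # The model sometimes wraps the JSON in extra prose, so slice
        # from the first '{' to the last '}' before parsing.
        start = output_text.index("{")
        end = output_text.rindex("}") + 1
        return json.loads(output_text[start:end])
    except ValueError:  # covers both .index() failures and JSONDecodeError
        return {"raw": output_text}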
Summary API Endpoint
@app.post("/summary")
async def audio_summary_test(
    model_id: str = Form("amazon.titan-text-lite-v1"),  # read from the multipart form, matching the cURL example below
    file: UploadFile = File(...),
):
    """An endpoint for generating a meeting summary from an audio file"""
    # Ensure the file cursor is at position 0 before uploading
    file.file.seek(0)
    response = await transcribe_audio(
        model_id=model_id,
        bucket_name=BUCKET_NAME,
        file_name=file.filename,
        file_content=file.file,
        output_bucket=OUTPUT_BUCKET_NAME,
    )
    return {"response": response}
Step 4: Running the Application
1. Start the FastAPI Server
uvicorn main:app --reload
2. Test the API Using cURL
curl -X POST "http://127.0.0.1:8000/summary" \
-H "Content-Type: multipart/form-data" \
-F "file=@meeting_audio.mp3" \
-F "model_id=amazon.titan-text-lite-v1"
3. Sample JSON Response
{
  "response": {
    "topic": "Project Updates",
    "meeting_summary": "The meeting discussed progress on Q1 deliverables...",
    "sentiment": "positive",
    "issues": [
      {"topic": "Timeline Delay", "summary": "The team noted delays in the design phase."}
    ]
  }
}
4. Test the API Using a Next.js Application
I created a simple Next.js app to test the API. You can find the code in this Git repository, along with detailed setup instructions in the README to help you get it up and running quickly.
Some Notable Challenges
- Accuracy in Transcription – Issues with accents, low audio volume, overlapping speech, and background noise can lead to poor transcription results.
- LLM Summarization Accuracy – May miss nuances or oversimplify complex discussions.
- Processing Time and Latency – Large audio files lead to long transcription times and LLM response delays.
- Scalability Issues – Handling multiple users and large audio files can strain the underlying resources.
- Prompt Engineering Complexity – Designing effective prompts for sequential or chat-based interactions is challenging, with limited reference resources currently available.
Conclusion
Exploring AWS Bedrock and experimenting with different foundation models was an exciting experience. It’s impressive how seamlessly developers can leverage these models to build LLM-powered applications with minimal hassle. The potential is immense, and I look forward to diving deeper, exploring advanced models, and uncovering new possibilities.
Next Steps:
- Deploy the API using AWS Lambda or EKS
- Enhance prompt engineering for better summarization accuracy