Have you started your AI journey and want a project that builds hands-on experience with generative AI on Amazon Bedrock? In this project, I will guide you end to end through building a chatbot that lets users interact with PDF documents. A user uploads a file and asks a question, and the bot answers from the content of that file. If it cannot find an answer in the document, it says so. Uploads are limited to 200MB per file, and the LLM hosted on Bedrock does the heavy lifting of generating answers.
I will walk you through the tools and technologies used to build this application, chosen for reliable performance and an easy-to-use interface. They include:
- Amazon Bedrock
- Amazon S3
- Amazon EC2
- Docker
- LangChain
- Streamlit
Architecture
Principle
The app works as follows: when a user visits the web page, they are first asked to upload a PDF file. The file is read with PyPDF and split into chunks. Each chunk is then converted into a vector, a numerical representation of its content. The generated vectors are stored in an S3 bucket so they can be retrieved when the user asks a question.
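To make the chunking step concrete, here is a minimal, dependency-free sketch of splitting extracted text into overlapping chunks. It is an illustration only: the repository's application.py may well rely on a LangChain text splitter instead, and the function name `split_into_chunks` and the default `chunk_size` and `overlap` values are my assumptions, not values taken from the project.

```python
def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into character chunks of at most chunk_size,
    where consecutive chunks share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    sample = "abcdefghij"
    print(split_into_chunks(sample, chunk_size=4, overlap=1))
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one chunk, which tends to help retrieval later.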
When the user submits a query, the application loads the vectors from S3 and searches them for the chunks most similar to the query. It then builds a prompt from the query and the retrieved context and passes it to our LLM (Jurassic-2 Mid), which generates the answer for the user. The application runs in a Docker container and uses Streamlit for the UI.
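The retrieval and prompt-building steps can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions: the real app presumably delegates similarity search to a vector store library, and the helper names (`cosine_similarity`, `top_k_chunks`, `build_prompt`) and the prompt wording are hypothetical, not taken from application.py.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, indexed_chunks, k=3):
    """indexed_chunks is a list of (vector, text) pairs.
    Returns the texts of the k chunks most similar to the query vector."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vec, item[0]),
        reverse=True,
    )
    return [text for _, text in ranked[:k]]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine retrieved context and the question into one prompt.
    The instruction wording here is an assumption for illustration."""
    context = "\n\n".join(context_chunks)
    return (
        "Use only the context below to answer the question. "
        "If the answer is not in the context, say that you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


if __name__ == "__main__":
    index = [([1.0, 0.0], "chunk a"), ([0.0, 1.0], "chunk b"), ([1.0, 1.0], "chunk c")]
    best = top_k_chunks([1.0, 0.0], index, k=2)
    print(build_prompt("What is this document about?", best))
```

Telling the model to admit when the context lacks an answer is one common way to get the "report it when nothing is found" behaviour described in the introduction.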
How to build it
Launch an EC2 instance
Log in to the AWS console and launch a t2.micro instance with the following configuration:
Name: pdf-Chat-Bot
Instance type: t2.micro
AMI: Ubuntu:latest
Volume: 8GiB
Security group: Create new
- inbound rules
=> allow 8083 from everywhere
=> allow ssh from my IP
User data:
#!/bin/bash
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install -y ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
# Add the repository to Apt sources:
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
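sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin -y
# Let the default ubuntu user run docker without sudo (takes effect at next login)
sudo usermod -aG docker ubuntu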
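If you prefer to script the security group instead of clicking through the console, the two inbound rules listed above can be expressed in the shape that boto3's `authorize_security_group_ingress` expects for its `IpPermissions` argument. This is a sketch under assumptions: the function name `build_ingress_rules` is hypothetical, and `203.0.113.10/32` is a placeholder from a documentation address range that you would replace with your own IP.

```python
import ipaddress


def build_ingress_rules(my_ip_cidr: str, app_port: int = 8083) -> list[dict]:
    """Build inbound rules: the app port open to everyone, SSH only from my IP."""
    # Raises ValueError if the CIDR string is malformed.
    ipaddress.ip_network(my_ip_cidr)
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": app_port,
            "ToPort": app_port,
            "IpRanges": [{"CidrIp": "0.0.0.0/0", "Description": "Streamlit app"}],
        },
        {
            "IpProtocol": "tcp",
            "FromPort": 22,
            "ToPort": 22,
            "IpRanges": [{"CidrIp": my_ip_cidr, "Description": "SSH from my IP"}],
        },
    ]


if __name__ == "__main__":
    rules = build_ingress_rules("203.0.113.10/32")  # placeholder IP, replace with yours
    print(rules)
    # Applying the rules needs boto3, AWS credentials and a real group id, e.g.:
    #   import boto3
    #   boto3.client("ec2").authorize_security_group_ingress(
    #       GroupId="<your-security-group-id>", IpPermissions=rules)
```

Restricting SSH to a single /32 address keeps port 22 closed to the rest of the internet, while port 8083 stays public so anyone can reach the app.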
Create an IAM Role for the EC2 instance to access Bedrock and S3
Go to AWS console -> IAM -> Roles -> Create role
name: pdfBotRole
attach policies:
- AmazonBedrockFullAccess
- AmazonS3FullAccess
Attach role to EC2 instance
Go to the EC2 console, select the instance, then go to Actions -> Security -> Modify IAM role
Select the previously created IAM role "pdfBotRole" and click Apply
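For context on what the console sets up behind the scenes: a role that an EC2 instance can assume needs a trust policy naming the EC2 service as principal. When you pick EC2 as the trusted service in the role wizard, the console generates an equivalent document for you. The sketch below only builds and prints that JSON; the function name `ec2_trust_policy` is mine.

```python
import json


def ec2_trust_policy() -> dict:
    """Trust policy that allows EC2 instances to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(ec2_trust_policy(), indent=2))
```

The two managed policies attached above then decide what the instance may do once it has assumed the role. For anything beyond a demo, narrower permissions than full access would be a sensible follow-up.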
Create S3 bucket
In the console, search for S3, then create a bucket with the following settings (xxx stands for random digits that make the bucket name unique):
name: bedrock-chatpdf-xxx
region: us-east-1
Keep the remaining defaults and choose Create bucket
Copy the bucket name, as we'll use it in the next steps
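Since bucket names are globally unique, a small helper can generate the random suffix and check the result against the basic S3 naming rules (3 to 63 characters; lowercase letters, digits, dots and hyphens; starting and ending with a letter or digit). This is a sketch: the helper names and the six-digit suffix length are my assumptions, and the check covers only these basic rules, not every restriction S3 applies.

```python
import random
import re
import string

# Basic S3 naming rules: 3-63 chars, lowercase letters, digits, dots, hyphens,
# beginning and ending with a letter or digit.
_BUCKET_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def is_valid_bucket_name(name: str) -> bool:
    """Check a candidate name against the basic rules above."""
    return bool(_BUCKET_NAME_RE.match(name))


def make_bucket_name(prefix: str = "bedrock-chatpdf", digits: int = 6) -> str:
    """Append random digits to the prefix to make a likely-unique bucket name."""
    suffix = "".join(random.choices(string.digits, k=digits))
    name = f"{prefix}-{suffix}"
    if not is_valid_bucket_name(name):
        raise ValueError(f"invalid bucket name: {name}")
    return name


if __name__ == "__main__":
    print(make_bucket_name())
```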
SSH into the instance and clone the source code
Copy the public IP address of the instance, open a terminal and run:
ssh -i "path/to/.pem-file" ubuntu@public-ip-address
Verify the Docker installation
docker ps
If Docker is installed, you will see a table of existing containers, which will of course be empty.
Clone source
In the same SSH session, run the following
git clone https://github.com/Ndzenyuy/chatPdf.git
cd chatPdf
The cloned source code contains the following files and folders:
Dockerfile
application.py
requirements.txt
/images
Access to LLM models in Amazon Bedrock
In the console, go to Amazon Bedrock -> Base models -> Model access
Make sure you have access to Jurassic-2 Ultra and Titan Embeddings G1 - Text; if not, you can request access.
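Once access is granted, a call to the embeddings model through the Bedrock runtime API looks roughly like the sketch below. Treat the details as assumptions to verify: `amazon.titan-embed-text-v1` is, to my knowledge, the model ID for Titan Embeddings G1 - Text (the console shows the exact ID), the helper names are hypothetical, and the article's application.py may use LangChain's Bedrock wrappers rather than calling `invoke_model` directly.

```python
import json

# Assumed model ID for Titan Embeddings G1 - Text; confirm it in the Bedrock console.
TITAN_EMBED_MODEL_ID = "amazon.titan-embed-text-v1"


def build_embedding_request(text: str) -> dict:
    """Keyword arguments for bedrock-runtime's invoke_model call."""
    return {
        "modelId": TITAN_EMBED_MODEL_ID,
        "contentType": "application/json",
        "accept": "application/json",
        "body": json.dumps({"inputText": text}),
    }


def embed_text(client, text: str) -> list[float]:
    """Return the embedding vector for text using a bedrock-runtime client."""
    response = client.invoke_model(**build_embedding_request(text))
    payload = json.loads(response["body"].read())
    return payload["embedding"]


# Usage on the EC2 instance (requires boto3 and the IAM role from earlier):
#   import boto3
#   client = boto3.client("bedrock-runtime", region_name="us-east-1")
#   vector = embed_text(client, "What is this PDF about?")
```

If model access has not been granted yet, this call fails with an access error, which makes it a quick way to confirm the step above worked.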
Build and run the app Docker image
Make sure you are inside the chatPdf folder and run the following command (Docker image names must be lowercase)
docker build -t chatpdf-app .
Once the image is built, run it with the following command, replacing yourBucketName with the bucket created earlier
docker run -d -e BUCKET_NAME="yourBucketName" -p 8083:8083 chatpdf-app
Now copy the public IP of the EC2 instance and enter it in the browser, followed by the port number 8083. For instance:
XX.XX.XX.XX:8083
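The `-e BUCKET_NAME=...` flag passes the bucket name into the container as an environment variable. How application.py reads it is not shown in this article, so the following is only a sketch of one defensive way to do it. The function name `load_bucket_name` and the error message are my assumptions.

```python
import os


def load_bucket_name(env=None) -> str:
    """Read BUCKET_NAME from the environment and fail early if it is missing."""
    env = os.environ if env is None else env
    name = env.get("BUCKET_NAME", "").strip()
    if not name:
        raise RuntimeError(
            "BUCKET_NAME is not set; pass it with: docker run -e BUCKET_NAME=<bucket> ..."
        )
    return name


if __name__ == "__main__":
    # A plain dict stands in for the container environment in this demo.
    print(load_bucket_name({"BUCKET_NAME": "bedrock-chatpdf-123"}))
```

Failing at startup with a clear message is easier to debug than an S3 error that only appears after the first upload.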
How to use the app
The landing page first asks the user to upload a PDF document.
Either drag and drop the file or click the "Browse files" button to load the PDF document, then ask questions based on its content.
Conclusion
This project is a PDF chatbot application that can significantly reduce the time researchers spend reading articles and books in PDF form. Instead of hours of page-by-page reading through material that is irrelevant to the current need, a few prompts and interactive exchanges let users efficiently grasp the content, authorship, summaries and in-depth details of PDF documents.
The app can also serve as a valuable tool for students working with academic articles and literature. By breaking complex texts into manageable pieces, it not only saves time but also encourages critical thinking and the habit of asking relevant questions.