Table of Contents
- Introduction
- Understanding the Problem
- Setting Up the Development Environment
- Building the Chatbot
- Testing and Debugging
- Problems Faced and Conclusions
1. Introduction
1.1 What is a Chatbot?
A chatbot is a type of software that mimics conversations with people. Most chatbots communicate through text, but some can also use voice. They use artificial intelligence (AI) to understand what users are asking and provide answers quickly. This makes chatbots useful for handling routine tasks and giving information efficiently.
The main job of a chatbot is to talk with users. It does this through a messaging platform, which can be as simple as answering straightforward questions or managing more complex conversations. By using natural language processing (NLP), chatbots can understand user questions and provide relevant responses, making interactions smoother and more effective.
1.2 Why build a chatbot?
Building an information chatbot helps people quickly find answers and details they need without waiting or searching for them manually. For example, if you’re looking for information on scholarships, a chatbot can instantly provide the details you need, saving you time and effort. It can handle many questions at once, is available 24/7, and can make finding the right information much easier for everyone.
2. Understanding the Problem
2.1 What Problems Does the Chatbot Solve?
A chatbot helps solve the problem of finding information by making it easier to get answers quickly. Instead of spending a lot of time searching online or waiting for help, users can ask the chatbot their questions and get instant responses. This means users don’t have to search through multiple websites or wait for office hours; the information is available anytime, making it more accessible and convenient for everyone.
2.2 What Should the Chatbot Do?
- Display text queries on the UI
- Display image queries on the UI
- Provide text-based responses from text queries
- Handle image-based queries
- Preview image before and after sending to the UI
3. Setting Up the Development Environment
3.1 Tools and Technologies
- HTML & CSS: Basic web design
- JavaScript: Adding interactivity
- API: Fetching information
- Node.js & Express: Server-side handling
3.2 Prerequisites
- Basic understanding of HTML, CSS, and JavaScript
- A code editor (e.g., Visual Studio Code)
- Web browser (for testing)
3.3 Setting up the environment
- Install Node.js and npm: Make sure you have Node.js installed on your system. If not, download and install it from the official Node.js site. Verify the installation:
node -v
npm -v
- Create a folder and open it in your code editor
- Initialize a Node.js project: this creates a package.json file for managing dependencies.
npm init -y
- Install Required Dependencies: You will need express, axios, dotenv, cors, @google/generative-ai (which also provides the @google/generative-ai/server helpers used later), multer, and body-parser for this setup:
npm install express axios dotenv cors @google/generative-ai multer body-parser
express: This tool helps build a web server that listens for and responds to requests. For example, it manages everything from showing web pages to accepting images or text from users.
axios: This tool is used to make requests to other servers or APIs (like calling another website to get data). It sends and receives data over the internet, making it easy to connect your app to external services.
dotenv: This tool is used to store important secrets (like API keys or passwords) in a hidden file called .env. It helps keep sensitive information safe, so you don't accidentally share it with others.
@google/generative-ai: This package is used to connect with Google’s Gemini AI services. It helps send user inputs (like text) to Google's AI and get back smart, AI-generated responses for your app to use.
@google/generative-ai/server: This entry point of the same package works with Google's AI to handle files like images. It helps upload images to Google's AI for processing and analysis, and then receive useful insights or responses from the AI.
multer: This tool is used to handle file uploads, like when users send images or other files to your server. It saves these files in a specific folder so your server can use them.
body-parser: This tool allows the server to easily understand data (like text or form data) sent from the user’s browser. It helps grab that data and make it usable in the code.
cors: This tool allows your server to accept requests from different websites or apps. Normally, browsers block certain requests for safety, but cors enables you to safely handle requests from other sites.
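To make the last point concrete, here is a rough sketch (not the real middleware, which we simply enable with app.use(cors())) of what cors does behind the scenes: it attaches response headers that tell the browser cross-origin requests are allowed.

```javascript
// Illustrative sketch of cors' core behavior: add the headers that let a
// browser on a different origin talk to this server. The real middleware
// is more configurable; this just shows the idea.
function addCorsHeaders(headers) {
  headers["Access-Control-Allow-Origin"] = "*"; // allow any origin (cors() default)
  headers["Access-Control-Allow-Methods"] = "GET, POST, OPTIONS";
  headers["Access-Control-Allow-Headers"] = "Content-Type";
  return headers;
}

console.log(addCorsHeaders({})["Access-Control-Allow-Origin"]); // "*"
```

Without these headers, the browser would block our frontend (served from one origin) from calling the API on http://localhost:3001.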
- Creating the API Key
- What is an API Key? An API key is like a special password that lets programs talk to each other. It keeps things secure by making sure only allowed users can access a service.
Why use an API key?
An API key is like a password for using a service or app. It keeps things secure by making sure only the right people can access certain features or data. This helps prevent misuse and keeps your information safe. It also helps the service provider see how much the service is being used, so they can manage it better.
Gemini API Key
The Gemini API key is crucial for my chatbot project as it allows the bot to access advanced AI features. This key enables the chatbot to understand and generate responses based on user inputs and uploaded images. By using this API, I can enhance the chatbot's intelligence and provide a better experience for users seeking assistance.
How to create a Gemini API Key
- Go to Google AI Studio. If you don’t have an account, sign up for one.
- Click Get your API key.
- Click the blue button labeled Create API key. You can create a key for a new project or attach it to an existing one; I created one for a new project, since I am working on a new project.
- Once your API key is created, copy it and use it in your project. Remember the tip shown there: use your API key securely.
You should keep your API key secret because it acts like a password for your application. If someone finds it, they could misuse it to access your data or services, leading to security issues or extra costs. Keeping it private helps protect your project and ensures it runs smoothly.
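This is exactly what dotenv gives us: the key lives in a .env file instead of the source code. As a sketch of what dotenv.config() does under the hood (illustrative only; in the project we just call the library), it parses KEY=VALUE lines and exposes them to the app:

```javascript
// Hand-rolled sketch of dotenv's core job: parse KEY=VALUE lines from a
// .env file's text, skipping blanks and comments. The real library also
// handles quoting, multiline values, and copies results onto process.env.
function parseEnv(text) {
  const vars = {};
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed || trimmed.startsWith("#")) continue; // skip blanks and comments
    const eq = trimmed.indexOf("=");
    if (eq === -1) continue; // not a KEY=VALUE line
    vars[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
  }
  return vars;
}

const vars = parseEnv("# secrets\nGEMINI_API_KEY=abc123\n");
console.log(vars.GEMINI_API_KEY); // "abc123"
```

Because the key only ever exists in .env (which you should add to .gitignore), it never ends up in your repository.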
- Creating the Server and Hiding the API Key
- Create a server.js file: this will contain your backend code to handle incoming requests from the chatbot, make calls to the Gemini API, and respond with the generated messages.
- Create the .env file: in the root of your project, create a .env file to store the Gemini API key.
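The original screenshot of the final project structure isn't reproduced here, but based on the paths used throughout this guide, it looks roughly like this (the top-level folder name is just a placeholder):

```
chatbot-project/
├── server.js          # backend: Express server + Gemini calls
├── .env               # GEMINI_API_KEY=... (keep this out of version control)
├── package.json
├── uploads/           # images saved by multer
└── public/
    ├── index.html
    ├── styles/
    │   └── index.css
    ├── scripts/
    │   └── index.js
    └── assets/
        ├── Upload.png
        └── send.png
```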
4. Building the Chatbot
4.1 Designing the Chatbot Interface
- Creating the HTML structure
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="stylesheet" href="styles/index.css">
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.3/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-QWTKZyjpPEjISv5WaRU9OFeRpok6YctnYmDr5pNlyT2bRjXh0JMhjY6hW+ALEwIH" crossorigin="anonymous">
<title>ImageBot</title>
</head>
<body>
<div class="container chatBot">
<h1 class="header">ChatBot</h1>
<div class="chatHistory" id="chatHistory">
<!-- Messages will be appended here -->
</div>
<div class="inputSession" id="inputSession">
<!-- Image preview before sending -->
<div class="imagePreview">
<img id="previewImage">
<input id="textInput" type="text" placeholder="Write your message here...">
</div>
<label for="imageInput"><img src="/assets/Upload.png" width="25px"></label>
<input id="imageInput" accept="image/*" type="file" name="image" hidden>
<button id="btnSend"><img src="/assets/send.png" width="25px"></button>
</div>
</div>
<!-- Modal Structure -->
<div id="imageModal" class="modal">
<span class="close">×</span>
<img class="modal-content" id="modalImage">
</div>
<script src="/scripts/index.js"></script>
</body>
</html>
This HTML page is designed to be a user-friendly chatbot interface where users can send both text and images. The structure starts with a <div>
container, which holds all the chatbot content. The header displays the title "ChatBot" at the top. Below that, there is a chatHistory
section where all previous conversations (messages sent and received) are displayed.
For interacting with the bot, there's an inputSession
section. It contains a field where users can type their message and an option to preview any image they select before sending it. The image upload button is represented by an icon, and users can select an image from their device. After the message or image is ready, they can send it by clicking the "Send" button.
Additionally, a modal (pop-up window) is included, which allows users to view images in a larger format. The structure also links external CSS styles and Bootstrap for easier formatting and design, making the page responsive and visually appealing.
- Styling with CSS
body {
background-color: #f5f5f5;
font-family: 'Arial', sans-serif;
margin: 0;
padding: 0;
display: flex;
justify-content: center;
align-items: center;
height: 100vh;
}
/* ChatBot container */
.container.chatBot {
background-color: #ffffff;
width: 50%;
max-width: 600px;
border-radius: 8px;
box-shadow: 0px 4px 10px rgba(0, 0, 0, 0.1);
display: flex;
flex-direction: column;
justify-content: space-between;
padding: 20px;
position: relative;
}
/* Header styling */
.header {
font-size: 24px;
color: #333;
text-align: center;
margin-bottom: 15px;
}
/* Chat history styling */
.chatHistory {
height: 300px;
overflow-y: auto;
padding: 10px;
background-color: #f1f1f1;
border-radius: 8px;
border: 1px solid #ddd;
margin-bottom: 20px;
}
.chatHistory::-webkit-scrollbar {
width: 8px;
}
.chatHistory::-webkit-scrollbar-thumb {
background-color: #ccc;
border-radius: 4px;
}
/* Input session styling */
.inputSession {
display: flex;
align-items: center;
padding: 10px;
background-color: #f1f1f1;
border-radius: 8px;
border: 1px solid #ddd;
justify-content: space-between;
}
/* Input field styling */
#textInput {
flex-grow: 1;
width: 100%;
padding: 8px;
font-size: 16px;
border: 1px solid #ddd;
border-radius: 4px;
background-color: #fff;
box-shadow: 0px 2px 4px rgba(0, 0, 0, 0.1);
margin-right: 10px;
}
/* Button for sending messages */
#btnSend {
color: #fff;
border: none;
border-radius: 50%;
width: 50px;
height: 50px;
display: flex;
align-items: center;
justify-content: center;
cursor: pointer;
font-size: 20px;
transition: background-color 0.3s;
}
#btnSend:hover {
background-color: #363e47;
}
/* Image preview styling */
.imagePreview {
display: flex;
align-items: center;
flex-grow: 1;
margin-bottom: 10px;
}
#previewImage {
max-width: 80px;
max-height: 80px;
border-radius: 5px;
margin-right: 10px;
object-fit: cover;
}
/* Label for file input */
label[for="imageInput"] {
color: #fff;
border-radius: 50%;
width: 50px;
height: 50px;
display: flex;
align-items: center;
justify-content: center;
font-size: 20px;
cursor: pointer;
margin-right: 10px;
}
label[for="imageInput"]:hover {
background-color: #363e47;
}
/* Styling for user messages */
.userMessage {
display: flex;
align-items: flex-start;
margin-bottom: 10px;
padding: 10px;
background-color: #e9ecef;
border-radius: 8px;
border: 1px solid #ddd;
max-width: 100%;
}
/* Container for image and text */
.messageContent {
display: flex;
flex-direction: column;
align-items: flex-start;
}
/* Styling for images within user messages */
.userMessage img {
max-width: 100px;
max-height: 100px;
border-radius: 5px;
margin-bottom: 5px;
object-fit: cover;
}
/* Styling for text within user messages */
.userMessage .text {
text-align: left;
}
/* Modal styling */
.modal {
display: none;
position: fixed;
z-index: 1000;
left: 0;
top: 0;
width: 100%;
height: 100%;
overflow: auto;
background-color: rgb(0,0,0);
background-color: rgba(0,0,0,0.8);
}
.modal-content {
margin: auto;
display: block;
width: 80%;
max-width: 700px;
}
.close {
position: absolute;
top: 15px;
right: 35px;
color: #f1f1f1;
font-size: 40px;
font-weight: bold;
}
.close:hover,
.close:focus {
color: #bbb;
text-decoration: none;
cursor: pointer;
}
This CSS provides a clean, responsive design for a chatbot interface. It uses flexbox for layout, giving the container, header, and input session flexibility. The styling ensures a user-friendly experience with smooth transitions, scrollable chat history, and a modern look with rounded corners and shadows, enhancing visual appeal and usability.
4.2 Implementing Chatbot Functionalities
4.2.1 Frontend
<div class="inputSession" id="inputSession">
<div class="imagePreview">
<img id="previewImage">
<input id="textInput" type="text" placeholder="Write your message here...">
</div>
<label for="imageInput"><img src="/assets/Upload.png" width="25px"></label>
<input id="imageInput" accept="image/*" type="file" name="image" hidden>
<button id="btnSend"><img src="/assets/send.png" width="35px"></button>
</div>
<div class="chatHistory" id="chatHistory">
<!-- Messages will be appended here -->
</div>
Here the .inputSession
div includes an input field for text (#textInput), an image preview (#previewImage), a hidden file input for images (#imageInput), and a button to send messages (#btnSend). The .chatHistory
div is where chat messages and responses will appear. The image input allows users to select an image file, which is previewed before sending. The text input field lets users type messages, and the send button triggers the process to handle both text and image inputs.
const btnSend = document.getElementById("btnSend");
const imageInput = document.getElementById("imageInput");
const textInput = document.getElementById("textInput");
const previewImage = document.getElementById("previewImage");
const chatHistory = document.getElementById("chatHistory");
// Modal elements
const modal = document.getElementById("imageModal");
const modalImage = document.getElementById("modalImage");
const closeModal = document.querySelector(".close");
// Function to preview the image when selected
imageInput.addEventListener("change", function () {
const file = imageInput.files[0];
if (file) {
const reader = new FileReader();
reader.onload = function (e) {
previewImage.src = e.target.result; // Preview the image
};
reader.readAsDataURL(file); // Read the file as a data URL
}
});
// Function to send the image and text
btnSend.addEventListener("click", async function (e) {
e.preventDefault();
const text = textInput.value.trim();
const file = imageInput.files[0];
// Clear inputs
textInput.value = "";
previewImage.src = "";
imageInput.value = null;
// Append the image and message to the chat immediately on the UI
if (file) {
addMessageToChatHistory(URL.createObjectURL(file), text, "userMessage");
} else if (text) {
addMessageToChatHistory(null, text, "userMessage");
}
// Send the image and text to the backend
if (file || text) {
const formData = new FormData();
if (file) formData.append("image", file);
if (text) formData.append("message", text);
try {
const response = await fetch('http://localhost:3001/api/upload', {
method: 'POST',
body: formData
});
if (!response.ok) {
throw new Error('Failed to send image to the server.');
}
const data = await response.json();
// Display the bot's message (response) based on the image
addMessageToChatHistory(null, data.reply, "botMessage");
} catch (error) {
console.error('Error sending image or text:', error);
addMessageToChatHistory(null, 'Error sending data. Please try again.', "errorMessage");
}
}
});
function addMessageToChatHistory(imageSrc, text, className) {
const messageContainer = document.createElement("div");
messageContainer.classList.add(className);
if (imageSrc) {
const imageElement = document.createElement("img");
imageElement.src = imageSrc;
imageElement.classList.add("previewed-image");
messageContainer.appendChild(imageElement);
}
if (text) {
const textContainer = document.createElement("div");
textContainer.classList.add("text");
textContainer.textContent = text;
messageContainer.appendChild(textContainer);
}
chatHistory.appendChild(messageContainer);
chatHistory.scrollTop = chatHistory.scrollHeight; // Auto-scroll to the bottom of the chat
}
closeModal.addEventListener("click", function (e) {
e.preventDefault();
modal.style.display = "none";
});
window.addEventListener("click", function (e) {
if (e.target === modal) {
modal.style.display = "none";
}
});
textInput.addEventListener("keydown", (e) => {
if (e.key === "Enter") {
btnSend.click();
}
});
- Element Selection
const btnSend = document.getElementById("btnSend");
const imageInput = document.getElementById("imageInput");
const textInput = document.getElementById("textInput");
const previewImage = document.getElementById("previewImage");
const chatHistory = document.getElementById("chatHistory");
This section selects key HTML elements that will be used throughout the script. btnSend
is the button used to send messages, imageInput
is the input field for uploading images, textInput
is where the user types their message, previewImage
shows a preview of the selected image, and chatHistory
is the area where the conversation is displayed.
- Modal Elements
const modal = document.getElementById("imageModal");
const modalImage = document.getElementById("modalImage");
const closeModal = document.querySelector(".close");
These variables handle the modal functionality. The modal displays larger versions of images, with modal representing the modal container, modalImage
for the image inside the modal, and closeModal
for the button to close the modal.
- Image Preview
imageInput.addEventListener("change", function () {
const file = imageInput.files[0];
if (file) {
const reader = new FileReader();
reader.onload = function (e) {
previewImage.src = e.target.result; // Preview the image
};
reader.readAsDataURL(file); // Read the file as a data URL
}
});
When a user selects an image file using imageInput
, this event listener triggers. It uses a FileReader
to read the image file and set the previewImage
source to the result. This allows the user to see a preview of the image before sending it.
- Send Image and Text
btnSend.addEventListener("click", async function (e) {
e.preventDefault();
const text = textInput.value.trim();
const file = imageInput.files[0];
// Clear inputs
textInput.value = "";
previewImage.src = "";
imageInput.value = null;
// Append the image and message to the chat immediately on the UI
if (file) {
addMessageToChatHistory(URL.createObjectURL(file), text, "userMessage");
} else if (text) {
addMessageToChatHistory(null, text, "userMessage");
}
// Send the image and text to the backend
if (file || text) {
const formData = new FormData();
if (file) formData.append("image", file);
if (text) formData.append("message", text);
try {
const response = await fetch('http://localhost:3001/api/upload', {
method: 'POST',
body: formData
});
if (!response.ok) {
throw new Error('Failed to send image to the server.');
}
const data = await response.json();
// Display the bot's message (response) based on the image
addMessageToChatHistory(null, data.reply, "botMessage");
} catch (error) {
console.error('Error sending image or text:', error);
addMessageToChatHistory(null, 'Error sending data. Please try again.', "errorMessage");
}
}
});
This code handles the click event of the btnSend
button. It prevents the default action, retrieves and clears the text and image inputs, then appends the message or image to the chat history immediately. It creates a FormData
object to hold the image and text, which is then sent to the server using the Fetch API. If the server responds successfully, it updates the chat history with the response from the server. If there’s an error, it logs the issue and shows an error message in the chat.
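As a quick aside, FormData simply pairs field names with values, the same way an HTML form would. Node 18+ ships FormData globally, so you can try this outside the browser:

```javascript
// FormData maps field names to values; the server reads the "message"
// field as req.body.message and the "image" field via multer.
const formData = new FormData();
formData.append("message", "Hello bot");
console.log(formData.get("message")); // "Hello bot"
```

When a file is appended alongside text, fetch sends the whole thing as multipart/form-data, which is exactly the format multer parses on the server.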
- Add Message to Chat History
function addMessageToChatHistory(imageSrc, text, className) {
const messageContainer = document.createElement("div");
messageContainer.classList.add(className);
if (imageSrc) {
const imageElement = document.createElement("img");
imageElement.src = imageSrc;
imageElement.classList.add("previewed-image");
messageContainer.appendChild(imageElement);
}
if (text) {
const textContainer = document.createElement("div");
textContainer.classList.add("text");
textContainer.textContent = text;
messageContainer.appendChild(textContainer);
}
chatHistory.appendChild(messageContainer);
chatHistory.scrollTop = chatHistory.scrollHeight; // Auto-scroll to the bottom of the chat
}
This function dynamically adds a message to the chat history. It creates a div container for the message, which can include an image or text. The image is added if imageSrc
is provided, and text is added if text is provided. The chat history is updated with this new message, and the view automatically scrolls to show the latest message.
- Modal Handling
closeModal.addEventListener("click", function (e) {
e.preventDefault();
modal.style.display = "none";
});
window.addEventListener("click", function (e) {
if (e.target === modal) {
modal.style.display = "none";
}
});
These event listeners manage the modal for viewing larger images. Clicking the close button or outside the modal hides it by setting modal.style.display
to "none".
- Send Message on Enter Key Press
textInput.addEventListener("keydown", (e) => {
if (e.key === "Enter") {
btnSend.click();
}
});
This listener allows the user to send a message by pressing the Enter key, mimicking the action of clicking the send button.
4.2.2 Backend (server)
- Importing dependencies
const express = require("express");
const bodyParser = require("body-parser");
const cors = require("cors"); // Enable CORS
const dotenv = require("dotenv");
const multer = require("multer");
const { GoogleGenerativeAI } = require("@google/generative-ai");
const { GoogleAIFileManager } = require("@google/generative-ai/server");
These dependencies were explained above, where we were setting up our environment. You can scroll up to get their uses to better understand why we are using them.
- Configuring Environment Variables
dotenv.config();
const apiKey = process.env.GEMINI_API_KEY;
const genAI = new GoogleGenerativeAI(apiKey);
const fileManager = new GoogleAIFileManager(apiKey);
- dotenv.config(): Loads environment variables from the .env file.
- apiKey: Retrieves the API key from environment variables to authenticate requests to the Google Generative AI API.
- genAI: Initializes the GoogleGenerativeAI instance with the API key.
- fileManager: Initializes the GoogleAIFileManager instance with the same API key for handling file uploads.
Setting Up AI Model and Configuration
const model = genAI.getGenerativeModel({
model: "gemini-1.5-pro",
});
const generationConfig = {
temperature: 1,
topP: 0.95,
topK: 64,
maxOutputTokens: 8192,
responseMimeType: "text/plain",
};
- model: Configures and initializes the generative model (Gemini 1.5 Pro) from Google AI, specifying which model to use for generating responses.
- generationConfig: Defines parameters for generating responses, including temperature (controls creativity), topP and topK (control the diversity of responses), and maxOutputTokens (maximum length of the response).
Configuring Multer for File Uploads
const storage = multer.diskStorage({
destination: (req, file, cb) => {
cb(null, 'uploads/'); // Define where the files should be stored
},
filename: (req, file, cb) => {
cb(null, Date.now() + '-' + file.originalname); // Rename the file to avoid duplicates
}
});
const upload = multer({ storage: storage });
- multer.diskStorage(): Configures how files are stored.
- destination: Specifies the directory (uploads/) where files should be saved.
- filename: Renames the file by prefixing the current timestamp to ensure uniqueness.
- upload: Creates a Multer instance with the defined storage configuration.
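The filename strategy is worth a second look: prefixing the original name with a timestamp keeps two uploads of, say, photo.png from overwriting each other. Pulled out as a standalone function (a sketch for illustration; multer calls the equivalent inline):

```javascript
// Same naming scheme as the multer config above: "<timestamp>-<original name>".
// Passing `now` explicitly makes the function easy to test.
function makeUploadName(originalname, now = Date.now()) {
  return now + "-" + originalname;
}

console.log(makeUploadName("photo.png", 1700000000000)); // "1700000000000-photo.png"
```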
Uploading Files to Gemini
async function uploadToGemini(path, mimeType) {
const uploadResult = await fileManager.uploadFile(path, {
mimeType,
displayName: path,
});
const file = uploadResult.file;
console.log(`Uploaded file ${file.displayName} as: ${file.name}`);
return file;
}
- uploadToGemini(): A function to upload a file to the Google Gemini API.
- fileManager.uploadFile(): Uploads the file to the API and logs the result.
- file: Contains the details of the uploaded file returned from the API.
Configuring Express and Middleware
const app = express();
const port = 3001;
app.use(cors()); // Enable CORS
app.use(bodyParser.json());
app.use(bodyParser.urlencoded({ extended: true }));
// Serve static files (HTML, CSS, JS)
app.use(express.static("public"));
- app: Initializes an Express application.
- port: Sets the port on which the server will listen (3001).
- app.use(cors()): Enables CORS for the server.
- app.use(bodyParser.json()): Parses JSON bodies.
- app.use(bodyParser.urlencoded({ extended: true })): Parses URL-encoded bodies.
- app.use(express.static("public")): Serves static files like HTML, CSS, and JS from the public directory.
API Endpoint for Handling Image and Text
app.post("/api/upload", upload.single("image"), async (req, res, next) => {
try {
const { message } = req.body;
const imagePath = req.file ? req.file.path : null;
let generatedText = "";
if (imagePath) {
const files = [
await uploadToGemini(imagePath, "image/jpeg"),
];
const chatSession = model.startChat({
generationConfig,
history: [
{
role: "user",
parts: [
{
fileData: {
mimeType: files[0].mimeType,
fileUri: files[0].uri,
},
},
],
},
],
});
const result = await chatSession.sendMessage(message);
res.status(200).json({ reply: result.response.text() });
next(message);
}
if (message) {
const chatSession = model.startChat({
generationConfig,
history: [],
});
const result = await chatSession.sendMessage(message);
const aiResponse = result.response.text();
res.status(200).json({ reply: aiResponse });
}
if (!imagePath && !message) {
return res.status(400).json({ error: "No image or text provided" });
}
res.json({ reply: generatedText });
} catch (error) {
console.error("Error processing image or text:", error);
res.status(500).json({ error: "Internal Server Error" });
}
});
- app.post("/api/upload"): Defines a POST endpoint for handling file and text uploads.
- upload.single("image"): Middleware to handle single file upload (named image).
- req.body: Contains the text message.
- req.file: Contains the uploaded image file.
- uploadToGemini(): Uploads the image to the Gemini API.
- model.startChat(): Starts a chat session with the model.
- chatSession.sendMessage(message): Sends the message (and image if provided) to the model.
- res.status(200).json({ reply: result.response.text() }): Sends the generated response back to the client.
- res.status(400): Handles cases where neither image nor text is provided.
- res.status(500): Handles server errors.
Starting the Server
app.listen(port, () => {
console.log(`Server running at http://localhost:${port}`);
});
- app.listen(port): Starts the server and listens on the specified port (3001).
- console.log: Confirms that the server is running and accessible at http://localhost:3001.
To wrap up, let’s go through the API endpoint for handling image and text in detail, from start to finish.
The endpoint for handling image and text uploads is defined with app.post("/api/upload", upload.single("image"), async (req, res) => { ... }). This code sets up a POST request handler at the /api/upload route. The upload.single("image") middleware, provided by Multer, is used to handle file uploads. It processes a single file upload where the form field name is image.
When a request is made to this endpoint, the middleware extracts the uploaded file from the request and saves it to a predefined location on the server. If a file is uploaded, its path can be accessed via req.file.path, and any accompanying text sent in the form is available through req.body.message.
The handler first checks if an image file was provided by examining req.file
. If an image is present, the server uploads this file to the Google Gemini API using the uploadToGemini()
function. This function takes the file path and MIME type as arguments, uploads the file, and returns a file object containing details like the file's URI. This uploaded file’s details are then used to create a new chat session with the generative AI model. The image is sent as part of the chat history to the model, which is configured with parameters defined in generationConfig
. The model processes the image and any accompanying message, generating a response. The response text is then sent back to the client with a status of 200, indicating successful processing.
If no image is provided but a text message is included, the server handles this by starting a new chat session with the AI model using just the text message. The text is sent to the AI model, and its response is sent back to the client in the same manner.
In cases where neither an image nor a text message is provided, the server responds with a 400 status code and an error message indicating that neither image nor text was provided.
If an error occurs during any part of the process, such as while uploading the file or communicating with the AI model, the server catches this error and responds with a 500 status code, indicating an internal server error. This approach ensures that the server robustly handles various scenarios involving text and image inputs, providing appropriate feedback and responses to the client.
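The three branches described above (image present, text only, neither) can be sketched as a small routing function — decideBranch is a hypothetical illustration of the endpoint's decision logic, not part of the server code:

```javascript
// Hypothetical illustration of the endpoint's decision logic:
// which branch runs, and which HTTP status the client should expect.
function decideBranch(hasImage, message) {
  if (!hasImage && !message) {
    return { branch: "error", status: 400 }; // nothing to process
  }
  if (hasImage) {
    return { branch: "image", status: 200 }; // upload to Gemini, include in chat history
  }
  return { branch: "text", status: 200 };    // text-only chat session
}

console.log(decideBranch(true, "What is in this photo?")); // image branch
console.log(decideBranch(false, "Hello"));                 // text branch
console.log(decideBranch(false, ""));                      // 400 error
```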
const express = require("express");
const bodyParser = require("body-parser");
const cors = require("cors"); // Enable CORS
const dotenv = require("dotenv");
const multer = require("multer");
const { GoogleGenerativeAI } = require("@google/generative-ai");
const { GoogleAIFileManager } = require("@google/generative-ai/server");
dotenv.config();
const apiKey = process.env.GEMINI_API_KEY;
const genAI = new GoogleGenerativeAI(apiKey);
const fileManager = new GoogleAIFileManager(apiKey);
const model = genAI.getGenerativeModel({
  model: "gemini-1.5-pro",
});
const generationConfig = {
  temperature: 1,
  topP: 0.95,
  topK: 64,
  maxOutputTokens: 8192,
  responseMimeType: "text/plain",
};
const app = express();
const port = 3001;
// Setup multer for file uploads
const storage = multer.diskStorage({
  destination: (req, file, cb) => {
    cb(null, 'uploads/'); // Define where the files should be stored
  },
  filename: (req, file, cb) => {
    cb(null, Date.now() + '-' + file.originalname); // Rename the file to avoid duplicates
  },
});
const upload = multer({ storage: storage });
/**
* Uploads the given file to Gemini.
*
* See https://ai.google.dev/gemini-api/docs/prompting_with_media
*/
async function uploadToGemini(path, mimeType) {
  const uploadResult = await fileManager.uploadFile(path, {
    mimeType,
    displayName: path,
  });
  const file = uploadResult.file;
  console.log(`Uploaded file ${file.displayName} as: ${file.name}`);
  return file;
}
app.use(cors()); // Enable CORS
app.use(bodyParser.json());
app.use(bodyParser.urlencoded({ extended: true }));
// Serve static files (HTML, CSS, JS)
app.use(express.static("public"));
// API endpoint for generating a response
app.post("/api/upload", upload.single("image"), async (req, res) => {
  try {
    const { message } = req.body; // The accompanying text (if any)
    const imagePath = req.file ? req.file.path : null; // The image (if any)
    // If neither an image nor a text message is provided, return an error
    if (!imagePath && !message) {
      return res.status(400).json({ error: "No image or text provided" });
    }
    // If an image is provided, upload it to Gemini and include it in the chat history
    if (imagePath) {
      // Use the MIME type reported by Multer rather than assuming JPEG
      const file = await uploadToGemini(imagePath, req.file.mimetype);
      const chatSession = model.startChat({
        generationConfig,
        history: [
          {
            role: "user",
            parts: [
              {
                fileData: {
                  mimeType: file.mimeType,
                  fileUri: file.uri,
                },
              },
            ],
          },
        ],
      });
      // Fall back to a generic prompt if no text accompanied the image
      const result = await chatSession.sendMessage(message || "Describe this image.");
      return res.status(200).json({ reply: result.response.text() });
    }
    // Otherwise, handle the text-only message with a fresh chat session
    const chatSession = model.startChat({
      generationConfig,
      history: [],
    });
    const result = await chatSession.sendMessage(message);
    return res.status(200).json({ reply: result.response.text() });
  } catch (error) {
    console.error("Error processing image or text:", error);
    res.status(500).json({ error: "Internal Server Error" });
  }
});
// Start the server
app.listen(port, () => {
  console.log(`Server running at http://localhost:${port}`);
});
Removing markdown
In the chat application, I used the Marked library to convert Markdown text into HTML for bot messages by including <script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>. When the bot sends a message, the code checks the message's class name and applies textContainer.innerHTML = marked.parse(text); to render the Markdown as HTML. For user messages, I used textContainer.textContent = text; to display plain text, ensuring clarity in interactions.
- Markdown Text
- After markdown has been removed
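The bot/user split described above can be expressed as a small dispatch helper. A minimal sketch, assuming a parser with marked.parse's signature is passed in (renderMessage and fakeParse are illustrative, not the app's actual code):

```javascript
// Hypothetical sketch: bot messages go through a Markdown parser and are
// assigned via innerHTML; user messages stay plain text (textContent),
// which also avoids rendering any HTML a user might type.
function renderMessage(text, isBot, parseMarkdown) {
  if (isBot) {
    return { property: "innerHTML", value: parseMarkdown(text) };
  }
  return { property: "textContent", value: text };
}

// A trivial stand-in for marked.parse, handling only **bold**:
const fakeParse = (md) => md.replace(/\*\*(.+?)\*\*/g, "<strong>$1</strong>");

console.log(renderMessage("**Hi!**", true, fakeParse));    // innerHTML: "<strong>Hi!</strong>"
console.log(renderMessage("<b>hi</b>", false, fakeParse)); // textContent: kept as plain text
```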
5. Testing and Debugging
Testing and debugging are critical steps in ensuring the functionality and reliability of the chatbot application. Testing involves verifying that all features work as intended. This includes checking edge cases and error handling, like server failures or incorrect data submissions. Debugging focuses on identifying and fixing issues that arise during testing, ensuring smooth operation and user experience. Continuous testing and debugging help maintain the application's robustness and user satisfaction.
While building the chatbot, I faced many errors that had to be debugged before it could function well.
5.1 Text queries
5.2 Image queries
6. Problems faced and conclusions
6.1 Some difficulties Faced
I encountered several bugs during development. Some were minor, but others took significant time to resolve, especially debugging responses from the API.
Another issue was that this was my first time working with the Gemini API. The unfamiliarity led to a learning curve and caused delays in progress.
Another challenge was electricity outages, which interrupted my workflow and extended the project timeline.
One other problem was that the API sometimes failed to give responses. After thorough troubleshooting and seeking help, I was able to resolve it.
Also, understanding the documentation for the API was challenging, as it required piecing together various concepts I hadn’t worked with before.
6.2 Conclusion
Working on this project taught me valuable lessons:
Gemini API Integration: I developed skills in API integration, particularly using the Gemini API for generating responses based on inputs.
Problem-solving: I learned how to systematically debug and troubleshoot issues, improving my resilience in overcoming project obstacles.
Time Management: The delays caused by power outages and bugs helped me practice time management and adaptability under pressure.
Collaborating for Solutions: Reaching out for help when needed and learning from others was an important takeaway in this project.
Practical Experience: The hands-on experience with API and front-end integration improved my proficiency in JavaScript.