Bhimashankar Patil

Posted on Nov 28, 2024

Building a Simple Chatbot with Llama2 [Chat with Excel]

#python #nlp #llm #datascience

In this post, I’ll explain how I built a chatbot using the Llama2 model to query Excel data intelligently.

What We’re Building

Loads an Excel file.
Splits the data into manageable chunks.
Stores the data in a vector database for fast retrieval.
Use a local Llama2 model to answer questions based on the content of the Excel file.

Prerequisites:

Python (≥ 3.8)
Libraries: langchain, pandas, unstructured, Chroma

Step 1: Install Dependencies

%pip install -q unstructured langchain
%pip install -q "unstructured[all-docs]"

Step 2: Load the Excel File

import pandas as pd

excel_path = "Book2.xlsx"
if excel_path:
    df = pd.read_excel(excel_path)
    data = df.to_string(index=False)
else:
    print("Upload an Excel file")

Step 3: Chunk the Data and Store in a Vector Database

Large text data is split into smaller, overlapping chunks for effective embedding and querying. These chunks are stored in a Chroma vector database.

from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

text_splitter = RecursiveCharacterTextSplitter(chunk_size=7500, chunk_overlap=100)
chunks = text_splitter.split_text(data)

embedding_model = OllamaEmbeddings(model="nomic-embed-text", show_progress=False)
vector_db = Chroma.from_texts(
    texts=chunks, 
    embedding=embedding_model,
    collection_name="local-rag"
)

Step 4: Initialize the Llama2 Model

We use ChatOllama to load the Llama2 model locally.

from langchain_community.chat_models import ChatOllama

local_model = "llama2"
llm = ChatOllama(model=local_model)

Step 5: Create a Query Prompt

The chatbot will respond based on specific column names from the Excel file. We create a prompt template to guide the model

from langchain.prompts import PromptTemplate

QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI assistant. Answer the user's questions based on the column names: 
    Id, order_id, name, sales, refund, and status. Original question: {question}"""
)

Step 6: Set Up the Retriever

We configure a retriever to fetch relevant chunks from the vector database, which will be used by the Llama2 model to answer questions.

from langchain.retrievers.multi_query import MultiQueryRetriever

retriever = MultiQueryRetriever.from_llm(
    vector_db.as_retriever(), 
    llm,
    prompt=QUERY_PROMPT
)

Step 7: Build the Response Chain

The response chain integrates:

A retriever to fetch context.
A prompt to format the question and context.
The Llama2 model to generate answers.
An output parser to format the response.

from langchain.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

template = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

Step 8: Ask a Question

Now we’re ready to ask a question! Here’s how we invoke the chain to get a response:

raw_result = chain.invoke("How many rows are there?")
final_result = f"{raw_result}\n\nIf you have more questions, feel free to ask!"
print(final_result)

Sample Output

When I ran the above code on a sample Excel file, here’s what I got:

Based on the provided context, there are 10 rows in the table.
If you have more questions, feel free to ask!

Conclusion:

This approach leverages the power of embeddings and the Llama2 model to create a smart, interactive chatbot for Excel data. With some tweaks, you can extend this to work with other types of documents or integrate it into a full-fledged app!

Check working example with UI on my LinkedIn:

Introducing BChat Excel: A Conversational AI-Powered Tool for Excel File Interactions

Top comments (1)

Winzod AI • Nov 28 '24

Hey folks, came across this post and thought it might be helpful for you! Rag Evaluation Metrics.

DEV Community