GitHub repo and link to the live app at the end of the article.
Let’s build a web application to chat with any website using the Exa Neural Search API and OpenAI GPT-3.5 turbo within a Streamlit front-end.
Creating the web retrieval function
Since we want to run RAG on the whole website, not just a single page, this step is a bit tricky. The hack we came up with is to use the Exa search API and constrain it to the domain of the website we want to chat with.
After getting a free API key on their website, we can build our retrieval function:
pip install exa-py loguru
from typing import Dict, List, Optional, Tuple

from exa_py import Exa
from loguru import logger

exa = Exa("EXA_API_KEY")  # replace with your Exa API key


def get_text_chunks(
    query: str,
    url: str,
    num_sentences: int = 15,
    highlights_per_url: int = 5,
) -> Tuple[List[str], List[str]]:
    """
    Return a list of text chunks from the given URL that are relevant to the query.
    """
    highlights_options = {
        "num_sentences": num_sentences,  # how long our highlights should be
        "highlights_per_url": highlights_per_url,
    }
    search_response = exa.search_and_contents(
        query,
        highlights=highlights_options,
        num_results=10,
        use_autoprompt=True,
        include_domains=[url],
    )
    # Keep the best highlight of each result, and deduplicate the source URLs
    chunks = [sr.highlights[0] for sr in search_response.results]
    url_sources = list(set([sr.url for sr in search_response.results]))
    return chunks, url_sources
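For example, we can now retrieve chunks for a query (the query and domain below are placeholders):

chunks, url_sources = get_text_chunks(
    query="What is this website about?",
    url="streamlit.io",
)
print(f"Retrieved {len(chunks)} chunks from {len(url_sources)} pages")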
Building the prompt from the content chunks
To generate an answer to the user query, we will pass the content chunks retrieved from the website to GPT-3.5 turbo as context.
We use the LlamaIndex RAG prompt template (available here) to build our prompt:
def generate_prompt_from_chunks(chunks: List[str], query: str) -> str:
    """
    Generate a prompt from the given chunks and question.
    TODO: add a check on token length to avoid exceeding the max token length of the model.
    """
    assert len(chunks) > 0, "Chunks should not be empty"
    concatenated_chunks = ""
    for chunk in chunks:
        concatenated_chunks += chunk + "\n\n"
    prompt = f"""
Context information is below.
---------------------
{concatenated_chunks}
---------------------
Given the context information and not prior knowledge, answer the query.
Do not start your answer with something like: Based on the provided context information...
Query: {query}
Answer:
"""
    return prompt
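The TODO in the docstring can be addressed with a rough token count. Here is a minimal sketch, assuming the tiktoken library and a hypothetical truncate_chunks helper that is not part of the original code:

pip install tiktoken

import tiktoken

def truncate_chunks(
    chunks: List[str], model_name: str = "gpt-3.5-turbo", max_tokens: int = 12000
) -> List[str]:
    """
    Drop trailing chunks once the running token count exceeds the budget,
    leaving room for the rest of the prompt and the completion.
    """
    encoding = tiktoken.encoding_for_model(model_name)
    kept, total = [], 0
    for chunk in chunks:
        n_tokens = len(encoding.encode(chunk))
        if total + n_tokens > max_tokens:
            break
        kept.append(chunk)
        total += n_tokens
    return kept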
Using GPT-3.5 turbo to generate an answer with the content chunks
Then, we can invoke GPT-3.5 turbo on this prompt (optionally with the previous messages):
pip install openai
import config  # local module holding the API keys

from openai import OpenAI

openai_client = OpenAI(api_key=config.OPENAI_API_KEY)


def invoke_llm(
    prompt: str,
    model_name: str = "gpt-3.5-turbo",
    previous_messages: Optional[List[Dict[str, str]]] = None,
) -> str:
    """
    Invoke the language model with the given prompt and return the response.
    """
    if previous_messages is None:
        previous_messages = []
    completion = openai_client.chat.completions.create(
        model=model_name,
        messages=[
            {
                "role": "system",
                "content": "You are a helpful assistant replying to questions given a context.",
            }
        ]
        + previous_messages
        + [
            {"role": "user", "content": prompt},
        ],
        temperature=0.0,
    )
    return completion.choices[0].message.content
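For example, passing the chat history in the OpenAI message format (the values here are illustrative):

answer = invoke_llm(
    prompt="Context information is below...\nQuery: What is this website about?\nAnswer:",
    previous_messages=[
        {"role": "user", "content": "Hi!"},
        {"role": "assistant", "content": "Hello! Ask me anything about this website."},
    ],
)
print(answer)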
Now, we have our answering function ready to go:
def query2answer(
    query: str, url: str, session_messages: List[Dict[str, str]]
) -> Tuple[str, List[str]]:
    """
    Given a query and a URL, return the answer to the query.
    """
    try:
        logger.info(f"Query: {query}")
        chunks, url_sources = get_text_chunks(query, url)
        logger.info(f"Retrieved {len(chunks)} chunks from {url}")
        prompt = generate_prompt_from_chunks(chunks, query)
        # TODO: add a check on token length to avoid exceeding the max token length of the model.
        llm_answer = invoke_llm(prompt, previous_messages=session_messages)
        logger.info(f"Answer: {llm_answer}")
    except Exception as e:
        logger.error(f"An error occurred: {e}")
        llm_answer = "Sorry, I was not able to answer. Either you entered a wrong URL or the URL is too new."
        url_sources = []
    return llm_answer, url_sources
Using Streamlit to build the front-end
With Streamlit, we can easily build a front-end in Python for our agent:
pip install streamlit
import time
from urllib.parse import urlparse

import streamlit as st

import config
from agent import query2answer

# Initialize the URL: check the query parameters for a URL
if "url" in st.query_params:
    # Check it is not None
    if st.query_params.url and st.query_params.url != "None":
        st.session_state.url = st.query_params.url
if "url" not in st.session_state:
    st.session_state.url = None

# Initialize chat history
if "messages" not in st.session_state:
    st.session_state.messages = []

st.markdown("# 📖 url2chat - Chat with any website")

ROLE_TO_AVATAR = {
    "user": "🦸♂️",
    "assistant": "📖",
}

if st.session_state.url is None:
    url = st.text_input("Enter the URL of a website to chat with it")
    if url:
        # Format checks
        if not url.startswith("http"):
            url = "https://" + url
        # Parse the URL to only keep the domain
        o = urlparse(url)
        domain = o.hostname
        st.session_state.url = f"https://{domain}"
        # Set the URL as a query parameter to trigger a rerun
        st.query_params.url = f"https://{domain}"
        # Trigger a rerun to start chatting
        time.sleep(0.5)
        st.rerun()
else:
    # Add the URL as a query parameter (the rerun will remove it from the URL bar)
    st.query_params.url = st.session_state.url
    # Buttons to change the URL and clear the chat
    col1, col2 = st.columns([1, 1])
    with col1:
        if st.button("Change URL", use_container_width=True):
            st.session_state.url = None
            st.query_params.pop("url", None)
            st.session_state.messages = []
            # We need a small delay, otherwise the query parameter is not removed before the rerun
            time.sleep(0.5)
            st.rerun()
    with col2:
        if st.button("Clear chat", use_container_width=True):
            st.session_state.messages = []
            st.rerun()

    with st.chat_message("assistant", avatar=ROLE_TO_AVATAR["assistant"]):
        st.markdown(f"You're chatting with {st.session_state.url}. Ask me anything! 📖")

    # Display chat messages from history on app rerun
    for message in st.session_state.messages:
        with st.chat_message(message["role"], avatar=ROLE_TO_AVATAR[message["role"]]):
            st.markdown(message["content"])

    # Accept user input
    if prompt := st.chat_input("What is this website about?"):
        # Add user message to chat history
        st.session_state.messages.append({"role": "user", "content": prompt})
        # Display user message in chat message container
        with st.chat_message("user", avatar=ROLE_TO_AVATAR["user"]):
            st.markdown(prompt)
        # Display assistant response in chat message container
        chat_answer, url_sources = query2answer(
            prompt, st.session_state.url, st.session_state.messages
        )
        with st.chat_message("assistant", avatar=ROLE_TO_AVATAR["assistant"]):
            st.markdown(chat_answer)
            # Display the sources in a collapsed expander
            with st.expander("Sources", expanded=False):
                for source in url_sources:
                    st.markdown("- " + source)
        st.session_state.messages.append({"role": "assistant", "content": chat_answer})
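You can then run the app locally (assuming the file is named app.py):

streamlit run app.py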
Adding text analytics to understand how our app performs and gather user feedback
Now that our app is working, let’s see how people are using it and how it is performing. To do so, we will use phospho, an open-source text analytics solution. In this example, we will use the free trial of the hosted version, but you can self-host it (see the GitHub repo for more info on how to do so).
First, we need to get our phospho project id and API key and add them to our .streamlit/secrets.toml file:
PHOSPHO_API_KEY=""
PHOSPHO_PROJECT_ID=""
pip install --upgrade phospho
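Since our snippets read the keys through a config module, here is a minimal sketch of what config.py could look like, loading the values from Streamlit secrets (this layout is our assumption; the actual repo may differ):

# config.py - minimal sketch, reads the keys from .streamlit/secrets.toml
import streamlit as st

PHOSPHO_API_KEY = st.secrets.get("PHOSPHO_API_KEY", "")
PHOSPHO_PROJECT_ID = st.secrets.get("PHOSPHO_PROJECT_ID", "")
OPENAI_API_KEY = st.secrets.get("OPENAI_API_KEY", "")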
Then, in our Streamlit file, we can start logging messages:
import phospho

# Pass the keys explicitly since we store them in Streamlit secrets rather than env vars
phospho.init(api_key=config.PHOSPHO_API_KEY, project_id=config.PHOSPHO_PROJECT_ID)

# ...

phospho.log(
    input=prompt,
    output=chat_answer,
    metadata={"sources": url_sources},
)
phospho enables us to handle sessions, so let’s add session support (see the full file on GitHub).
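Here is a minimal sketch, assuming we generate the session id ourselves with uuid and pass it to phospho.log (the actual repo may rely on phospho’s own session helpers instead):

import uuid

# Create one session id per Streamlit session
if "session_id" not in st.session_state:
    st.session_state.session_id = str(uuid.uuid4())

# ...

phospho.log(
    input=prompt,
    output=chat_answer,
    session_id=st.session_state.session_id,  # groups messages into a session in phospho
    metadata={"sources": url_sources},
)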
Let’s also handle feedback from our users (this relies on the session id set up above):
pip install streamlit_feedback
from streamlit_feedback import streamlit_feedback

# ...

# Add a feedback button
def _submit_feedback(feedback: dict):
    # Only send the feedback if phospho is set up
    if config.PHOSPHO_API_KEY and config.PHOSPHO_PROJECT_ID:
        phospho.user_feedback(
            task_id=phospho.latest_task_id,
            raw_flag=feedback["score"],
            notes=feedback["text"],
        )
        st.toast("Thank you for your feedback!")
    else:
        st.toast("phospho is not set up, feedback not sent.")


if len(st.session_state.messages) > 1:
    feedback = streamlit_feedback(
        feedback_type="thumbs",
        optional_text_label="[Optional] Please provide an explanation",
        on_submit=_submit_feedback,
        # To create a new feedback component for every message and session, provide a unique key
        key=f"{st.session_state.session_id}_{len(st.session_state.messages)}",
    )
Now, we can use phospho to detect some events of interest:
- when the assistant answers that it doesn’t have the information
- when the user wants to take an action (for instance, buying a good or a service)
Conclusion
In this article, we’ve taken a deep dive into how to build a sophisticated web application, url2chat, that enables users to chat with any website. Leveraging the Exa Neural Search API, OpenAI GPT-3.5 turbo, and Streamlit, we created a system that extracts relevant information from entire websites, generates context-aware responses, and presents it all within a user-friendly interface.
Possible improvements
According to the data we collected using phospho, the user experience on our app isn’t meeting our quality standard. Some possible improvements are:
- not using a RAG search API, but passing the whole website in the LLM context window (only suitable for small websites)
- using the sitemap to find pages relevant to the query, and then passing these pages to the LLM (sketched below)
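As an illustration of the second idea, here is a rough sketch of fetching a sitemap (the helper name and the requests dependency are our assumptions, not part of the original code):

pip install requests

from typing import List
import xml.etree.ElementTree as ET

import requests

def get_sitemap_urls(domain: str) -> List[str]:
    """Fetch /sitemap.xml and return the page URLs it lists."""
    response = requests.get(f"https://{domain}/sitemap.xml", timeout=10)
    response.raise_for_status()
    root = ET.fromstring(response.content)
    # Sitemap entries are <url><loc>...</loc></url> in the sitemap namespace
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    return [loc.text for loc in root.findall(".//sm:loc", ns)]

We could then rank these URLs against the query (for instance with embeddings) and fetch only the most relevant pages.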
Want to test it?
Clone the GitHub repo here and run it locally, or use the version deployed on Streamlit Community Cloud here.