Abstract
This article demonstrates how to integrate LlamaIndex's MongoDBAtlasVectorSearch with SingleStore Kai, SingleStore's MongoDB-compatible API. A simple test shows that LlamaIndex and Kai work together, providing a foundation for AI-driven applications.
The notebook file used in this article is available on GitHub.
Introduction
SingleStore Kai is a MongoDB-compatible API powered by SingleStore's distributed database engine, enabling developers to integrate MongoDB workflows with a scalable database system. LlamaIndex supports MongoDBAtlasVectorSearch for advanced search and retrieval. This article tests MongoDBAtlasVectorSearch with SingleStore Kai.
Create a SingleStore Cloud account
A previous article showed the steps to create a free SingleStore Cloud account. We'll use the Standard Tier and take the default names for the Workspace Group and Workspace. We'll also enable SingleStore Kai.
We'll store our OpenAI API Key in the secrets vault using the name OPENAI_API_KEY.
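As an aside, the pattern of reading a secret and exporting it as an environment variable can be sketched as follows. Here, get_secret is a stand-in for the helper that the SingleStore notebook environment provides, and the key value is a fake placeholder:

```python
import os

def get_secret(name):
    """Stand-in for the SingleStore notebook's get_secret helper,
    which reads the named value from the secrets vault."""
    vault = {"OPENAI_API_KEY": "sk-example"}  # illustrative value only
    return vault[name]

# Export the secret so that libraries reading OPENAI_API_KEY can find it.
os.environ["OPENAI_API_KEY"] = get_secret("OPENAI_API_KEY")
print(os.environ["OPENAI_API_KEY"])  # sk-example
```

In the real notebook, get_secret is already defined, so only the final assignment is needed.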
Import the notebook
We'll download the notebook from GitHub.
From the left navigation pane in the SingleStore cloud portal, we'll select DEVELOP > Data Studio.
In the top right of the web page, we'll select New Notebook > Import From File. We'll use the wizard to locate and import the notebook we downloaded from GitHub.
Run the notebook
The notebook is adapted from the LlamaIndex GitHub repo.
We'll first download some data to use:
!mkdir -p 'data/10k/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf' -O 'data/10k/uber_2021.pdf'
Next, we'll ensure that the OpenAI API Key is available and define the LLM and embedding models, as follows:
import os

from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# get_secret is provided by the SingleStore notebook environment
os.environ["OPENAI_API_KEY"] = get_secret("OPENAI_API_KEY")

llm = OpenAI(
    model = "gpt-4o-mini"
)

embed_model = OpenAIEmbedding(
    model = "text-embedding-3-small"
)
We'll get the connection to Kai using connection_url_kai, an environment variable that already points to the Kai instance. We'll then set the database and collection names, create the vector index, and store the previously downloaded data, as follows:
import pymongo
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch

kai_client = pymongo.MongoClient(connection_url_kai)
db = kai_client["default_db"]
collection = db["default_collection"]

# Create a Kai vector index on the embedding field
collection.create_index(
    [("embedding", "vector")],
    name = "vector_index",
    kaiIndexOptions = {
        "index_type": "AUTO",
        "metric_type": "DOT_PRODUCT",
        "dimensions": 1536
    }
)

store = MongoDBAtlasVectorSearch(kai_client)
storage_context = StorageContext.from_defaults(vector_store = store)

# Load the PDF and build the index, storing the embeddings in Kai
uber_docs = SimpleDirectoryReader(
    input_files = ["./data/10k/uber_2021.pdf"]
).load_data()

index = VectorStoreIndex.from_documents(
    uber_docs, storage_context = storage_context, embed_model = embed_model
)
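As an aside, the DOT_PRODUCT metric configured in the vector index ranks documents by the dot product of their embedding with the query embedding. A minimal pure-Python illustration, using toy 3-dimensional vectors in place of the 1536-dimensional OpenAI embeddings the index actually stores:

```python
def dot_product(a, b):
    """Score two embedding vectors by their dot product."""
    return sum(x * y for x, y in zip(a, b))

query = [0.1, 0.9, 0.0]
docs = {
    "doc_a": [0.1, 0.8, 0.1],   # similar direction to the query
    "doc_b": [0.9, 0.0, 0.1],   # mostly orthogonal to the query
}

# Rank documents by descending dot product, as the index would.
ranked = sorted(docs, key=lambda d: dot_product(query, docs[d]), reverse=True)
print(ranked)  # ['doc_a', 'doc_b']
```

The higher-scoring document points in roughly the same direction as the query vector, which is why embedding models make semantically similar texts score highly under this metric.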
We'll now ask a question:
from IPython.display import Markdown, display

response = index.as_query_engine(llm = llm).query("What was Uber's revenue?")
display(Markdown(f"<b>{response}</b>"))
Example output:
Uber's revenue for the year ended December 31, 2021, was $17.455 billion.
The following code checks how many documents are stored in the database, identifies a specific document using its ID (ref_doc_id), and then deletes it. Before and after the deletion, it prints the number of documents in the database to show the change.
from llama_index.core.base.response.schema import Response

# Total number of documents before deletion
print(store._collection.count_documents({}))

typed_response = (
    response if isinstance(response, Response) else response.get_response()
)
ref_doc_id = typed_response.source_nodes[0].node.ref_doc_id

# Number of documents belonging to the source document
print(store._collection.count_documents({"metadata.ref_doc_id": ref_doc_id}))

if ref_doc_id:
    store.delete(ref_doc_id)
    # Total number of documents after deletion
    print(store._collection.count_documents({}))
Example output:
395
1
394
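The {"metadata.ref_doc_id": ref_doc_id} filter above uses Mongo-style dot notation to reach into a nested document field. A minimal pure-Python sketch of how such a filter resolves (illustrative only, not pymongo; the document shape and field names are assumptions about what the vector store writes):

```python
def matches(document, field_path, value):
    """Resolve a Mongo-style dotted field path against a nested dict
    and compare the resolved value, as a filter like
    {"metadata.ref_doc_id": ...} does."""
    current = document
    for key in field_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return False
        current = current[key]
    return current == value

# A toy document roughly shaped like the nodes the vector store writes
doc = {"text": "Uber 10-K excerpt", "metadata": {"ref_doc_id": "abc-123"}}

print(matches(doc, "metadata.ref_doc_id", "abc-123"))  # True
print(matches(doc, "metadata.ref_doc_id", "xyz-999"))  # False
```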
Summary
The results from this quick test appear promising. Further tests are needed to determine the complete level of compatibility between LlamaIndex and Kai.