Akmal Chaudhri for SingleStore

Quick tip: Replace MongoDB® Atlas with SingleStore Kai in LlamaIndex

Abstract

This article demonstrates how to use LlamaIndex's MongoDBAtlasVectorSearch with SingleStore Kai, a MongoDB-compatible API. It walks through a simple test in which LlamaIndex stores, queries and deletes data through Kai, providing a starting point for AI-driven applications.

The notebook file used in this article is available on GitHub.

Introduction

SingleStore Kai is a MongoDB-compatible API powered by SingleStore's distributed database engine, enabling developers to run MongoDB workflows against a scalable database system. LlamaIndex provides MongoDBAtlasVectorSearch, a vector store integration for search and retrieval. This article tests MongoDBAtlasVectorSearch with SingleStore Kai.

Create a SingleStore Cloud account

A previous article showed the steps to create a free SingleStore Cloud account. We'll use the Standard Tier and take the default names for the Workspace Group and Workspace. We'll also enable SingleStore Kai.

We'll store our OpenAI API Key in the secrets vault under the name OPENAI_API_KEY.

Import the notebook

We'll download the notebook from GitHub.

From the left navigation pane in the SingleStore Cloud portal, we'll select DEVELOP > Data Studio.

In the top right of the web page, we'll select New Notebook > Import From File. We'll use the wizard to locate and import the notebook we downloaded from GitHub.

Run the notebook

The notebook is adapted from the LlamaIndex GitHub repo.

We'll first download some data to use:

!mkdir -p 'data/10k/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf' -O 'data/10k/uber_2021.pdf'
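
A failed download can leave an empty or incomplete file behind, so it may be worth confirming that the file really is a PDF before indexing it. The following is a minimal sketch using only the Python standard library. The helper name looks_like_pdf is hypothetical and is not part of the original notebook.

```python
from pathlib import Path


def looks_like_pdf(path: str) -> bool:
    """Return True if the file exists and starts with the PDF magic bytes."""
    file_path = Path(path)
    if not file_path.is_file():
        return False
    with file_path.open("rb") as handle:
        # PDF files begin with the ASCII marker "%PDF-".
        return handle.read(5) == b"%PDF-"


# Check the file that the wget command above is expected to create.
print(looks_like_pdf("data/10k/uber_2021.pdf"))
```

If the check prints False, re-running the download is cheaper than debugging an empty index later.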

Next, we'll ensure that the OpenAI API Key is available and define the LLM and embedding models, as follows:

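import os

from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# get_secret() is expected to be provided by the SingleStore notebook environment.
os.environ["OPENAI_API_KEY"] = get_secret("OPENAI_API_KEY")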

llm = OpenAI(
    model = "gpt-4o-mini"
)

embed_model = OpenAIEmbedding(
    model = "text-embedding-3-small"
)
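
Outside a SingleStore notebook, get_secret may not be available. As a hedged sketch, the hypothetical helper below (require_env is not part of the original notebook) reads the key from an environment variable and fails early with a clear message when it is missing, rather than letting a later API call fail.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Example usage (assumes the key was exported in the shell beforehand):
# api_key = require_env("OPENAI_API_KEY")
```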

We'll connect to Kai using connection_url_kai, a variable provided by the notebook environment that already points to the Kai instance. We'll then set the database and collection names, create the vector index, and store the previously downloaded data, as follows:

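import pymongo
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch

# connection_url_kai is predefined by the notebook environment.
kai_client = pymongo.MongoClient(connection_url_kai)
db = kai_client["default_db"]
collection = db["default_collection"]

# Create a Kai vector index on the "embedding" field.
# 1536 matches the default output size of text-embedding-3-small.
collection.create_index(
    [("embedding", "vector")],
    name = "vector_index",
    kaiIndexOptions = {
        "index_type": "AUTO",
        "metric_type": "DOT_PRODUCT",
        "dimensions": 1536
    }
)

# Only the client is passed; the database and collection names above
# are chosen to match the store's defaults.
store = MongoDBAtlasVectorSearch(kai_client)

# Load the PDF, embed its contents and write the results to Kai.
storage_context = StorageContext.from_defaults(vector_store = store)
uber_docs = SimpleDirectoryReader(
    input_files = ["./data/10k/uber_2021.pdf"]
).load_data()
index = VectorStoreIndex.from_documents(
    uber_docs, storage_context = storage_context, embed_model = embed_model
)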
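
The index above uses the DOT_PRODUCT metric. For vectors of length 1, the dot product and cosine similarity give the same value, and OpenAI documents its embeddings as normalized to length 1. The sketch below illustrates this with made-up three-dimensional vectors rather than real 1536-dimensional embeddings; the helper names are illustrative only.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))


def normalize(a):
    """Scale a vector to length 1."""
    length = norm(a)
    return [x / length for x in a]


def cosine(a, b):
    """Cosine similarity of two vectors."""
    return dot(a, b) / (norm(a) * norm(b))


# Toy vectors, normalized to unit length like the embeddings.
u = normalize([3.0, 4.0, 0.0])
v = normalize([1.0, 2.0, 2.0])

# For unit vectors the two measures agree.
print(math.isclose(dot(u, v), cosine(u, v)))
```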

We'll now ask a question:

from IPython.display import Markdown, display

response = index.as_query_engine(llm = llm).query("What was Uber's revenue?")
display(Markdown(f"<b>{response}</b>"))

Example output:

Uber's revenue for the year ended December 31, 2021, was $17.455 billion.
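
Conceptually, answering the question starts with a retrieval step: the stored chunks whose embeddings are most similar to the question's embedding are selected and handed to the LLM. The sketch below shows only that ranking idea over an in-memory list with made-up two-dimensional vectors. The function top_k and the sample rows are hypothetical, and this is not how Kai or LlamaIndex execute the search internally.

```python
import heapq


def top_k(query, rows, k=2):
    """Return the k rows whose embedding has the largest dot product with query."""
    def score(row):
        return sum(q * e for q, e in zip(query, row["embedding"]))
    return heapq.nlargest(k, rows, key=score)


# Made-up rows standing in for stored chunks and their embeddings.
rows = [
    {"text": "revenue", "embedding": [0.9, 0.1]},
    {"text": "drivers", "embedding": [0.1, 0.9]},
    {"text": "bookings", "embedding": [0.7, 0.3]},
]

best = top_k([1.0, 0.0], rows, k=2)
print([row["text"] for row in best])
```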

The following code prints the total number of documents stored in the collection, takes the ID (ref_doc_id) of the first source node in the response, and prints how many stored documents match that ID. It then deletes the matching documents and prints the total again to show the change.

from llama_index.core import Response

print(store._collection.count_documents({}))

typed_response = (
    response if isinstance(response, Response) else response.get_response()
)
ref_doc_id = typed_response.source_nodes[0].node.ref_doc_id
print(store._collection.count_documents({"metadata.ref_doc_id": ref_doc_id}))

if ref_doc_id:
    store.delete(ref_doc_id)
    print(store._collection.count_documents({}))

Example output:

395
1
394
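
The arithmetic behind these three numbers is that the total drops by however many stored documents share the ref_doc_id. As a rough illustration, the sketch below uses a plain Python list as a stand-in for the collection, with the same metadata.ref_doc_id field that the query above filters on. The helper names and sample data are hypothetical.

```python
def count_matching(docs, ref_doc_id):
    """Count documents whose metadata.ref_doc_id equals the given ID."""
    return sum(1 for d in docs if d["metadata"]["ref_doc_id"] == ref_doc_id)


def delete_by_ref_doc_id(docs, ref_doc_id):
    """Return a new list without the documents that match the given ID."""
    return [d for d in docs if d["metadata"]["ref_doc_id"] != ref_doc_id]


# Made-up documents; two of them share the same source document ID.
docs = [
    {"text": "chunk a", "metadata": {"ref_doc_id": "page-1"}},
    {"text": "chunk b", "metadata": {"ref_doc_id": "page-2"}},
    {"text": "chunk c", "metadata": {"ref_doc_id": "page-2"}},
]

before = len(docs)
matched = count_matching(docs, "page-2")
docs = delete_by_ref_doc_id(docs, "page-2")

# Total before, number matched, total after.
print(before, matched, len(docs))
```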

Summary

The results from this quick test appear promising. Further tests are needed to determine the full extent of compatibility between LlamaIndex and Kai.
