Abstract
This article demonstrates how to integrate LlamaIndex's MongoDBAtlasVectorSearch with SingleStore Kai, SingleStore's MongoDB-compatible API. A simple test shows that LlamaIndex and Kai work together, providing a foundation for AI-driven applications.
The notebook file used in this article is available on GitHub.
Introduction
SingleStore Kai is a MongoDB-compatible API powered by SingleStore's distributed database engine, enabling developers to integrate MongoDB workflows with a scalable database system. LlamaIndex supports MongoDBAtlasVectorSearch for advanced search and retrieval. This article tests MongoDBAtlasVectorSearch with SingleStore Kai.
Create a SingleStore Cloud account
A previous article showed the steps to create a free SingleStore Cloud account. We'll use the Standard Tier and take the default names for the Workspace Group and Workspace. We'll also enable SingleStore Kai.
We'll store our OpenAI API Key in the secrets vault using the name OPENAI_API_KEY.
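As an aside, the pattern of reading a secret and exporting it as an environment variable can be sketched as follows. Here, get_secret is a stand-in for the helper that the SingleStore notebook environment provides, and the key value is a fake placeholder:

```python
import os

def get_secret(name):
    """Stand-in for the SingleStore notebook's get_secret helper,
    which reads the named value from the secrets vault."""
    vault = {"OPENAI_API_KEY": "sk-example"}  # illustrative value only
    return vault[name]

# Export the secret so that libraries reading OPENAI_API_KEY can find it.
os.environ["OPENAI_API_KEY"] = get_secret("OPENAI_API_KEY")
print(os.environ["OPENAI_API_KEY"])  # sk-example
```

In the real notebook, get_secret is already defined, so only the final assignment is needed.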
Import the notebook
We'll download the notebook from GitHub.
From the left navigation pane in the SingleStore cloud portal, we'll select DEVELOP > Data Studio.
In the top right of the web page, we'll select New Notebook > Import From File. We'll use the wizard to locate and import the notebook we downloaded from GitHub.
Run the notebook
The notebook is adapted from the LlamaIndex GitHub repo.
We'll first download some data to use:
!mkdir -p 'data/10k/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf' -O 'data/10k/uber_2021.pdf'
Next, we'll ensure that the OpenAI API Key is available and define the LLM and embedding models, as follows:
import os

from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# get_secret is provided by the SingleStore notebook environment
os.environ["OPENAI_API_KEY"] = get_secret("OPENAI_API_KEY")

llm = OpenAI(
    model = "gpt-4o-mini"
)

embed_model = OpenAIEmbedding(
    model = "text-embedding-3-small"
)
We'll get the connection to Kai using connection_url_kai, an environment variable that already points to the Kai instance. We'll then set the database and collection names, create the vector index, and store the previously downloaded data, as follows:
import pymongo
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch

kai_client = pymongo.MongoClient(connection_url_kai)
db = kai_client["default_db"]
collection = db["default_collection"]

# Create a Kai vector index on the embedding field
collection.create_index(
    [("embedding", "vector")],
    name = "vector_index",
    kaiIndexOptions = {
        "index_type": "AUTO",
        "metric_type": "DOT_PRODUCT",
        "dimensions": 1536
    }
)

store = MongoDBAtlasVectorSearch(kai_client)
storage_context = StorageContext.from_defaults(vector_store = store)

# Load the PDF and build the index, storing the embeddings in Kai
uber_docs = SimpleDirectoryReader(
    input_files = ["./data/10k/uber_2021.pdf"]
).load_data()

index = VectorStoreIndex.from_documents(
    uber_docs, storage_context = storage_context, embed_model = embed_model
)
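As an aside, the DOT_PRODUCT metric configured in the vector index ranks documents by the dot product of their embedding with the query embedding. A minimal pure-Python illustration, using toy 3-dimensional vectors in place of the 1536-dimensional OpenAI embeddings the index actually stores:

```python
def dot_product(a, b):
    """Score two embedding vectors by their dot product."""
    return sum(x * y for x, y in zip(a, b))

query = [0.1, 0.9, 0.0]
docs = {
    "doc_a": [0.1, 0.8, 0.1],   # similar direction to the query
    "doc_b": [0.9, 0.0, 0.1],   # mostly orthogonal to the query
}

# Rank documents by descending dot product, as the index would.
ranked = sorted(docs, key=lambda d: dot_product(query, docs[d]), reverse=True)
print(ranked)  # ['doc_a', 'doc_b']
```

The higher-scoring document points in roughly the same direction as the query vector, which is why embedding models make semantically similar texts score highly under this metric.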
We'll now ask a question:
from IPython.display import Markdown, display

response = index.as_query_engine(llm = llm).query("What was Uber's revenue?")
display(Markdown(f"<b>{response}</b>"))
Example output:
Uber's revenue for the year ended December 31, 2021, was $17.455 billion.
The following code checks how many documents are stored in the database, identifies a specific document using its ID (ref_doc_id), and then deletes it. Before and after the deletion, it prints the number of documents in the database to show the change.
from llama_index.core.base.response.schema import Response

# Total number of documents before deletion
print(store._collection.count_documents({}))

typed_response = (
    response if isinstance(response, Response) else response.get_response()
)
ref_doc_id = typed_response.source_nodes[0].node.ref_doc_id

# Number of documents belonging to the source document
print(store._collection.count_documents({"metadata.ref_doc_id": ref_doc_id}))

if ref_doc_id:
    store.delete(ref_doc_id)
    # Total number of documents after deletion
    print(store._collection.count_documents({}))
Example output:
395
1
394
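The {"metadata.ref_doc_id": ref_doc_id} filter above uses Mongo-style dot notation to reach into a nested document field. A minimal pure-Python sketch of how such a filter resolves (illustrative only, not pymongo; the document shape and field names are assumptions about what the vector store writes):

```python
def matches(document, field_path, value):
    """Resolve a Mongo-style dotted field path against a nested dict
    and compare the resolved value, as a filter like
    {"metadata.ref_doc_id": ...} does."""
    current = document
    for key in field_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return False
        current = current[key]
    return current == value

# A toy document roughly shaped like the nodes the vector store writes
doc = {"text": "Uber 10-K excerpt", "metadata": {"ref_doc_id": "abc-123"}}

print(matches(doc, "metadata.ref_doc_id", "abc-123"))  # True
print(matches(doc, "metadata.ref_doc_id", "xyz-999"))  # False
```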
Summary
The results from this quick test appear promising. Further tests are needed to determine the complete level of compatibility between LlamaIndex and Kai.