Akmal Chaudhri for SingleStore

Quick tip: Replace MongoDB® Atlas with SingleStore Kai in LangChain

Abstract

In a previous article, we explored how to use LangChain's MongoDBAtlasVectorSearch with SingleStore Kai through a simple example. In this article, we'll look at another example, this time adapted from an example in the LangChain GitHub repo. The results once again demonstrate that MongoDBAtlasVectorSearch works well with SingleStore Kai.

The notebook file used in this article is available on GitHub.

Introduction

As part of our ongoing evaluation of SingleStore Kai with different AI frameworks, this article works through another example that uses LangChain's MongoDBAtlasVectorSearch with Kai.

Create a SingleStore Cloud account

A previous article showed the steps to create a free SingleStore Cloud account. We'll use the Standard Tier and take the default names for the Workspace Group and Workspace. We'll also enable SingleStore Kai.

We'll store our OpenAI API Key in the secrets vault using OPENAI_API_KEY.

Import the notebook

We'll download the notebook from GitHub.

From the left navigation pane in the SingleStore cloud portal, we'll select DEVELOP > Data Studio.

In the top right of the web page, we'll select New Notebook > Import From File. We'll use the wizard to locate and import the notebook we downloaded from GitHub.

Run the notebook

The notebook is adapted from the LangChain GitHub repo.

First, we'll ensure that the OpenAI API Key is available and set the embedding model:

os.environ["OPENAI_API_KEY"] = get_secret("OPENAI_API_KEY")

embeddings = OpenAIEmbeddings(
    model = "text-embedding-3-small"
)
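
As an optional sanity check (this isn't part of the original notebook), we can confirm the dimensionality of the embeddings, since the vector index we create shortly must use the same value:

test_vector = embeddings.embed_query("dimension check")

# text-embedding-3-small produces 1536-dimensional vectors by default.
print(len(test_vector))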

Next, we'll connect to Kai using connection_url_kai, an environment variable that already points to the Kai instance, and then set the database, collection, and vector index names:

from pymongo import MongoClient
from langchain_mongodb import MongoDBAtlasVectorSearch

client = MongoClient(connection_url_kai)

DB_NAME = "langchain_test_db"
COLLECTION_NAME = "langchain_test_vectorstores"
ATLAS_VECTOR_SEARCH_INDEX_NAME = "langchain_test_index_vectorstores"

MONGODB_COLLECTION = client[DB_NAME][COLLECTION_NAME]

vector_store = MongoDBAtlasVectorSearch(
    collection = MONGODB_COLLECTION,
    embedding = embeddings,
    index_name = ATLAS_VECTOR_SEARCH_INDEX_NAME,
    relevance_score_fn = "dotProduct",
)
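
Before creating the index, a quick optional check (again, not part of the original notebook) confirms that the client can reach Kai. Since Kai speaks the MongoDB wire protocol, the standard pymongo calls should work:

# Ping the server over the MongoDB wire protocol.
print(client.admin.command("ping"))

# List the databases visible to this connection.
print(client.list_database_names())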

Now, we'll create the vector index:

# The index is built on the embedding field; dimensions must match the
# embedding model (text-embedding-3-small produces 1536-dimensional vectors).
MONGODB_COLLECTION.create_index(
    [("embedding", "vector")],
    name = ATLAS_VECTOR_SEARCH_INDEX_NAME,
    kaiIndexOptions = {
        "index_type": "AUTO",
        "metric_type": "DOT_PRODUCT",
        "dimensions": 1536
    }
)
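
To verify that the index exists before loading data, we can list the collection's indexes with standard pymongo (an optional check on our part):

for index in MONGODB_COLLECTION.list_indexes():
    print(index)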

Next, we'll prepare and load the documents into SingleStore Kai:

from uuid import uuid4

from langchain_core.documents import Document

document_1 = Document(
    page_content = "I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    metadata = {"source": "tweet"},
)

document_2 = Document(
    page_content = "The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
    metadata = {"source": "news"},
)

document_3 = Document(
    page_content = "Building an exciting new project with LangChain - come check it out!",
    metadata = {"source": "tweet"},
)

document_4 = Document(
    page_content = "Robbers broke into the city bank and stole $1 million in cash.",
    metadata = {"source": "news"},
)

document_5 = Document(
    page_content = "Wow! That was an amazing movie. I can't wait to see it again.",
    metadata = {"source": "tweet"},
)

document_6 = Document(
    page_content = "Is the new iPhone worth the price? Read this review to find out.",
    metadata = {"source": "website"},
)

document_7 = Document(
    page_content = "The top 10 soccer players in the world right now.",
    metadata = {"source": "website"},
)

document_8 = Document(
    page_content = "LangGraph is the best framework for building stateful, agentic applications!",
    metadata = {"source": "tweet"},
)

document_9 = Document(
    page_content = "The stock market is down 500 points today due to fears of a recession.",
    metadata = {"source": "news"},
)

document_10 = Document(
    page_content = "I have a bad feeling I am going to get deleted :(",
    metadata = {"source": "tweet"},
)

documents = [
    document_1,
    document_2,
    document_3,
    document_4,
    document_5,
    document_6,
    document_7,
    document_8,
    document_9,
    document_10,
]
uuids = [str(uuid4()) for _ in range(len(documents))]

vector_store.add_documents(documents = documents, ids = uuids)

Example output:

['74beaaeb-897f-417a-aa09-f0b171859275',
 'fb22674d-85bc-454e-a95a-3cca20cd4b5d',
 'c474e923-a4ee-4258-890c-95882571dd8c',
 'd1d19d5c-518b-4d60-98e7-c6b0d2621efa',
 '895e61dd-4262-4f11-b174-8f04ed9fe443',
 '6ce2cae1-9877-4fc1-a1cf-2df3dc7910d5',
 '2ee33b04-c161-4b0f-9a87-fb1c803e028d',
 'e476495d-6812-48cb-92aa-381efc23f76c',
 '6bd53c68-e97c-4dbd-a0c2-7bb20221a16b',
 '3dc32b0d-417c-45fd-82ce-85e0aba15c5e']
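
Optionally, we can confirm that all ten documents landed in the collection with a standard pymongo count (not part of the original notebook):

# Should report 10 at this point.
print(MONGODB_COLLECTION.count_documents({}))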

We'll test document deletion:

vector_store.delete(ids = [uuids[-1]])

Example output:

True
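
Since the documents were stored with our UUIDs as _id values, we can also confirm the deletion directly (another optional check):

# The deleted document should no longer be found, so this should print None.
print(MONGODB_COLLECTION.find_one({"_id": uuids[-1]}))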

We'll now test similarity_search:

results = vector_store.similarity_search(
    "LangChain provides abstractions to make working with LLMs easy", k = 2
)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")

Example output:

* Building an exciting new project with LangChain - come check it out! [{'_id': 'c474e923-a4ee-4258-890c-95882571dd8c', 'source': 'tweet'}]
* LangGraph is the best framework for building stateful, agentic applications! [{'_id': 'e476495d-6812-48cb-92aa-381efc23f76c', 'source': 'tweet'}]

and similarity_search_with_score:

results = vector_store.similarity_search_with_score("Will it be hot tomorrow?", k = 1)
for res, score in results:
    print(f"* [SIM = {score:3f}] {res.page_content} [{res.metadata}]")

Example output:

* [SIM = 0.569169] The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees. [{'_id': 'fb22674d-85bc-454e-a95a-3cca20cd4b5d', 'source': 'news'}]

and, finally, as_retriever:

retriever = vector_store.as_retriever(
    search_type = "similarity_score_threshold",
    search_kwargs = {"k": 1, "score_threshold": 0.2},
)
retriever.invoke("Stealing from the bank is a crime")

Example output:

[Document(metadata={'_id': 'd1d19d5c-518b-4d60-98e7-c6b0d2621efa', 'source': 'news'}, page_content='Robbers broke into the city bank and stole $1 million in cash.')]
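
To show where the retriever fits in practice, here is a minimal sketch of wiring it into a simple question-answering chain. This goes beyond the original notebook, and the model name gpt-4o-mini is just an assumption for illustration:

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

# Hypothetical chat model choice; any LangChain chat model would do.
llm = ChatOpenAI(model = "gpt-4o-mini")

prompt = ChatPromptTemplate.from_template(
    "Answer the question using only this context:\n{context}\n\nQuestion: {question}"
)

def format_docs(docs):
    # Join the retrieved documents into a single context string.
    return "\n\n".join(doc.page_content for doc in docs)

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
)

print(chain.invoke("What happened at the city bank?").content)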

Summary

The query outputs once again show that MongoDBAtlasVectorSearch works correctly with SingleStore Kai: documents were added, deleted, and retrieved as expected, confirming the compatibility between the two.
