Abstract
In a previous article, we explored how to use LangChain's MongoDBAtlasVectorSearch with SingleStore Kai through a simple example. In this article, we'll look at another example, this time based on the LangChain GitHub repo. The results once again demonstrate that MongoDBAtlasVectorSearch works well with SingleStore Kai.
The notebook file used in this article is available on GitHub.
Introduction
As part of our ongoing evaluation of SingleStore Kai with different AI frameworks, this article uses another example to test LangChain's MongoDBAtlasVectorSearch with SingleStore Kai.
Create a SingleStore Cloud account
A previous article showed the steps to create a free SingleStore Cloud account. We'll use the Standard Tier and take the default names for the Workspace Group and Workspace. We'll also enable SingleStore Kai.
We'll store our OpenAI API Key in the secrets vault using OPENAI_API_KEY.
Import the notebook
We'll download the notebook from GitHub.
From the left navigation pane in the SingleStore cloud portal, we'll select DEVELOP > Data Studio.
In the top right of the web page, we'll select New Notebook > Import From File. We'll use the wizard to locate and import the notebook we downloaded from GitHub.
Run the notebook
The notebook is adapted from the LangChain GitHub repo.
First, we'll ensure that the OpenAI API Key is available and set the embedding model:
import os

from langchain_openai import OpenAIEmbeddings

# get_secret is provided by the SingleStore notebook environment
os.environ["OPENAI_API_KEY"] = get_secret("OPENAI_API_KEY")

embeddings = OpenAIEmbeddings(
    model = "text-embedding-3-small"
)
Next, we'll get the connection to Kai using connection_url_kai, an environment variable that already points to the Kai instance, and set the database, collection and vector index names, as follows:
from pymongo import MongoClient

from langchain_mongodb import MongoDBAtlasVectorSearch

client = MongoClient(connection_url_kai)

DB_NAME = "langchain_test_db"
COLLECTION_NAME = "langchain_test_vectorstores"
ATLAS_VECTOR_SEARCH_INDEX_NAME = "langchain_test_index_vectorstores"

MONGODB_COLLECTION = client[DB_NAME][COLLECTION_NAME]

vector_store = MongoDBAtlasVectorSearch(
    collection = MONGODB_COLLECTION,
    embedding = embeddings,
    index_name = ATLAS_VECTOR_SEARCH_INDEX_NAME,
    relevance_score_fn = "dotProduct",
)
Now, we'll create the vector index:
MONGODB_COLLECTION.create_index(
    [("embedding", "vector")],
    name = ATLAS_VECTOR_SEARCH_INDEX_NAME,
    kaiIndexOptions = {
        "index_type": "AUTO",
        "metric_type": "DOT_PRODUCT",
        "dimensions": 1536
    }
)
Next, we'll prepare and load the documents into SingleStore Kai:
from uuid import uuid4

from langchain_core.documents import Document

document_1 = Document(
    page_content = "I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    metadata = {"source": "tweet"},
)

document_2 = Document(
    page_content = "The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
    metadata = {"source": "news"},
)

document_3 = Document(
    page_content = "Building an exciting new project with LangChain - come check it out!",
    metadata = {"source": "tweet"},
)

document_4 = Document(
    page_content = "Robbers broke into the city bank and stole $1 million in cash.",
    metadata = {"source": "news"},
)

document_5 = Document(
    page_content = "Wow! That was an amazing movie. I can't wait to see it again.",
    metadata = {"source": "tweet"},
)

document_6 = Document(
    page_content = "Is the new iPhone worth the price? Read this review to find out.",
    metadata = {"source": "website"},
)

document_7 = Document(
    page_content = "The top 10 soccer players in the world right now.",
    metadata = {"source": "website"},
)

document_8 = Document(
    page_content = "LangGraph is the best framework for building stateful, agentic applications!",
    metadata = {"source": "tweet"},
)

document_9 = Document(
    page_content = "The stock market is down 500 points today due to fears of a recession.",
    metadata = {"source": "news"},
)

document_10 = Document(
    page_content = "I have a bad feeling I am going to get deleted :(",
    metadata = {"source": "tweet"},
)

documents = [
    document_1,
    document_2,
    document_3,
    document_4,
    document_5,
    document_6,
    document_7,
    document_8,
    document_9,
    document_10,
]

uuids = [str(uuid4()) for _ in range(len(documents))]

vector_store.add_documents(documents = documents, ids = uuids)
Example output:
['74beaaeb-897f-417a-aa09-f0b171859275',
'fb22674d-85bc-454e-a95a-3cca20cd4b5d',
'c474e923-a4ee-4258-890c-95882571dd8c',
'd1d19d5c-518b-4d60-98e7-c6b0d2621efa',
'895e61dd-4262-4f11-b174-8f04ed9fe443',
'6ce2cae1-9877-4fc1-a1cf-2df3dc7910d5',
'2ee33b04-c161-4b0f-9a87-fb1c803e028d',
'e476495d-6812-48cb-92aa-381efc23f76c',
'6bd53c68-e97c-4dbd-a0c2-7bb20221a16b',
'3dc32b0d-417c-45fd-82ce-85e0aba15c5e']
We'll test document deletion:
vector_store.delete(ids = [uuids[-1]])
Example output:
True
We'll now test similarity_search:
results = vector_store.similarity_search(
    "LangChain provides abstractions to make working with LLMs easy", k = 2
)

for res in results:
    print(f"* {res.page_content} [{res.metadata}]")
Example output:
* Building an exciting new project with LangChain - come check it out! [{'_id': 'c474e923-a4ee-4258-890c-95882571dd8c', 'source': 'tweet'}]
* LangGraph is the best framework for building stateful, agentic applications! [{'_id': 'e476495d-6812-48cb-92aa-381efc23f76c', 'source': 'tweet'}]
and similarity_search_with_score:
results = vector_store.similarity_search_with_score("Will it be hot tomorrow?", k = 1)

for res, score in results:
    print(f"* [SIM = {score:3f}] {res.page_content} [{res.metadata}]")
Example output:
* [SIM = 0.569169] The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees. [{'_id': 'fb22674d-85bc-454e-a95a-3cca20cd4b5d', 'source': 'news'}]
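Since the vector store was configured with relevance_score_fn = "dotProduct" and OpenAI's embedding models return unit-length vectors, the dot product here behaves like cosine similarity. A minimal sketch of that relationship, using made-up toy vectors rather than real embeddings:

```python
import math

def dot(a, b):
    # plain dot product of two equal-length vectors
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    # scale a vector to unit length
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

# toy 3-dimensional vectors standing in for embeddings
a = normalize([1.0, 2.0, 2.0])
b = normalize([2.0, 1.0, 2.0])

# for unit-length vectors, dot product equals cosine similarity
score = dot(a, b)  # 8/9, roughly 0.889
```

This is why higher dot-product scores here can be read directly as "more similar", as with the 0.569169 score above.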
and, finally, as_retriever:
retriever = vector_store.as_retriever(
    search_type = "similarity_score_threshold",
    search_kwargs = {"k": 1, "score_threshold": 0.2},
)

retriever.invoke("Stealing from the bank is a crime")
Example output:
[Document(metadata={'_id': 'd1d19d5c-518b-4d60-98e7-c6b0d2621efa', 'source': 'news'}, page_content='Robbers broke into the city bank and stole $1 million in cash.')]
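The score_threshold setting drops any hit whose relevance score falls below the threshold, so with k = 1 and a low threshold of 0.2 exactly one document comes back. The filtering amounts to something like this toy sketch (the scores and helper are hypothetical, not a LangChain API):

```python
def filter_by_threshold(scored_hits, threshold):
    # keep documents whose relevance score meets or exceeds the threshold,
    # mirroring what search_type = "similarity_score_threshold" does
    return [doc for doc, score in scored_hits if score >= threshold]

# hypothetical (document, score) pairs for illustration
hits = [("Robbers broke into the city bank ...", 0.61),
        ("The top 10 soccer players ...", 0.12)]

filter_by_threshold(hits, 0.2)  # -> ["Robbers broke into the city bank ..."]
```

Raising the threshold (say, to 0.7) would filter out both hits and return an empty result instead of a weak match.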
Summary
The query outputs match the expected results throughout, showing that MongoDBAtlasVectorSearch works reliably with SingleStore Kai and confirming the compatibility between the two.