Abstract
In a previous article, we explored how to use LangChain's MongoDBAtlasVectorSearch with SingleStore Kai through a simple example. In this article, we'll look at another example, this time based on the LangChain GitHub repo. The results once again demonstrate that MongoDBAtlasVectorSearch works well with SingleStore Kai.
The notebook file used in this article is available on GitHub.
Introduction
As part of our ongoing evaluation of SingleStore Kai with different AI frameworks, this article uses another example to test LangChain's MongoDBAtlasVectorSearch with SingleStore Kai.
Create a SingleStore Cloud account
A previous article showed the steps to create a free SingleStore Cloud account. We'll use the Standard Tier and take the default names for the Workspace Group and Workspace. We'll also enable SingleStore Kai.
We'll store our OpenAI API Key in the secrets vault using OPENAI_API_KEY.
Import the notebook
We'll download the notebook from GitHub.
From the left navigation pane in the SingleStore cloud portal, we'll select DEVELOP > Data Studio.
In the top right of the web page, we'll select New Notebook > Import From File. We'll use the wizard to locate and import the notebook we downloaded from GitHub.
Run the notebook
The notebook is adapted from the LangChain GitHub repo.
First, we'll ensure that the OpenAI API Key is available and set the embedding model:
import os

from langchain_openai import OpenAIEmbeddings

# get_secret is provided by the SingleStore notebook environment
os.environ["OPENAI_API_KEY"] = get_secret("OPENAI_API_KEY")

embeddings = OpenAIEmbeddings(
    model = "text-embedding-3-small"
)
Next, we'll get the connection to Kai using connection_url_kai, an environment variable that already points to the Kai instance, and set the database, collection and vector index names, as follows:
from pymongo import MongoClient

from langchain_mongodb import MongoDBAtlasVectorSearch

client = MongoClient(connection_url_kai)

DB_NAME = "langchain_test_db"
COLLECTION_NAME = "langchain_test_vectorstores"
ATLAS_VECTOR_SEARCH_INDEX_NAME = "langchain_test_index_vectorstores"

MONGODB_COLLECTION = client[DB_NAME][COLLECTION_NAME]

vector_store = MongoDBAtlasVectorSearch(
    collection = MONGODB_COLLECTION,
    embedding = embeddings,
    index_name = ATLAS_VECTOR_SEARCH_INDEX_NAME,
    relevance_score_fn = "dotProduct",
)
Now, we'll create the vector index:
MONGODB_COLLECTION.create_index(
    [("embedding", "vector")],
    name = ATLAS_VECTOR_SEARCH_INDEX_NAME,
    kaiIndexOptions = {
        "index_type": "AUTO",
        "metric_type": "DOT_PRODUCT",
        "dimensions": 1536
    }
)
Next, we'll prepare and load the documents into SingleStore Kai:
from uuid import uuid4

from langchain_core.documents import Document

document_1 = Document(
    page_content = "I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    metadata = {"source": "tweet"},
)

document_2 = Document(
    page_content = "The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
    metadata = {"source": "news"},
)

document_3 = Document(
    page_content = "Building an exciting new project with LangChain - come check it out!",
    metadata = {"source": "tweet"},
)

document_4 = Document(
    page_content = "Robbers broke into the city bank and stole $1 million in cash.",
    metadata = {"source": "news"},
)

document_5 = Document(
    page_content = "Wow! That was an amazing movie. I can't wait to see it again.",
    metadata = {"source": "tweet"},
)

document_6 = Document(
    page_content = "Is the new iPhone worth the price? Read this review to find out.",
    metadata = {"source": "website"},
)

document_7 = Document(
    page_content = "The top 10 soccer players in the world right now.",
    metadata = {"source": "website"},
)

document_8 = Document(
    page_content = "LangGraph is the best framework for building stateful, agentic applications!",
    metadata = {"source": "tweet"},
)

document_9 = Document(
    page_content = "The stock market is down 500 points today due to fears of a recession.",
    metadata = {"source": "news"},
)

document_10 = Document(
    page_content = "I have a bad feeling I am going to get deleted :(",
    metadata = {"source": "tweet"},
)

documents = [
    document_1,
    document_2,
    document_3,
    document_4,
    document_5,
    document_6,
    document_7,
    document_8,
    document_9,
    document_10,
]

uuids = [str(uuid4()) for _ in range(len(documents))]

vector_store.add_documents(documents = documents, ids = uuids)
Example output:
['74beaaeb-897f-417a-aa09-f0b171859275',
'fb22674d-85bc-454e-a95a-3cca20cd4b5d',
'c474e923-a4ee-4258-890c-95882571dd8c',
'd1d19d5c-518b-4d60-98e7-c6b0d2621efa',
'895e61dd-4262-4f11-b174-8f04ed9fe443',
'6ce2cae1-9877-4fc1-a1cf-2df3dc7910d5',
'2ee33b04-c161-4b0f-9a87-fb1c803e028d',
'e476495d-6812-48cb-92aa-381efc23f76c',
'6bd53c68-e97c-4dbd-a0c2-7bb20221a16b',
'3dc32b0d-417c-45fd-82ce-85e0aba15c5e']
We'll test document deletion:
vector_store.delete(ids = [uuids[-1]])
Example output:
True
We'll now test similarity_search:
results = vector_store.similarity_search(
    "LangChain provides abstractions to make working with LLMs easy", k = 2
)

for res in results:
    print(f"* {res.page_content} [{res.metadata}]")
Example output:
* Building an exciting new project with LangChain - come check it out! [{'_id': 'c474e923-a4ee-4258-890c-95882571dd8c', 'source': 'tweet'}]
* LangGraph is the best framework for building stateful, agentic applications! [{'_id': 'e476495d-6812-48cb-92aa-381efc23f76c', 'source': 'tweet'}]
and similarity_search_with_score:
results = vector_store.similarity_search_with_score("Will it be hot tomorrow?", k = 1)

for res, score in results:
    print(f"* [SIM = {score:3f}] {res.page_content} [{res.metadata}]")
Example output:
* [SIM = 0.569169] The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees. [{'_id': 'fb22674d-85bc-454e-a95a-3cca20cd4b5d', 'source': 'news'}]
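Since the vector store was configured with relevance_score_fn = "dotProduct" and OpenAI's embedding models return unit-length vectors, the dot product here behaves like cosine similarity. A minimal sketch of that relationship, using made-up toy vectors rather than real embeddings:

```python
import math

def dot(a, b):
    # plain dot product of two equal-length vectors
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    # scale a vector to unit length
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

# toy 3-dimensional vectors standing in for embeddings
a = normalize([1.0, 2.0, 2.0])
b = normalize([2.0, 1.0, 2.0])

# for unit-length vectors, dot product equals cosine similarity
score = dot(a, b)  # 8/9, roughly 0.889
```

This is why higher dot-product scores here can be read directly as "more similar", as with the 0.569169 score above.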
and, finally, as_retriever:
retriever = vector_store.as_retriever(
    search_type = "similarity_score_threshold",
    search_kwargs = {"k": 1, "score_threshold": 0.2},
)

retriever.invoke("Stealing from the bank is a crime")
Example output:
[Document(metadata={'_id': 'd1d19d5c-518b-4d60-98e7-c6b0d2621efa', 'source': 'news'}, page_content='Robbers broke into the city bank and stole $1 million in cash.')]
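The score_threshold setting drops any hit whose relevance score falls below the threshold, so with k = 1 and a low threshold of 0.2 exactly one document comes back. The filtering amounts to something like this toy sketch (the scores and helper are hypothetical, not a LangChain API):

```python
def filter_by_threshold(scored_hits, threshold):
    # keep documents whose relevance score meets or exceeds the threshold,
    # mirroring what search_type = "similarity_score_threshold" does
    return [doc for doc, score in scored_hits if score >= threshold]

# hypothetical (document, score) pairs for illustration
hits = [("Robbers broke into the city bank ...", 0.61),
        ("The top 10 soccer players ...", 0.12)]

filter_by_threshold(hits, 0.2)  # -> ["Robbers broke into the city bank ..."]
```

Raising the threshold (say, to 0.7) would filter out both hits and return an empty result instead of a weak match.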
Summary
The query outputs match the expected results throughout, showing that MongoDBAtlasVectorSearch works reliably with SingleStore Kai and confirming the compatibility between the two.