Abstract
This short article explores the integration of LangChain's MongoDBAtlasVectorSearch with SingleStore Kai, a MongoDB-compatible API offered by SingleStore. While LangChain already supports SingleStore, the advent of Kai enables developers to use MongoDB-based workflows with a high-performance, scalable database system. Through a quick test, this article shows that LangChain and Kai can work together, offering the potential for building AI-powered applications.
The notebook file used in this article is available on GitHub.
Introduction
SingleStore Kai is a MongoDB-compatible API built on SingleStore's high-performance engine, designed to help developers integrate MongoDB workflows with a scale-out distributed database system. LangChain, a widely used framework for AI-powered application development, includes support for MongoDBAtlasVectorSearch, which enables advanced search and retrieval capabilities. In this article, we'll perform a quick test to evaluate how LangChain's MongoDBAtlasVectorSearch works with SingleStore Kai.
Create a SingleStore Cloud account
A previous article showed the steps to create a free SingleStore Cloud account. We'll use the Standard Tier and take the default names for the Workspace Group and Workspace. We'll also enable SingleStore Kai.
We'll store our OpenAI API Key in the secrets vault using OPENAI_API_KEY.
Import the notebook
We'll download the notebook from GitHub.
From the left navigation pane in the SingleStore cloud portal, we'll select DEVELOP > Data Studio.
In the top right of the web page, we'll select New Notebook > Import From File. We'll use the wizard to locate and import the notebook we downloaded from GitHub.
Run the notebook
We'll first create the client, database and collection, as follows:
import pymongo

# Connect to Kai over the MongoDB wire protocol, then select a database and collection
kai_client = pymongo.MongoClient(connection_url_kai)
db = kai_client["langchain_demo"]
collection = db["langchain_docs"]
The connection_url_kai is an environment variable that already points to the Kai instance.
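Outside the SingleStore portal, where connection_url_kai isn't pre-populated, an equivalent connection string would need to be supplied by hand. A minimal sketch, assuming a hypothetical CONNECTION_URL_KAI environment variable and an illustrative (not official) URL shape:

```python
import os

# connection_url_kai is provided automatically inside SingleStore notebooks.
# When running elsewhere, a Kai (MongoDB wire protocol) connection string
# can be read from the environment instead. The variable name and the
# fallback URL below are illustrative placeholders, not official values.
connection_url_kai = os.environ.get(
    "CONNECTION_URL_KAI",
    "mongodb://admin:password@kai-host:27017/?authMechanism=PLAIN&tls=true",
)
print(connection_url_kai.startswith("mongodb://"))
```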
Next, we'll ensure that the OpenAI API Key is available:
os.environ["OPENAI_API_KEY"] = get_secret("OPENAI_API_KEY")
We'll now prepare some documents, using the examples from the Ollama website. We'll also set the OpenAI embedding model and determine the length of the vector embeddings (1,536 dimensions for text-embedding-3-small).
documents = [
"Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels",
"Llamas were first domesticated and used as pack animals 4,000 to 5,000 years ago in the Peruvian highlands",
"Llamas can grow as much as 6 feet tall though the average llama is between 5 feet 6 inches and 5 feet 9 inches tall",
"Llamas weigh between 280 and 450 pounds and can carry 25 to 30 percent of their body weight",
"Llamas are vegetarians and have very efficient digestive systems",
"Llamas live to be about 20 years old, though some only live for 15 years and others live to be 30 years old"
]
from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document

embeddings = OpenAIEmbeddings(
    model = "text-embedding-3-small"
)
# Embed one document to discover the dimensionality of the vectors
dimensions = len(embeddings.embed_query(documents[0]))
docs = [Document(text) for text in documents]
Now we'll create the vector index, as follows:
collection.create_index(
[("embedding", "vector")],
name = "vector_index",
kaiIndexOptions = {
"index_type": "AUTO",
"metric_type": "DOT_PRODUCT",
"dimensions": dimensions
}
)
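A note on the metric: DOT_PRODUCT ranks candidates by inner product, which matches cosine similarity only when the vectors are unit length (OpenAI embeddings are normalized, so this holds here). A small sanity-check sketch, not part of the notebook:

```python
import math

def is_unit_length(vec, tol=1e-6):
    """True if the vector has (approximately) norm 1, so that the
    DOT_PRODUCT metric behaves like cosine similarity."""
    return abs(math.sqrt(sum(x * x for x in vec)) - 1.0) < tol

print(is_unit_length([0.6, 0.8]))   # True: 0.36 + 0.64 = 1
print(is_unit_length([1.0, 1.0]))   # False: norm is sqrt(2)
```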
Next, we'll store the documents and embeddings in SingleStore:
from langchain_mongodb import MongoDBAtlasVectorSearch

# Embed the documents and insert them, with their vectors, into the collection
docsearch = MongoDBAtlasVectorSearch.from_documents(
    docs,
    embeddings,
    collection = collection,
    index_name = "vector_index"
)
We'll now ask a question:
prompt = "What animals are llamas related to?"
docs = docsearch.similarity_search(prompt)
data = docs[0].page_content
print(data)
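Conceptually, similarity_search embeds the prompt and returns the stored documents whose vectors score highest under the index metric. A brute-force sketch of that ranking step, using toy two-dimensional vectors rather than real embeddings:

```python
def retrieve(query_vec, store, k=1):
    """Rank stored vectors by dot product against the query and return
    the texts of the top-k matches -- the work the vector index accelerates."""
    ranked = sorted(
        store,
        key=lambda item: sum(q * v for q, v in zip(query_vec, item["embedding"])),
        reverse=True,
    )
    return [item["text"] for item in ranked[:k]]

store = [
    {"text": "llamas are related to vicunas and camels", "embedding": [0.9, 0.1]},
    {"text": "llamas have efficient digestive systems", "embedding": [0.1, 0.9]},
]
print(retrieve([1.0, 0.0], store))  # the camelid-family sentence ranks first
```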
Example output:
Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels
Next, we'll use an LLM, as follows:
from openai import OpenAI

openai_client = OpenAI()
response = openai_client.chat.completions.create(
model = "gpt-4o-mini",
messages = [
{"role": "system", "content": "You are a helpful assistant. Provide more details."},
{"role": "user", "content": f"Using this data: {data}. Respond to this prompt: {prompt}"}
]
)
print(response.choices[0].message.content)
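The prompt assembly above follows the usual retrieval-augmented pattern: the retrieved snippet is spliced into the user message alongside the question. Pulled out as a hypothetical helper (not in the notebook), the structure looks like this:

```python
def build_messages(data: str, prompt: str) -> list:
    """Assemble chat messages that ground the model's answer in the
    retrieved context -- a hypothetical helper mirroring the call above."""
    return [
        {"role": "system", "content": "You are a helpful assistant. Provide more details."},
        {"role": "user", "content": f"Using this data: {data}. Respond to this prompt: {prompt}"},
    ]

messages = build_messages(
    "Llamas are members of the camelid family",
    "What animals are llamas related to?",
)
print(messages[1]["content"])
```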
Example output:
Llamas are related to several animals within the camelid family. The closest relatives of llamas include:
1. **Vicuñas** – These are wild South American camelids that are similar in appearance to llamas but are smaller and are known for their fine wool.
2. **Alpacas** – Alpacas are domesticated camelids closely related to llamas, and they are primarily bred for their soft and luxurious fleece.
3. **Guanacos** – Guanacos are wild relatives of llamas and are also native to South America. They are similar in build and habitat preference.
4. **Camels** – Though they are not native to South America, camels (both dromedary and Bactrian) belong to the same family (Camelidae) as llamas, thus making them distant relatives.
These relationships highlight the diversity within the camelid family and the close connections among these species.
Summary
LangChain's MongoDBAtlasVectorSearch provides an extensive API, and we have tested only a small part of it. However, the results so far appear promising. Further tests are needed to determine the full extent of compatibility between MongoDBAtlasVectorSearch and Kai.