DEV Community

Akmal Chaudhri for SingleStore

Quick tip: Using LangChain's MongoDBAtlasVectorSearch with SingleStore Kai

Abstract

This short article explores the integration of LangChain's MongoDBAtlasVectorSearch with SingleStore Kai, a MongoDB-compatible API offered by SingleStore. While LangChain already supports SingleStore, the advent of Kai enables developers to use MongoDB-based workflows with a high-performance, scalable database system. Through a quick test, this article shows that LangChain and Kai can work together, offering the potential for building AI-powered applications.

The notebook file used in this article is available on GitHub.

Introduction

SingleStore Kai is a MongoDB-compatible API built on SingleStore's high-performance engine, designed to help developers integrate MongoDB workflows with a scale-out distributed database system. LangChain, a widely used framework for AI-powered application development, includes support for MongoDBAtlasVectorSearch, which enables advanced search and retrieval capabilities. In this article, we'll perform a quick test to evaluate how LangChain's MongoDBAtlasVectorSearch works with SingleStore Kai.

Create a SingleStore Cloud account

A previous article showed the steps to create a free SingleStore Cloud account. We'll use the Standard Tier and take the default names for the Workspace Group and Workspace. We'll also enable SingleStore Kai.

We'll store our OpenAI API Key in the secrets vault using OPENAI_API_KEY.

Import the notebook

We'll download the notebook from GitHub.

From the left navigation pane in the SingleStore cloud portal, we'll select DEVELOP > Data Studio.

In the top right of the web page, we'll select New Notebook > Import From File. We'll use the wizard to locate and import the notebook we downloaded from GitHub.

Run the notebook

We'll first create the client, database and collection, as follows:

import pymongo

kai_client = pymongo.MongoClient(connection_url_kai)
db = kai_client["langchain_demo"]
collection = db["langchain_docs"]

The connection_url_kai is an environment variable that already points to the Kai instance.
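For context, a Kai connection URL uses the MongoDB wire protocol, so it looks like a regular `mongodb://` connection string. The sketch below is only illustrative: all values are placeholders, `connection_url_kai` is supplied for us in the portal, and the exact query parameters come from the portal's connection dialog.

```python
from urllib.parse import quote_plus

# Hypothetical values -- in the SingleStore portal, connection_url_kai is
# provided as a ready-made variable, so this only sketches its shape.
user = "admin"
password = "p@ss/word"
host = "svc-example.svc.singlestore.com"

# Credentials must be percent-encoded to survive URL parsing.
connection_url_kai = (
    f"mongodb://{quote_plus(user)}:{quote_plus(password)}"
    f"@{host}:27017/"
)
print(connection_url_kai)
```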

Next, we'll ensure that the OpenAI API Key is available:

import os

os.environ["OPENAI_API_KEY"] = get_secret("OPENAI_API_KEY")

We'll now prepare some documents, using the examples from the Ollama website. We'll also set the OpenAI embedding model and determine the length of the vector embeddings.

from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

documents = [
    "Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels",
    "Llamas were first domesticated and used as pack animals 4,000 to 5,000 years ago in the Peruvian highlands",
    "Llamas can grow as much as 6 feet tall though the average llama between 5 feet 6 inches and 5 feet 9 inches tall",
    "Llamas weigh between 280 and 450 pounds and can carry 25 to 30 percent of their body weight",
    "Llamas are vegetarians and have very efficient digestive systems",
    "Llamas live to be about 20 years old, though some only live for 15 years and others live to be 30 years old"
]

embeddings = OpenAIEmbeddings(
    model = "text-embedding-3-small"
)

dimensions = len(embeddings.embed_query(documents[0]))

docs = [Document(page_content = text) for text in documents]

Now we'll create the vector index, as follows:

collection.create_index(
    [("embedding", "vector")],
    name = "vector_index",
    kaiIndexOptions = {
        "index_type": "AUTO",
        "metric_type": "DOT_PRODUCT",
        "dimensions": dimensions
    }
)
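As an aside, the DOT_PRODUCT metric chosen for the index can be illustrated with a toy example. OpenAI embeddings are normalised to unit length, so the dot product behaves like cosine similarity: higher scores mean more similar documents. The helper below is just an illustration, not part of any library.

```python
# Toy illustration of the DOT_PRODUCT similarity metric: for unit-length
# vectors, the dot product equals cosine similarity.
def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))

query = [1.0, 0.0]   # toy 2-d "embedding"
doc_a = [1.0, 0.0]   # same direction as the query
doc_b = [0.0, 1.0]   # orthogonal to the query

print(dot_product(query, doc_a))  # 1.0 -> closest match
print(dot_product(query, doc_b))  # 0.0 -> unrelated
```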

Next, we'll store the documents and embeddings in SingleStore:

from langchain_mongodb import MongoDBAtlasVectorSearch  # langchain-mongodb package

docsearch = MongoDBAtlasVectorSearch.from_documents(
    docs,
    embeddings,
    collection = collection,
    index_name = "vector_index"
)

We'll now ask a question:

prompt = "What animals are llamas related to?"
results = docsearch.similarity_search(prompt)
data = results[0].page_content
print(data)

Example output:

Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels

Next, we'll use an LLM, as follows:

from openai import OpenAI

openai_client = OpenAI()

response = openai_client.chat.completions.create(
    model = "gpt-4o-mini",
    messages = [
        {"role": "system", "content": "You are a helpful assistant. Provide more details."},
        {"role": "user", "content": f"Using this data: {data}. Respond to this prompt: {prompt}"}
    ]
)
print(response.choices[0].message.content)
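The call above follows the usual retrieval-augmented pattern: the retrieved document is injected into the user message alongside the question. That assembly step can be pulled out into a small helper for reuse (`build_messages` is our own name, not part of the OpenAI SDK):

```python
# Sketch of the prompt assembly used above: combine retrieved context and the
# user's question into the Chat Completions message format.
def build_messages(context: str, prompt: str) -> list:
    return [
        {"role": "system",
         "content": "You are a helpful assistant. Provide more details."},
        {"role": "user",
         "content": f"Using this data: {context}. Respond to this prompt: {prompt}"},
    ]

msgs = build_messages("Llamas are camelids", "What animals are llamas related to?")
print(msgs[1]["content"])
```

Swapping in a different retrieved chunk only changes the `context` argument; the message structure stays the same.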

Example output:

Llamas are related to several animals within the camelid family. The closest relatives of llamas include:

1. **Vicuñas** – These are wild South American camelids that are similar in appearance to llamas but are smaller and are known for their fine wool.

2. **Alpacas** – Alpacas are domesticated camelids closely related to llamas, and they are primarily bred for their soft and luxurious fleece.

3. **Guanacos** – Guanacos are wild relatives of llamas and are also native to South America. They are similar in build and habitat preference.

4. **Camels** – Though they are not native to South America, camels (both dromedary and Bactrian) belong to the same family (Camelidae) as llamas, thus making them distant relatives.

These relationships highlight the diversity within the camelid family and the close connections among these species.

Summary

LangChain's MongoDBAtlasVectorSearch provides an extensive API, and we have tested only a small part of it here. However, the results so far are promising. Further tests are needed to determine the full extent of compatibility between MongoDBAtlasVectorSearch and Kai.
