Understanding Search Scores in MongoDB Hybrid Search

#mongodb #vectordatabase #hybridsearch #ai

Over the past few weeks, I've been diving deep into MongoDB's hybrid search capabilities, specifically focusing on understanding how to improve search result relevancy. I discovered that understanding and optimizing search scores was crucial for delivering better results to our users. This led me to explore how MongoDB handles scoring in both traditional text search and vector search, and how these scores can be effectively combined.

If you're working with hybrid search in MongoDB, you might be interested in my previous posts about implementing semantic search (https://dev.to/shannonlal/implementing-complex-semantic-search-with-mongodb-51ib) and optimizing search with boost and bury (https://dev.to/shannonlal/understanding-mongodb-atlas-search-scoring-for-better-search-results-1in4). Today, I'll share insights about accessing and interpreting search scores in MongoDB's hybrid search implementation.

A Simple Hybrid Search Implementation

Here's a simplified MongoDB aggregation pipeline that demonstrates how to capture both vector and text search scores:


[
    {
      $vectorSearch: {
        index: 'ai_image_description_vector_index',
        path: 'descriptionValues',
        queryVector: embedding,
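        // Consider more candidates than you return; the 20x factor and the 10,000 cap are assumptions based on Atlas guidance
        numCandidates: Math.min(limit * 20, 10000),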
        limit: limit,
        filter: {
          userId: userId,
          deleted: false
        }
      }
    },
    {
      $project: {
        description: 1,
        name: 1,
        searchType: 'vector',
        vectorScore: { $meta: 'vectorSearchScore' }
      }
    },
    {
      $unionWith: {
        coll: 'ai_generated_image',
        pipeline: [
          {
            $search: {
              index: 'ai_image_description',
              compound: {
                must: [
                  {
                    autocomplete: {
                      query: query,
                      path: 'description'
                    }
                  }
                ],
                filter: [
                  {
                    equals: {
                      path: 'deleted',
                      value: false
                    }
                  },
                  {
                    text: {
                      path: 'userId',
                      query: userId
                    }
                  }
                ]
              },
              scoreDetails: true
            }
          },
          {
            $addFields: {
              searchType: 'text',
              textScore: { $meta: 'searchScore' },
              textScoreDetails: { $meta: 'searchScoreDetails' }
            }
          }
        ]
      }
    },
    {
      $group: {
        _id: null,
        docs: { $push: '$$ROOT' }
      }
    },
    {
      $unwind: {
        path: '$docs',
        includeArrayIndex: 'rank'
      }
    },
    {
      $group: {
        _id: '$docs._id',
        description: { $first: '$docs.description' },
        name: { $first: '$docs.name' },
        vector_score: { $max: '$docs.vectorScore' },
        text_score: { $max: '$docs.textScore' },
        text_score_details: { $max: '$docs.textScoreDetails' },
        searchType: { $first: '$docs.searchType' }
      }
    },
    {
      // Note: no $sort precedes this stage, so page order is not guaranteed to be stable
      $skip: cursor ? parseInt(cursor, 10) : 0
    },
    {
      $limit: limit
    }
]

Understanding $unionWith in Hybrid Search

The $unionWith stage is what makes this a hybrid search: it runs the text search as an independent sub-pipeline and appends its results to the vector search results. It does not deduplicate. During my testing, the initial vector search returned 8 documents, and after $unionWith the combined set held 12 documents. Some documents matched both searches and therefore appeared twice in the combined results, once with a vectorScore and once with a textScore. The subsequent $group stages handle these duplicates by merging documents with the same _id while keeping both scores. Users get a deduplicated result set that draws on the strengths of both search methods.
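
To make the deduplication step concrete, here is a minimal in-memory sketch in plain JavaScript of what the final $group stage does to the unioned rows. The sample rows and the mergeById helper are hypothetical and only illustrate the idea; in the real pipeline the server does this work.

```javascript
// Hypothetical helper mimicking the final $group stage: merge rows that share
// an _id, keeping the highest vector and text score seen for each document.
function mergeById(rows) {
  const merged = new Map();
  for (const row of rows) {
    const current = merged.get(row._id) ?? {
      _id: row._id,
      description: row.description, // like $first: taken from the first row seen
      vector_score: null,
      text_score: null,
    };
    // Like $max: skip missing scores, otherwise keep the larger value.
    for (const [src, dest] of [['vectorScore', 'vector_score'], ['textScore', 'text_score']]) {
      if (row[src] != null) {
        current[dest] = current[dest] == null ? row[src] : Math.max(current[dest], row[src]);
      }
    }
    merged.set(row._id, current);
  }
  return [...merged.values()];
}

// Made-up rows: "a" is found by both searches, so it appears twice after the union.
const unioned = [
  { _id: 'a', description: 'red bicycle', searchType: 'vector', vectorScore: 0.91 },
  { _id: 'b', description: 'blue car', searchType: 'vector', vectorScore: 0.84 },
  { _id: 'a', description: 'red bicycle', searchType: 'text', textScore: 2.7 },
  { _id: 'c', description: 'red kite', searchType: 'text', textScore: 1.9 },
];

const results = mergeById(unioned);
console.log(results.length); // 3 unique documents from 4 unioned rows
console.log(results[0]);     // document "a" now carries both scores
```

A document found by only one search keeps null for the other score, which is worth remembering when you later sort or combine the two.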

Accessing Search Scores

Vector Search Scores

To capture vector similarity scores, project a field using the vectorSearchScore metadata:

vectorScore: { $meta: 'vectorSearchScore' }

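This score represents the similarity between your query vector and each document's vector. It is computed with the similarity function configured on the vector index (euclidean, cosine, or dotProduct) and normalized to a value between 0 and 1.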
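
The score that $vectorSearch reports is a normalized value rather than a raw similarity. As I understand the Atlas Vector Search documentation, cosine and dotProduct similarity are mapped with (1 + similarity) / 2, and euclidean distance with 1 / (1 + distance). The sketch below reproduces the cosine case in plain JavaScript. The function names are my own, and this only approximates what the server does internally.

```javascript
// Plain cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Assumed normalization for cosine indexes: maps [-1, 1] onto [0, 1].
function normalizedCosineScore(a, b) {
  return (1 + cosineSimilarity(a, b)) / 2;
}

console.log(normalizedCosineScore([1, 0], [1, 0]));  // same direction → 1
console.log(normalizedCosineScore([1, 0], [0, 1]));  // orthogonal → 0.5
console.log(normalizedCosineScore([1, 0], [-1, 0])); // opposite direction → 0
```

One practical consequence: under this mapping, a vectorScore around 0.5 on a cosine index means the vectors are roughly unrelated, not "50% similar".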

Text Search Scores

Reading the text search score itself takes one step: { $meta: 'searchScore' } is available on any $search result. The detailed breakdown takes two. First, set scoreDetails: true in the $search stage. Then read { $meta: 'searchScoreDetails' } in a later stage. The following $addFields stage captures both:

{
  $addFields: {
    searchType: 'text',
    textScore: { $meta: 'searchScore' },
    textScoreDetails: { $meta: 'searchScoreDetails' }
  }
}

The basic score gives a quick read on document relevance, while the score details show how that score was calculated. The details come back as a nested breakdown in which each node carries a value, a description, and its own list of child details. They include factors like term frequency (how often the search term appears), field weights (the importance of different fields), and any applied boost factors.
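
The raw breakdown can be hard to read when printed as JSON. Assuming it follows the nested { value, description, details } shape, a small helper can flatten it into indented lines for logging. The flattenScoreDetails function and the sample object below are hypothetical; the descriptions and numbers in a real response will differ.

```javascript
// Hypothetical helper: walk a scoreDetails tree depth-first and return one
// line per node, indented by depth. Assumes each node has the shape
// { value, description, details: [...] }.
function flattenScoreDetails(node, depth = 0) {
  const line = `${'  '.repeat(depth)}${node.value.toFixed(3)}  ${node.description}`;
  const children = (node.details ?? []).flatMap((child) =>
    flattenScoreDetails(child, depth + 1)
  );
  return [line, ...children];
}

// Made-up sample for illustration only.
const sampleDetails = {
  value: 1.5,
  description: 'sum of:',
  details: [
    {
      value: 1.5,
      description: 'weight(description:bike), result of:',
      details: [
        { value: 0.75, description: 'idf', details: [] },
        { value: 2.0, description: 'tf', details: [] },
      ],
    },
  ],
};

const lines = flattenScoreDetails(sampleDetails);
console.log(lines.join('\n'));
```

Logging the flattened lines for a handful of documents side by side makes it easier to spot which factor separates a high-ranked result from a low-ranked one.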

Working with search scores in MongoDB presents some interesting challenges, particularly because vector and text searches score on different scales: vector scores fall between 0 and 1, while text scores have no fixed upper bound. Comparing or adding the raw numbers directly would therefore let one search method dominate. However, MongoDB's detailed scoring information, combined with the $unionWith stage, gives you what you need to implement more sophisticated ranking strategies. By understanding both the final score and its components, you can make more informed decisions about balancing search results in your hybrid implementation.
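
One simple way to put the two scores on a comparable scale is min-max normalization followed by a weighted sum. The sketch below is one option among several, not what the pipeline above does. The helper names, the weights, and the sample documents are all assumptions for illustration. It works on the vector_score and text_score fields produced by the final $group stage.

```javascript
// Rescale the scores that are present to [0, 1]; a missing score becomes 0.
function minMaxNormalize(scores) {
  const present = scores.filter((s) => s != null);
  const min = Math.min(...present);
  const max = Math.max(...present);
  return scores.map((s) => {
    if (s == null) return 0;   // document was not found by this search
    if (max === min) return 1; // only one distinct score: treat it as the top
    return (s - min) / (max - min);
  });
}

// Blend the normalized scores and sort best-first. Weights are illustrative.
function blendScores(docs, vectorWeight = 0.6, textWeight = 0.4) {
  const vec = minMaxNormalize(docs.map((d) => d.vector_score));
  const txt = minMaxNormalize(docs.map((d) => d.text_score));
  return docs
    .map((d, i) => ({ ...d, combined: vectorWeight * vec[i] + textWeight * txt[i] }))
    .sort((a, b) => b.combined - a.combined);
}

// Made-up merged results, shaped like the output of the final $group stage.
const mergedDocs = [
  { _id: 'a', vector_score: 1.0, text_score: 3.0 },
  { _id: 'b', vector_score: 0.75, text_score: null },
  { _id: 'c', vector_score: null, text_score: 1.0 },
  { _id: 'd', vector_score: 0.5, text_score: 2.0 },
];

const ranked = blendScores(mergedDocs);
console.log(ranked.map((d) => d._id)); // ids ordered by combined score
```

Min-max normalization has a known weakness: the lowest-scoring document in each list gets 0, the same as a document that list never returned. Rank-based methods avoid that problem because they ignore the raw score values entirely.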

Later this week, I'll be sharing a detailed look at implementing Reciprocal Rank Fusion with MongoDB hybrid search, which offers an elegant solution for combining and ranking results from different search methods. If you're working with MongoDB search and have questions about search scores or hybrid search implementation, feel free to reach out in the comments or connect with me directly.

Stay tuned for more insights into optimizing MongoDB search functionality!
