Python Search Engine Implementation Techniques
Search functionality forms the backbone of modern applications. I've spent years implementing search solutions, and I'll share seven powerful Python techniques that deliver exceptional performance and accuracy.
Text Indexing with Elasticsearch-DSL
Elasticsearch-DSL provides a high-level interface for building search applications. The library offers powerful abstractions for document management and complex queries. Here's how I implement a basic search system:
from elasticsearch_dsl import Document, Text, Keyword, Date, Search, Q
from elasticsearch_dsl.connections import connections

# Configure the default connection
connections.create_connection(hosts=['localhost'])

class Product(Document):
    name = Text(fields={'raw': Keyword()})
    description = Text()
    tags = Keyword(multi=True)
    created_at = Date()

    class Index:
        name = 'products'
        settings = {
            'number_of_shards': 2,
            'number_of_replicas': 1
        }

def search_products(query, filters=None):
    s = Search(index='products')
    # Boost matches in the product name three times higher than the description
    q = Q('multi_match', query=query, fields=['name^3', 'description'])
    if filters:
        s = s.filter('terms', tags=filters)
    s = s.query(q)
    response = s.execute()
    return response.hits
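Before the first query, the mapping has to exist in Elasticsearch. A minimal usage sketch, with made-up product data:

# Create the index with the Product mapping, then add a document
Product.init()

Product(
    name='Mechanical Keyboard',  # sample data for illustration
    description='Tenkeyless board with hot-swappable switches',
    tags=['electronics', 'accessories'],
).save()

for hit in search_products('keyboard', filters=['electronics']):
    print(hit.name, hit.meta.score)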
Whoosh: Pure Python Search Engine
Whoosh offers a pure Python solution for search functionality. I find it particularly useful for smaller applications where running separate search services isn't feasible:
from whoosh.index import create_in, open_dir
from whoosh.fields import Schema, TEXT, ID
from whoosh.qparser import QueryParser
import os

schema = Schema(
    title=TEXT(stored=True),
    content=TEXT,
    path=ID(stored=True)
)

# Create the index directory on first run; reopen the existing index afterwards
if not os.path.exists("index"):
    os.mkdir("index")
    ix = create_in("index", schema)
else:
    ix = open_dir("index")

def add_document(title, content, path):
    # A writer cannot be reused after commit, so open a fresh one per write
    writer = ix.writer()
    writer.add_document(title=title, content=content, path=path)
    writer.commit()

def search_documents(query_str):
    with ix.searcher() as searcher:
        query = QueryParser("content", ix.schema).parse(query_str)
        results = searcher.search(query)
        return [(result['title'], result['path']) for result in results]
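A quick usage sketch, with illustrative documents:

add_document('Getting Started', 'Install Whoosh and build the index', '/docs/start')
add_document('Query Syntax', 'Whoosh supports boolean and phrase queries', '/docs/query')

for title, path in search_documents('index'):
    print(title, path)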
MeiliSearch Integration
MeiliSearch provides real-time search with typo tolerance. Here's my implementation approach:
from meilisearch import Client

client = Client('http://localhost:7700')

def setup_search_index():
    # The official Python client is synchronous; settings changes are
    # queued as tasks that the MeiliSearch server applies in the background
    index = client.index('products')
    index.update_settings({
        'rankingRules': [
            'words',
            'typo',
            'proximity',
            'attribute',
            'exactness'
        ],
        'searchableAttributes': [
            'name',
            'description'
        ]
    })

def search_products(query):
    results = client.index('products').search(query, {
        'limit': 20,
        'attributesToRetrieve': ['name', 'description'],
        'attributesToHighlight': ['name']
    })
    return results['hits']
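The settings and search calls assume documents are already in the index; adding them is a separate step. A minimal sketch with made-up records:

def add_products(products):
    # add_documents enqueues an indexing task that the server applies asynchronously
    return client.index('products').add_documents(products, primary_key='id')

add_products([
    {'id': 1, 'name': 'Espresso Machine', 'description': 'Compact 15-bar machine'},
    {'id': 2, 'name': 'Coffee Grinder', 'description': 'Conical burr grinder'},
])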
RediSearch Implementation
RediSearch combines the speed of Redis with full-text search capabilities:
from redis import Redis
from redisearch import Client, TextField, NumericField, Query

r = Redis(host='localhost', port=6379)
# Pass the Redis connection as a keyword argument; the second
# positional argument of Client is the host, not a connection object
client = Client('products-idx', conn=r)

def create_index():
    client.create_index([
        TextField('name', weight=5.0),
        TextField('description'),
        NumericField('price')
    ])

def add_product(product_id, name, description, price):
    client.add_document(
        f'product:{product_id}',
        name=name,
        description=description,
        price=price
    )

def search_products(query_string, min_price=0, max_price='+inf'):
    # RediSearch accepts +inf/-inf as open-ended numeric range bounds
    q = Query(f'@name|description:{query_string} @price:[{min_price} {max_price}]')
    results = client.search(q)
    return [doc.__dict__ for doc in results.docs]
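A usage sketch with sample products (the data is illustrative):

create_index()
add_product(1, 'Trail Backpack', '35-liter hiking pack with rain cover', 89.99)
add_product(2, 'City Backpack', 'Slim water-resistant laptop pack', 59.99)

for doc in search_products('backpack', min_price=50, max_price=100):
    print(doc['name'], doc['price'])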
Tantivy Search Engine Bindings
Tantivy brings Rust's performance to Python search implementations:
import os

import tantivy

# tantivy exposes a SchemaBuilder rather than field constants
schema_builder = tantivy.SchemaBuilder()
schema_builder.add_text_field("title", stored=True)
schema_builder.add_text_field("body", stored=True)
schema = schema_builder.build()

os.makedirs("index_path", exist_ok=True)
index = tantivy.Index(schema, path="index_path")

def index_document(title, body):
    writer = index.writer()
    writer.add_document(tantivy.Document(title=title, body=body))
    writer.commit()

def search_documents(query_str, limit=10):
    index.reload()  # pick up documents committed since the index was opened
    searcher = index.searcher()
    query = index.parse_query(query_str, ["title", "body"])
    results = searcher.search(query, limit)
    # Each hit is a (score, doc_address) pair; field access returns a list of values
    return [(searcher.doc(address)["title"][0], score)
            for score, address in results.hits]
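A quick usage sketch, again with illustrative documents:

index_document('Rust Performance', 'Tantivy indexes and searches documents quickly')
index_document('Python Bindings', 'The tantivy package exposes the engine to Python')

for title, score in search_documents('tantivy'):
    print(title, score)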
FAISS Vector Search
FAISS excels at similarity search and vector indexing:
import faiss
import numpy as np

class VectorSearchEngine:
    def __init__(self, dimension):
        self.dimension = dimension
        # Flat index: exact L2 search, no training required
        self.index = faiss.IndexFlatL2(dimension)
        self.id_map = {}

    def add_vectors(self, vectors, ids):
        assert vectors.shape[1] == self.dimension
        # Record the offset first so repeated calls don't overwrite earlier ids
        start = self.index.ntotal
        self.index.add(vectors.astype('float32'))
        for i, vector_id in enumerate(ids):
            self.id_map[start + i] = vector_id

    def search(self, query_vector, k=5):
        query_vector = query_vector.reshape(1, self.dimension).astype('float32')
        distances, indices = self.index.search(query_vector, k)
        # FAISS pads results with -1 when fewer than k vectors are indexed
        return [(self.id_map[i], dist)
                for i, dist in zip(indices[0], distances[0]) if i != -1]
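A usage sketch with random vectors standing in for real embeddings:

engine = VectorSearchEngine(dimension=128)

rng = np.random.default_rng(42)
vectors = rng.random((1000, 128), dtype=np.float32)  # stand-ins for embeddings
engine.add_vectors(vectors, ids=[f'doc-{i}' for i in range(1000)])

query = rng.random(128, dtype=np.float32)
for doc_id, distance in engine.search(query, k=5):
    print(doc_id, distance)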
Approximate Nearest Neighbor Search with Annoy
Annoy provides efficient approximate nearest neighbor search:
from annoy import AnnoyIndex

class AnnoySearchEngine:
    def __init__(self, dimension, metric='angular'):
        self.dimension = dimension
        self.index = AnnoyIndex(dimension, metric)
        self.items = {}

    def add_item(self, item_id, vector):
        self.index.add_item(item_id, vector)
        self.items[item_id] = vector

    def build(self, n_trees=10):
        # Once built, the index is read-only; no further items can be added
        self.index.build(n_trees)

    def save(self, file_path):
        self.index.save(file_path)

    def load(self, file_path):
        self.index.load(file_path)

    def get_nearest_neighbors(self, vector, n=10):
        return self.index.get_nns_by_vector(vector, n)
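A usage sketch, again with random vectors standing in for embeddings:

import numpy as np

engine = AnnoySearchEngine(dimension=64)

rng = np.random.default_rng(0)
for item_id in range(1000):
    engine.add_item(item_id, rng.random(64).tolist())  # stand-ins for embeddings

engine.build(n_trees=25)  # more trees improve recall at the cost of index size
engine.save('vectors.ann')

print(engine.get_nearest_neighbors(rng.random(64).tolist(), n=10))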
These implementations form a comprehensive toolkit for building search functionality. I optimize each solution based on specific requirements like data size, query complexity, and performance needs.
For large-scale applications, I combine multiple techniques: for example, Elasticsearch handles full-text search while FAISS handles vector similarity. This hybrid approach pairs fast lexical matching with semantic recall.
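One common way to merge the two result sets is reciprocal rank fusion. Here's a minimal sketch of that idea, assuming the Elasticsearch search_products helper and VectorSearchEngine class above, plus a hypothetical embed() function that maps text to a vector:

def hybrid_search(query, vector_engine, k=20, k_rrf=60):
    # Reciprocal rank fusion: each result list contributes 1 / (k_rrf + rank)
    scores = {}
    for rank, hit in enumerate(search_products(query)):  # Elasticsearch text hits
        scores[hit.meta.id] = scores.get(hit.meta.id, 0) + 1 / (k_rrf + rank)
    # embed() is a hypothetical text-to-vector function, e.g. a sentence encoder
    for rank, (doc_id, _dist) in enumerate(vector_engine.search(embed(query), k=k)):
        scores[doc_id] = scores.get(doc_id, 0) + 1 / (k_rrf + rank)
    return sorted(scores, key=scores.get, reverse=True)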
Regular maintenance and monitoring ensure optimal performance. I implement logging, performance metrics, and automated index updates to maintain search quality over time.
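A lightweight sketch of that kind of instrumentation, wrapping any of the search helpers above:

import logging
import time

logger = logging.getLogger('search')

def timed_search(search_fn, query, **kwargs):
    # Log latency and result counts for any search function
    start = time.perf_counter()
    results = search_fn(query, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    logger.info('query=%r hits=%d latency_ms=%.1f', query, len(results), elapsed_ms)
    return results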
The choice of technique depends on factors like data volume, update frequency, and search complexity. I recommend starting with simpler solutions and scaling up as needed, always keeping resource constraints and performance requirements in mind.
These implementations have served well in various projects, from document management systems to e-commerce platforms. The key is understanding each technique's strengths and applying them appropriately to meet specific use cases.