Python Search Engine Implementation Techniques
Search functionality forms the backbone of modern applications. I've spent years implementing search solutions, and I'll share seven powerful Python techniques that deliver exceptional performance and accuracy.
Text Indexing with Elasticsearch-DSL
Elasticsearch-DSL provides a high-level interface for building search applications. The library offers powerful abstractions for document management and complex queries. Here's how I implement a basic search system:
from elasticsearch_dsl import Document, Text, Keyword, Date, Search, Q
from elasticsearch_dsl.connections import connections

# Configure the default connection
connections.create_connection(hosts=['localhost'])

class Product(Document):
    name = Text(fields={'raw': Keyword()})
    description = Text()
    tags = Keyword(multi=True)
    created_at = Date()

    class Index:
        name = 'products'
        settings = {
            'number_of_shards': 2,
            'number_of_replicas': 1
        }

def search_products(query, filters=None):
    s = Search(index='products')
    # Boost matches in the product name three times higher than the description
    q = Q('multi_match', query=query, fields=['name^3', 'description'])
    if filters:
        s = s.filter('terms', tags=filters)
    s = s.query(q)
    response = s.execute()
    return response.hits
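Before the first query, the mapping has to exist in Elasticsearch. A minimal usage sketch, with made-up product data:

# Create the index with the Product mapping, then add a document
Product.init()

Product(
    name='Mechanical Keyboard',  # sample data for illustration
    description='Tenkeyless board with hot-swappable switches',
    tags=['electronics', 'accessories'],
).save()

for hit in search_products('keyboard', filters=['electronics']):
    print(hit.name, hit.meta.score)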
Whoosh: Pure Python Search Engine
Whoosh offers a pure Python solution for search functionality. I find it particularly useful for smaller applications where running separate search services isn't feasible:
from whoosh.index import create_in, open_dir
from whoosh.fields import Schema, TEXT, ID
from whoosh.qparser import QueryParser
import os

schema = Schema(
    title=TEXT(stored=True),
    content=TEXT,
    path=ID(stored=True)
)

# Create the index directory on first run; reopen the existing index afterwards
if not os.path.exists("index"):
    os.mkdir("index")
    ix = create_in("index", schema)
else:
    ix = open_dir("index")

def add_document(title, content, path):
    # A writer cannot be reused after commit, so open a fresh one per write
    writer = ix.writer()
    writer.add_document(title=title, content=content, path=path)
    writer.commit()

def search_documents(query_str):
    with ix.searcher() as searcher:
        query = QueryParser("content", ix.schema).parse(query_str)
        results = searcher.search(query)
        return [(result['title'], result['path']) for result in results]
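A quick usage sketch, with illustrative documents:

add_document('Getting Started', 'Install Whoosh and build the index', '/docs/start')
add_document('Query Syntax', 'Whoosh supports boolean and phrase queries', '/docs/query')

for title, path in search_documents('index'):
    print(title, path)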
MeiliSearch Integration
MeiliSearch provides real-time search with typo tolerance. Here's my implementation approach:
from meilisearch import Client

client = Client('http://localhost:7700')

def setup_search_index():
    # The official Python client is synchronous; settings changes are
    # queued as tasks that the MeiliSearch server applies in the background
    index = client.index('products')
    index.update_settings({
        'rankingRules': [
            'words',
            'typo',
            'proximity',
            'attribute',
            'exactness'
        ],
        'searchableAttributes': [
            'name',
            'description'
        ]
    })

def search_products(query):
    results = client.index('products').search(query, {
        'limit': 20,
        'attributesToRetrieve': ['name', 'description'],
        'attributesToHighlight': ['name']
    })
    return results['hits']
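The settings and search calls assume documents are already in the index; adding them is a separate step. A minimal sketch with made-up records:

def add_products(products):
    # add_documents enqueues an indexing task that the server applies asynchronously
    return client.index('products').add_documents(products, primary_key='id')

add_products([
    {'id': 1, 'name': 'Espresso Machine', 'description': 'Compact 15-bar machine'},
    {'id': 2, 'name': 'Coffee Grinder', 'description': 'Conical burr grinder'},
])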
RediSearch Implementation
RediSearch combines the speed of Redis with full-text search capabilities:
from redis import Redis
from redisearch import Client, TextField, NumericField, Query

r = Redis(host='localhost', port=6379)
# Pass the Redis connection as a keyword argument; the second
# positional argument of Client is the host, not a connection object
client = Client('products-idx', conn=r)

def create_index():
    client.create_index([
        TextField('name', weight=5.0),
        TextField('description'),
        NumericField('price')
    ])

def add_product(product_id, name, description, price):
    client.add_document(
        f'product:{product_id}',
        name=name,
        description=description,
        price=price
    )

def search_products(query_string, min_price=0, max_price='+inf'):
    # RediSearch accepts +inf/-inf as open-ended numeric range bounds
    q = Query(f'@name|description:{query_string} @price:[{min_price} {max_price}]')
    results = client.search(q)
    return [doc.__dict__ for doc in results.docs]
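A usage sketch with sample products (the data is illustrative):

create_index()
add_product(1, 'Trail Backpack', '35-liter hiking pack with rain cover', 89.99)
add_product(2, 'City Backpack', 'Slim water-resistant laptop pack', 59.99)

for doc in search_products('backpack', min_price=50, max_price=100):
    print(doc['name'], doc['price'])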
Tantivy Search Engine Bindings
Tantivy brings Rust's performance to Python search implementations:
import os

import tantivy

# tantivy exposes a SchemaBuilder rather than field constants
schema_builder = tantivy.SchemaBuilder()
schema_builder.add_text_field("title", stored=True)
schema_builder.add_text_field("body", stored=True)
schema = schema_builder.build()

os.makedirs("index_path", exist_ok=True)
index = tantivy.Index(schema, path="index_path")

def index_document(title, body):
    writer = index.writer()
    writer.add_document(tantivy.Document(title=title, body=body))
    writer.commit()

def search_documents(query_str, limit=10):
    index.reload()  # pick up documents committed since the index was opened
    searcher = index.searcher()
    query = index.parse_query(query_str, ["title", "body"])
    results = searcher.search(query, limit)
    # Each hit is a (score, doc_address) pair; field access returns a list of values
    return [(searcher.doc(address)["title"][0], score)
            for score, address in results.hits]
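A quick usage sketch, again with illustrative documents:

index_document('Rust Performance', 'Tantivy indexes and searches documents quickly')
index_document('Python Bindings', 'The tantivy package exposes the engine to Python')

for title, score in search_documents('tantivy'):
    print(title, score)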
FAISS Vector Search
FAISS excels at similarity search and vector indexing:
import faiss
import numpy as np

class VectorSearchEngine:
    def __init__(self, dimension):
        self.dimension = dimension
        # Flat index: exact L2 search, no training required
        self.index = faiss.IndexFlatL2(dimension)
        self.id_map = {}

    def add_vectors(self, vectors, ids):
        assert vectors.shape[1] == self.dimension
        # Record the offset first so repeated calls don't overwrite earlier ids
        start = self.index.ntotal
        self.index.add(vectors.astype('float32'))
        for i, vector_id in enumerate(ids):
            self.id_map[start + i] = vector_id

    def search(self, query_vector, k=5):
        query_vector = query_vector.reshape(1, self.dimension).astype('float32')
        distances, indices = self.index.search(query_vector, k)
        # FAISS pads results with -1 when fewer than k vectors are indexed
        return [(self.id_map[i], dist)
                for i, dist in zip(indices[0], distances[0]) if i != -1]
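A usage sketch with random vectors standing in for real embeddings:

engine = VectorSearchEngine(dimension=128)

rng = np.random.default_rng(42)
vectors = rng.random((1000, 128), dtype=np.float32)  # stand-ins for embeddings
engine.add_vectors(vectors, ids=[f'doc-{i}' for i in range(1000)])

query = rng.random(128, dtype=np.float32)
for doc_id, distance in engine.search(query, k=5):
    print(doc_id, distance)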
Approximate Nearest Neighbor Search with Annoy
Annoy provides efficient approximate nearest neighbor search:
from annoy import AnnoyIndex

class AnnoySearchEngine:
    def __init__(self, dimension, metric='angular'):
        self.dimension = dimension
        self.index = AnnoyIndex(dimension, metric)
        self.items = {}

    def add_item(self, item_id, vector):
        self.index.add_item(item_id, vector)
        self.items[item_id] = vector

    def build(self, n_trees=10):
        # Once built, the index is read-only; no further items can be added
        self.index.build(n_trees)

    def save(self, file_path):
        self.index.save(file_path)

    def load(self, file_path):
        self.index.load(file_path)

    def get_nearest_neighbors(self, vector, n=10):
        return self.index.get_nns_by_vector(vector, n)
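A usage sketch, again with random vectors standing in for embeddings:

import numpy as np

engine = AnnoySearchEngine(dimension=64)

rng = np.random.default_rng(0)
for item_id in range(1000):
    engine.add_item(item_id, rng.random(64).tolist())  # stand-ins for embeddings

engine.build(n_trees=25)  # more trees improve recall at the cost of index size
engine.save('vectors.ann')

print(engine.get_nearest_neighbors(rng.random(64).tolist(), n=10))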
These implementations form a comprehensive toolkit for building search functionality. I optimize each solution based on specific requirements like data size, query complexity, and performance needs.
For large-scale applications, I combine multiple techniques: for example, Elasticsearch handles full-text search while FAISS handles vector similarity. This hybrid approach pairs fast lexical matching with semantic recall.
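One common way to merge the two result sets is reciprocal rank fusion. Here's a minimal sketch of that idea, assuming the Elasticsearch search_products helper and VectorSearchEngine class above, plus a hypothetical embed() function that maps text to a vector:

def hybrid_search(query, vector_engine, k=20, k_rrf=60):
    # Reciprocal rank fusion: each result list contributes 1 / (k_rrf + rank)
    scores = {}
    for rank, hit in enumerate(search_products(query)):  # Elasticsearch text hits
        scores[hit.meta.id] = scores.get(hit.meta.id, 0) + 1 / (k_rrf + rank)
    # embed() is a hypothetical text-to-vector function, e.g. a sentence encoder
    for rank, (doc_id, _dist) in enumerate(vector_engine.search(embed(query), k=k)):
        scores[doc_id] = scores.get(doc_id, 0) + 1 / (k_rrf + rank)
    return sorted(scores, key=scores.get, reverse=True)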
Regular maintenance and monitoring ensure optimal performance. I implement logging, performance metrics, and automated index updates to maintain search quality over time.
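A lightweight sketch of that kind of instrumentation, wrapping any of the search helpers above:

import logging
import time

logger = logging.getLogger('search')

def timed_search(search_fn, query, **kwargs):
    # Log latency and result counts for any search function
    start = time.perf_counter()
    results = search_fn(query, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    logger.info('query=%r hits=%d latency_ms=%.1f', query, len(results), elapsed_ms)
    return results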
The choice of technique depends on factors like data volume, update frequency, and search complexity. I recommend starting with simpler solutions and scaling up as needed, always keeping resource constraints and performance requirements in mind.
These implementations have served well in various projects, from document management systems to e-commerce platforms. The key is understanding each technique's strengths and applying them appropriately to meet specific use cases.