Part 1 covered PostgreSQL with pgvector setup, and Part 2 implemented vector search using OpenAI embeddings. This final part demonstrates how to run vector search locally using Ollama! ✨
Contents
- Why Ollama?
- Setting Up Ollama with Docker
- Database Updates
- Implementation
- Search Queries
- Performance Tips
- Troubleshooting
- OpenAI vs. Ollama
- Wrap Up
Why Ollama? 🦙
Ollama allows you to run AI models locally with:
- Offline operation for better data privacy
- No API costs
- Fast response times
We'll use the nomic-embed-text model in Ollama, which creates 768-dimensional vectors (compared to OpenAI's 1536 dimensions).
Setting Up Ollama with Docker 🐳
To add Ollama to your Docker setup, add this service to compose.yml:
services:
  db:
    # ... (existing db service)

  ollama:
    image: ollama/ollama
    container_name: ollama-service
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama

  data_loader:
    # ... (existing data_loader service)
    environment:
      - OLLAMA_HOST=ollama
    depends_on:
      - db
      - ollama

volumes:
  pgdata:
  ollama_data:
Then, start the services and pull the model:
docker compose up -d
# Pull the embedding model
docker compose exec ollama ollama pull nomic-embed-text
# Test embedding generation
curl http://localhost:11434/api/embed -d '{
"model": "nomic-embed-text",
"input": "Hello World"
}'
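If you prefer to verify from Python, here is a minimal sketch (assuming the requests package is available) that confirms the model really returns 768-dimensional vectors:

import requests

# Quick check: request an embedding from the local Ollama service
# and confirm its dimensionality (768 for nomic-embed-text).
response = requests.post(
    "http://localhost:11434/api/embed",
    json={"model": "nomic-embed-text", "input": "Hello World"},
)
embedding = response.json()["embeddings"][0]
print(len(embedding))  # Expected: 768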
Database Updates 🐘
Update the database to store Ollama embeddings:
# Connect to the database
docker compose exec db psql -U postgres -d example_db

-- Add a column for Ollama embeddings
ALTER TABLE items
ADD COLUMN embedding_ollama vector(768);
For fresh installations, update postgres/schema.sql:
CREATE TABLE items (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    item_data JSONB,
    embedding vector(1536),        -- OpenAI
    embedding_ollama vector(768)   -- Ollama
);
Implementation 👾
Update requirements.txt to install the Ollama Python library:
ollama==0.3.3
Here’s an example update for load_data.py to add Ollama embeddings:
import ollama  # New import

def get_embedding_ollama(text: str):
    """Generate embedding using Ollama API"""
    response = ollama.embed(
        model='nomic-embed-text',
        input=text
    )
    return response["embeddings"][0]

def load_books_to_db():
    """Load books with embeddings into PostgreSQL"""
    books = fetch_books()

    for book in books:
        description = (
            f"Book titled '{book['title']}' by {', '.join(book['authors'])}. "
            f"Published in {book['first_publish_year']}. "
            f"This is a book about {book['subject']}."
        )

        # Generate embeddings with both OpenAI and Ollama
        embedding = get_embedding(description)                # OpenAI
        embedding_ollama = get_embedding_ollama(description)  # Ollama

        # Store in the database
        store_book(book["title"], json.dumps(book), embedding, embedding_ollama)
Note that this is a simplified version for clarity. Full source code is here.
As you can see, the Ollama API structure is similar to OpenAI’s!
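For comparison, here is a rough side-by-side sketch of the two embedding calls. The OpenAI client setup and model name are assumptions (use whatever you configured in Part 2):

import ollama
from openai import OpenAI

client = OpenAI()  # Reads OPENAI_API_KEY from the environment

def get_embedding(text: str):
    """OpenAI embedding (1536 dimensions) - model name is an assumption."""
    response = client.embeddings.create(model="text-embedding-3-small", input=text)
    return response.data[0].embedding

def get_embedding_ollama(text: str):
    """Ollama embedding (768 dimensions with nomic-embed-text)."""
    response = ollama.embed(model="nomic-embed-text", input=text)
    return response["embeddings"][0]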
Search Queries 🔍
Here are example queries for inspecting and searching the Ollama embeddings. With the <=> operator, a smaller cosine distance means a closer match:
-- View the first 5 dimensions of an embedding
SELECT
    name,
    (replace(replace(embedding_ollama::text, '[', '{'), ']', '}')::float[])[1:5] AS first_dimensions
FROM items;

-- Search for books about web development
WITH web_book AS (
    SELECT embedding_ollama FROM items WHERE name LIKE '%Web%' LIMIT 1
)
SELECT
    item_data->>'title' AS title,
    item_data->>'authors' AS authors,
    embedding_ollama <=> (SELECT embedding_ollama FROM web_book) AS distance
FROM items
ORDER BY distance
LIMIT 3;
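To search with arbitrary query text from Python, embed the query first and let pgvector order by distance. A minimal sketch, assuming psycopg2 is installed and using placeholder connection settings:

import ollama
import psycopg2

def search_books(query: str, limit: int = 3):
    """Embed the query text and return the closest books by cosine distance."""
    query_embedding = ollama.embed(model="nomic-embed-text", input=query)["embeddings"][0]
    vector_literal = "[" + ",".join(map(str, query_embedding)) + "]"

    # Placeholder connection settings - adjust to match your compose.yml
    conn = psycopg2.connect(host="localhost", dbname="example_db", user="postgres", password="password")
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            SELECT item_data->>'title', embedding_ollama <=> %s::vector AS distance
            FROM items
            ORDER BY distance
            LIMIT %s;
            """,
            (vector_literal, limit),
        )
        return cur.fetchall()

print(search_books("web development"))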
Performance Tips 📊
Add an Index
CREATE INDEX ON items
USING ivfflat (embedding_ollama vector_cosine_ops)
WITH (lists = 100);
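An ivfflat index trades accuracy for speed. If results look off after indexing, you can raise the number of lists probed per query (10 here is just an illustrative value):

-- Probe more lists per query for better recall (default is 1; higher is slower but more accurate)
SET ivfflat.probes = 10;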
Resource Requirements
- RAM: ~2GB for the model
- First query: Expect a slight delay while the model loads (see the warm-up sketch after this list)
- Subsequent queries: ~50ms response time
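One way to avoid the first-query delay is to send a throwaway request when your service starts, so the model is already in memory. A minimal warm-up sketch:

import ollama

def warm_up_model():
    """Issue a dummy request so Ollama loads nomic-embed-text into memory."""
    ollama.embed(model="nomic-embed-text", input="warm-up")

warm_up_model()  # Call once at startup, before serving real queries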
GPU Support
If processing large datasets, GPU support can greatly speed up embedding generation. For details, refer to the Ollama Docker image.
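As a rough sketch, GPU access in Docker Compose typically looks like the following. This assumes an NVIDIA GPU with the NVIDIA Container Toolkit installed on the host:

ollama:
  image: ollama/ollama
  # ... (ports and volumes as before)
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]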
Troubleshooting 🔧
Connection Refused Error
The Ollama library needs to know where to find the Ollama service. Set the OLLAMA_HOST environment variable in the data_loader service:
data_loader:
  environment:
    - OLLAMA_HOST=ollama
Model Not Found Error
Pull the model manually:
docker compose exec ollama ollama pull nomic-embed-text
Alternatively, you can pull the model automatically from your Python code using the ollama.pull() function, as sketched below.
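A minimal sketch of that approach; the helper name is hypothetical:

import ollama

def ensure_model(model: str = "nomic-embed-text"):
    """Pull the embedding model if it isn't available yet."""
    try:
        ollama.embed(model=model, input="ping")  # Fails if the model is missing
    except ollama.ResponseError:
        ollama.pull(model)

ensure_model()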
High Memory Usage
- Restart the Ollama service
- Consider using a smaller model
OpenAI vs. Ollama ⚖️
| Feature | OpenAI | Ollama |
|---|---|---|
| Vector Dimensions | 1536 | 768 |
| Privacy | Requires API calls | Fully local |
| Cost | Pay per API call | Free |
| Speed | Network dependent | ~50ms/query |
| Setup | API key needed | Docker only |
Wrap Up 🌯
This tutorial covered only the basics of setting up local vector search with Ollama. Real-world applications often include additional features like:
- Query optimization and preprocessing
- Hybrid search (combining with full-text search)
- Integration with web interfaces
- Security and performance considerations
The full source code, including a simple API built with FastAPI, is available on GitHub. PRs and feedback are welcome!
Questions or feedback? Leave a comment below! 💬