Have you ever wondered how Netflix suggests movies you might like, or how Spotify creates personalized playlists? Features like these often rely on "vector similarity search": a way to find related content by comparing numeric representations of items. In this guide, we'll use Docker to set up a PostgreSQL database with the pgvector extension so you can build similar features.
What is Vector Search?
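When an AI model analyzes content (text, images, or product data), it can produce a list of numbers, called a "vector" or "embedding", that represents the item's characteristics. Similar items get vectors that lie close together, so "find similar items" becomes "find the nearest vectors." pgvector adds a `vector` column type and distance operators to PostgreSQL, so you can store these vectors and search them efficiently.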
If you're not familiar with machine learning, don't worry! You can get embeddings from hosted APIs such as OpenAI's without deep AI knowledge. These embeddings are the building blocks for recommendation engines and similarity search features.
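To make "similar numbers" concrete, here is a small Python sketch. Python is used only for illustration and is not part of the setup. It computes cosine similarity, a common way to compare embeddings, on three made-up 3-dimensional vectors. The vectors and the `cosine_similarity` helper are hypothetical, and real embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings": two comedies and one horror movie
comedy_a = [0.9, 0.1, 0.0]
comedy_b = [0.8, 0.2, 0.1]
horror = [0.0, 0.1, 0.95]

print(round(cosine_similarity(comedy_a, comedy_b), 3))  # close to 1: similar
print(round(cosine_similarity(comedy_a, horror), 3))    # close to 0: unrelated
```

Scores near 1.0 mean the vectors point in the same direction. pgvector's `<=>` operator returns cosine *distance*, which is 1 minus this similarity, so a smaller value means more similar.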
Let's get started! 🚀
Prerequisites
Make sure you have Docker Desktop installed on your computer.
Step-by-Step Setup
1. Create docker-compose.yml
Create a `docker-compose.yml` file in your project root to define the PostgreSQL container.
```yaml
services:
  db:
    image: pgvector/pgvector:pg17 # PostgreSQL with pgvector support
    container_name: pgvector-db
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: password
      POSTGRES_DB: example_db
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
      - ./postgres/schema.sql:/docker-entrypoint-initdb.d/schema.sql

volumes:
  pgdata: # Stores data outside the container to ensure persistence
```
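The environment values and port mapping above determine how clients on your machine reach the database. The following sketch uses a hypothetical Python helper to assemble those values into a `postgresql://` connection URI, which is the URI format that libpq-based clients accept. It assumes you connect from the host machine through the `"5432:5432"` mapping.

```python
from urllib.parse import quote

def connection_url(user: str, password: str, host: str, port: int, database: str) -> str:
    """Build a postgresql:// URI; quote() percent-escapes characters such as '@' or ':'."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )

# Values copied from docker-compose.yml; "localhost" assumes a client on the Docker host
url = connection_url("postgres", "password", "localhost", 5432, "example_db")
print(url)
```

A client running as another service in the same Compose file would use the service name `db` as the host instead of `localhost`.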
2. Define Database Schema (`schema.sql`)
Create a `postgres` directory in the project root, then create a `schema.sql` file inside it to define your initial schema. This example schema enables the pgvector extension and creates a table for storing items with vector embeddings.
```sql
-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create example table
CREATE TABLE items (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    metadata JSONB,
    embedding vector(1536) -- 1536 dimensions, e.g. OpenAI's text-embedding-3-small
);
```
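`vector(1536)` fixes the dimension, so every embedding you insert must contain exactly 1536 values. pgvector's text input format is a bracketed, comma-separated list such as `'[1,2,3]'`. The hypothetical Python helper below checks the length and builds that string. You could then pass the string as a query parameter and cast it with `::vector`.

```python
import math

EMBEDDING_DIM = 1536  # must match vector(1536) in schema.sql

def to_pgvector_literal(embedding: list[float], dim: int = EMBEDDING_DIM) -> str:
    """Validate an embedding and serialize it to pgvector's text format, e.g. '[0.1,0.2]'."""
    if len(embedding) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(embedding)}")
    if not all(math.isfinite(x) for x in embedding):
        # pgvector rejects NaN and infinite values
        raise ValueError("embedding contains NaN or infinite values")
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"

# Hypothetical 3-dimensional example (dim overridden for readability)
print(to_pgvector_literal([0.1, 0.2, 0.3], dim=3))
```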
3. Start Docker Compose
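Start the container in the background. On the first run, Docker pulls the `pgvector/pgvector:pg17` image, and PostgreSQL runs `schema.sql` while it initializes the empty data volume.
```bash
docker compose up -d
```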
4. Verify the Database and Extensions
Once the container is running, connect to PostgreSQL to verify the setup.
```bash
docker exec -it pgvector-db psql -U postgres -d example_db
```
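In the PostgreSQL shell, run the commands below. `\dx` should list the `vector` extension, and `\dt` should list the `items` table.
```
-- Check installed extensions
\dx
-- Check if your table exists
\dt
```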
Using Your Vector Database
Here's a simple example of how to find similar items:
```sql
-- Find items similar to a specific vector
SELECT id, name, metadata
FROM items
ORDER BY embedding <-> '[0.1, 0.2, ...]'::vector
LIMIT 5;
```
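Replace `[0.1, 0.2, ...]` with a real embedding, such as one returned by OpenAI's API. It must contain exactly 1536 values to match the `vector(1536)` column, or PostgreSQL will reject the query. The `<->` operator sorts by Euclidean (L2) distance. pgvector also provides `<=>` for cosine distance and `<#>` for negative inner product.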
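Conceptually, `ORDER BY embedding <-> query LIMIT k` on a table without a vector index computes the distance from the query to every row and keeps the k closest. The following Python sketch shows that exact search on hypothetical 3-dimensional rows shaped like the `items` table. The `l2_distance` and `nearest` helpers are illustrative, not pgvector code.

```python
import math

def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance, the metric behind pgvector's <-> operator."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(items: list[dict], query: list[float], k: int = 5) -> list[dict]:
    """Exact k-nearest-neighbour search: sort every row by distance, keep the first k."""
    return sorted(items, key=lambda item: l2_distance(item["embedding"], query))[:k]

# Hypothetical rows (3 dimensions for readability instead of 1536)
items = [
    {"id": 1, "name": "Comedy A", "embedding": [0.9, 0.1, 0.0]},
    {"id": 2, "name": "Comedy B", "embedding": [0.8, 0.2, 0.1]},
    {"id": 3, "name": "Horror", "embedding": [0.0, 0.1, 0.95]},
]
query = [0.88, 0.12, 0.02]

for row in nearest(items, query, k=2):
    print(row["id"], row["name"])
```

Scanning every row this way is exact but slows down as the table grows. pgvector's HNSW and IVFFlat indexes speed up the search in exchange for approximate results.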
Troubleshooting
Error: Port 5432 already in use
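Another PostgreSQL server is probably using that port on your machine. Change the host side of the mapping in `docker-compose.yml`, for example to `"5433:5432"`, and connect to port 5433 from your machine. Inside the container, PostgreSQL still listens on 5432.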
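Before you edit the mapping, you can check which host ports are free. The following Python sketch uses a hypothetical `port_is_free` helper that tries to bind each candidate port on all interfaces. Treat it as a rough check, because binding rules differ slightly across operating systems. On macOS and Linux, `lsof -i :5432` shows which process holds the port.

```python
import socket

def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Rough check: try to bind host:port; failure means something else already holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True

# Candidate host ports for the "HOST:5432" mapping
for candidate in (5432, 5433, 5434):
    print(candidate, "free" if port_is_free(candidate) else "in use")
```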
Can't connect to database
Check whether the container is running (look for `pgvector-db` in the list):
```bash
docker ps
```
Database not initializing properly
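Scripts in `/docker-entrypoint-initdb.d` run only when the data directory is empty, so changes to `schema.sql` are ignored once the `pgdata` volume exists. Remove the volume (this deletes all stored data) and restart:
```bash
docker compose down -v  # Stop the container and delete the named volume
docker compose up -d    # Start fresh; schema.sql runs again
```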
No idea what's wrong
Check the container logs:
```bash
docker compose logs db
```
Next Steps
Now that your vector database is set up, you can:
- Generate embeddings using AI services like OpenAI
- Store your data with its embeddings
- Build search features that find similar items
Spot any mistakes or have a better way? Please leave a comment below! 🙌