yukaty

Posted on Nov 5, 2024 • Edited on Nov 12, 2024

Part 1: Setup with PostgreSQL and pgvector

#postgres #docker #ai #tutorial

Ever wondered how Netflix suggests movies you might like, or how Spotify creates personalized playlists? These AI-powered features often use vector similarity search under the hood. In this series, we'll build our own AI search engine using PostgreSQL with pgvector!

Let's get started...🐢

Project Overview
What is Vector Search?
Step-by-Step Setup
Troubleshooting Tips
Quick Preview
What's Next?

Project Overview ✨

We'll build a search engine that finds similar content based on meaning, not just matching keywords. Similar technology powers features such as:

GitHub Copilot's code suggestions
Spotify's song recommendations
Netflix's movie recommendations

Various tools and services offer similar functionality, but we'll use pgvector to run vector similarity search directly inside PostgreSQL.

In Part 1, we'll set up the database infrastructure. In Part 2, we'll implement the search functionality using OpenAI's embeddings.

What is Vector Search? 🔎

When an AI model processes content (text, code, or images), it can turn it into a list of numbers called an embedding. Think of it as a smart summary that captures the content's meaning: similar content gets similar numbers, so related items end up close to each other and are easy to find.
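To make "close to each other" concrete, here is a tiny sketch with made-up 3-number embeddings (real models return far longer vectors, and these values are invented for illustration, not produced by any model). Python is used here only as an assumption for illustration:

```python
import math

# Made-up 3-number "embeddings" for illustration only; real models
# return much longer vectors (our table stores 1536 numbers per item).
toy_embeddings = {
    "cat": [0.90, 0.80, 0.10],
    "kitten": [0.85, 0.75, 0.20],
    "car": [0.10, 0.20, 0.95],
}

cat = toy_embeddings["cat"]
for word in ("kitten", "car"):
    # math.dist computes the straight-line (Euclidean) distance.
    distance = math.dist(cat, toy_embeddings[word])
    print(f"cat -> {word}: {distance:.3f}")
```

With these toy numbers, "kitten" lands much closer to "cat" than "car" does, which is exactly the property vector search relies on.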

If you're not familiar with machine learning, don't worry! You can get embeddings from AI APIs such as OpenAI's without deep AI knowledge.

pgvector helps us efficiently store and search these embeddings as vectors in PostgreSQL.
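pgvector accepts vectors written as text in square brackets, such as '[1,2,3]'. As a minimal sketch (the helper name `to_pgvector_literal` is hypothetical, and Python is an assumed client language), converting a list of numbers into that format could look like this:

```python
def to_pgvector_literal(values):
    """Format numbers as pgvector's text input format, e.g. '[1.0,2.5]'."""
    if len(values) == 0:
        raise ValueError("a vector needs at least one dimension")
    # repr(float) gives the shortest string that round-trips the value.
    return "[" + ",".join(repr(float(v)) for v in values) + "]"

literal = to_pgvector_literal([1, 2.5, -0.25])
print(literal)  # prints [1.0,2.5,-0.25]
```

The resulting string can be passed as a query parameter and cast with `::vector`. pgvector stores each element as a 4-byte float, so values read back may differ slightly in the last digits. The pgvector project also publishes client adapters (for example the `pgvector` package for Python) that can handle this conversion for you.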

Step-by-Step Setup 👣

Make sure you have Docker Desktop installed on your computer.

Project Structure

vector-search/
├── compose.yml
└── postgres/
    └── schema.sql

1. Create `compose.yml`

services:
  db:
    image: pgvector/pgvector:pg17 # PostgreSQL with pgvector support
    container_name: pgvector-db
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: password
      POSTGRES_DB: example_db
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
      - ./postgres/schema.sql:/docker-entrypoint-initdb.d/schema.sql

volumes:
  pgdata: # Stores data outside the container to ensure persistence
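With this file, the database is reachable from your host machine on `localhost:5432` using the credentials above. As a sketch, assuming a client running on the host (the function name `build_dsn` is hypothetical), a standard PostgreSQL connection URI can be assembled from those values:

```python
from urllib.parse import quote

# Values copied from compose.yml; update them if you edit the file.
DB_USER = "postgres"
DB_PASSWORD = "password"
DB_NAME = "example_db"
DB_HOST = "localhost"  # the container port is published on the host
DB_PORT = 5432         # use 5433 (or your choice) if you remapped the port

def build_dsn(user, password, host, port, dbname):
    """Build a postgresql:// URI, percent-encoding credentials and db name."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(dbname, safe='')}"
    )

print(build_dsn(DB_USER, DB_PASSWORD, DB_HOST, DB_PORT, DB_NAME))
```

Percent-encoding matters once you replace `password` with something containing characters like `@` or `/`, which would otherwise break the URI.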

2. Define Database Schema

Create postgres/schema.sql:

-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create sample table
CREATE TABLE items (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    item_data JSONB,
    embedding vector(1536) -- 1536 dimensions, matching OpenAI's text-embedding-3-small output
);
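The `1536` in `vector(1536)` is fixed: pgvector rejects inserts whose vector has a different number of dimensions. A small client-side check, sketched here in Python under the assumption that your application builds embeddings as lists (the names `EMBEDDING_DIM` and `check_embedding` are hypothetical), catches a mismatch, such as switching to a model with a different output size, before it reaches the database:

```python
import math

EMBEDDING_DIM = 1536  # must match embedding vector(1536) in schema.sql

def check_embedding(values, dim=EMBEDDING_DIM):
    """Return `values` as a list after checking its length and finiteness."""
    values = list(values)
    if len(values) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(values)}")
    # pgvector does not accept NaN or infinite elements.
    if not all(math.isfinite(v) for v in values):
        raise ValueError("embedding contains NaN or infinity")
    return values

print(len(check_embedding([0.0] * EMBEDDING_DIM)))  # prints 1536
```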

3. Start the Database

Run Docker Compose to pull the pgvector image (on the first run) and start the PostgreSQL container.

docker compose up

4. Verify the Setup

Connect to PostgreSQL:

docker exec -it pgvector-db psql -U postgres -d example_db

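Check that everything is set up correctly. `\dx` should list the `vector` extension, `\dt` should show the `items` table, and `\d items` should show the `embedding` column with type `vector(1536)`: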

-- Check installed extensions
\dx

-- Check table creation
\dt

-- Check table structure
\d items

Troubleshooting Tips 🛠️

Error: Port 5432 already in use

Change the host port (the left-hand number) in compose.yml to 5433 or another free port. The container itself keeps listening on 5432.

  ports:
    - "5433:5432"
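If you're not sure which port is free, a rough check is to try binding to it. This Python sketch (the helpers `is_port_free` and `first_free_port` are hypothetical, and the 5432 to 5440 range is an arbitrary assumption) reports the first port it can bind on 127.0.0.1. Results can vary by OS and by which interface another process bound, and a port that is free now may be taken a moment later:

```python
import socket

def is_port_free(port, host="127.0.0.1"):
    """Return True if we can bind to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": another process has it.
            return False
        return True

def first_free_port(start=5432, end=5440, host="127.0.0.1"):
    """Return the first port in [start, end] that can be bound, or None."""
    for port in range(start, end + 1):
        if is_port_free(port, host):
            return port
    return None

print(first_free_port())
```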

Database not initializing properly

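Initialization scripts in /docker-entrypoint-initdb.d (such as schema.sql) only run when the data directory is empty, so an existing pgdata volume causes them to be skipped. Remove the volume and restart:

  docker compose down -v    # Remove existing volume (deletes all stored data)
  docker compose up         # Start fresh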

Still not sure what's wrong?

Check the container logs.

  docker compose logs db

Quick Preview 👀

Here's a quick preview of how we'll query similar items in Part 2:

-- Find items similar to a specific vector
SELECT id, name, item_data
FROM items
ORDER BY embedding <-> '[0.1, 0.2, ...]'::vector
LIMIT 5;

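Replace [0.1, 0.2, ...] with an actual vector from an embedding model; it must have 1536 dimensions to match the column. The <-> operator orders results by Euclidean (L2) distance, so the closest items come first. pgvector also offers <=> for cosine distance and <#> for negative inner product.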
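To build intuition before Part 2, here is a brute-force, in-memory sketch of what the query asks PostgreSQL to do: compute the distance from the query vector to every row, sort, and keep the top k. Without an index, pgvector answers the query in a broadly similar exact way. The rows, their tiny 2-dimensional embeddings, and the function names are hypothetical, and Python is an assumed language:

```python
import math

def l2_distance(a, b):
    """Euclidean distance; pgvector's <-> operator computes this."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity; pgvector's <=> operator computes this."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)

def nearest(items, query, k=5, distance=l2_distance):
    """Brute-force version of: ORDER BY embedding <-> query LIMIT k."""
    return sorted(items, key=lambda item: distance(item["embedding"], query))[:k]

# Hypothetical rows with tiny 2-dimensional embeddings.
items = [
    {"id": 1, "name": "first", "embedding": [1.0, 0.0]},
    {"id": 2, "name": "second", "embedding": [0.0, 1.0]},
    {"id": 3, "name": "third", "embedding": [0.9, 0.1]},
]
query = [1.0, 0.05]

print([item["id"] for item in nearest(items, query, k=2)])  # prints [1, 3]
```

Passing `distance=cosine_distance` mirrors switching the SQL from <-> to <=>. Cosine distance ignores vector length and compares only direction.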

What's Next? 💭

We'll dive into the following topics:

What embeddings are and how they work
Generating embeddings with OpenAI
How vector search works in practice

Stay tuned! 🚀

Spot any mistakes or have a better way? Please leave a comment below! 💬

DEV Community


Read next

Building Your First AI CLI Tool Using OpenAI’s API

AI Meets Supply Chains: Strategic Deployment and Supplier Innovation by Shubham R. Ekatpure

Test your Docker knowledge

A Media Server on Steroids - Walkthrough
