DEV Community

mehmet akar
mehmet akar

Posted on

Postgresql Vector Database: Pgvector Tutorial

"Postgresql Vector Database: Pgvector" is an open-source extension for PostgreSQL that enables vector similarity searches, supporting exact and approximate nearest neighbor search. With pgvector, you can store vector embeddings alongside your other relational data, ensuring seamless integration into your existing database workflows.


Postgresql Vector Database: Pgvector Key Features

  • Similarity Metrics: Supports L2 distance, inner product, cosine similarity, L1 distance, Hamming distance, and Jaccard distance.
  • Vector Types: Single-precision, half-precision, binary, and sparse vectors.
  • Indexes: Includes HNSW and IVFFlat for approximate nearest neighbor search.
  • PostgreSQL Features: Leverages PostgreSQL’s ACID compliance, joins, and point-in-time recovery.

Postgresql Vector Database: Pgvector Installation

On Linux and macOS

cd /tmp
git clone --branch v0.8.0 https://github.com/pgvector/pgvector.git
cd pgvector
make
sudo make install
Enter fullscreen mode Exit fullscreen mode

Alternative Methods

  • Docker: docker pull pgvector/pgvector:pg17
  • Homebrew: brew install pgvector
  • PGXN: pgxn install vector
  • APT (Ubuntu/Debian): Follow PostgreSQL APT repository instructions, then sudo apt install postgresql-17-pgvector.

On Windows

  1. Install Visual Studio C++.
  2. Build with nmake:
set "PGROOT=C:\Program Files\PostgreSQL\16"
cd %TEMP%
git clone --branch v0.8.0 https://github.com/pgvector/pgvector.git
cd pgvector
nmake /F Makefile.win
nmake /F Makefile.win install
Enter fullscreen mode Exit fullscreen mode

Getting Started: Postgresql Vector Database: Pgvector

Enabling the Extension

Run the following in your PostgreSQL database:

CREATE EXTENSION vector;
Enter fullscreen mode Exit fullscreen mode

Creating Tables with Vector Columns

  1. Create a Table:
CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
Enter fullscreen mode Exit fullscreen mode
  1. Add a Vector Column to an Existing Table:
ALTER TABLE items ADD COLUMN embedding vector(3);
Enter fullscreen mode Exit fullscreen mode
  1. Insert Vector Data:
INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');
Enter fullscreen mode Exit fullscreen mode
  1. Query Nearest Neighbors:
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
Enter fullscreen mode Exit fullscreen mode

Querying Vectors

Supported Distance Metrics

  • L2 Distance: <->
  • Inner Product: <#>
  • Cosine Distance: <=>
  • L1 Distance: <+>
  • Hamming Distance (binary vectors): <~>
  • Jaccard Distance (binary vectors): <%>

Examples

  1. Find Nearest Neighbors:
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
Enter fullscreen mode Exit fullscreen mode
  1. Filter by Distance:
SELECT * FROM items WHERE embedding <-> '[3,1,2]' < 5;
Enter fullscreen mode Exit fullscreen mode
  1. Distance Aggregates:
SELECT AVG(embedding) FROM items;
Enter fullscreen mode Exit fullscreen mode

Indexing for Performance

pgvector supports two types of indexes for approximate nearest neighbor searches: HNSW and IVFFlat.

HNSW Index

  1. Create an Index:
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);
Enter fullscreen mode Exit fullscreen mode
  1. Configure Query Options:
SET hnsw.ef_search = 100;
Enter fullscreen mode Exit fullscreen mode

IVFFlat Index

  1. Create an Index:
CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);
Enter fullscreen mode Exit fullscreen mode
  1. Adjust Query Options:
SET ivfflat.probes = 10;
Enter fullscreen mode Exit fullscreen mode

Advanced Features

Half-Precision and Sparse Vectors

  • Half-Precision:
CREATE TABLE items (embedding halfvec(3));
Enter fullscreen mode Exit fullscreen mode
  • Sparse Vectors:
CREATE TABLE items (embedding sparsevec(5));
INSERT INTO items (embedding) VALUES ('{1:1,3:2,5:3}/5');
Enter fullscreen mode Exit fullscreen mode

Hybrid Search

Combine pgvector with PostgreSQL’s full-text search for hybrid queries:

SELECT id, content FROM items, plainto_tsquery('search term') query
WHERE textsearch @@ query
ORDER BY ts_rank_cd(textsearch, query) DESC LIMIT 5;
Enter fullscreen mode Exit fullscreen mode

Performance Optimization

Bulk Loading

Use COPY for efficient bulk loading:

COPY items (embedding) FROM STDIN WITH (FORMAT BINARY);
Enter fullscreen mode Exit fullscreen mode

Tuning Parameters

  • Increase memory for index builds:
SET maintenance_work_mem = '8GB';
Enter fullscreen mode Exit fullscreen mode
  • Adjust parallel workers for faster indexing:
SET max_parallel_maintenance_workers = 7;
Enter fullscreen mode Exit fullscreen mode

Query Analysis

Use EXPLAIN ANALYZE to debug query performance:

EXPLAIN ANALYZE SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
Enter fullscreen mode Exit fullscreen mode

Scaling and Monitoring

Scaling Options

  • Vertical Scaling: Increase server resources.
  • Horizontal Scaling: Use replicas or sharding with tools like Citus.

Monitoring

Monitor queries using pg_stat_statements:

CREATE EXTENSION pg_stat_statements;
SELECT query, calls, avg_time_ms FROM pg_stat_statements ORDER BY avg_time_ms DESC LIMIT 5;
Enter fullscreen mode Exit fullscreen mode

Postgresql Vector Database: Pgvector empowers PostgreSQL to handle vector similarity searches efficiently, integrating seamlessly into existing data infrastructures. With support for various vector types, advanced indexing, and PostgreSQL’s robust feature set, pgvector is a powerful choice for embedding-based applications.

Top comments (0)