Sk

Posted on • Originally published at lubba-dub-dub-js.hashnode.dev

How to Build a Vector Database with SQLite in Node.js for LLMs

SQLite is everywhere. From massive systems to tiny embedded devices—even your web browser and Android phone rely on it. It's lightweight, fast, and battle-tested.

Now, think about where AI is headed. Just as computers shrank from room-sized machines to smartphones—thanks to Moore’s Law—Large Language Models (LLMs) are following a similar trajectory. With breakthroughs in research, clever optimizations, and techniques like quantization (which shrinks models by reducing bit precision), these models are becoming smaller and more efficient.
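To make quantization concrete, here's a toy sketch (symmetric int8 quantization; real quantizers are more sophisticated, with per-block scales and calibration):

```javascript
// Toy symmetric int8 quantization: map 32-bit floats onto 256 levels.
function quantize(weights) {
    const scale = Math.max(...weights.map(Math.abs)) / 127;
    const q = Int8Array.from(weights, (w) => Math.round(w / scale));
    return { q, scale };
}

// Recover approximate float values from the int8 representation.
function dequantize({ q, scale }) {
    return Float32Array.from(q, (v) => v * scale);
}

const original = [0.12, -0.53, 0.9, -0.04];
const restored = dequantize(quantize(original));
// restored is close to original, at a quarter of the storage per value.
```

Each value now fits in one byte instead of four, which is exactly the kind of shrinkage that makes local models practical.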

When LLMs start running locally on smaller devices, they’ll need storage that’s just as fast and lightweight as they are. Enter SQLite—already a go-to for many systems.

But here’s a twist: what if we used SQLite not just as a storage engine, but also for vector computations? In AI, LLMs thrive on vectors and embeddings—numerical representations that capture the essence of text. While PostgreSQL has pgvector for vector search, we can explore how to turn SQLite into a powerful vector database.

In this article, we’ll build a vector database with SQLite in Node.js for Retrieval-Augmented Generation (RAG) applications.


RAG in Node.js

First, clone the repo that contains the data folder. This folder includes chunks from white papers—perfect for testing embeddings.

git clone https://github.com/SfundoMhlungu/nodejs-vectordb.git

Note: The code is messy—I was just experimenting at first and didn’t plan on writing about it. But don’t worry, we’ll recreate the core parts in this article.

I suggest creating a clean folder, copying the data folder into it, and initializing a fresh package.json:

vectorDB/
│── data/
│── embed.js
│── index.js

Seeding the Database

Make sure your Node.js version supports native modules; better-sqlite3 is a native addon that bundles its own copy of SQLite.

pnpm add better-sqlite3 ollama uuid

We’ll use Ollama with the mxbai-embed-large model to generate embeddings, so make sure Ollama is installed and the model pulled (ollama pull mxbai-embed-large).

Now, open embed.js:

import { readFileSync, readdirSync } from "fs";
import ollama from "ollama";
import Database from "better-sqlite3";
import { v4 as uuidv4 } from "uuid";

const db = new Database("embedblob.db");

db.exec(`
    CREATE TABLE IF NOT EXISTS embeddings (
        id TEXT PRIMARY KEY,
        sessid TEXT,
        name TEXT,
        content TEXT,
        embeddings BLOB
    );

    PRAGMA journal_mode = WAL;  -- Improves write performance
`);

Nothing fancy—just setting up the database and an embeddings table. The important columns here are sessid and embeddings.

  • sessid lets us simulate a chat system. Think of how ChatGPT creates a "new chat"—documents embedded in that session aren’t accessible in others.
  • This limits the search space, improving performance.

The data folder simulates this behavior.

Now, let’s read the data:

const sessions = readdirSync("./data");
let data = {};

// Example structure:
// {
//     sess1: ["1.txt", "2.txt", "3.txt"],
//     sess2: ["1.txt", "2.txt", "3.txt"]
// }

for (const session of sessions) {
    const files = readdirSync(`./data/${session}`);
    data[session] = files;
}

for (const [session, files] of Object.entries(data)) {
    for (const file of files) {
        const content = readFileSync(`./data/${session}/${file}`, "utf-8");
        await embed(content, { session, name: file }); // top-level await works in ES modules
    }
}

It’s just reading text files. Now, let’s generate embeddings:


Generating Embeddings

/**
 * Generates embeddings for the given text.
 * @param {string} content - The text content.
 * @param {Object} meta - Metadata (session, filename).
 */
async function embed(content, meta) {
    const res = await ollama.embed({
        model: "mxbai-embed-large",
        truncate: true,
        input: content,
    });

    meta.model = res.model;
    const embeddingArray = new Float32Array(res.embeddings.flat()); // Convert to binary

    saveToDb(embeddingArray, meta, content);
}

Our SQLite plugin expects embeddings as 32-bit binary data. In Node.js, we handle that with Float32Array, a typed array.
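One caveat worth flagging: depending on your better-sqlite3 version, you may need to hand the BLOB column a Buffer rather than a Float32Array. Both can share the same underlying bytes, so the round trip is cheap:

```javascript
// A Float32Array and a Buffer can view the same underlying memory.
const vec = new Float32Array([0.1, 0.2, 0.3]);

// Wrap the same bytes as a Buffer for the BLOB column (no copy).
const blob = Buffer.from(vec.buffer, vec.byteOffset, vec.byteLength);

// Reading back: reinterpret the BLOB's bytes as 32-bit floats.
const restored = new Float32Array(
    blob.buffer,
    blob.byteOffset,
    blob.byteLength / Float32Array.BYTES_PER_ELEMENT
);
```

Respecting byteOffset matters because Node.js Buffers are often views into a larger pooled allocation.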

Now, let’s save the embeddings to the database:


Saving to the Database

function saveToDb(embeddings, meta, content) {
    const insert = db.transaction(() => {
        const stmt = db.prepare(`
            INSERT INTO embeddings (id, sessid, name, content, embeddings)
            VALUES (?, ?, ?, ?, ?)
        `);

        const id = uuidv4();
        // Older better-sqlite3 versions accept only Buffers for BLOB columns;
        // if binding fails, pass Buffer.from(embeddings.buffer) instead.
        stmt.run(id, meta.session, meta.name, content, embeddings);
    });

    insert();
}

Running the Script

Execute:

node embed.js

Now we have embeddings in SQLite. Next, let’s perform a vector search!

Vector Search with Cosine Similarity

Querying a vector database is pretty straightforward. If you're new to vector search, check out this guide from DataStax on cosine similarity.

The basic process looks like this:

  1. Take a user query (chat message) and generate its embedding.
  2. Find the most similar documents in the database.
  3. Embed those documents into the query as context.

That's Retrieval-Augmented Generation (RAG) in a nutshell. Not too complicated—until you start adding tools, but that's a story for another day.
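Stripped of the embedding and database calls, step 3 boils down to prompt assembly. A minimal sketch (buildPrompt is an illustrative name, not part of any library):

```javascript
// Fold retrieved documents into the user query as context.
function buildPrompt(userQuery, retrievedDocs) {
    if (retrievedDocs.length === 0) return userQuery; // nothing relevant found
    const context = retrievedDocs.join("\n");
    return `context: ${context}\n\nuser query: ${userQuery}`;
}
```

The retrieval part, finding which documents to pass in, is where the vector search comes in.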

Setting Up the Code

Open index.js and start with:

import ollama from 'ollama'
import Database from 'better-sqlite3';

const db = new Database('embedblob.db');

async function EmbedUserQuery(query) {
    const res = await ollama.embed({
        model: "mxbai-embed-large",
        truncate: true,
        input: query,
    });

    return new Float32Array(res.embeddings.flat()); // Convert embedding to Float32 binary
}

Implementing Cosine Similarity

Now, let's define a function to compute cosine similarity between two vectors:

function cosineSimilarity(v1, v2) {
    v1 = new Float32Array(v1.buffer); // Convert blob back to Float32
    v2 = new Float32Array(v2.buffer);

    if (v1.length !== v2.length) {
        throw new Error("Vectors must be of the same length.");
    }

    let dot = 0, norm1Sq = 0, norm2Sq = 0;
    for (let i = 0; i < v1.length; i++) {
        dot += v1[i] * v2[i];
        norm1Sq += v1[i] * v1[i];
        norm2Sq += v2[i] * v2[i];
    }

    return dot / (Math.sqrt(norm1Sq) * Math.sqrt(norm2Sq)); // Returns a value between -1 and 1
}
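A quick sanity check of the formula (a standalone version over plain Float32Arrays, skipping the BLOB conversion): identical vectors score 1, orthogonal vectors 0, opposite vectors -1.

```javascript
// Cosine similarity over Float32Arrays, no BLOB handling.
function cosine(v1, v2) {
    let dot = 0, n1 = 0, n2 = 0;
    for (let i = 0; i < v1.length; i++) {
        dot += v1[i] * v2[i];
        n1 += v1[i] * v1[i];
        n2 += v2[i] * v2[i];
    }
    return dot / (Math.sqrt(n1) * Math.sqrt(n2));
}

const a = new Float32Array([1, 0]);
cosine(a, a);                          // identical  -> 1
cosine(a, new Float32Array([0, 1]));   // orthogonal -> 0
cosine(a, new Float32Array([-1, 0]));  // opposite   -> -1
```

That range is why a threshold like the 0.6 used below separates related documents from noise.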

Performing the Similarity Search

Now, we need a function to:

  • Receive a user query
  • Embed it
  • Find the most similar documents in the database
  • Return a formatted query

async function CheckSimilarity(query, sess) {
    const rows = db.prepare(`SELECT * FROM embeddings WHERE sessid = ?`).all(sess);

    let matches = ``;
    if (rows.length > 0) {
        const embedding = await EmbedUserQuery(query);

        for (const row of rows) {
            const e = new Float32Array(row.embeddings.buffer); // from SQLite BLOB to float32array
            const sim = cosineSimilarity(embedding, e); // Compute similarity

            console.log(`doc: ${row.name}, similarity: ${sim}, user query: ${query}`);

            if (sim > 0.6) { // If similarity is >60%, include in matches
                matches += row.content + "\n";
            }
        }
    }

    return matches ? `context: ${matches}\n\nuser query: ${query}` : query;
}

Chatting with the Model

Now, let’s use this to generate responses:

async function Chat(userQuery, sess) {
    const query = await CheckSimilarity(userQuery, sess);
    console.log("formatted query:", query);

    const message = { role: 'user', content: query };
    const response = await ollama.chat({ model: 'llama3.2', messages: [message], stream: true });

    for await (const part of response) {
        process.stdout.write(part.message.content);
    }
}

I’m using llama3.2, but you can swap it out for any model you prefer.

Testing It Out

At the bottom of the file, test the model:

Chat("hello model", "sess1");  // 0.39 similarity

// Chat("file system for large distributed data-intensive applications", "sess1"); // 0.81
// Chat("advertising, marketing, paid subscriptions", "sess2");  // 0.63

For example, this won’t match any documents:

Chat("hello model", "sess1");  // 0.39  

But this has a document matching at 80%:

Chat("file system for large distributed data-intensive applications", "sess1");

Since our seed data includes parts of the Google File System white paper, try playing around with different queries—comment/uncomment lines and experiment.

Performance Insights

If you’re paying close attention, you’ll notice that vector search is really fast. That’s because:

  • TypedArrays are highly optimized for holding binary data.
  • V8, the engine underneath Node.js, is written in C++ and compiles tight numeric loops over TypedArrays very efficiently.
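You can get a feel for this with a quick micro-benchmark (illustrative and standalone; absolute numbers depend on your machine):

```javascript
// Sum of squares over a plain Array vs a Float32Array of the same values.
const N = 1_000_000;
const plain = Array.from({ length: N }, () => Math.random());
const typed = Float32Array.from(plain);

function sumOfSquares(v) {
    let sum = 0;
    for (let i = 0; i < v.length; i++) sum += v[i] * v[i];
    return sum;
}

console.time("plain Array");
sumOfSquares(plain);
console.timeEnd("plain Array");

console.time("Float32Array");
sumOfSquares(typed);
console.timeEnd("Float32Array");
```

The typed version also uses a quarter of the memory, which matters when every row in the table carries a thousand-dimensional embedding.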

Still don’t believe me? I built a native SQLite plugin in C that does the same thing with multiple optimizations, including SIMD.

Native vs JavaScript Performance

JavaScript actually performs faster than the native plugin in some cases. SQLite itself is already written in C, and with TypedArrays keeping the math fast on the JavaScript side, the cost of crossing the JS/native boundary can eat up the native plugin's advantage.

However, native plugins are still useful, especially for languages like PHP, Go, C#, and Python that can consume shared libraries (DLLs).

That’s why I packaged this into vector-sqlite3, which includes the DLL for Windows users. Other platforms can compile the C plugin from the repo.

Using vector-sqlite3

First, install it:

npm i vector-sqlite3

Then, update your code to use the optimized vector functions:

import ollama from 'ollama';
import Database from 'better-sqlite3';
import sqlitevector from 'vector-sqlite3';

const db = new Database('C:/path/to/your/db/embedblob.db');

// Enable native vector functions
sqlitevector(db, { useNative: ["cosine_similarity", "l2_distance", "dot_product"] });

Or, if you want to stick with JavaScript:

sqlitevector(db, { useNative: [] }); // Disable native functions, e.g. for non-Windows users, since the package only ships a Windows build

async function EmbedUserQuery(query) {
    const res = await ollama.embed({
        model: "mxbai-embed-large",
        truncate: true,
        input: query,
    });

    return new Float32Array(res.embeddings.flat());
}

Now, let's perform vector search using SQLite:

const f = await EmbedUserQuery("file system for large distributed data-intensive applications");

const rows = db.prepare(`
    SELECT *, cosine_similarity_unrolled(embeddings, ?) AS similarity
    FROM embeddings WHERE sessid = ?
`).all(f, "sess1");

console.log(rows[0]);
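In practice you usually want the top-k matches rather than every row above a threshold. With the similarity already computed as a column, that's a sort on the JavaScript side (a standalone sketch; topK is an illustrative helper, not part of vector-sqlite3):

```javascript
// Keep the k best-scoring rows without mutating the input.
function topK(rows, k) {
    return [...rows].sort((a, b) => b.similarity - a.similarity).slice(0, k);
}

const scored = [
    { name: "1.txt", similarity: 0.42 },
    { name: "2.txt", similarity: 0.81 },
    { name: "3.txt", similarity: 0.63 },
];
topK(scored, 2); // -> "2.txt" then "3.txt"
```

Since the similarity is a SQL column, you could equally push this into the query with ORDER BY similarity DESC LIMIT 2.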

Optimized Cosine Similarity

This function inside vector-sqlite3 speeds things up by processing four elements at a time:

cosine_similarity_unrolled: (v1, v2) => {
    v1 = new Float32Array(v1.buffer);
    v2 = new Float32Array(v2.buffer);

    if (v1.length !== v2.length) {
        throw new Error("Vectors must be of the same length.");
    }

    let dot = 0, norm1Sq = 0, norm2Sq = 0;
    let i = 0, len = v1.length;

    // Process 4 elements at a time
    for (; i <= len - 4; i += 4) {
        dot += v1[i] * v2[i] + v1[i + 1] * v2[i + 1] +
               v1[i + 2] * v2[i + 2] + v1[i + 3] * v2[i + 3];

        norm1Sq += v1[i] * v1[i] + v1[i + 1] * v1[i + 1] +
                   v1[i + 2] * v1[i + 2] + v1[i + 3] * v1[i + 3];

        norm2Sq += v2[i] * v2[i] + v2[i + 1] * v2[i + 1] +
                   v2[i + 2] * v2[i + 2] + v2[i + 3] * v2[i + 3];
    }

    // Handle leftover elements when the length isn't a multiple of 4
    for (; i < len; i++) {
        dot += v1[i] * v2[i];
        norm1Sq += v1[i] * v1[i];
        norm2Sq += v2[i] * v2[i];
    }

    return dot / (Math.sqrt(norm1Sq) * Math.sqrt(norm2Sq));
}

Final Thoughts

You can check out the repo for more details, including the DLL.

Supported functions in JavaScript:

cosine_similarity, cosine_similarity_unrolled, l2_distance, dot_product

In the DLL, everything is already optimized, so no need for cosine_similarity_unrolled.

const rows2 = db.prepare(`
    SELECT *, l2_distance(embeddings, ?) AS distance,
    dot_product(embeddings, ?) AS dot_product
    FROM embeddings WHERE sessid = ?
`).all(f, f, "sess1");

console.log(rows2[0]);

Top comments (2)

Sk • Edited

The code for vector-sqlite3 is in the repo. If you found this useful, you can clone and read it, and please star it too; contributions would be nice, maybe build files for other platforms. The C code for building native files (an .so for Linux, and one for Darwin) is in the same repo.

Edem jack

That’s an interesting approach! SQLite’s lightweight nature makes it a great candidate for handling vector embeddings locally. Curious to see how performance compares to pgvector for smaller-scale AI applications.