Nebius AI Studio is an inference service offering state-of-the-art open-source large language models, including text, embedding, and vision models. This is the first article in a series demonstrating how to build a text-to-SQL Retrieval-Augmented Generation (RAG) system.
Embeddings are an essential part of machine learning and artificial intelligence systems. They convert high-dimensional data such as text, sound, and images into numerical vectors that capture the meaning and relationships of the data.
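To make "capture the meaning and relationships" concrete: embedding vectors are typically compared with cosine similarity, where vectors pointing in similar directions score near 1. Here is a toy illustration (the three-dimensional vectors are made-up values, not real embeddings, which have thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    # cosine of the angle between two vectors:
    # 1.0 = same direction, 0.0 = unrelated, -1.0 = opposite
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# toy "embeddings"
similar = cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.5, 1.0])
dissimilar = cosine_similarity([1.0, 0.0, 1.0], [-1.0, 1.0, 0.0])
print(round(similar, 3), round(dissimilar, 3))  # → 0.943 -0.5
```

A RAG system uses exactly this idea: the query is embedded, and the stored chunks whose vectors are most similar to it are retrieved.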
As a first step, we will use the Nebius AI Studio API to create embeddings for a database schema. We will use the Northwind Traders database, a sample database of a fictional company's transactions. This project uses PostgreSQL, and there is a GitHub repository with a script to create the database. The Data Definition Language (DDL) in the SQL script will populate the RAG database, and we can instantiate the database itself to test the SQL generated by the LLM.
Converting SQL to Text
The first step is to create the text that will populate the vector database. We could use the raw SQL directly, but providing context improves retrieval.
Here is the DDL for the `customers` table.
```sql
CREATE TABLE customers (
    customer_id character varying(5) NOT NULL,
    company_name character varying(40) NOT NULL,
    contact_name character varying(30),
    contact_title character varying(30),
    address character varying(60),
    city character varying(15),
    region character varying(15),
    postal_code character varying(10),
    country character varying(15),
    phone character varying(24),
    fax character varying(24)
);
```
We can rewrite the SQL in markdown, providing context about the table and the columns.
Table 1: DB.NORTHWIND.CUSTOMERS (Stores customer information)
This table contains customer information.
- customer_id character varying(5) [Primary Key, NOT NULL] - Unique identifier for customers
- company_name character varying(40) [NOT NULL] - Company where the contact works
- contact_name character varying(30) - Name of the customer
- contact_title character varying(30) - Title of the customer
- address character varying(60) - Physical address of the customer
- city character varying(15) - City where customer is located
- region character varying(15) - Region where customer is located
- postal_code character varying(10) - Customer's postal code
- country character varying(15) - Country where customer is located
- phone character varying(24) - Customer's phone number
- fax character varying(24) - Customer's fax number
There are 11 Northwind tables annotated and converted to markdown in this way.
Creating Embeddings
With the markdown data prepped, we can create vectors using the Nebius embedding models. You'll need an API key from Nebius: select API Keys in the menu bar and follow the instructions.
Set the API key as an environment variable in Linux/macOS:
```shell
export NEBIUS_API_KEY="xxxxxxxxxxxxxxxx...xxxxxxxxx"
```
Set the API key as an environment variable in a Windows PowerShell session:

```shell
$env:NEBIUS_API_KEY="xxxxxxxxxxxxxxxx...xxxxxxxxx"
```
Let's dive into the code.
Create the database
Assuming that PostgreSQL is running, this function creates a new database called `rag`. If needed, it adds the pgvector extension and creates a table called `items`.
If you use Postgres.app on macOS or a hosted provider, pgvector is pre-installed. See your database's installation instructions if needed.
```python
import psycopg
from pgvector.psycopg import register_vector

dimensions = 4096

def database_config():
    # create the rag database
    conn = psycopg.connect(dbname='postgres', user='postgres', autocommit=True)
    conn.execute('CREATE DATABASE rag')
    # enable the pgvector extension
    conn = psycopg.connect(dbname='rag', user='postgres', autocommit=True)
    conn.execute('CREATE EXTENSION IF NOT EXISTS vector')
    register_vector(conn)
    # create the items table
    conn.execute('DROP TABLE IF EXISTS items')
    conn.execute(f'CREATE TABLE items (id bigserial, chunk text, embedding vector({dimensions}))')
    return conn
```
Create vectors and database records
The embedding process takes the markdown documents and splits the text into smaller chunks. Each chunk is passed to the embedding service, which converts it into a vector. The vector and its text are then combined into a record.
```python
import os
from openai import OpenAI

API_KEY = os.environ.get('NEBIUS_API_KEY')

client = OpenAI(
    base_url="https://api.studio.nebius.ai/v1/",
    api_key=API_KEY,
)

def create_embeddings(files, docs_dir):
    data = []
    for filename in files:
        file_path = os.path.join(docs_dir, filename)
        with open(file_path, 'r', errors="replace") as file:
            document_text = file.read()
        chunks = recursive_chracter_splitter_chunking(document_text)
        for chunk in chunks:
            chunk = chunk.replace('\n', ' ')
            chunk = chunk.replace("\x00", "")  # strip NUL bytes PostgreSQL rejects
            embedding = create_vectors(chunk)
            data.append([chunk, embedding])
    return data

def create_vectors(text):
    # BAAI/bge-en-icl returns vectors at its full dimensionality (4096),
    # matching the vector(4096) column created earlier
    embedding = client.embeddings.create(
        model="BAAI/bge-en-icl",
        input=text,
        encoding_format="float",
    ).data[0].embedding
    return str(embedding)
```
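The chunking helper called above isn't shown in the article. Here is a minimal sketch of one possible implementation, using the same function name as the code above (the chunk size and separator list are assumptions; a production system might instead use LangChain's `RecursiveCharacterTextSplitter`, which also merges small pieces back up toward the chunk size, while this sketch only splits):

```python
def recursive_chracter_splitter_chunking(text, chunk_size=1000, separators=None):
    # Try coarse separators first (paragraphs), then finer ones
    # (lines, words), hard-splitting only as a last resort.
    if separators is None:
        separators = ["\n\n", "\n", " ", ""]
    if len(text) <= chunk_size:
        return [text]
    sep = separators[0]
    if sep == "":
        # no separators left: hard-split at chunk_size characters
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks = []
    for piece in text.split(sep):
        if len(piece) <= chunk_size:
            chunks.append(piece)
        else:
            chunks.extend(
                recursive_chracter_splitter_chunking(piece, chunk_size, separators[1:])
            )
    return [c for c in chunks if c.strip()]
```

Splitting on coarse boundaries first keeps semantically related sentences together, which tends to produce more useful embeddings than fixed-width slicing.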
Insert the embeddings
This function adds the embeddings to the `items` table.
```python
def upsert(embeddings, conn):
    # insert each chunk and its vector into the items table
    for record in embeddings:
        chunk = str(record[0])
        vector = record[1]
        conn.execute('INSERT INTO items (chunk, embedding) VALUES (%s, %s)', (chunk, vector))
```
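For reference, a driver tying the steps together might look like the sketch below. The `docs` directory layout and the function name `build_rag_index` are assumptions, and running it requires a live PostgreSQL instance and a Nebius API key:

```python
import os

def build_rag_index(docs_dir="docs"):
    # Hypothetical end-to-end flow; assumes database_config,
    # create_embeddings, and upsert (defined above) are in scope.
    files = [f for f in os.listdir(docs_dir) if f.endswith(".md")]
    conn = database_config()                          # create rag DB and items table
    embeddings = create_embeddings(files, docs_dir)   # chunk and embed each file
    upsert(embeddings, conn)                          # store chunks and vectors
    conn.close()
```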
Typically, we would create an index to speed up retrieval. However, the Nebius embedding service only returns vectors at the model's full dimensionality, e.g., 4096, and pgvector's HNSW index supports at most 2,000 dimensions unless the vectors are binary quantized. While you can create a binary-quantized index, recall can suffer. I've included sample code to create the index, but given the small size of this data, an index isn't necessary.
```python
# create index
# database_conn.execute('CREATE INDEX ON items USING hnsw ((binary_quantize(embedding)::bit(4096)) bit_hamming_ops);')
```
Next steps
This article demonstrates how to create embeddings from the Northwind database DDL and insert them into PostgreSQL with the pgvector extension. The following article shows how to build the client that generates SQL from a prompt.