DEV Community

Cover image for AI Systematic Literature Review with KawanPaper
Fahmi Noor Fiqri
Fahmi Noor Fiqri Subscriber

Posted on

AI Systematic Literature Review with KawanPaper

This is a submission for the Open Source AI Challenge with pgai and Ollama

What I Built

This is a conversational RAG app where all the RAG pipelines are entirely built in PostgreSQL procedures using PL/pgSQL!

The idea behind this app stems from my master's thesis work. I have to do systematic literature review and doing it manually is boring. So, I created this small app so I can just upload the full text paper and chat with it, create summaries, highlights, and key results. Massively streamlining the process of systematic literature review.

Of course, this app would work with any kind of data, we just need to change the system prompt a bit!šŸ˜

Key Features:

  • Summarize research papers (journal articles, conference papers, etc.)
  • Create highlights/key insights
  • Automatic processing using pgai Vectorizer
  • Chat with independent paper
  • Save multiple chat sessions

Initially I want to fully use Ollama, but pgai Vectorizer currently do not support Ollama, so I opted to use Open AI.

Demo

Demo video here

KawanPaper

KawanPaper is your go-to app for chatting mainly with research papers (journal articles, conference papers, etc.)

Features:

  • PDF upload and automatic parsing
  • Generate key insights from research papers
  • Chat with a specific paper

Setup

Make sure you have an up to date Docker instalation and then clone this repo. We will divide the installation process into 3 parts, minio setup, database migration, and launching the app.

Configuration

  • Main configuration: copy the .env.example file to .env
  • Docker compose configuration: copy the docker.env.example to docker.env

These config have a predefined values to make it easier to deploy. Note there are some env vars that we need to define:

.env

  • VITE_MINIO_ACCESS_KEY
  • VITE_MINIO_SECRET_KEY

docker.env

  • OPENAI_API_KEY

You can add your Open AI key in the docker.env and for the minio credentials, we will create one in the next step.

Minio Setup

This is a new thing for me, back in the day we canā€¦

Tools Used

  • TimescaleDB as the main database to store the documents and its embeddings
  • pgai to access Open AI services in database
  • pgvector to store document embeddings
  • pgvectorscale to create indexes on the embeddings
  • pgai Vectorizer to automatically create embeddings from the uploaded papers

Prize Categories

Vectorizer Vibe, All the Extensions

Tech Stack

  • PostgreSQL (TimescaleDB)
  • Minio
  • Remix

So little tech stack for a RAG appšŸ˜Š We can make it smaller by storing blobs in Postgres but I don't like that idea.

Conversational RAG in PL/pgSQL

In this SQL script I implemented two Postgres function to build the conversational RAG pipeline. This is the heart and soul of this app.

I got the idea from this LangChain tutorial.

CREATE FUNCTION contextualize_question(p_session_id VARCHAR(36), p_query TEXT) RETURNS TEXT

CREATE PROCEDURE chat_with_paper(p_session_id VARCHAR(36), p_chat_content TEXT)
Enter fullscreen mode Exit fullscreen mode

I never thought I would be writing LLM chain/pipeline using SQL instead of Haystack, LangChain, or LlamaIndex, but here we are!

It's crazy what pgai could bring in the future for LLM in databases.

Final Thoughts

This has been an interesting journey because the idea of running LLM directly in database is really weird at first. But after learning it for the last 2 days, I found it really interesting and could possibly revolutionize data mining pipelines for non-AI engineers. I imagine data analysts and researchers could easily get insights from database systems without major changes to existing systems.

One of my favorite experiences in this project is I learned how to write Postgres procedures and functions using PL/pgSQL. It was a really interesting journey especially to write LLM apps that used to be written using LangChain, Haystack, or LammaIndex now I implemented it using pure PL/pgSQL to build a conversational RAG.

Top comments (0)