DEV Community

Rafael Milewski

Posted on Nov 10, 2024

DearBook: Create Magical Illustrated Children's Stories with AI

#devchallenge #pgaichallenge #database #ai

This is a submission for the Open Source AI Challenge with pgai and Ollama

What I Built

I built an AI-powered children's book generator that creates fully illustrated stories based on user input. The user provides a brief prompt outlining the desired storyline, such as:

Create a fun story about a bird who was afraid to fly.

In response, an entire illustrated book is generated!

Users can also explore and read stories created by others. All stories are public and anonymous.

Demo

DearBook

Create Magical Illustrated Children's Stories with AI

This is the main repository for my submission to The Open Source AI Challenge.

The project is organized into three folders:

Backend: Contains the API, Queue, Database, ComfyUI, and Ollama.
Frontend: The UI that communicates with the API.
Infrastructure: The Terraform and stack files used to deploy the application on a Docker Swarm cluster.

Each subfolder includes instructions for running the project locally. Setup is straightforward because everything is containerized: running docker compose up is all that’s needed.

Warning

You need a capable NVIDIA GPU to run this project.

For a more detailed overview, including screenshots, you can read the submission sent to the challenge here:

https://dev.to/milewski/dearbook-create-magical-illustrated-childrens-stories-with-ai-4mpe

Winner announcement:

https://dev.to/devteam/congrats-to-the-winners-of-the-open-source-ai-challenge-with-pgai-and-ollama-46b6

Tools Used

TimescaleDB: The self-hosted Docker version was used.

pgvector: It was used to store the embedding of each book's storyline, alongside other context that is useful for search. (source)

pgvectorscale: The StreamingDiskANN index type was used on the book's searchable embedding to speed up similarity search. (source)

pgai: The ai.ollama_embed() function was used to generate embeddings of the user's input in the search feature. (source)

Ollama:

It was used for creating embeddings, generating the story of the book, analyzing whether the user input content is safe for children, and generating an idea prompt for an image generation model.

Two models were mainly used:

llama3.1:8b: for text generation.
mxbai-embed-large: for embedding.

ComfyUI:

It was used to generate the logo, book cover and every page based on the prompt generated by Ollama.

The main base model utilized was juggernaut-xl, alongside a few LoRAs. They can all be found in the project workflow file.
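To make the roles of pgvector, pgvectorscale, and pgai more concrete, here is a minimal sketch of what the schema, index, and search query might look like, wrapped in Python strings. The table name books, its columns, the psycopg-style placeholders, and the build_search_query helper are assumptions for illustration only; the actual schema and queries live in the repository. The 1024 dimension matches the output size of mxbai-embed-large.

```python
from typing import Any, Dict, Tuple

# Hypothetical schema: table and column names are illustrative,
# not taken from the DearBook repository.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS books (
    id        BIGSERIAL PRIMARY KEY,
    title     TEXT NOT NULL,
    synopsis  TEXT NOT NULL,
    embedding VECTOR(1024)  -- mxbai-embed-large yields 1024-dimensional vectors
);
"""

# StreamingDiskANN index from pgvectorscale, using cosine distance.
CREATE_INDEX_SQL = """
CREATE INDEX IF NOT EXISTS books_embedding_idx
    ON books USING diskann (embedding vector_cosine_ops);
"""

# The CTE embeds the search text once with pgai; <=> is pgvector's
# cosine-distance operator, so the closest books come first.
SEARCH_SQL = """
WITH q AS (
    SELECT ai.ollama_embed('mxbai-embed-large', %(query)s) AS v
)
SELECT id, title
FROM books
ORDER BY embedding <=> (SELECT v FROM q)
LIMIT %(limit)s;
"""


def build_search_query(user_query: str, limit: int = 10) -> Tuple[str, Dict[str, Any]]:
    """Return the SQL text and its bound parameters (hypothetical helper)."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return SEARCH_SQL, {"query": user_query, "limit": limit}
```

The returned pair could be passed to a driver that accepts named placeholders, for example cursor.execute(sql, params) in psycopg. Binding the search text as a parameter, rather than interpolating it into the SQL, keeps user input out of the query string.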

Stack

Overview of other technologies used:

PHP / Laravel / FrankenPHP / WebSocket
Vue.js / TypeScript / Tailwind
Docker / Docker Swarm / Traefik / Redis
Terraform / Vultr GPU Cloud

Technical Description

The book generation process is as follows:

The user creates a prompt, which is optional. If left blank, Ollama generates a new story independently.
A preliminary check is performed to ensure the prompt does not contain any violent or otherwise inappropriate content for children. If such content is found, the generation process is aborted immediately. (prompt)
If the user provides input, the 10 records most similar to the user’s prompt are retrieved based on their embeddings. If no input is provided, the 10 stored embeddings closest to each other are retrieved instead. This is done to keep the LLM from generating duplicate content: in earlier tests, it often produced stories about Max and the paintbrush. With this control, if a story about a paintbrush is generated, it is at least not about Max.
The LLM is then instructed to create a story with at least 10 paragraphs, ensuring that it doesn’t closely resemble any of the top 10 stories already in the database. (prompt)
Once the main story is created, each paragraph on its own doesn’t provide enough context for an image generation model to maintain consistency across pages. To address this, a new prompt is given to the LLM to generate context-rich descriptions for each paragraph, including details about the story, main characters, gender, and other relevant information. (prompt)
These prompts are then sent to ComfyUI, which generates the necessary assets.
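The steps above can be sketched as a small orchestration function. This is a minimal illustration, not the project's actual implementation (the real backend is written in PHP/Laravel and runs on a queue). Every name here, including generate_book, Book, UnsafePromptError, and the injected callables standing in for Ollama, the database search, and ComfyUI, is hypothetical. Skipping the safety check when the prompt is blank is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


class UnsafePromptError(Exception):
    """Raised when the safety check rejects the user's prompt."""


@dataclass
class Book:
    paragraphs: List[str]
    image_prompts: List[str]
    images: List[str] = field(default_factory=list)


def generate_book(
    prompt: Optional[str],
    is_safe: Callable[[str], bool],                          # LLM safety check
    find_similar: Callable[[Optional[str], int], List[str]],  # embedding search
    write_story: Callable[[Optional[str], List[str]], List[str]],
    describe_scene: Callable[[List[str], str], str],          # paragraph -> image prompt
    render_image: Callable[[str], str],                       # image prompt -> asset
) -> Book:
    # Steps 1-2: the prompt is optional; a provided prompt must pass the check.
    if prompt and not is_safe(prompt):
        raise UnsafePromptError("prompt rejected by the safety check")

    # Step 3: fetch up to 10 existing stories the new one should not resemble.
    similar = find_similar(prompt, 10)

    # Step 4: ask the LLM for a story that avoids those near-duplicates.
    paragraphs = write_story(prompt, similar)

    # Step 5: give each paragraph the whole story as context so the
    # resulting image prompts stay consistent across pages.
    image_prompts = [describe_scene(paragraphs, p) for p in paragraphs]

    # Step 6: hand every image prompt to the image generator.
    images = [render_image(ip) for ip in image_prompts]
    return Book(paragraphs, image_prompts, images)
```

Passing the collaborators in as callables keeps the flow testable without a GPU: each one can be replaced with a stub while the ordering and the early abort on unsafe input are exercised.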

Final Thoughts

I certainly learned a lot through this experience. Until now, I hadn’t had much exposure to or understanding of vector embeddings, but now it has finally clicked.

Also, knowing that with pgai I can interact with LLMs directly in SQL gives me more ideas than I have time to execute. I had also never tried to configure a server with NVIDIA GPUs, and now I understand many of the challenges involved. My current setup is a Docker Swarm cluster with three nodes: one each for ComfyUI, Ollama, and the CPU-based apps. Getting NVIDIA GPUs to run on Swarm was troublesome, but it was a valuable learning experience.

I intend to keep this demo app running until the end of the challenge, and after that, anyone curious to see it can host it on their own computer. All the Docker files and instructions on how to run it are in the GitHub repository.

Prize Qualifications

I believe my submission qualifies for the following additional prize categories:

Open-source Models from Ollama: I used llama3.1:8b and mxbai-embed-large.
All the Extensions: I used pgvector, pgai and pgvectorscale.
~~Vectorizer Vibe: I could not use pgai Vectorizer because it was not implemented for Ollama. So I rolled my own queue solution.~~