Like many of you, I spend quite a bit of time on Hacker News. Like many of you, I've read a lot of comments over there dismissive of "GPT-wrappers" - basic UIs around an API call to an LLM provider that promised AI magic.
Now, I generally try to be encouraging to people who are building stuff. It's scary putting yourself out there, and actually building something will make you grow a ton more than just thinking about how you could if you wanted to. But I have to admit I was internally jumping on the bandwagon here: this felt sort of cheap - sure, the end user would get value out of it, but the actual engineering part added basically nothing.
This is the story of how my view evolved while building Translate a Book with AI.
A few years ago I wanted my parents to be able to read a book I really enjoyed. But the book was in English, and my parents don't speak English well enough to read an entire book in that language. So I did what any good son would do - I set out to translate the book into French.
I had an epub version and I started looking into what tools existed to translate books. I tried a few, but they were either geared at people more professional than me and pretty complex, or clumsy to use and underpowered. So I did what any good son who's also a dev would do - I set out to solve the general problem and build a tool to translate any book into any language, all to translate that one single book.
(And as you can already guess, I spent way more time working on the tool than actually translating the book 😅 I'll reframe this as a success and say that's because the tool saved me so much time during translation!)
I went full serverless over-engineering on it and built a complete interface to translate books line by line while having a nice display of what both versions looked like, thesaurus integration, optional machine translation of each line, proof-reading mode... It was really fun to build, I enjoyed the translation process and making the tool evolve to fit my new needs, and I ended up with a beautiful translated book that I printed and gifted my parents.
I was happy with the tool and wanted to open it for other people to use but you know how it is - building something for myself is fun, and then there's lots of tedious work involved in actually making it usable for other people. Quick hacks that solved my problem on my local machine only, cryptic error messages that only I knew what to do with, manual data migration to avoid data loss when tweaking my editing data model... I would have to clean all that up if normal people were to use it.
So, like a true indie dev, I built a nice thing and let it sit unused on my disk for years.
Fast forward to the end of 2024: AI and LLMs are exploding in capabilities and popularity, and I still sometimes want to share English books with friends and family. My tool was great, but even with it, it took me a few dozen hours to actually finish translating a single book, and I'd have to dive back into an old codebase to use it again. Could AI ✨ have become good enough to do a good job by itself?
I decided to scrap my original project and start from scratch. My 2.0 would be designed from the start to be used by other people. I'd use a tech stack as simple as possible for easy maintenance (I'm using Laravel Livewire, which is a delight to work with). And I'd try to bring the time to obtain a quality book down from dozens of hours of work, to uploading a file and waiting a few dozen minutes while going for a walk.
I would be one of those ChatGPT wrappers, HN commenters be damned. These are supposed to be easy, right?
I started building the thing, my new stack making me really productive. There's always this rush at the start of a new coding project where everything is a blank canvas, you're not yet constrained by code you've already written, and the future is just full of possibilities. I'd try out a few patterns I'd recently read about, I'd design the data model to be flexible, I'd learn this new tool everyone's been raving about - while being a good dev, staying pragmatic, and resisting over-engineering too much. Just a little bit...
I also knew that this first rush only lasts for a while, and I'd have to use it to bridge the gap between where I was and a state where my tool could actually do useful work. The pleasure of building needs to give way to the pleasure of actually using and benefiting from what I've built, rinse and repeat, otherwise motivation dies down. So I would focus on the shortest feature-path to having something actually useful.
First results were encouraging. I learned about file formats - epubs are just zipped HTML - and soon enough I got Claude 3.5 Sonnet to translate HTML, return working HTML in another language, and zip it back up. Yay!
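For the curious, cracking open an epub really is that simple - it's a zip archive with (X)HTML content files inside. A minimal sketch in Python (my actual stack is PHP/Laravel, and this helper name is made up for illustration):

```python
import zipfile

def list_epub_html(path):
    """List the (X)HTML content files inside an epub.

    An epub is just a zip archive; the chapters are HTML files in it.
    """
    with zipfile.ZipFile(path) as zf:
        return [name for name in zf.namelist()
                if name.endswith((".html", ".xhtml", ".htm"))]
```

Translate those HTML files, zip everything back up, and you have a translated book - in theory, at least.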
(I knew I wanted a quality translation, so from the start I used one of the best LLMs I knew of. It would cost a little more, but I love Claude, and the whole point of trying this project now was to use something that understands nuance and brings coherence to the book.)
But there were some issues too. Sometimes the result was great, and sometimes Claude just... stopped in the middle? Decided it wouldn't output HTML but markdown? Replied with unbalanced tags?
Because the LLM context window isn't infinite (and beyond the context window, LLMs have a limit on how many tokens they can output at once - at the time of writing, Claude 3.5 Sonnet has a context window of 200k tokens but can only output 8,192 tokens at once), and because books can be longer than that, I also needed to split the book into chunks.
No problem, I'll just split the text into big chunks, cutting on sentence endings, and feed those to the LLM. Done. Except that this means feeding it unbalanced HTML, and Claude really doesn't like outputting unbalanced HTML - sometimes it worked, sometimes it didn't.
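That naive chunking looks something like this - an illustrative Python sketch that counts characters where real code would count tokens, and that cheerfully ignores the HTML problem:

```python
import re

def chunk_text(text, max_chars=4000):
    """Split text into chunks of at most max_chars, cutting on sentence endings.

    Illustrative only: a real tool would budget by tokens, not characters.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = (current + " " + sentence).strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk starts and ends on a sentence boundary, so no sentence is ever split across two API calls - but an HTML tag opened in chunk 3 can still close in chunk 4.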
All of this points to a very frustrating nature of LLMs - sometimes they work as intended, sometimes they don't. If you want to know why, well, tough luck. Try slightly changing your magical incantation of a prompt, or just retry, and you might get lucky. Or not. Who knows.
I started building a list of edge cases and bad behaviors from Claude that I would include in every prompt to guide the result. I processed my HTML chunks to replace tags with placeholders so the markup wouldn't be unbalanced. I asked it to end all its replies with [JOB DONE - the whole text was translated] to help prevent early cutoffs - and to detect when they happened anyway. Did I mention LLM APIs are also often unstable, so you need solid retry logic for timeouts, rate limits when you shouldn't be rate-limited, and the like?
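My retry logic ended up looking roughly like this - a simplified Python sketch (the function name is made up, and real code would catch only specific timeout and rate-limit errors rather than everything):

```python
import random
import time

def call_with_retries(fn, max_attempts=5, base_delay=1.0):
    """Retry a flaky API call with exponential backoff and a bit of jitter.

    A sketch: production code would catch only timeout/rate-limit errors
    and log every attempt for later debugging.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, let the caller handle it
            # Wait 1s, 2s, 4s... plus jitter so parallel chunks don't retry in lockstep
            time.sleep(base_delay * (2 ** attempt) + random.random() * 0.1)
```

The jitter matters when you're translating dozens of chunks in parallel: without it, every chunk that hit the same rate limit retries at the same instant and hits it again.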
I started logging all calls to the API and their responses - at first just to a log file, but it quickly became apparent this would be insufficient: I needed to dive into the specifics of what happened on attempt #3 for chunk #24 of that one specific book that hit an edge case. So I built logging into my data model, plus an admin interface to drill into what was happening, which quickly ballooned (luckily there's a great admin panel builder in my stack - thanks Filament!)
All of this was really fun, but also slowly draining my "new project energy". Since I knew that would happen, I did focus on having something usable and public as soon as possible. I had a checkbox allowing people to translate for free in exchange for me sending them an email to ask for feedback afterwards - most people didn't reply, but a few started to.
A couple of authors were translating their books and were really happy with the result - it looked like this would save them both a ton of time, and they really liked the translation quality. They sent long grateful emails, with good feedback. Motivating!
One other person replied to my upbeat email with "The file doesn't work. What utter nonsense." - I realized that although the file was fine, I needed to set the mimetype of the download correctly to epub instead of relying only on the .epub extension in the filename (which worked on my machine, but not theirs). I replied to that person with the fix a few hours later, and they answered back with "Oh wow, this works. The file is really good. Thanks my friend!"
(We've written back and forth a few times after that, and they really enjoy my tool now. I guess the lesson is: if you reply to harsh comments with professionalism, don't take things personally, and stay kind, you might end up with a great relationship? Only when I have the emotional resources to, though!)
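For anyone hitting the same issue: the fix is to send the right Content-Type header with the download, because some clients trust that header more than the file extension. A sketch of the idea (my app is Laravel, but this is framework-agnostic; the helper and mapping below are illustrative, though application/epub+zip is the real epub media type):

```python
import os

# Illustrative mapping from file extension to download mimetype.
DOWNLOAD_MIMETYPES = {
    ".epub": "application/epub+zip",
    ".docx": "application/vnd.openxmlformats-officedocument"
             ".wordprocessingml.document",
}

def download_headers(filename):
    """Build response headers for a translated-file download (sketch)."""
    ext = os.path.splitext(filename)[1].lower()
    mimetype = DOWNLOAD_MIMETYPES.get(ext, "application/octet-stream")
    return {
        "Content-Type": mimetype,
        "Content-Disposition": f'attachment; filename="{filename}"',
    }
```

"Works on my machine" here meant my reader app happened to sniff the extension; theirs didn't.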
This stream of seeing people using the tool in the logs, and getting feedback from some of them, was really motivating and helped me get data to improve the tool, catch edge cases and see what people were trying to translate. The cycle was working.
It's been a little while now, and a ton of other details-that-actually-take-work-to-handle have come and gone. DOCX is "just XML so my HTML algo should just work" except the tag pattern is different and confuses the LLM in different ways, so I had to write a much more robust XML parsing algorithm. Sometimes having more tags in the output is a bug, sometimes it's because one word in the input language becomes two words in different places in the target language. Chunking a book makes things parallel and fast, but if chunks truly don't know about each other you have less consistency across the whole book. (I always have new ideas for that one - stay tuned!)
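To make the "more tags in the output" problem concrete, here's one cheap way to flag a mismatched chunk - an illustrative, regex-based Python sketch, not the robust parser mentioned above:

```python
import re
from collections import Counter

def tag_counts(markup):
    """Count tag names in an HTML/XML fragment (quick regex-based sketch)."""
    return Counter(re.findall(r"</?([A-Za-z][\w:-]*)", markup))

def suspicious_translation(source, translated):
    """Flag a translated chunk whose tag counts differ from the source's.

    Sometimes a mismatch is a real bug; sometimes the target language
    legitimately splits one word across two tags - so flag for a retry
    or a closer look, don't reject outright.
    """
    return tag_counts(source) != tag_counts(translated)
```

A matching count doesn't prove the translation is good, but a mismatch is a cheap trigger for a retry with a more pointed prompt.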
I switched to DeepSeek V3, which had just come out. It was way cheaper than Claude, which allowed me to have the AI start double-checking its work, retry suspicious results with dynamic prompts, and maintain some context between chunks. A month after I started using it, it became widely known in the US, NVIDIA's stock went down $600B, and DeepSeek started being attacked by hackers from all over the world and went offline very often. I had to try out different new providers and build more monitoring tools - their different token speeds (which vary per language) were breaking my assumptions, and I had to customize async behavior, chunk size, cost control, and retry logic.
I have to say I didn't have "nation state bad actors" in my list of things that could disrupt my "simple GPT-wrapper"!
So, what have I learned from all this?
I think it's the usual lesson: users don't care about the technology you're using, they care about solving their problem. AI advances have taken lots of really intelligent people and lots (and lots) of compute power to make things that were science fiction just a few years ago accessible to a broad community. That's awesome, and I definitely wouldn't be able to build my tool without that.
But it's also rough around the edges, finicky, and getting from 90% working to 100% working can take a lot of work. This is the true added value of the "wrapper".
It comes from the many hours spent trying to understand the specific problem space better, catch edge cases, work with trade-offs, and make the final experience as easy to use for the final user as possible. It comes from making a ton of micro-decisions for that user that they will never need to know about. The added value is offloading mental and technical work from that user to yourself and your tool.
So that's what my experience showed me. Working on that last 10% to make things work for the end user and save them the headache of AI-tweaking is actually pretty darn valuable.
If you think of a tool as a "GPT-wrapper", it's probably because it makes the user interact directly with the AI. If it's more polished and the AI is abstracted away, it's a "product powered by an AI engine".
But you'll agree that's a bit of a mouthful.
So, long live the Wrapper!
(And if you want to translate a book into (almost) any language, I'd recommend checking out Translate a Book with AI. I hear they're nice people, and always happy for feedback 🙂)