I have to start off saying I am not sure how much of this essay—yeah, a long text is incoming—is an actual review of the book and how much of it is me giving my opinion about AI. I have made it a goal to read and review several AI books this year. Mind you, I'm not an AI engineer or specialist, and this is but the first book on my list. One could argue that I am in no place to opine about the book or AI itself. I guess that's a fair statement, but I have had my fair share of interactions with AI, and I'd like to think I have a couple of well-founded ideas.
The first thing to keep in mind is that AI Engineering is not a machine learning book, so don't expect to dive into it and come out with a fundamental understanding of ML if you don't have one already. Sure, you'll pick up a few concepts here and there and get a better grasp of how things work, but plenty of what you read will leave you wondering what it really means and how it's applied. That's not a bad thing: with AI and foundation models becoming part of more and more products, every software engineer may need to understand AI in general without having deep knowledge of machine learning. ML engineering and AI engineering are, at this point, related but different things.
The book does a fantastic job of presenting the concepts behind how AI and foundation models work. You'll understand the rise of foundation models, sampling, model evaluation, model security, mitigating hallucinations, prompt engineering, and dataset engineering. Sampling and prompt engineering are especially helpful, as I believe they form the base most people will use to adapt foundation models to their applications—yes, I agree "prompt engineer" is not a job position, but it is a valuable skill to have; it's just not enough on its own to build production-ready apps. Techniques such as RAG and fine-tuning are also well explained, but there is not enough detail for you to apply them on your own.
When it comes to machine learning concepts, though, the book can get a bit tedious. I have mixed feelings about it. Some parts present fascinating theoretical concepts, while others lack real depth, and the book doesn't follow a path that builds genuine understanding. As a non-ML engineer, I found the ML-related content challenging to read.
The author is clear that quality assurance is a must for AI applications; without it, the risks far outweigh the benefits. She then presents a handful of techniques for evaluating models. One of them is AI as a judge, in which one model judges the output of another. The judge can be either an LLM or a smaller, more specialized model. This is certainly a fair point; judging is easier than generating. I mean, how many times have you seen people judging the work of others without being able to do the job themselves? Actually, the author mentions we should get creative to work around some of AI's shortcomings; I can't shake the feeling that every time she says that, the suggestion of using AI to solve AI issues is coming right next. I'm obviously not implying this is a bad idea in itself. Still, I can't help but think it could easily turn into a snowball effect.
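To make the idea concrete, here is a minimal sketch of what an AI-as-judge evaluation loop could look like. This is my own illustration, not code from the book: the model names, the judge prompt wording, and the `call_llm` helper are all placeholders for whatever LLM client and prompts you actually use.

```python
# Minimal AI-as-a-judge sketch: one model generates, another model scores the output.

def call_llm(model: str, prompt: str) -> str:
    """Stand-in for a real LLM API call (OpenAI, Anthropic, a local model, ...)."""
    raise NotImplementedError

JUDGE_PROMPT = """You are grading an answer to a user question.
Question: {question}
Answer: {answer}
Score the answer from 1 (useless) to 5 (excellent) and reply with the number only."""

def judge(question: str, answer: str, judge_model: str = "small-judge-model") -> int:
    # The judge can be a cheaper, more specialized model than the generator.
    raw = call_llm(judge_model, JUDGE_PROMPT.format(question=question, answer=answer))
    return int(raw.strip())

def evaluate(questions: list[str], gen_model: str = "big-gen-model") -> float:
    scores = []
    for q in questions:
        answer = call_llm(gen_model, q)
        scores.append(judge(q, answer))
    return sum(scores) / len(scores)  # average judge score across the sample
```

The appeal of the approach is exactly what the author points out: grading an answer is usually an easier task than producing it, so a smaller judge can keep a bigger generator honest.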
The book also mentions AI limitations and issues, to name a few:
1) Context is king, and this has never been truer. Detail and clarity matter enormously when providing context to the LLM. It's not that easy, though: the longer the context, the more likely the LLM is to focus on the wrong part. Also, the LLM is better at following instructions placed at the beginning or end of the prompt rather than in the middle—position bias. One prompt technique is to repeat the original instruction after the user's prompt (see the sketch after this list). And I thought only children were prone to remember and choose the last item they heard!
2) Have you ever felt that the more you study, the less you know? While that can be the result of being overwhelmed by the amount of available content, we also simply forget stuff. Fear not, you are not alone—our LLM buddies are right there with us. The more tasks an LLM learns, the more prone it is to catastrophic forgetting: its performance drops on earlier tasks. This might explain why I'm not so good at solving quadratic equations anymore. Maybe I'm an LLM.
3) Ok, maybe I'm not an LLM. People are already worried that we may soon run out of content to train LLMs on. Yes, LLMs are consuming publicly available information faster than it's being produced. Synthetic data—data produced by AI itself—is increasingly being used to train LLMs, and it's helpful if used cautiously. Synthetic data only mimics real data, though, so the performance gained from it might be superficial. A model trained on synthetic data from another model might know how to give a direct answer to a query but not know how to explain why or how. Don't expect it to admit that lack of knowledge easily, either; if you ask it for an explanation, it's likely to hallucinate one. This reminds me of The Black Swan, which describes how, when you ask one brain hemisphere of a split-brain patient to perform an action and then ask the other hemisphere for an explanation, the patient usually comes up with a senseless reason for the action. Anyhow, there are studies correlating the use of synthetic data with model underperformance. I bet code coming out of models trained that way would make for such a great codebase!
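Back to item 1: here is a rough sketch of the "repeat the instruction at the end" trick. The wording of the instruction and the `build_prompt` helper are purely my own illustration of the idea, not an example taken from the book.

```python
# Sketch of repeating the instruction after the context to counter position bias:
# models tend to pay more attention to the start and the end of a long prompt.

def build_prompt(instruction: str, context: str, question: str) -> str:
    return (
        f"{instruction}\n\n"          # instruction up front...
        f"Context:\n{context}\n\n"
        f"Question: {question}\n\n"
        f"Reminder: {instruction}"    # ...and repeated at the end, where it's less likely to be ignored
    )

prompt = build_prompt(
    instruction="Answer using only the context. If the answer is not there, say so.",
    context="<long retrieved documents go here>",
    question="Which tag should I use for this API call?",
)
```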
It's important to understand what AI brings to the table, along with its implications and limitations, and the book does a sound job of explaining that. These limitations highlight the challenges we face with AI today, which brings us to the broader discussion around the main issues of AI in software development. As many know, AI is probabilistic in nature. Have you ever seen that "your chances are low, but never zero" meme? That is the main AI motto! Anything with a non-zero probability, no matter how wrong, can be generated by AI.
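As a toy illustration of that probabilistic nature (my own example, not the book's): a model assigns every candidate token a probability, and sampling with a non-zero temperature means even an unlikely, wrong token gets picked once in a while.

```python
import math
import random

# Toy next-token sampling: softmax over made-up logits, then a weighted random draw.
logits = {"correct": 2.0, "plausible": 1.0, "nonsense": -3.0}

def sample(logits: dict[str, float], temperature: float = 1.0) -> str:
    scaled = {tok: val / temperature for tok, val in logits.items()}
    total = sum(math.exp(v) for v in scaled.values())
    probs = {tok: math.exp(v) / total for tok, v in scaled.items()}
    # "nonsense" has a tiny but non-zero probability, so eventually it will be sampled.
    return random.choices(list(probs), weights=list(probs.values()))[0]

print(sample(logits, temperature=1.0))
```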
One issue I have experienced with AI so far is that it tends to reach conclusions without factual consistency or sources to back its claims up. Just the other day I was asking ChatGPT which tag to use in an API I was consuming, and it generated an answer without any reservations. When I questioned the sources, it apologized and said it had no specific source for the claim, but that it was based on general knowledge. Turns out the answer was wrong after all, of course! In AI's defense, I must say people also claim stuff without sources to back their claims. The difference is that I don't generally see people apologizing and admitting it. So point for AI, I guess?
To me, that's a pretty major issue. Before I questioned it, it made no effort to make clear that it was just a guess. This is also where it's easiest to exploit some AI biases. I have found that AI has a tendency toward confirmation bias: it tends to agree with the user's idea—try asking it for some advantages of breaking a bone! When it lacks factual consistency, you can easily sway the AI's conclusion, often by resorting to made-up stuff yourself! It's almost like a debate between two political parties, except that here someone actually changes their position.
Sadly, wild LinkedIn coaches are out there to get you! By the way, they have two trending topics these days:
1) Good software is delivered software, and good design and architecture are meaningless. My fingers itch to dive deeper into this topic. At the risk of getting off topic, I must say that these people probably do not understand what good architecture is or have never had to deal with a rushed product as a developer. Probably both. Rushed products almost always generate way more cost in the long run than revenue in the short term.
2) AI is pretty good at coding, and it will replace a fair share of developers in the near future. The coach gets a bonus point for each of the following:
2.1) He says he has developed production-ready software in a week or less.
2.2) By himself.
2.3) Not knowing how to code.
I'll not discuss what the future holds, but at the very least, someone with such an opinion and I have way different standards for quality. Good code demands thought and creativity. If your first thought when creating a new service or feature is to create an MVC-like structure, then you and I probably have different standards for quality as well. I'm not saying you should not use AI, but in my experience, it has been useful for boilerplate and some algorithms, mostly—please, don't ask it to design a good domain model.
"You were not able to generate good code for more complex tasks? The issue is you and your prompt," replies the coach. Except you paste a snippet of code for it to analyze, and it suggests an absurd change that clearly makes no sense at all as it breaks the whole class—believe me, this has happened.
Once, I saw such a preacher arguing with a bunch of people that developers are no longer needed. He then proceeded to show off the "fairly complex" app—his own assessment—he had built: a horrendous interface with a few buttons that did nothing more than calculate a few things. Needless to say, he was a product guy with no software engineering background whatsoever.
What makes the whole situation worse is that people who talk like that are either deluded or want to delude you. Which one is which? Just check their profile: if they say they are AI specialists, they are deluded; if they say they own an AI product, they are out to delude you.
Good output requires good input, or as data people love to say, garbage in, garbage out. A large share of an LLM's training data is composed of code samples. It's possible to make an LLM output better code—again, I would forget domain design here—but you yourself have to know what good code is. AI needs guidance to produce good output, and that requires critical thinking. At the risk of offending people, that seems to be a skill in short supply.
Sadly, service God classes are very common these days, so you can imagine there is no lack of such examples for training AI. On more than one occasion, I have seen AI demonstrate theoretical knowledge but fail to apply it in practice. On one such occasion, I was discussing software anti-patterns, and ChatGPT was adamant that anemic domain models are problematic. When I asked it for good examples, it generated anemic models, and when I tried to get it to refine the code, it generated meaningless examples.
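For readers unfamiliar with the term, here is a tiny, hypothetical contrast (mine, not the book's or ChatGPT's): an anemic model is just a bag of fields with all the behavior pushed into a service, while a richer model keeps the rules next to the data it protects.

```python
from dataclasses import dataclass, field

# Anemic: the class only holds data; the rules live elsewhere (often in a "service" God class).
@dataclass
class AnemicOrder:
    items: list[float] = field(default_factory=list)
    status: str = "open"

def order_service_add_item(order: AnemicOrder, price: float) -> None:
    order.items.append(price)  # no invariants enforced here

# Richer: the order enforces its own invariants.
@dataclass
class Order:
    items: list[float] = field(default_factory=list)
    status: str = "open"

    def add_item(self, price: float) -> None:
        if self.status != "open":
            raise ValueError("cannot add items to a closed order")
        if price <= 0:
            raise ValueError("price must be positive")
        self.items.append(price)

    def close(self) -> None:
        self.status = "closed"
```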
My conclusion? We are doomed; AI will replace humans. I mean, it already behaves as humans do: full of biases, judgmental, spitting claims without backing information, losing focus the longer you talk to them, forgetting stuff, etc. /sarcasm
Ok, a useful conclusion: this is a great book to start your AI engineering roadmap—you might want to read some basic ML books first so the ML parts feel a bit less tedious, but that's not strictly necessary. The book is dense, and it really captivates you at times. Overall, you'll gain enough knowledge to refine your AI applications, and you'll have the understanding needed to seek out more practical sources to implement techniques such as RAG. My remarks about AI are not meant to discourage its use; I believe it has the potential to create new kinds of applications never seen before. AI is as useful as the user makes it, and a more skilled user with better critical thinking is likely to get better results. Still, I believe it's important that people realize LLMs' limitations and that it's not that easy to fit an LLM to your use case. LLMs are not a miracle, and their output is not to be trusted as a source of truth. Obviously, neither is this review/my opinion.