With AI, our app’s quality is becoming less predictable than ever. In this post, we’re going to reduce the frameworks’ contribution to the chaos.
ℹ️ This post is part of the “Crew AI Caveats” series, which I create to fill in the gaps left by official courses and to help you master CrewAI faster and easier.
We choose CrewAI for its unique prompt crafting and some hard-coded logic. These features make it stand out, but their opacity is also a profound source of frustration and misconceptions.
When the agent produces incorrect or even nonsensical conclusions, it’s not always clear whether it’s an LLM hallucination, missing context, or a bug in the prompt-building code (all three are real.) These cases assume very different mitigation actions, so we need to know for sure what’s actually happening inside the agent.
Another challenge is designing your agent crew. Questions like how data from one task will transfer to another, whether tasks will share the same chat history, and similar issues are all confusing. Documentation is also vague on these topics, so we need an X-ray vision to see the reality.
verbose=True
Verbose mode is insufficient in CrewAI, but there’s a superior method—monitoring (aka telemetry, aka observability.) For experienced developers, the need for observability is obvious. For newcomers, CrewAI might become your first experience with this kind of tool.
NB: In CrewAI, the word “telemetry” is reserved for the data they collect from your app. Although I normally prefer that word, today I’m going to call it “monitoring” as I’m talking about the data you collect from your app.
Let’s start. I won’t dig here into how to connect telemetry—just check the relevant sections on the CrewAI website. All these one-page guides are simple and straightforward.
Instead, I’ll show you how to use it. I started with AgentOps, simply because it was the first on the list. Although I'm disappointed and will try the alternatives, it is still a fine option to start with.
Here’s a screenshot of my session. When I run crewai run
, I get a link at the beginning and end of the session.
Clicking the link opens a screen where chats are rendered per agent and separately per LLM call.
Here's my take on its UI. Don’t rely on the rendered chats—they are misleading. Instead, hover over a block in the LLM and tools diagram of the "Session Replay" tab, click to freeze the selection, and navigate to the “Raw JSON” tab.
This is the true representation of the request sent to the LLM, revealing how CrewAI actually works under the hood in your particular case. Isn't that awesome to access this secret?
For example, by pressing [Cmd+F]
, I can search for expected strings from previous tasks. This allows me to verify how tools and task data are being fed into the LLM. See how they are wrapped into the prompts and does their formatting conflict with other parts of the prompt.
You've got the point: with AI agents, monitoring is not for production only. It’s eye-opening to see the actual source of unexpected LLM responses.
You should enable monitoring asap even locally, to make debugging straightforward. Fortunately, it is simple.
Stay tuned
In the next post: Decorators in CrewAI frustrate the beginners. I am going to smooth things over on your learning journey.
Top comments (0)