Louis Dupont

Evaluate your LLM! Ok, but what's next? 🤔

Everyone says you need to evaluate your LLM. So you did. Now what? 🤷‍♂️

You got a score. Great. Now, here’s the trap:

You either:

  • Trust it. ("Nice, let's ship!")
  • Chase a better one. ("Tweak some stuff and re-run!")

Both are horrible ideas.

Step 1: Stop staring at numbers.

Numbers feel scientific, but they lie all the time.

Before doing anything, look at actual examples. What’s failing? (There’s a quick triage sketch right after this list.)

  • Bad output? Fix the model.
  • Good output but bad score? Fix the eval.
  • Both wrong? You’ve got bigger problems.
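
To make that triage concrete, here’s a minimal sketch. It assumes your eval run already produced per-example records with an input, the model output, the expected answer, and a score; those field names and the 0-to-1 score scale are placeholders I made up, not something from this post.

```python
# Triage sketch: eyeball the lowest-scoring examples and label the failure.
# The record shape and score scale are assumptions for illustration.

def triage(records, n=10):
    """Print the n lowest-scoring examples for manual inspection."""
    worst = sorted(records, key=lambda r: r["score"])[:n]
    for r in worst:
        print(f"score={r['score']:.2f}")
        print(f"  input:    {r['input']}")
        print(f"  expected: {r['expected']}")
        print(f"  output:   {r['output']}")
        # Per example, decide:
        #   bad output              -> model problem
        #   good output, low score  -> eval problem
        print()
```

A handful of hand-labeled failures tells you which bucket you’re in, and that decides everything in Step 2.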

Step 2: Solve the right problem.

If your model sucks, tweak:

  • Prompts
  • Data retrieval
  • Edge cases

If your eval sucks, rethink (small example after the list):

  • Your scoring function
  • What “good” even means
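
For instance (a made-up example, not from the post): if your scorer is strict exact match, a perfectly good answer can get a zero, and the fix is to change the scorer, not the model.

```python
# Hypothetical scoring bug: exact match punishes answers that are actually fine.
def exact_match(output: str, expected: str) -> float:
    return 1.0 if output == expected else 0.0

# One possible rethink: normalize before comparing. What "good" means is a
# judgment call for your task; this is one direction, not the answer.
def relaxed_match(output: str, expected: str) -> float:
    norm = lambda s: " ".join(s.lower().strip().rstrip(".").split())
    return 1.0 if norm(output) == norm(expected) else 0.0

print(exact_match("Paris.", "paris"))    # 0.0 -- good output, bad score
print(relaxed_match("Paris.", "paris"))  # 1.0 -- the scorer now agrees
```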

Step 3: Iterate like a maniac.

Change something → Run eval → Learn → Repeat.
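
In code, that loop might look something like this. `run_eval` here is a stand-in for whatever actually calls your model and scorer (it’s not from the post); the point is what you keep from each run, not how the eval itself is implemented.

```python
# Iterate sketch: every run records the score AND the worst examples,
# because the worst examples are where the learning happens.

def run_eval(prompt_version, dataset):
    # Placeholder: call your LLM + scorer here and return per-example records.
    return [{"input": x, "output": "...", "expected": "...", "score": 0.5}
            for x in dataset]

dataset = ["question 1", "question 2"]
history = []

for prompt_version in ["v1-terse", "v2-few-shot", "v3-constrained"]:
    records = run_eval(prompt_version, dataset)   # change something -> run eval
    avg = sum(r["score"] for r in records) / len(records)
    worst = sorted(records, key=lambda r: r["score"])[:5]
    history.append({"prompt": prompt_version, "score": avg, "worst": worst})
    # "Learn" = read `worst` before the next tweak, don't just compare `avg`.
```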

Basically, do Error Analysis on your Evals, not just on your LLM!

Chasing numbers isn’t progress. Chasing the right insights is.
