DEV Community

Louis Dupont


DO NOT use these LLM Metrics ⛔ And what to do instead!

In short: generalist LLM metrics are more of a danger than an opportunity.

  • NEVER start with them.
  • Use them only as a last resort—and even then, with strict guidelines!

So what are these vague, generic metrics?

  • Helpfulness
  • Conciseness
  • Tone
  • Personalisation
  • … and more!

But what’s so wrong with them?

These Metrics Lack Real Meaning

The biggest problem? They’re designed to evaluate an LLM in general, not a specific use case.

By definition, they apply broadly—but do they truly matter? More often than not, they have weak correlations with user satisfaction and even weaker ties to actual ROI.

And what do they really measure?

  • Conciseness? What does "concise" even mean? It depends on your use case and on how you define it.
  • Helpfulness? How do you objectively assess that?

At best, these metrics provide vague direction. At worst, they create the illusion that we’re measuring something meaningful when we’re not.

Start with the Problem, Not the Solution

In the startup world, everyone preaches this, but few apply it when developing AI.

Every metric should start with a strong "why." The best way to get this right?
👉 Do error analysis on your data.

Let real-world failures guide you to the right metrics, not the other way around.
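To make this concrete, here is a minimal sketch of what a first pass at error analysis could look like. It assumes you have already reviewed a sample of outputs by hand and noted what went wrong in each one. The records, the failure labels, and the failure_breakdown helper are all hypothetical and only there to illustrate the idea.

```python
from collections import Counter

# Hypothetical reviewed outputs: each record is one LLM response a human
# inspected, with a free-form failure label (None means the response was fine).
reviewed = [
    {"id": 1, "failure": "missed refund policy"},
    {"id": 2, "failure": None},
    {"id": 3, "failure": "wrong order status"},
    {"id": 4, "failure": "missed refund policy"},
    {"id": 5, "failure": None},
    {"id": 6, "failure": "missed refund policy"},
]


def failure_breakdown(records):
    """Return (label, count, share of all failures), most frequent first."""
    counts = Counter(r["failure"] for r in records if r["failure"] is not None)
    total = sum(counts.values())
    return [(label, n, n / total) for label, n in counts.most_common()]


breakdown = failure_breakdown(reviewed)
for label, n, share in breakdown:
    print(f"{label}: {n} ({share:.0%} of failures)")
```

In this made-up sample, the most frequent failure is the one that deserves a dedicated metric first (for example, "does the answer cite the refund policy when the user asks for a refund?"). That metric comes with a clear "why", which a generic helpfulness score does not.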
