Sean Travis Taylor
Stop Hallucinating: APIs are the Missing Link For Reliable Agentic Applications

In a previous post, we discussed the challenges of building agentic applications solely with large language models. Leaving aside the immense challenge of training a model on an endless array of use cases just to function as even a rudimentary personal assistant, we argued that agentic applications built on the autonomous operation of UIs designed for humans are a wrong turn.

Time and again, we hear that reliability is a chief obstacle preventing the serious implementation of LLMs and agentic applications in business systems across industries.

A tool that behaves as expected only 7 times out of 10 is useless, whether it's a hammer, a light switch, an airplane or a toilet. So how do we improve the reliability of a technology whose principal value propositions are excellence at generating things (e.g. images, text, video) and, in the case of text, sounding human-like?

A Tale of Two Computing Models

Applications, as we know them, are evaluated on wholly different criteria--namely: consistency, specificity and determinism. The same input produces the same output forever. This is their competitive advantage.

LLMs and other generative tools are a different story: identical inputs can produce mildly to wildly different outputs ranging from the amusing to the outright disturbing.

We call divergence from an expected outcome a bug in a traditional application and a hallucination in an LLM. This difference is more than semantic; it reflects a fundamental conflict between operating paradigms that must be reconciled before agentic applications can succeed in mass adoption.

Below we examine a novel approach to developing agentic applications by combining the best of these two powerful computing models; this solution ably competes with the current state of the art in AI agent development.

Read on to see our take on the future of agentic applications.

APIs and LLMs Compared

| APIs | LLMs |
| --- | --- |
| Highly structured (HTTP interface) | Unstructured (natural language interface) |
| No tolerance for divergence | High tolerance for divergence |
| Fast | Slow |
| Established means of integrating systems | Novel means of integrating systems |
| Does not adapt | Highly adaptable (prone to hallucinate) |

The relative assets and liabilities of APIs vs. LLMs

Above is a quick breakdown of the differences between APIs and LLMs. In 2024, business systems consist of sprawling APIs designed to be accessed over HTTP. By themselves and without modification, they are incompatible with a natural language interface like an LLM.

If we want such systems to be agentic, we have to retrofit a vector database onto them and do the hard work of vectorizing our data.

This is all in service of making that data available in an implementation of RAG so our LLM can incorporate vast proprietary information into its responses. We may have to build a custom chatbot, and we may still have to train our chatbot on our data.

What an enormous investment to adapt our backend systems to what is, in essence, a new UI!

We, however, take a different approach.

Use Cases

It started with use cases. LLMs are exceptional at consuming textual input and sussing out an appropriate output. We figured that if an LLM could do something as simple as deducing the relevant use case from a natural language request, we would be most of the way toward a powerful AI assistant.

A use case is anything we'd like the AI to accomplish on our behalf. We prototyped requesting a car with a ride-hailing app: we wanted to be able to speak into our phone's mic and have the assistant fetch us a ride somewhere, returning details like the cost and ride duration.
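
To make that deduction step concrete, below is a minimal sketch of how it might look. The completeWithLLM helper, the prompt wording, and the SUPPORTED_USE_CASES list are all assumptions for illustration; the post does not prescribe a particular model or SDK.

// Hypothetical helper that sends a prompt to whatever LLM we use and
// returns its text completion; its name and signature are assumptions.
declare function completeWithLLM(prompt: string): Promise<string>;

// Illustrative list of use cases the assistant knows about.
const SUPPORTED_USE_CASES = [
  'app.intents.mobility.get_ride',
  'app.intents.dining.reserve_table',
  'app.intents.unknown',
];

// Ask the LLM to map a spoken request onto exactly one known use case.
async function deduceUseCase(request: string): Promise<string> {
  const prompt = [
    'Map the user request to exactly one use case from this list:',
    SUPPORTED_USE_CASES.join('\n'),
    'Reply with the use case identifier only.',
    `Request: "${request}"`,
  ].join('\n\n');

  const useCase = (await completeWithLLM(prompt)).trim();
  return SUPPORTED_USE_CASES.includes(useCase) ? useCase : 'app.intents.unknown';
}

// e.g. deduceUseCase('Get me an Uber to Madison Square Garden')
//      resolves to 'app.intents.mobility.get_ride'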

Just What Are Your Intentions?

We chose to use basic prompt engineering to coax an LLM into producing what we call an intent specification from a request. For example, given a request like, "Get me an Uber to Madison Square Garden," the LLM will return something that looks like this:

{
    context: {
      location: {
        current: {
          address: null,
          bookmarked: false,
          lat: '37.7752315',
          lng: '-122.418075',
          name: null,
        },
        destination: {
          address: null,
          bookmarked: false,
          lat: '<LAT>',
          lng: '<LNG>',
          name: 'Madison Square Garden',
        },
      },
      user: {
        displayName: 'Augustus',
        id: 'dd495dbb-5a2f-46ed-8009-ad2bf0b85fcc',
      },
    },
    domain: 'app.intents.mobility.get_ride',
    reply_id: '903496b2-b3f2-4f70-8574-16ae38403550',
  }

From unstructured to structured. The LLM derives an intent from a natural language request. Our mobile app fashions the above intent spec before sending it to our backend.

This is a template; as it stands, it is incomplete. However, with few-shot prompting, an LLM becomes very good at producing such templates. Now that we have a specified intent along with the destination name, our mobile app gathers the other contextual data required to query our backend (see the sketch after this list), such as:

  • display name
  • userId
  • current location data
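
Below is a minimal sketch of that population step. The IntentSpec shape mirrors the example above; the populateIntent helper and the way we obtain the device position are assumptions for illustration.

interface LocationFields {
  address: string | null;
  bookmarked: boolean;
  lat: string | null;
  lng: string | null;
  name: string | null;
}

// Abridged shape of the intent spec our prompt asks the LLM to emit.
interface IntentSpec {
  context: {
    location: { current: LocationFields; destination: LocationFields };
    user: { displayName: string | null; id: string | null };
  };
  domain: string;
  reply_id: string;
}

// Fill in the contextual fields the LLM left blank before calling the backend.
function populateIntent(
  template: IntentSpec,
  user: { displayName: string; id: string },
  position: { latitude: number; longitude: number }, // e.g. from the device's geolocation API
): IntentSpec {
  return {
    ...template,
    context: {
      ...template.context,
      user: { displayName: user.displayName, id: user.id },
      location: {
        ...template.context.location,
        current: {
          ...template.context.location.current,
          lat: String(position.latitude),
          lng: String(position.longitude),
        },
      },
    },
  };
}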

Once our template is populated, we send the request to our backend. On our server, we use the Google Maps Geocoding API to retrieve the coordinates for the destination parameter in our intent specification.
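
A rough sketch of that lookup, using the Geocoding API's standard HTTP endpoint (API key handling and error paths simplified):

// Resolve a place name (e.g. 'Madison Square Garden') to coordinates
// with the Google Maps Geocoding API. Simplified: no retries; a missing
// result just throws.
async function geocodeDestination(
  name: string,
  apiKey: string,
): Promise<{ lat: number; lng: number }> {
  const url =
    'https://maps.googleapis.com/maps/api/geocode/json' +
    `?address=${encodeURIComponent(name)}&key=${apiKey}`;

  const response = await fetch(url);
  const body = await response.json();

  if (body.status !== 'OK' || body.results.length === 0) {
    throw new Error(`Geocoding failed for "${name}": ${body.status}`);
  }

  // e.g. { lat: 40.75..., lng: -73.99... } for Madison Square Garden
  return body.results[0].geometry.location;
}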

Driving It Home...

Armed with these coordinates, we can make a call to the Uber API to see what Uber services are available.

When Uber replies, we construct an intent reply message and pass it back to our application. Our LLM knows how to convert an intent reply to natural language and from there it's a trivial matter to use a text-to-speech solution to give our app a voice.
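
Stitched together, the backend step might look something like the sketch below. fetchUberEstimates stands in for whichever Uber endpoint we call (the post does not name one), and the intent reply shape is an assumption modeled on the intent spec shown earlier.

// Hypothetical client for the Uber call; the post does not specify which
// endpoint is used, so this stands in for it.
declare function fetchUberEstimates(args: {
  startLat: string;
  startLng: string;
  endLat: string;
  endLng: string;
}): Promise<Array<{ product: string; priceEstimate: string; durationSeconds: number }>>;

// Build an intent reply the LLM can turn back into natural language.
async function handleGetRide(intent: {
  domain: string;
  reply_id: string;
  context: {
    location: {
      current: { lat: string; lng: string };
      destination: { lat: string; lng: string };
    };
  };
}) {
  const { current, destination } = intent.context.location;

  const estimates = await fetchUberEstimates({
    startLat: current.lat,
    startLng: current.lng,
    endLat: destination.lat,
    endLng: destination.lng,
  });

  return {
    domain: intent.domain,
    reply_id: intent.reply_id,
    options: estimates.map((e) => ({
      service: e.product,                    // e.g. 'UberX'
      cost: e.priceEstimate,                 // e.g. '$24-31'
      durationMinutes: Math.round(e.durationSeconds / 60),
    })),
  };
}

// The LLM then verbalizes the reply, and a text-to-speech step reads it
// aloud, e.g. "Your UberX costs about $24-31 and takes roughly 20 minutes."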

Et voilà! We now have a voice-activated valet that we can instruct to fetch us a car using natural language.

This design lets us keep using the structured data our existing APIs require, while also letting us explore natural language interfaces that give our end users a richer way to interact with our systems.

Thinkin' About Tomorrow

As the AI landscape continues to evolve, we will need as many strategies as possible for making the data locked away inside our systems accessible. We will want to do this with as little toil as possible. We may not have the resources or expertise in the form of AI or machine learning engineers to build bespoke agentic systems for us.

The ability to apply our existing engineering competencies to emerging AI applications will be critical, because this highly specialized knowledge will not exist in adequate supply.

Finding ways to combine traditional APIs and advanced LLMs not only gives us the best of both worlds when creating innovative solutions; it also ensures that, no matter what the future of LLMs holds, we will be ready to move at the speed of change.
