
Nadeesha Cabral for Inferable

Originally published at inferable.ai

Adding structured outputs as a feature to any LLM


November 24, 2024

It's nice when an LLM returns structured data: it makes the output much easier to work with programmatically. OpenAI supports this natively with its structured outputs feature, but getting structured output from a model that doesn't support it can be tricky. Function calling APIs exist in models like GPT-4, but they're not available in all models, and even when they are, the output isn't always guaranteed to be valid JSON.

Let's build a simple but reliable JSON parser that:

  1. Uses Zod to validate the structure
  2. Recursively retries with validation errors as feedback
  3. Works with local models via Ollama

Setting up

First, let's install our dependencies:

npm install zod async-retry

We'll also need Ollama running locally with llama3.2. If you haven't already:

ollama pull llama3.2

The Parser Implementation

Here's our implementation of a recursive JSON parser that keeps trying until it gets valid JSON that matches our schema:

// src/parser.ts
import { z } from "zod";
import retry from "async-retry";

interface ParserOptions {
  maxRetries?: number;
  schema: z.ZodSchema;
  prompt: string;
}

async function callOllama(prompt: string) {
  const response = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: {
      "Content-Type": "application/json"
    },
    body: JSON.stringify({
      model: "llama3.2",
      prompt,
      stream: false,
    }),
  });
  if (!response.ok) {
    throw new Error(`Ollama request failed with status ${response.status}`);
  }
  const data = await response.json();
  return data.response;
}

export async function parseWithRetry({ maxRetries = 3, schema, prompt }: ParserOptions) {
  let lastError: string | undefined;

  return retry(
    async (bail, attempt) => {
      try {
        // If this isn't the first attempt, feed the previous error back to the model
        const fullPrompt = attempt === 1
          ? prompt
          : `${prompt}\n\nPrevious attempt failed with error: ${lastError}. Please fix and try again.`;

        const response = await callOllama(fullPrompt);

        // Extract JSON from the response: first "{" through last "}"
        const jsonMatch = response.match(/\{[\s\S]*\}/);
        if (!jsonMatch) {
          throw new Error("No JSON found in response");
        }

        const parsed = JSON.parse(jsonMatch[0]);
        return schema.parse(parsed);
      } catch (error) {
        // Remember the error so the next attempt can include it in the prompt
        lastError = error instanceof Error ? error.message : String(error);
        if (attempt === maxRetries) {
          bail(error as Error);
          return;
        }
        throw error;
      }
    },
    {
      retries: maxRetries,
      factor: 1,
      minTimeout: 100,
      maxTimeout: 1000,
    }
  );
}
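As a side note, models often wrap their JSON in explanatory prose, which is why the parser greedily matches from the first `{` through the last `}` before calling `JSON.parse`. A quick illustration of that extraction step in isolation:

```typescript
// A model response with prose around the JSON, as often happens in practice
const response = 'Sure! Here is a movie for you:\n{ "title": "Heat", "year": 1995 }\nEnjoy!';

// Same extraction used in parseWithRetry: first "{" through last "}"
const jsonMatch = response.match(/\{[\s\S]*\}/);
if (!jsonMatch) {
  throw new Error("No JSON found in response");
}

const parsed = JSON.parse(jsonMatch[0]);
console.log(parsed.title); // "Heat"
```

Note that the greedy match assumes a single JSON object per response; it would misfire if the model emitted two separate objects.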

Using the Parser

Let's try it out with a simple movie recommendation schema:

// src/example.ts
import { z } from "zod";
import { parseWithRetry } from "./parser";

const MovieSchema = z.object({
  title: z.string(),
  year: z.number(),
  rating: z.number().min(0).max(10),
  genres: z.array(z.string()),
});

async function main() {
  const prompt = `
    Give me a movie recommendation in JSON format with the following structure:
    - title (string)
    - year (number)
    - rating (number between 0-10)
    - genres (array of strings)

    Return only the JSON, no other text.

    Here's an example of valid JSON:
    ${JSON.stringify({
      title: "The Dark Knight",
      year: 2008,
      rating: 9.0,
      genres: ["Action", "Crime", "Drama"],
    })}
  `;

  try {
    const result = await parseWithRetry({
      schema: MovieSchema,
      prompt,
      maxRetries: 3,
    });
    console.log("Parsed result:", result);
  } catch (error) {
    console.error("Failed after all retries:", error);
  }
}

main();

When you run this, you might see something like:

{
  "title": "Inception",
  "year": 2010,
  "rating": 8.8,
  "genres": ["Science Fiction", "Action", "Thriller"]
}

How it Works

  1. The parser takes a Zod schema, a prompt, and optional retry settings
  2. It sends the prompt to Ollama's llama3.2 model
  3. Extracts JSON from the response using regex
  4. Validates the JSON against the Zod schema
  5. If validation fails, it retries with the error message appended to the prompt
  6. This continues until we get valid JSON or hit the retry limit

Improving the Parser

This approach works best with simpler schemas; complex nested structures may need more retries. In our experience, breaking the schema into smaller chunks and handling each chunk separately works well.

In other cases, providing few-shot examples of valid and invalid JSON can help the model understand the schema better.
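A sketch of what that might look like. `withFewShot` is a hypothetical helper, not part of the parser above:

```typescript
// Hypothetical helper: label good and bad examples directly in the prompt
function withFewShot(
  prompt: string,
  valid: unknown[],
  invalid: { value: unknown; reason: string }[]
): string {
  const good = valid.map((v) => `VALID: ${JSON.stringify(v)}`).join("\n");
  const bad = invalid
    .map((i) => `INVALID (${i.reason}): ${JSON.stringify(i.value)}`)
    .join("\n");
  return `${prompt}\n\n${good}\n${bad}`;
}

const prompt = withFewShot(
  "Give me a movie recommendation in JSON format.",
  [{ title: "Heat", year: 1995, rating: 8.3, genres: ["Crime"] }],
  [{ value: { title: "Heat", year: "1995" }, reason: "year must be a number" }]
);
```

Showing the model a labelled *invalid* example, with the reason it fails, tends to prevent the most common mistakes (stringified numbers, missing fields) before the first attempt.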

If you're using Claude, you can also prefill Claude's response format to force it to output in the correct format:

[{
  "role": "user",
  "content": "What is your favorite color? Output only the JSON, no other text."
}, {
  "role": "assistant",
  "content": "{ \"color\":"
}]
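Because the prefill is part of the assistant message, the API returns only the continuation, so you need to prepend the prefill before parsing. A sketch with a stubbed `callClaude` (a real implementation would call Anthropic's Messages API):

```typescript
const prefill = '{ "color":';

// Stub standing in for a real Anthropic Messages API call; with a prefilled
// assistant message, the API returns only the model's continuation.
async function callClaude(_messages: { role: string; content: string }[]): Promise<string> {
  return ' "blue" }';
}

async function main() {
  const completion = await callClaude([
    { role: "user", content: "What is your favorite color? Output only the JSON, no other text." },
    { role: "assistant", content: prefill },
  ]);

  // The continuation alone isn't valid JSON; glue the prefill back on first
  const parsed = JSON.parse(prefill + completion);
  console.log(parsed.color); // "blue"
}

main();
```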

To run this example on your own machine, clone the repo.
