
Nadeesha Cabral for Inferable

Originally published at inferable.ai

Adding structured outputs as a feature to any LLM


November 24, 2024

It's nice when an LLM returns structured data: it makes the output much easier to work with programmatically. OpenAI supports this natively with its structured outputs feature, but getting structured output from a model that doesn't support it can be tricky. Function calling APIs exist in models like GPT-4, but they're not available in all models, and even when they are, the output isn't always guaranteed to be valid JSON.

Let's build a simple but reliable JSON parser that:

  1. Uses Zod to validate the structure
  2. Recursively retries with validation errors as feedback
  3. Works with local models via Ollama

Setting up

First, let's install our dependencies:

npm install zod async-retry

We'll also need Ollama running locally with llama3.2. If you haven't already:

ollama pull llama3.2

The Parser Implementation

Here's our implementation of a recursive JSON parser that keeps trying until it gets valid JSON that matches our schema:

// src/parser.ts
import { z } from "zod";
import retry from "async-retry";

interface ParserOptions {
  maxRetries?: number;
  schema: z.ZodSchema;
  prompt: string;
}

async function callOllama(prompt: string) {
  const response = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: {
      "Content-Type": "application/json"
    },
    body: JSON.stringify({
      model: "llama3.2",
      prompt,
      stream: false,
    }),
  });
  if (!response.ok) {
    throw new Error(`Ollama request failed with status ${response.status}`);
  }
  const data = await response.json();
  return data.response;
}

export async function parseWithRetry({ maxRetries = 3, schema, prompt }: ParserOptions) {
  let lastError: string | undefined;

  return retry(
    async (bail, attempt) => {
      try {
        // If this isn't the first attempt, feed the previous error back to the model
        const fullPrompt = attempt === 1
          ? prompt
          : `${prompt}\n\nPrevious attempt failed with error: ${lastError}. Please fix and try again.`;

        const response = await callOllama(fullPrompt);

        // Extract JSON from the response: first "{" through last "}"
        const jsonMatch = response.match(/\{[\s\S]*\}/);
        if (!jsonMatch) {
          throw new Error("No JSON found in response");
        }

        const parsed = JSON.parse(jsonMatch[0]);
        return schema.parse(parsed);
      } catch (error) {
        // Remember the error so the next attempt can include it in the prompt
        lastError = error instanceof Error ? error.message : String(error);
        if (attempt === maxRetries) {
          bail(error as Error);
          return;
        }
        throw error;
      }
    },
    {
      retries: maxRetries,
      factor: 1,
      minTimeout: 100,
      maxTimeout: 1000,
    }
  );
}
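As a side note, models often wrap their JSON in explanatory prose, which is why the parser greedily matches from the first `{` through the last `}` before calling `JSON.parse`. A quick illustration of that extraction step in isolation:

```typescript
// A model response with prose around the JSON, as often happens in practice
const response = 'Sure! Here is a movie for you:\n{ "title": "Heat", "year": 1995 }\nEnjoy!';

// Same extraction used in parseWithRetry: first "{" through last "}"
const jsonMatch = response.match(/\{[\s\S]*\}/);
if (!jsonMatch) {
  throw new Error("No JSON found in response");
}

const parsed = JSON.parse(jsonMatch[0]);
console.log(parsed.title); // "Heat"
```

Note that the greedy match assumes a single JSON object per response; it would misfire if the model emitted two separate objects.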

Using the Parser

Let's try it out with a simple movie recommendation schema:

// src/example.ts
import { z } from "zod";
import { parseWithRetry } from "./parser";

const MovieSchema = z.object({
  title: z.string(),
  year: z.number(),
  rating: z.number().min(0).max(10),
  genres: z.array(z.string()),
});

async function main() {
  const prompt = `
    Give me a movie recommendation in JSON format with the following structure:
    - title (string)
    - year (number)
    - rating (number between 0-10)
    - genres (array of strings)

    Return only the JSON, no other text.

    Here's an example of valid JSON:
    ${JSON.stringify({
      title: "The Dark Knight",
      year: 2008,
      rating: 9.0,
      genres: ["Action", "Crime", "Drama"],
    })}
  `;

  try {
    const result = await parseWithRetry({
      schema: MovieSchema,
      prompt,
      maxRetries: 3,
    });
    console.log("Parsed result:", result);
  } catch (error) {
    console.error("Failed after all retries:", error);
  }
}

main();

When you run this, you might see something like:

{
  "title": "Inception",
  "year": 2010,
  "rating": 8.8,
  "genres": ["Science Fiction", "Action", "Thriller"]
}

How it Works

  1. The parser takes a Zod schema, a prompt, and optional retry settings
  2. It sends the prompt to Ollama's llama3.2 model
  3. Extracts JSON from the response using regex
  4. Validates the JSON against the Zod schema
  5. If validation fails, it retries with the error message appended to the prompt
  6. This continues until we get valid JSON or hit the retry limit

Improving the Parser

This approach works best with simpler schemas; complex nested structures may need more retries. In our experience, breaking the schema into smaller chunks and handling each chunk separately works well.

In other cases, providing few-shot examples of valid and invalid JSON can help the model understand the schema better.
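A sketch of what that might look like. `withFewShot` is a hypothetical helper, not part of the parser above:

```typescript
// Hypothetical helper: label good and bad examples directly in the prompt
function withFewShot(
  prompt: string,
  valid: unknown[],
  invalid: { value: unknown; reason: string }[]
): string {
  const good = valid.map((v) => `VALID: ${JSON.stringify(v)}`).join("\n");
  const bad = invalid
    .map((i) => `INVALID (${i.reason}): ${JSON.stringify(i.value)}`)
    .join("\n");
  return `${prompt}\n\n${good}\n${bad}`;
}

const prompt = withFewShot(
  "Give me a movie recommendation in JSON format.",
  [{ title: "Heat", year: 1995, rating: 8.3, genres: ["Crime"] }],
  [{ value: { title: "Heat", year: "1995" }, reason: "year must be a number" }]
);
```

Showing the model a labelled *invalid* example, with the reason it fails, tends to prevent the most common mistakes (stringified numbers, missing fields) before the first attempt.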

If you're using Claude, you can also prefill Claude's response format to force it to output in the correct format:

[{
  "role": "user",
  "content": "What is your favorite color? Output only the JSON, no other text."
}, {
  "role": "assistant",
  "content": "{ \"color\":"
}]
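Because the prefill is part of the assistant message, the API returns only the continuation, so you need to prepend the prefill before parsing. A sketch with a stubbed `callClaude` (a real implementation would call Anthropic's Messages API):

```typescript
const prefill = '{ "color":';

// Stub standing in for a real Anthropic Messages API call; with a prefilled
// assistant message, the API returns only the model's continuation.
async function callClaude(_messages: { role: string; content: string }[]): Promise<string> {
  return ' "blue" }';
}

async function main() {
  const completion = await callClaude([
    { role: "user", content: "What is your favorite color? Output only the JSON, no other text." },
    { role: "assistant", content: prefill },
  ]);

  // The continuation alone isn't valid JSON; glue the prefill back on first
  const parsed = JSON.parse(prefill + completion);
  console.log(parsed.color); // "blue"
}

main();
```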

To run this example on your own machine, clone the repo.
