DEV Community

Edgaras
Ensuring Reliable JSON from LLM Responses in PHP

Introduction

When working with Large Language Model (LLM) APIs, you often receive responses that wrap the JSON you asked for in extra text, which makes the data hard to extract cleanly. Even when you strictly define the expected output format, an LLM does not guarantee a well-formed JSON response. The LLM-JSON-Cleaner library helps extract and validate JSON from such responses.

This post walks through basic usage of the LLM-JSON-Cleaner Composer library.

Features

  • JSON Extraction: Extracts clean JSON data from LLM responses.
  • Schema Validation: Validates JSON data against defined schemas to ensure correctness.

Installation

You can install the package using Composer:

composer require edgaras/llm-json-cleaner

Extracting JSON from LLM Responses

A common issue when dealing with LLM APIs is that JSON responses are embedded in additional text. The JsonCleaner class helps extract JSON data from such responses:

require_once 'vendor/autoload.php';

use Edgaras\LLMJsonCleaner\JsonCleaner;

$llmResponse = "Hi there! Please find the details below:\n\n{
    \"task\": \"generate_report\",
    \"parameters\": {
        \"date\": \"2025-02-17\",
        \"format\": \"pdf\"
    }
}\n\nLet me know if you need further assistance.";

// Extract JSON as a string
$cleanJson = JsonCleaner::extract($llmResponse, false);
echo $cleanJson;
// Output: {"task":"generate_report","parameters":{"date":"2025-02-17","format":"pdf"}}

// Extract JSON as an associative array
$cleanJsonArray = JsonCleaner::extract($llmResponse, true);
print_r($cleanJsonArray);
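To see why an extraction step is needed at all, the sketch below uses only PHP's built-in json_decode on a reply shaped like the one above. The naiveExtract helper is hypothetical and written purely for illustration. It is not how JsonCleaner works internally, and it will break on replies that contain several JSON objects or stray braces in the prose.

```php
<?php
// Hypothetical LLM reply: valid JSON wrapped in conversational text.
$llmResponse = "Sure! Here is the result:\n{\"task\": \"generate_report\"}\nAnything else?";

// Decoding the raw reply fails: the surrounding prose is not valid JSON.
$direct = json_decode($llmResponse, true);
var_dump($direct); // NULL

// Illustration only (hypothetical helper, not the library's algorithm):
// slice from the first "{" to the last "}" and try to decode that substring.
function naiveExtract(string $text): ?array
{
    $start = strpos($text, '{');
    $end = strrpos($text, '}');
    if ($start === false || $end === false || $end < $start) {
        return null; // no candidate JSON object found
    }

    $decoded = json_decode(substr($text, $start, $end - $start + 1), true);

    return is_array($decoded) ? $decoded : null;
}

$extracted = naiveExtract($llmResponse);
print_r($extracted);
```

A slicing approach like this is easy to write but fragile, which is the gap a dedicated extraction library is meant to cover.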

Validating JSON Against a Schema

The JsonValidator class ensures that extracted JSON adheres to a predefined schema, preventing malformed or unexpected input.

require_once 'vendor/autoload.php';

use Edgaras\LLMJsonCleaner\JsonValidator;

$json = '{
  "order_id": 401,
  "customer": "Alice",
  "payment_methods": [
    {
      "method_id": "p1",
      "type": "Credit Card"
    },
    {
      "method_id": "p2",
      "type": "PayPal"
    }
  ]
}';

$schema = [
  'order_id' => ['required', 'integer', 'min:1'],
  'customer' => ['required', 'string'],
  'payment_methods' => ['required', 'array', 'min:1'],
  'payment_methods.*.method_id' => ['required', 'string'],
  'payment_methods.*.type' => ['required', 'string'],
];

$validationResult = JsonValidator::validateSchema(json_decode($json, true), $schema);
var_dump($validationResult); 
// bool(true)
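The payment_methods.*.method_id keys in the schema read as dotted paths in which * stands for every element of an array. As a rough mental model of how such a path maps onto decoded data, the sketch below resolves one against a nested array. resolvePath is a hypothetical helper written for this post, and the library's actual matching logic may differ.

```php
<?php
// Hypothetical helper: collect every value a dotted path points at.
// A "*" segment fans out over all elements of the array at that level.
function resolvePath(array $data, string $path): array
{
    $current = [$data]; // values reached so far

    foreach (explode('.', $path) as $segment) {
        $next = [];
        foreach ($current as $node) {
            if (!is_array($node)) {
                continue; // cannot descend into a scalar
            }
            if ($segment === '*') {
                foreach ($node as $child) {
                    $next[] = $child;
                }
            } elseif (array_key_exists($segment, $node)) {
                $next[] = $node[$segment];
            }
        }
        $current = $next;
    }

    return $current;
}

$order = json_decode('{
  "order_id": 401,
  "payment_methods": [
    {"method_id": "p1", "type": "Credit Card"},
    {"method_id": "p2", "type": "PayPal"}
  ]
}', true);

$ids = resolvePath($order, 'payment_methods.*.method_id');
print_r($ids); // contains "p1" and "p2"
```

Read this way, each wildcard rule applies to every matching element, so a single missing method_id in any payment method would be expected to fail the 'required' rule.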

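If the JSON does not meet the schema requirements, validateSchema returns an array of error messages keyed by field instead of true. In the example below, the schema omits payment_methods, so that field is reported as unexpected: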

$schemaPartial = [
  'order_id' => ['required', 'integer', 'min:1'],
  'customer' => ['required', 'string'],
];

$validationResult = JsonValidator::validateSchema(json_decode($json, true), $schemaPartial);
print_r($validationResult);
// Output:
// Array (
//   [payment_methods] => Array (
//     [0] => Unexpected field: payment_methods
//   )
// )
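Since the examples above show validateSchema yielding either true or an array of messages keyed by field, calling code needs to branch on the result. Below is a minimal sketch of that branch. summarizeValidation is a hypothetical helper, and the error array is hard-coded in the shape printed above so the snippet runs without the library installed.

```php
<?php
// Hypothetical helper: turn a validation result into log-friendly lines.
// Assumes the result is either true or [field => [message, ...]],
// matching the outputs shown in this post.
function summarizeValidation($result): array
{
    if ($result === true) {
        return []; // nothing to report
    }

    $lines = [];
    foreach ($result as $field => $messages) {
        foreach ((array) $messages as $message) {
            $lines[] = "{$field}: {$message}";
        }
    }

    return $lines;
}

// Same shape as the error report printed above.
$errors = ['payment_methods' => ['Unexpected field: payment_methods']];

foreach (summarizeValidation($errors) as $line) {
    echo $line, PHP_EOL;
}
// payment_methods: Unexpected field: payment_methods
```

In real code, $errors would be the return value of JsonValidator::validateSchema(...), and a non-empty summary is a natural trigger for retrying the LLM request or rejecting the response.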

Conclusion

LLM-JSON-Cleaner covers two recurring chores when working with LLM APIs: pulling JSON out of the surrounding text and validating it against a schema. Together, these steps help developers parse LLM-generated responses more reliably and reduce the risk of acting on malformed or incomplete data.