Introduction
When working with Large Language Model (LLM) APIs, you often receive responses that wrap the JSON you asked for in extraneous text, which makes it hard to extract clean data. Even when you strictly define the expected output format, an LLM is not guaranteed to return perfectly formatted JSON. The LLM-JSON-Cleaner library helps extract and validate JSON from such responses.
This post walks through how to use the LLM-JSON-Cleaner Composer library.
Features
- JSON Extraction: Extracts clean JSON data from LLM responses.
- Schema Validation: Validates JSON data against defined schemas to ensure correctness.
Installation
You can install the package using Composer:
composer require edgaras/llm-json-cleaner
Extracting JSON from LLM Responses
A common issue when dealing with LLM APIs is that JSON responses are embedded in additional text. The JsonCleaner class helps extract JSON data from such responses:
require_once 'vendor/autoload.php';
use Edgaras\LLMJsonCleaner\JsonCleaner;
$llmResponse = "Hi there! Please find the details below:\n\n{
\"task\": \"generate_report\",
\"parameters\": {
\"date\": \"2025-02-17\",
\"format\": \"pdf\"
}
}\n\nLet me know if you need further assistance.";
// Extract JSON as a string
$cleanJson = JsonCleaner::extract($llmResponse, false);
echo $cleanJson;
// Output: {"task":"generate_report","parameters":{"date":"2025-02-17","format":"pdf"}}
// Extract JSON as an associative array
$cleanJsonArray = JsonCleaner::extract($llmResponse, true);
print_r($cleanJsonArray);
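To make the idea of extraction concrete, here is a minimal sketch of one naive approach: slice the text from the first "{" to the last "}" and try to decode it. This is an illustration only and not how LLM-JSON-Cleaner is implemented; the function name naiveExtractJson is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of LLM-JSON-Cleaner): pull a JSON object out
 * of free-form text by slicing from the first "{" to the last "}".
 * Returns the decoded array, or null when no valid JSON object is found.
 */
function naiveExtractJson(string $text): ?array
{
    $start = strpos($text, '{');
    $end = strrpos($text, '}');
    if ($start === false || $end === false || $end < $start) {
        return null;
    }

    $candidate = substr($text, $start, $end - $start + 1);
    $decoded = json_decode($candidate, true);

    // json_decode() returns null on malformed input, so only arrays pass.
    return is_array($decoded) ? $decoded : null;
}

$response = "Sure! Here you go:\n{\"task\": \"generate_report\"}\nAnything else?";
echo json_encode(naiveExtractJson($response)), PHP_EOL;
// prints {"task":"generate_report"}
```

This sketch breaks as soon as the surrounding prose contains its own braces or the response holds several JSON objects, which is the kind of edge case a dedicated library is meant to take off your hands.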
Validating JSON Against a Schema
The JsonValidator class ensures that extracted JSON adheres to a predefined schema, preventing malformed or unexpected input.
require_once 'vendor/autoload.php';
use Edgaras\LLMJsonCleaner\JsonValidator;
$json = '{
"order_id": 401,
"customer": "Alice",
"payment_methods": [
{
"method_id": "p1",
"type": "Credit Card"
},
{
"method_id": "p2",
"type": "PayPal"
}
]
}';
$schema = [
'order_id' => ['required', 'integer', 'min:1'],
'customer' => ['required', 'string'],
'payment_methods' => ['required', 'array', 'min:1'],
'payment_methods.*.method_id' => ['required', 'string'],
'payment_methods.*.type' => ['required', 'string'],
];
$validationResult = JsonValidator::validateSchema(json_decode($json, true), $schema);
var_dump($validationResult);
// bool(true)
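If the JSON does not meet the schema requirements, an error report is returned instead of true. In the example below the schema omits payment_methods, so that field is reported as unexpected: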
$schemaPartial = [
'order_id' => ['required', 'integer', 'min:1'],
'customer' => ['required', 'string'],
];
$validationResult = JsonValidator::validateSchema(json_decode($json, true), $schemaPartial);
print_r($validationResult);
// Output:
// Array (
// [payment_methods] => Array (
// [0] => Unexpected field: payment_methods
// )
// )
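Because the result is either true or an error report, calling code usually needs to branch on it. The following is a minimal sketch of one way to flatten such a report into log lines. It assumes the report is an array of field names mapped to lists of messages, matching the output printed above; describeValidation is a hypothetical helper, not part of the library.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: turn a validation result into flat, loggable lines.
 * Assumes $result is either true (valid) or an array shaped like
 * ['field' => ['message', ...]], as in the error report shown above.
 */
function describeValidation($result): array
{
    if ($result === true) {
        return []; // nothing to report
    }
    if (!is_array($result)) {
        return ['Validation failed with an unrecognized result'];
    }

    $lines = [];
    foreach ($result as $field => $messages) {
        // Cast so a single string message is handled like a list of one.
        foreach ((array) $messages as $message) {
            $lines[] = "{$field}: {$message}";
        }
    }
    return $lines;
}

$errors = ['payment_methods' => ['Unexpected field: payment_methods']];
foreach (describeValidation($errors) as $line) {
    echo $line, PHP_EOL;
}
// prints "payment_methods: Unexpected field: payment_methods"
```

In a real application you would pass the return value of JsonValidator::validateSchema() in place of the hard-coded $errors array.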
Conclusion
LLM-JSON-Cleaner is a valuable tool for working with LLM APIs, ensuring clean JSON extraction and validation. By filtering out unnecessary text and enforcing structured formatting, it helps developers reliably parse LLM-generated responses while reducing the risk of malformed or incomplete data.