Real-Life Key Information Extraction (Part 2)
Welcome back! It's me, Mrzaizai2k, once again.
In Part 1, we explored various techniques for extracting information from invoices. We discussed enhancing results by refining prompts and using models such as PaddleOCR and metaclip-b32-400m for improved accuracy across multiple invoice formats and languages.
If you haven't read Part 1 yet, I recommend checking it out to better understand the context and challenges.
Why Open-Source LLMs?
ChatGPT is undoubtedly a powerful tool, but it has limitations, particularly when it comes to data privacy. There are situations where you may not want to expose sensitive data to third-party services. While solutions like data masking with Input Guardrails exist, running local models can provide full control over your data.
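To make the data-masking idea concrete, here is a minimal sketch that replaces phone numbers and email addresses with placeholders before text leaves your machine. The two patterns and the `mask_sensitive` name are my own illustrative assumptions, not part of any guardrail library, and real guardrails need far broader coverage.

```python
import re

# Hypothetical patterns for illustration only; they cover just two kinds of data.
PHONE = re.compile(r"\b\d{2,4}-\d{3,4}-\d{4}\b")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mask_sensitive(text: str) -> str:
    """Replace phone-like and email-like substrings with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text


print(mask_sensitive("Negishi Tokyu Store TEL 045-752-6131 Receipt"))
```

Masking like this always risks missing a pattern, which is one reason a fully local model is attractive for sensitive documents.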
In this guide, I'll demonstrate how to run open-source LLMs (Qwen2-VL 2B and LLaMA 3.1) on an RTX 3060 GPU with 12 GB of VRAM and see how close they can get to ChatGPT's results.
Why Qwen2 2B and LLaMA 3.1?
Hardware Limitations
I have limited hardware resources: an RTX 3060 with 12 GB of VRAM. Therefore, I opted for models that could run efficiently on my setup:
- Qwen2-VL-2B-Instruct (Hugging Face link) instead of larger 7B models.
- LLaMA 3.1 (8B) from Ollama, which is optimized for long-context understanding.
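A rough back-of-the-envelope check shows why these sizes matter on a 12 GB card. The sketch below estimates memory for the weights only, using nominal parameter counts (2B, 7B, 8B). It ignores activations, the KV cache, and image tokens, so real usage is higher. The 4-bit figure assumes a quantized build of the 8B model, which is my assumption about the Ollama setup rather than something measured here.

```python
def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 1024**3


VRAM_GIB = 12  # RTX 3060

# Nominal parameter counts; the precision per model is an assumption.
candidates = {
    "2B vision model, 16-bit": (2e9, 16),
    "7B vision model, 16-bit": (7e9, 16),
    "8B text model, 16-bit": (8e9, 16),
    "8B text model, 4-bit": (8e9, 4),
}

for name, (params, bits) in candidates.items():
    gib = weight_memory_gib(params, bits)
    verdict = "fits" if gib < VRAM_GIB else "does not fit"
    print(f"{name}: ~{gib:.1f} GiB of weights, {verdict} in {VRAM_GIB} GB")
```

Under these assumptions, a 7B model in 16-bit precision already exceeds the card's memory before any inference overhead is counted, while a 2B model leaves room to spare.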
Why Not Other Models?
- Qwen2 in Ollama: It doesn't support image inputs, which is crucial for this task.
- LLaVA: Its vision capabilities were insufficient, and it struggled with multilingual data extraction.
Why This Combination?
- Qwen2: Excellent for general key information extraction but struggles with:
  - Handling long-context data (leading to missing information).
  - Consistently producing valid JSON outputs.
- LLaMA 3.1: Though text-only, it excels in:
  - Handling long-context documents.
  - Producing accurate JSON outputs.
  - Supporting multiple languages effectively.
The Strategy
- Qwen2 2B: Extracts raw information from the invoice.
- LLaMA 3.1: Acts as a post-processor to validate and map the extracted values into a structured JSON template.
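The two-stage strategy can be sketched as a small function that takes each stage as a callable. Every name here (`run_pipeline`, `extract`, `post_process`) is hypothetical, and the stubs stand in for the real models so the control flow is visible without a GPU.

```python
from typing import Callable


def run_pipeline(
    image_path: str,
    ocr_text: str,
    template: str,
    extract: Callable[[str, str], str],
    post_process: Callable[[str, str], dict],
) -> dict:
    """Stage 1 returns free-form text; stage 2 maps it onto the JSON template."""
    raw_text = extract(image_path, ocr_text)
    return post_process(raw_text, template)


# Stub stages for illustration; the real ones would call Qwen2-VL and LLaMA 3.1.
def fake_extract(image_path: str, ocr_text: str) -> str:
    return "Amount: 2,111 yen"


def fake_post_process(raw_text: str, template: str) -> dict:
    return {"invoice_info": {"amount": 2111}}


result = run_pipeline("receipt.jpg", "ocr text", "{}", fake_extract, fake_post_process)
print(result)
```

Keeping the stages as plain callables makes it easy to swap either model or to test the post-processor on saved Qwen2 outputs.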
Enhancing Vision Capabilities
To improve Qwen2's vision capabilities, we'll use PaddleOCR, as demonstrated in Part 1.
In the next steps, I'll walk you through setting up these models and combining them for optimal key information extraction results. Stay tuned!
Let's Begin
Baseline GPT
I'll use the latest Japanese invoice for extraction.
The result shown below is from the OCR system, which includes a language detector and PaddleOCR.
Recognized Text:
{'ori_text': '根岸 東急ストア TEL 045-752-6131 領収証 [TOP2C!UbO J3カード」 クレヅッ 卜でのお支払なら 200円で3ボイン卜 お得なカード! 是非こ入会下さい。 2013年09月02日(月) レジNO. 0102 NOO07さ と う 001131 スダフエウ卜チーネ 23 単198 1396 003271 オインイ年 ユウ10 4238 000805 ソマ一ク スモー一クサーモン 1298 003276 タカナン ナマクリーム35 1298 001093 ヌテラ スフレクト 1398 000335 バナサ 138 000112 アボト 2つ 単158 1316 A000191 タマネキ 429 合計 2,111 (内消費税等 100 現金 10001 お預り合計 110 001 お釣り 7 890',
'ori_language': 'ja',
'text': 'Negishi Tokyu Store TEL 045-752-6131 Receipt [TOP2C!UbO J3 Card] If you pay with a credit card, you can get 3 points for 200 yen.A great value card!Please join us. Monday, September 2, 2013 Cashier No. 0102 NOO07 Satou 001131 Sudafue Bucine 23 Single 198 1396 003271 Oinyen Yu 10 4238 000805 Soma Iku Smo Iku Salmon 1298 003276 Takanan Nama Cream 35 1 298 001093 Nutella Sprect 1398 000335 Banasa 138 000112 Aboto 2 AA 158 1316 A000191 Eggplant 429 Total 2,111 (including consumption tax, etc. 100 Cash 10001 Total deposited 110 001 Change 7 890',
'language': 'en',}
We will take the result from ChatGPT as the reference.
invoice_info {
"invoice_info": {
"amount": 2111,
"amount_change": 7890,
"currency": "JPY",
"purchasedate": "02/09/2013",
"purchasetime": "00:00:00",
"vatitems": [
{
"amount": 2111,
"amount_excl_vat": 2011,
"amount_incl_vat": 2111,
"amount_incl_excl_vat_estimated": false,
"percentage": 5,
"code": ""
}
],
"lines": [
{
"description": "",
"lineitems": [
{
"title": "Oinyen Yu",
"description": "",
"amount": 396,
"amount_each": 198,
"amount_ex_vat": 376,
"vat_amount": 20,
"vat_percentage": 5,
"quantity": 2,
"unit_of_measurement": "\u500b",
"sku": "003271",
"vat_code": ""
},
{
"title": "Soma Iku",
"description": "",
"amount": 298,
"amount_each": 298,
"amount_ex_vat": 284,
"vat_amount": 14,
"vat_percentage": 5,
"quantity": 1,
"unit_of_measurement": "\u500b",
"sku": "003276",
"vat_code": ""
},
{
"title": "Nutella Sprect",
"description": "",
"amount": 398,
"amount_each": 398,
"amount_ex_vat": 378,
"vat_amount": 20,
"vat_percentage": 5,
"quantity": 1,
"unit_of_measurement": "\u500b",
"sku": "001093",
"vat_code": ""
},
{
"title": "Banana",
"description": "",
"amount": 138,
"amount_each": 138,
"amount_ex_vat": 131,
"vat_amount": 7,
"vat_percentage": 5,
"quantity": 1,
"unit_of_measurement": "\u500b",
"sku": "000335",
"vat_code": ""
},
{
"title": "Eggplant",
"description": "",
"amount": 316,
"amount_each": 158,
"amount_ex_vat": 300,
"vat_amount": 16,
"vat_percentage": 5,
"quantity": 2,
"unit_of_measurement": "\u500b",
"sku": "A000191",
"vat_code": ""
}
]
}
],
"paymentmethod": "Cash",
"merchant_name": "Negishi Tokyu Store",
"receipt_number": "0102",
"shop_number": "",
"transaction_number": "",
"order_number": "",
"document_language": "ja"
}
}
Test_Openai_Invoice Took 0:00:16.45
Wow, the result is astounding! It captured the totals, the date, the merchant, and most of the line items in a single pass. I'm a bit worried that I won't be able to outperform it.
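Even a strong reference is worth a quick arithmetic check. The sketch below is a hypothetical `check_invoice` helper, not part of the original pipeline. It verifies that each line item's amount equals unit price times quantity and that the line items add up to the invoice total. The sample data reuses the amounts from the ChatGPT output above, trimmed to the keys the check needs.

```python
def check_invoice(invoice: dict, tolerance: float = 1.0) -> list[str]:
    """Return a list of arithmetic inconsistencies found in the extracted invoice."""
    info = invoice["invoice_info"]
    items = [it for line in info.get("lines", []) for it in line.get("lineitems", [])]
    problems = []
    for it in items:
        expected = it["amount_each"] * it["quantity"]
        if abs(expected - it["amount"]) > tolerance:
            problems.append(f"sku {it['sku']}: amount {it['amount']} != {expected}")
    total = sum(it["amount"] for it in items)
    if abs(total - info["amount"]) > tolerance:
        problems.append(f"line items sum to {total}, but amount is {info['amount']}")
    return problems


# Amounts copied from the reference output above, other keys omitted.
reference = {"invoice_info": {"amount": 2111, "lines": [{"lineitems": [
    {"sku": "003271", "amount": 396, "amount_each": 198, "quantity": 2},
    {"sku": "003276", "amount": 298, "amount_each": 298, "quantity": 1},
    {"sku": "001093", "amount": 398, "amount_each": 398, "quantity": 1},
    {"sku": "000335", "amount": 138, "amount_each": 138, "quantity": 1},
    {"sku": "A000191", "amount": 316, "amount_each": 158, "quantity": 2},
]}]}}

print(check_invoice(reference))
```

On this data the check reports that the five extracted items sum to 1546 rather than 2111, a hint that some line items on the receipt did not make it into the output.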
Qwen2 VL 2B Instruct
In this section, I'll demonstrate using Qwen2 alone so you can see its limitations without post-processing.
from typing import Union

import numpy as np
from qwen_vl_utils import process_vision_info  # vision helper used in the Qwen2-VL examples

# Excerpt from the extractor class: `timeit` is a timing decorator defined elsewhere
# in the project, and self.model, self.processor, self.device, and self.config are
# assumed to be set up in the constructor.
@timeit
def _extract_invoice_llm(self, text, image: Union[str, np.ndarray], invoice_template: str):
    # Prepare the messages for Qwen2: a system prompt with the formatting rules,
    # then the invoice image together with the OCR text and the JSON template
    messages = [
        {"role": "system", "content": """You are a helpful assistant that responds in JSON format with the invoice information in English.
        Don't add any annotations there. Remember to close any bracket. And just output the field that has value,
        don't return field that are empty. number, price and amount should be number, date should be convert to dd/mm/yyyy,
        time should be convert to HH:mm:ss, currency should be 3 characters like VND, USD, EUR"""},
        {"role": "user", "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": f"""From the image of the bill and the text from OCR, extract the information. The ocr text is: {text} \n. Return the key names as in the template is a MUST. The invoice template: \n {invoice_template}"""}
        ]}
    ]
    # Preparation for inference
    text_inputs = self.processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = self.processor(
        text=[text_inputs],
        images=image_inputs,
        videos=video_inputs,
        padding=True,
        return_tensors="pt"
    )
    inputs = inputs.to(self.device)
    # Inference: generate the output with the sampling settings from the config
    generated_ids = self.model.generate(
        **inputs,
        max_new_tokens=self.config['max_new_tokens'],
        temperature=self.config['temperature'],
        top_p=self.config['top_p'],
        top_k=self.config['top_k'],
    )
    # Drop the prompt tokens so that only the newly generated tokens are decoded
    generated_ids_trimmed = [
        out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
    ]
    output_text = self.processor.batch_decode(
        generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
    )
    return output_text[0]
{
"invoice_info": {
"amount": 2111,
"amount_change": 0,
"amount_shipping": 0,
"vatamount": 0,
"amountexvat": 2111,
"currency": "JPY",
"purchasedate": "2013-09-02",
"purchasetime": "",
"vatitems": [
{
"amount": 396,
"amount_excl_vat": 396,
"amount_incl_vat": 396,
"amount_incl_excl_vat_estimated": false,
"percentage": 0.000000,
"code": ""
}
],
"vat_context": "JMB",
"lines": [
{
"description": "TOP&ClubQ JMBカード",
"lineitems": [
{
"title": "2コ x 単198",
"description": "キノハスタフエットチーネ",
"amount": 396,
"amount_each": 198,
"amount_ex_vat": 396,
"vat_amount": 396,
"vat_percentage": 0.000000,
"quantity": 2,
"unit_of_measurement": "単",
"sku": "001131",
"vat_code": ""
}
]
}
],
"paymentmethod": "credit card",
"payment_auth_code": "",
"payment_card_number
On another run:
{
"invoice_info": {
"amount": 2111,
"amount_change": 100,
"amount_shipping": 0,
"vatamount": 0,
"amountexvat": 2111,
"currency": "VND",
"purchasedate": "2013/09/02",
"purchasetime": "",
"vatitems": [
{
"amount": 0,
"amount_excl_vat": 0,
"amount_incl_vat": 0,
"amount_incl_excl_vat_estimated": false,
"percentage": 0,
"code": ""
}
],
"vat_context": "",
"lines": [
{
"description": "TOP&ClubQ JMBカード",
"lineitems": [
{
"title": "2コ x 単198",
"description": "198円",
"amount": 396,
"amount_each": 198,
"amount_ex_vat": 396,
"vat_amount": 0,
"vat_percentage": 0,
"quantity": 1,
"unit_of_measurement": "単198",
"sku": "TOP&ClubQ JMBカード",
"vat_code": ""
}
]
}
],
"paymentmethod": "TOP&ClubQ JMBカード",
"payment_auth_code": "",
"payment_card_number": "",
"payment_card_account_number": "",
"payment_card_bank": "",
"payment_card_issuer": "",
"payment_card_issuer_name": "",
"payment_card_issuer_address": "",
"payment_card_issuer_city": "",
"payment_card_issuer_state": "",
"payment_card_issuer_zip": "",
"payment_card_issuer_country": "",
"payment_card_issuer_currency": "",
"payment_card_issuer_code": "",
"payment_card_issuer_code_name": "",
"payment_card_issuer_code_address": "",
"payment_card_issuer_code_city": "",
"payment_card_issuer_code_state": "",
"payment_card_issuer_code_zip": "",
"payment_card_issuer_code_country": "",
"payment_card_issuer_code_currency": "",
"payment_card_issuer_code_code": "",
"payment_card_issuer_code_code_name": "",
"payment_card_issuer_code_code_address": "",
"payment_card_issuer_code_code_city": "",
"payment_card_issuer_code_code_state": "",
"payment_card_issuer_code_code_zip": "",
"payment_card_issuer_code_code_country": "",
"payment_card_issuer_code_code_currency": "",
"payment_card_issuer_code_code_code": "",
"payment_card_issuer_code_code_code_name": "",
"payment_card_issuer_code_code_code_address": "",
"payment_card_issuer_code_code_code_city": "",
"payment_card_issuer_code_code_code_state": "",
"payment_card_issuer_code_code_code_zip": "",
"payment_card_issuer_code_code_code_country": "",
"payment_card_issuer_code_code_code_code": "",
"payment_card_issuer_code_code_code_code_name": "",
"payment_card_issuer_code_code_code_code_address": "",
"payment_card_issuer_code_code_code_code_city": "",
"payment_card_issuer_code_code_code_code_state": "",
"payment_card_issuer_code_code_code_code_zip": "",
"payment_card_issuer_code_code_code_code_country": "",
"payment_card_issuer_code_code_code_code_code": "",
"payment_card_issuer_code_code_code_code_code_name": "",
"payment_card_issuer_code_code_code_code_code_address": "",
"payment_card_issuer_code_code_code_code_code_city": "",
"payment_card_issuer_code_code_code_code_code_state": "",
"payment_card_issuer_code_code_code_code_code_zip": "",
"payment_card_issuer_code_code_code_code_code_country": "",
"payment_card_issuer_code_code_code_code_code_code": "",
"payment_card_issuer_code_code_code_code_code_code_name": "",
"payment_card_issuer_code_code_code_code_code_code_address": "",
"payment_card_issuer_code_code_code_code_code_code_city": "",
"payment_card_issuer_code_code_code_code_code_code_state": "",
"payment_card_issuer_code_code_code_code_code_code_zip": "",
"payment_card_issuer_code_code_code_code_code_code_country": "",
"payment_card_issuer_code_code_code_code_code_code_code": "",
"payment_card_issuer_code_code_code_code_code_code_code_name": "",
"payment_card_issuer_code_code_code_code_code_code_code_address": "",
"payment_card_issuer_code_code_code_code_code_code
Wrapper Took 0:00:34.91
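Oh no, check out the last fields! In the first run, the output is cut off before the JSON closes. In the second, the model gets stuck in a loop, inventing ever-longer payment_card_issuer_* keys until the output is cut off again. Several values are also wrong: the currency comes back as VND, and the date is not in dd/mm/yyyy format. I definitely can't rely on this alone.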
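Both failure modes can be detected automatically before the output goes any further. The sketch below uses two hypothetical helpers of my own: one returns `None` when the output is not valid JSON, and the other measures the longest run of consecutive keys that share the same prefix, a simple heuristic for a repetition loop. The prefix length of 20 is an arbitrary choice.

```python
import json
import re


def parse_json_or_none(output: str):
    """Return the parsed JSON, or None when the output is truncated or malformed."""
    try:
        return json.loads(output)
    except json.JSONDecodeError:
        return None


def longest_key_run(output: str, prefix_len: int = 20) -> int:
    """Length of the longest run of consecutive keys sharing their first prefix_len chars."""
    keys = re.findall(r'"([^"]+)"\s*:', output)
    best = run = 0
    previous = None
    for key in keys:
        prefix = key[:prefix_len]
        run = run + 1 if prefix == previous else 1
        previous = prefix
        best = max(best, run)
    return best


# A shortened stand-in for the runaway output above.
truncated = (
    '{"a": 1, "payment_card_issuer_name": "", "payment_card_issuer_city": "", '
    '"payment_card_issuer_zip": "", "payment_card_issuer_co'
)
print(parse_json_or_none(truncated), longest_key_run(truncated))
```

A caller could retry the generation or fall back to the post-processing step whenever parsing fails or the run length exceeds a threshold.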
Combining Qwen2 and LLaMA 3.1
Now, I'll let Qwen2 generate its output freely, and then use LLaMA 3.1 to map those values to a structured template. Let's see how it turns out.
Here is the result from Qwen2:
_Extract_Invoice_Llm Took 0:00:11.69
model_text The invoice from Negishi Tokyu Store has the following details:
- TEL: 045-752-6131
- Receipt number: 0102
- Cashier: Satou
- Date: September 2, 2013
- Amount: 2,111 yen
- Payment method: Credit card
- Payment amount: 10,001 yen
- Change: 7,890 yen
The items and their prices are listed below:
1. 001131 - Sudafue Bucine: 198 yen
2. 003271 - Oinyen Yu: 4238 yen
3. 000805 - Soma Iku Smo Iku Salmon: 1298 yen
4. 003276 - Takanan Nama Cream: 35 yen
5. 001093 - Nutella Sprect: 1398 yen
6. 000335 - Banasa: 138 yen
7. 000112 - Aboto: 158 yen
Total amount paid: 2,111 yen
Total amount received: 10,001 yen
Change received: 7,890 yen
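The mapping step can be sketched with the `ollama` Python package, whose `chat` function accepts `format="json"` to constrain the reply to JSON. The prompt wording, the `build_messages` and `post_process` names, and the injectable `chat` argument are my own assumptions for illustration, not the exact code behind the result in this post.

```python
import json
from typing import Callable, Optional


def build_messages(model_text: str, invoice_template: str) -> list[dict]:
    """Hypothetical prompt asking the model to fill the template from free-form text."""
    return [
        {"role": "system", "content": "You map invoice details onto a JSON template. "
                                      "Respond with JSON only and keep the template's key names."},
        {"role": "user", "content": f"Invoice details:\n{model_text}\n\nTemplate:\n{invoice_template}"},
    ]


def post_process(model_text: str, invoice_template: str,
                 chat: Optional[Callable] = None, model: str = "llama3.1") -> dict:
    """Send the Qwen2 text to LLaMA 3.1 and parse the JSON reply."""
    if chat is None:
        import ollama  # requires `pip install ollama` and a running Ollama server
        chat = ollama.chat
    response = chat(model=model, messages=build_messages(model_text, invoice_template), format="json")
    return json.loads(response["message"]["content"])
```

Passing `chat` as an argument keeps the function testable without a running server: a stub that returns `{"message": {"content": "..."}}` is enough to exercise the parsing.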
This is the result after post-processing:
{
"invoice_info": {
"amount": 2111,
"currency": "JPY",
"purchasedate": "02/09/2013",
"purchasetime": "HH:mm:ss",
"lines": [
{
"description": "",
"lineitems": [
{
"title": "",
"description": "",
"amount": 198,
"amount_each": 198,
"amount_ex_vat": 198,
"vat_amount": 0,
"vat_percentage": 0,
"quantity": 1,
"unit_of_measurement": "",
"sku": "001131",
"vat_code": ""
},
{
"title": "",
"description": "",
"amount": 4238,
"amount_each": 4238,
"amount_ex_vat": 4238,
"vat_amount": 0,
"vat_percentage": 0,
"quantity": 1,
"unit_of_measurement": "",
"sku": "003271",
"vat_code": ""
},
{
"title": "",
"description": "",
"amount": 1298,
"amount_each": 1298,
"amount_ex_vat": 1298,
"vat_amount": 0,
"vat_percentage": 0,
"quantity": 1,
"unit_of_measurement": "",
"sku": "000805",
"vat_code": ""
},
{
"title": "",
"description": "",
"amount": 35,
"amount_each": 35,
"amount_ex_vat": 35,
"vat_amount": 0,
"vat_percentage": 0,
"quantity": 1,
"unit_of_measurement": "",
"sku": "003276",
"vat_code": ""
},
{
"title": "",
"description": "",
"amount": 1398,
"amount_each": 1398,
"amount_ex_vat": 1398,
"vat_amount": 0,
"vat_percentage": 0,
"quantity": 1,
"unit_of_measurement": "",
"sku": "001093",
"vat_code": ""
},
{
"title": "",
"description": "",
"amount": 138,
"amount_each": 138,
"amount_ex_vat": 138,
"vat_amount": 0,
"vat_percentage": 0,
"quantity": 1,
"unit_of_measurement": "",
"sku": "000335",
"vat_code": ""
},
{
"title": "",
"description": "",
"amount": 158,
"amount_each": 158,
"amount_ex_vat": 158,
"vat_amount": 0,
"vat_percentage": 0,
"quantity": 1,
"unit_of_measurement": "",
"sku": "000112",
"vat_code": ""
}
]
}
],
"merchant_name": "Negishi Tokyu Store",
"customer_name": "",
"paymentmethod": "Credit card",
"amount_change": 7890,
"amount_shipping": 100,
"vatamount": 0,
"amountexvat": 2111
}
}
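The results have improved but still fall short. While the pipeline now captures fields like amount, amount_change, currency, merchant_name, and purchasedate, it struggles with item-level details: every title is empty, and several prices exceed the receipt total of 2,111 (for example, 4238 for SKU 003271).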
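To put a number on "improved but still short", the top-level fields can be compared against the ChatGPT reference. The `compare_fields` helper below is a hypothetical sketch. The values are copied from the two outputs in this post, trimmed to the fields being compared.

```python
def compare_fields(reference: dict, candidate: dict, fields: list[str]) -> dict[str, bool]:
    """For each field, report whether the candidate matches the reference exactly."""
    ref, cand = reference["invoice_info"], candidate["invoice_info"]
    return {f: f in cand and cand[f] == ref.get(f) for f in fields}


# Top-level values from the ChatGPT reference and the Qwen2 + LLaMA 3.1 result.
chatgpt = {"invoice_info": {
    "amount": 2111, "amount_change": 7890, "currency": "JPY",
    "purchasedate": "02/09/2013", "paymentmethod": "Cash",
    "merchant_name": "Negishi Tokyu Store",
}}
combined = {"invoice_info": {
    "amount": 2111, "amount_change": 7890, "currency": "JPY",
    "purchasedate": "02/09/2013", "paymentmethod": "Credit card",
    "merchant_name": "Negishi Tokyu Store",
}}

fields = ["amount", "amount_change", "currency", "purchasedate", "paymentmethod", "merchant_name"]
matches = compare_fields(chatgpt, combined, fields)
print(matches, f"{sum(matches.values())}/{len(fields)} fields match")
```

Exact matching is deliberately strict. A fuller evaluation would also compare line items, which is where most of the remaining errors are.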
This task seems to be quite challenging for the current models. However, there must be a way to optimize their performance further.
In the next part, I will show you how to fine-tune the Qwen2VL model specifically on receipt data to maximize both accuracy and speed for this specialized task.
Conclusion
Key information extraction using Qwen2VL and LLaMA 3.1 has shown mixed results. While basic fields can be extracted with reasonable accuracy, item-level details still pose a challenge. However, by combining models and leveraging post-processing techniques, we can significantly improve the results.
Looking ahead, fine-tuning the Qwen2VL model on specialized datasets like receipt data could be the key to achieving optimal performance. Stay tuned for the next part, where we'll dive into the fine-tuning process to further enhance accuracy and efficiency.