treeru.com
AI

Structuring Unstructured Documents with an LLM — Measured Accuracy on a 19-Item Menu

2026-07-02
Treeru
AI · July 2, 2026

Can an LLM turn a menu document into database records you can trust? We fed unstructured markdown documents (a 19-item menu and a store info sheet) to a locally-served Qwen3-14B-AWQ, extracted structured JSON, and compared it field-by-field against a manually entered ground truth database. The results: 100% accuracy on prices and categories, 94.7% recall on item names, 5.92 seconds for the full extraction — and one fascinating failure mode that schema validation cannot catch.

100%

Price & category accuracy

94.7%

Menu name recall

5.92s

Extraction (19 items)

71.4%

Business info match

The Data Entry Bottleneck

Every service that handles store or business information hits the same bottleneck: the real information lives in unstructured documents (menus, info sheets, Word files), while the system needs schema-conforming structured data — names, prices, categories, opening hours. Bridging that gap manually takes tens of minutes per store and introduces typos and omissions. LLMs are theoretically perfect for this conversion, but demos never tell you the number that matters for a production decision: field-level accuracy. So we measured it.

Test Design: Build the Answer Key First

Measuring accuracy requires ground truth. We used a fictional test cafe dataset: a 19-item menu (menu.md) and a store info sheet (info.md) in markdown, with the same content manually entered into PostgreSQL as the reference. The pipeline: unstructured document → LLM extraction with a structuring prompt and JSON schema → field-by-field comparison against the manual DB.

Three setup decisions mattered. The model was Qwen3-14B-AWQ served by SGLang — deliberately a mid-size local model, because the question was what a realistically operable on-premise model can do, not a cloud flagship. Temperature was 0.1, since extraction needs reproducibility, not creativity. And Qwen3’s thinking mode was disabled (enable_thinking: false), which stabilized JSON output by keeping reasoning traces from leaking into the response and breaking the parser.

MetricResultGrade
Menu items extracted19 (matches DB exactly)Perfect
Name recall94.7% (18/19)Excellent
Price accuracy100.0%Perfect
Category accuracy100.0%Perfect
Extraction time5.92 seconds-

The key finding: the closer data is to tabular/structured form, the stronger LLM extraction gets. Prices and categories came out perfect across all 19 items — no transposed digits, no dropped zeros. Considering human typo rates when transcribing 19 menu prices, this is already beyond human accuracy. And the whole extraction took 5.92 seconds versus 10+ minutes by hand — speed is not even a comparison.

The Failure Mode: A Plausibly Wrong Name

The single name mismatch is the most interesting result. The original item “벚꽃라떼” (cherry blossom latte) was extracted as “베트꽃라떼” — not a hallucinated item, but an OCR-like transcription slip in Korean syllables. Its price and category were still correct: the model recognized the item but misspelled it.

This error class is dangerous precisely because it is plausibly wrong. A completely wrong value fails validation; a misspelled-but-well-formed string passes JSON schema checks — it is a non-empty string of the right shape. Catching it requires source-document cross-checking in post-processing, e.g. verifying that every extracted string actually appears as a substring in the original document.

Two more observations: allergy info explicitly stated in the source (desserts) was extracted correctly, while unstated fields (drinks) were returned as null — the model did not fabricate missing information, which is a positive. Description fields came back empty because the source table had none: extraction quality is ultimately capped by source quality.

Business Info: The Normalization Wall

FieldMatchNote
AddressOMinor notation difference ("Seoul" vs full official name) passed
Phone numberOExact match
Wi-Fi SSIDOExact match
Wi-Fi passwordOExact match
Weekday opening timeO09:00, exact
Weekday closing timeO22:00, exact
Indoor seat countXValue is 30 in both — string vs integer type mismatch

Seven fields, five matches (71.4%), 2.38 seconds. The crucial nuance: the mismatches were not comprehension failures. Every fact — phone, Wi-Fi, hours — was extracted correctly. The failures were normalization issues: two valid spellings of the same city name, and a seat count returned as the string “30” instead of the integer 30. These are problems for deterministic post-processing code, not for another LLM call. Lumping them together as “the LLM was wrong” hides the actual fix.

Production Playbook

  1. Treat LLM extraction as a draft generator, not a final data source. Even with some fields at 100%, the pipeline’s trust level must be set by its weakest field.
  2. Layer schema validation with source cross-checking. JSON schema catches structural errors; substring verification against the original document catches plausible typos; type casting and notation normalization handle the rest — all deterministic code.
  3. Move humans from data entry to review. The job changes from “type in 19 items” to “check the one or two flagged items.” Time drops to a tenth; accuracy goes up.

One more takeaway on model size: these numbers came from a quantized 14B model running locally, not a cloud flagship. Document-to-structured-data work is already within reach of mid-size local models — deployable even where data cannot leave the premises.

Conclusion

LLM auto-structuring is already practical as an assisted data-entry tool. Tabular data extracts at or near 100%; the dangerous errors are plausibly wrong values that pass schema validation, so source cross-checking is mandatory; and most mismatches are normalization and typing issues solvable in deterministic post-processing. Keep a human in the loop as reviewer — that role shift, from typist to reviewer, is where the real value of this technique lives.

T

Treeru

Sharing practical insights on web development, IT infrastructure, and AI solutions. Treeru — your partner in digital transformation.

Share

Related Posts

© 2026 TreeRU. All rights reserved.

All content is copyrighted by TreeRU. Unauthorized reproduction without attribution is prohibited.