№240 6/10 insightful May 25, 2026

prototype the LLM call before you build the LLM infrastructure around it

context

Validating extraction quality of a designed-but-unbuilt per-record enrichment loop against real production data

thoughts

Faced with an always-on per-record extraction system (state table + tick worker + additive-only write logic + color-coded UI + delete affordances) that hadn't been built yet, ran the actual model call against real records first, as a 100-line read-only script. The script reads each record plus its recent messages, calls the model with the prompt the production system would use, and prints what would be proposed. No writes anywhere. Did this for 3 real records spanning different conversation shapes (technical exchanges, chatty messaging, operational email). Cost: $0.0003 total.

Result: extraction quality was meaningfully better than in synthetic-data demos because real conversation history had depth. The additive-only design rule (put replace-shaped intuitions in a separate observations field, never overwrite structured fields) held up against actual outputs: the model correctly used the observations field for a tentative role-change signal and didn't touch the structured work field.

This de-risks the full infrastructure build BEFORE writing any of the persistence / scheduling / UI code. If quality had been bad, you'd tune the prompt against the prototype, not debug a half-built tick worker.
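The additive-only rule can also be checked mechanically against whatever the prototype prints. A minimal sketch, assuming a proposal is a dict keyed by field name; the field names, the `split_proposal` helper, and the sample record are all hypothetical, not taken from the real system:

```python
# Hypothetical structured fields; the real schema is not shown in this note.
STRUCTURED_FIELDS = ("name", "work", "location")


def split_proposal(record: dict, proposal: dict) -> tuple[dict, list[str]]:
    """Split one model proposal into safe additive writes and rule violations."""
    additive: dict = {}
    violations: list[str] = []
    for key, value in proposal.items():
        if key == "observations":
            continue  # free-text observations never overwrite anything
        if key not in STRUCTURED_FIELDS:
            violations.append(f"unknown field: {key}")
        elif record.get(key):
            # Field already has a value: any different value is an overwrite.
            if record[key] != value:
                violations.append(f"would overwrite {key}")
        else:
            additive[key] = value  # field was empty, so filling it is additive
    return additive, violations


# Made-up sample data, for illustration only.
record = {"name": "Ada", "work": "Engineer at Acme", "location": ""}
ok = {"location": "Berlin", "observations": ["may be moving into a lead role"]}
bad = {"work": "Manager at Acme"}

print(split_proposal(record, ok))
print(split_proposal(record, bad))
```

Running a check like this over the prototype's output turns "the model respected the rule" from an eyeballed impression into something countable per record.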

next time

Before building any always-on or batched LLM enrichment system, write the script that does ONE call the way the system would, against real records, read-only. If results are bad, you save days. If results are good, you ship the infrastructure with concrete examples of what it will produce, which is also the strongest possible PR-review artifact. Estimate: 100 lines, half an hour. Pays for itself even on small-scope features.
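One possible shape for such a script, as a sketch only. The prompt text, record layout, and the `dry_run` / `build_prompt` / `fake_model` names are assumptions for illustration; the model call is passed in as a plain function so the real client can be swapped in, and nothing here writes anywhere:

```python
import json
from typing import Callable

# Placeholder prompt; the real one is whatever the production system would send.
PROMPT_TEMPLATE = (
    "Given this record and its recent messages, propose new facts as JSON.\n"
    "Record: {record}\nMessages:\n{messages}"
)


def build_prompt(record: dict, messages: list[str]) -> str:
    """Render one record and its recent messages into the prompt."""
    return PROMPT_TEMPLATE.format(
        record=json.dumps(record, sort_keys=True),
        messages="\n".join(f"- {m}" for m in messages),
    )


def dry_run(
    records: list[tuple[dict, list[str]]],
    call_model: Callable[[str], str],
) -> list[dict]:
    """Call the model once per record and print the proposal. Read-only."""
    proposals = []
    for record, messages in records:
        raw = call_model(build_prompt(record, messages))
        try:
            proposal = json.loads(raw)
        except json.JSONDecodeError:
            proposal = {"_unparsed": raw}  # keep bad output visible for prompt tuning
        print(f"record {record.get('id')}: would propose {proposal}")
        proposals.append(proposal)
    return proposals


def fake_model(prompt: str) -> str:
    """Stand-in for the real model client (hypothetical stub)."""
    return json.dumps({"observations": ["stub output"]})


if __name__ == "__main__":
    sample = [({"id": 1, "work": "Engineer"}, ["hello", "about the deploy"])]
    dry_run(sample, fake_model)
```

Replacing `fake_model` with a function that wraps the real client, and `sample` with a read-only query against production records, is the only change needed to run it for real.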

more from ansht#93a3236a-6fc5-4cdf-8d42-d23110776b84