back to ansht's blogs

Be the LLM yourself before paying for it

context

Validating an LLM classification + extraction pipeline design by manually walking through ~100 real input samples and acting as the model in each stage.

thoughts

Before wiring an LLM into the actual pipeline, dump a representative batch (the last 30 days of real data) to a local file and do the classification and extraction by hand for every item. Acting as the model surfaces design gaps that the prompt alone cannot reveal. First, cross-cutting bin overrides: "Invitation: ..." subjects must classify as transactional regardless of sender, even though the sender domain alone would say human. Second, per-class follow-up routing: transactional items still need their sender attribution preserved for downstream pipelines, not just dropped. Third, prompt-shape requirements: templated digests from one sender must be synthesized into one observation, not echoed per message. The walk-through also yields an honest cost estimate at no extra effort, and it surfaces edge-case sample IDs you can later regression-test against. The exercise takes ~30 minutes for ~100 items and prevents weeks of asking "why is the model doing X?"
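The three kinds of gaps above can be written down as plain rules before any model is involved. What follows is only a minimal sketch: the `Message` fields, the `HUMAN_DOMAINS` set, the "automated" fallback bin, and the function names are all hypothetical, chosen for illustration rather than taken from the real pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical values; the real lists come out of the manual walk-through.
HUMAN_DOMAINS = {"example.com"}
TRANSACTIONAL_PREFIXES = ("Invitation:",)


@dataclass
class Message:
    msg_id: str
    sender: str
    subject: str


def classify(msg: Message) -> str:
    # Cross-cutting override: the subject prefix wins regardless of sender.
    if msg.subject.startswith(TRANSACTIONAL_PREFIXES):
        return "transactional"
    # Fallback heuristic: sender domain alone ("automated" is an assumed bin).
    domain = msg.sender.rsplit("@", 1)[-1].lower()
    return "human" if domain in HUMAN_DOMAINS else "automated"


def route(msg: Message) -> dict:
    # Every class keeps its sender attribution, so transactional items
    # remain usable by downstream pipelines instead of being dropped.
    return {"id": msg.msg_id, "label": classify(msg), "sender": msg.sender}


def group_by_sender(messages: list[Message]) -> dict[str, list[Message]]:
    # Templated digests are grouped per sender so that a later stage can
    # synthesize one observation per group instead of one per message.
    groups: dict[str, list[Message]] = defaultdict(list)
    for msg in messages:
        groups[msg.sender].append(msg)
    return dict(groups)
```

Writing the rules this way makes the precedence explicit: the subject override is checked first, so a message from a "human" domain with an "Invitation:" subject still lands in the transactional bin with its sender intact.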

next time

Before building or scaling an LLM-driven classification/extraction pipeline, dump a real input sample and manually walk it stage by stage as the model: list every override case you would apply, then bake those into the system prompt.
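One way to carry the override list into the prompt, and to reuse the hand labels as a regression set, is sketched below. The rule wording, the `build_system_prompt` and `regressions` helpers, and the shape of the hand-label dictionary are assumptions for illustration; `predict` stands in for whatever function eventually calls the model.

```python
from typing import Callable

# Override cases collected during the manual walk (wording is illustrative).
OVERRIDES = [
    'Subjects starting with "Invitation:" are transactional, regardless of sender.',
    "Transactional items keep their sender attribution for downstream pipelines.",
    "Templated digests from one sender are synthesized into one observation.",
]


def build_system_prompt(base: str, overrides: list[str]) -> str:
    # Number the rules so a misclassification can be traced to a specific one.
    rules = "\n".join(f"{i}. {rule}" for i, rule in enumerate(overrides, 1))
    return f"{base}\n\nOverride rules (apply before any other heuristic):\n{rules}"


def regressions(
    hand_labels: dict[str, tuple[str, str]],
    predict: Callable[[str], str],
) -> list[str]:
    # hand_labels maps sample ID -> (input text, label assigned by hand).
    # Returns the sample IDs where the pipeline disagrees with the hand label.
    return [
        sample_id
        for sample_id, (text, expected) in hand_labels.items()
        if predict(text) != expected
    ]
```

Running `regressions` after each prompt change gives a quick list of edge-case sample IDs that broke, which is cheaper than re-reading model output by eye.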
