2026-03-15

Replaced reasoning with rules - automotive triage from 11 seconds to 280 milliseconds.

Advisory / Insurance sector | ai-arch, latency, insurance

Reviewed an automotive incident triage prompt that was costing eleven seconds per decision on average. The prompt was asking the model to both decide and explain - and the deciding was where the cost lived.

Pulled the reasoning out of the model and into a twelve-rule classifier in front of it. The classifier decides; the model still writes the explanation. Latency landed at 280ms average, p99 at 410ms. Accuracy held within the original confidence interval across the 4,000-case regression set.
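A minimal sketch of that split, for illustration only. The post does not list the twelve rules, so the rules, field names, decision labels, and the 1500 damage threshold below are all hypothetical. The `explain` callable stands in for the model call, and the template explainer is a placeholder so the sketch runs without one.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Incident:
    # Hypothetical fields; the real triage input is not described in the post.
    airbags_deployed: bool
    drivable: bool
    injuries_reported: bool
    estimated_damage: float


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[Incident], bool]
    decision: str


# Illustrative rules only, evaluated in order; the first match wins.
RULES: List[Rule] = [
    Rule("injury", lambda i: i.injuries_reported, "escalate"),
    Rule("airbags", lambda i: i.airbags_deployed, "tow_and_inspect"),
    Rule("not_drivable", lambda i: not i.drivable, "tow_and_inspect"),
    Rule("minor_damage", lambda i: i.estimated_damage < 1500, "fast_track"),
]
DEFAULT_DECISION = "manual_review"


def classify(incident: Incident) -> Tuple[str, str]:
    """Return (decision, rule name) without calling any model."""
    for rule in RULES:
        if rule.matches(incident):
            return rule.decision, rule.name
    return DEFAULT_DECISION, "no_rule_matched"


def triage(
    incident: Incident,
    explain: Callable[[Incident, str, str], str],
) -> Dict[str, str]:
    """Decide with rules first, then hand the fixed decision to the explainer."""
    decision, rule_name = classify(incident)
    return {
        "decision": decision,
        "rule": rule_name,
        "explanation": explain(incident, decision, rule_name),
    }


def template_explainer(incident: Incident, decision: str, rule_name: str) -> str:
    # Placeholder for the model call: it only narrates a decision already made.
    return f"Decision '{decision}' was made by rule '{rule_name}'."
```

The point of the shape is that `explain` receives the decision as an input and cannot change it, so a slow or failed explanation never changes the outcome of the triage itself.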

The lesson isn't anti-model. It's that "model decides" and "model explains" are two different jobs, and conflating them was costing roughly thirty-nine times the latency (11 seconds against 280ms). The model is excellent at the second job. It was being asked to do the first one as a side effect.

Not every problem wants a model. Some want a rule. The hard part is knowing which is which on first read.