Reviewed an automotive incident triage prompt that was costing eleven seconds per decision on average. The prompt asked the model to both decide and explain, and the deciding was where the cost lived.
Pulled the deciding out of the model and into a twelve-rule classifier in front of it. The classifier decides; the model still writes the explanation. Latency landed at 280ms average, p99 at 410ms. Accuracy held within the original confidence interval across the four-thousand-case regression set.
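A minimal sketch of that split, to make the shape concrete. Everything named here is hypothetical: the `Incident` fields, the three example rules, the decision labels, and the `explain` callable standing in for the model call are illustrations, not the twelve rules from the actual system. The only point it shows is that the decision is made by an ordered first-match rule list, and the model is handed a decision that is already final.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Incident:
    # Hypothetical fields; the real triage inputs are not given in the post.
    airbag_deployed: bool
    injuries_reported: bool
    vehicle_drivable: bool
    description: str


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[Incident], bool]
    decision: str


# Ordered rule list, first match wins. Three illustrative rules only.
RULES = [
    Rule("injury", lambda i: i.injuries_reported, "dispatch_emergency"),
    Rule("airbag", lambda i: i.airbag_deployed, "dispatch_tow"),
    Rule("not_drivable", lambda i: not i.vehicle_drivable, "dispatch_tow"),
]
DEFAULT_DECISION = "schedule_inspection"


def classify(incident: Incident) -> Tuple[str, str]:
    """Return (decision, name of the rule that fired) without calling a model."""
    for rule in RULES:
        if rule.matches(incident):
            return rule.decision, rule.name
    return DEFAULT_DECISION, "default"


def triage(incident: Incident, explain: Callable[[str], str]) -> Dict[str, str]:
    """Decide with rules, then ask the model only to explain the decision."""
    decision, rule_name = classify(incident)
    prompt = (
        f"The decision for this incident is already final: {decision} "
        f"(rule: {rule_name}). Explain it to the customer in plain language.\n"
        f"Incident: {incident.description}"
    )
    return {"decision": decision, "rule": rule_name, "explanation": explain(prompt)}
```

In use, `explain` would wrap whatever model client is in place; a stub such as `lambda prompt: "stub"` is enough to run the regression set against `classify` alone, since the decision no longer depends on the model.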
The lesson isn't anti-model. It's that "model decides" and "model explains" are two different jobs, and conflating them was costing roughly forty times the latency (eleven seconds against 280ms). The model is excellent at the second job. It was being asked to do the first one as a side effect.
Not every problem wants a model. Some want a rule. The hard part is knowing which is which on first read.