Has anyone seen a model's cost swing 60x on the same task?

xiji2646-netizen · 11 June 2026 17:17

I was reading through a curated list of 60 real-world Claude Fable 5 cases (each logged with input, process, output, and an evidence tag), and one comparison stopped me cold.

Same Physarum simulation, two stacks:

GPT-5.5 on Codex: ~17 min, ~$6
Claude Fable 5 on Cursor: 40+ min, $360.55

Same task. Roughly 60x the cost. No benchmark table would ever surface that, because the bill is driven by the model × tool × task interaction, not raw capability.
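Quick back-of-envelope on the two runs. The figures are copied from above. I treat the "~" values as exact and "40+ min" as 40 min, so the per-minute rate for the Cursor run is only an upper bound:

```python
# Figures as quoted in the post. "~" values are treated as exact and
# "40+ min" as 40 min, so the Cursor per-minute rate is an upper bound.
CODEX_USD, CODEX_MIN = 6.00, 17
CURSOR_USD, CURSOR_MIN = 360.55, 40

cost_ratio = CURSOR_USD / CODEX_USD    # how much more the Cursor run cost
time_ratio = CURSOR_MIN / CODEX_MIN    # how much longer it ran
codex_rate = CODEX_USD / CODEX_MIN     # USD per minute
cursor_rate = CURSOR_USD / CURSOR_MIN  # USD per minute (upper bound)
rate_ratio = cursor_rate / codex_rate  # spend-rate gap (upper bound)

print(f"cost ratio: {cost_ratio:.1f}x")                      # cost ratio: 60.1x
print(f"time ratio: {time_ratio:.1f}x")                      # time ratio: 2.4x
print(f"USD/min: {codex_rate:.2f} vs <= {cursor_rate:.2f}")  # USD/min: 0.35 vs <= 9.01
print(f"spend-rate ratio: <= {rate_ratio:.1f}x")             # spend-rate ratio: <= 25.5x
```

If those numbers hold, the longer runtime explains only about 2.4x of the gap. The rest comes from spending up to ~25x more per minute. That fits the point that the bill depends on the model × tool × task combination, not on the unit price alone.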

The other thing I took from the list is the recurring “Relay” pattern people seem to converge on: plan and review with the expensive model, route bulk implementation to cheaper ones (4.8, GPT-5.5, Sonnet). Think expensive, build cheap, review expensive.
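For anyone who wants to reason about the relay numerically, here is a minimal cost-model sketch. Everything in it is an assumption for illustration: the model names (`expensive-planner`, `cheap-builder`), the per-million-token prices, and the per-phase token budgets are hypothetical placeholders, not real list prices or measured usage. It only shows how routing the high-volume "build" phase to a cheaper model changes the total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    usd_per_mtok_in: float   # USD per million input tokens (hypothetical)
    usd_per_mtok_out: float  # USD per million output tokens (hypothetical)


@dataclass(frozen=True)
class Phase:
    name: str
    tokens_in: int   # estimated input tokens for this phase (hypothetical)
    tokens_out: int  # estimated output tokens for this phase (hypothetical)


def phase_cost(model: Model, phase: Phase) -> float:
    """Cost of one phase in USD, given the model's per-million-token prices."""
    return (phase.tokens_in * model.usd_per_mtok_in
            + phase.tokens_out * model.usd_per_mtok_out) / 1_000_000


def plan_cost(routing: dict[str, Model], phases: list[Phase]) -> float:
    """Total cost when each phase is routed to the model named in `routing`."""
    return sum(phase_cost(routing[p.name], p) for p in phases)


# Hypothetical models and prices, not real list prices.
EXPENSIVE = Model("expensive-planner", usd_per_mtok_in=15.0, usd_per_mtok_out=75.0)
CHEAP = Model("cheap-builder", usd_per_mtok_in=3.0, usd_per_mtok_out=15.0)

# Hypothetical token budgets: implementation dominates the volume.
PHASES = [
    Phase("plan", tokens_in=200_000, tokens_out=20_000),
    Phase("build", tokens_in=5_000_000, tokens_out=500_000),
    Phase("review", tokens_in=400_000, tokens_out=20_000),
]

SINGLE = {"plan": EXPENSIVE, "build": EXPENSIVE, "review": EXPENSIVE}
RELAY = {"plan": EXPENSIVE, "build": CHEAP, "review": EXPENSIVE}

if __name__ == "__main__":
    print(f"single model: ${plan_cost(SINGLE, PHASES):.2f}")  # single model: $124.50
    print(f"relay:        ${plan_cost(RELAY, PHASES):.2f}")   # relay:        $34.50
```

The weak spot is that the sketch reuses the same token counts for every model. The 60x case above suggests that token volume itself swings with the model and tool. So the per-phase budgets would ideally be measured separately for each stack, for example with a small pilot run, before you trust the comparison.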

Curious what others have seen:

Have you hit a cost blowup like this on a real task? What was the model/tool combo?
Do you actually run a relay/routing setup, or just pick one model and eat the cost?
How do you estimate total cost before committing, given that unit price tells you almost nothing?

The full case list is here if useful:

andrea · 12 June 2026 01:34

Wow, that is costly.