Google shipped 3.5 Flash at I/O 2026. The “budget” Flash model now beats 3.1 Pro on coding and tool-calling benchmarks.
Key numbers (from Google):
- MCP Atlas (tool calling): 83.6% vs 3.1 Pro’s 78.2%
- Terminal-Bench (coding): 76.2% vs 70.3%
- Finance Agent v2: 57.9% vs 43.0%
- 4x faster, ~40% cheaper than Pro
- $1.50/M input, $9/M output, $0.15/M cached
Where it does NOT win:
- Computer Use: not supported (GPT-5.5 only)
- SWE-Bench Pro: Opus 4.7 still leads
- Abstract reasoning: 3.1 Pro still edges it
My quick take on model routing:
- Multi-tool agent loops → Flash
- Heavy code refactoring → Opus 4.7
- GUI automation → GPT-5.5
Anyone tested it on real agent workflows yet? Curious how the 4x speed claim holds up in practice.