I’ve been tracking this for the past two weeks and wanted to see if others are experiencing the same thing.
BridgeBench (an independent hallucination benchmark) now shows Opus 4.6 at #10 with a 33% fabrication rate, down from #2 with 83.3% accuracy just weeks ago. That's one in three responses containing fabricated information.
The root cause appears to be two default changes:

- Effort level default dropped from “high” to “medium” (March 3, 2026)
- Adaptive thinking introduced (Feb 9, 2026): under medium effort, some turns get zero reasoning tokens
An AMD exec analyzed 6,852 sessions and measured a 67% drop in reasoning depth. @om_patel5’s A/B test (same prompt on 4.6 and 4.5) showed 4.6 failing 5/5 runs while 4.5 passed all 5.
What’s working for me:

```shell
export CLAUDE_CODE_EFFORT_LEVEL=max
export CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1
```

Or just `/effort max` per session.
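If the env vars help, one way to make them stick across terminal sessions is appending them to your shell profile. A minimal sketch, assuming bash and the variable names quoted in this thread (I haven't verified they're officially documented; zsh users would target `~/.zshrc` instead):

```shell
# Append the overrides to ~/.bashrc so every new shell picks them up,
# skipping the append if they're already there
# (variable names taken from this thread, not verified against official docs)
profile="$HOME/.bashrc"
grep -q 'CLAUDE_CODE_EFFORT_LEVEL' "$profile" 2>/dev/null || {
  echo 'export CLAUDE_CODE_EFFORT_LEVEL=max' >> "$profile"
  echo 'export CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1' >> "$profile"
}
```

The `grep -q` guard keeps repeated runs from stacking duplicate lines in the profile.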
Some devs are switching back to Opus 4.5 entirely (`claude-opus-4-5-20251101`).
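For anyone trying the rollback, a sketch of pinning the model, using the model ID quoted above (I'm assuming the standard `ANTHROPIC_MODEL` env var / `--model` flag apply here; verify against your setup):

```shell
# Pin the older model for this shell (model ID from this thread)
export ANTHROPIC_MODEL=claude-opus-4-5-20251101

# or per invocation:
# claude --model claude-opus-4-5-20251101
```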
Curious: are you seeing the same patterns? Have the env vars helped? Anyone found other workarounds?
References: BridgeBench (bridgebench.ai/hallucination), GitHub Issue #42796