Claude Code Gets Fallback Models While the MCP Ecosystem Tackles Context Waste

Three releases, one open-source project, and a production writeup

Claude Code shipped v2.1.166 on June 5 with fallback model support and a security hardening pass on cross-session messaging. An open-source project called context-mode claims to cut agent context usage by 98% by sandboxing tool output. And Anthropic's self-hosted sandbox for Managed Agents started getting real production writeups from teams running agents inside their own network perimeter.

None of these are directly related. But together they point at the same pressure point: coding agents are running into the limits of their context windows, and the infrastructure around them is adjusting.

Fallback models and security fixes in v2.1.166

The headliner in v2.1.166 is fallbackModel, a new setting that lets you configure up to three fallback models tried in order when the primary model is overloaded or unavailable. Before this, a model outage meant your coding session stopped cold. Now you can set Sonnet as primary, Haiku as first fallback, and a third-party model as last resort.

The --fallback-model flag also now works in interactive sessions, not just headless mode. Claude Code retries a turn once on the fallback model when the API hits an unexpected non-retryable error. Auth errors, rate limits, and request-size errors still surface immediately, which is the right call. You want to know about rate limits.
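The try-in-order behavior can be pictured with a short Python sketch. This is a conceptual model only, not Claude Code's implementation: run_with_fallback, ModelError, and the error-kind labels are hypothetical names invented for illustration, and the real client may classify errors differently.

```python
from typing import Callable, Optional, Sequence

# Assumed labels for errors that should surface immediately instead of
# triggering a fallback (auth, rate limits, oversized requests).
SURFACE_IMMEDIATELY = {"auth", "rate_limit", "request_too_large"}


class ModelError(Exception):
    """Illustrative error type; `kind` is a hypothetical classification label."""

    def __init__(self, kind: str, message: str = "") -> None:
        super().__init__(message or kind)
        self.kind = kind


def run_with_fallback(
    call: Callable[[str, str], str],
    prompt: str,
    primary: str,
    fallbacks: Sequence[str] = (),
) -> str:
    """Try the primary model, then each fallback in the order given."""
    last_error: Optional[ModelError] = None
    for model in [primary, *fallbacks]:
        try:
            return call(model, prompt)
        except ModelError as err:
            if err.kind in SURFACE_IMMEDIATELY:
                raise  # the user needs to see these right away
            last_error = err  # e.g. overloaded or unavailable: try the next model
    assert last_error is not None  # the loop always runs at least once
    raise last_error
```

The point of the split is that only availability-style failures move down the chain; anything the user has to act on stops the loop at once.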

The security fix is specific and practical. Cross-session messages relayed via SendMessage from other Claude sessions no longer carry user authority. Receivers refuse relayed permission requests, and auto mode blocks them entirely. Before this fix, one agent session could potentially leverage another session's permissions through a relayed message.
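As a rough mental model of that rule, here is a minimal sketch in Python. The Message type, its fields, and decide_permission_request are all hypothetical and do not come from Claude Code; the only thing taken from the release notes is the idea that a relayed message never counts as the user speaking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Illustrative message record; the field names are assumptions."""

    text: str
    relayed: bool  # True if forwarded from another session, not typed by the user


def carries_user_authority(message: Message) -> bool:
    """Only messages that came directly from the user carry the user's authority."""
    return not message.relayed


def decide_permission_request(message: Message) -> str:
    """Return 'refuse' for relayed permission requests, 'ask_user' otherwise."""
    if not carries_user_authority(message):
        return "refuse"  # a relayed request cannot borrow another session's permissions
    return "ask_user"
```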

Other changes in v2.1.166:

Glob pattern support in deny rules (* denies all tools)
Thinking can now be disabled on models that think by default via MAX_THINKING_TOKENS=0 or --thinking disabled
claude update now shows the target version before downloading
Fixed remote sessions permanently stuck when a backend disruption hits during worker registration
Fixed JetBrains IDE terminal flickering on 2026.1+
Fixed orphaned claude --bg-pty-host processes spinning at 100% CPU after daemon death on macOS
Fixed background agent sessions in git worktrees crash-looping with "No conversation found"
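
The glob-style deny rule from the list above can be illustrated with Python's standard fnmatch module. This is a sketch of shell-style matching in general, assuming deny rules are compared against tool names; the is_denied helper is hypothetical, and Claude Code's actual matching semantics may differ.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def is_denied(tool_name: str, deny_patterns: Iterable[str]) -> bool:
    """True if any deny pattern matches the tool name (case-sensitive globbing)."""
    return any(fnmatchcase(tool_name, pattern) for pattern in deny_patterns)


# A bare "*" matches every tool name, so it denies everything.
print(is_denied("Bash", ["*"]))          # True
# A prefix pattern only denies tools whose names start with that prefix.
print(is_denied("Read", ["mcp__*"]))     # False
```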

v2.1.167 and v2.1.168 followed as bug-fix releases. Three releases in 24 hours (June 5-6) suggest that the v2.1.166 feature drop introduced regressions that needed immediate patching.

The context problem

Your context window is the most expensive resource you have when running coding agents, and most of it is spent on raw tool output that the model never needed to see.

When Claude Code runs a bash command and gets 300 lines of output, all 300 go into context. When it reads a file, the entire file goes in. When it searches a codebase, the full grep results land in context. Most of this is noise the model skims past.

[Figure: context optimization flow diagram]

A project called context-mode caught attention in MCP circles this week for tackling exactly this. It is an open-source MCP server that intercepts tool output before it reaches the agent, compresses it, and stores the raw version in a SQLite database with FTS5/BM25 indexing. The agent gets a compressed summary instead of the full output, and can retrieve specific details from storage when it actually needs them.

The claimed compression is aggressive: 315 KB of raw tool output compressed to 5.4 KB, a reduction of about 98%. It works by running tools in a sandbox that captures their output, stores the raw version, and sends only a processed summary to the agent's context window.
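
The store-then-summarize pattern can be sketched with Python's built-in sqlite3 module. This is not context-mode's code: the table layout, the record and search helpers, and the "first few lines" summary are assumptions made for illustration, and the sketch needs an SQLite build with FTS5 enabled.

```python
import sqlite3
from typing import List, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a store backed by an FTS5 table (requires FTS5 in the SQLite build)."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS outputs USING fts5(tool, body)")
    return conn


def record(conn: sqlite3.Connection, tool: str, raw: str, keep_lines: int = 5) -> str:
    """Store the raw output and return a short stand-in for the agent's context."""
    next_id = conn.execute("SELECT COALESCE(MAX(rowid), 0) + 1 FROM outputs").fetchone()[0]
    conn.execute(
        "INSERT INTO outputs (rowid, tool, body) VALUES (?, ?, ?)", (next_id, tool, raw)
    )
    lines = raw.splitlines()
    head = "\n".join(lines[:keep_lines])  # naive summary: just the first few lines
    return f"[output #{next_id}: {len(lines)} lines, {len(raw)} chars]\n{head}"


def search(conn: sqlite3.Connection, query: str, limit: int = 3) -> List[Tuple[int, str]]:
    """Return (id, snippet) pairs ranked by BM25; lower scores are better matches.

    `query` uses FTS5 query syntax, so plain words are the safe case.
    """
    rows = conn.execute(
        "SELECT rowid, snippet(outputs, 1, '[', ']', '...', 12) "
        "FROM outputs WHERE outputs MATCH ? ORDER BY bm25(outputs) LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()
```

The agent would see only the string returned by record, then call something like search(conn, "timeout") when it needs a detail from the stored output. A real summarizer would have to be smarter than taking the first few lines, which is where most of the difficulty lies.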

Whether the 98% figure holds in real workloads is an open question. But the direction is right. Context window waste is the main cost driver for anyone running agents on tasks longer than a quick fix.

Self-hosted sandboxes in practice

The third thread: Anthropic's self-hosted sandbox for Managed Agents, announced at Code with Claude London on May 19, is accumulating production patterns. The feature lets teams run Claude agent tool execution inside their own infrastructure while keeping orchestration on Anthropic's side.

A detailed writeup from Digital Applied documented seven deployment patterns with honest maturity ratings. The detail most launch coverage skipped: Anthropic keeps the agent loop (context management, error handling, orchestration) on its own servers. What moves to your infrastructure is the tool execution layer. That split works for compliance teams who need data locality. It also means you are still dependent on Anthropic's control plane being available.

The Cloudflare integration is the easiest path: Workers-based sandbox, Browser Run for agent-driven browsers, and agent inboxes for async communication. For teams already on Cloudflare, setup takes a few minutes. For everyone else, the self-hosted path requires container orchestration and network configuration that the documentation does not fully cover yet.

What changed this week

If you run Claude Code in CI, configure fallback models. A single-model dependency is a production risk that v2.1.166 addresses directly.

Context optimization tooling is moving. The context-mode MCP server is one approach. Anthropic's own memory stores (mounted as filesystem directories at /mnt/memory/) are another. Both attack the same problem from different angles.

Self-hosted tool execution is real for compliance use cases. Full agent sovereignty is not here yet. The orchestration layer still lives with Anthropic, and that dependency does not go away.

Sources

Claude Code v2.1.166 release notes
context-mode MCP server (agentry.press coverage)
Anthropic self-hosted sandbox: 7 production patterns
Anthropic adds sandbox, MCP tunnel features (Yahoo Tech)
Claude Managed Agents on Cloudflare