Developer Tools & Claude Code · Jun 1, 2026 · 5 min read

Coding Agents Don't Just Assist Anymore. They Orchestrate.

Claude Code and OpenAI Codex both shipped multi-agent systems within 48 hours of each other. What changed, what it costs, and what it means for teams evaluating AI tooling.

By SpringVanta

Two things landed this week that I did not expect to land at the same time.

On May 28, Anthropic shipped dynamic workflows in Claude Code alongside Opus 4.8. Two days later, OpenAI released a massive Codex update that adds computer use, persistent memory, and 90+ plugins to what used to be a terminal coding assistant. The two companies did not coordinate. Both made the same bet: the coding agent is no longer a pair programmer. It is an engineering team.

What Claude Code dynamic workflows actually do

The headline is "hundreds of subagents in parallel," but the real change is where the plan lives.

Before dynamic workflows, Claude Code handled multi-step tasks by dispatching subagents turn by turn. Every intermediate result (every file read, every test output) accumulated in Claude's context window. The context window was the ceiling. Once it filled up, the task choked.

Dynamic workflows move the plan into code. Claude writes a JavaScript orchestration script from your natural-language prompt. A separate runtime executes that script in the background. The script holds the loops, the branching, the intermediate results in its own variables. Claude's context window only ever sees the final, verified answer.
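To make "the plan lives in code" concrete, here is a minimal sketch of the idea. It is written in Python for illustration, whereas the scripts Claude generates are JavaScript. The names `run_subagent`, `SubagentResult`, and `orchestrate`, along with the task strings, are hypothetical stand-ins and not part of any Claude Code API. The point is only that the loop, the branching, and the verbose intermediate output stay in the script's variables, and a short summary is all that comes back.

```python
from dataclasses import dataclass


@dataclass
class SubagentResult:
    task: str
    ok: bool
    log: str  # verbose output that stays inside the script


def run_subagent(task: str, attempt: int) -> SubagentResult:
    """Hypothetical stand-in for a subagent call; one task fails on its first attempt."""
    ok = not (task == "migrate billing.py" and attempt == 0)
    return SubagentResult(task, ok, log=f"{task}: attempt {attempt} " + "x" * 500)


def orchestrate(tasks: list[str], max_attempts: int = 2) -> str:
    results: dict[str, SubagentResult] = {}  # intermediate state lives here
    for task in tasks:  # the loop is in code, not in the model's turns
        for attempt in range(max_attempts):
            result = run_subagent(task, attempt)
            results[task] = result
            if result.ok:  # branch on an intermediate result
                break
    failed = [t for t, r in results.items() if not r.ok]
    # Only this short summary would be handed back to the model.
    return f"{len(results) - len(failed)}/{len(results)} tasks passed; failed: {failed}"


summary = orchestrate(["migrate auth.py", "migrate billing.py", "migrate api.py"])
print(summary)  # 3/3 tasks passed; failed: []
```

The 500-character logs never leave `orchestrate`; the caller only receives the one-line summary, which is the property the paragraph above describes.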

[Figure: How Claude Code dynamic workflows work (orchestration loop diagram)]

The limits: up to 16 agents running concurrently and 1,000 agents in total per run. The workflow script itself cannot touch the filesystem or shell. Only the spawned subagents read, write, and execute commands. Progress is saved as the run proceeds, so an interrupted run picks up where it left off.
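The limits and the resume behavior can be pictured with a small sketch. This is an assumption about how such a runtime could be structured, not Anthropic's implementation: `run_agent`, `run_workflow`, the in-memory `checkpoint` dict, and the `stats` counters are all hypothetical. Only the two numbers (16 and 1,000) come from the text above.

```python
import asyncio

MAX_CONCURRENT = 16  # concurrency limit quoted in the article
MAX_TOTAL = 1000     # per-run agent cap quoted in the article


async def run_agent(task_id: int, stats: dict) -> str:
    """Hypothetical subagent; records how many agents are active at once."""
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0)  # yield control so other agents can start
    stats["active"] -= 1
    return f"result-{task_id}"


async def run_workflow(task_ids: list[int], checkpoint: dict, stats: dict) -> dict:
    if len(task_ids) > MAX_TOTAL:
        raise ValueError(f"run would exceed {MAX_TOTAL} agents")
    gate = asyncio.Semaphore(MAX_CONCURRENT)

    async def guarded(task_id: int) -> None:
        if task_id in checkpoint:  # finished before the interruption: skip
            return
        async with gate:  # at most MAX_CONCURRENT agents pass this point
            checkpoint[task_id] = await run_agent(task_id, stats)
            stats["executed"] += 1

    await asyncio.gather(*(guarded(t) for t in task_ids))
    return checkpoint


stats = {"active": 0, "peak": 0, "executed": 0}
# Pretend 40 of 100 tasks had already finished when the run was interrupted.
checkpoint = {i: f"result-{i}" for i in range(40)}
done = asyncio.run(run_workflow(list(range(100)), checkpoint, stats))
print(len(done), stats["executed"], stats["peak"])
```

On resume, the 40 checkpointed tasks are skipped, so only 60 agents execute, and the semaphore keeps the number of simultaneously active agents at or below 16. A real runtime would persist the checkpoint to durable storage rather than a dict.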

There are two ways to trigger it. You can ask Claude to "create a workflow" explicitly. Or you can turn on a new setting called ultracode, which sets effort to the maximum level and lets Claude decide on its own when a task needs a workflow. Both are available in Claude Code v2.1.154 and later, on Max, Team, and Enterprise plans.

The Bun port: why this is not a demo

Jarred Sumner, the creator of the Bun JavaScript runtime (which Anthropic acquired in December 2025), used dynamic workflows to port Bun from Zig to Rust. The numbers: roughly 750,000 lines of new Rust code, 2,188 files touched, 99.8% of the existing test suite passing on Linux x64. From first commit to merge: 11 days.

The process was deliberately mechanical. One workflow mapped Rust lifetimes for every struct field. Another spun up hundreds of agents to write the .rs files, with two reviewer agents per file. A final fix loop ran the build and test suite until everything passed.
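The shape of that process (write, review twice, then loop on the test suite) can be sketched as follows. This is a toy illustration of the pattern, not the workflow Sumner ran: `write_file`, `review`, `port_files`, `fix_loop`, the file names, and the test names are all hypothetical, and the "agents" are trivial stubs.

```python
from typing import Callable


def write_file(name: str) -> str:
    """Hypothetical writer agent: returns a draft translation of one file."""
    return f"// draft port of {name}"


def review(draft: str, reviewer_id: int) -> bool:
    """Hypothetical reviewer agent: approves any non-empty draft."""
    return bool(draft.strip())


def port_files(names: list[str], reviewers_per_file: int = 2) -> dict[str, str]:
    ported = {}
    for name in names:
        draft = write_file(name)
        # Every reviewer has to approve before the file is accepted.
        if all(review(draft, r) for r in range(reviewers_per_file)):
            ported[name] = draft
    return ported


def fix_loop(
    run_tests: Callable[[], list[str]],
    fix: Callable[[str], None],
    max_rounds: int = 10,
) -> int:
    """Run the suite, fix each failure, repeat until green or out of rounds."""
    for round_no in range(1, max_rounds + 1):
        failures = run_tests()
        if not failures:
            return round_no  # the round on which the suite came back clean
        for failure in failures:
            fix(failure)
    raise RuntimeError("tests still failing after max_rounds")


# Toy stand-ins: two tests fail at first, and each fix removes one failure.
broken = {"test_fs", "test_http"}
ported = port_files(["fs.zig", "http.zig"])
rounds = fix_loop(lambda: sorted(broken), broken.discard)
print(len(ported), rounds)  # 2 2
```

The `max_rounds` bound is a design choice worth keeping in any real version of this loop: without it, a failure the agents cannot fix would consume tokens indefinitely.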

This was not idiomatic Rust. Sumner has said publicly that the goal was a working port that passes the existing tests, not a beautiful rewrite. But 750K lines of working code in 11 days is a data point that changes what people think is possible with agent-assisted engineering.

Codex: the same week, the opposite direction

While Anthropic was building orchestration depth (more agents coordinating on one task), OpenAI was building breadth. The May 30 Codex update is less about agent coordination and more about turning a coding assistant into a full developer workstation.

The headline feature is background computer use. Codex can now see your screen, move your cursor, click, and type inside macOS apps while you work in other windows. Multiple agents run in parallel without blocking each other. As of v0.135.0, remote control also supports Windows devices. You can start a Codex job on a Windows machine from ChatGPT on your phone and monitor it remotely.

Other additions: an in-app browser for viewing local dev servers and leaving visual comments, inline image generation with gpt-image-2, persistent memory across sessions, and 90+ plugins. The CLI got Goal Mode, which lets you describe an outcome that Codex then works toward autonomously for hours. The user count: 3 million weekly developer users, with ChatGPT Business/Enterprise Codex usage growing 6x between January and April 2026.

The framing from Codex lead Thibault Sottiaux in an April media briefing: they are "building the super app out in the open and evolving it out of Codex."

Why these two announcements in the same week matter

Anthropic and OpenAI are approaching the same problem from opposite directions. Anthropic is going deep: one model, many coordinated agents, a single verified answer. OpenAI is going wide: one agent, many capabilities (screen, browser, images, plugins, memory), a single workstation that handles everything.

Both have real costs. Anthropic warns that dynamic workflows "can consume substantially more tokens than a typical Claude Code session." One early tester reported that a single workflow spin-up burned through an order of magnitude more tokens than a normal session. Codex's breadth comes with complexity: 90+ plugins means 90+ surfaces to secure, and computer use means giving an AI agent direct access to your desktop.

For teams evaluating these tools right now, the question is not which one is better. It is how much autonomy you are comfortable giving a coding agent, and on what kind of task. Claude Code dynamic workflows are strongest for large, well-defined projects (migrations, audits, codebase-wide refactors) where you can describe the scope precisely. Codex with computer use is strongest for open-ended development where the agent needs to interact with graphical tools, browsers, and multiple applications.

What this means if you are not a developer

If you run a business that is evaluating AI automation, these updates are worth paying attention to even if you never open a terminal. The pattern is the same one playing out across voice AI, CRM, and intake automation: the tool is becoming an agent, and the agent is becoming a team.

Six months ago, coding agents were autocomplete on steroids. Now Claude Code can spin up hundreds of agents to rewrite a runtime in 11 days, and Codex can operate your desktop while you watch. The gap between what these tools could do last quarter and what they can do this quarter is wider than any single model upgrade would suggest.

The practical implication: if you are building or buying AI tools for your business, evaluate the orchestration layer, not just the model. How does the tool handle multi-step tasks? How does it verify its own output? How does it recover from errors? The models will keep getting better. The orchestration is where the real reliability gains are coming from.
