AI Agent Operations · May 27, 2026 · 5 min read

Coding agents got cheap enough for every team last week

Cursor Composer 2.5 matches frontier models at 1/10th the cost. Spotify's most senior engineers code exclusively through AI. Codex now controls entire desktops. Three signals, one conclusion.

By SpringVanta

Three numbers landed in the same seven-day window, and together they change the math for any team thinking about AI coding tools.

Cursor shipped Composer 2.5 on May 18. It scores 79.8% on SWE-Bench Multilingual, matching Claude Opus 4.7 and GPT-5.5. The cost: $0.50 per million input tokens and $2.50 per million output tokens at standard tier. That is roughly one-tenth of what Anthropic and OpenAI charge for their frontier models.

That same week, Spotify's chief architect took the stage at Anthropic's developer conference. He said 99% of Spotify's engineers now use AI coding tools voluntarily. The company's internal platform, Honk, built on Claude Code and the Claude Agent SDK, has merged over 1,500 pull requests. Senior engineers at Spotify have not written a line of code manually since December 2025.

Then on May 24, Tech Times reported that OpenAI Codex has crossed from sandboxed code runner to full desktop agent. The tool can now control Mac applications with its own cursor, capture screenshots to build ambient memory, and run tasks on a schedule. It ships with 90+ plugins for tools like Jira, CircleCI, and Microsoft Suite. Codex is no longer a coding tool. It is a workstation-level agent.

Any one of these would be a product update. All three in one week is something else.

[Figure: SWE-Bench Multilingual benchmark comparison showing Composer 2.5 matching frontier models at a fraction of the cost]

The cost crossover

The Composer 2.5 benchmark story has a wrinkle worth knowing. The model is built on Kimi K2.5, an open-source model from Moonshot AI, a Beijing-based lab. Cursor initially did not disclose the base model. After community pressure, co-founder Aman Sanger admitted it was "a miss to not mention the Kimi base in our blog from the start."

The disclosure matters less than the result. Cursor's training stack (a sharded Muon optimizer, 25x more synthetic training tasks than Composer 2, and targeted RL feedback at specific failure points in long coding sessions) turned an open-source base into a model that trades blows with the most expensive proprietary models on the market.

For teams running long agent sessions, the math is concrete. A 30-minute multi-file refactor that costs $30 on Opus 4.7 costs roughly $3 on Composer 2.5 at standard tier pricing. Cursor also offers a "Fast" tier at $3/$15 per million tokens, where the savings narrow but still favor Composer over Anthropic and OpenAI.
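To make that arithmetic checkable, here is a minimal sketch of the per-session cost calculation. The two Composer 2.5 price points are the ones quoted above. The session size (4 million input tokens, 400,000 output tokens) is a hypothetical workload chosen so the standard-tier total lands on the $3 figure, and the "implied frontier" tier is simply 10x the standard tier, derived from the one-tenth ratio rather than from any provider's published price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    input_per_mtok: float   # USD per million input tokens
    output_per_mtok: float  # USD per million output tokens

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """Return the session cost in USD for the given token counts."""
        total = (input_tokens * self.input_per_mtok
                 + output_tokens * self.output_per_mtok)
        return total / 1_000_000


# Prices quoted in the article.
STANDARD = Tier("Composer 2.5 standard", 0.50, 2.50)
FAST = Tier("Composer 2.5 fast", 3.00, 15.00)

# Assumption: derived from the "one-tenth" ratio, not a quoted price.
IMPLIED_FRONTIER = Tier(
    "frontier (implied by 10x ratio)",
    STANDARD.input_per_mtok * 10,
    STANDARD.output_per_mtok * 10,
)

# Assumption: a hypothetical long agent session, not a measured workload.
SESSION_INPUT_TOKENS = 4_000_000
SESSION_OUTPUT_TOKENS = 400_000

standard_cost = STANDARD.cost(SESSION_INPUT_TOKENS, SESSION_OUTPUT_TOKENS)
fast_cost = FAST.cost(SESSION_INPUT_TOKENS, SESSION_OUTPUT_TOKENS)
implied_frontier_cost = IMPLIED_FRONTIER.cost(
    SESSION_INPUT_TOKENS, SESSION_OUTPUT_TOKENS
)

for tier, cost in [
    (STANDARD, standard_cost),
    (FAST, fast_cost),
    (IMPLIED_FRONTIER, implied_frontier_cost),
]:
    print(f"{tier.name}: ${cost:.2f}")
# Composer 2.5 standard: $3.00
# Composer 2.5 fast: $18.00
# frontier (implied by 10x ratio): $30.00
```

Under these assumptions the Fast tier costs six times the standard tier for the same session, which is why the savings narrow there. Swap in your own token counts from a real session log before drawing conclusions about your bill.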

There is a catch. Composer 2.5 only runs inside Cursor. You cannot call it from your own infrastructure or wire it into a custom agent pipeline. If your agent workflow lives outside the Cursor IDE, you are still paying frontier prices.

What Spotify actually proved

The Spotify number that matters is not the "99% adoption" headline. It is the 1,500 merged PRs on an internal platform built with Claude Code and the Claude Agent SDK.

That is production proof, not survey data. Spotify's Honk platform uses MCP-based connectors and fleet management infrastructure to run coding agents at scale across hundreds of engineers. The constraint, according to Spotify's chief architect, has moved from writing code to orchestrating and reviewing it.

For a company shipping 4,500 deployments per day, the shift is not a pilot program. It is operational. The engineers did not stop thinking. They stopped typing.

The original revelation came during Spotify's Q4 2025 earnings call in February 2026, when co-CEO Gustav Soderstrom told analysts that his most senior engineers "have not written a single line of code since December." The Anthropic stage appearance in May 2026 added the Honk platform details and the 99% voluntary adoption figure.

Codex left the sandbox

The OpenAI Codex update gets less attention than the model benchmarks, but it may matter more for businesses that do not employ software engineers.

Computer use means Codex can now click through any application on your Mac, not just tools with APIs. If your CRM lacks an API endpoint for "update this contact's status," Codex can open the CRM, find the contact, and change the status by clicking the same buttons a human would. It can do this in the background while you keep working.

Chronicle, the screen-capture memory feature, has drawn security commentary of its own. Codex periodically captures screenshots, extracts text via OCR, and summarizes them into Markdown memories stored on the user's device. When you later ask Codex to "continue what I was working on yesterday," it reads those stored memories to resolve the reference. Security researchers have raised questions about what happens to those screenshots and how the memory files are protected.
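The store-then-recall pattern described here can be sketched in a few lines. This is an illustration of the general idea only, not OpenAI's implementation: the function names `append_memory` and `recall`, the one-file-per-day layout, and the bullet format are all hypothetical, and the screenshot capture, OCR, and summarization steps are assumed to have already produced a short text summary.

```python
from datetime import date, datetime, timedelta
from pathlib import Path
import tempfile


def append_memory(memory_dir: Path, captured_at: datetime, summary: str) -> Path:
    """Append one summarized capture to that day's Markdown file (hypothetical layout)."""
    memory_dir.mkdir(parents=True, exist_ok=True)
    path = memory_dir / f"{captured_at.date().isoformat()}.md"
    with path.open("a", encoding="utf-8") as f:
        f.write(f"- {captured_at:%H:%M} {summary}\n")
    return path


def recall(memory_dir: Path, today: date, days_ago: int = 1) -> list[str]:
    """Resolve a relative reference such as "yesterday" to stored memory entries."""
    target = today - timedelta(days=days_ago)
    path = memory_dir / f"{target.isoformat()}.md"
    if not path.exists():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    # Keep only bullet entries and strip the leading "- " marker.
    return [line[2:] for line in lines if line.startswith("- ")]


# Demo in a throwaway directory; the summaries are made-up examples.
demo_dir = Path(tempfile.mkdtemp()) / "memories"
append_memory(demo_dir, datetime(2026, 5, 26, 14, 5), "Edited Q2 pricing sheet")
append_memory(demo_dir, datetime(2026, 5, 26, 16, 40), "Reviewed open pull request")

yesterday_notes = recall(demo_dir, today=date(2026, 5, 27))
print(yesterday_notes)
# ['14:05 Edited Q2 pricing sheet', '16:40 Reviewed open pull request']
```

Note that in this sketch the memories are plain text files that anything with disk access can read. Whether the real feature adds encryption or access controls is exactly the kind of question the researchers are asking.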

The plugin catalog, 90+ tools including Jira, GitLab Issues, and Microsoft Suite, means Codex can pull context from your existing stack without custom integration work. This is the MCP standard in practice: connect once, use everywhere.

What this means for smaller teams

If you run a 10-person company and you are evaluating whether AI coding tools are worth the subscription, last week gave you data points that were not available before.

The price floor dropped by roughly 10x. Composer 2.5 is not available outside Cursor, but it puts pricing pressure on every other provider. When a specialized model matches frontier benchmarks at a fraction of the cost, the frontier providers have to respond.

The trust question has a real answer now. Spotify is not a startup running a pilot. It is a public company shipping thousands of deployments daily, and its senior engineers have been working exclusively through AI agents for five months. The code gets reviewed. The PRs get merged. The system runs.

The scope of what AI agents can touch just expanded from code in a sandbox to anything on your desktop. If your workflow involves clicking through tools that lack APIs, the agent ecosystem now has an answer for that.

The practical move for most teams: start with the cheapest option that covers your actual workload. For code-heavy work inside an IDE, Cursor Pro at $20/month with Composer 2.5 is hard to beat on price. For terminal-first or multi-tool workflows, Claude Code with Opus 4.7 costs more but runs anywhere. For teams that want agents to operate existing software without writing integration code, Codex desktop with computer use is the new option.

Sources: Cursor Composer 2.5 announcement, TechCrunch on Spotify AI adoption, Tech Times on Codex desktop agent, Turion.ai economic analysis, ChatForest benchmark comparison, The Decoder on Kimi K2.5 disclosure
