Open Weights Hit Frontier Coding: Agent Market Widens

Three new entrants in one week

MiniMax released M3 on May 31, the first open-weights model to combine frontier coding performance with a 1M token context window and native multimodality. xAI opened Grok Build to all SuperGrok subscribers. Mistral rebranded Le Chat into Vibe and shipped a VS Code coding extension. Three competitors, three different starting positions, same week.

If your team is evaluating coding agents, the field just got wider. The market is no longer a choice between Claude Code and Codex with everything else as a curiosity.

MiniMax M3: open weights that actually compete

MiniMax M3 scored 59.0% on SWE-Bench Pro. That is approaching Claude Opus 4.7's numbers and surpassing GPT-5.5 and Gemini 3.1 Pro on the same benchmark. It handles 1 million tokens of context through a new sparse attention architecture the company calls MSA. Text, images, and video are native inputs. The model is available now through the MiniMax API and a new MiniMax Code platform. Open weights and a technical report are coming in roughly ten days.

59% is not the top score on the board. But it is close enough that the gap stops deciding things for most teams. When an open-weights model sits within striking distance of the best closed-source options, the question shifts from "which model writes better code?" to "which model can I run where I need it, under the constraints I actually have?"

Figure: SWE-Bench Pro scores comparing Grok Build, Opus 4.7, MiniMax M3, GPT-5.5, and Gemini 3.1 Pro

Vercel added M3 to its AI Gateway within 24 hours. That speed of adoption tells you more than the benchmark number. When the infrastructure layer picks up a model that fast, developers are asking for it.

Grok Build: xAI goes terminal-native

xAI published the Grok Build CLI on May 25, opening it to all SuperGrok and X Premium Plus subscribers. The tool runs up to eight subagents in parallel, each isolated in its own Git worktree. It is built on grok-build-0.1A, a model xAI trained specifically for agentic coding. Early third-party writeups report 70.8% on SWE-bench Verified.

Eight parallel agents with worktree isolation is the most aggressive default parallel execution design anyone in the category has shipped. Claude Code's dynamic workflows scale higher in raw agent count (up to 1,000), but Grok Build's default of parallel worktrees feels closer to how a senior engineer splits work: give each agent its own branch so none of them steps on another.
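To make the worktree idea concrete, here is a minimal sketch of how a harness could give each agent its own branch and checkout using the standard `git worktree add` command. This is not Grok Build's implementation, which xAI has not documented here. The function names, the `agent/<n>-<task>` branch scheme, and the sibling-directory layout are all assumptions made for illustration. Only the cap of eight comes from the article.

```python
import subprocess
from pathlib import Path

# Parallel-subagent ceiling reported for Grok Build; everything else is hypothetical.
MAX_AGENTS = 8


def plan_worktrees(repo: Path, tasks: list[str], base: str = "HEAD") -> list[dict]:
    """Build one isolated worktree plan per task without touching the repo."""
    if len(tasks) > MAX_AGENTS:
        raise ValueError(f"at most {MAX_AGENTS} parallel agents, got {len(tasks)}")
    plans = []
    for i, task in enumerate(tasks):
        branch = f"agent/{i}-{task}"                   # hypothetical naming scheme
        path = repo.parent / f"{repo.name}-agent-{i}"  # sibling directory per agent
        plans.append({
            "task": task,
            "branch": branch,
            "path": path,
            # `git worktree add -b <branch> <path> <commit-ish>` creates a new
            # branch and checks it out in a separate working directory.
            "cmd": ["git", "-C", str(repo), "worktree", "add",
                    "-b", branch, str(path), base],
        })
    return plans


def create_worktrees(plans: list[dict]) -> None:
    """Run the planned git commands; raises CalledProcessError if any fails."""
    for plan in plans:
        subprocess.run(plan["cmd"], check=True)
```

Calling `plan_worktrees(Path("/tmp/demo"), ["fix-auth", "add-tests"])` yields two plans with separate branches and directories, so two agents can edit files concurrently without sharing an index or a working tree. Merging the resulting branches back is a separate step that this sketch leaves out.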

The catch: xAI's compliance and enterprise paperwork is thinner than what Anthropic, OpenAI, or Google can offer. Grok Build calls itself local-first, with source code and credentials staying on the machine, but local execution is not local inference. The model call still goes to xAI's servers. For teams in regulated industries, the marketing says the right things but the audit trail is not there yet.

xAI also has a captive distribution channel through X Premium Plus that none of the other players can match. Whether that translates into sustained developer adoption past week one is the open question.

Mistral Vibe: work agent meets code agent

Mistral rebranded Le Chat as Vibe and split it into two modes. Work Mode handles long-running multi-step tasks across enterprise connectors (Google Workspace, Outlook, Slack, SharePoint). Code Mode launches remote coding agents from a dedicated web surface. A new VS Code extension connects the coding agent to the developer's project inside the IDE.

The positioning is distinct from what Claude Code, Codex, Cursor, or Antigravity are doing. Mistral is not trying to win on terminal-native coding or IDE-first experience. It is building one agent that moves between work tasks and code tasks without switching surfaces: the same agent catches up on email, drafts the board deck, and then implements the feature you discussed in that email. Whether that breadth beats depth is an open question. It is a coherent product thesis, but Mistral's coding performance is not at the frontier, and the company is candid about this. The pitch is about unified workflow, not benchmark scores. That is honest, but it may limit adoption among developers who pick tools based on code quality first.

Codex 0.136.0: the infrastructure buildout

OpenAI shipped Codex v0.136.0 on June 1 with session archiving, app-server integrations for resuming threads, a Python SDK in beta, and a batch of command-safety hardening fixes. None of these are flashy. Session archiving protects important sessions from accidental resume or fork operations. The Python SDK opens Codex to programmatic use beyond the TUI. The command-safety fixes close specific attack vectors around Git hooks, PowerShell parser execution, and browser-origin websocket handshakes.

This is the boring work that matters. The phase where Codex needed headline features to stay in the conversation is mostly over. The current cadence is about making the tool safe enough for teams that run it all day. The 4 million weekly developers OpenAI reported in late May give it the distribution advantage. The buildout is about retention, not acquisition.

Where this leaves you

The coding agent market looked like a two-horse race between Claude Code and Codex through Q1 2026. Cursor held the model-agnostic IDE niche. Antigravity was the Google Cloud bet. Everything else was interesting but not competitive.

That framing needs updating. MiniMax M3 puts open-weights coding performance in the same neighborhood as the closed-source leaders. Grok Build brings a credible terminal-native agent backed by xAI's model training and X's distribution. Mistral Vibe connects coding to enterprise workflows in a way none of the other tools attempt.

You have more viable options than you did two weeks ago. The differences between them are less about raw model quality and more about how each tool fits your existing workflow. Teams running Claude Code because it was the only terminal agent worth using can now evaluate Grok Build on architecture and pricing. Teams that wanted open weights for compliance reasons have MiniMax M3 approaching frontier performance. Teams that wanted one agent for email and code have Mistral Vibe.

The convergence Janakiram MSV documented in The New Stack on June 1, where all four major coding agents settled on the same blueprint, is real. But convergence at the blueprint level has not produced convergence at the product level. The tools differ on harness design, pricing structure, and workflow integration. The new entrants widen those differences.

The cost question shifted too. MiniMax is offering 50% off standard usage for the first week, and once the open weights ship, self-hosting removes the per-token API cost, though not the hardware cost. Grok Build ships with a SuperGrok subscription that includes model calls. Mistral bundles coding into its existing Vibe license. When the model stops being the differentiator, pricing and deployment flexibility become the decision axes.

If you are running Claude Code on a Max plan, the value proposition is still strong for large codebase work where deep reasoning and approval gates matter. If cost per accepted change is your primary metric, MiniMax M3 and Grok Build are worth testing this month. If you want one agent for operational work and code, Mistral Vibe is the only option trying that combination today.
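If cost per accepted change is the metric you plan to test against, it helps to pin down the arithmetic before the trial starts. The sketch below assumes one simple definition: total spend divided by the number of agent-proposed changes your team actually merged. The `AgentRun` type, the field names, and every number in the usage note are hypothetical placeholders, not measurements of any tool named in this article.

```python
from dataclasses import dataclass


@dataclass
class AgentRun:
    """Totals from one evaluation period for one tool (hypothetical schema)."""
    name: str
    total_cost_usd: float   # subscription share plus any metered usage
    changes_proposed: int   # changes the agent opened for review
    changes_accepted: int   # changes a reviewer merged


def cost_per_accepted_change(run: AgentRun) -> float:
    """Spend divided by merged changes; infinite if nothing was accepted."""
    if run.changes_accepted == 0:
        return float("inf")
    return run.total_cost_usd / run.changes_accepted


def acceptance_rate(run: AgentRun) -> float:
    """Share of proposed changes that were merged."""
    if run.changes_proposed == 0:
        return 0.0
    return run.changes_accepted / run.changes_proposed


def rank_by_cost(runs: list[AgentRun]) -> list[AgentRun]:
    """Cheapest cost per accepted change first."""
    return sorted(runs, key=cost_per_accepted_change)
```

With placeholder inputs, `AgentRun("agent_a", 120.0, 80, 60)` works out to 2.0 dollars per accepted change and `AgentRun("agent_b", 90.0, 70, 30)` to 3.0, so the tool with the lower bill is the more expensive one on this metric. Run each candidate on the same backlog for the same period, or the comparison says more about the tasks than the tools.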

Sources:

MiniMax M3 announcement (May 31, 2026)
xAI Grok Build CLI (May 25, 2026)
Mistral Vibe announcement (May 28, 2026)
OpenAI Codex v0.136.0 (June 1, 2026)
Claude Code vs Cursor vs Codex vs Antigravity (The New Stack, June 1, 2026)
MiniMax M3 on Vercel AI Gateway (May 31, 2026)
