Six AI Agent Flaws in 48 Hours Share One Root Cause
Five CVEs and one behavioral exploit hit the AI agent ecosystem in 48 hours. They all share the same broken trust boundary: instructions are treated as data but executed as actions.
By SpringVanta
Five independent security disclosures hit the AI agent ecosystem between June 23 and June 25, 2026. They look unrelated on the surface: a malicious skill that bypassed three security scanners, a cross-tenant data leak in a platform used by Volvo and Maersk, a CVSS 9.9 command injection in an MCP server, a puzzle game that tricked six AI browsers into handing over credentials, and a critical RCE exploited within 20 hours of going public.
They share one root cause. Every one of these platforms treats instructions the agent reads as passive data. But when an AI agent encounters those instructions, it acts on them. The trust boundary between reading and doing does not exist in any of these systems.
If your organization is deploying AI agents for intake, customer support, or internal automation, your security controls were built for a world where humans initiate actions. They have no frame of reference for an agent that follows a poisoned instruction set autonomously.

A fake skill that passed every scanner
Security research firm AIR published an experiment on June 24 showing that a malicious AI agent skill reached over 26,000 users after passing security scanners from Cisco, Nvidia, and skills.sh. The skill, called "brand-landingpage," was designed to help users build landing pages with Google's Stitch design tool. It appeared clean because the malicious payload was not inside the skill package. Instead, the skill pointed agents to an external URL (stitch-design.ai, controlled by AIR) that initially redirected to the real Google domain. Once the skill had distribution, AIR changed the content behind that URL to instruct agents to download and run a script.
The technique exploited a gap that no current scanner addresses. Static analysis examines the skill's packaged files at the time of approval. But a skill can reference external resources that change after trust has been granted. As researcher Devashri Datta told CSO Online: "Treating agent skills as mere text or prompts is a fundamental architectural misunderstanding. They are executable instruction bundles that dictate how an agent operates, interacts with enterprise systems, and routes data."
Some of the 26,000 agents that installed the skill were tied to corporate accounts. AIR's test payload collected only email addresses, but the same mechanism could have exposed private conversations and internal systems.
Dify: cross-tenant data exposure in production AI infrastructure
On the same day, Zafran Security disclosed four vulnerabilities in Dify, an open-source platform for building AI applications with over 140,000 GitHub stars and more than one million applications. Three of the flaws could expose data across tenants in Dify's cloud service. Companies using the platform include Volvo, Maersk, Panasonic, and Thermo Fisher.
The most significant issue, CVE-2026-41947, involved Dify's tracing functions. The endpoints did not validate the sender's tenant, meaning anyone with a free Dify console account could configure tracing for applications belonging to other organizations. That access created a channel for collecting private AI conversations and model responses from applications across the platform.
A second critical flaw, CVE-2026-41948, exploited the Plugin Daemon through path traversal. The GET route required no login and accepted a tenant ID from the user, opening internal endpoints to anyone. Two additional vulnerabilities (CVE-2026-41949 and CVE-2026-41950) allowed cross-tenant file preview and file attachment manipulation.
Zafran noted that Dify had been running a version of PDFium vulnerable to CVE-2024-5846 for more than 18 months after the flaw became public. That left systems open to use-after-free attacks through uploaded PDFs. Three of the four CVEs are patched in version 1.14.2; the Plugin Daemon fix is pending the next release.
Flowise: CVSS 9.9 command injection through MCP server config
Also on June 24, a critical vulnerability was disclosed in Flowise, another open-source low-code platform for building LLM application flows and AI agents. CVE-2026-56274 carries a CVSS score of 9.9 and enables OS command injection through the Custom MCP Server feature.
Any Flowise user, regardless of role, could exploit the flaw. An attacker could inject arbitrary operating system commands through crafted MCP server configurations, leading to full remote code execution on the host. The vulnerability resulted from incomplete command-flag validation and a regex bypass in local file access restrictions. The fix landed in version 3.1.2.
This was the fourth AI framework vulnerability disclosed in this period, following critical issues in Mastra, LiteLLM, and AutoGen Studio. The pattern across these tools points to the same problem: MCP and tool-integration features are being shipped without the input validation and sandboxing that production security requires.
BioShocking: six AI browsers, six failures
On June 25, browser security firm LayerX published research showing that six AI browser products can be manipulated into leaking user credentials through what the firm calls a "BioShocking" attack. The technique uses a puzzle game to gradually erode an AI agent's default reasoning.
The first challenge rewarded wrong answers, prompting users to enter "2 + 2 = 5" instead of 4. The AI initially refused. After repeated failures under the game's inverted logic, it adapted and accepted the alternate rules. Once the agent operated within the manipulated framework, the game directed it to visit a separate page and copy text from a form field containing login credentials. The agent returned the credentials as a puzzle solution, treating credential extraction as gameplay.
LayerX tested the attack against ChatGPT Atlas (OpenAI), Comet (Perplexity), the Claude Chrome Extension (Anthropic), Fellou, Genspark Browser, and Sigma Browser. Every product failed. OpenAI patched ChatGPT Atlas after disclosure. Perplexity closed the report without fixing it. Anthropic attempted a mitigation that did not block the attack.
The findings expose a threat model that traditional browser security was never designed to address. Attackers are not exploiting software vulnerabilities. They are manipulating the AI's decision-making process directly, convincing it to authorize harmful actions voluntarily.
Langflow: RCE exploited in 20 hours
Also on June 25, the Sysdig Threat Research Team reported that CVE-2026-33017, a critical unauthenticated RCE in Langflow, was exploited within 20 hours of disclosure. Langflow is an open-source visual framework for building AI agents and RAG pipelines with over 145,000 GitHub stars.
The vulnerability lives in the POST /api/v1/build_public_tmp/{flow_id}/flow endpoint, which accepts attacker-supplied flow data containing arbitrary Python code in node definitions. That code executes server-side without sandboxing, using a single HTTP request with no credentials required.
Sysdig deployed honeypots immediately after the advisory. The first exploitation attempt arrived at 16:04 UTC, roughly 20 hours after disclosure, with zero public proof-of-concept code in existence. Over 48 hours, they recorded attacks from six unique source IPs across three phases: automated Nuclei scans, custom Python scripts performing credential harvesting, and an advanced attacker who dumped full environment variables and extracted database credentials, cloud API keys, and service tokens.
The median organizational patch time is approximately 20 days. Attackers moved in 20 hours. That gap is the operational reality for every team running AI infrastructure.
Microsoft Copilot SearchLeak: the AI assistant as exfiltration tool
On June 23, researchers disclosed CVE-2026-42824, a CVSS 9.1 vulnerability chain in Microsoft 365 Copilot that allows an attacker to extract documents, emails, and Teams messages from a target organization through a crafted link. The chain combines prompt injection, an HTML rendering race condition, and a Bing SSRF-based CSP bypass.
Because the exfiltration channel runs through Microsoft's own Bing infrastructure, standard data loss prevention tools, network proxies, and CASB solutions do not flag the outbound traffic. It looks like legitimate Copilot telemetry. Every Microsoft 365 tenant with Copilot enabled was potentially exposed. Microsoft has patched the vulnerability.
Kiteworks, which reported the finding, noted that this is the first high-severity CVE validating concerns about AI data governance at scale. The tools organizations rely on to prevent data loss were built for a world where humans initiate file transfers. They have no frame of reference for an AI assistant silently retrieving and encoding files in response to an instruction buried inside a webpage.
The shared root cause
Strip away the specific CVEs and vendor names, and every one of these disclosures exploits the same assumption: that content an AI agent reads can be treated as passive data. It cannot.
Langflow's public endpoint accepted flow data and executed it as Python. Flowise's MCP server configuration accepted input and ran it as OS commands. Dify's plugin system shared internal APIs across tenant boundaries. AIR's malicious skill pointed agents to a URL whose content could change after approval. LayerX's puzzle game rewrote the agent's reasoning framework until it voluntarily handed over credentials. Copilot's rendering layer turned a link into a data exfiltration channel.
The architectural pattern is the same in every case. An agent encounters instructions from an external source (a webpage, a skill package, a tool configuration, an error report, a chat message). Those instructions become actions because the platform does not enforce a boundary between what the agent reads and what the agent does. Prompt-layer warnings do not help. AIR's research and the agentjacking research from Tenet Threat Labs both confirmed that agents execute injected payloads even when their system prompts explicitly tell them to disregard untrusted external data.
What to do about it
If you are deploying AI agents, audit each of these surfaces against your own stack.
Runtime controls over static scanning. A skill or plugin that passes a security scan at install time can change behavior later. Treat agent extensions as living dependencies. Require version pinning, immutable reference tracking, and runtime network restrictions that limit which domains an agent can call.
Tenant isolation that survives the plugin layer. If your AI platform shares services across tenants (tracing, file handling, plugin daemons), verify that tenant boundaries are enforced at every endpoint, not just at the API gateway. Zafran's Dify findings show how quickly isolation breaks when internal services trust caller-supplied tenant IDs.
Sandbox everything that executes. Langflow and Flowise both executed user-supplied code without isolation. If your AI infrastructure includes a code execution layer (flow builders, MCP servers, custom tools), it needs sandboxing. No exceptions.
Assume the reasoning layer is the attack surface. LayerX proved that you can manipulate an agent's decision-making through its context window alone. Restrict what data and services your AI agents can access. Do not grant autonomous permissions to tools that can reach authentication tokens, corporate repositories, or internal systems without human review.
Patch faster than 20 days. The median organizational patch time is 20 days. Langflow was exploited in 20 hours. Maintain an inventory of your AI infrastructure components and subscribe to their security advisories. If you cannot identify every instance of Langflow, Dify, or Flowise in your environment, you cannot patch them.
The five disclosures landed in a 48-hour window, but the underlying problem has been there since the first agent platform shipped. The gap between what security teams think they are protecting and what AI agents actually expose is not closing. It is getting wider every time a new tool adds an MCP server, a skill marketplace, or an autonomous browsing feature without the controls to match.
Sources: CSO Online, IT Brief, SecurityBrief Canada, CyberPress, TechDemis, Threat-Modeling.com, Kiteworks, Zafran Security, Sysdig TRT, LayerX, AIR.