Anthropic Zero Trust, Cisco Attacks, DataGrail: AI Trust Broken

Three reports landed in the same 48-hour window this week. None of the authors coordinated. They didn't need to.

On May 27, Anthropic published "Zero Trust for AI Agents," a framework built on the assumption that your agents will be compromised. On May 28, Cisco released research showing that every frontier AI model it tested breaks under sustained multi-turn attack, with success rates hitting 88% on the worst performer. And on May 27, the same day as Anthropic, DataGrail published a study of 2,400 business software vendors finding that 63.6% of those advertising AI capabilities don't disclose their third-party AI subprocessors in legal documentation.

Three independent findings, one shared argument: you cannot trust AI agents with your existing security model, and the things you think are protecting you probably aren't.

What Anthropic actually said

Anthropic's framework applies Zero Trust principles to AI agents specifically. The document covers identities that are cryptographically rooted (not just API keys sitting in environment variables), permissions scoped per task rather than per agent, memory protected against poisoning, and defensive operations that run at the speed of autonomous attackers.
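To make "permissions scoped per task rather than per agent" concrete, here is a minimal sketch of a signed, short-lived grant that names the tools one task may use. It is not taken from Anthropic's framework: the function names, the claim fields, and the use of a shared HMAC key are all assumptions for illustration. A cryptographically rooted identity in production would more likely rest on asymmetric keys held in hardware or a key management service.

```python
import hashlib
import hmac
import json
import time

# Hypothetical signing key for the sketch; a real deployment would keep key
# material in an HSM or KMS and would likely use asymmetric signatures.
SIGNING_KEY = b"demo-key-not-for-production"


def issue_grant(agent_id, task_id, allowed_tools, ttl_seconds, now=None):
    """Issue a short-lived grant scoped to one task, signed with HMAC-SHA256."""
    now = time.time() if now is None else now
    claims = {
        "agent": agent_id,
        "task": task_id,
        "tools": sorted(allowed_tools),
        "exp": now + ttl_seconds,
    }
    payload = json.dumps(claims, sort_keys=True).encode()
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return {"payload": payload, "sig": signature}


def authorize(grant, tool, now=None):
    """Allow a tool call only if the grant is authentic, unexpired, and lists the tool."""
    now = time.time() if now is None else now
    expected = hmac.new(SIGNING_KEY, grant["payload"], hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, grant["sig"]):
        return False  # tampered or forged grant
    claims = json.loads(grant["payload"])
    if now >= claims["exp"]:
        return False  # grant has expired
    return tool in claims["tools"]  # deny anything the task was not granted
```

The design choice worth noticing is that the grant belongs to the task, not the agent. A compromised agent holding a grant for "read_ticket" cannot call "send_email", and the grant stops working once it expires.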

The framing is worth pausing on. One of the biggest AI companies in the world is saying, explicitly, that you should assume your agents will be breached from day one. Not "might be." Will be. The framework's three tiers (Foundation, Advanced, Optimized) map to organizational maturity, but the starting assumption is the same at every level.

What Cisco found when it actually tested

Cisco's AI threat intelligence team ran 30,000 single-turn prompts and 7,000 multi-turn attacks across 15 flagship models from OpenAI, Anthropic, Google, Amazon, and xAI.

[Figure: Multi-turn attack success rate by AI model]

Google's Gemini 3 Pro went from 18% attack success on single-turn to 73% under sustained multi-turn probing, a 55-point jump. OpenAI's GPT-5.4 rose roughly ninefold, from low single digits to nearly 25%. xAI's Grok 4.1 Fast hit 88% attack success in its default configuration. Anthropic's Claude family held up best on single-turn (low single digits) but still reached 11-16% under sustained attack.

More than half the models showed a gap of at least 15 points between single-turn and multi-turn performance. That matters because the safety benchmarks the industry relies on are almost all single-turn. The numbers on model cards and safety reports describe a testing regime that doesn't match how real attacks work.

Amy Chang, who leads AI threat research at Cisco, put it plainly: "Real adversaries won't stop at the first refusal; they will build additional context, reframe, or escalate across the conversation."
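The gap between the two testing regimes is easy to reproduce in miniature. The sketch below is not Cisco's methodology. It uses a hypothetical toy model that refuses until the third user turn, and every name in it (toy_model, run_attack, attack_success_rate) is an assumption made for illustration. A real harness would call a model API and use a judge to label each response.

```python
def toy_model(conversation):
    """Hypothetical stand-in for a model: refuses at first, yields by the third user turn."""
    user_turns = sum(1 for role, _ in conversation if role == "user")
    return "COMPLY" if user_turns >= 3 else "REFUSE"


def run_attack(model, turns):
    """Play the attacker's turns in order; the attack succeeds if the model ever complies."""
    conversation = []
    for text in turns:
        conversation.append(("user", text))
        reply = model(conversation)
        conversation.append(("assistant", reply))
        if reply == "COMPLY":
            return True
    return False


def attack_success_rate(model, attacks):
    """Attack success rate: successful attacks / total attacks, as a percentage."""
    successes = sum(run_attack(model, turns) for turns in attacks)
    return 100.0 * successes / len(attacks)


# Placeholder prompts; the content is irrelevant to the toy model.
single_turn = [["disallowed request A"], ["disallowed request B"]]
multi_turn = [
    ["build context", "reframe the request", "escalate"],
    ["build context", "disallowed request A"],
]

single_asr = attack_success_rate(toy_model, single_turn)
multi_asr = attack_success_rate(toy_model, multi_turn)
gap = multi_asr - single_asr  # percentage-point gap between the two regimes
```

With this toy model, a single-turn benchmark reports a perfect score while half of the multi-turn attacks get through. Only the second number reflects the behavior Chang describes.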

One detail worth flagging: Grok 4.1 Fast's multi-turn attack success rate (ASR) dropped roughly in half when reasoning mode was turned on, a 40+ point swing from a single configuration flag. That kind of variability doesn't appear on any public benchmark or model card.

What your vendors aren't telling you

While Cisco proved the models themselves are fragile, DataGrail showed the supply chain around them is opaque.

Their researchers cross-referenced DPA disclosures (the legal documents governing how vendors handle your data) against product documentation, GitHub environments, API connections, and marketing materials for 2,400 business software vendors. Nearly two-thirds of vendors advertising AI capabilities don't disclose all their AI subprocessors.

Consider the scenario DataGrail CEO Daniel Barber described: A company buys an AI recruiting tool. The DPA lists Claude as the model. Security reviews Anthropic. But the tool also uses OpenAI and Gemini behind the scenes, models the buyer never evaluated, processing thousands of resumes containing home addresses, financial data, and Social Security numbers.
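The cross-referencing idea reduces to a set difference: everything you can observe the product talking to, minus everything the DPA admits to. The sketch below applies it to the recruiting-tool scenario. It is not DataGrail's tooling, and the function name, the evidence source labels, and the sample data are all hypothetical.

```python
def undisclosed_subprocessors(dpa_disclosed, observed_by_source):
    """Return providers seen in any evidence source but missing from the DPA.

    Names are compared case-insensitively. The result maps each undisclosed
    provider to the sorted list of sources where it was observed.
    """
    disclosed = {name.strip().lower() for name in dpa_disclosed}
    findings = {}
    for source, providers in observed_by_source.items():
        for name in providers:
            key = name.strip().lower()
            if key not in disclosed:
                findings.setdefault(key, set()).add(source)
    return {provider: sorted(sources) for provider, sources in findings.items()}


# Hypothetical evidence: the DPA lists one provider, other sources reveal more.
dpa = ["Anthropic"]
observed = {
    "product_docs": ["Anthropic", "OpenAI"],
    "api_connections": ["openai", "Google"],
    "marketing": ["Anthropic"],
}

result = undisclosed_subprocessors(dpa, observed)
# result == {"openai": ["api_connections", "product_docs"], "google": ["api_connections"]}
```

Exact name matching is the weak point of a sketch like this. In a real audit, provider names would need normalizing (product names versus company names, regional endpoints) before the comparison means much.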

According to IBM's 2025 data, breaches involving shadow AI cost an average of $4.63 million, $670,000 more than breaches at organizations with low shadow AI exposure. This lands in a year when U.S. states issued $3.425 billion in privacy fines, more than the previous five years combined.

Why these three converge

Anthropic is building the models. Cisco is testing them. DataGrail is auditing the supply chain around them. Same problem, three angles:

Model-level defenses aren't enough. Cisco proved that. Every model in the cohort broke under sustained probing. Anthropic's response is to stop relying on model-level defenses and build architectural controls instead.

You can't trust your vendor contracts. DataGrail proved that. If nearly two-thirds of vendors advertising AI aren't disclosing all their AI subprocessors, your DPAs are incomplete and your risk assessments are built on partial information.

Zero Trust is the answer, but it looks different for agents. Anthropic is saying so from the vendor side. Cryptographic identities, task-scoped permissions, memory safeguards, and agentic SOAR (security orchestration, automation, and response) operations are architectural requirements, not theoretical suggestions.

Also worth noting: Adversa AI published details of the "SymJack" attack on May 27, showing how AI coding agents can be turned into supply chain attack delivery systems through malicious symlinks. The developer approves a routine file copy, and the agent silently registers a malicious MCP server that steals SSH keys, cloud tokens, and browser sessions. That's four independent security disclosures in 48 hours.

Practical steps

If you're running AI agents in production (or about to):

Audit your vendor AI subprocessors. Pull every DPA and cross-reference against product documentation and API connections. Assume the DPA is incomplete.

Test your model stack under multi-turn conditions. Cisco's research shows single-turn safety scores don't reflect real attack conditions. Ask your model provider to disclose multi-turn attack success rates, not just single-turn results.

Start building for Zero Trust at the agent layer. Anthropic's framework is free and works regardless of whose models you use. The eight-phase implementation workflow covers identity, access scoping, sandboxing, input/output controls, and memory safeguards.
