210K Daily Anomalies, 99% Untested, 89% Can't Answer: The Week AI Agent Governance Got Real Data
Codenotary's 210K daily anomalies, Drata's 89% governance gap, and Microsoft's ASSERT framework ship in the same 72 hours, mapping the AI agent governance stack.
By Springvanta
210,000 times a day, an AI agent does something inside an enterprise that it shouldn't.
That number comes from Codenotary's AgentMon platform, which crossed 3 million monitored AI-agent interactions per day this week across enterprise environments. About 7% of those interactions triggered security, compliance, or operational anomaly flags. That is not a rounding error. It is 210,000 potentially unsafe events every 24 hours, and here is the uncomfortable part: most of them were not attacks. The agents were just doing things wrong inside legitimate workflows.
Three things happened in the same 72-hour window that tell you where the AI agent governance market actually is. Codenotary released the first large-scale runtime telemetry data from production environments. Drata launched an AI Agent Governance product built around the five questions every enterprise is suddenly being asked by auditors and customers. And Microsoft open-sourced ASSERT, a testing framework born from the observation that almost nobody tests agents before putting them in production.
Each one addresses a different layer of the same problem. Observe what agents do at runtime. Control what they are allowed to do. And test them before you let them loose.

The first real runtime data
Codenotary's numbers are worth sitting with. The majority of the 210,000 daily anomalies were not malware, not external attacks, not someone breaking in. They were unsafe or unexpected behavior from agents operating inside normal enterprise workflows: exposing passwords and API tokens, making unauthorized tool calls, accessing data outside policy boundaries.
Dan Twing, president and COO at Enterprise Management Associates, put it plainly: "The challenge with autonomous systems is not simply whether they execute. It is whether they interpret state correctly, operate within established guardrails, and produce the intended outcome."
Traditional security platforms watch endpoints, networks, identities, and applications. They were never designed to observe autonomous software that reasons, plans, invokes tools, and makes decisions across multiple systems in a single workflow. Codenotary's data suggests that when you actually instrument that layer, you find agents misbehaving at a rate that would be unacceptable for any other class of enterprise software.
Moshe Bar, Codenotary's CEO, described AI runtime behavior as "a new operational and security layer that enterprises must continuously monitor, govern, and enforce." Six months ago I would have read that as vendor positioning. After seeing the 7% anomaly rate from production data, I think he is describing something real.
Drata and the five questions you cannot answer
On the same day Codenotary released its numbers, Drata launched AI Agent Governance. Drata is not a startup guessing at the market. Their trust platform is used by 8,500 organizations. Over the past nine months, they processed more than 2.1 million security questions through their Trust Graph. The frequency of AI-specific questions surged by over 30% in that period.
Those questions cluster around five themes:
- Which AI agents are running?
- What are they allowed to do?
- Who do they run as?
- Are they behaving as expected?
- Can you prove all of the above?
Drata's own data shows that 89% of companies leave those questions unanswered. Not partially answered. Unanswered. McKinsey separately found that 57% of business leaders cite governance friction as the top blocker to deploying more AI.
Nils Puhlmann, co-founder of the Cloud Security Alliance and former CSO of Twilio and Zynga, described the shift in enterprise security reviews: "In the past, the conversation centered on which frameworks we were certified against and what our third-party risk profile looked like. Over the past few months, an entirely new category of questions has emerged, focused on which AI agents are running and how they are governed. Answering those questions confidently is impossible with today's technology."
Drata's product tries to close that gap with inline sensors that discover every agent in the environment, real-time policy enforcement that blocks violations before execution, and tamper-evident audit logs. It is in early access with customers in financial services, healthcare, and software.
Adam Markowitz, Drata's CEO, drew a direct parallel: "Where endpoint created CrowdStrike and cloud created Wiz, AI agents are creating a technology wave that requires a security layer." The CrowdStrike comparison is interesting because that market also looked premature to a lot of people right up until it didn't.
Nobody tests. Microsoft wants to fix that.
The third data point in this cluster is the most damning. On June 11, Microsoft open-sourced ASSERT (Adaptive Spec-driven Scoring for Evaluation and Regression Testing), a framework that converts natural-language requirements into executable tests for AI agents.
The backdrop: Gartner analyst Anushree Verma told InfoWorld that 99% of organizations do not evaluate any AI agents pre-production. Not some. Not most. Nearly all of them.
Microsoft's framing was blunt: "Agents fail in ways that are hard to see. They drift from policy, produce unsafe outputs in edge cases, and behave differently in production than they did in testing. Generic benchmarks do not catch these failures because they are not built around your policies, your agent, or your use case."
ASSERT generates evaluation scenarios, datasets, and scorecards from written specifications and governance documents, rather than requiring developers to manually create test suites. Microsoft said internal validation showed model-generated evaluations agreeing with human reviewers 80-90% of the time.
Forrester analyst Biswajeet Mahapatra noted that behavioral evaluation is "inconsistently applied rather than treated as a formal production gate" at most organizations. He cautioned that even 80-90% agreement with human reviewers is "not sufficient as a standalone control for governance or compliance." That is worth remembering. Automated testing helps. It does not replace judgment.
Gartner projects that by 2029, more than 75% of domain-specific agents built without agentic simulation in regulated industries will fail to deliver value. That is a long-range forecast, but it rhymes with what Codenotary is observing today: when you actually watch what agents do in production, a surprising share of it is wrong.
Gartner's 40% prediction
The timing also lines up with a Gartner prediction that surfaced this week via IndyKite: by 2027, 40% of enterprises will demote or decommission autonomous AI agents due to governance failures discovered only after production incidents.
That number reads differently once you have seen Codenotary's 7% daily anomaly rate. The agents are already misbehaving. The governance is not keeping up. Gartner's prediction is not speculative so much as a description of where the current trajectory leads.
What to actually do
If you are running AI agents in production, or about to, a few things got clearer this week.
Instrument before you scale. Codenotary's data makes a strong case that you will find problems you did not know existed once you actually observe agent behavior at runtime. The 7% anomaly rate is not theoretical. It is from production environments.
Expect to be asked. Drata's 30% surge in AI-specific security questions means your customers, auditors, and board are going to ask what agents are running, what they can do, and whether you can prove it. If you are in that 89% that cannot answer, now is the time to fix it.
Test before you ship. The 99% no-evaluation statistic from Gartner is staggering. Microsoft's ASSERT gives you a free, open-source starting point. There is no good reason to deploy an agent into production without running it through tests derived from your own policies.
The governance layer for AI agents has production data now (Codenotary), shipping products (Drata), and open-source tooling (Microsoft ASSERT). The question is no longer whether you need this. It is how fast you can put it in place.
Sources:
- Codenotary AgentMon 3M interactions announcement (CIO Influence, June 11, 2026): cioinfluence.com
- Drata AI Agent Governance launch (Help Net Security, June 10, 2026): helpnetsecurity.com
- Microsoft ASSERT open-source (InfoWorld, June 11, 2026): infoworld.com
- Gartner governance reckoning analysis (IndyKite, June 11, 2026): indykite.ai