$128M Into Voice AI Trust: The Reliability Bottleneck

$128 million went into voice AI reliability in 72 hours. Scaled Cognition raised $100 million. Coval raised $28 million. TELUS and ElevenLabs published production data showing what happens when you get reliability right. None of them are building chatbots. They are building the infrastructure that decides whether an AI agent can talk to your customers without lying to them.

The shift happening right now is not about whether voice AI sounds natural. That problem is mostly solved. The shift is about whether you can trust a voice agent with a bank balance, a prescription refill, or a service cancellation. Three companies in the same window bet on three different answers to the same question, and the question is not "can it talk?" but "can you believe what it says?"

Figure: three trust layers (Scaled Cognition at the model layer, Coval at the testing layer, TELUS at the deployment proof layer)

Scaled Cognition: architecting for zero hallucination

Scaled Cognition raised a $100 million Series A on June 25, led by Vinod Khosla at Khosla Ventures, with Genesys participating as a strategic investor. The round values the company at $750 million. Their pitch is direct: general-purpose LLMs are probabilistic, designed to produce what sounds plausible, not what is correct. In customer service, those two things diverge constantly.

CEO Dan Roth was previously corporate vice president (CVP) of Conversational AI at Microsoft. CTO Dan Klein leads the Berkeley NLP Group. Before Scaled Cognition, they built Semantic Machines, one of the first agentic AI companies, which Microsoft acquired. They spent years inside Microsoft applying AI to enterprise workloads and kept hitting the same wall: interactions that tested perfectly would later reveal what Roth calls "grievous errors, systematic ones, hiding behind convincing responses."

Their answer is a model called APT. It does not generate new data in response to user requests. It retrieves existing records. The company calls this "super-reliable intelligence" and claims it eliminates large classes of hallucinations by architecture rather than by guardrails bolted on after the fact. Klein framed the core problem this way: "The biggest reliability challenge isn't the mistakes that look wrong. It's the ones that look completely correct."
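The difference between generating a value and retrieving one can be shown with a toy example. The sketch below is not how APT works internally, which the company has not been quoted as describing here; it only illustrates the general idea that an identifier in a reply should come from a stored record or not appear at all. The `Prescription` type, the `RECORDS` list, and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Prescription:
    prescription_id: str
    customer_id: str
    drug: str


# Hypothetical system of record; in practice this would be a database or API.
RECORDS = [
    Prescription("RX-1001", "cust-42", "atorvastatin"),
    Prescription("RX-1002", "cust-42", "metformin"),
]


def find_prescription(customer_id: str, drug: str) -> Optional[Prescription]:
    """Return the stored record that matches, or None. Never synthesizes one."""
    for record in RECORDS:
        if record.customer_id == customer_id and record.drug == drug.lower():
            return record
    return None


def refill_reply(customer_id: str, drug: str) -> str:
    """Build a reply whose identifiers come only from a retrieved record."""
    record = find_prescription(customer_id, drug)
    if record is None:
        # No record means no ID: escalate instead of guessing.
        return "I could not find that prescription. Let me connect you to a pharmacist."
    return f"Refill requested for {record.drug} (prescription {record.prescription_id})."
```

In this sketch a request for a drug that is not on file produces an escalation, never an invented prescription ID, which is the failure Roth's pharmacy example describes.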

That last point matters more than the funding amount. Roth says that when Scaled Cognition goes into large enterprises, they consistently find the actual hallucination rate is five times what the enterprise thinks it is. The model always sounds confident, so nobody catches the errors until they compound. Roth gave a concrete example: an AI takes a prescription refill request, sends a hallucinated prescription ID to the pharmacy, and the customer picks up the wrong drug. The model was confident the whole time.

Scaled Cognition launched commercially in February and says about 100 large enterprises, including Fortune 500 companies in financial services, healthcare, telecom, and insurance, are building on the platform. The company expects to automate over a billion customer service interactions in the next year. The broader market they are targeting is the $600 billion annual BPO industry, and their thesis is that when intelligence becomes software, the labor arbitrage logic of outsourcing disappears.

Coval: testing voice agents like Waymo tested self-driving cars

If Scaled Cognition is solving reliability at the model level, Coval is solving it at the infrastructure level. The company raised $28 million in a Series A led by Norwest on June 24, with participation from Base10 Partners, Twilio Ventures, and Y Combinator. That brings Coval's total funding since 2024 to $31 million.

Founder Brooke Hopkins built evaluation infrastructure at Waymo, Alphabet's self-driving unit. At Waymo, the question was simple and brutal: how do you know a self-driving car is safe enough for public roads? You cannot drive it around the block a few times and call it good. You run millions of simulated miles, regression suites that catch the one behavior that quietly changed, and production metrics that tell you the truth after deployment.

Hopkins saw the same structural problem arriving in voice AI. An AI voice agent runs several models simultaneously: one transcribes speech, another works out a response, a third speaks it back. That mirrors the perception, planning, and control systems in a self-driving car. In both cases, simulation is the practical way to test at scale.
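To make the three-model structure concrete, here is a minimal sketch of a transcribe, decide, speak pipeline. Every function is a stand-in (strings instead of audio, one hard-coded rule instead of a model), and the stages run one after another for simplicity. The `end_to_end_success` helper adds an illustrative assumption that is not from Coval or Waymo: if stages fail independently, the chance that a whole call succeeds is the product of the per-stage success rates.

```python
from typing import Callable, List

Stage = Callable[[str], str]


def transcribe(audio: str) -> str:
    """Stand-in for speech-to-text: here 'audio' is already a string."""
    return audio.strip().lower()


def decide(transcript: str) -> str:
    """Stand-in for the response model: a single hard-coded rule."""
    if "balance" in transcript:
        return "Your balance is available after verification."
    return "Could you repeat that?"


def speak(text: str) -> str:
    """Stand-in for text-to-speech: tags the text instead of synthesizing audio."""
    return f"<audio>{text}</audio>"


def run_pipeline(audio: str, stages: List[Stage]) -> str:
    """Pass the caller's input through each stage in order."""
    result = audio
    for stage in stages:
        result = stage(result)
    return result


def end_to_end_success(stage_success_rates: List[float]) -> float:
    """Probability that every stage succeeds, assuming independent failures."""
    probability = 1.0
    for rate in stage_success_rates:
        probability *= rate
    return probability
```

Under that independence assumption, three stages that are each right 95 percent of the time give an end-to-end success rate of roughly 86 percent, which is one reason testing each model in isolation can overstate how the full agent behaves.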

The numbers Coval cites from its own customer data are striking. Roughly 95 percent of voice agents work in a demo. Only about 62 percent survive their first week live. They fail on accents, interruptions, background noise, and callers who go off script. These are not edge cases. They are the conditions of every real phone call.

Coval runs tens of millions of simulated evaluations on voice agents before they go live and continues monitoring in production. The company says customers cut manual QA work by up to a factor of 30 and deploy agents up to 10 times faster. Over 60 organizations use the platform, including Zoom and Deepgram. Coval also pointed to broader market data: more than $7 billion went into voice AI in the first quarter of 2026 alone, with the market expected to pass $20 billion by 2031.
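The simulation-plus-regression idea can be sketched in a few lines. This is not Coval's platform or API; it is a toy harness under stated assumptions: the perturbation axes are taken from the failure modes named above, an "agent" is just a function that returns pass or fail for a scenario, and both example agents are invented for illustration.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# Hypothetical perturbation axes, based on the failure modes the article lists.
ACCENTS = ["neutral", "strong"]
NOISE = ["quiet", "noisy"]
BEHAVIORS = ["on_script", "interrupts", "off_script"]

Scenario = Tuple[str, str, str]
Agent = Callable[[Scenario], bool]  # True means the simulated call succeeded


def build_scenarios() -> List[Scenario]:
    """Enumerate every combination of accent, noise level, and caller behavior."""
    return list(product(ACCENTS, NOISE, BEHAVIORS))


def evaluate(agent: Agent, scenarios: List[Scenario]) -> Dict[Scenario, bool]:
    """Run the agent on each scenario and record pass/fail."""
    return {scenario: agent(scenario) for scenario in scenarios}


def pass_rate(results: Dict[Scenario, bool]) -> float:
    """Fraction of scenarios that passed."""
    if not results:
        raise ValueError("no results to score")
    return sum(results.values()) / len(results)


def regressions(baseline: Dict[Scenario, bool],
                candidate: Dict[Scenario, bool]) -> List[Scenario]:
    """Scenarios that passed on the baseline but fail on the candidate."""
    return [s for s, passed in baseline.items()
            if passed and not candidate.get(s, False)]


def demo_only_agent(scenario: Scenario) -> bool:
    """Toy agent that only handles the clean 'demo' conditions."""
    return scenario == ("neutral", "quiet", "on_script")


def sturdier_agent(scenario: Scenario) -> bool:
    """Toy agent that copes with everything except off-script callers in noise."""
    _accent, noise, behavior = scenario
    return not (noise == "noisy" and behavior == "off_script")
```

The point of the `regressions` function is the Waymo-style check described earlier: a new agent version is compared scenario by scenario against the previous one, so a behavior that quietly changed shows up as a named failing case rather than a small dip in an aggregate score.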

Scott Beechuk, the Norwest partner leading the round, framed the bet clearly: "Voice is going to be the number one interface for how humans interact with AI, and that shift creates an entirely new infrastructure layer for enterprises."

TELUS and ElevenLabs: what reliable voice AI looks like in production

While Scaled Cognition and Coval are building the tooling, TELUS Digital and ElevenLabs published data showing what happens when you deploy voice AI carefully. TELUS became ElevenLabs' preferred implementation partner for ElevenAgents on June 22, bringing 900-plus AI engineers and a forward-deployed model that embeds engineers directly in client operations.

The proof of concept is concrete. TELUS Communications ran a pilot where an ElevenLabs voice agent proactively called newly activated home internet customers during their first 90 days. The agent walked them through their first bill and onboarding questions. Human agents handled account changes, troubleshooting, and any request for human support. The AI identified itself at the start of every call and gave customers the option to decline.

The results: customers who received the welcome call were less than half as likely to cancel within their first 30 days compared to the average new internet customer. They rated the calls an average of 8.5 out of 10. The integration works across Genesys, Twilio, Amazon Connect, Zendesk, and Salesforce, which means the voice agent plugs into the platforms most contact centers already run.

This is the part of the story that should get operator attention. The TELUS pilot was not a science experiment. It was a telecom company using a voice agent to fix a specific business problem (early-life churn) and measuring the outcome. The churn reduction is real revenue protection, and the satisfaction score suggests customers did not find the experience alienating.
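The "less than half as likely to cancel" result is a relative-risk comparison between two cohorts. TELUS did not publish the underlying counts, so the sketch below only shows how such a figure would be computed; the function names are hypothetical, and any numbers passed in are made up for illustration.

```python
def churn_rate(cancelled: int, total: int) -> float:
    """Share of a cohort that cancelled within the measurement window."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= cancelled <= total:
        raise ValueError("cancelled must be between 0 and total")
    return cancelled / total


def relative_risk(treated_rate: float, baseline_rate: float) -> float:
    """Churn in the treated cohort as a multiple of baseline churn."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return treated_rate / baseline_rate


def less_than_half_as_likely(treated_rate: float, baseline_rate: float) -> bool:
    """True when the treated cohort churns at under half the baseline rate."""
    return relative_risk(treated_rate, baseline_rate) < 0.5
```

With invented numbers, 9 cancellations among 500 called customers against 40 among 1,000 baseline customers gives a relative risk of about 0.45, which would meet the "less than half" bar. A real analysis would also need to check that the two cohorts are comparable, which a simple ratio cannot show.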

What the three-layer split means for buyers

The pattern here is that the voice AI market is fragmenting by layer. Scaled Cognition owns the model: architect the system so the agent cannot fabricate facts. Coval owns the testing: simulate millions of scenarios before a real customer hears the agent. TELUS and ElevenLabs own the deployment: implement carefully, measure outcomes, iterate.

For a business evaluating voice AI, the diagnostic question is which layer is your bottleneck. If your agent sounds great but hallucinates refund policies or invents store locations, you have a model problem and Scaled Cognition's retrieval-only architecture is worth understanding. If your agent works in testing but falls apart when real callers have accents or interrupt, you have a testing problem and Coval's simulation approach addresses it directly. If your agent works technically but you cannot quantify the business impact, you have a deployment problem and the TELUS pattern of targeting a specific metric like churn is the model to follow.
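The diagnostic above reduces to three yes/no questions. As a rough sketch, with hypothetical names and no claim that any vendor frames it this way, it can be written as a small function that returns every layer where a symptom is present:

```python
from enum import Enum
from typing import List


class Layer(Enum):
    MODEL = "model"            # agent fabricates facts
    TESTING = "testing"        # agent fails on real-world callers
    DEPLOYMENT = "deployment"  # business impact is not measured


def diagnose(fabricates_facts: bool,
             fails_on_real_callers: bool,
             impact_unmeasured: bool) -> List[Layer]:
    """Map observed symptoms to the layers that need attention, in order."""
    bottlenecks: List[Layer] = []
    if fabricates_facts:
        bottlenecks.append(Layer.MODEL)
    if fails_on_real_callers:
        bottlenecks.append(Layer.TESTING)
    if impact_unmeasured:
        bottlenecks.append(Layer.DEPLOYMENT)
    return bottlenecks
```

Returning a list rather than a single answer reflects that the symptoms are not mutually exclusive: an agent can invent refund policies and also break on interruptions.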

The market data backs up the urgency. More than $7 billion flowed into voice AI in the first quarter of 2026. Most of that money built agents that demo beautifully and then stumble in production. Coval's own data says 38 percent of voice agents do not survive their first week. Scaled Cognition says enterprises consistently underestimate their hallucination rate by a factor of five. The infrastructure to catch these problems before customers do is finally arriving, and $128 million of fresh capital says investors believe the demand is real.

Sources: Scaled Cognition, SiliconANGLE, The Next Web, Coval, SiliconANGLE, PR Newswire, TELUS Digital / ElevenLabs, Norwest
