The AI Moat Is the Data: Harvey, CoStar, and AdvancedMD Show Why

Three things happened in 48 hours this week that look like separate product announcements. They are not separate. Harvey, the biggest legal AI company, started training custom open-source models to encode law firm workflows. CoStar built conversational AI for Apartments.com from its proprietary listing dataset. And a clinical agent platform called Insight Health landed on AdvancedMD's marketplace, bringing workflow-specific AI to thousands of independent medical practices.

Here is what connects them: the competitive moat in vertical AI just moved from the model to the data. The first wave of industry-specific AI bolted general-purpose models onto domain data and hoped for the best. This week showed what the second wave looks like.

Harvey wants law firms to "own their intelligence"

Harvey co-founder Gabe Pereyra announced on X that the company is building its first legal foundation model, "inspired by Cursor's Composer." Two goals: serve frontier-level intelligence at lower cost with strong data security, and, in Pereyra's words, "create the foundations for law firms to build their own specialized models and own their own intelligence."

That phrase matters. A year ago, the dominant thesis in legal AI was that general-purpose models would keep improving until custom training became unnecessary. Harvey is now betting the opposite. The company is running proof-of-concept studies with law firms to train open-source LLMs on their specific workflows and client relationships. CEO Winston Weinberg told Artificial Lawyer that the value comes from encoding the entire experience, from firm to client, so automation can be applied with precision.

The models target "complex client matters that span months and take dozens of associates," according to Pereyra. The agentic system learns to control legal tech tools, delegate to sub-agents, and escalate to human partners when needed. Harvey has open-sourced benchmarks for evaluating the post-training work and reports "promising results" in approaching frontier model performance with domain-specific training.

Harvey is not alone in this reversal. Kirkland & Ellis is investing $500 million with Palantir to, in their words, "bottle their secret sauce." Thomson Reuters is training open-source LLMs on its legal data archive. The consensus has flipped: general models alone are no longer enough for complex regulated work.

Three forces are driving the shift back toward custom models. Data security concerns make on-premises or privately trained models more attractive for regulated industries. Post-training genuinely improves performance on domain-specific tasks. And agentic workflows need encoded playbooks and reference data to function, which general models cannot provide out of the box.

CoStar turned decades of real estate data into an AI advantage

CoStar Group launched Apartments.com Ai this week, replacing traditional filters and keyword search with conversational AI. Renters can now describe what they want in natural language, ask complex questions, compare communities, and get guided recommendations.

CEO Andy Florance framed it as "combining artificial intelligence with the most comprehensive multifamily data in the industry." That is the entire point. Apartments.com Ai works because CoStar spent decades building a dataset that no startup can replicate. The AI layer is the product, but the moat is the data underneath it.

This is not a ChatGPT wrapper. It is a conversational interface built on proprietary listing data, pricing history, availability feeds, and community information that CoStar has accumulated since 1987. A competitor could license GPT-4 tomorrow and still not match the depth of CoStar's apartment intelligence. The model is commoditized. The data is not.

The launch follows CoStar's Homes Ai for homebuyers, so this is clearly a platform play, not a one-off feature. For property managers and real estate operators, the implication is direct: listing platforms are becoming AI intermediaries. The renter's first conversation about your property may happen with an AI, not a leasing agent.

Insight Health encoded clinical workflows into deployable agents

Insight Health joined the AdvancedMD Marketplace on June 17, making its clinical AI agents available to thousands of independent medical practices, multi-specialty groups, and billing companies. The agents handle referral and fax intake, document classification, prior authorization, and scheduling.

This is where the "data moat" thesis meets operational reality. According to the company, Insight Health's agents save providers more than two hours per day on documentation and let practices see four to five additional patients per week. They work because they encode clinical workflow knowledge. The agents know what a clean referral looks like because they have processed thousands of them. They know which documents need escalation because they have seen the patterns.

The context is stark. By some estimates, healthcare administrative overhead exceeds $1 trillion annually in the United States. The Medical Group Management Association found that 59% of practices receive 300 or more inbound calls every day. Independent practices, which lack the IT budgets of hospital systems, are drowning in paperwork while losing patients to delayed referrals and unanswered phones.

Insight Health's marketplace integration matters because it puts domain-specific AI within reach of practices that cannot build their own. The agents plug into existing AdvancedMD workflows rather than requiring a rip-and-replace. This is the distribution model that makes vertical AI practical for small and mid-size operators.

How the vertical AI moat shifted from Wave 1 (wrapping general models) to Wave 2 (training custom intelligence from domain data)

What the convergence tells you

If you are evaluating AI tools for your business, the most important question changed this week. It is no longer "what model does this use?" The model is increasingly irrelevant. GPT-4, Claude, Gemini, open-source alternatives: they are all converging on similar baseline capabilities.

The question that actually separates defensible products from thin wrappers is: what data went into building this, and who owns it?

A legal AI startup that fine-tunes on publicly available case law is replicable. Harvey training custom models on a firm's specific client relationships and workflow playbooks is not. A real estate chatbot that queries public MLS feeds is replaceable. CoStar building conversational AI from decades of proprietary multifamily data is not. A healthcare chatbot that answers FAQs is a weekend project. Insight Health encoding clinical referral patterns into deployable agents takes years of domain exposure.

The pattern across all three stories is the same. Start with a narrow, high-value workflow. Accumulate domain data until general-purpose tools, not the data, become the bottleneck. Then encode that data into custom models and agents that general-purpose tools cannot match. The companies that reach step three are the ones worth buying from.

The model providers will keep competing on price and performance. That race benefits everyone. But the vertical AI moat is not in that race anymore. It is in the data, the workflows, and the relationships that took decades to build and cannot be downloaded.
