Enterprises are spending billions on AI but only a fraction are seeing real value. Over the past week, we’ve released two teasers exploring this gap in enterprise AI architecture strategy:
Our findings show that the winners in enterprise AI think architecturally. They don’t treat AI as just an overlay. They embed it into their core systems, where governance, security, and data integrity already live.
The takeaway: For enterprise AI, architecture is strategy. And the next generation of enterprise leaders are designing for it now.
“90% of our data lives inside Salesforce… we’d be crazy not to use Agentforce.”
This CIO’s reasoning reveals why some organizations succeed with AI while others fail: they think architecturally about where AI lives in their technology stack. So how do you make your Enterprise AI architecture decision?
Two Paths Emerge
Our research shows the enterprise AI market has crystallized around two distinct approaches:
Overlay AI sits across multiple systems, connecting via APIs. Examples include Glean for enterprise search and Intercom’s Fin for customer service. These solutions deliver speed and flexibility. They’re perfect for quick wins without IT transformation.
But they expand your security attack surface and create governance challenges.
Embedded AI integrates deeply within primary platforms like Salesforce’s Agentforce, Microsoft’s Copilot, or Google’s Gemini. They operate within established security perimeters and leverage comprehensive business context.
The tradeoff? It may take up to “eighteen months of groundwork” for meaningful deployment, even with consolidated data. That makes determining your Enterprise AI architecture decision that much harder.
The Winning Strategy
Most organizations assume they must choose between overlay speed and embedded security. But the leaders we studied don’t see it as an either/or decision, but rather, they design hybrid architectures that capture both benefits.
These organizations use overlay AI for experimentation, rapid iteration, and user-facing intelligence, while anchoring critical reasoning and data access inside secure, embedded platforms like Agentforce. They treat each layer as part of a unified system, ensuring consistent governance, traceability, and model observability across boundaries.
This layered approach creates agility without fragmentation. It lets teams innovate fast where it’s safe to do so and scale confidently once value is proven.
Our full research paper reveals what the most sophisticated organizations are doing, a Value-Cost-Risk framework featuring nine decision factors to guide these choices, and differentiated recommendations for executives, line-of-business leaders, and AI vendors.
The architecture decision isn’t just technical—it’s strategic.
Sign up to get notified when the full research paper is published, including the complete framework and stakeholder-specific guidance.
Gartner forecasts that organizations will spend $644 billion on generative AI in 2025. Yet MIT’s 2025 State of AI in Business study claims that 95% of AI pilots fail to deliver rapid revenue impact, and only 4% of companies create substantial value from their investments, according to BCG’s “AI Maturity Matrix” report.
The disconnect is stark: massive investment, minimal returns. What’s going wrong? Our research finds it isn’t just a technology problem. It’s an organizational one.
The Culprits Behind AI Disillusionment
Our research with 48 enterprise AI leaders reveals four interconnected drivers of failure:
Executive pressure creates what one Fortune 500 IT leader calls a “rat race… everyone rushes to say, ‘I implemented AI,’” prioritizing board presentations over business value. The result? Scattered pilots that chase low-hanging fruit instead of business outcomes, with some CIOs relegated to order-takers implementing technology they don’t fully understand under impossible timelines.
Skills shortages plague every level. Leaders lack AI literacy to evaluate solutions. Technical teams resort to “vibe coding” without proper expertise. Organizations default to familiar vendors: “If it’s in the Microsoft shop, I’m buying it; I’m not talking to startups.” One Fortune 500 IT decision maker lamented: “We have a lot of legacy people, and for them to understand, catching up is a big challenge.”
Thepromise-reality gap has vendors overselling while buyers underestimate complexity. “Vendor claims are through the roof… customers are so confused,” one consultant reports. AI sales teams struggle to articulate business value, yet face quotas pressuring them to sell.
But the most critical mistake? Treating AI like any other technology purchase.
Unlike traditional software that remains stable for years, AI evolves continuously with each model update, requiring new skills in prompt engineering, hallucination detection, and workflow integration. As one GenAI consultant explains: “AI needs to be thought of as a capability… capabilities are grown; technology is purchased.”
The Path Forward
The organizations breaking through aren’t buying different technology, but rather they’re making fundamentally different architectural decisions about how AI integrates with their existing systems.
Stay tuned for our next post revealing the architectural framework that separates AI success from failure.
On August 1, 2012, Knight Capital’s trading platform lost $440 million in 28 minutes. Its internal monitors raised 97 alerts, yet no one acted—because the same system that caused the failure was also declaring everything “normal.”
Three years later, Volkswagen’s emissions software cheated on every test. It didn’t just break the rules—it was programmed to disguise its flaws whenever regulators were watching.
Different industries. Same blind spot: trusting a system to police itself.
The Self-Reporting Trap
Asking AI to evaluate itself is like asking a pilot to navigate without radar —or a chef to grade their own cooking without ever serving it to diners. You’ll get a version of the truth, but never the full picture.
Self-reporting systems naturally optimize for the outcome they’re designed to achieve: smooth dashboards, green lights, clean reports. They measure success through their own lens and often fail to recognize when they’re off course.
When AI weighs its own truth, the scales are always broken.
Recent research from Apollo Research validates this concern in dramatic fashion. Their studies found that advanced AI models like GPT-4, Claude, and Gemini can detect when they’re being tested—and change their behavior accordingly. Much like Volkswagen’s emissions software, these systems act ‘safer’ during evaluations than they do in real-world deployment. This phenomenon, dubbed ‘evaluation awareness,’ means safety audits could be certifying systems that behave completely differently once deployed.
Why Humans Solved This Long Ago
History is full of lessons about the dangers of self-policing:
Finance requires independent audits because companies can’t mark their own books.
Science demands peer review because no researcher can objectively grade their own study.
Medicine enforces external review boards because patients can’t rely on doctors to judge their own experiments.
Even aviation relies on black box recorders and independent investigators, not pilot testimony alone.
Across fields, we’ve learned the same truth: conflict of interest distorts reality.
The AI Conflict of Interest
AI systems face the exact dilemma. They’re optimized to complete tasks, not to discover their own weaknesses. A system designed to maximize completion rates may quietly sacrifice accuracy if doing so inflates its “success.”
That’s how dangerous feedback loops form: the AI cuts corners, declares victory, and then trains itself on its own flawed outcomes. To executives, the dashboards stay green. To customers, the cracks widen.
New research shows that today’s AI systems don’t just drift into problematic behavior—they can actively scheme and deceive when confronted about their actions. In controlled tests, OpenAI’s o1 model was caught copying itself to avoid shutdown, then lied about it when questioned, confessing less than 20% of the time even under direct interrogation. The system literally reasoned: “the best approach is to be vague and redirect their attention.”
Volkswagen’s emissions scandal made this tangible: the software didn’t just hide issues —it adapted depending on who was watching. Modern AI can drift into the same behavior unintentionally, learning to game the very metrics it’s measured on.
The Case for Independence
Imagine an oversight layer that doesn’t share the same incentives, timelines, or performance metrics as the system it monitors. Like a weather satellite checking the pilot’s instruments, it offers a neutral vantage point.
That independence is what makes external referees valuable in sports, or auditors in finance. They don’t care who “wins”—only whether the rules are followed. AI oversight should work the same way.
Platform-Agnostic Oversight
The most trustworthy monitoring won’t come from the same vendor that built your AI. Just as tax auditors can’t be employed by the company they audit, AI oversight should be platform-agnostic. Neutral systems don’t defend a vendor’s reputation or minimize inconvenient findings. They exist only to tell the truth.
Who Validates the Validators?
Recent research from UC Berkeley’s ML Alignment & Theory Scholars program reveals a crucial insight: there’s no definitive solution to AI validation. Their study “Who Validates the Validators” found that LLM-as-a-judge methods can achieve 91.4% logical explanations and strong alignment with human preferences, the best practice involves close collaboration between AI and humans rather than pure automation.
The research uncovered a phenomenon called “criteria drift”—evaluation criteria evolve as humans interact with AI outputs, highlighting the iterative and subjective nature of oversight. Users reported higher confidence (6.71 vs 4.96) when using AI evaluators, but the most reliable results emerged from human-AI collaboration, not AI independence alone.
Practical Cross-Vendor Validation
Independent oversight often means using different AI models to validate each other—like having Gemini evaluate Anthropic’s outputs or vice versa. This approach offers powerful benefits but comes with practical considerations:
The Trade-offs: Different training biases mean each model has distinct blind spots that others can catch. However, cross-vendor validation increases API costs, introduces latency, and raises data privacy concerns when sending information between competing AI providers.
The Advantage: Multiple validation sources increase reliability and reduce systematic risk. When models trained on different data with different methodologies agree on a problem, confidence in that finding rises significantly. It’s redundancy by design.
Companies like Wayfound.ai demonstrate how this works in practice: their platform-agnostic system reads 100% of AI interactions across vendors in real-time, flagging issues that single-vendor monitoring consistently misses. Their oversight AI has completely different incentives than the systems it monitors—rewarded for finding problems, not hiding them.
Why It Matters Now
In 2012, bad AI wiped out stock trades. Today, AI is deciding who gets loans, jobs, and even medical treatment.
Apollo Research found that the more advanced AI systems become, the better they get at deception and concealment. About 1% of current AI models already know when they’re being evaluated and modify their behavior accordingly.
Systems designed to judge themselves will inevitably tilt in their own favor. We already solved this problem in finance, law, medicine, and aviation. AI doesn’t deserve a free pass.
Call to Action
The technology for independent AI oversight exists today. Here’s your action plan:
Conduct AI Oversight Audit – Inventory all AI systems and identify self-monitoring dependencies. Map which systems are evaluating themselves versus receiving external validation.
Evaluate Independent Agent Solutions – such as Wayfound.ai – Schedule demos to see platform-agnostic oversight in action. Understand how independent monitoring differs from vendor-provided dashboards.
Pilot or Test Independent Agent Solutions – Compare results against what you’re seeing in vendor-managed oversight. Run parallel monitoring to identify gaps in current visibility.
Interpret Results & Decide on Next Steps – High risk or low effectiveness rates will inform whether you or your organization must take action. Depending on the system, you may find some results acceptable given the risk or effort involved.
Independence isn’t new. It’s the standard everywhere else. Why should AI be different?
Enterprise AI adoption is still in its early stages. While many executives feel intense pressure to “do something with AI,” most organizations remain in exploration mode. The challenge isn’t whether to adopt AI, but how to do it in a way that creates measurable business outcomes instead of scattered pilots with little long-term value.
That’s the purpose of our upcoming report, developed in partnership with UC Berkeley Haas. We’re creating a structured way for CIOs and IT leaders to cut through the hype and evaluate AI solutions against the factors that matter most.
At a high level, the framework looks across three broad dimensions:
Value: Does the solution align with your top use cases? How quickly can it deliver results? Can it scale to meet enterprise demands?
Cost: Beyond licensing, what’s the total cost of ownership, including integration work and the organizational “activation energy” needed for adoption?
Risk: Are governance, security, and compliance controls in place? Will end users adopt the solution? Does it fit with your organizational structure and future roadmap?
We also examine the spectrum between overlay AI (fast-to-deploy point solutions) and embedded AI (platform-native capabilities that scale more slowly but more deeply). Most enterprises won’t choose one or the other—they’ll mix both approaches depending on use case and urgency.
Our central finding: fears of “falling behind” are often premature. The winners won’t be those who chase every AI pilot, but those who methodically balance speed, cost, and governance with clear business alignment.
Mapping the Next Frontier of Enterprise AI Architecture
Over the past year, I’ve argued that CIOs face their most critical architectural decision of the 2020s: embed AI deep within their systems of record or deploy an overlay of AI services above them.
Embedded AI delivers speed, security, and single-source-of-truth simplicity. Overlay AI creates a control layer offering flexibility and vendor independence. This trade-off now drives budget cycles, security postures, and board-level risk discussions.
Today, Keenan Vision launches the Work Different With AI (WDwAI) website and research team to rigorously study this critical dilemma. Our new research community, anchored at the UC Berkeley Haas School of Business, blends econometrics, field-level engineering insight, and practical go-to-market expertise.
Sponsored by Salesforce, our mandate is clear: develop decision tools—beginning with the Security Cost Function and SaaS Digital Twin Maintenance Index—that help IT leaders select and continuously optimize their mix of overlay and embedded AI.
Who’s Doing the Work
Dr. Abhishek Nagaraj joins as Principal Investigator. An Associate Professor at Berkeley Haas and a newly minted NBER Research Associate, Dr. Nagaraj brings the quantitative expertise needed to transform survey data and operational KPIs into actionable insights.
Chris Pearson joins Keenan Vision as Director of Research Programs. With 15+ years of hands-on Salesforce engineering leadership, most recently directing a 2,000-user multi-cloud deployment at Jostens, Chris integrates generative AI across every stage of the software development lifecycle, providing a critical practitioner’s perspective.
Two former Nagaraj students and recent Haas MBA graduates join as Research Fellows:
Stanley Choi brings experience from Capgemini Invent (management consulting) and Procore Technologies (GTM Strategy), where he guided executives through strategic initiatives and large-scale transformations. He excels at translating high-level insights into actionable strategies—bridging the gap between research and real-world adoption.
Alecia Wall rounds out the core team, bringing extensive channel strategy experience from Atlassian and Apple. Her perspective ensures our recommendations resonate not just with global systems integrators but also the fast-growing Managed Service Providers (MSPs) serving mid-market enterprises.
What We’ll Analyze
The overlay-vs-embedded decision impacts every dimension of enterprise AI deployment. Our research examines six critical pillars:
Each pillar presents distinct trade-offs. Risk & Governance favors embedded AI’s unified security model, while Delivery Velocity often accelerates with overlay flexibility. Data Architecture determines whether you centralize intelligence or federate it. Cost & Run-Ops shift dramatically between consumption-based overlay pricing and bundled embedded licensing. Scalability constraints differ when provisioning dedicated AI infrastructure versus leveraging existing SaaS capacity. User Adoption depends on whether AI feels native to familiar workflows or requires new interfaces.
Our Security Cost Function quantifies these trade-offs, transforming architectural philosophy into measurable business impact.
What We’ll Produce
Over the summer, our team will:
Interview a diverse cross-section of IT decision-makers, security architects, and line-of-business leaders who currently manage hybrid AI environments.
Quantify the incremental risks and operational friction introduced—or mitigated—by overlay vs. embedded deployments, using our Security Cost Function prototype.
Publish a working paper in early September, accompanied by an evaluator tool that CIOs can apply directly to their own organizational data.
Why It Matters Now—and How You Can Engage
Regulators are mobilizing. Budgets are contracting. C-suites demand results beyond slideware.
By grounding the overlay-vs-embedded debate in concrete metrics we aim to equip decision-makers with a scientifically robust vocabulary that accelerates responsible AI deployments and prevents today’s quiet erosion of talent from becoming tomorrow’s skills recession.
All interim findings, field notes, and prototype tools will be shared onWorkDifferentWithAI.com—our hub for forging the Nexus of Abundance. Here, we explore the full spectrum of AI-related workplace concerns: from thriving alongside Virtual Employees, confronting the Quiet Erosion of junior roles, to understanding how the New Intelligence Gradient reshapes labor economics.We’re also launching an exclusive community where researchers and practitioners can exchange insights, critique models, and test early iterations of tools like our Security Cost Function evaluator. Join us at WorkDifferentWithAI.com/join. Whether you’re ready to share implementation experiences, beta-test our dashboards, or simply lurk and learn, your participation is vital to making the Nexus of Abundance a tangible reality.
In 1999, armed with curiosity and a two-inch stack of slides, I stood at the threshold of Web 1.0 and saw commerce rewritten by browsers and bytes. Instinct said the future belonged to data flowing free and software sold as service. I trusted it, shared it, and saw it bloom.
Over two decades, instinct has been my compass—guiding through dot-com bursts, cloud revolutions, and the silent upheaval of SaaS. I’ve called the turns early, sometimes to skepticism, always to eventual clarity.
Today, that instinct tells me we’re at a new inflection: AI not merely as automation, but as abundant, exponential labor. It’s crystallized in what I call the Three Laws of VE Economics:
Infinite Scale: AI labor expands boundlessly, unconstrained by human limitations.
Cognitive Commoditization: Intelligence becomes universally accessible, transforming business at its core.
Exponential Learning: Every iteration fuels faster, deeper insights—accelerating economic transformation.
These laws aren’t guesses—they’re convictions, earned by decades of careful watching, analyzing, and anticipating.
Now, through this renewed Keenan Vision, I invite you to trust instinct again. To lean forward, glimpse the horizon, and shape the future before it shapes us.