
Revenue operations teams are increasingly leaning on AI to answer the questions that used to require a data analyst, a BI tool, and a two-day turnaround. Which accounts are at risk? Which deals are most likely to close? Where is the pipeline stalling? The problem is that the LLMs answering those questions are doing something far messier than most teams realize — and the fact that the answers look plausible is exactly what makes it dangerous.
In a recent RevOps Co-op webinar, Eli Portnoy and Rafaella Fontes of BackEngine joined moderator Camela Thompson to break down how large language models (LLMs) actually retrieve and process data, why the outputs so often mislead revenue teams, and what the architectural fix looks like. They also walked through five practical Claude-based workflows that show what's possible once the data problem is solved.
The starting point for understanding why AI fails RevOps teams is deceptively simple: LLMs are not trained on your data. They are trained on public data, which makes them excellent at answering general questions — medical queries, sports questions, historical facts — but fundamentally unprepared to answer anything about your business without being given that context explicitly.
The intuitive response is to add context. Connect your CRM, your call recording tool, your support tickets. Most teams do this via the Model Context Protocol (MCP), an open protocol for giving LLMs authenticated access to external data sources. Connect your HubSpot or Salesforce instance via MCP, and in theory your LLM can now see your pipeline, your accounts, and your activity history.
But here is where the session's central argument takes hold: access is not the same as understanding. And the fact that the LLM will give you a confident, well-formatted, plausible-sounding answer does not mean it gave you the right one.
"LLMs are incredibly good at giving us answers that satisfy us. That is what they are trained to do. That is the entirety of how they are built." — Eli Portnoy
This is the distinction that matters most for RevOps teams adopting AI tooling. The feedback loop is broken by design. Team members ask questions, get answers that feel accurate, and report back that the tool is "awesome" — while the underlying analysis may be built on a fraction of the relevant data.
To explain what happens under the hood when an LLM queries a connected data source, Portnoy offered an analogy that makes the mechanics visceral.
Imagine being dropped into the Library of Congress and asked to find the best history books on 16th-century art — with no catalog, no index, no ratings system, no guide. You cannot read millions of books. You have no defined concept of "best." So you wander into a section that looks vaguely like history, glance at some spines, pull a small sample, read those, and return with an answer. The answer is based on real books. It just isn't based on all the relevant books — or even most of them.
That is precisely what LLMs do when they access a CRM or data source via MCP.
An LLM has a context window — a finite amount of information it can hold in memory at once. Even at 100,000 or 1 million tokens, the entire corpus of a typical CRM, combined with call transcripts, emails, and product usage data, exceeds what can be meaningfully processed. So the LLM makes educated guesses: it samples, it skims the spines of a small subset of records, and it constructs an answer from that subset.
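To make the scale mismatch concrete, here is a rough back-of-envelope sketch. Every figure in it (record counts, average tokens per record) is an illustrative assumption rather than a number from the session; only the 1 million token window comes from the text above.

```python
# Back-of-envelope estimate of how much of a revenue corpus fits in one
# context window. All corpus figures are illustrative assumptions.
CONTEXT_WINDOW_TOKENS = 1_000_000  # the larger window size mentioned above

# Hypothetical corpus for a mid-sized go-to-market team.
corpus = {
    "crm_records": {"count": 50_000, "avg_tokens": 300},
    "call_transcripts": {"count": 5_000, "avg_tokens": 8_000},
    "emails": {"count": 200_000, "avg_tokens": 250},
}


def total_tokens(sources):
    """Sum the estimated token count across every source."""
    return sum(s["count"] * s["avg_tokens"] for s in sources.values())


def coverage(window, sources):
    """Fraction of the corpus that fits in a single window, capped at 1.0."""
    return min(1.0, window / total_tokens(sources))


total = total_tokens(corpus)
print(f"corpus tokens: {total:,}")
print(f"share that fits in one window: {coverage(CONTEXT_WINDOW_TOKENS, corpus):.1%}")
```

Under these assumed volumes, a single window holds roughly one percent of the corpus, which is why the model has to sample rather than read everything.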
"What it's telling you is true, it's just not the right answer to your question. It might tell you that this one account is at risk. It might tell you that this one product feature matters. It might tell you that this one rep did this one thing, and those are all true. But if you go back to the question — which are my accounts at risk? — they're not actually the right answer because it didn't look at ninety percent of the data." — Eli Portnoy
The downstream consequences compound. Two team members asking the same question against the same connected data source will receive meaningfully different answers, because each query works from a different sample of the records. Reports diverge. Conclusions contradict. And the organization develops what Portnoy called a "shadow data" problem: a growing pool of AI-generated outputs that look authoritative but were built on incomplete foundations. This is a problem the RevOps community has been wrestling with since long before LLMs entered the picture, as this exploration of fixing CRM data quality issues makes clear.
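The divergence described above can be illustrated with a small simulation. This is a simplified sketch, not a model of how any particular LLM chooses records: it assumes uniform random sampling, a synthetic account list, and hypothetical names (`make_accounts`, `sampled_answer`, and the users `alice` and `bob`).

```python
import random


def make_accounts(n=1000, at_risk_rate=0.1, seed=0):
    """Build a synthetic account list; roughly 10% are flagged at risk."""
    rng = random.Random(seed)
    return [{"id": i, "at_risk": rng.random() < at_risk_rate} for i in range(n)]


def full_answer(accounts):
    """At-risk account IDs when every record is examined."""
    return {a["id"] for a in accounts if a["at_risk"]}


def sampled_answer(accounts, sample_size, seed):
    """At-risk account IDs found by looking at only a random sample."""
    rng = random.Random(seed)
    sample = rng.sample(accounts, sample_size)
    return {a["id"] for a in sample if a["at_risk"]}


accounts = make_accounts()
truth = full_answer(accounts)

# Two users ask the same question; each query sees a different 10% sample.
alice = sampled_answer(accounts, sample_size=100, seed=1)
bob = sampled_answer(accounts, sample_size=100, seed=2)

print("at risk overall:", len(truth))
print("alice found:", len(alice), "bob found:", len(bob))
print("found by both:", len(alice & bob))
```

Every ID each user gets back is a genuinely at-risk account, so both answers are "true", yet each covers only a slice of the full list and the two slices barely overlap.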
One practical diagnostic Portnoy recommended: after running any query in Claude or ChatGPT against a connected data source, ask the LLM directly how it found the data. What did it actually look at? What sources did it read? The answer will frequently reveal the narrow sample it worked from — and close the gap between how confident the response felt and how much of the data it actually covered.
The technical limitation becomes a RevOps-specific problem when you consider how revenue teams use AI-generated outputs. This isn't a team writing blog posts or summarizing meeting notes. These are teams making pipeline calls, flagging at-risk accounts, and preparing board-level reporting on forecast accuracy.
Thompson, drawing on experience across multiple organizations, noted that the myth-busting problem is not new — it predates LLMs entirely. When everyone runs their own Salesforce reports with slightly different filters and date ranges, conflicting numbers proliferate. LLMs with open CRM access accelerate and scale that problem dramatically.
"My least favorite exercise in revenue operations was myth-busting across teams, and that started well before LLMs when everybody was running their own Salesforce reports. And now it's just..." — Camela Thompson
The challenge of funnel leaks and data inconsistency is fundamentally a definitions problem as much as a data problem. Thompson used the example of "lead" — a term that means different things depending on whether you're looking at the Salesforce lead object, marketing qualified leads, or raw inquiries. If an LLM hits a CRM without a pre-defined, locked-down concept of what a lead means in that organization, the answer it returns to "how many leads are in the system?" could vary wildly depending on which records it samples and how it interprets field values. Designing lead stages for B2B is hard enough when humans are aligned; it becomes substantially harder when an LLM is interpolating definitions from inconsistent data.
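A short sketch shows how much the count can move depending on which definition is applied. The record fields, status values, and the three definitions below are hypothetical, chosen only to mirror the Salesforce lead object, marketing qualified lead, and raw inquiry readings mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    object_type: str  # CRM object, e.g. "Lead" or "Contact" (assumed values)
    status: str       # lifecycle status, e.g. "inquiry", "mql", "sql" (assumed)


RECORDS = [
    Record("Lead", "inquiry"),
    Record("Lead", "mql"),
    Record("Lead", "mql"),
    Record("Lead", "sql"),
    Record("Contact", "mql"),
    Record("Contact", "sql"),
]

# Three competing answers to the question "what is a lead?"
LEAD_DEFINITIONS = {
    "crm_lead_object": lambda r: r.object_type == "Lead",
    "marketing_qualified": lambda r: r.status == "mql",
    "raw_inquiry": lambda r: r.status == "inquiry",
}


def count_leads(records, definition):
    """Count records that match one named definition of a lead."""
    matches = LEAD_DEFINITIONS[definition]
    return sum(1 for r in records if matches(r))


counts = {name: count_leads(RECORDS, name) for name in LEAD_DEFINITIONS}
print(counts)
# {'crm_lead_object': 4, 'marketing_qualified': 3, 'raw_inquiry': 1}
```

The same six records yield three different totals. Locking one definition into the data layer means the question is answered the same way no matter who asks it.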
The good news is that the fix is architecturally straightforward, even if it requires meaningful investment to execute. The Library of Congress analogy resolves cleanly: what the person searching the library needs is an index. A catalog that joins all the books, assigns ratings, tags them by topic, and allows surgical retrieval of exactly the right subset.
For LLMs operating on revenue data, that index is a pre-processed, joined data layer that sits between the raw data sources (CRM, call recordings, email, product usage) and the LLM itself.
"The right way to build it is to pre-process every single data point, join across the dimensions that matter, and build a single surface that LLMs can access." — Eli Portnoy
Practically, this means the joining work — stitching together an account's history across calls, CRM notes, emails, and product usage — happens at the data layer, not in the LLM's context window. Pre-processing for health scores, deal risk, speaker attribution, and other analytical dimensions happens at the data layer. Permissioning and governance — who gets access to which records, which data is authorized to hit the LLM — happens at the data layer.
The result is that when a RevOps leader asks "which accounts are at risk?", the LLM isn't wandering through a library. It's querying an index that has already done the joining, already computed the health scores, and can return a precise, consistent, complete answer every time.
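As a minimal sketch of that idea, the code below joins three hypothetical sources by account ID ahead of time and precomputes a risk flag, so the question is answered from the index rather than from a sample. The field names, the sample data, and the toy risk rule (negative average call sentiment plus low usage) are all assumptions for illustration, not BackEngine's method.

```python
from collections import defaultdict

# Hypothetical raw records from three separate systems.
crm = [
    {"account_id": "a1", "name": "Acme", "arr": 120_000},
    {"account_id": "a2", "name": "Globex", "arr": 40_000},
]
calls = [
    {"account_id": "a1", "sentiment": -0.6},
    {"account_id": "a1", "sentiment": -0.2},
    {"account_id": "a2", "sentiment": 0.5},
]
usage = [
    {"account_id": "a1", "weekly_active_users": 3},
    {"account_id": "a2", "weekly_active_users": 42},
]


def build_account_index(crm, calls, usage, min_users=10):
    """Join every source by account_id and precompute a toy risk flag."""
    sentiments = defaultdict(list)
    for call in calls:
        sentiments[call["account_id"]].append(call["sentiment"])
    users = {u["account_id"]: u["weekly_active_users"] for u in usage}

    index = {}
    for acct in crm:
        aid = acct["account_id"]
        scores = sentiments.get(aid, [])
        avg_sentiment = sum(scores) / len(scores) if scores else 0.0
        wau = users.get(aid, 0)
        index[aid] = {
            **acct,
            "avg_call_sentiment": avg_sentiment,
            "weekly_active_users": wau,
            # Toy rule (an assumption): unhappy calls and low usage.
            "at_risk": avg_sentiment < 0 and wau < min_users,
        }
    return index


def accounts_at_risk(index):
    """Answer the question from the full index, not from a sample."""
    return sorted(a["name"] for a in index.values() if a["at_risk"])


index = build_account_index(crm, calls, usage)
print(accounts_at_risk(index))  # ['Acme']
```

Because the join and the scoring run once over every record, the same question returns the same list each time it is asked.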
Portnoy also noted that this architecture directly solves the MCP security concerns that prevent many organizations — particularly larger enterprises and regulated industries like healthcare — from connecting sensitive data to LLMs at all. With a permissioned middle layer, you can enforce role-based access controls before data ever reaches the LLM, rather than relying on the LLM itself to respect access boundaries it wasn't designed to manage. This connects directly to the broader conversation around why most revenue stacks aren't ready for AI — the limiting factor is rarely the AI model itself.
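One way to picture enforcement in the middle layer is a filter that runs before any record is placed into a prompt. The sketch below is an assumption-heavy illustration: the roles, tiers, and function names (`records_for_role`, `build_prompt`) are hypothetical and do not describe MCP or any vendor's permission model.

```python
# Hypothetical mapping of roles to the account tiers they may see.
ROLE_ALLOWED_TIERS = {
    "ae": {"smb"},
    "revops_admin": {"smb", "enterprise"},
}

ACCOUNTS = [
    {"name": "Acme", "tier": "enterprise", "health": "C"},
    {"name": "Initech", "tier": "smb", "health": "A"},
]


def records_for_role(role, accounts):
    """Filter records BEFORE they are placed into any prompt."""
    allowed = ROLE_ALLOWED_TIERS.get(role, set())  # unknown roles see nothing
    return [a for a in accounts if a["tier"] in allowed]


def build_prompt(role, question, accounts):
    """Assemble the prompt context from permitted records only."""
    visible = records_for_role(role, accounts)
    context = "\n".join(
        f"{a['name']} ({a['tier']}): health {a['health']}" for a in visible
    )
    return f"Context:\n{context}\n\nQuestion: {question}"


print(build_prompt("ae", "Which accounts are at risk?", ACCOUNTS))
```

The model never receives the enterprise record for the `ae` role, so there is no access boundary left for the model itself to honor or ignore.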
The session took on the question of whether to build this layer in-house or buy it, and the answer is more nuanced than a simple recommendation. Portnoy laid out four variables to evaluate.
Thompson added the dimension most often underweighted in these conversations: the maintenance burden. Building a data layer is building a product. It has to be owned, updated, and supported by someone with the skills to do it. For lean RevOps teams or fractional operators, that calculation often tilts decisively toward a vendor solution.
"This is a product you're building and maintaining if you take it on yourself. So it's super, super important to be really realistic about your own bandwidth and the skill sets you have or your team has." — Camela Thompson
BackEngine is purpose-built to serve as this middle layer, with pre-built connectors for common GTM tools, pre-processed graph representations of account and deal data, and a permissioning model designed for enterprise environments.
With the architectural foundation established, Portnoy walked through five workflows his team has built using Claude's Cowork environment — all of which become reliable only when the underlying data is properly structured.
1. Real-Time Account Dashboard
A live artifact dashboard showing every active account as a scrollable list, with customer tier, revenue, signal count, and a letter-grade health score. Pre-LLMs, building a comparable health score view required a dedicated customer success platform (Gainsight-level investment) and months of configuration. With a properly structured data layer, it's a prompt and a few iterations of UI refinement. The dashboard updates continuously and can be reshaped in real time as reporting needs change.
2. Automated Weekly Status Updates
A scheduled Friday prompt that reviews email, calendar, Notion, Slack, and Google Drive to generate a 50-word summary of what was shipped and what was completed. Useful personally as a reflection tool; equally useful as a lightweight async management update. Fontes noted that the value isn't just the output — it's the accountability loop of having a system that tracks your commitments over time without requiring you to maintain it manually.
3. One-on-Ones That Don't Suck
Most one-on-ones become status updates because neither party has done the prep work to make them strategic conversations. This workflow pulls goal progress from connected systems and returns a green/yellow/red/black status on each active workstream before the meeting starts. Managers come in knowing what's on track and what needs help — which means the conversation can focus entirely on the latter rather than spending half the time on status reporting. This kind of structured preparation connects directly to the challenge of moving from tactical operations to strategic impact.
4. Open Loops Cockpit
A dashboard of every commitment made to customers or prospects that hasn't been resolved yet. The prompt instructs the LLM to compare what was promised in correspondence against what has been delivered, then surface the gaps. For anyone customer-facing, this is a systematic replacement for the mental overhead of tracking open commitments — and a meaningful reduction in the relationship risk that comes from letting things fall through.
5. Next Best Move
The workflow Portnoy personally values most: a Monday morning brief that pulls the ten highest-priority prospects for the week, synthesizes their interaction history, conducts web research on recent developments at each company, and drafts both a prioritized action plan and the corresponding outreach emails. The prep work that previously required manually cycling through CRM, email, and browser research is compressed into an automated output delivered before the week begins.
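The comparison at the heart of the Open Loops Cockpit (workflow 4 above) reduces to a difference between what was promised and what was delivered. The sketch below assumes the commitments have already been extracted from correspondence into structured records, which in the workflow itself is the LLM's job; the field names, dates, and sample data are hypothetical.

```python
from datetime import date

# Hypothetical commitments already extracted from correspondence.
promised = [
    {"account": "Acme", "item": "send security questionnaire", "due": date(2024, 5, 3)},
    {"account": "Acme", "item": "share pricing proposal", "due": date(2024, 5, 10)},
    {"account": "Globex", "item": "schedule onboarding call", "due": date(2024, 5, 6)},
]

# Commitments confirmed as delivered, keyed by (account, item).
delivered = {("Acme", "send security questionnaire")}


def open_loops(promised, delivered, today):
    """Return undelivered commitments, earliest due date first."""
    loops = [
        {**p, "overdue": p["due"] < today}
        for p in promised
        if (p["account"], p["item"]) not in delivered
    ]
    return sorted(loops, key=lambda p: p["due"])


for loop in open_loops(promised, delivered, today=date(2024, 5, 8)):
    flag = "OVERDUE" if loop["overdue"] else "open"
    print(f"[{flag}] {loop['account']}: {loop['item']} (due {loop['due']})")
```

With these sample records and the chosen date, the Globex onboarding call surfaces first as overdue and the Acme pricing proposal follows as still open.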
One of the session's most counterintuitive insights came from Portnoy's analysis of over 400 job descriptions for AI enablement, AI transformation, and AI strategy roles. The working assumption going in was that these roles would be heavily technical — focused on building workflows, automations, and integrations. The reality was almost the opposite.
"What companies are hiring for are people who know how to do change management, who know how to influence people to start using tools and change their habits and create new ways of doing things." — Eli Portnoy
The reason agents and AI workflows fail to stick isn't primarily technical. It's behavioral. People are busy. They're accustomed to existing processes. They're skeptical of tools that have disappointed them before. Even when the technology is genuinely better, adoption requires sustained investment in training, enablement, and the organizational patience to let new habits form.
This is the same lesson that shows up in every RevOps change management conversation: the systems work is table stakes, but the human work is what determines whether it actually lands. Portnoy's read is that the organizations seeing real gains from AI are the ones treating it as an organizational capability problem, not a tooling problem — and investing accordingly. The RevOps archetype that bridges the tactical and strategic is exactly the profile these organizations are finding most valuable.
The session made one thing clear: the gap between AI that feels powerful and AI that is reliably useful comes down to the work done before the question is ever asked. For RevOps teams, that means treating the data layer as infrastructure — not an afterthought — and investing in the organizational change that lets people actually use it.
Learn more about how BackEngine approaches the pre-processed data layer for GTM teams, and explore how it fits into your existing revenue stack.