RevOpsAF: The Podcast

Episode 87: Garbage In, Garbage Out — What Campaign Data Actually Needs to Do


Most RevOps and marketing operations teams know that bad data produces bad outcomes. But the "garbage in, garbage out" principle gets abstract fast — especially when it comes to campaign data, where the inputs span a half-dozen systems, a dozen stakeholders, and an ever-growing list of AI tools hungry for clean signal.

In this episode of RevOpsAF, Camela Thompson sits down with Drew Smith, founder and CEO of Attributa, a consulting firm specializing in marketing operations and marketing measurement. Drew brings years of experience untangling campaign data architecture across Salesforce, Marketo, and HubSpot environments — and he doesn't sugarcoat what it takes to make that data actually useful. The conversation covers the three foundational categories of campaign data every org needs to capture, the underutilized identifiers that can unify multi-channel campaigns, and why your data dictionary is now the single most important AI prerequisite you're probably ignoring.

What Does "Useful" Campaign Data Even Mean?

Before diving into what to capture, Drew reframes the entire conversation around a single question: is this data useful? And useful has a precise definition.

"The way that we define useful at Attributa is that it has — that particular piece of data or data set — can be turned into meaningful action." — Drew Smith

Usefulness is inherently audience-dependent. What a sales rep needs from campaign data (recency, context, a reason to follow up) is completely different from what a CMO needs, which is different again from what a demand gen manager needs for optimization. That means before you design any data capture, you have to ask who is consuming this data and what action do they need to take from it?

Drew uses email opens as the canonical example of a once-useful data point that has crossed the line into useless. Between email clients that block tracking pixels and others that auto-open emails, the signal is so polluted that chasing it wastes time and corrupts downstream systems. If a data point can't drive a meaningful action, the right call is to move on.

This framing connects directly to a challenge Camela highlights throughout the conversation: the language barrier between marketing and the rest of the go-to-market organization. When sales asks "which accounts engaged with this campaign," they mean accounts — not individual contacts, not email addresses, not anonymous web sessions. Campaign data that doesn't speak that language doesn't travel well beyond marketing's walls.

The Three Non-Negotiable Categories of Campaign Data

Drew identifies three foundational categories of campaign data that matter across every major use case, whether the goal is attribution and measurement, lead scoring and actioning, audience orchestration, or immediate follow-up sequencing.

1. Time Series Data

Timestamps are the most important piece of campaign data, full stop. Without knowing when an engagement happened, you can't do attribution (you don't know where in the funnel the interaction occurred), you can't trigger timely follow-up, and you can't measure speed-to-lead or any other time-sensitive conversion metric.

"Date timestamp time series data is quite possibly the most important part of campaign data that applies to every single use case you can possibly have." — Drew Smith
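As a minimal illustration of why timestamps are non-negotiable (the function and field names here are hypothetical, not from the episode), a metric like speed-to-lead is nothing more than arithmetic on two captured timestamps. If either one is missing or unreliable, the metric simply cannot exist:

```python
from datetime import datetime

def speed_to_lead_hours(engaged_at: datetime, first_touch_at: datetime) -> float:
    """Hours between a campaign engagement and the first sales follow-up.

    Only possible if both events carry a real timestamp; a missing or
    system-defaulted date makes this (and attribution) meaningless.
    """
    return (first_touch_at - engaged_at).total_seconds() / 3600

# A lead engaged at 9:00 and was first contacted at 13:00 the same day.
hours = speed_to_lead_hours(
    datetime(2024, 1, 15, 9, 0),
    datetime(2024, 1, 15, 13, 0),
)
```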

This sounds obvious — but as Camela notes with understandable irony, some of the systems RevOps teams work with every day make it genuinely difficult to surface when a contact interacted with a specific campaign. That structural friction is exactly why this data point deserves explicit attention rather than assumed presence.

2. Category-Level Data

Categorization is the mechanism that lets you group campaigns together for measurement, apply differentiated scoring logic, and make intelligent downstream decisions. Salesforce campaign type and Marketo program channel are the most familiar implementations of this concept, but HubSpot handles it differently, and many orgs have category data arriving from multiple sources that needs to be reconciled.

Drew's two principles for making categorization work: consistency and normalization.

Consistency means the same campaign type label is used the same way, every time, everywhere. Normalization means creating a single field that collects and harmonizes input from disparate sources — UTM medium, UTM source, Marketo program channel, Salesforce campaign type — so reporting and actioning can reference one clean field instead of four conflicting ones.

"If you have that in place, consistency and normalization, you really can't be beat when it comes to this particular piece of campaign data. If you're doing those two things well, you're not gonna have a lot of challenges with categorization." — Drew Smith
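One way to picture the normalization principle in code (the mapping values and field names below are illustrative, not a prescribed taxonomy): raw values from UTM medium, UTM source, Marketo program channel, and Salesforce campaign type all collapse into a single clean channel field that reporting and actioning can reference.

```python
# Hypothetical lookup table: every raw value a source system might send
# maps to one normalized channel label. The specific values are examples.
CHANNEL_MAP = {
    "cpc": "Paid Search",
    "paid-search": "Paid Search",
    "email": "Email",
    "Email Blast": "Email",
    "tradeshow": "Event",
    "Trade Show": "Event",
}

def normalize_channel(*raw_values: str) -> str:
    """Return the normalized channel for the first recognized raw value.

    Callers pass values in priority order, e.g. UTM medium first,
    then Marketo program channel, then Salesforce campaign type.
    """
    for value in raw_values:
        if value and value in CHANNEL_MAP:
            return CHANNEL_MAP[value]
    return "Unknown"
```

The payoff of the single normalized field is that every report and every routing rule keys off one value instead of reconciling four conflicting ones at query time.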

3. Method of Engagement

This is the "what did they actually do" layer — and it's where a lot of orgs leave value on the table. Drew draws on his seven years as an event manager to illustrate the stakes. At a trade show, there's a meaningful difference between someone who wanders by the booth, holds out their badge for a scan, and grabs a free t-shirt — what he calls the trick-or-treaters — and someone who stays for a 15-minute conversation. Both show up as booth visits. Only one should route to sales.

The follow-up action for each is completely different. The trick-or-treater goes into a nurture sequence. The 15-minute conversation might auto-qualify as a sales-ready lead. Without capturing the method of engagement, you can't make that distinction.

The same logic applies to webinars (live attendance vs. on-demand view), content (downloaded vs. scrolled past), and any other campaign format where the depth of engagement varies. Most MAP systems have a status field that captures this — but not all, and the field isn't always populated consistently.
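A sketch of how method of engagement drives routing, using Drew's trade-show example (the status labels and actions here are hypothetical placeholders for whatever your MAP's status field actually contains):

```python
# Illustrative routing rules keyed on method of engagement for an
# event campaign. In practice these values would come from the MAP's
# campaign member status field.
FOLLOW_UP_RULES = {
    "badge_scan": "nurture_sequence",        # the trick-or-treater
    "booth_conversation": "route_to_sales",  # the 15-minute conversation
    "demo_attended": "route_to_sales",
}

def follow_up_action(engagement_method: str) -> str:
    """Map depth of engagement to the next action; default to nurture."""
    return FOLLOW_UP_RULES.get(engagement_method, "nurture_sequence")
```

Without the method-of-engagement field populated, every record falls through to the same default and the distinction Drew describes is lost.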

The Underutilized Identifier That Unifies Multi-Channel Campaigns

Beyond the three core categories, Drew flags two additional data points that are consistently underused.

The first: a unique campaign identifier that spans every tactic within a multi-channel campaign. When you launch a new product, you're not just sending one email. You're running LinkedIn ads, Google display, webinars, email sequences, in-person events, and organic social — all tied to the same initiative. Without a common identifier, each tactic lives as a separate data island, and you can never see the campaign as a whole.

"You need some sort of unique identifier across all of those individual tactics within that campaign that allows you to unify that campaign as one holistic campaign. So you can see it as more than the sum of the parts." — Drew Smith

The identifier itself doesn't matter much — it could be a root campaign name that gets embedded in every UTM, Salesforce campaign, and Marketo program, or it could be a project ID from your project management system. What matters is that you pick one approach and apply it consistently. Drew specifically steers teams away from Salesforce's parent-child campaign hierarchy for this purpose, and Camela seconds that advice with hard-won emphasis: Salesforce reporting simply cannot reliably traverse the object relationships that hierarchy creates beyond one level of nesting. A consistent text-based root value does the same job with far less infrastructure and maintenance burden.
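To make the root-campaign approach concrete, here is a minimal sketch of embedding one root value in every tactic's UTM parameters (the URL, campaign name, and helper are invented for illustration):

```python
from urllib.parse import urlencode

def tag_url(base_url: str, root_campaign: str, medium: str, source: str) -> str:
    """Build a tracked URL that carries the shared root campaign name.

    The same root_campaign value would also be embedded in the
    Salesforce campaign name and Marketo program name, so every
    tactic in the initiative can be unified downstream.
    """
    params = urlencode({
        "utm_campaign": root_campaign,  # identical root value everywhere
        "utm_medium": medium,
        "utm_source": source,
    })
    return f"{base_url}?{params}"

url = tag_url("https://example.com/launch", "2025-q3-product-launch",
              "email", "newsletter")
```

Filtering any report on that single root value then rolls every tactic up into one holistic campaign view, with no object hierarchy to traverse.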

The second underutilized data point: account-level information tied to the person who engaged. The moment campaign data leaves marketing's walls, every stakeholder immediately asks "which accounts engaged?" — not "which contacts?" Building account context into campaign data from the start (whether that's company name, a target account list membership flag, or an account segment) saves time and prevents the inevitable translation work that happens when account-based teams try to consume contact-level marketing data.

What AI Actually Needs from Your Campaign Data

The conversation takes a sharp turn toward AI, and Drew's perspective here is grounded rather than hype-driven. At Attributa, the team is actively building an AI agent focused on attribution and measurement, so this isn't theoretical.

The core principle: garbage in, garbage out doesn't just apply to dashboards and reports. It applies to AI with multiplied consequences. When a model encounters messy, inconsistent data, it doesn't error out gracefully — it reasons through to an output anyway, and that output may look plausible while being completely wrong.

"That consistency principle is exponentially more important when it comes to AI because if, regardless of whether you're using AI for attribution measurement or AI for actioning or whatever, that AI has to have good, reliable data. Otherwise it's going to make mistakes, and that's not gonna be the fault of your AI agent. That's gonna be the fault of your data." — Drew Smith

This connects to a broader point about how AI agents handle uncertainty. The better-designed ones are explicitly instructed: if you don't have the information required to answer this question, say so — don't fabricate something. Hallucinations are what happen when an AI is trained to produce output at any cost. The antidote is clear, complete, well-structured input data so the model never has to fill gaps with assumptions.

Camela offers a vivid example of AI reasoning going sideways in a deduplication context: an AI analytics platform, when asked to identify duplicate contacts for merging, returned every contact with a blank email address as a single merge candidate. The logic was internally consistent — they all shared the same (empty) value — but the outcome would have been catastrophic if applied to production data. The fix required explicit guardrails telling the system to skip records with null email values. Without a data dictionary documenting field definitions and edge cases, that kind of instruction is impossible to provide systematically.
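The guardrail Camela describes is easy to express in code. This sketch (field names and structure are illustrative) groups contacts by email as merge candidates while explicitly skipping blank or null values, so empty emails are never treated as matching each other:

```python
from collections import defaultdict

def merge_candidates(contacts: list[dict]) -> dict[str, list[dict]]:
    """Group contacts by normalized email, proposing merges only for
    groups of two or more. Blank/null emails are skipped entirely:
    an empty value is the absence of a key, not a shared key."""
    groups = defaultdict(list)
    for contact in contacts:
        email = (contact.get("email") or "").strip().lower()
        if not email:
            continue  # guardrail: never merge on an empty identifier
        groups[email].append(contact)
    return {email: dupes for email, dupes in groups.items() if len(dupes) > 1}
```

An AI agent given this rule as an explicit instruction, ideally sourced from the data dictionary's definition of the email field, avoids exactly the catastrophic all-blanks merge Camela describes.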

The episode's most emphatic moment:

"You have to have a data dictionary. If you don't have a data dictionary, you cannot use AI for stuff like this. You can, but it's gonna fail more frequently than it succeeds. You have to have a rock solid data dictionary for your campaign data if it's gonna be attached to anything AI related." — Drew Smith

This is the prerequisite that most AI roadmaps skip. For more on what it takes to make your revenue stack AI-ready, the RevOps Co-op blog post on AI readiness for quote-to-cash systems is a useful companion read — the same data foundation principles apply across the revenue stack.
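What a campaign-data dictionary entry might minimally contain is worth making concrete. This structure is an illustrative sketch, not a standard: enough definition per field that an AI agent can be told what the field means, what values are legal, and when not to trust it.

```python
# Hypothetical data-dictionary entries; field names, allowed values,
# and guardrails are examples, not a prescribed schema.
DATA_DICTIONARY = {
    "email": {
        "definition": "Primary contact email; intended to be unique per person.",
        "nullable": True,
        "ai_guardrails": ["Never use null or blank values as a merge key."],
    },
    "campaign_member_status": {
        "definition": "Method of engagement for this campaign membership.",
        "allowed_values": ["Invited", "Attended", "Attended On-Demand"],
        "ai_guardrails": ["Treat a missing status as the lowest engagement tier."],
    },
}
```

Each entry doubles as documentation for humans and as explicit instruction material for any agent touching the field, which is exactly the systematic guardrail capability Drew says AI work depends on.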

The broader implication: if AI is anywhere on your roadmap, data optimization comes first. Not concurrently, not after the first agent fails — before. As Drew puts it: "Don't start anything with AI until you figured out your data nightmares."

This theme echoes what we've heard in earlier conversations on the podcast — Episode 50: Thinking of AI? Think Data First makes the same argument at the RevOps infrastructure level, and Episode 83: Why You Should Stop "Doing AI" and Start Solving Problems takes it further into implementation strategy.

Where the Systems Make It Hard

No conversation about campaign data architecture would be complete without a frank accounting of how the systems themselves create obstacles.

Salesforce earns a few specific critiques. The lead object remains a persistent friction point for account-level campaign reporting — leads don't have account relationships the way contacts do, which means account attribution requires extra logic or object-relationship workarounds. Custom objects are another landmine: they're architecturally appealing for storing complex data, but they hit report object limits quickly, they're difficult to use for actioning in third-party systems, and they frequently get used to recreate functionality that standard objects already provide. Drew and Camela trade examples of orgs that built custom objects for opportunity contact roles and even for opportunities themselves — decisions that created enormous technical debt for zero additional capability.

HubSpot presents a different set of challenges. The campaign object has gone through multiple architectural iterations, each with different data structures and different implications for Salesforce sync. The net effect, in Camela's experience, is that surfacing the date a contact interacted with a specific campaign often requires reverse-engineering from form interactions and list loads rather than reading it directly from a campaign membership record — a fundamental gap for time-series attribution. The suspicion Drew voices — that HubSpot's data architecture is partly designed to keep data inside the HubSpot ecosystem — resonates with the pattern of friction that emerges when teams try to use HubSpot campaign data in external measurement tools.

The practical guidance from both: resist the urge to over-engineer. A consistent picklist field on an existing standard object almost always outperforms a custom object that accomplishes the same thing. Simplicity scales; complexity doesn't. For anyone currently wrestling with their CRM data architecture, Episode 69: Should You Blow Up Your CRM? offers a useful framework for evaluating when a structural overhaul is genuinely necessary versus when better execution on existing objects is the real answer.

The Next Generation of Marketing Automation

Drew closes with an observation that any RevOps operator watching the market should file away: the marketing automation landscape is on the verge of a structural shift. Platforms that are AI-native from the ground up — rather than AI features bolted onto legacy data architectures — are emerging and beginning to gain traction. He mentions Minga as one example worth watching, along with the as-yet-unnamed platform being built by Marketo co-founder Jon Miller.

The critical question for these new entrants isn't whether they'll include AI features — they will. It's whether they'll rethink campaign data structure in a way that learns from the problems endemic to first-generation MAPs. Time-series data that's actually surfaced cleanly. Category normalization that's built-in rather than bolted on. Account linkage that doesn't require workarounds. Method-of-engagement tracking that doesn't demand custom development.

The same evolution is likely coming for CRMs. AI-native CRM platforms are beginning to emerge, and the orgs that have already done the work of building clean, well-documented campaign data infrastructure will be far better positioned to take advantage of whatever those platforms enable.

Key Takeaways for RevOps and Marketing Operations Leaders

  • Usefulness defines value. Every campaign data point should be evaluated against a specific use case and audience — data that can't drive meaningful action isn't worth capturing or maintaining.
  • Time series data is foundational. Timestamps apply to every use case: attribution, actioning, sequencing, and speed-to-lead measurement. If your systems make this hard to access, that's a structural problem worth solving.
  • Consistency and normalization beat clever architecture. A root campaign name applied consistently across all tactics — in UTMs, Salesforce campaigns, and MAP programs — outperforms parent-child hierarchies and complex custom objects every time.
  • Method of engagement is a lead routing input, not a nice-to-have. The difference between a trick-or-treater and a 15-minute booth conversation should determine follow-up action, and that means capturing it explicitly.
  • A data dictionary is a prerequisite for AI, not an afterthought. If an AI agent can't be told what a field means and how to use it, it will make assumptions — and those assumptions will be wrong in ways that are difficult to detect.
  • Don't start building AI until you've fixed your data. The risk profile of AI operating on messy campaign data is substantially higher than the risk profile of a messy dashboard. Fix the foundation first.

For more on the data infrastructure questions that underpin this conversation, the RevOps Co-op post on why most revenue stacks aren't ready for AI is worth a read — as is Episode 45: Systems + Data = Optimal Operations for a broader look at how data architecture decisions ripple across the revenue org. And for anyone managing attribution specifically, the post on back-to-basics campaign configuration for optimal analytics covers the tactical execution layer that makes everything Drew described actually work in practice.

Looking for more great content?

Check out our blog, join our community and subscribe to our YouTube Channel for more insights.

Sponsored by HG Insights — real technographics, real spend intelligence, and real market signals so your territory design, segmentation, and ICP aren't based on guesswork.
