Self-Reported Reattribution

Attribution measures clicks. But entire categories of marketing — podcasts, TV, word of mouth, out-of-home — work through influence, not clicks. They are invisible to every attribution model. Self-reported reattribution makes them visible.

12 min read | Updated March 2026
Illustration: a conversion recorded as Source: Direct / None is reattributed to YouTube / Paid Social based on the user's answer to "How did you hear about us?"

01. The Visibility Gap

Click-based attribution — as described in the cross-channel attribution methodology — tracks what it can see: ad clicks, organic search clicks, referral links. Every touchpoint in the journey needs a click with a URL parameter. If a channel does not produce a trackable click, it does not exist in your attribution data.

Entire categories of marketing work without producing trackable clicks. Podcasts, TV, word of mouth, out-of-home, influencer content, radio, events — these channels create awareness and drive action, but the action they drive is almost never a direct click. The user hears about you on a podcast, then searches your brand name. They see a billboard, then type your URL directly.

Other channels fall in between. YouTube, TikTok, and AI chat (ChatGPT, Perplexity) generate some trackable clicks, but clicks capture only a fraction of their influence. A user watches a YouTube review, then searches your brand name a week later. They read a ChatGPT recommendation, then navigate directly to your site. A referral header may exist, but the full influence is invisible to click-based attribution.

Attribution records the brand search click, the direct visit, the organic landing. The channel that actually introduced the user is invisible. The credit flows to the last trackable touchpoint — which is almost always brand search, organic brand, or direct.

Channel                 Without SRA   With SRA        Δ
Direct / None                   847        339     -60%
Google / Organic                532        194     -64%
Meta / Paid                     363        435     +20%
Google / Paid Search            290        290       0%
Email                           121         48     -60%
YouTube                          73        194    +166%
TikTok                           48        145    +202%
AI Chat                          48        121    +152%
Other                            97         48     -51%
New channels
Word of Mouth                     –        266      NEW
Podcast                           –        194      NEW
TV / Radio                        –         97      NEW
OOH                               –         73      NEW
Influencer                        –         73      NEW

Channel distribution before and after SRA correction. Direct / None drops from 35% to 14% as five new awareness channels appear.
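The headline shift in the caption can be checked directly from the table's counts. A quick sketch (channel names and conversion counts copied from the table above; 13.5% is the "~14%" in the caption after rounding):

```python
# Conversion counts from the table above (before vs. after SRA correction).
before = {"Direct / None": 847, "Google / Organic": 532, "Meta / Paid": 363,
          "Google / Paid Search": 290, "Email": 121, "YouTube": 73,
          "TikTok": 48, "AI Chat": 48, "Other": 97}
after = dict(before, **{"Direct / None": 339, "Google / Organic": 194,
                        "Meta / Paid": 435, "Email": 48, "YouTube": 194,
                        "TikTok": 145, "AI Chat": 121, "Other": 48,
                        # Channels invisible to click-based attribution:
                        "Word of Mouth": 266, "Podcast": 194,
                        "TV / Radio": 97, "OOH": 73, "Influencer": 73})

def share(counts, channel):
    """Channel's share of total conversions."""
    return counts[channel] / sum(counts.values())

print(f"{share(before, 'Direct / None'):.1%}")  # → 35.0%
print(f"{share(after, 'Direct / None'):.1%}")   # → 13.5%
```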

The absorption problem

Brand search, organic brand, and direct traffic function as sponges. They absorb credit from every awareness channel that does not produce a click. A TV campaign that drives thousands of brand searches shows up as "Google Ads — Brand Search" in attribution. A podcast mention that sends listeners to your site shows up as "Direct / None." Word of mouth — your most valuable channel — is completely invisible.

The distortion compounds in two directions. Awareness channels appear to produce zero return, making them impossible to justify in budget discussions. Meanwhile, brand search and direct show artificially high performance, absorbing credit they did not earn. Budget flows toward the channels that are easiest to measure, not the ones that are most effective.

Why standalone SRA is not enough

The obvious fix — ask the customer "How did you hear about us?" — is the right idea but incomplete on its own. Standalone self-reported attribution has structural limitations:

  • Fragmented coverage — even at 85-95% response rates, some users do not answer. The gap is not random — users in a hurry, mobile users, and returning customers skip the question more often.
  • Response bias — not all channels have equal response rates. Users who discovered you through memorable channels (a friend, a specific podcast) recall and report more reliably than those who saw a display ad they have already forgotten.
  • Channel-level only — a user can tell you "I heard about you on a podcast," but not which campaign or ad group drove the visit. SRA provides channel attribution, not campaign-level granularity.
  • The triangulation fallacy — a common response is to average SRA with click-based attribution, hoping the truth lies in the middle. It does not. Averaging two different measurement methods with different biases does not cancel the biases — it produces a third number with no clear meaning.

Self-reported data is most valuable as a correction layer on top of click-based attribution — not as a replacement for it. The hybrid approach: attribution handles every channel that produces a trackable click. Self-reported data corrects the channels that attribution cannot see. On conflicts — attribution says Facebook, the user says search — the paid click wins. A verified click is stronger evidence than a recalled impression.

SRA also measures something no click-based system can: true brand awareness. "Already knew about you," "word of mouth," and "a friend recommended you" quantify the organic strength of your brand — PR impact, customer referrals, and unprompted recognition.

Attribution sees clicks. SRA sees influence. Neither is complete alone. Combined as a hybrid — with attribution as the foundation and SRA as the correction layer — the invisible channels become visible without compromising the channels that are already well-tracked.

02. Deploying the Survey

One question. Free-text input. Placed as early as possible in the conversion flow. That is the entire deployment.

Placement: earlier is better

The survey should appear at the first moment the user provides identifying information — email capture, account registration, or checkout. Post-purchase is the most common placement and the worst.

Response rates vary dramatically by placement:

  • During registration or checkout (optional field) — 85-95% response rate. The user is already filling out a form. One more optional field adds no friction.
  • Post-purchase — 50-60% response rate. The user has already completed their goal. A follow-up survey is an interruption, and response drops accordingly.

Earlier placement also captures a broader audience. A post-purchase survey only captures buyers. A registration-time survey captures every user who creates an account — including those who never purchase. For businesses with long consideration cycles (B2B, high-ticket e-commerce), post-purchase misses the majority of the addressable audience.

Registration / Checkout (user must complete this step): the question appears as one more field before the Sign Up button ("How did you hear about us?" → "friend told me about it"). Response rate: 85–95%.

Post-Purchase (user already got what they came for): the question appears on the "Thank you for your order!" page and is often skipped. Response rate: 50–60%.

Response rates by placement — registration and checkout dramatically outperform post-purchase.

Free-text over dropdowns

Dropdowns are simpler to analyze but worse for measurement. Five reasons:

  1. Channel discovery. A dropdown with pre-defined options cannot surface channels you did not think to include. Free-text reveals unexpected sources — specific podcast names, AI assistants (ChatGPT, Perplexity, Copilot), niche communities, and channels that did not exist when the dropdown was created. AI chat is showing meaningful volumes in production data.
  2. Retroactive reclassification. Free-text responses can be re-classified at any time. When a new channel category emerges, you re-run the classifier on historical responses and retroactively surface the data. Dropdown responses are locked to the options available when the user answered.
  3. No priming bias. A dropdown primes the user with options. If "Facebook" is listed first, users who are unsure will select it because it looks familiar. Free-text forces recall, which produces more accurate responses.
  4. Richer signal. "My colleague Sarah mentioned you at a conference" is far more informative than a dropdown selection of "Word of mouth." Free-text captures context that structured responses cannot.
  5. Position bias. Users disproportionately select the first option in a dropdown. Randomizing order helps but does not eliminate the effect. Free-text has no position bias.

Free-text captures signal that dropdowns structurally cannot.

The cost of free-text is classification complexity — raw text needs to be normalized into channel groups before it can be used in attribution. This is the problem the next section solves.

03. Classification and Mapping

Collecting free-text responses is easy. Making them usable for attribution is the hard part. A single survey question generates 1,000+ unique text strings: "my friend told me," "heard on Joe Rogan," "saw an ad on Instagram," "google," "already knew about you." Each one needs to be mapped to a channel group that attribution can use.

LLM classification

An LLM classifier normalizes raw text into channel groups. The input is the user's free-text response. The output is a standardized channel label: "Word of Mouth," "Podcast," "TV," "AI Chat," "Out-of-Home."

The classifier uses human-in-the-loop learning. Each human correction — "this response was classified as Social Media but should be Influencer" — enhances the classification prompt. Over time, the classifier converges on the project's specific channel vocabulary. Corrections are validatable via MCP: you can query the classifier to see every correction, every reclassification, and the current prompt.
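One way to picture the human-in-the-loop mechanism is corrections accumulating as few-shot examples in the classification prompt. A minimal sketch, with an illustrative channel list and a `build_prompt` helper standing in for whatever model API the pipeline actually calls:

```python
# Sketch of human-in-the-loop prompt building. The channel vocabulary and
# function names are illustrative, not the production implementation.
CHANNELS = ["Word of Mouth", "Podcast", "TV", "AI Chat", "Out-of-Home",
            "TikTok", "YouTube", "Influencer", "ambiguous", "ignore"]

corrections: list[tuple[str, str]] = []  # (response_text, corrected_channel)

def build_prompt(response: str) -> str:
    """Assemble the classification prompt, folding in human corrections."""
    examples = "\n".join(f'"{text}" -> {channel}' for text, channel in corrections)
    return (
        "Classify the survey answer into exactly one of: "
        + ", ".join(CHANNELS) + ".\n"
        + ("Corrected examples from human review:\n" + examples + "\n" if examples else "")
        + f'Answer: "{response}"\nChannel:'
    )

def record_correction(response: str, channel: str) -> None:
    # Each human correction enriches every future prompt.
    corrections.append((response, channel))

record_correction("saw her unboxing video", "Influencer")
print(build_prompt("heard on joe rogan"))
```

Over time the examples section converges on the project's specific vocabulary, which is what makes corrections auditable: the current prompt is just the correction history rendered as text.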

LLM Classifier: example inputs and their triage

  • "my friend told me" → Friend / WOM (actionable)
  • "saw on tiktok" → TikTok (actionable)
  • "heard on podcast xyz" → Podcast (actionable)
  • "google" → ambiguous
  • "idk" → auto-ignore
  • "some ad somewhere" → ambiguous

LLM classifier normalizes free-text into standardized channel groups.

Triage: what to use, what to ignore

Not every survey response carries attribution signal. The classifier sorts responses into three categories:

  • Actionable — maps to exactly one unambiguous channel. "My friend recommended you" maps to Word of Mouth. "Heard on a podcast" maps to Podcast. These responses override attribution where applicable.
  • Auto-ignore — carries no useful signal. Null responses, garbage input ("n/a," "asdf"), "Other," "Organic," and opt-outs are excluded from the override logic entirely.
  • Ambiguous — could mean multiple channels. "Social media" could be paid Meta, organic Instagram, or influencer content. These require data-driven resolution — checking which platforms have ad spend, which dominate — before they can be classified.

The search engine problem

"Google" and "search engine" deserve special treatment. When a user says "I found you through Google," they could mean brand search (paid), generic search (paid), or organic search. These are three fundamentally different channels with different strategic implications. No amount of spend data resolves this ambiguity — the question is what the user searched for, not which engine they used.

The correct handling: ignore "search engine" responses for attribution override. Keep the original click-based attribution, which at least knows whether the click was paid or organic, brand or generic. Self-reported data adds nothing when click-based tracking already has the answer.

Conflict resolution

When attribution and survey data disagree, the rule is simple: paid clicks always win. If attribution recorded a paid Facebook click and the user says "search engine," the Facebook click stands. A verified click is stronger evidence than a recalled impression.

Override logic applies to two categories of traffic only:

  • Brand search and organic brand — these channels absorb awareness credit. A user who heard about you on a podcast searches your brand name, clicks the paid ad or the organic result, and attribution credits the search. The SRA response reveals the true discovery channel.
  • Non-paid traffic — direct, organic, and referral sessions where no paid click exists. There is no paid evidence to protect, so the self-reported channel is the best available signal.

Everything else — paid non-brand traffic where a verified click exists — keeps its original attribution. The override is a correction, not a replacement.

This is the classification and mapping methodology the SegmentStream Measurement Engine implements. Responses are collected, classified by an LLM with human-in-the-loop learning, triaged for signal quality, and mapped to channel groups with conflict resolution rules that protect verified paid clicks.

04. The Synthetic Touchpoint

Classification determines what channel the user came from. The synthetic touchpoint determines how that information enters the attribution pipeline. The goal: make self-reported channels appear in attribution reports alongside click-tracked channels, with no special handling required downstream.

How it works

When a user's survey response produces an actionable channel and the override rules approve it, the pipeline creates a synthetic session record. This record looks exactly like a regular session in the attribution table — same schema, same fields — but with a fabricated timestamp: one second before the user's first recorded visit.

First-click attribution picks up the synthetic touchpoint as the earliest session in the user's journey. The self-reported channel becomes the attributed source. No changes to the attribution logic, no special cases, no parallel reporting systems. The correction integrates directly into the existing pipeline.
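The construction is simple enough to sketch. Field names here (`universal_id`, `session_start`, `source`) are illustrative, not the exact production schema:

```python
from datetime import datetime, timedelta

def make_synthetic_session(first_visit: dict, sra_channel: str) -> dict:
    """Build a session record one second before the user's first visit,
    so first-click attribution picks it up as the earliest touchpoint."""
    return {
        "universal_id": first_visit["universal_id"],
        "session_start": first_visit["session_start"] - timedelta(seconds=1),
        "source": sra_channel,
        "medium": "sra",          # flagged so corrections stay auditable
        "is_synthetic": True,
    }

first_visit = {"universal_id": "u-42",
               "session_start": datetime(2026, 3, 1, 9, 30, 0),
               "source": "direct / none"}
synthetic = make_synthetic_session(first_visit, "TikTok")
print(synthetic["session_start"])   # → 2026-03-01 09:29:59
```

Keeping the record schema-identical to a real session, plus an explicit synthetic flag, is what lets downstream attribution run unchanged while the correction remains traceable.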

Identity graph bridging

The survey is typically completed on one device — the one where the user registered or checked out. The first visit may have happened on a different device entirely. The identity graph bridges the gap: the user's universal_id links the survey device to the first-visit device. Without identity resolution, the synthetic touchpoint would have no journey to attach to.
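The bridging step amounts to resolving both devices to the same `universal_id` before looking for the first visit. A minimal sketch with illustrative device IDs:

```python
# Sketch of identity-graph bridging: both devices resolve to one universal_id,
# so the survey answer can attach to the first-visit journey. IDs are made up.
identity_graph = {"device-phone-123": "u-42", "device-laptop-456": "u-42"}

sessions = [  # (device_id, source), in chronological order
    ("device-laptop-456", "direct / none"),   # first visit, on a laptop
    ("device-phone-123", "brand search"),     # survey completed here, on a phone
]

survey_device = "device-phone-123"
user = identity_graph[survey_device]
journey = [s for s in sessions if identity_graph[s[0]] == user]
print(journey[0])   # first visit found, even though it was a different device
```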

Override rules

The synthetic touchpoint is only created when override conditions are met. Three types of traffic are treated differently:

  • Brand search and organic brand — overridden. These channels absorb awareness-channel credit by design. The self-reported response reveals the true discovery source.
  • Direct / none — overridden. No existing attribution signal to protect. The self-reported channel is the only available evidence.
  • Paid non-brand — protected. A verified paid click is stronger evidence than a survey response. The original attribution stands.

Before SRA

  Mar 1     First Visit    direct / none    ← first-touch attribution
  Mar 5     Return         brand search
  Mar 12    Return         organic brand
  Mar 18    Conversion     direct

After SRA — synthetic touchpoint injected

  Mar 1, −1s   SRA Signal     TikTok        ← first-touch attribution
  Mar 1        First Visit    direct / none
  Mar 5        Return         brand search
  Mar 12       Return         organic brand
  Mar 18       Conversion     direct

Synthetic session inserted one second before first visit — first-touch attribution picks up the self-reported channel.

The pipeline

Five stages, each feeding the next:

  1. Survey collection — free-text response captured at registration, checkout, or email capture
  2. LLM classifier — raw text normalized to channel groups with human-in-the-loop learning
  3. Identity graph — survey user linked to their full cross-device journey via universal_id
  4. Synthetic touchpoint — session record created one second before first visit, with the self-reported channel as traffic source
  5. Attribution reports — standard first-click reports now include self-reported channels alongside click-tracked channels

Five-stage pipeline: survey response flows through classification, identity resolution, and synthetic touchpoint creation into standard attribution reports.

The synthetic touchpoint is what makes SRA a correction layer instead of a parallel reporting system. Self-reported channels appear in the same reports, the same tables, and the same budget optimization logic as every click-tracked channel. One pipeline. One source of truth.

05. What Goes Wrong and How to Avoid It

  1. Asking post-purchase instead of during registration. Post-purchase surveys reach 50-60% of users. Registration-time placement reaches 85-95%. The gap is not just volume — post-purchase misses every user who creates an account but does not buy, which in B2B and high-ticket e-commerce is most of the audience.
  2. Using dropdowns instead of free-text. Dropdowns are easier to analyze but cannot discover new channels, cannot be reclassified retroactively, and introduce priming and position bias. The classification problem is solvable. The missing-data problem is not.
  3. Overriding all traffic. Only brand search and non-paid traffic should be overridden. Paid non-brand clicks are verified evidence — overriding them with survey responses destroys accurate data to replace it with less accurate data.
  4. Trusting "search engine" or "Google" at face value. These responses are fundamentally ambiguous. They could mean brand search, generic paid, or organic. Click-based attribution already distinguishes between these — the survey response adds no information. Ignore them.
  5. Using raw text without classification. Raw survey responses contain hundreds of variations for the same channel. Without LLM classification, you end up with a fragmented long-tail that is impossible to use for attribution.
  6. Treating SRA as standalone measurement. Self-reported data has coverage gaps, response bias, and no campaign-level granularity. It is a correction layer for channels that attribution cannot see — not a replacement for click-based attribution.
  7. Skipping identity resolution. The survey is completed on one device. The first visit may have happened on another. Without the identity graph bridging survey responses to first-visit sessions, the synthetic touchpoint has no journey to correct.
  8. Expecting campaign-level granularity. A user can tell you "I heard about you on a podcast." They cannot tell you which ad group or bid strategy drove their awareness. SRA provides channel-level correction. Campaign-level optimization still depends on click-based attribution.

06. See It Work

Everything described above runs inside the SegmentStream MCP server. Four MCP methods expose the full SRA pipeline. The first returns the project's SRA configuration — extraction SQL, classifier settings, and override rules. The second shows raw survey answers with their LLM classification. The third aggregates by channel to show distribution after SRA correction. The fourth traces individual user journeys to show the before-and-after: original attribution versus corrected attribution.

SRA Configuration (example MCP output)

  Enabled: true
  Classifier: ON — LLM free-text classification
  Classifier ID: fe134267-cf72-435e-b2a5-cb232263c2ce
  Extraction: GA4 events → self_reported_source param
  Override rules: Brand search + non-paid → SRA channel
  Corrections: 14 applied
  Pipeline: extraction → classification → override

Four MCP methods — project settings, raw answer classification, aggregated channel distribution, and individual user overrides showing before-and-after attribution.

What the methods show

get_sra_settings returns the project's pipeline configuration — whether classification is enabled, which classifier is active, and how override rules are structured.

get_sra_answers shows every raw survey response and its LLM classification, making it easy to spot misclassifications and verify the triage logic.

get_sra_channels aggregates by classified channel — which channels appeared and what share of conversions they account for. Channels that were previously invisible — Word of Mouth, Podcast, TV, Out-of-Home — appear alongside click-tracked channels with real conversion counts. Underreported channels like YouTube, TikTok, and AI Chat show their true scale. The "Direct / None" bucket shrinks as absorbed credit is redistributed to the channels that actually drove it.

get_sra_overrides exposes the correction at the individual level: the original attributed channel, the self-reported channel, and the override decision. You can see exactly which users were reattributed, from which channel to which, and whether the override was triggered by the brand-search rule or the non-paid rule.

Validating the output

Every override is auditable. You can trace a reattribution to a specific universal_id, see the original survey response, the LLM classification, the override rule that fired, and the synthetic touchpoint that was created. The data lives in your BigQuery warehouse. The override logic is deterministic and configurable per project.

What changes in practice

Teams that deploy SRA correction typically see three shifts in their attribution data:

  • Invisible channels appear — Podcast, TV, Word of Mouth, Out-of-Home, and other channels with zero click tracking show up in reports for the first time with real conversion data. These were always driving value — attribution simply could not see them.
  • Underreported channels show their true scale — YouTube, TikTok, AI Chat, and prospecting social have some click tracking, but clicks capture only a fraction of their influence. SRA reveals the full contribution, typically 2-5x higher than click-only attribution.
  • Brand search and direct shrink — the credit sponges lose their artificially concentrated attribution as credit flows back to the channels that actually earned it.

SRA also measures true brand awareness. "Already knew about you" and "word of mouth" responses quantify the channels that no click-based system can see: PR impact, organic brand strength, and customer referrals.
