Data Fragmentation: When One Platform Shows as 50+ Sources in GA4
Dozens of utm_source entries for the same email platform? This data fragmentation makes reporting impossible. Here's how it happens.
Your GA4 Traffic Acquisition report should have 5-10 main traffic sources.
Instead, you have 237 different source entries.
Mailchimp shows up 73 times with different names:
- mailchimp-january
- mailchimp-february
- mailchimp-spring-campaign
- mailchimp-newsletter-2024-03-15
- ...69 more variations
Facebook shows up 54 times:
- facebook-spring-sale
- facebook-product-launch
- facebook-retargeting-q1
- ...51 more variations
You can't tell which platform performs best. You can't compare trends. You can't make budget decisions.
This is data fragmentation—when one traffic source splits into dozens (or hundreds) of entries because campaign details leak into utm_source.
Here's how it destroys your analytics and how to fix it.
Table of contents
- What Is Data Fragmentation?
- The 5 Types of Data Fragmentation
- Type 1: Date-Based Fragmentation
- Type 2: Campaign-Name Fragmentation
- Type 3: Campaign-Type Fragmentation
- Type 4: Combination Fragmentation
- Type 5: Typo/Inconsistency Fragmentation
- Real Example: 341 Sources for 6 Actual Platforms
- Why Data Fragmentation Is Worse Than You Think
- Impact 1: You Can't See Trends
- Impact 2: Segments and Filters Are Impossible
- Impact 3: Automated Reports Break
- Impact 4: Cross-Channel Attribution Fails
- Impact 5: Integration Problems
- How to Diagnose Data Fragmentation
- Test 1: Count Unique Sources (2 minutes)
- Test 2: Pattern Recognition (3 minutes)
- Test 3: Platform Aggregation (5 minutes)
- The Fix: Consolidate utm_source to Platform Names
  - Step 1: Audit Current utm_source Values
  - Step 2: Define Standard utm_source for Each Platform
  - Step 3: Move Campaign Details to utm_campaign
- Step 4: Update All Campaign Templates
- Prevention: 5 Rules to Avoid Fragmentation
  - Rule 1: Platform Name Only in utm_source
  - Rule 2: Campaign Details Go in utm_campaign
  - Rule 3: Use Exact Same Spelling Every Time
  - Rule 4: Automate utm_source (Don't Type Manually)
- Rule 5: Quarterly Fragmentation Audit
- FAQ
- Can I fix historical fragmented data in GA4?
- What if I need to track different email types (newsletter vs promotional)?
- How do I aggregate fragmented historical data for reports?
  - Does this apply to utm_medium and utm_campaign too?
- What if different teams manage different platforms and use different naming?
What Is Data Fragmentation?
Data fragmentation = when a single traffic source (like "mailchimp" or "facebook") shows up as many separate entries in reports.
Healthy GA4 Traffic Acquisition report:
Session source | Sessions | Conversions
------------------|----------|------------
mailchimp | 8,200 | 187
facebook | 6,400 | 134
google | 12,300 | 289
linkedin | 3,100 | 78
impact | 1,200 | 45
5 sources. Clean. Readable. Actionable.
Fragmented GA4 Traffic Acquisition report:
Session source | Sessions | Conversions
----------------------------------------|----------|------------
mailchimp_newsletter_jan_2024 | 234 | 5
mailchimp_newsletter_feb_2024 | 198 | 4
mailchimp_promo_spring_sale | 521 | 14
mailchimp_promo_black_friday | 789 | 28
mailchimp_welcome_day1 | 67 | 2
mailchimp_welcome_day7 | 45 | 1
mailchimp_cart_abandonment | 203 | 8
...66 more mailchimp entries
facebook_spring_sale_2024 | 412 | 11
facebook_product_launch_video | 298 | 7
facebook_retargeting_cart_q1 | 189 | 5
...51 more facebook entries
237 sources. Impossible to read. Can't identify top performers. Can't make decisions.
The 5 Types of Data Fragmentation
Type 1: Date-Based Fragmentation
Adding dates to utm_source creates a new entry every time period:
Weekly fragmentation:
utm_source=newsletter-2024-01-08
utm_source=newsletter-2024-01-15
utm_source=newsletter-2024-01-22
utm_source=newsletter-2024-01-29
4 sources for 4 weeks of the same newsletter.
Over 1 year: 52 separate source entries for one email newsletter.
Type 2: Campaign-Name Fragmentation
Adding campaign names to utm_source:
Email campaigns:
utm_source=mailchimp-spring-sale
utm_source=mailchimp-summer-sale
utm_source=mailchimp-fall-sale
utm_source=mailchimp-black-friday
utm_source=mailchimp-cyber-monday
utm_source=mailchimp-year-end-sale
6 campaigns = 6 separate sources (should be 1: "mailchimp")
Type 3: Campaign-Type Fragmentation
Adding email types or campaign categories:
utm_source=newsletter-weekly
utm_source=newsletter-monthly
utm_source=promo-seasonal
utm_source=promo-flash-sale
utm_source=transactional-receipt
utm_source=transactional-shipping
utm_source=automated-cart-abandonment
utm_source=automated-welcome-series
8 sources for what should be 1 platform (e.g., "mailchimp")
Type 4: Combination Fragmentation
Combining multiple variables in utm_source:
utm_source=mailchimp_newsletter_january_2024
utm_source=mailchimp_newsletter_february_2024
utm_source=mailchimp_promo_spring_march_2024
utm_source=mailchimp_promo_summer_june_2024
Each unique combination = new source entry.
Formula: [Platform]_[Type]_[Campaign]_[Date]
Over 1 year with 3 campaign types: 36+ separate sources for one email platform.
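The arithmetic is easy to check. Here is a small Python sketch. It assumes one platform, 3 campaign types, and one send per type per month. The [Campaign] slot is folded into [Type] here, so real-world counts only go up.

```python
from itertools import product

# Hypothetical example: one platform, 3 campaign types, one send per type per month.
platforms = ["mailchimp"]
campaign_types = ["newsletter", "promo", "welcome"]
months = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]

# Fragmented: [Platform]_[Type]_[Date] packed into utm_source.
fragmented = {"_".join(combo) for combo in product(platforms, campaign_types, months)}

# Consolidated: utm_source holds only the platform name.
consolidated = {platform for platform, _, _ in product(platforms, campaign_types, months)}

print(len(fragmented), len(consolidated))  # 36 1
```

Same 36 emails, one source row instead of 36.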
Type 5: Typo/Inconsistency Fragmentation
Slight variations in utm_source naming:
utm_source=mailchimp
utm_source=mail-chimp (hyphen)
utm_source=MailChimp (capital letters)
utm_source=mail_chimp (underscore)
utm_source=mailchimp-email (added context)
utm_source=mailchip (typo)
6 entries for the same platform due to inconsistent naming.
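Some of this can be cleaned up after the fact, but not all of it. This minimal Python sketch post-processes exported source values with a hypothetical normalize_source helper. Lowercasing and stripping separators collapses four of the six variants, but the added-context and typo variants survive.

```python
import re

def normalize_source(raw: str) -> str:
    """Lowercase and strip hyphens, underscores, and whitespace."""
    return re.sub(r"[-_\s]+", "", raw.strip().lower())

variants = ["mailchimp", "mail-chimp", "MailChimp", "mail_chimp",
            "mailchimp-email", "mailchip"]

print(sorted({normalize_source(v) for v in variants}))
# ['mailchimp', 'mailchimpemail', 'mailchip']
```

That leftover is why consistent spelling at the point of tagging beats cleanup later.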
Real Example: 341 Sources for 6 Actual Platforms
Client: E-commerce brand
Actual marketing platforms: 6
- Mailchimp (email)
- Klaviyo (email)
- Facebook Ads
- Google Ads
- Impact (affiliates)
- 2-3 referral partners
Expected GA4 Traffic Acquisition:
Session source | Sessions
------------------|----------
mailchimp | 4,200
klaviyo | 2,100
facebook | 6,800
google | 9,200
impact | 1,400
partner-blog | 340
What GA4 actually showed: 341 different sources
Mailchimp alone had 89 entries:
mailchimp_newsletter_2024-01-08 | 203
mailchimp_newsletter_2024-01-15 | 189
mailchimp_newsletter_2024-01-22 | 234
...86 more mailchimp entries
Why:
- Weekly newsletters with dates in utm_source: 52 entries
- Promotional campaigns with campaign names: 24 entries
- Automated emails with flow names: 13 entries
Facebook had 76 entries:
facebook_spring_sale_2024 | 521
facebook_spring_sale_retargeting | 298
facebook_product_launch_feb | 412
facebook_product_launch_video | 234
...72 more facebook entries
Impact of fragmentation:
- Couldn't answer basic questions:
  - "Which platform drives most traffic?" → Had to manually sum 341 rows
  - "Is email growing month-over-month?" → No consistent source to compare
  - "What's our best traffic source?" → Impossible to rank
- Budget decisions on hold:
  - CMO wanted to shift $15,000 from the lowest-performing channel to the best
  - Couldn't identify either because data was fragmented
- Report exports broken:
  - GA4 reports paginated across 35+ pages
  - CSV exports had 341 rows to clean manually
  - Looker Studio dashboards showed 100+ source entries (unreadable)
After consolidating to 6 clean sources:
- Analysis time: 45 minutes → 5 minutes
- Identified top channel: Email (56% of conversions)
- Budget shift: $12,000/month from Display to Email
- ROI improvement: 34% increase in ROAS over next quarter
Why Data Fragmentation Is Worse Than You Think
Impact 1: You Can't See Trends
Question: "Is email traffic growing?"
With clean utm_source (mailchimp):
Month | Sessions | Change
-------------|----------|--------
January | 3,200 | -
February | 3,800 | +18.8%
March | 4,100 | +7.9%
Trend clear: Email growing steadily.
With fragmented utm_source:
January
mailchimp_newsletter_jan | 234
mailchimp_promo_new_year | 521
mailchimp_welcome_series_jan | 89
...12 more january sources
February
mailchimp_newsletter_feb | 198
mailchimp_promo_valentines | 612
mailchimp_welcome_series_feb | 103
...14 more february sources
You can't compare January vs February because the source names don't match.
You'd have to:
- Export all data
- Manually tag each source as "mailchimp"
- Sum by month in Excel
- Calculate trends
5 minutes of work becomes 2 hours.
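Here is roughly what those four steps look like as a script instead of a spreadsheet. This minimal Python sketch assumes a hypothetical export of (month, source, sessions) rows. It also assumes the platform is the first underscore- or hyphen-separated token of each source. It uses only the six rows listed above, so the totals are partial.

```python
import re
from collections import defaultdict

# Hypothetical export rows: (month, session source, sessions), taken from the lists above.
rows = [
    ("January", "mailchimp_newsletter_jan", 234),
    ("January", "mailchimp_promo_new_year", 521),
    ("January", "mailchimp_welcome_series_jan", 89),
    ("February", "mailchimp_newsletter_feb", 198),
    ("February", "mailchimp_promo_valentines", 612),
    ("February", "mailchimp_welcome_series_feb", 103),
]

def platform_of(source: str) -> str:
    # Assumption: the platform is the token before the first "_" or "-".
    return re.split(r"[-_]", source, maxsplit=1)[0].lower()

# Tag each row with its platform and sum sessions per (month, platform) in one pass.
totals = defaultdict(int)
for month, source, sessions in rows:
    totals[(month, platform_of(source))] += sessions

jan = totals[("January", "mailchimp")]
feb = totals[("February", "mailchimp")]
print(jan, feb, f"{(feb - jan) / jan:+.1%}")  # 844 913 +8.2%
```

It works, but it's a workaround you have to rerun and re-check every time a new naming pattern shows up.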
Impact 2: Segments and Filters Are Impossible
Goal: Create GA4 segment for "All Email Traffic"
With clean utm_source:
Session source = mailchimp OR sendgrid OR klaviyo
Done. 3 conditions.
With fragmented utm_source:
Session source = mailchimp_newsletter_jan OR
mailchimp_newsletter_feb OR
mailchimp_promo_spring OR
mailchimp_promo_summer OR
mailchimp_welcome_day1 OR
mailchimp_welcome_day7 OR
...83 more conditions
89 conditions just for Mailchimp.
And every new campaign requires updating the segment.
Impact 3: Automated Reports Break
Many teams set up automated GA4 reports sent weekly or monthly:
With clean sources:
- Top 5 Traffic Sources report → Always shows the same 5 platforms
- Email Performance report → Filter utm_source=mailchimp
With fragmented sources:
- Top 5 Traffic Sources report → Shows random campaign-specific sources
- Email Performance report → Only shows campaigns with "mailchimp" in utm_source (misses variations)
Automated reports become unreliable.
Impact 4: Cross-Channel Attribution Fails
GA4's attribution models (data-driven, last-click, etc.) track user journeys across channels:
Example journey:
- User clicks Facebook ad
- Returns via email
- Converts via Google search
With clean sources:
facebook → mailchimp → google (clear path)
With fragmented sources:
facebook_spring_sale_2024 → mailchimp_newsletter_mar_15 → google_brand_cpc
Each step carries a one-off source name, so identical journeys from different campaigns never group together. For analysis, you want to see the journey as: Social → Email → Search.
Fragmented sources make cross-channel path analysis nearly impossible.
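If you're stuck analyzing fragmented paths, you end up rebuilding the Social → Email → Search view yourself. This rough Python sketch uses a hypothetical, hand-maintained platform-to-channel map. CHANNEL_BY_PLATFORM is an illustration, not a GA4 setting.

```python
import re

# Hypothetical platform-to-channel map; extend it with your own platforms.
CHANNEL_BY_PLATFORM = {
    "facebook": "Social",
    "mailchimp": "Email",
    "google": "Search",
}

def to_channel(source: str) -> str:
    # Assumption: the platform is the token before the first "_" or "-".
    platform = re.split(r"[-_]", source, maxsplit=1)[0].lower()
    return CHANNEL_BY_PLATFORM.get(platform, "Unassigned")

path = ["facebook_spring_sale_2024", "mailchimp_newsletter_mar_15", "google_brand_cpc"]
print(" → ".join(to_channel(s) for s in path))  # Social → Email → Search
```

Every new platform means another entry in that map. Unknown sources fall through to "Unassigned".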
Impact 5: Integration Problems
CRM integration: If you sync GA4 source data to your CRM (HubSpot, Salesforce), fragmented sources create:
- 341 different "Lead Source" values in the CRM
- Can't report on "Leads from Email" because email has 89 different source names
- Salesforce reports show 300+ picklist values (unusable)
Data warehouse/BI tools: If you export GA4 data to BigQuery, Snowflake, or Looker, fragmented sources require:
- Complex SQL CASE statements to group sources
- Regular expression matching (error-prone)
- Manual mapping tables that need constant updates
How to Diagnose Data Fragmentation
Test 1: Count Unique Sources (2 minutes)
- GA4 → Reports → Traffic Acquisition
- Primary dimension: Session source
- Scroll through pages, counting entries
Benchmark:
- Healthy: 5-20 unique sources (small business), 20-50 (enterprise)
- Moderate fragmentation: 50-100 sources
- Severe fragmentation: 100+ sources
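The same count works on a CSV export. This minimal Python sketch assumes the export has a "Session source" column. It also assumes any metadata lines at the top start with "#"; adjust both to match your file. The bands mirror the benchmark above. The benchmark doesn't cover 21-50 sources for a small business, so the sketch simply flags that range.

```python
import csv

def count_unique_sources(csv_text: str, column: str = "Session source") -> int:
    # Assumption: metadata lines start with "#"; skip them and blank lines.
    lines = [ln for ln in csv_text.splitlines() if ln.strip() and not ln.startswith("#")]
    return len({row[column].strip() for row in csv.DictReader(lines) if row.get(column)})

def benchmark(unique_sources: int, enterprise: bool = False) -> str:
    # Bands taken from the benchmark above.
    if unique_sources > 100:
        return "severe fragmentation"
    if unique_sources > 50:
        return "moderate fragmentation"
    if unique_sources > 20 and not enterprise:
        return "above the small-business range"
    return "healthy"

sample = """# Hypothetical export
Session source,Sessions
mailchimp,8200
facebook,6400
google,12300
"""
n = count_unique_sources(sample)
print(n, benchmark(n))  # 3 healthy
print(benchmark(237))   # severe fragmentation
```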
Test 2: Pattern Recognition (3 minutes)
- GA4 → Explore → Free Form
- Dimension: Session source
- Metric: Sessions
- Sort by sessions (descending)
- Export CSV
Look for repeating patterns:
mailchimp_[ANYTHING] ← 73 rows
facebook_[ANYTHING] ← 54 rows
google_[ANYTHING] ← 42 rows
If you see the same platform prefix repeated dozens of times with different suffixes, you have fragmentation.
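Scanning a long CSV by eye gets tedious. This rough Python sketch groups exported source values by their leading token. It assumes platform names come first and are separated by "_" or "-". That's a heuristic: "mail-chimp" would land under "mail".

```python
import re
from collections import Counter

def repeated_prefixes(sources, min_rows=2):
    """Count rows per leading token and keep prefixes seen at least min_rows times."""
    # Assumption: the platform is the token before the first "_" or "-".
    counts = Counter(re.split(r"[-_]", s, maxsplit=1)[0].lower() for s in sources)
    return {prefix: n for prefix, n in counts.most_common() if n >= min_rows}

sources = [
    "mailchimp_newsletter_jan", "mailchimp_newsletter_feb", "mailchimp_promo_spring",
    "facebook_spring_sale_2024", "facebook_product_launch_video",
    "google",
]
print(repeated_prefixes(sources))  # {'mailchimp': 3, 'facebook': 2}
```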
Test 3: Platform Aggregation (5 minutes)
- Export source data from GA4 Explore
- In Excel/Sheets:
  - Column A: Session source (as-is from GA4)
  - Column B: Platform name (strip dates, campaign names, etc.)
  - Column C: Running count of rows per platform, e.g. =COUNTIF($B$2:B2, B2) filled down
Example:
| Session Source | Platform | Count |
|---|---|---|
| mailchimp_newsletter_jan | mailchimp | 1 |
| mailchimp_newsletter_feb | mailchimp | 2 |
| mailchimp_promo_spring | mailchimp | 3 |
| ...87 more mailchimp rows | mailchimp | 89 |
If "Count" exceeds 10 for any platform, you likely have fragmentation.
The Fix: Consolidate utm_source to Platform Names
Step 1: Audit Current utm_source Values
Export all utm_source values from GA4 and identify the platform behind each:
Current utm_source | Actual Platform
--------------------------------|------------------
mailchimp_newsletter_jan | mailchimp
mailchimp_promo_spring | mailchimp
sendgrid_transactional_receipt | sendgrid
facebook_spring_sale_2024 | facebook
linkedin_webinar_q1 | linkedin
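For a long export, a first pass at the "Actual Platform" column can be scripted. This minimal Python sketch assumes a hypothetical KNOWN_PLATFORMS list of the platforms you actually use. Matching is by substring after removing case and separators, so review the output rather than trusting it blindly.

```python
import re
from typing import Optional

# Hypothetical list of the platforms you actually use; extend it for your stack.
KNOWN_PLATFORMS = ["mailchimp", "sendgrid", "klaviyo", "facebook", "google", "linkedin", "impact"]

def actual_platform(source: str) -> Optional[str]:
    """Return the first known platform found in the source, ignoring case and separators."""
    squashed = re.sub(r"[-_\s]+", "", source.lower())
    for platform in KNOWN_PLATFORMS:
        if platform in squashed:
            return platform
    return None  # no match: needs a manual decision

audit = [
    "mailchimp_newsletter_jan", "mailchimp_promo_spring", "sendgrid_transactional_receipt",
    "facebook_spring_sale_2024", "linkedin_webinar_q1", "partner-blog",
]
for src in audit:
    print(f"{src:32} | {actual_platform(src) or 'REVIEW'}")
```

Anything marked REVIEW (like a referral partner) needs a human decision.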
Step 2: Define Standard utm_source for Each Platform
Choose ONE utm_source value per platform:
| Platform | Standard utm_source |
|---|---|
| Mailchimp email platform | mailchimp |
| SendGrid email platform | sendgrid |
| Klaviyo email platform | klaviyo |
| Facebook Ads | facebook |
| Google Ads | google |
| LinkedIn Ads | linkedin |
| Impact affiliate network | impact |
Step 3: Move Campaign Details to utm_campaign
Before (fragmented):
utm_source=mailchimp_newsletter_spring_2024
utm_medium=email
utm_campaign=march
After (consolidated):
utm_source=mailchimp
utm_medium=email
utm_campaign=newsletter-spring-2024-march
All campaign context moves to utm_campaign.
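If you have a backlog of links in templates or scheduled sends, this Before/After move can be scripted. This minimal Python sketch uses the standard library's urllib.parse. It assumes everything after the first "_" or "-" in utm_source is campaign detail. That holds for the example here, but not for variants like mail-chimp.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def consolidate_utm(url: str) -> str:
    """Keep only the platform in utm_source and move the rest into utm_campaign."""
    parts = urlsplit(url)
    # Note: dict() keeps the last value if a parameter appears twice.
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    source = params.get("utm_source")
    if not source:
        return url
    # Assumption: the first "_"/"-" token is the platform; the rest is campaign detail.
    platform, *details = re.split(r"[-_]", source)
    params["utm_source"] = platform.lower()
    if details:
        # Append the old utm_campaign (if any) after the details pulled from utm_source.
        if params.get("utm_campaign"):
            details.append(params["utm_campaign"])
        params["utm_campaign"] = "-".join(details).lower()
    return urlunsplit(parts._replace(query=urlencode(params)))

before = ("https://example.com/sale?utm_source=mailchimp_newsletter_spring_2024"
          "&utm_medium=email&utm_campaign=march")
print(consolidate_utm(before))
# https://example.com/sale?utm_source=mailchimp&utm_medium=email&utm_campaign=newsletter-spring-2024-march
```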
Step 4: Update All Campaign Templates
Update every tool that generates UTM parameters:
Email platforms:
- Mailchimp: Settings → Tracking → utm_source default = mailchimp
- SendGrid: Settings → Tracking → utm_source = sendgrid
- Klaviyo: Settings → UTM Tracking → utm_source = klaviyo
Social schedulers:
- Hootsuite: Settings → Link Tracking → utm_source = [platform-token]
- Buffer: Settings → Default Parameters → utm_source = dynamic platform name
Ad platforms:
- Google Ads: Final URL suffix → utm_source=google
- Facebook Ads: URL parameters → utm_source=facebook
Prevention: 5 Rules to Avoid Fragmentation
Rule 1: Platform Name Only in utm_source
utm_source = platform or vendor name, nothing else:
- ✅ mailchimp
- ✅ facebook
- ✅ impact
- ❌ mailchimp-spring-campaign
- ❌ facebook-ads-2024
Rule 2: Campaign Details Go in utm_campaign
All campaign-specific information belongs in utm_campaign:
- ✅ utm_campaign=newsletter-spring-sale-2024
- ✅ utm_campaign=facebook-product-launch-video
- ❌ utm_source=newsletter-spring-sale (wrong parameter)
Rule 3: Use Exact Same Spelling Every Time
Consistency prevents typo-based fragmentation:
- ✅ Always mailchimp (lowercase, no hyphen)
- ❌ mailchimp, MailChimp, mail-chimp, mail_chimp (creates 4 sources)
Rule 4: Automate utm_source (Don't Type Manually)
Set utm_source as a default in your tools instead of typing it manually for every campaign:
- Email platform default settings
- Social scheduler templates
- URL shortener defaults
- UTM parameter snippet library
Manual typing = typos = fragmentation.
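A snippet library can be as simple as one function that owns the fixed values. In this minimal Python sketch, UTM_DEFAULTS, slug, and build_url are hypothetical names. Only the campaign and content values are supplied per campaign.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical snippet library: source and medium are set once per tool, never typed per campaign.
UTM_DEFAULTS = {
    "mailchimp": {"utm_source": "mailchimp", "utm_medium": "email"},
    "sendgrid": {"utm_source": "sendgrid", "utm_medium": "email"},
    "klaviyo": {"utm_source": "klaviyo", "utm_medium": "email"},
}

def slug(text: str) -> str:
    # Lowercase and hyphenate free-text names so campaign values stay consistent too.
    return "-".join(text.lower().split())

def build_url(base_url: str, tool: str, campaign: str, content: Optional[str] = None) -> str:
    if tool not in UTM_DEFAULTS:
        raise ValueError(f"No approved UTM defaults for {tool!r}")
    params = dict(UTM_DEFAULTS[tool])  # copy so the defaults are never mutated
    params["utm_campaign"] = slug(campaign)
    if content:
        params["utm_content"] = slug(content)
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}{urlencode(params)}"

print(build_url("https://example.com/sale", "mailchimp", "Promo Spring Sale"))
# https://example.com/sale?utm_source=mailchimp&utm_medium=email&utm_campaign=promo-spring-sale
```

Unknown tools raise an error instead of silently creating a new source.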
Rule 5: Quarterly Fragmentation Audit
Every 3 months:
- Export GA4 source data
- Count unique sources
- Identify fragmented platforms
- Consolidate for future campaigns
FAQ
Can I fix historical fragmented data in GA4?
No. GA4 doesn't allow retroactive data changes. Historical data stays fragmented. Focus on fixing future campaigns. Over 6-12 months, clean data will accumulate and fragmented data will age out of default reporting windows.
What if I need to track different email types (newsletter vs promotional)?
Use utm_campaign or utm_content to differentiate:
Newsletter:
utm_source=mailchimp
utm_medium=email
utm_campaign=newsletter-weekly
Promotional:
utm_source=mailchimp
utm_medium=email
utm_campaign=promo-spring-sale
Both show as "mailchimp" source, but you can filter by campaign name for granular analysis.
How do I aggregate fragmented historical data for reports?
Group sources with regex conditions, either in a GA4 custom channel group or in BigQuery:
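BigQuery example (source stands for whichever source column your export table uses):
CASE
  WHEN REGEXP_CONTAINS(LOWER(source), r'mail[-_]?chimp') THEN 'mailchimp'
  WHEN REGEXP_CONTAINS(LOWER(source), r'facebook') THEN 'facebook'
  ELSE source
END AS consolidated_source
This groups every source containing "mailchimp" (plus the mail-chimp and mail_chimp variants, in any capitalization) as "mailchimp" for reporting. Typos like "mailchip" still need their own WHEN line.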
Does this apply to utm_medium and utm_campaign too?
utm_medium: Should also be consistent (e.g., always email, not newsletter or e-mail)
utm_campaign: Can and should change per campaign—this is where details belong. Fragmentation here is expected and useful.
What if different teams manage different platforms and use different naming?
Create a company-wide UTM naming convention document with approved utm_source values:
Platform | utm_source Value | Owner
------------------|-----------------|--------
Mailchimp | mailchimp | Marketing
SendGrid | sendgrid | Product
Facebook Ads | facebook | Paid Social Team
Google Ads | google | SEM Team
Require all teams to use the approved list.
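The approved list can also be enforced before links go out. This minimal Python sketch assumes a hypothetical APPROVED_SOURCES dict copied from the table above. When a check fails, it uses difflib from the standard library to suggest the closest approved value.

```python
import difflib

# Hypothetical approved list, mirroring the naming-convention table above.
APPROVED_SOURCES = {
    "mailchimp": "Marketing",
    "sendgrid": "Product",
    "facebook": "Paid Social Team",
    "google": "SEM Team",
}

def check_source(utm_source: str) -> str:
    """Accept only exact approved values; suggest the closest one otherwise."""
    if utm_source in APPROVED_SOURCES:
        return f"ok (owner: {APPROVED_SOURCES[utm_source]})"
    close = difflib.get_close_matches(utm_source.lower(), list(APPROVED_SOURCES), n=1, cutoff=0.8)
    hint = f"; did you mean {close[0]!r}?" if close else ""
    return f"rejected: {utm_source!r} is not an approved utm_source{hint}"

for value in ["mailchimp", "MailChimp", "mail-chimp", "newsletter-weekly"]:
    print(check_source(value))
```

Exact matching is deliberate: "MailChimp" is rejected with a suggestion rather than quietly accepted.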