Silverback Labs

GA4 Difficulties: GA4 Breaks Hearts and Marketing Attribution Reports

Executive Summary // TL;DR

  • GA4 is good: it is a step forward in web analytics from UA.
  • Google is biased and doesn’t always advise what is best.
  • Auditing, implementing and optimising GA4 as a bespoke configuration for your site is as necessary for GA4 as it was for UA. Don’t assume it will work effectively out of the box.
  • Don’t embrace black box algorithms, then complain that GA4 is confusing. You’re just embracing the confusion and then suffering the consequences. AI has become a buzzword just like ‘server side’ did. Server side is an add-on to an already robust implementation and AI is a tool most effective in the hands of someone already proficient in what they’re asking of the AI.
  • In my opinion, black box anything is a red flag. If, as a marketing analyst, you cannot understand or explain the attribution model, then it’s not a suitable attribution model.
  • I have observed a 40%+ discrepancy in channel-level conversions when switching between the Data Driven and Last Click attribution models.
  • Switch the default attribution model from Data Driven back to Last Click. Ironically, it’s the most reliable attribution model: at least you can be sure of the bottom-of-funnel drivers and avoid Google’s greedy PPC attribution, and data comparisons between platforms begin to show parity again.
  • Set the reporting identity to Observed if comparisons and attribution reports are still off. This minimises the impact of Consent Mode v2 and avoids modelled/estimated data. Do this especially if you’re working with BigQuery data and want to minimise discrepancies between GA4 and BQ.
  • The GA4 UI uses an algorithm called HyperLogLog++ (HLL++) to estimate user and session counts quickly. Querying is expensive and GA4 is free, so this is a cost-saving measure, and the counts served are estimates. Google documentation states a 2% discrepancy between GA4 and BigQuery.
  • Request unsampled reports – ALWAYS. When you continue to run into sampling and cardinality issues, opt for BigQuery data.
  • Want more accurate data? Use BigQuery data where possible. The export was previously only available to 360 customers; now standard GA4 users have access to an enterprise-level data warehouse. It is still out of reach (though technically accessible) for entry- and mid-level users.
  • Struggling with BigQuery data? If your business is at a certain level, don’t be afraid to invest in data engineering services. Storage is inexpensive, but querying can quickly get very expensive. The out-of-the-box GA4 to BigQuery export isn’t really usable for reporting, only for data verification. Be prepared to invest 20+ hours of data engineering to set up ETL processes and fact and dim tables (and then daily querying gets expensive).

 

Introduction

GA4 breaks hearts and marketing attribution reports.

The platform has received emphatically negative criticism.

It’s not without problems and there hasn’t been a seamless migration path (the migration propaganda was a joke). You couldn’t just walk from one platform to the next and that upset a lot of people who just wanted to stay on the UA boat.

It’s clear that UA and GA4 are significantly different.

The data model and schema are different, the dataLayer syntax is different and the UI is very different. All hit types are reduced to events, and scopes were initially reduced to only hit and user level. Filtering is limited, misleading configuration elements have been introduced and, combined with Advanced Consent Mode enforcement, there is modelled data that can’t be segmented from observed data.

…but the fundamentals of data analytics and data collection are still the same.

  1. The database still consists of facts and dimensions
  2. The majority of dimensions and metrics are the same or have equivalents
  3. The platform is still free for standard users
  4. Explorations offers massive improvements for custom report building

GA4 isn’t going anywhere. Jaded users looking for alternatives are quickly dissuaded when they compare pricing models (GA4 is free, while mid-tier analytics platforms cost £1000s per month), and they quickly scurry back to GA4.

At enterprise level, several suitable alternatives, such as Adobe Analytics and Snowplow, already dominate the marketplace.

One thing to understand is that Google Analytics was built to support Google Ads, Google’s main revenue stream. This is an important perspective when assessing some of GA4’s new ‘features‘, and it also explains why GA4 will always be important to sites with an ad spend.

 

The story so far

Simplicity has always been Google’s trademark, but Google Analytics was lagging behind enterprise-level competitors. The update was necessary, desirable and a step forward.

Google Analytics launched in 2005 and, through its Universal Analytics incarnation (introduced in 2012), ran as the leading analytics platform on the internet (for free) for 17.66 years until the doomsday sunset in July 2023. That’s nearly 18 years without a fundamental overhaul of its design.

18 years in internet years is more like 180 years.

The ever-evolving, fast-paced landscape (from web to Web 2.0 to Web 3.0, Next.js, single page apps, and developments in ecommerce tech, booking engines and gaming apps) meant a new platform was necessary rather than an update to UA.

 

Differences and Similarities: UA data vs GA4

There is a lot of familiarity between UA and GA4, and if you were proficient with UA, it shouldn’t take much to understand GA4: it’s still a dimensional model based on fact and dim tables.

However, some metrics are calculated differently.

Aside from having to get used to a new platform, there was a lot of struggle with year on year reporting after switching to GA4. There are a few things to consider when doing cross platform YoY analytics.

  • Enable the ‘Observed’ reporting identity to exclude estimated/modelled data from GA4: there is no modelled data in UA, where un-consented sessions are simply not represented. Also avoid sampled and condensed reports (condensing is GA4’s way of dealing with cardinality issues).
  • HLL++ is used for Users and Sessions only, not any other metrics
  • Sessions are defined differently: GA4 sessions do not reset at midnight or when new UTM parameters are used, as UA sessions did. This also affects last-click attribution numbers if you’re comparing them.
  • Engagement/bouncing is defined differently
  • Depending on when (or whether) a cookie banner was implemented, you could be comparing partially collected data with fully collected data (not just a GA4/UA issue, but worth considering)
  • If looking at marketing channels, GA4 has different default channel grouping definitions AND a different attribution model (Data Driven vs Last Click) unless it is set to match UA
  • ‘Users’ in GA4 is actually Active Users, not Total Users, whereas it is Total Users in UA.
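To see why session counts diverge, here is a minimal Python sketch of the two session definitions applied to one user’s hits. It assumes a simplified hit format of (timestamp, campaign source) tuples, with the source set to None when no UTM parameters are present, and it ignores many UA details (for example, how a stored campaign interacts with later direct visits). It illustrates the rules above; it is not either platform’s actual sessionisation code.

```python
from datetime import datetime, timedelta

# Default inactivity timeout; both UA and GA4 use 30 minutes unless changed.
SESSION_TIMEOUT = timedelta(minutes=30)


def count_sessions(hits, ua_rules=False):
    """Count sessions for one user's hits.

    hits: list of (timestamp, campaign_source) tuples, where campaign_source
    is None when the hit carries no UTM parameters. Timestamps are assumed
    to be in the property's reporting time zone.
    """
    sessions = 0
    prev_time = None
    session_source = None
    for ts, source in sorted(hits, key=lambda hit: hit[0]):
        # Both platforms: a gap longer than the timeout starts a new session.
        new_session = prev_time is None or ts - prev_time > SESSION_TIMEOUT
        if ua_rules and not new_session:
            # UA only: crossing midnight or arriving on a new campaign
            # also starts a new session.
            crossed_midnight = ts.date() != prev_time.date()
            campaign_changed = source is not None and source != session_source
            new_session = crossed_midnight or campaign_changed
        if new_session:
            sessions += 1
            session_source = source
        prev_time = ts
    return sessions


hits = [
    (datetime(2024, 1, 1, 23, 50), "google"),
    (datetime(2024, 1, 2, 0, 5), None),           # 15 minutes later, after midnight
    (datetime(2024, 1, 2, 0, 10), "newsletter"),  # 5 minutes later, new UTM source
]
print(count_sessions(hits))                 # 1 (GA4-style rules)
print(count_sessions(hits, ua_rules=True))  # 3 (UA-style rules)
```

The same three hits produce one GA4 session but three UA sessions, which is one reason year-on-year session and last-click comparisons drift apart.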

 

Where to start

If you or your team are struggling with the data, start with the implementation basics. If it was an out-of-the-box setup, start with the configurations. Did you use the dreadful automated migration tool? Auditing, implementing and optimising are just as fundamental as they were with UA, and misconfigurations are more common in GA4 than they were in UA.

Aside from misconfigurations and implementation errors, I have observed a lack of consistent reporting methodologies across teams because the standard, pre-built reports are inadequate. GA4 appears to usher users towards Explorations and custom report building, but this leaves too many possibilities for inconsistent reports.

I recommend standardising reporting by building templates in Explorations and dashboards in your data visualisation tool of choice.

There are sometimes issues that temporarily affect the generation of the daily aggregated tables. When this happens, Google serves data from intraday tables instead. Unfortunately, these tables have a limited TTL (time to live: a fixed lifespan after which the data expires), so after a while the data disappears or becomes inconsistent between different report configurations.

Silver lining: BigQuery export does not rely on aggregated tables and is unimpacted. Granular data stores (e.g. much of the explore module, particularly when using segments) are unimpacted.

Is GA4 unusable? Not at all, but really utilising the tool requires skill sets beyond most marketers’ abilities, and a full data team of analysts with BI and data engineering support is out of reach for many small to medium businesses.

 

Typical Issues

Some issues I have repeatedly faced while working with troublesome GA4 implementations are outlined below.

 

Real Time Reporting and Data Freshness

GA4 allows for a data processing delay of 24-48 hours, depending on the data limits per property. A normal property’s standard intraday processing time is 4 – 8 hours. During peak times, when a property is processing 250 billion or more events, processing time may take up to two days.

If real-time reporting is a necessity, I recommend setting up New Relic and creating real-time dashboards. It’s really a full-stack monitoring tool, but it has a free tier with at least 8 days of data retention.

 

Occurrence of (not set)

(not set) indicates that no data was collected for the parameter at the time the event triggered.

Causes vary depending on where (not set) appears.

Landing Page Reports

When you see (not set) in Landing Page reports, the session didn’t have any page_view events (and hence no landing page). It’s usually an orphan session that starts with an event such as add_to_cart or scroll rather than a page_view. For example, the first session expires, the user then does something that triggers an event, which starts a second session without a page_view, and then they leave.

It can also be caused by consent handling: the page_view doesn’t fire because the user hasn’t given consent, and when they do consent, the page_view isn’t re-sent after the consent update.
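A quick way to find these orphan sessions is to group exported events by session and check each one for a page_view. The sketch below assumes flat event dicts; in the real BigQuery export, ga_session_id and page_location are keys inside the event_params array and need unnesting first. The sample events are invented for illustration.

```python
from collections import defaultdict


def landing_pages(events):
    """Map each (user_pseudo_id, ga_session_id) to its landing page,
    or "(not set)" when the session has no page_view.

    events: flat dicts with user_pseudo_id, ga_session_id, event_name,
    event_timestamp and, for page_view events, page_location.
    """
    sessions = defaultdict(list)
    for event in events:
        sessions[(event["user_pseudo_id"], event["ga_session_id"])].append(event)

    result = {}
    for session_key, session_events in sessions.items():
        page_views = sorted(
            (e for e in session_events if e["event_name"] == "page_view"),
            key=lambda e: e["event_timestamp"],
        )
        # No page_view in the session means no landing page: (not set).
        result[session_key] = page_views[0]["page_location"] if page_views else "(not set)"
    return result


events = [
    {"user_pseudo_id": "u1", "ga_session_id": 1, "event_name": "page_view",
     "event_timestamp": 100, "page_location": "/home"},
    {"user_pseudo_id": "u1", "ga_session_id": 1, "event_name": "add_to_cart",
     "event_timestamp": 200},
    # Orphan session: the first session expired, then a scroll started a new one.
    {"user_pseudo_id": "u1", "ga_session_id": 2, "event_name": "scroll",
     "event_timestamp": 5000},
]
print(landing_pages(events))  # {('u1', 1): '/home', ('u1', 2): '(not set)'}
```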

Attribution reports

There have been several bugs causing spikes in (not set) in Attribution reports, mostly related to Advanced Consent Mode v2. The GA team are aware and work on these issues when they pop up. This erroneous traffic can be identified as traffic in the Unassigned channel with a source / medium of (not set) / (not set). Typically, traffic with no referrer and no source / medium should be in the Direct channel, not Unassigned.

Another cause is blocking data collection until the second pageview, by which point the UTM parameters are no longer in the URL. To reduce this, investigate and optimise the GDPR/consent implementation to ensure optimal data collection.
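If you export a Traffic acquisition report (channel, source, medium, sessions), a few lines of Python can size the (not set) / (not set) Unassigned problem described above. The row keys and sample figures here are hypothetical.

```python
def suspicious_unassigned(rows):
    """Return Unassigned rows whose source / medium is (not set) / (not set)."""
    return [
        row for row in rows
        if row["channel"] == "Unassigned"
        and row["source"] == "(not set)"
        and row["medium"] == "(not set)"
    ]


def session_share(subset, rows):
    """Fraction of all sessions in rows that fall in subset."""
    total = sum(row["sessions"] for row in rows)
    return sum(row["sessions"] for row in subset) / total if total else 0.0


rows = [
    {"channel": "Direct", "source": "(direct)", "medium": "(none)", "sessions": 800},
    {"channel": "Unassigned", "source": "(not set)", "medium": "(not set)", "sessions": 150},
    {"channel": "Organic Search", "source": "google", "medium": "organic", "sessions": 50},
]
flagged = suspicious_unassigned(rows)
print(len(flagged), session_share(flagged, rows))  # 1 0.15
```

Tracking this share day by day makes it easier to tell a consent-related bug spike from a genuine change in traffic.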

 

Issues with SPAs

Tracking Single Page Apps with GA4 comes with its own set of issues. A few of the most notorious ones are:

The page path logged by the browser doesn’t always match the URL path. The browser doesn’t always update on soft transitions, so page variables such as page_location and page_title should be set in the dataLayer, and GA4’s default browser-collected values overridden via GTM configuration.

Duplicate pageviews are a common issue because pageviews are recorded on browser history (URL) changes as well as by pageview events.
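When debugging duplicates in the BigQuery export, one heuristic is to flag page_view events that repeat the previous page_view’s page_location within the same session a moment later. This sketch assumes flat dicts with event_timestamp in microseconds (as the export stores it) and an arbitrary one-second window; genuine quick reloads will also be flagged, so treat the output as candidates to inspect. The page_view helper is hypothetical.

```python
def duplicate_page_views(events, window_us=1_000_000):
    """Return page_view events that repeat the previous page_view's
    page_location, in the same session, within window_us microseconds.

    events: flat dicts with user_pseudo_id, ga_session_id, event_name,
    event_timestamp (microseconds) and page_location.
    """
    page_views = sorted(
        (e for e in events if e["event_name"] == "page_view"),
        key=lambda e: e["event_timestamp"],
    )
    last_in_session = {}  # (user, session) -> previous page_view event
    duplicates = []
    for event in page_views:
        key = (event["user_pseudo_id"], event["ga_session_id"])
        prev = last_in_session.get(key)
        if (
            prev is not None
            and prev["page_location"] == event["page_location"]
            and event["event_timestamp"] - prev["event_timestamp"] <= window_us
        ):
            duplicates.append(event)
        last_in_session[key] = event
    return duplicates


def page_view(ts, path):
    # Hypothetical helper that builds a flat page_view event for one session.
    return {"user_pseudo_id": "u1", "ga_session_id": 1, "event_name": "page_view",
            "event_timestamp": ts, "page_location": path}


events = [
    page_view(0, "/shop"),            # fired by a history-change trigger
    page_view(150_000, "/shop"),      # fired again by a custom dataLayer event
    page_view(4_000_000, "/shop/item"),
]
print(len(duplicate_page_views(events)))  # 1
```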

 

Google Tag

The Google Tag is a piece of code that sends data from a website to linked Google product destinations (Google Ads, GA4 or CM360). It simplifies the process of tracking and measuring performance.

It is a central hub for sending data, making it easier to track user interactions, conversions, and other important metrics across different Google products.

Settings can conflict with other in-platform settings. A few issues I have frequently come across are:

Duplicate settings – The same set of enhanced measurement settings appears in both the Google Tag and the GA4 UI. Set both.

Duplicate pageviews (and SPA pageviews) – Auto-enabling the SPA setting that records page views on browser history (URL) changes is often the cause of undetected duplicate pageviews.

Duplicate events – There is a setting to de-duplicate events that is off by default.

 

Google Tag Inheritance Model

The inheritance model the Google Tag uses has been the culprit of many unexplained tracking errors and isn’t talked about anywhere. Avoid relying on the inheritance model and shared event settings. Don’t be lazy in implementation.

Shared event settings variables use a snapshot of the data recorded when the Google Tag loads. To reduce (not set) and ensure optimal data collection, record fresh data with each event call.
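The difference between a snapshot and a fresh read can be shown with a toy model. ToyTag is a hypothetical class, not how the Google Tag or GTM works internally; it only illustrates the behaviour described above, where a value captured at load time goes stale after an SPA soft navigation.

```python
class ToyTag:
    """Hypothetical model of shared settings captured at load time."""

    def __init__(self, page_state):
        self.page_state = page_state
        # Shared settings: copied once, when the tag loads.
        self.shared_settings = {"page_location": page_state["page_location"]}

    def send_with_shared_settings(self, event_name):
        # Reuses the load-time snapshot.
        return {"event": event_name, **self.shared_settings}

    def send_with_fresh_values(self, event_name):
        # Reads the current value at the moment the event fires.
        return {"event": event_name, "page_location": self.page_state["page_location"]}


page_state = {"page_location": "https://example.com/"}
tag = ToyTag(page_state)
page_state["page_location"] = "https://example.com/checkout"  # SPA soft navigation
print(tag.send_with_shared_settings("begin_checkout"))  # page_location is still the load-time URL
print(tag.send_with_fresh_values("begin_checkout"))     # page_location is the checkout URL
```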

 

Report Building

This is really a user error. Reports can be configured in multiple ways, and different team members use slightly different configurations, resulting in incomparable data.
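One way to make report definitions comparable is to pin every setting that changes the numbers in a shared template and diff each person’s copy against it. This is a minimal sketch; ReportTemplate and its field names are hypothetical, not a GA4 API object.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ReportTemplate:
    """A shared definition of one standard report (illustrative fields)."""
    dimensions: tuple
    metrics: tuple
    date_range_days: int = 30
    attribution_model: str = "last_click"
    reporting_identity: str = "observed"


def template_differences(a, b):
    """Names of the settings that differ between two report definitions."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


standard = ReportTemplate(
    dimensions=("session_default_channel_group",),
    metrics=("sessions", "conversions"),
)
analyst_copy = ReportTemplate(
    dimensions=("session_default_channel_group",),
    metrics=("sessions", "conversions"),
    attribution_model="data_driven",
)
print(template_differences(standard, analyst_copy))  # ['attribution_model']
```

Freezing the dataclass keeps the agreed template from being edited in place; a variation has to be created as a new, visibly different object.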

 

Data Latency

There is a 48-hour latency period that severely affects data processing and loading into the platform UI. Don’t bother looking at reports that include the most recent 48 hours of data (e.g. a default ‘last 30 days’ range).
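A small helper can compute a reporting window that stops short of the processing delay. The three-day buffer is a conservative assumption (it drops today and the two previous days, so no reported day overlaps the most recent 48 hours); adjust it to the latency you actually observe.

```python
from datetime import date, timedelta


def safe_date_range(today, days=30, buffer_days=3):
    """Return (start, end) for a 'last N days' report that avoids the
    processing window.

    buffer_days=3 drops today and the two previous days (an assumption;
    tune it to the latency you see).
    """
    end = today - timedelta(days=buffer_days)
    start = end - timedelta(days=days - 1)
    return start, end


start, end = safe_date_range(date(2024, 6, 30))
print(start, end)  # 2024-05-29 2024-06-27
```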

 

Explorations

Limits of Explorations

Explorations is subject to the following limits:

  • You can create up to 200 individual explorations per user per property.
  • You can create up to 500 shared explorations per property.
  • You can apply up to 10 segments per exploration.
  • You can apply up to 10 filters per tab.
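If you plan explorations in a shared spec, the per-exploration limits above can be checked up front. This is a simple sketch; the input shapes are assumptions, and the limits are copied from the list above rather than queried from GA4.

```python
# Limits from the list above (per exploration and per tab).
MAX_SEGMENTS_PER_EXPLORATION = 10
MAX_FILTERS_PER_TAB = 10


def exploration_limit_problems(segments, filters_by_tab):
    """Return human-readable limit violations for a planned exploration.

    segments: list of segment names.
    filters_by_tab: dict mapping tab name -> list of filter definitions.
    """
    problems = []
    if len(segments) > MAX_SEGMENTS_PER_EXPLORATION:
        problems.append(
            f"{len(segments)} segments (max {MAX_SEGMENTS_PER_EXPLORATION})"
        )
    for tab, filters in filters_by_tab.items():
        if len(filters) > MAX_FILTERS_PER_TAB:
            problems.append(
                f"tab '{tab}': {len(filters)} filters (max {MAX_FILTERS_PER_TAB})"
            )
    return problems


print(exploration_limit_problems([f"segment {i}" for i in range(11)],
                                 {"Free form 1": ["device = mobile"]}))
# ['11 segments (max 10)']
```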

Sampling and data thresholds

You can use Explorations to quickly perform custom queries on large amounts of data. However, your explorations may be based on sampled data if more than 10 million events are part of a particular exploration query.

To protect user privacy, Explorations and Reports are subject to data thresholds. If your exploration includes demographic information or data provided by Google signals, the data may be filtered to remove data that might identify individual users.

When an exploration is subject to either sampling or data thresholds, the icon in the right corner of the exploration changes from green to yellow. A tooltip displays information about the data in the exploration.
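A common workaround for the 10 million event threshold is to split a long date range into shorter ranges that each stay under it, then combine the results. The sketch below greedily groups days using daily event counts you would take from a report; the input format is an assumption.

```python
from datetime import date, timedelta

SAMPLING_THRESHOLD = 10_000_000  # events per exploration query, per the text above


def split_date_range(daily_event_counts, threshold=SAMPLING_THRESHOLD):
    """Group consecutive days into (start, end) ranges whose summed event
    counts stay at or below threshold.

    daily_event_counts: list of (date, event_count) tuples in date order.
    A single day above the threshold becomes its own range and may still
    be sampled.
    """
    ranges = []
    current_days = []
    running_total = 0
    for day, count in daily_event_counts:
        if current_days and running_total + count > threshold:
            ranges.append((current_days[0], current_days[-1]))
            current_days, running_total = [], 0
        current_days.append(day)
        running_total += count
    if current_days:
        ranges.append((current_days[0], current_days[-1]))
    return ranges


start = date(2024, 6, 1)
days = [(start + timedelta(days=i), 4_000_000) for i in range(5)]
for chunk_start, chunk_end in split_date_range(days):
    print(chunk_start, chunk_end)
```

Additive metrics such as event counts or conversions can be summed across the chunks, but users cannot (the same user can appear in several chunks), and sessions that span a chunk boundary may be counted twice.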

Incompatible request

If your exploration contains an incompatible combination of dimensions, metrics or both, you will see the incompatible request icon asking you to update the request.

Condensed datasets


Estimated/Modelled data

 

 

Reporting Identity

Blended vs Observed data

 

Consent Mode

Basic Consent Mode
Advanced Consent Mode

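One consent mode article (see References) says that after analytics storage consent is denied, UA will not store any subsequent hits. The consent state is sent in the gcs parameter, e.g. G100:
  • 1: fixed, carries no consent information
  • first 0: ad storage denied
  • second 0: analytics storage denied
So UA should align with GA4 when GA4’s reporting identity is set to Observed, i.e. when modelled data is excluded.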
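Consent mode sends the consent state on Google tag requests in a gcs parameter, with values such as G100. The sketch below decodes that value, assuming the layout described above: a fixed G1, then one digit for ad_storage and one for analytics_storage (1 = granted, 0 = denied). Any other shape is rejected rather than guessed.

```python
def decode_gcs(gcs):
    """Decode a consent mode gcs value such as 'G100'.

    Assumed layout: 'G1', then one digit for ad_storage and one for
    analytics_storage, where '1' means granted and '0' means denied.
    """
    if len(gcs) != 4 or not gcs.startswith("G1") or any(c not in "01" for c in gcs[2:]):
        raise ValueError(f"unexpected gcs value: {gcs!r}")
    state = {"1": "granted", "0": "denied"}
    return {"ad_storage": state[gcs[2]], "analytics_storage": state[gcs[3]]}


print(decode_gcs("G100"))  # {'ad_storage': 'denied', 'analytics_storage': 'denied'}
print(decode_gcs("G101"))  # {'ad_storage': 'denied', 'analytics_storage': 'granted'}
```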

 

Attribution Model

Data-driven attribution is probably biased towards Google Ads, since Google Analytics ultimately exists to support Google Ads, Google’s main revenue stream.

GA4’s DDA model is a black box, so we have no idea how it attributes, and I have already speculated that it’s another attempt by Google to tilt attribution towards Paid Search (where it generates revenue).

But the bottom line is that we cannot explain why DDA attributes 20% more conversions to Email than Last Click does; we don’t know how it weighted the attribution sources. That’s a red flag to me, so I prefer to avoid it.

Ironically, last click is the most reliable attribution model (you can at least be sure of the bottom-of-funnel drivers). Trying to rely on any other type of MTA (data-driven, modelled or otherwise) is a fool’s errand, in my opinion. First click has its valid uses too.

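The BigQuery export schema doesn’t include calculated metrics or attribution models. You can build your own attribution model from the raw events, but you cannot replicate data-driven attribution. For the wider multi-touch attribution (MTA) vs marketing mix modelling (MMM) debate, see the Funnel.io reference below.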
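Because the export leaves attribution to you, a transparent rule-based model is easy to build and explain. This minimal sketch distributes one conversion per path under last-click, first-click and linear rules; the path format (a list of channel names, oldest first) is an assumption. Note that GA4’s own paid and organic last click model ignores Direct traffic, which this sketch does not, and data-driven attribution cannot be reproduced this way.

```python
from collections import Counter


def attribute(paths, model):
    """Split one conversion per path across channels.

    paths: list of channel-name lists, oldest touchpoint first.
    model: 'last_click', 'first_click' or 'linear'.
    """
    credit = Counter()
    for path in paths:
        if not path:
            continue  # no touchpoints recorded: nothing to credit
        if model == "last_click":
            credit[path[-1]] += 1.0
        elif model == "first_click":
            credit[path[0]] += 1.0
        elif model == "linear":
            for channel in path:
                credit[channel] += 1.0 / len(path)
        else:
            raise ValueError(f"unknown model: {model!r}")
    return dict(credit)


paths = [
    ["Email", "Paid Search"],
    ["Organic Search", "Email"],
    ["Email"],
]
for model in ("last_click", "first_click", "linear"):
    print(model, attribute(paths, model))
```

Every rule here fits in one sentence, which is the point: when channel totals move between models, you can say exactly why.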

The DDA black box can’t be explained, and it doesn’t seem very data driven if you don’t understand the data. Any model you can explain to a marketing manager is valid; if you can’t explain it (as with the data-driven black box), avoid it.

 

Differences in GA4 vs BQ

Algorithm: HyperLogLog++ (Google documentation cites a 2% discrepancy between GA4 and BigQuery)
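To make the estimate concrete, here is a minimal HyperLogLog distinct counter in Python. It is the original HyperLogLog estimator with the small-range (linear counting) correction, using a SHA-1-derived 64-bit hash; Google’s HLL++ adds further refinements such as bias correction and a sparse representation, so this is a sketch of the idea rather than GA4’s implementation. With p = 12 (4,096 registers), the typical relative error is about 1.04 / sqrt(4096), roughly 1.6%.

```python
import hashlib
import math


def _hash64(value):
    # 64-bit hash taken from the first 8 bytes of SHA-1.
    return int.from_bytes(hashlib.sha1(str(value).encode()).digest()[:8], "big")


def hll_estimate(values, p=12):
    """Estimate the number of distinct values with basic HyperLogLog.

    p: number of hash bits used to pick a register (m = 2**p registers).
    """
    m = 1 << p
    width = 64 - p
    registers = [0] * m
    for v in values:
        h = _hash64(v)
        idx = h >> width                  # first p bits choose a register
        rest = h & ((1 << width) - 1)     # remaining bits
        # rank = position of the leftmost 1-bit in the remaining bits (1-based)
        rank = width - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)    # linear counting for small sets
    return raw


estimate = hll_estimate(f"user-{i}" for i in range(100_000))
print(round(estimate))  # close to 100000, but not exact
```

In BigQuery, COUNT(DISTINCT user_pseudo_id) over the export returns exact counts, while APPROX_COUNT_DISTINCT and the HLL_COUNT functions trade accuracy for speed, much as the GA4 UI does.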

 

BigQuery

Why it’s essential
Storage Costs
Querying Costs
ETL processes
Building a Data Model / DBT
Report Building
Fresh Daily
Streaming
SLA

We understand that the BigQuery Export SLA has been a highly awaited feature for you in GA4. To support this feature, our product team has built an entirely new export option called the BigQuery Fresh Daily Export (360 feature only). This new export type is notably faster than existing export options, allowing you to receive your data reliably by a similar time each day.

We understand that getting the BigQuery export quickly and at a reliable time has been a top ask to support your migration to GA4. As a result, we launched this export to Open Beta in Q4 ‘23.

Our next step is to introduce an official SLA to this export type, a net-new feature in GA4. The SLA will guarantee export completeness by the same time in the morning each day. The launch date for the SLA must come after a period of validation on the new export type to ensure the product can meet the standards set by the SLA. The launch date for the SLA is now targeted for Q2 ‘24.

Please note that this continues to be a top priority item for our product team. We understand the importance of the SLA and appreciate your patience as the team continues to work through this.
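On the querying-costs point in the outline above, a back-of-the-envelope estimate shows how a repeated query adds up. The sketch assumes on-demand pricing of $6.25 per TiB scanned with the first 1 TiB per month free; treat both as assumptions to check against current pricing for your region and edition. Bytes scanned per query can come from a dry run (in the google-cloud-bigquery Python client, QueryJobConfig(dry_run=True) and the job’s total_bytes_processed), and filtering on _TABLE_SUFFIX and selecting only the columns you need reduces it.

```python
TIB = 1024 ** 4  # on-demand queries are billed per TiB scanned
GIB = 1024 ** 3


def monthly_query_cost(bytes_per_query, runs_per_day, days=30,
                       price_per_tib=6.25, free_tib=1.0):
    """Rough monthly on-demand cost of a query that is run repeatedly.

    price_per_tib and free_tib are assumptions; check current pricing.
    Ignores cached results and other queries sharing the free allowance.
    """
    scanned_tib = bytes_per_query * runs_per_day * days / TIB
    billable_tib = max(0.0, scanned_tib - free_tib)
    return billable_tib * price_per_tib


# A dashboard refreshing hourly, scanning 50 GiB of export data each time.
print(round(monthly_query_cost(50 * GIB, runs_per_day=24), 2))  # 213.48
```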

 

Tools to Supplement a GA4 Implementation

New Relic

sGTM

Funnel.io

BigQuery

Looker Studio

Looker BI

PowerBI

Tableau

 

GA4 Alternatives

Enterprise Level:

Mid-tier:

 

References

[UA→GA4] Comparing metrics: Google Analytics 4 vs. Universal Analytics
https://support.google.com/analytics/answer/11986666?hl=en#sessions&zippy=%2Cin-this-article

What’s new in Google Analytics: Releases – June 10, 2024
https://support.google.com/analytics/answer/9164320?hl=en#061024

How Consent Mode Affects the Way Google Analytics Records Data
https://adswerve.com/blog/how-consent-mode-affects-the-way-google-analytics-records-data

Google Fixes GA4 Attribution Models To Better Associate Conversions To Paid Search
https://www.seroundtable.com/google-ga4-attribution-update-paid-37562.html

DBT-GA4 Setup (On Demand)
https://caretjuice.com/courses/dbt-ga4-setup/

Intro to DBT-GA4
https://caretjuice.com/courses/intro-to-dbt-ga4/

Advanced dbt-GA4 (On-Demand)
https://caretjuice.com/courses/advanced-dbt-ga4/

Multi-touch attribution vs. marketing mix modeling
https://funnel.io/blog/mta-vs-mmm

[GA4] Understand how Analytics stores and displays data
https://support.google.com/analytics/answer/13888627?hl=en&sjid=11679720983765570458-EU&visit_id=638550799908046877-2362061252&ref_topic=13987797&rd=1

[GA4] About the (other) row
https://support.google.com/analytics/answer/13331684?hl=en&ref_topic=13987797&sjid=11679720983765570458-EU

[GA4] About data sampling
https://support.google.com/analytics/answer/13331292?hl=en&ref_topic=13987797&sjid=11679720983765570458-EU

[GA4] Automatic expanded data sets for Google Analytics 360
https://support.google.com/analytics/answer/11295588?hl=en&ref_topic=13987797&sjid=11679720983765570458-EU

[GA4] Expanded data sets for Google Analytics 360
https://support.google.com/analytics/answer/12867885?hl=en&ref_topic=13987797&sjid=11679720983765570458-EU#zippy=%2Cin-this-article

[GA4] Unsampled explorations for Google Analytics 360
https://support.google.com/analytics/answer/10896953?hl=en&ref_topic=13987797&sjid=11679720983765570458-EU#zippy=%2Cin-this-article

[GA4] Adjust sampling for Google Analytics 360
https://support.google.com/analytics/answer/13987905?hl=en&ref_topic=13987797&sjid=11679720983765570458-EU

[GA4] Reporting identity
https://support.google.com/analytics/answer/10976610?hl=en&ref_topic=12153537,12153943,2986333,&sjid=11679720983765570458-EU&visit_id=638550799908046877-2362061252&rd=1

[GA4] Data freshness
https://support.google.com/analytics/answer/11198161?hl=en&ref_topic=12153537,12153943,2986333,&sjid=11679720983765570458-EU&visit_id=638550799908046877-2362061252&rd=1

Limits of Explorations
https://support.google.com/analytics/answer/7579450?hl=en&ref_topic=12153537,12153943,2986333,&sjid=11679720983765570458-EU&visit_id=638550799908046877-2362061252&rd=1#limits-of-analysis&zippy=%2Cin-this-article

[GA4] BigQuery Export schema
https://support.google.com/analytics/answer/7029846?hl=en&ref_topic=9359001&sjid=11679720983765570458-EU#event&user&device&geo&app_info&collected_traffic_source&traffic_source&stream&ecommerce&items
