
Create your own event-based web analytics tag

It’s the end of SaaS web analytics tools locking in your data. New tools, such as Snowplow or Heap, promise you data ownership. But in essence, streaming web analytics data to your own data warehouse is fairly easy. In this blog post, we create our own (client-side) clickstream web analytics solution in Google Tag Manager.

If you’re an advanced GA/GTM user, you’re probably aware that many of the variables below can also be extracted from the GA payload using the customTask API. However, the goal below is to be completely independent of Google Analytics, which is why you could replace the GA cookie variables at the end of this post with your own cookie.

UTM Parameters

Let’s start with an easy one: extracting the UTM (Urchin Tracking Module) parameters from the URL. The UTM parameters are query string parameters, and there are five of them: utm_campaign, utm_medium, utm_source, utm_content and utm_term. You can capture query string parameters with out-of-the-box GTM features:

  • Create a new variable
  • Select URL as Variable Type
  • Select Query as Component Type
  • Enter utm_campaign (analogous for other UTM parameters) in the Query Key field
  • Optional: Convert the undefined values to “(none)”
  • Optional: You have some choices to make here. In GA, when source and medium are not provided, they are derived from other information, such as the referrer. Feel free to convert undefined values to the built-in Referrer variable instead.
(Screenshot: the utm_campaign query variable in GTM)
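To see what those URL variables do under the hood, here is a minimal sketch in plain JavaScript. The helper getUtmParameters is hypothetical (it is not a GTM API), it assumes the query string comes from window.location.search, and its handling of "+" and repeated keys is my own choice, which may differ from GTM's built-in behavior.

```javascript
// Hypothetical helper (not a GTM API): returns the five UTM parameters from a
// query string such as window.location.search, defaulting to "(none)".
function getUtmParameters(search) {
  var keys = ['utm_campaign', 'utm_medium', 'utm_source', 'utm_content', 'utm_term'];
  var result = {};
  var found = {};
  var i;

  // Start every UTM parameter at "(none)".
  for (i = 0; i < keys.length; i++) {
    result[keys[i]] = '(none)';
  }

  // Drop the leading "?" and split into key=value pairs.
  var pairs = search.replace(/^\?/, '').split('&');
  for (i = 0; i < pairs.length; i++) {
    var eq = pairs[i].indexOf('=');
    if (eq === -1) {
      continue; // skip empty entries and keys without a value
    }
    var key = decodeURIComponent(pairs[i].substring(0, eq));
    // Keep only UTM keys; the first occurrence of a key wins.
    if (keys.indexOf(key) !== -1 && !found[key]) {
      found[key] = true;
      // "+" is treated as an encoded space (assumption).
      result[key] = decodeURIComponent(pairs[i].substring(eq + 1).replace(/\+/g, ' '));
    }
  }
  return result;
}
```

In a browser console, getUtmParameters(window.location.search) returns an object with all five keys; parameters that are missing from the URL are set to "(none)", mirroring the optional conversion above.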

Page dimensions

Some important page dimensions (or variables) are provided by default in GTM:

  • Hostname
  • Page Path (caveat — see below)
  • Referrer

However, I would like to extend these with the following:

  • Protocol (HTTPS or HTTP)
  • Page Title

Protocol

Capturing the protocol can be done via a Custom JavaScript variable: simply return the protocol property of the location object on window. Note that the value includes a trailing colon (e.g. “https:”).

(Screenshot: Protocol variable using a Custom JavaScript variable)
function() {
  return window.location.protocol;
}
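Because location.protocol includes the trailing colon, you may prefer to store a cleaner label that matches the “HTTPS or HTTP” values listed above. A small sketch, assuming a hypothetical helper named protocolName that you would define inside the Custom JavaScript variable:

```javascript
// Hypothetical helper: turns a location.protocol value such as "https:"
// into an upper-case label such as "HTTPS".
function protocolName(protocol) {
  // Remove the trailing colon, then upper-case the scheme.
  return protocol.replace(/:$/, '').toUpperCase();
}
```

Inside the variable, the body would then be return protocolName(window.location.protocol);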

Page Title

Capturing the page title is completely analogous; however, you need to get it from another object: the document object.

function() {
  return document.title;
}

Optional: Remove UTM parameters from the page path

Because we’re already storing the UTM parameters in their own variables, it would be redundant to pass them in our page path too. That’s why you can choose to remove them from the URL.

For this, we take a piece of JavaScript code that I was too lazy to write myself and use it in a Custom JavaScript variable. I loop over the five UTM parameters to remove them from the page path. You could use a for...of loop, but that is not supported in Internet Explorer.

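function() {
  // Removes a single query string parameter from a URL or path
  function removeURLParameter(url, parameter) {
    // prefer to use l.search if you have a location/link object
    var urlparts = url.split('?');
    if (urlparts.length >= 2) {

      var prefix = encodeURIComponent(parameter) + '=';
      var pars = urlparts[1].split(/[&;]/g);

      // reverse iteration, because splice() modifies the array
      for (var i = pars.length; i-- > 0;) {
        // ES5 idiom for string.startsWith
        if (pars[i].lastIndexOf(prefix, 0) !== -1) {
          pars.splice(i, 1);
        }
      }

      return urlparts[0] + (pars.length > 0 ? '?' + pars.join('&') : '');
    }
    return url;
  }

  // The built-in {{Page Path}} variable excludes the query string,
  // so build path + query string from window.location instead
  var pagePath = window.location.pathname + window.location.search;
  var parameters = ['utm_campaign', 'utm_source', 'utm_medium', 'utm_term', 'utm_content'];
  // Plain for loop: for...of is not supported in Internet Explorer,
  // and for...in should not be used to iterate over arrays
  for (var p = 0; p < parameters.length; p++) {
    pagePath = removeURLParameter(pagePath, parameters[p]);
  }
  return pagePath;
}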
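If you would rather not maintain a fixed list, a variant is to drop every parameter whose name starts with utm_, which also catches extra campaign parameters such as utm_id. This is a hypothetical helper (stripUtmParameters), not the snippet above; unlike the original, it only splits on "&" and expects a path plus query string without a fragment, such as window.location.pathname + window.location.search.

```javascript
// Hypothetical variant: removes every query parameter whose name starts with
// "utm_", instead of looping over a fixed list of five names.
function stripUtmParameters(pathWithQuery) {
  var queryIndex = pathWithQuery.indexOf('?');
  if (queryIndex === -1) {
    return pathWithQuery; // no query string: nothing to strip
  }
  var path = pathWithQuery.substring(0, queryIndex);
  var pairs = pathWithQuery.substring(queryIndex + 1).split('&');
  var kept = [];
  for (var i = 0; i < pairs.length; i++) {
    // ES5 idiom for startsWith: lastIndexOf(prefix, 0) === 0
    if (pairs[i] !== '' && pairs[i].lastIndexOf('utm_', 0) !== 0) {
      kept.push(pairs[i]);
    }
  }
  return path + (kept.length > 0 ? '?' + kept.join('&') : '');
}
```

For example, "/blog?utm_source=x&id=3&utm_id=abc" becomes "/blog?id=3", and a path whose only parameters are UTM parameters loses its "?" entirely.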

Device dimensions

The next category of variables we would like to capture is device dimensions. Many of these are available in Google Analytics, and they can be very useful, especially for product testing. I have decided to include the following dimensions:

  • Browser
  • Browser Version
  • Browser Language
  • Operating System
  • Device category
  • User Agent

Browser (Version)

Here is another piece of JavaScript code I was too lazy to write myself. I modified Shweta’s code for readability and split it into a browser snippet and a browser version snippet. Once again, create a separate Custom JavaScript variable for each of them.

Browser

function() {
  var navUserAgent = navigator.userAgent;
  // Fallback when no known browser token is found
  var browserName = navigator.appName;
  var tempNameOffset, tempVersionOffset;

  // Order matters: Chrome's user agent also contains "Safari"
  if ((tempVersionOffset = navUserAgent.indexOf("Opera")) != -1) {
    browserName = "Opera";
  } else if ((tempVersionOffset = navUserAgent.indexOf("MSIE")) != -1) {
    browserName = "Microsoft Internet Explorer";
  } else if ((tempVersionOffset = navUserAgent.indexOf("Chrome")) != -1) {
    browserName = "Chrome";
  } else if ((tempVersionOffset = navUserAgent.indexOf("Safari")) != -1) {
    browserName = "Safari";
  } else if ((tempVersionOffset = navUserAgent.indexOf("Firefox")) != -1) {
    browserName = "Firefox";
  } else if ((tempNameOffset = navUserAgent.lastIndexOf(' ') + 1) < (tempVersionOffset = navUserAgent.lastIndexOf('/'))) {
    // Unknown browser: use the name of the last "Name/Version" token
    browserName = navUserAgent.substring(tempNameOffset, tempVersionOffset);
    // If that name contains no letters, fall back to appName
    if (browserName.toLowerCase() == browserName.toUpperCase()) {
      browserName = navigator.appName;
    }
  }
  return browserName;
}

Browser Version

function() {
  var navUserAgent = navigator.userAgent;
  // Fallback when no known browser token is found
  var browserVersion = '' + parseFloat(navigator.appVersion);
  var tempNameOffset, tempVersionOffset, tempVersion;

  // Take the text after "Name/" (or after "Version/" for Opera and Safari)
  if ((tempVersionOffset = navUserAgent.indexOf("Opera")) != -1) {
    browserVersion = navUserAgent.substring(tempVersionOffset + 6);
    if ((tempVersionOffset = navUserAgent.indexOf("Version")) != -1) {
      browserVersion = navUserAgent.substring(tempVersionOffset + 8);
    }
  } else if ((tempVersionOffset = navUserAgent.indexOf("MSIE")) != -1) {
    browserVersion = navUserAgent.substring(tempVersionOffset + 5);
  } else if ((tempVersionOffset = navUserAgent.indexOf("Chrome")) != -1) {
    browserVersion = navUserAgent.substring(tempVersionOffset + 7);
  } else if ((tempVersionOffset = navUserAgent.indexOf("Safari")) != -1) {
    browserVersion = navUserAgent.substring(tempVersionOffset + 7);
    if ((tempVersionOffset = navUserAgent.indexOf("Version")) != -1) {
      browserVersion = navUserAgent.substring(tempVersionOffset + 8);
    }
  } else if ((tempVersionOffset = navUserAgent.indexOf("Firefox")) != -1) {
    browserVersion = navUserAgent.substring(tempVersionOffset + 8);
  } else if ((tempNameOffset = navUserAgent.lastIndexOf(' ') + 1) < (tempVersionOffset = navUserAgent.lastIndexOf('/'))) {
    browserVersion = navUserAgent.substring(tempVersionOffset + 1);
  }

  // Trim the version at the first ";" or space
  if ((tempVersion = browserVersion.indexOf(";")) != -1) {
    browserVersion = browserVersion.substring(0, tempVersion);
  }
  if ((tempVersion = browserVersion.indexOf(" ")) != -1) {
    browserVersion = browserVersion.substring(0, tempVersion);
  }

  return browserVersion;
}
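The snippets above classify Chromium-based Edge and Opera as Chrome (and report Chrome's version number), because their user agent strings also contain a "Chrome/" token. If that matters for your reports, a sketch like the following parses the user agent once and returns both name and version. The function name parseBrowser, the token list and its order are assumptions; user agent sniffing is inherently brittle, so treat the output as a best guess.

```javascript
// Hypothetical helper: returns { name, version } from a user agent string.
// Order matters: Edge and Opera also contain "Chrome/", and Chrome also
// contains "Safari/", so the more specific tokens are checked first.
function parseBrowser(ua) {
  var rules = [
    { name: 'Edge', token: 'Edg/' },
    { name: 'Opera', token: 'OPR/' },
    { name: 'Firefox', token: 'Firefox/' },
    { name: 'Chrome', token: 'Chrome/' },
    // Safari puts its own version after "Version/", not after "Safari/".
    { name: 'Safari', token: 'Safari/', versionToken: 'Version/' }
  ];

  for (var i = 0; i < rules.length; i++) {
    if (ua.indexOf(rules[i].token) === -1) {
      continue;
    }
    var versionToken = rules[i].versionToken || rules[i].token;
    var offset = ua.indexOf(versionToken);
    // The version runs until the next space, ";" or ")".
    var version = offset === -1
      ? ''
      : ua.substring(offset + versionToken.length).split(/[ ;)]/)[0];
    return { name: rules[i].name, version: version };
  }
  return { name: 'Unknown', version: '' };
}
```

In GTM you would still need two Custom JavaScript variables, returning parseBrowser(navigator.userAgent).name and parseBrowser(navigator.userAgent).version respectively.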

Browser language

Capturing browser language is another very easy one. You can simply take the language property from the navigator object.

function() {
  return navigator.language;
}

Device Category

There are many snippets floating around the internet, but it’s important to know that there is no perfect way to categorize devices. The following snippet should take you far: it should categorize roughly 80% to 90% of devices properly. You could easily replace it with another (often much longer) snippet, but for readability, let’s go with this one:

function() {
  var ua = navigator.userAgent;
  // Check tablets first: Android tablets lack the "mobi" token
  if (/(tablet|ipad|playbook|silk)|(android(?!.*mobi))/i.test(ua)) {
    return "tablet";
  }
  if (/Mobile|iP(hone|od|ad)|Android|BlackBerry|IEMobile|Kindle|Silk-Accelerated|(hpw|web)OS|Opera M(obi|ini)/.test(ua)) {
    return "mobile";
  }
  return "desktop";
}
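Since you will also store the raw user agent (see below), it can be handy to have the same classification as a function that takes the user agent as a parameter, so you can re-classify stored values later. A sketch with the same regular expressions as above; the name deviceCategory is hypothetical. One known gap: iPadOS 13 and later requests desktop sites by default, so those iPads report a Mac-like user agent and end up as "desktop".

```javascript
// Hypothetical helper: same regular expressions as the GTM variable above,
// but with the user agent passed in instead of read from navigator.
function deviceCategory(ua) {
  // Tablets first: Android tablets have no "mobi" token after "Android".
  if (/(tablet|ipad|playbook|silk)|(android(?!.*mobi))/i.test(ua)) {
    return 'tablet';
  }
  if (/Mobile|iP(hone|od|ad)|Android|BlackBerry|IEMobile|Kindle|Silk-Accelerated|(hpw|web)OS|Opera M(obi|ini)/.test(ua)) {
    return 'mobile';
  }
  return 'desktop';
}
```

Inside GTM, the variable body would simply be return deviceCategory(navigator.userAgent);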

Operating System

The same comments apply to the operating system. You can make it a lot more granular by choosing another snippet, but I decided to go with this slightly modified one that simply looks at five broad categories of operating systems.

function() {
  var ua = navigator.userAgent;
  var osName = "Unknown OS";
  // Later checks override earlier ones: Android user agents also contain
  // "Linux", and iOS user agents contain "like Mac OS X"
  if (ua.indexOf("Win") != -1) {
    osName = "Windows";
  }
  if (ua.indexOf("Mac") != -1) {
    osName = "Mac";
  }
  if (ua.indexOf("Linux") != -1) {
    osName = "Linux";
  }
  if (ua.indexOf("Android") != -1) {
    osName = "Android";
  }
  if (ua.indexOf("like Mac") != -1) {
    osName = "iOS";
  }
  return osName;
}
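The chain of if statements works because later matches override earlier ones: Android user agents also contain "Linux", and iPhone and iPad user agents contain "like Mac OS X". An equivalent sketch makes this priority explicit as an ordered list that is checked from most to least specific; operatingSystem is a hypothetical name, and the user agent is passed in so the function can also run on stored data.

```javascript
// Hypothetical helper: same tokens as the GTM variable above, checked from
// most to least specific, so the first match wins.
function operatingSystem(ua) {
  var rules = [
    ['like Mac', 'iOS'],     // iPhone/iPad UAs contain "like Mac OS X"
    ['Android', 'Android'],  // Android UAs also contain "Linux"
    ['Linux', 'Linux'],
    ['Mac', 'Mac'],
    ['Win', 'Windows']
  ];
  for (var i = 0; i < rules.length; i++) {
    if (ua.indexOf(rules[i][0]) !== -1) {
      return rules[i][1];
    }
  }
  return 'Unknown OS';
}
```

Adding a new operating system then means inserting one row at the right priority instead of reasoning about which if statement runs last.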

User Agent

The user agent provides a lot of input to the device variables described above. However, it doesn’t hurt to store the user agent in its entirety. It is a good way to identify crawlers such as Googlebot, IndeedBot, etc. You’ll thank me later for storing it once you start filtering bot traffic out of your data.

Create another Custom JavaScript variable and put the following code in it.

function() {
  return navigator.userAgent;
}
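Once the raw user agent is stored, a simple keyword check can flag most self-identifying crawlers. A minimal sketch; the function name isLikelyBot and the keyword list are assumptions, bots that fake a regular browser user agent will slip through, and keyword matching can also produce false positives, so review what it flags.

```javascript
// Hypothetical filter: flags user agents containing common crawler keywords.
// The keyword list is an assumption, not an exhaustive bot registry.
function isLikelyBot(ua) {
  return /bot|crawler|spider|crawling|slurp/i.test(ua);
}
```

You could apply it when querying your warehouse data, or client-side to skip sending hits from obvious crawlers altogether.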

Google Analytics Cookie Values

True, the goal was to work completely independently of Google Analytics. But the possibility is very real that you’ll still be using it in parallel with your clickstream solution. So… why would you want to capture the GA cookie values? Well, for multiple reasons:

  • It’s a convenient way to attribute behavior to one and the same user/device.
  • If some dimensions are missing in your clickstream solution, you could blend the data with data you pulled from Google Analytics — given that you also store the cookie values as custom dimensions in GA.

So let’s do it. The _ga cookie is stored for two years after the last hit. Another one is the _gid cookie, which lasts 24 hours after the last hit. It’s not 1:1 comparable to how GA defines users and sessions, but you could say that the first one can be used to track the user, while the second one can be used to track the visit.

Of course, both cookie values can be captured in the same way. Simply replace ‘_ga’ with ‘_gid’ in the following snippet and put each version in its own Custom JavaScript variable. Caveat: some privacy-related browser extensions block both cookies, so it can’t hurt to create your own cookie(s) as well.

function() {
  var name = '_ga' + "=";
  var decodedCookie = decodeURIComponent(document.cookie);
  var ca = decodedCookie.split(';');
  for(var i = 0; i <ca.length; i++) {
    var c = ca[i];
    while (c.charAt(0) == ' ') {
      c = c.substring(1);
    }
    if (c.indexOf(name) == 0) {
      return c.substring(name.length, c.length);
    }
  }
  return "(none)";
}
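For attributing behavior to the same user or device, you may want to store just the client ID rather than the full cookie value. A _ga value typically looks like GA1.2.1234567890.1609459200, where the last two dot-separated parts form the client ID that Google Analytics uses. A sketch, assuming a hypothetical helper gaClientId that you apply to the value returned by the cookie variable above:

```javascript
// Hypothetical helper: extracts the client ID (last two dot-separated parts)
// from a _ga or _gid cookie value such as "GA1.2.1234567890.1609459200".
function gaClientId(cookieValue) {
  var parts = String(cookieValue).split('.');
  if (parts.length < 4) {
    return '(none)'; // missing or unexpected format, e.g. "(none)"
  }
  return parts.slice(-2).join('.');
}
```

Storing the client ID in this form also makes it easier to join your clickstream data with GA data, provided you send the same value to GA as a custom dimension.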

Putting everything together: Stream the data

Now that we have created all these variables in Google Tag Manager, it’s time to put them together and send them to the webhook of your choice. In the example below, we create a simple pageview event in a Custom HTML tag that fires on DOM Ready.

Nothing stops you from creating a multitude of events that pick up values from the dataLayer. You are no longer constrained to an event category, action, and label: you can add as many dimensions as you want.

<script>
  // var keeps the payload out of the global scope
  var ft = {
      'event': 'pageview',

      // source
      'utm_campaign': '{{utm_campaign}}',
      'utm_source':  '{{utm_source}}',
      'utm_medium':  '{{utm_medium}}',
      'utm_content': '{{utm_content}}',
      'utm_term': '{{utm_term}}',
      'referrer': '{{Referrer}}',

      // page
      'hostname': '{{Page Hostname}}',
      'protocol': '{{protocol}}',
      'pagePath': '{{page path (no utm)}}',
      'pageTitle': '{{page title}}',

      // ga cookies
      'gid': '{{gid cookie}}',
      'ga': '{{ga cookie}}',

      // device
      'userAgent': '{{user agent}}',
      'browserLanguage': '{{browser language}}',
      'deviceCategory': '{{device category}}',
      'operatingSystem': '{{operating system}}',
      'browser': '{{browser}}',
      'browserVersion': '{{browser version}}'
  };

  // Send the payload as JSON (note: fetch() is not available in Internet Explorer)
  fetch('URL_OF_YOUR_WEBHOOK', {
    method: 'POST',
    headers: {
      'Accept': 'application/json, text/plain, */*',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify(ft)
  });

</script>
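The tag above relies on fetch(), which Internet Explorer does not support. If IE traffic matters to you (the page path snippet was already written with IE in mind), a sketch with an XMLHttpRequest fallback could look like this. The helper name sendEvent is hypothetical, and URL_OF_YOUR_WEBHOOK remains your own endpoint.

```javascript
// Hypothetical helper: POSTs a payload as JSON to a webhook URL.
// Uses fetch() when available and falls back to XMLHttpRequest otherwise
// (e.g. in Internet Explorer, which does not support fetch).
function sendEvent(url, payload) {
  var body = JSON.stringify(payload);

  if (typeof fetch === 'function') {
    fetch(url, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: body
    }).catch(function () {
      // Ignore network errors so a failed hit does not surface as an
      // unhandled promise rejection.
    });
    return;
  }

  var xhr = new XMLHttpRequest();
  xhr.open('POST', url, true);
  xhr.setRequestHeader('Content-Type', 'application/json');
  xhr.send(body);
}
```

In the Custom HTML tag, you would define this function inside the script element and call sendEvent('URL_OF_YOUR_WEBHOOK', ft) instead of calling fetch directly. Either way, a cross-origin request with a JSON Content-Type triggers a CORS preflight, so your webhook has to answer it.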

You are correct: not all GA dimensions are present. What about country, for example? That dimension is inferred from the IP address, which client-side JavaScript cannot read directly. You can get it in two ways: create your own server-side service, or use a service such as ipstack, which returns the IP address and geolocation data all at once.

Finally, you are right that we can streamline our solution even more by creating a custom tag template. If you do, make sure that your webhook accepts GET requests, because POST requests are not supported (yet).
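Because a template request has to be a GET, the payload must travel in the query string instead of a JSON body. A minimal sketch in plain browser JavaScript with a hypothetical buildWebhookUrl helper; inside a custom template's sandboxed JavaScript you would use the sandboxed encodeUriComponent and sendPixel APIs instead of the browser globals. Keep in mind that very long URLs may be rejected by browsers or servers.

```javascript
// Hypothetical helper: encodes a flat payload object as a query string so it
// can be sent to a webhook with a GET request.
function buildWebhookUrl(baseUrl, payload) {
  var parts = [];
  for (var key in payload) {
    if (Object.prototype.hasOwnProperty.call(payload, key)) {
      parts.push(encodeURIComponent(key) + '=' + encodeURIComponent(String(payload[key])));
    }
  }
  // Append with "&" if the base URL already has a query string.
  var separator = baseUrl.indexOf('?') === -1 ? '?' : '&';
  return parts.length > 0 ? baseUrl + separator + parts.join('&') : baseUrl;
}
```

For example, a pageTitle of "Hello world & more" is sent as pageTitle=Hello%20world%20%26%20more, so the ampersand cannot be mistaken for a parameter separator.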

In the next blog post, you’ll learn how to set up a webhook in Fivetran that stores the data in a managed PostgreSQL database.

Say thanks, ask questions or give feedback

Technologies get updated, syntax changes and honestly… I make mistakes too. If something is incorrect, incomplete or doesn’t work, let me know in the comments below and help thousands of visitors.
