Automate Everything w/ Bash, Linux & Command Line
  1. Google Analytics PDF Tracking Made Easy

    Tracking the quantity of PDF downloads is a very common wish for business-to-business and/or lead generation campaigns. It makes sense, right? If a visitor requests a PDF, that visitor wants to learn more and is interested in what they’ve learned from your site so far.

    The problem with trying to track PDF downloads as a conversion point or a goal is that a PDF document is different from a normal page. You cannot insert Google Analytics code into a PDF to have it tracked like any other page, which means you won’t be able to set it up as a goal in your Analytics profile.

    The Good News

    The good news is that it’s simple enough to set up Google Analytics Event Tracking. In a nutshell, this records an “event” each time a user clicks the PDF link. Here’s Google’s overview of how to set it up.

    If you’re too lazy to read through all the documentation provided by Google, here’s the tl;dr:

    Before adding the onClick, the link would look like this:

    <a href="/some/pdf/file.pdf">Download this PDF</a>
    

    After adding the onClick, it would look something like this:

    <a onclick="_gaq.push(['_trackEvent', 'PDF', 'Download', '/some/pdf/file.pdf']);" href="/some/pdf/file.pdf">Download this PDF</a>
    

    Then, once the link is clicked, you will record an “Event” in Google Analytics with the category equal to PDF, the action equal to Download and the label equal to /some/pdf/file.pdf. Pretty simple right?

    The Bad News

    Now it’s time for the bad news. I’m typically in the consultant position, trying to convince a site owner, marketing manager or webmaster why they should go through the trouble of tagging each and every “a href” PDF link with this onClick attribute, each of which needs to be customized so we can measure appropriately. This is a lot of work. Typically, the project doesn’t even start. When tagging these links does become a project, it is rarely done perfectly and almost never kept up to date. This leads to inconsistent reporting.

    The only thing worse than no tracking is inaccurate tracking. There’s nothing worse for a marketing professional (who’s trying to justify their own existence) than having to explain that the reporting is invalid because errors were made in the analytics setup. However, IT HAPPENS ALL THE TIME. It’s insanely frustrating.

    This blog, after all, is titled “Automate Everything”. Hopefully my solution to this problem won’t disappoint. I’ve been learning JavaScript, more specifically jQuery, over the past several months, and I was thinking about how to solve this problem. How can I use my new skills to make it insanely easy and efficient for people to add Google Analytics event tracking to all the PDF links on their site? I’m happy to say, I’ve figured it out.

    PDF Event Tracking Solution

    My solution lets you track every PDF link on your site by pasting a few lines of jQuery into the footer of your site. Here’s my solution:

    <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js"></script>
    <script>
        // Find every link whose href ends in ".pdf" and give it an onClick
        // attribute that records a Google Analytics event when clicked.
        $("a[href$='.pdf']").each(function(index) {
          var pdfLabel = $(this).attr('href');
          var pdfOnClick = "_gaq.push(['_trackEvent', 'PDF', 'Download', '" + pdfLabel + "']);";
          $(this).attr("onClick", pdfOnClick);
        });
    </script>
    

    You should be able to place this code in the footer of almost any site. When the page loads, it goes through all the <a> tags and looks for links whose href ends in .pdf. When it finds one, it simply adds the onClick attribute to perform the event tracking in Google Analytics.
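    Before deploying the snippet, you might want to preview which links it will tag. Here’s a rough command-line sketch that pulls the href of every <a> tag ending in .pdf out of a saved copy of a page. It only approximates the jQuery selector: it assumes double-quoted href attributes and each opening <a> tag on a single line, so links added by other scripts or written with single quotes won’t show up. The function name list_pdf_links is hypothetical.

```shell
#!/usr/bin/env bash
# list_pdf_links FILE
# Print the href of every <a> tag in FILE whose href ends in ".pdf",
# roughly what the jQuery selector matches in the live page.
# Assumes double-quoted href values and each <a ...> tag on one line.
list_pdf_links() {
  local file="$1"
  # -o: print only the matched text; [^>]* keeps the match inside one tag
  grep -oE '<a[[:space:]][^>]*href="[^"]*\.pdf"' "$file" |
    # strip everything except the attribute value between the quotes
    sed -E 's/.*href="([^"]*)"$/\1/'
}
```

    For example, save a page with curl -s followed by your own URL, redirect it to page.html, and run list_pdf_links page.html to get one PDF path per line.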

    If your site already includes the jQuery library, there’s no need to load it again. Simply omit the first <script> tag (the one that loads jQuery from Google’s servers) and include the rest like so…

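    <script>
        // Find every link whose href ends in ".pdf" and give it an onClick
        // attribute that records a Google Analytics event when clicked.
        $("a[href$='.pdf']").each(function(index) {
          var pdfLabel = $(this).attr('href');
          var pdfOnClick = "_gaq.push(['_trackEvent', 'PDF', 'Download', '" + pdfLabel + "']);";
          $(this).attr("onClick", pdfOnClick);
        });
    </script>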
    

    The only thing you need to keep in mind is that this code needs to be placed as far down on the page as possible. Once this code executes, it will only have an effect on the page elements (DOM elements) that have already been loaded by the browser.

    This is a very quick, highly efficient way to enable tracking of your PDF downloads using event tracking in Google Analytics. I’m actually quite surprised that Google doesn’t offer suggestions like this on their help pages.

    I hope you’ve found this helpful. Please leave a comment if you have questions and share it if you have found it to be helpful. Thanks for reading.

    Happy automation!

     
  2. Google Analytics without JavaScript or Cookies

    The title is correct. You can use Google Analytics to track visitors in circumstances where JavaScript execution is not possible. If you think about it, I’m sure you can come up with a handful of situations that you currently do not track, which will be possible with what I’m about to share with you. This will also work for visitors who have cookies completely disabled.

    My opinion is that visitors who have disabled JavaScript manually (or with the help of an add-on like NoScript) are doing so because they don’t want to be tracked or for security reasons. If they’re doing that, then they most likely disable cookies as well. I mention this because I believe it’s up to you to decide whether or not you should track visitors in these circumstances.

    I think you can guess where I stand on the topic. However, I do hope that the ‘Do Not Track’ header becomes widely implemented. If it does, there will no longer be a grey area, because it’s an explicit user choice not to be tracked. That is a choice I will respect without question.

    Google Analytics without JavaScript is Difficult

    If you’re lacking patience, want an easy solution, don’t like to read and/or just don’t care about how Google Analytics works, then this post isn’t for you. Maybe I should have warned you sooner, but please accept my apology for not doing so and check out a service like this.

    If You’re Adventurous, Keep Reading

    Here are a few situations where Google Analytics tracking would be useful, if only you could do it without JavaScript.

    1. Track click-through rates of email campaigns (email clients and webmail services strip JavaScript). You need to know the quantity of opens to calculate CTR.
    2. Track users who do not have JavaScript enabled.
    3. Most older mobile phone browsers do not support JavaScript.

    I’m sure there are others. Those three are the first that come to mind. Leave others in the comments if you like.

    How it’s Done

    Disclaimer: I’m publishing this because I hope it will be useful. There’s a small chance I’m full of shit. I don’t think so, but it’s possible.

    The key to tracking with Google Analytics without JavaScript is to first dissect what Google’s JavaScript is doing. Once you’ve done that, you’ll have the information needed to recreate the same behavior without JavaScript. More on how to recreate that behavior later.

    First, let me share how you can see what the GA JavaScript is doing. You can observe those actions yourself by using Chrome’s developer tools or Firebug in Firefox and watching the network calls made during page load on a site with Google Analytics installed. Try it on this page if you’re curious. The instructions below are for Google Chrome.

    1. Press F12 (or Ctrl + Shift + I) on the test page to open the developer tools.
    2. Switch to the 3rd tab labeled “Network”.
    3. Press F5 to refresh the page. You’ll see calls for all the page resources required to render the page.
    4. Filter so you see just the images. Do this by clicking on the link in the bottom toolbar labeled “Images”. Look for a request called __utm.gif and click on it.
    5. By default, it will open to the “Preview” tab. Switch to the “Headers” tab.

    You’re now looking at the actual GET request and parameters which transmit all the tracking information to Google Analytics. This is the default information sent to track a page view, and much of it isn’t required. Here’s a summary:

    A GET request is made for http://www.google-analytics.com/__utm.gif. If you’re following along on the homepage of my site, you’ll see the following parameters and values. I’ve pasted them URL decoded (easier to read) and with comments describing what each represents.

    utmwv:5.2.6 // Google Analytics code version
    utms:2  // I'm honestly not sure, but it isn't required. I've heard it's used to count requests per session. I'd love to know if you know.
    utmn:1509196652 // random number generated to make sure the gif isn't cached.
    utmhn:automateeverything.tumblr.com // the hostname
    utmcs:UTF-8 // character encoding (not required)
    utmsr:1680x1050 // size of the display (not required)
    utmvp:1218x504 // size of the browser window (not required)
    utmsc:24-bit // color depth (not required)
    utmul:en-us // language (not required)
    utmje:1 // Java enabled, 1=yes 0=no (not required)
    utmfl:11.2 r202 // Flash version (not required)
    utmdt:Automate Everything w/ Bash, Linux & Command Line // Page title tag (not required)
    utmhid:405374938 // A random number used to link Analytics GIF requests with AdSense. (not required)
    utmr:- // referrer, "-" means none
    utmp:/ // URI
    utmac:UA-29271731-1 // Google Analytics Profile ID
    utmcc:__utma=234084878.479851276.1333418536.1333418536.1333418536.1;+__utmz=234084878.1333418536.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); // Cookie information. More details on this below.
    utmu:q~ // Again, I have no idea what it's used for but it isn't required.
    

    Here’s my source for what the parameter/value pairs represent. Scroll down about 75% of the way to where it says “The GIF Request Parameters”.

    The Google Analytics JavaScript code detects all the values used in the parameters above (and more) and then creates the __utm.gif GET request. So, by default, if JavaScript cannot execute, Google Analytics will not track. That is, unless you detect and generate the parameter values yourself (which has to be done server-side when the page is requested) and then include the 1px image on the page you want to track! That way, when the images for the page load, the browser sends a GET request and the tracking information reaches Google Analytics just as if Google’s JavaScript had generated it.
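    Here’s a minimal shell sketch of that server-side step, assuming the classic ga.js __utm.gif endpoint described in this post and only the parameters the list above does not mark as “not required” (utmwv, utmn, utmhn, utmr, utmp, utmac and utmcc). The function names urlencode, utm_gif_url and utm_img_tag, and their argument order, are hypothetical, and the caller is assumed to supply a ready-made utmcc value. It only builds the URL and a 1x1 <img> tag; it doesn’t keep track of sessions, sources or referrers for you.

```shell
#!/usr/bin/env bash
# urlencode STRING: percent-encode every byte except RFC 3986 unreserved
# characters. LC_ALL=C makes bash walk the string byte by byte.
urlencode() {
  local LC_ALL=C s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;
      *) out+=$(printf '%%%02X' "'$c") ;;   # "'c" yields the character's code
    esac
  done
  printf '%s' "$out"
}

# utm_gif_url ACCOUNT HOST PATH REFERRER UTMCC
# Build a __utm.gif request using only the parameters the list above does
# not mark as "not required". utmwv is copied from the example (5.2.6).
utm_gif_url() {
  local account="$1" host="$2" path="$3" referrer="$4" utmcc="$5" utmn
  utmn=$(LC_ALL=C tr -cd 0-9 < /dev/urandom | head -c 10)  # cache buster
  printf 'http://www.google-analytics.com/__utm.gif?utmwv=%s&utmn=%s&utmhn=%s&utmr=%s&utmp=%s&utmac=%s&utmcc=%s\n' \
    5.2.6 "$utmn" "$(urlencode "$host")" "$(urlencode "$referrer")" \
    "$(urlencode "$path")" "$(urlencode "$account")" "$(urlencode "$utmcc")"
}

# utm_img_tag ACCOUNT HOST PATH REFERRER UTMCC
# Wrap the URL in a 1x1 <img> tag; "&" becomes "&amp;" inside the attribute.
utm_img_tag() {
  local url
  url=$(utm_gif_url "$@")
  printf '<img src="%s" width="1" height="1" alt="">\n' "${url//&/&amp;}"
}
```

    To fire the same hit without a browser (while testing, for instance), you could pass the output of utm_gif_url to curl -s -o /dev/null, substituting your own profile ID, hostname and utmcc value.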

    Before we can do that, there’s one parameter for which I haven’t explained the values. Actually, I couldn’t find an explanation at all (from Google) about what the utmcc= parameter is used for, besides the fact that it’s a representation of the stored cookies. Here it is again from the example above:

    utmcc:__utma=234084878.479851276.1333418536.1333418536.1333418536.1;+__utmz=234084878.1333418536.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none);
    

    After some Googling, I pieced together the following information about what each part means. This was the hardest part about getting all this to work. I’ve replaced the actual values from the example above with descriptive names in hopes that it will be easier to understand what each represents.

    utmcc:__utma=DOMAINHASH.RANDOMNUM.TIMEFIRSTVISIT.TIMEPREVIOUSVISIT.CURRENTTIME.NUMBEROFSESSIONS;+__utmz=DOMAINHASH.CURRENTTIME.NUMBEROFSESSIONS.NUMBEROFSOURCES.utmcsr=SOURCE|utmccn=CAMPAIGN|utmcmd=MEDIUM;
    
    DOMAINHASH // A static number that is unique for each site. Find it by inspecting the '__utma' parameter value on your site.
    RANDOMNUM // A random number; you can generate it any way you choose.
    TIMEFIRSTVISIT // Time of first visit represented as seconds since 1970-01-01 00:00:00 UTC
    TIMEPREVIOUSVISIT // Time of the visitor's previous visit represented as seconds since 1970-01-01 00:00:00 UTC
    CURRENTTIME // Current time represented as seconds since 1970-01-01 00:00:00 UTC
    NUMBEROFSESSIONS // The count of total sessions for this visitor.
    NUMBEROFSOURCES // A count of the number of different sources the visitor has used to find your site.
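    Putting those pieces back together is just string formatting. This sketch rebuilds the utmcc value in the field order listed above, and a second helper pulls DOMAINHASH out of a utmcc value you copied from the Network tab. Both function names, build_utmcc and domainhash_from_utmcc, are hypothetical.

```shell
#!/usr/bin/env bash
# Usage: build_utmcc DOMAINHASH RANDOMNUM TIMEFIRSTVISIT TIMEPREVIOUSVISIT
#        CURRENTTIME NUMBEROFSESSIONS NUMBEROFSOURCES SOURCE CAMPAIGN MEDIUM
# Reassemble the utmcc value in the field order listed above.
build_utmcc() {
  local domainhash="$1" randomnum="$2" first="$3" previous="$4" current="$5"
  local sessions="$6" sources="$7" source="$8" campaign="$9" medium="${10}"
  printf '__utma=%s.%s.%s.%s.%s.%s;+__utmz=%s.%s.%s.%s.utmcsr=%s|utmccn=%s|utmcmd=%s;\n' \
    "$domainhash" "$randomnum" "$first" "$previous" "$current" "$sessions" \
    "$domainhash" "$current" "$sessions" "$sources" \
    "$source" "$campaign" "$medium"
}

# Usage: domainhash_from_utmcc UTMCC
# Print DOMAINHASH, the first dot-separated field of __utma.
domainhash_from_utmcc() {
  local utma="${1#*__utma=}"    # drop everything through "__utma="
  printf '%s\n' "${utma%%.*}"   # keep everything before the first dot
}
```

    Called with the example values (234084878, 479851276, 1333418536 three times, 1, 1, (direct), (direct), (none)), build_utmcc prints exactly the utmcc string shown earlier in this post, and domainhash_from_utmcc on that string prints 234084878.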
    

    The date command provides a really easy way to get a current time stamp in seconds since 1970-01-01 00:00:00 UTC. Here it is:

    date -u +%s
    

    You can also generate a 9-digit random number (the same length as RANDOMNUM in the example) by running the following command:

    LC_ALL=C tr -cd 0-9 < /dev/urandom | head -c 9
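    Put together, the timestamp and random-number commands are enough to build a __utma cookie for a first-time visitor. This sketch assumes, as in the example above, that on a first visit the first, previous and current visit times are the same timestamp and the session count is 1; returning visitors would need values you’ve stored yourself. The function name new_visitor_utma is hypothetical, and LC_ALL=C is there because some tr implementations (macOS, for example) reject the invalid multibyte sequences that /dev/urandom produces.

```shell
#!/usr/bin/env bash
# Usage: new_visitor_utma DOMAINHASH
# Print a __utma cookie for a first visit: all three timestamps are "now"
# and the session count is 1.
new_visitor_utma() {
  local domainhash="$1" now rand
  now=$(date -u +%s)                                       # seconds since 1970-01-01 00:00:00 UTC
  rand=$(LC_ALL=C tr -cd 0-9 < /dev/urandom | head -c 9)   # 9 random digits
  printf '__utma=%s.%s.%s.%s.%s.1;\n' "$domainhash" "$rand" "$now" "$now" "$now"
}
```

    For example, new_visitor_utma 234084878 prints __utma=234084878 followed by nine random digits, the current timestamp three times and a session count of 1, ending with a semicolon.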
    

    Conclusion

    Now that you have an adequate understanding of the minimum amount of information needed to recreate the GET request for __utm.gif, it’s completely up to you to figure out how you’re going to generate the values and keep track of session counts, source counts, detect referral sources, etc. If you’re using PHP on your site, you may want to have a look at Server Side Google Analytics. That isn’t an endorsement, since I don’t have any experience using it, but it was one of the tools I came across while doing my research.

    I started working on this because I wanted to track open rates in Google Analytics for my email campaigns. I’ve been able to work out the kinks and confirm that this absolutely works when combined with Google Analytics Event Tracking. It’s likely that I will post the full details of that project as a follow up to this post, so make sure to check back here or follow me on Google+ for updates.

    Please post your questions and I’ll do my best to help. As always, happy automation!