The referral spam problem in Google Analytics exploded in May, when we saw an average of 620 spam sessions per GA property. Just the other week, I saw an account where spam accounted for 95% of the traffic!
Spam referrals greatly skew your Google Analytics traffic and are becoming a headache for a growing number of people.
Why are these spam sessions appearing in your Google Analytics traffic? To get you to click through to their site and ads (never ever do that, by the way). Since spammers target thousands of GA accounts like this, you can imagine how much traffic they get from people curious about their new source of visits.
There are two different types of spam referrals you are getting:
- Ghost referrals send fake traffic to your GA account by “attacking” random GA property IDs.
- Crawler referrals crawl your website to leave a mark in your traffic.
The spam referrals are getting more persistent and clever by targeting other non-referral reports, like www.event-tracking.com appearing in events.
How can you tell it’s spam?
You can spot it by unusual activity, odd referral sources, substantial changes in your metrics, and lots of (not set) values in various dimensions, eg hostname and language.
So how do you remove spam referrals from your Google Analytics traffic?
There are two filters you need to set up to remove both ghost and crawler spam referrals.
Filters change your traffic permanently, so if you don’t have an unfiltered view of your data, create one now. It’s good practice to keep an unfiltered view that you never modify, as it lets you check that your filters are working correctly.
We are also working on our own spam filter tool to help people get rid of pesky spam referrals with just a few clicks of a button. We have already released a beta version via our Littledata analytics reporting tool and are developing it further to make it more robust and comprehensive.
But if you’d rather do it yourself, keep reading.
Create a filter to include valid hostnames
Since ghost referrals never actually visit the site, the best way to get rid of them is by creating a valid hostname filter. This filter will allow visits from “approved” websites that you consider valid.
First, you will need to identify your valid hostnames by going to the report in Audience > Technology > Network > Hostname.
The Hostname report shows the domains where your GA tracking code was fired and helps you troubleshoot unusual traffic sources.
Valid hostnames on the list will be the websites where you inserted the GA tracking code, additional services you use, eg for transactions, and reliable sites that people use to access your site, eg Google Translate.
Your reliable hostnames could look like this:
translate.googleusercontent.com (user accessing your site via Google Translate)
webcache.googleusercontent.com (user accessing translated cached version of your site)
You can safely assume that any other hostname you do not recognise, or that looks suspicious, is one you want to exclude.
Beware of any domains that appear as “credible sources”, eg Google, Amazon and HuffingtonPost. They are used to mask the spammers.
If you see a (not set) hostname on your list, this could be because you’re sending events to GA that don’t have pageviews, for example tracking email opens and clicks. If you are sure you are not sending any such events to GA, you can also exclude any (not set) hostnames.
Now that you have got your valid hostnames, you need a regular expression for a filter that will include your valid hostnames (and thus, exclude all other fake ones).
It’ll look like this:
In the regex above, the vertical bar | separating each domain means OR. This will match any part of the string, so ‘yoursite’ will match ‘blog.yoursite.com’ as well as ‘www.yoursite.com’.
You can test your regex at http://regexpal.com/ by inserting your expression at the top and all the URLs at the bottom. All matches will be highlighted so you can see straightaway whether you have included all your valid hostnames correctly.
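If you would rather check your expression offline, here is a minimal Python sketch of the same idea. The hostnames are placeholders built from the examples in this post (swap in the ones from your own Hostname report), the function name is hypothetical, and Python's regex engine may differ in small ways from the one GA uses, so treat it as a rough check rather than a guarantee.

```python
import re

# Placeholder hostnames: replace with the valid ones from your own Hostname report.
VALID_HOSTNAMES = [
    "yoursite",
    "translate.googleusercontent.com",
    "webcache.googleusercontent.com",
]

# Join the hostnames with | (OR). re.escape turns each dot into a literal dot,
# so "." no longer means "any character".
hostname_pattern = "|".join(re.escape(h) for h in VALID_HOSTNAMES)
hostname_regex = re.compile(hostname_pattern)


def is_valid_hostname(hostname):
    """Return True if the pattern matches any part of the hostname."""
    return hostname_regex.search(hostname) is not None


if __name__ == "__main__":
    print(hostname_pattern)
    for host in ["www.yoursite.com", "blog.yoursite.com", "(not set)", "spam.example"]:
        print(host, is_valid_hostname(host))
```

Because the match is partial, a hostname such as yoursite.spam.example would also pass. If that worries you, use your full domain (for example yoursite.com, with the dot escaped) instead of the bare name.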
Before adding the valid hostname filter in the settings, test it with an advanced segment.
The results should now show only your valid hostnames, without the spammers.
If all looks good, create a filter by going to Admin > View > Filters > New Filter. This will add a filter for that specific view only. If you want to add the same filter to more than one view, then check the details below.
Pick a custom filter, select ‘Include’ and choose ‘Hostname’ from the filter field menu.
Now enter your regex into the filter pattern field and click save.
Want to apply a filter to multiple views?
Then go to Admin > Account > All Filters > New Filter.
The setup is exactly the same as above, except now you will see a section at the bottom titled ‘Apply Filter to Views’.
Select the views you want to apply the filter to and move them to the right-hand box by clicking the ‘Add’ button in the middle.
You’re all set so click save.
Add a filter to exclude campaign source
Some of the known spam referrals are free-social-buttons, guardlink.org, 4webmasters.org and, most recently, the ironically named howtostopreferralspam.eu.
Excluding spam referrals with a campaign source filter is one of the most commonly mentioned methods online.
This filter will exclude referrer spam from the moment you add it (it does not clean your historical data). The downside is that every time a new spam referral appears in your Google Analytics data, you will have to add it to the existing filter, or create a new filter if you’ve run out of space (a filter pattern allows only 255 characters).
You can identify your spam referrals by going to the referrals report found in Acquisition > All Traffic > Referrals.
To save you some time, I have included the regexes we use below so you can copy them. Make sure you double-check your referrals report against our list to see if there are any that haven’t appeared in our reports yet. If you find a source not listed below, simply add it to the end and let us know in the comments.
Similarly to setting up the filter to include valid hostnames only, now you need to add a filter to exclude spam referrals.
We use the following regular expressions to filter out spam (yes, that’s four filters):
The reason the majority of the websites above do not have org/com/etc is that, for these sites, I have concluded that there are no other genuine sites with similar names (or none that I could find) that would send traffic to our site. So it is safe to exclude these sites by name only.
For example, there are many sites with adviceforum in their name so to avoid excluding any potentially genuine sites that are called adviceforum, I only exclude the one spam referral I saw in my traffic – adviceforum.com.
If you notice that you have referral traffic from addons.mozilla.org but don’t actually have an addon on Mozilla, then you should add addons.mozilla.org (more commonly known as ilovevitaly) to the list above in this format – addons.mozilla.org
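To illustrate the difference between excluding by name only and excluding one specific domain, here is a small Python sketch. The pattern is an illustration assembled from sources named in this post, not the exact filter we use, and treating guardlink and 4webmasters as name-only entries is an assumption made for the example. The function name is hypothetical.

```python
import re

# Illustrative pattern only (an assumption, not the exact filter from this post).
# Name-only entries match any domain containing that name;
# adviceforum\.com matches only that specific domain.
spam_source_regex = re.compile(
    r"free-social-buttons|guardlink|4webmasters|adviceforum\.com"
)


def is_spam_source(source):
    """Return True if the campaign source matches the exclude pattern."""
    return spam_source_regex.search(source) is not None


if __name__ == "__main__":
    for source in [
        "free-social-buttons.com",  # excluded by name
        "adviceforum.com",          # excluded as a full domain
        "adviceforum.net",          # kept: only the .com domain is listed
        "google",                   # kept
    ]:
        print(source, is_spam_source(source))
```

Escaping the dot (\.) matters for the full-domain entry: an unescaped dot matches any character, so the pattern would be looser than intended.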
Select Campaign Source in the filter field menu and enter your regex into the filter pattern field. Repeat the process until you have got all four (or more) filters created.
This will help to clean up your Google Analytics data but you have to keep checking for any new spam referrals to add to the exclude filter.
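As the list grows, it gets tedious to work out by hand where one filter pattern ends and the next begins. Here is a minimal Python sketch that packs a list of sources into as few patterns as possible, each within the 255-character limit mentioned earlier. The function name and the sample sources are hypothetical, and the sketch assumes each entry is already written as a regex fragment (dots escaped where needed).

```python
MAX_PATTERN_LENGTH = 255  # character limit for a filter pattern, as noted in this post


def build_filter_patterns(sources, max_length=MAX_PATTERN_LENGTH):
    """Join sources with | into patterns no longer than max_length characters."""
    patterns = []
    current = ""
    for source in sources:
        if len(source) > max_length:
            raise ValueError(f"Source is longer than {max_length} characters: {source}")
        # Try adding the source to the pattern being built.
        candidate = source if not current else current + "|" + source
        if len(candidate) <= max_length:
            current = candidate
        else:
            # The current pattern is full: store it and start a new one.
            patterns.append(current)
            current = source
    if current:
        patterns.append(current)
    return patterns


if __name__ == "__main__":
    # Sample sources for illustration only.
    sample = ["free-social-buttons", "guardlink", "4webmasters", r"adviceforum\.com"]
    for number, pattern in enumerate(build_filter_patterns(sample), start=1):
        print(f"Filter {number} ({len(pattern)} chars): {pattern}")
```

Each pattern it returns goes into its own exclude filter, so a longer list simply means more filters.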
You can use advanced segments to view your historical reports without spam referrals.
If you need help with any of the above or have further questions, don’t hesitate to let me know in the comments.