From May 2018 the new General Data Protection Regulation (GDPR) will come into force in the European Union, causing all marketers and data engineers to reconsider how they store, transmit and manage data, including data held in Google Analytics.

If your company uses Google Analytics, and you have customers in Europe, then this guide will help you check compliance.

The rights enshrined by GDPR relate to any data your company holds which is personally identifiable: that is, can be tied back to a customer who contacts you. The simplest form of compliance, and what Google requires in the GA Terms of Use, is that you do not store any personally identifiable information.

Imagine a customer calls your company and, using the right of access, asks what web analytics data you hold on them. If it is impossible for anyone at your company (or at your agencies) to identify that customer in GA, then the other rights, of rectification and of erasure, cannot apply.

Since it is not possible to selectively delete data in GA (without deleting the entire web property history) this is also the only practical way to comply.

The tasks needed to meet this standard depend on what you mean by ‘impossible to identify’!

Basic Compliance

Any customer data sent ‘in the clear’ to GA is a clear breach of Google’s terms, and can result in Google deleting all your analytics for that period.

This would include:

  • User names sent in page URLs
  • Phone numbers captured during form completion events
  • Email addresses used as customer identifiers in custom dimensions

If you’re not sure, our analytics audit tool includes a check for all these types of personally identifiable information.

You need to filter out the names and emails on the affected pages, in the browser; applying a filter within GA itself is not sufficient.
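As a rough illustration of filtering in the browser, here is a minimal sketch that redacts emails and phone numbers from a page path or event label before it is handed to the tracker. The function name redactPii and both patterns are my own assumptions, not part of GA, and the patterns are deliberately crude: they will miss some formats and may over-match long digit strings, so test them against your own URLs.

```typescript
// Hypothetical helper: the name and the patterns are illustrative only.
// Matches plain emails and URL-encoded ones (where "@" appears as "%40").
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+(?:@|%40)[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
// Matches runs of at least 9 digits, allowing spaces, dots, dashes and brackets.
const PHONE_PATTERN = /\+?\d[\d\s().-]{7,}\d/g;

// Replace anything that looks like an email or phone number with a placeholder.
function redactPii(value: string): string {
  return value
    .replace(EMAIL_PATTERN, "[email]")
    .replace(PHONE_PATTERN, "[phone]");
}
```

The redacted string, not the original, is what should then be passed to the tracker.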

But I prefer a belt-and-braces approach to compliance, so you should also look at who has access to the Google Analytics account, and ensure that all those with access are aware of the need not to capture personal data and GDPR more generally.

You should check that your company (not an agency) actually owns the Google Analytics account and, if not, transfer ownership back.

At the web property level, you should check that only a limited number of admins have permission to add and remove users, and that each user only has access to the websites they are directly involved in.

Or you could talk to us about integrations with your internal systems to automatically add and remove users to GA based on roles in the company.

Full Compliance

Other data which could possibly be personally identifiable, and which you may need to discuss, include:

  • IP addresses
  • Postcodes/ZIP codes
  • Long URLs with lots of user-specific attributes

The customer’s IP address is not stored by Google in a database, or accessible to any client company, but it could potentially be accessed by a Google employee. If you’re concerned, there is an option to anonymise the last part of the IP address, which still allows Google to detect the user’s rough location.
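To show what anonymising the last part of an address means in practice, here is a minimal sketch for IPv4 only. The function name maskIpv4 is hypothetical, and this is an illustration of the effect rather than Google’s own code: in analytics.js the feature itself is switched on with the anonymizeIp field, and the masking happens on Google’s side.

```typescript
// Hypothetical helper showing the effect of IP anonymisation on an IPv4 address:
// the final octet is zeroed, so the address only identifies a block of 256 hosts.
function maskIpv4(ip: string): string {
  const parts = ip.split(".");
  if (parts.length !== 4) {
    throw new Error(`Not an IPv4 address: ${ip}`);
  }
  parts[3] = "0";
  return parts.join(".");
}
```

With analytics.js the equivalent setting would be ga('set', 'anonymizeIp', true), placed before the pageview is sent.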

ZIP codes are unlikely to be linked to a user, but in the UK some postcodes could be linked to an individual household, and to a person when combined with the web pages they visited. As with IPs, the best solution is to only send the first few characters (the ‘outcode’) to GA, which still allows segmenting by location.
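A minimal sketch of extracting the outcode, assuming standard UK postcodes where the final three characters (a digit followed by two letters) form the inward code. The function name toOutcode is hypothetical, and the validation pattern is a simplification that does not cover special cases such as GIR 0AA.

```typescript
// Hypothetical helper: returns only the outward part of a UK postcode,
// or an empty string if the input does not look like a full postcode.
function toOutcode(postcode: string): string {
  // Normalise: drop all whitespace and upper-case, e.g. "so23 9ax" -> "SO239AX".
  const compact = postcode.replace(/\s+/g, "").toUpperCase();
  // Simplified shape check: area letters, district, then digit + two letters.
  if (!/^[A-Z]{1,2}\d[A-Z\d]?\d[A-Z]{2}$/.test(compact)) {
    return "";
  }
  // The inward code is always the last three characters; discard it.
  return compact.slice(0, -3);
}
```

Returning an empty string for anything unrecognised errs on the side of sending nothing to GA rather than a possibly identifying value.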

Long URLs are problematic in reporting (since GA does not allow more than 50,000 different URL variants in a report) but also because, as with postcodes, a combination of lots of marginally personal information could identify a person. For example, take the URL

mysite.com/form?gender=female&birthdate=31-12-1980&companyName=Facebook&homeCity=Winchester

This could allow anyone viewing those page paths in GA to identify the person.

The solution is to replace long URLs with a shortened version like

mysite.com/form
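The shortening itself can be sketched in a few lines. This is a minimal version that simply drops the query string and any fragment; the function names are hypothetical, and if you need to keep some harmless parameters (a campaign tag, say) you would want an allow-list instead.

```typescript
// Hypothetical helper: return the part of `value` before the first `marker`.
function stripAfter(value: string, marker: string): string {
  const index = value.indexOf(marker);
  return index === -1 ? value : value.slice(0, index);
}

// Drop the fragment and the whole query string, leaving only the bare page path.
function shortenPagePath(url: string): string {
  return stripAfter(stripAfter(url, "#"), "?");
}
```

With analytics.js, the shortened value could then be supplied as the page field, for example ga('set', 'page', shortenPagePath(location.pathname + location.search)), before the pageview is sent.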

And for bonus points…

All European websites are required to get visitors to opt in to a cookie policy, which covers the use of the GA tracker cookie.

But does your site log whether that cookie policy was accepted, by using a custom event?

Doing so would protect you against a web-savvy user who, in the future, wants to know what information has been stored against the client ID in their Google cookie. I feel this client ID is outside the scope of GDPR, but being able to link the user in GA to their opt-in consent for the cookie will help protect against any future data litigation.
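A minimal sketch of logging that acceptance as a custom event. The event category, action and the policyVersion label are names I have made up for illustration, and the sender is passed in as a function so the sketch does not assume which tracking library is on the page.

```typescript
// Signature of whatever function actually sends the event to your tracker.
type SendEvent = (category: string, action: string, label: string) => void;

// Hypothetical helper: call this once, at the moment the visitor accepts the policy.
// The label records which version of the policy was accepted.
function logCookieConsent(sendEvent: SendEvent, policyVersion: string): void {
  sendEvent("Cookie Policy", "accepted", policyVersion);
}
```

With analytics.js the injected function could be (category, action, label) => ga('send', 'event', category, action, label), so the event is recorded against the same client ID as the rest of the visit.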

The final area of contention is hashing emails. This is the process used to convert a plain email like ‘me@gmail.com’ into a unique string like ‘uDpWb89gxRkWmZLgD’. The theory is that hashing is a one-way process, so I can’t regenerate the original personal email from the hash, rendering it not personal.

The problem is that some common hashing algorithms can be cracked, so actually the original email can be deduced from a seemingly-random string. The result is that under GDPR, such email hashes are considered ‘pseudonymized’ – the resulting data can be more widely shared for analysis, but still needs to be handled with care.

For extra security, you could add a ‘salt’ to the hashing, but this might negate the whole reason why you want to store a user email in the first place – to link together different actions or campaigns from the same user, without actually naming the user.
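To make that trade-off concrete, here is a minimal sketch, assuming Node’s built-in crypto module and SHA-256 (neither is named in this post, and in the browser you would use the Web Crypto API instead). The function name hashEmail and the salt values are hypothetical. With one secret, site-wide salt the same email always produces the same hash, so actions can still be linked; a fresh random salt for every record would break that link.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: salted SHA-256 of a normalised email, as a hex string.
// The salt must be kept secret and must stay the same if hashes need to match.
function hashEmail(email: string, salt: string): string {
  // Normalise first so "Me@Gmail.com " and "me@gmail.com" give the same hash.
  const normalised = email.trim().toLowerCase();
  return createHash("sha256").update(salt + normalised, "utf8").digest("hex");
}
```

The output is still pseudonymised rather than anonymous data: anyone holding the salt can recompute the hash for a known email address.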

There are ways around this that strike a compromise. Contact Littledata for a free initial consultation or a GDPR compliance audit.

6 Responses

  • Waylander 1 month ago

    With respect, there is no such thing as “hacking” common hashing algorithms. Also, hashing is not encryption. They are two separate things.
    Furthermore, hashing emails, also referred to as pseudonymisation in the GDPR, does not result in exemption from the regulation but rather in easing of restrictions.

    • Edward 3 weeks ago

      Thanks for your accurate comments. I’ve updated the blog post:
      – using “cracking” instead of “hacking”, although they are generally used interchangeably
      – using “hashing” not “encryption” (again, a casual mistake on my part)
      – explaining pseudonymization and linking to further details.

  • Marco Cilia 3 weeks ago

    And what about userID enabled views?

    • Edward 2 weeks ago

      As per Waylander’s comments, the user identifier you send to Google would be considered ‘pseudo-anonymous’. In itself, that user behaviour on GA can’t be tied to a real person – but it could be if combined with another data set you hold.

      Maintaining a userID-enabled view is a higher level of risk than sending no user identifier to Google, but only a little bit higher than the cookie ID that Google collects by default – which in the extreme could be considered personally identifiable.

  • myfashionkillz 3 weeks ago

    Our business is located in the United States but our website is hosted in Europe, does GDPR still apply? Our customers are located in the U.S. and Canada however.
