Is Google Analytics compliant with GDPR?

GDPR compliance

From May 2018 the new General Data Protection Regulation (GDPR) will come into force in the European Union, prompting marketers and data engineers to reconsider how they store, transmit and manage data, including data in Google Analytics.

If your company uses Google Analytics, and you have customers in Europe, then this guide will help you check compliance.

The rights enshrined by GDPR relate to any data your company holds which is personally identifiable: that is, can be tied back to a customer who contacts you. The simplest form of compliance, and what Google requires in the GA Terms of Use, is that you do not store any personally identifiable information.

Imagine a customer calls your company and, using the right of access, asks what web analytics data you hold on them. If it is impossible for anyone at your company (or at your agencies) to identify that customer in GA, then the other rights, of rectification and of erasure, cannot apply.

Since it is not possible to selectively delete data in GA (without deleting the entire web property history) this is also the only practical way to comply.

The tasks needed to comply depend on what you mean by ‘impossible to identify’!

Basic Compliance

Any customer data sent ‘in the clear’ to GA is a clear breach of Google’s terms, and can result in Google deleting all your analytics for that period.

This would include:

  • User names sent in page URLs
  • Phone numbers captured during form completion events
  • Email addresses used as customer identifiers in custom dimensions

If you’re not sure, our analytics audit tool includes a check for all these types of personally identifiable information.

You need to filter out the names and emails on the affected pages, in the browser; applying a filter within GA itself is not sufficient.
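As a minimal sketch of that browser-side filtering, the helper below scrubs obvious email addresses and phone numbers from a string before it is handed to the tracker. The name `redactPii` and both patterns are assumptions for illustration; the patterns are deliberately simple, will miss some formats, and may over-match long numeric IDs.

```typescript
// Hypothetical helper: scrub obvious email addresses and phone numbers
// from a string (page path, event label, custom dimension value) before
// it is sent to GA. Handles both "@" and its URL-encoded form "%40".
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+(?:@|%40)[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Loose phone pattern: an optional "+", then at least nine digits that
// may be separated by spaces, dots, dashes or brackets.
const PHONE_PATTERN = /\+?\d[\d\s().-]{7,}\d/g;

function redactPii(value: string): string {
  return value
    .replace(EMAIL_PATTERN, "[email]")
    .replace(PHONE_PATTERN, "[phone]");
}
```

You would call this on every value that can contain user input, for example the page path or a form-event label, before passing it to your tracking call.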

But I prefer a belt-and-braces approach to compliance, so you should also look at who has access to the Google Analytics account, and ensure that all those with access are aware of the need not to capture personal data and GDPR more generally.

You should check that your company (not an agency) actually owns the Google Analytics account and, if not, have it transferred back.

At the web property level, you should check that only a limited number of admins have permission to add and remove users, and that each user only has access to the websites they are directly involved in.

Or you could talk to us about integrations with your internal systems to automatically add and remove users to GA based on roles in the company.


Full Compliance

Other data which could possibly be personally identifiable, and which you may need to discuss, includes:

  • IP addresses
  • Postcodes/ZIP codes
  • Long URLs with lots of user-specific attributes

The customer’s IP address is not stored by Google in a database, or accessible to any client company, but it could potentially be accessed by a Google employee. If you’re concerned there is a plug-in to anonymise the last part of the IP address, which still allows Google to detect the user’s rough location.
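To illustrate what anonymising the last part of an IP address means, here is a small sketch that zeroes the final octet of an IPv4 address. The function `maskLastOctet` is a hypothetical illustration, not GA code. In analytics.js the feature itself is enabled with the `anonymizeIp` field, as noted in the comment.

```typescript
// Illustration only: zero the last octet of an IPv4 address, so that
// 203.0.113.42 and 203.0.113.99 become indistinguishable.
// With analytics.js the equivalent setting is enabled on the tracker with:
//   ga('set', 'anonymizeIp', true);
function maskLastOctet(ipv4: string): string {
  const octets = ipv4.split(".");
  if (octets.length !== 4) {
    throw new Error(`Not an IPv4 address: ${ipv4}`);
  }
  octets[3] = "0";
  return octets.join(".");
}
```

Because the first three octets survive, rough geographic lookups still work, which matches the trade-off described above.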

ZIP codes are unlikely to be linked to a user, but in the UK some postcodes could be linked to an individual household, and to a person in combination with the web pages they visited. As with IPs, the best solution is to send only the first few characters (the ‘outcode’) to GA, which still allows segmenting by location.
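A minimal sketch of that truncation, assuming UK postcodes only. It relies on the fact that the inward part of a UK postcode is always the final three characters, so everything before it is the outcode. The name `toOutcode` is hypothetical, and the helper only checks length rather than fully validating the postcode.

```typescript
// Hypothetical helper: keep only the outcode of a UK postcode.
// Spaces are removed first, so "SO23 9EH" and "so239eh" are treated alike.
function toOutcode(postcode: string): string {
  const compact = postcode.replace(/\s+/g, "").toUpperCase();
  // UK postcodes are five to seven characters long without the space.
  if (compact.length < 5 || compact.length > 7) {
    throw new Error(`Unexpected postcode length: ${postcode}`);
  }
  // Drop the three-character inward code.
  return compact.slice(0, -3);
}
```

The returned outcode is what you would send to GA, for example as a custom dimension, in place of the full postcode.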

Long URLs are problematic in reporting (since GA does not allow more than 50,000 different URL variants in a report) but also because, as with postcodes, a combination of lots of marginally personal information could lead to a person. For example, if the URL was

mysite.com/form?gender=female&birthdate=31-12-1980&companyName=Facebook&homeCity=Winchester

This could allow anyone viewing those page paths in GA to identify the person.

The solution is to replace long URLs with a shortened version like

mysite.com/form
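One way to do that shortening is to drop the query string in the browser before the pageview is sent. The sketch below uses a hypothetical helper, `stripQuery`. It assumes none of the query parameters are needed for reporting, so if you rely on campaign parameters you would need to keep those.

```typescript
// Hypothetical helper: drop the query string and fragment from a URL or
// page path so only the page itself is reported.
// With analytics.js the cleaned path could then be applied with:
//   ga('set', 'page', cleanedPath);
function stripQuery(url: string): string {
  const cut = url.search(/[?#]/);
  return cut === -1 ? url : url.slice(0, cut);
}
```

Applied to the example above, this returns just mysite.com/form.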

And for bonus points…

All European websites are required to get visitors to opt in to a cookie policy, which covers the use of the GA tracker cookie.

But does your site log whether that cookie policy was accepted, by using a custom event?

Doing so would protect you from a web-savvy user in the future who wanted to know what information has been stored against the client ID used in their Google cookie. I feel this client ID is outside the scope of GDPR, but being able to link the user in GA to their opt-in consent for the cookie will help protect against any future data litigation.
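Logging the opt-in could look something like the sketch below. The tracker function is passed in as a parameter so the example does not depend on a global `ga`. The helper name `logCookieConsent`, the category, action and label values, and the idea of recording a policy version are all assumptions for illustration.

```typescript
// Shape of an analytics.js-style event call: ga('send', 'event', ...).
type SendEvent = (
  command: "send",
  hitType: "event",
  category: string,
  action: string,
  label: string
) => void;

// Hypothetical helper: record that the visitor accepted the cookie policy.
// Call it only after the visitor has actually opted in.
function logCookieConsent(send: SendEvent, policyVersion: string): void {
  send("send", "event", "Cookie policy", "accepted", policyVersion);
}
```

On a page where analytics.js is loaded you would pass the real `ga` function as the first argument, so the event is stored against the same client ID as the rest of the visitor's activity.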

The final area of contention is hashing emails. This is the process used to convert a plain email like ‘me@gmail.com’ into a unique string like ‘uDpWb89gxRkWmZLgD’. The theory is that hashing is a one-way process, so I can’t regenerate the original personal email from the hash, rendering it not personal.

The problem is that hashes from some common algorithms can be cracked, for example by hashing lists of known or likely email addresses and comparing the results, so the original email can be deduced from a seemingly random string. The result is that under GDPR such email hashes are considered ‘pseudonymized’: the resulting data can be more widely shared for analysis, but still needs to be handled with care.

For extra security, you could add a ‘salt’ to the hashing, but this might negate the whole reason why you want to store a user email in the first place – to link together different actions or campaigns from the same user, without actually naming the user.
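To make the trade-off concrete, here is a minimal sketch of salted hashing using Node's built-in `crypto` module and SHA-256. The helper name `pseudonymiseEmail` and the choice of a single site-wide salt are assumptions. A salt that changes per user or per session would give different hashes for the same email and so break the linking described above, while one fixed, secret salt keeps the hashes linkable. Either way the output is still pseudonymised, not anonymous.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: derive a stable pseudonymous ID from an email.
// The email is trimmed and lower-cased so trivial variations of the same
// address produce the same ID.
function pseudonymiseEmail(email: string, salt: string): string {
  const normalised = email.trim().toLowerCase();
  // The same salt must be reused everywhere the ID is generated,
  // and kept out of the browser and out of GA.
  return createHash("sha256").update(salt + normalised).digest("hex");
}
```

This version runs server-side in Node. In the browser you would need a different hashing API, and you would also have to avoid exposing the salt to visitors.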

There are approaches that strike a compromise. Contact Littledata for a free initial consultation or a GDPR compliance audit.

Comments 11
  1. With respect, there is no such thing as “hacking” common hashing algorithms. Also, hashing is not encryption. They are two separate things.
    Furthermore, hashing emails, also referred to as pseudonymisation in the GDPR, does not result in exemption from the regulation but rather in easing of restrictions.

    1. Thanks for your accurate comments. I’ve updated the blog post:
      – using “cracking” instead of “hacking”, although they are generally used interchangeably
      – using “hashing” not “encryption” (again, a casual mistake on my part)
      – explaining pseudonymization and linking to further details.

    1. As per Waylander’s comments, the user identifier you send to Google would be considered ‘pseudo-anonymous’. In itself, that user behaviour on GA can’t be tied to a real person – but it could be if combined with another data set you hold.

      Maintaining a userID-enabled view is a higher level of risk than sending no user identifier to Google, but only a little bit higher than the cookie ID that Google collects by default – which in the extreme could be considered personally identifiable.

  2. Our business is located in the United States but our website is hosted in Europe, does GDPR still apply? Our customers are located in the U.S. and Canada however.

  3. Hi Edward,

    Interesting post thanks, I am keen to get your thoughts.

    GDPR Recital 30 specifically mentions device ids in scope for profiling.

    Recital 30 – Natural persons may be associated with online identifiers provided by their devices, applications, tools and protocols, such as internet protocol addresses, cookie identifiers or other identifiers such as radio frequency identification tags. This may leave traces which, in particular when combined with unique identifiers and other information received by the servers, may be used to create profiles of the natural persons and identify them.

    In my opinion, the terms “combined” with “unique identifiers” and “other information” to “create profiles” and “identify” are significant, and imply that it is only in combination with identifiable information that these snippets or traces of information become relevant.

    We pass a device id currently into our GA. Whilst device id is specifically mentioned in Recital 30, we hold no other information and cannot identify any living person based on it.

    Recital 26 states that … The principles of data protection should not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable.

    On the basis of the lawful processing reason being based on a valid legitimate interest, and a device ID that cannot be matched to any living individual (ie there is no other information to combine it with) would you see that this person has a right to object, right to erasure etc?

    Thanks,

    Claire

    1. Great question Claire. Yes, I think an “online identifier” in the context of recital 30 only becomes PII when it is linked with other personal information.

      GA already takes care to limit the device or client ID in reporting: the only place you can see it is in the user explorer report, and that can’t be combined with a secondary dimension.

      So I can’t see how there is a right to erasure if you cannot identify which customer record should be erased.

  4. For ecommerce sites, GA will record the transaction ID. This will also be stored in a company’s CRM/sales database along with information about the user.

    Does this mean that we will have to stop recording the actual transaction ID and use a default value instead?

    1. Hi Stu – see my answer to Claire above. Your CRM is most certainly within scope of GDPR, and you should be limiting who has access to those transaction IDs, as well as giving users the right of erasure to their CRM record.

      But when you erase the transaction ID from the CRM, then it can no longer be linked back to an individual, and so there’s no reason to also erase it in GA. i.e. Your CRM is the lookup table, and without it the GA record is anonymous.

  5. Where exactly in the GA Terms of Use can I find that “you do not store any personally identifiable information”?

Comments are closed.
