From May 2018 the new General Data Protection Regulation (GDPR) will come into force in the European Union, causing all marketers and data engineers to reconsider how they store, transmit and manage data, including data held in Google Analytics.
If your company uses Google Analytics, and you have customers in Europe, then this guide will help you check compliance.
Imagine a customer calls your company and, using the right of access, asks what web analytics you hold on them. If it is impossible for anyone at your company (or at your agencies) to identify that customer in GA, then the other rights, of rectification and of erasure, cannot apply.
Since it is not possible to selectively delete data in GA (without deleting the entire web property history), this is also the only practical way to comply.
The tasks needed to meet this standard depend on what you mean by ‘impossible to identify’!
Any customer data sent ‘in the clear’ to GA is a clear breach of Google’s terms, and can result in Google deleting all your analytics for that period.
This would include:
- User names sent in page URLs
- Phone numbers captured during form completion events
- Email addresses used as customer identifiers in custom dimensions
If you’re not sure, our analytics audit tool includes a check for all these types of personally identifiable information.
You need to filter out the names and emails on the affected pages, in the browser; applying a filter within GA itself is not sufficient.
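As a rough illustration of that filtering step, the sketch below scrubs anything that looks like an email address or phone number from a page path before it is sent. It is written in Python only to show the logic; on a real site the same idea would run in the browser or tag manager. The function name `redact_pii` and both patterns are assumptions made for this example, and the patterns are deliberately simple, so they will miss some formats and over-match others (long order numbers or dates, for instance).

```python
import re

# Hypothetical, deliberately simple patterns for illustration only.
# "%40" is the URL-encoded form of "@".
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+(?:@|%40)[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Nine or more characters that start and end with a digit and contain only
# digits and common phone punctuation. This also catches other long numbers.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_pii(page_path: str) -> str:
    """Replace email-like and phone-like substrings with fixed placeholders."""
    redacted = EMAIL_RE.sub("[email]", page_path)
    redacted = PHONE_RE.sub("[phone]", redacted)
    return redacted


if __name__ == "__main__":
    print(redact_pii("/welcome?email=jane.doe@example.com&ref=nav"))
    print(redact_pii("/callback?tel=+442079460958"))
```

Names are harder to catch with a pattern, so for user names in URLs it is usually safer to change the URL structure itself than to rely on matching.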
But I prefer a belt-and-braces approach to compliance, so you should also look at who has access to the Google Analytics account, and ensure that everyone with access is aware of GDPR in general and of the need not to capture personal data in particular.
You should check that your company (not an agency) actually owns the Google Analytics account and, if not, transfer it back.
At the web property level, you should check that only a limited number of admins have permission to add and remove users, and that all users only have access to the websites they are directly involved in.
Or you could talk to us about integrations with your internal systems to automatically add and remove GA users based on their roles in the company.
Other areas that could be personally identifiable, and that you may need to discuss, are:
- IP addresses
- Postcodes/ZIP codes
- Long URLs with lots of user-specific attributes
The customer’s IP address is not stored by Google in a database, or accessible to any client company, but it could potentially be accessed by a Google employee. If you’re concerned, there is a plug-in to anonymise the last part of the IP address, which still allows Google to detect the user’s rough location.
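To make ‘anonymising the last part’ concrete, here is a minimal sketch of the idea using Python’s standard `ipaddress` module. It is not the plug-in’s own code: the function name `anonymise_ip` and the choice of how many leading bits to keep (24 for IPv4, 48 for IPv6) are assumptions for this illustration.

```python
import ipaddress


def anonymise_ip(ip: str) -> str:
    """Zero the trailing bits of an IP address, keeping only the network part."""
    addr = ipaddress.ip_address(ip)
    # Assumed prefix lengths for illustration: keep the first three octets
    # of an IPv4 address, or the first 48 bits of an IPv6 address.
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    print(anonymise_ip("203.0.113.195"))
    print(anonymise_ip("2001:db8:85a3:8d3:1319:8a2e:370:7348"))
```

The truncated address is shared by many devices on the same network, which is why a rough location can still be derived from it.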
ZIP codes are unlikely to be linked to a user, but in the UK some postcodes could be linked to an individual household, and to a person when combined with the web pages they visited. As with IPs, the best solution is to send only the first part of the postcode (the ‘outcode’) to GA, which still allows segmenting by location.
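A minimal sketch of extracting the outcode, assuming the input is a full UK postcode. It relies on the fact that the second half of a UK postcode (the ‘incode’) is always three characters, so everything before it is the outcode. The function name `uk_outcode` is hypothetical, and the length check is only a basic sanity check, not full postcode validation.

```python
def uk_outcode(postcode: str) -> str:
    """Return the outcode (district part) of a full UK postcode."""
    # Remove all whitespace and normalise case, e.g. "sw1a 1aa" -> "SW1A1AA".
    compact = "".join(postcode.split()).upper()
    # Full UK postcodes are 5 to 7 characters long without the space.
    if not 5 <= len(compact) <= 7:
        raise ValueError(f"not a full UK postcode: {postcode!r}")
    # The incode is always the last three characters; drop it.
    return compact[:-3]


if __name__ == "__main__":
    print(uk_outcode("SW1A 1AA"))
    print(uk_outcode("m1 1ae"))
```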
Long URLs are problematic in reporting (since GA does not allow more than 50,000 different URL variants in a report) but also because, as with postcodes, a combination of lots of marginally personal information could lead to a person. For example, if the URL was
This could allow anyone viewing those page paths in GA to identify the person.
The solution is to replace long URLs with a shortened version like
And for bonus points…
Doing so would protect you from a web-savvy user in the future who wanted to know what information has been stored against the client ID used in their Google cookie. I feel this client ID is outside the scope of GDPR, but being able to link every user in GA to an opt-in consent for the cookie will help protect against any future data litigation.
The final area of contention is hashing emails. This is the process used to convert a plain email like ‘firstname.lastname@example.org’ into a unique string like ‘uDpWb89gxRkWmZLgD’. The theory is that hashing is a one-way process, so I can’t regenerate the original personal email from the hash, rendering it not personal.
The problem is that hashes made with some common algorithms can be cracked, so the original email can in fact be deduced from a seemingly random string. The result is that under GDPR, such email hashes are considered ‘pseudonymised’: the resulting data can be more widely shared for analysis, but still needs to be handled with care.
For extra security, you could add a ‘salt’ to the hashing, but this might negate the whole reason why you want to store a user email in the first place – to link together different actions or campaigns from the same user, without actually naming the user.
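One way to picture the trade-off is the sketch below, which compares a plain hash with a keyed hash using Python’s standard `hashlib` and `hmac` modules. The function names, the lower-casing of the email, and the use of a single secret key held outside GA are all assumptions for this example, not a recommendation from this guide. With one fixed key the same email always produces the same token, so actions can still be linked; with a different salt per record they could not be.

```python
import hashlib
import hmac


def normalise_email(email: str) -> str:
    """Trim and lower-case so the same address always hashes the same way."""
    return email.strip().lower()


def plain_hash(email: str) -> str:
    """Unsalted SHA-256: anyone can hash a list of guessed emails and compare."""
    return hashlib.sha256(normalise_email(email).encode("utf-8")).hexdigest()


def keyed_hash(email: str, secret_key: bytes) -> str:
    """HMAC-SHA-256 with a secret key: guessing emails is not enough without the key."""
    message = normalise_email(email).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


if __name__ == "__main__":
    key = b"hypothetical-secret-kept-outside-GA"  # placeholder, not a real key
    print(plain_hash("firstname.lastname@example.org"))
    print(keyed_hash("firstname.lastname@example.org", key))
```

Either way the output is still tied to one person, so it would remain pseudonymised rather than anonymous data, and the key itself would need protecting.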
There are ways around this that strike a compromise. Contact Littledata for a free initial consultation or a GDPR compliance audit.