They say data is the new oil.
Like oil, data is a dark resource, hidden beneath the surface, that brands can tap into and get rich from.
But if THAT’S true then many of the DTC brands I come across have the equivalent of a rusty hand pump on their data well. These Shopify brands have vast data resources untapped beneath their few acres. And they might need a bit of data fracking to get the good stuff out of the ground.
I’m talking about first party data
First-party data is all the information a brand captures about its customers and potential customers via direct interactions with those customers. It’s not purchased from third parties or gathered in dubious ways.
Customers grant brands the right to collect first-party data when they accept the brand’s terms, and this customer event data belongs to the brand. It holds rich insights if it can be combined into a consistent customer journey.
The challenge for brands trading on Shopify is that they have outsourced most of their technology to Shopify – including the collection of event data on their customers – and sometimes forget the importance of collecting and controlling the data for themselves.
What first party event data could a Shopify store capture?
Every online store has its own unique products and flows, but some things are the same across brands. One of those key elements is behavioral event data. As Amplitude explains, behavioral analytics is the process of collecting and analyzing data from actions performed by users of a digital product, such as an app or website.
The behavioral event data I’d recommend collecting on any ecommerce store includes:
- What online campaigns users interacted with
- What onsite promotions or widgets users saw
- What products users browse and click on
- What users searched for and found (or didn’t find)
- What users added to cart and checkout
- What users bought (including all the product metadata and pricing)
- Whether users returned, cancelled or had the order fulfilled
Event data is much more useful in generating insights than customer state data. Shopify provides open APIs from which a store (or an app on behalf of the store) can export customer state: a list of the customers or the orders they made. But this does not help us understand what brought the customer to buy, or how to encourage them to buy more.
For that analysis we need customer event data – all the customer behavior over time.
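To make the distinction concrete, here is a minimal Python sketch. The `CustomerEvent` shape and the event names are assumptions for illustration, not a Shopify schema. It shows that a state snapshot (orders per customer) can be derived from the event stream, while the snapshot alone cannot tell you what happened before the purchase.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class CustomerEvent:
    """One thing a user did at one point in time (hypothetical shape)."""
    user_id: str            # customer ID or anonymous visitor ID
    name: str               # e.g. "campaign_clicked", "order_completed"
    timestamp: datetime
    properties: dict[str, Any] = field(default_factory=dict)


def order_counts(events: list[CustomerEvent]) -> dict[str, int]:
    """Collapse the event stream into customer state: orders per user."""
    counts: dict[str, int] = {}
    for event in events:
        if event.name == "order_completed":
            counts[event.user_id] = counts.get(event.user_id, 0) + 1
    return counts


def at(hour: int) -> datetime:
    """Helper for the example data: a fixed day, varying hour, in UTC."""
    return datetime(2023, 5, 1, hour, 0, tzinfo=timezone.utc)


events = [
    CustomerEvent("u1", "campaign_clicked", at(9), {"campaign": "spring_sale"}),
    CustomerEvent("u1", "product_viewed", at(10), {"sku": "TEE-01"}),
    CustomerEvent("u1", "order_completed", at(11), {"total": 35.0}),
]

# The state view keeps only the outcome; the campaign click and the
# product view that led to the order are visible only in the events.
print(order_counts(events))
```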
The limitations of Shopify Analytics
Shopify captures many customer events to display within the Shopify Analytics reports, including some of the first party data points above. Shopify Analytics is a simple reporting tool for newer brands to get started with, but isn’t ideal for larger brands or those looking to really scale with ecommerce analytics.
Of course, everyone should pay attention to their Shopify dashboard, no matter how big or small they are. But you should also understand the limitations of Shopify Analytics so you can make the most of your first-party data on Shopify.
Firstly, there is no way to get the Shopify Analytics data out. I’ve heard rumours of an Analytics API in the works, but it certainly won’t include user-level data for deeper analysis. Without that full export a brand can’t really claim to be owning the data.
And the Shopify Analytics reporting is still very basic – you can’t build custom reports, or slice and dice the data like you can in Google Analytics, which has become even more powerful in GA4, its newest version. Yes, you can apply report filters on Shopify Plus, but this is far off the segmentation abilities a serious data analyst would need.
The next problem is that the marketing engagement and marketing attribution capture is very simplistic. Shopify tracks pageviews and the UTM parameters that bring users to the landing page, and recently launched multiple attribution models – but there are limited insights into what users click on, browse and search for when they get to your site.
Finally, there’s no data capture on what happens in storefront apps connected to your store – which are an important part of the customer experience in many cases.
As an aside, Shopify did try to solve this last problem by providing a Marketing Events API for app partners to push data back into Shopify – but it never got traction because the other limitations above meant data-sophisticated Shopify Plus brands already collect data elsewhere.
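The UTM capture mentioned above amounts to reading a handful of query parameters off the landing page URL. A minimal sketch using only the Python standard library (the example URL is hypothetical) shows how little of the journey that covers: it records how the visitor arrived, and nothing about what they did next.

```python
from urllib.parse import parse_qs, urlparse

# The five conventional UTM campaign parameters.
UTM_KEYS = ("utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content")


def extract_utm(landing_url: str) -> dict[str, str]:
    """Return the UTM parameters present on a landing page URL."""
    query = parse_qs(urlparse(landing_url).query)
    # parse_qs returns a list per key; keep the first value of each.
    return {key: query[key][0] for key in UTM_KEYS if key in query}


url = "https://example.com/products/tee?utm_source=facebook&utm_medium=paid&variant=42"
print(extract_utm(url))
```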
Avoiding data silos
The classic problem with company data usage is that the company gathers data points but they exist in silos in different tools and can’t be easily joined together.
On an ecommerce store that typically means that (going back to my list of common data points) the data is split between:
- What online campaigns users interacted with – captured in Facebook Ads or Klaviyo
- What onsite promotions users saw – captured in promotion app
- What products users browse – not tracked
- What users searched for – captured in search app
- What users added to cart – partially tracked in Shopify
- What users bought – tracked in Shopify
- Whether users returned, cancelled or had the order fulfilled – tracked in Shopify
To truly get value from the customer data we need a way of bringing as much of this as possible into one data store (not necessarily a data warehouse) and attributing the events back to a customer or anonymous user.
Without breaking down the data silos you can’t answer basic questions such as which campaigns lead to sales, or how on-site engagement affects customer retention.
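As a rough illustration of what joining the silos means, the sketch below merges event lists from separate tools into one time-ordered journey per user, then counts purchases that followed a campaign click. The field names (`user_id`, `timestamp`, `name`, `campaign`) are assumptions, and the last-click rule is just one simple attribution choice among many.

```python
from collections import defaultdict


def build_journeys(*sources):
    """Merge event lists from separate tools into one ordered journey per user."""
    journeys = defaultdict(list)
    for source in sources:
        for event in source:
            journeys[event["user_id"]].append(event)
    for events in journeys.values():
        events.sort(key=lambda e: e["timestamp"])
    return dict(journeys)


def last_click_sales(journeys):
    """Credit each purchase to the most recent campaign click before it."""
    sales = defaultdict(int)
    for events in journeys.values():
        last_campaign = None
        for event in events:
            if event["name"] == "campaign_click":
                last_campaign = event["campaign"]
            elif event["name"] == "purchase" and last_campaign is not None:
                sales[last_campaign] += 1
    return dict(sales)


# Hypothetical exports from three silos; timestamps are Unix seconds.
ad_events = [{"user_id": "u1", "timestamp": 100, "name": "campaign_click", "campaign": "spring_sale"}]
search_events = [{"user_id": "u1", "timestamp": 150, "name": "search", "query": "linen shirt"}]
shop_events = [
    {"user_id": "u1", "timestamp": 200, "name": "purchase"},
    {"user_id": "u2", "timestamp": 120, "name": "purchase"},
]

journeys = build_journeys(ad_events, search_events, shop_events)
print(last_click_sales(journeys))
```

The hard part in practice is the shared `user_id`: each tool identifies visitors differently, so some identity mapping has to happen before a merge like this is possible.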
What about zero party data?
Zero party data is a phrase coined for data that apps collect by asking questions directly of the user, rather than inferring the answers from their behaviour. A good example is a post-purchase attribution survey – “How did you hear about my brand?” – of the kind Fairing offers to many Shopify Plus brands.
I see this zero-party data as part of the first party data set a brand owns: it should be tracked alongside other customer journey data points, and ideally sent to the same destination. We’ve partnered with Source Medium to make this zero party data accessible alongside the first party data tracked by Littledata.
Google Analytics + BigQuery as a data store
I’ve advocated for brands to use Google Analytics as an independent data store for many years. GA is free, powerful and comes with a standard ecommerce data schema to simplify reporting.
Doesn’t that just lock you into Google’s walled data garden rather than Shopify’s?
No. By using the free connector to Google BigQuery you can stream user-level data into a permanent data store and export it to any other data warehouse or data science platform when a need arises.
I see BigQuery as a data insurance policy for brands. Even if you don’t have the analyst resources to write SQL and build data models now, the cost to store all of your customer events is so trivial – less than 0.01% of your brand’s revenue – as to be not worth questioning.
You need to start streaming data into BigQuery now, because the Google Analytics APIs offer no way to export historical user-level data after the fact.
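For a sense of what that user-level data looks like once it lands in BigQuery, here is a small Python helper that builds a query over the GA4 export's daily `events_YYYYMMDD` tables. This is a sketch under assumptions: the project and dataset names are placeholders, the column names follow the standard GA4 export schema, and the helper only builds the SQL text; running it needs a BigQuery client and credentials, which are left out.

```python
def per_user_purchase_query(project: str, dataset: str, start: str, end: str) -> str:
    """Build SQL that totals purchases per user over a date range.

    start and end are YYYYMMDD strings matched against the table suffix.
    The values are interpolated directly, so pass only trusted input.
    """
    return f"""
SELECT
  user_pseudo_id,
  COUNT(*) AS purchases,
  SUM(ecommerce.purchase_revenue) AS revenue
FROM `{project}.{dataset}.events_*`
WHERE _TABLE_SUFFIX BETWEEN '{start}' AND '{end}'
  AND event_name = 'purchase'
GROUP BY user_pseudo_id
ORDER BY revenue DESC
""".strip()


# Placeholder names; substitute your own project and GA4 export dataset.
sql = per_user_purchase_query("my-project", "analytics_123456", "20230101", "20230131")
print(sql)
```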
From Segment to a data warehouse
Some of the brands we work with that capture customer data outside of their Shopify store choose to use Segment to stream the data into another data warehouse such as Snowflake or AWS Redshift.
This may be a faster route if your company already uses one of these data warehouses – although it is possible to export BigQuery data via Google Cloud storage into another warehouse.
Segment also has the advantage of allowing you to store personally identifiable information about all the events where customers are logged in (e.g. the email address connected to the order). Such information is prohibited by Google Analytics.
Segment provides connectors for many other marketing, CRM and reporting tools to ensure you are getting the same customer data across your business, such as Amplitude, Mixpanel, Klaviyo and Braze.
Future proofing for AI
Most of the advances in AI so far in 2023 have been for ChatGPT and other models trained on unstructured data. Yet I believe it’s inevitable that we see similar leaps in sophistication for structured data analysis over the next year.
Structured data – of the kind you would capture in GA + BigQuery – is easy for machine learning to work with, once it is turned into graph data. Graph data represents the data points as relationships: a user who clicked on an ad then views a page and goes on to purchase.
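A minimal sketch of that idea, with hypothetical event names: each user's ordered events become edges between consecutive steps, and the edges are counted across users to form a weighted graph of the paths customers take.

```python
from collections import Counter


def journey_to_edges(steps):
    """Pair each step with the one that follows it: (from, to) edges."""
    return list(zip(steps, steps[1:]))


def build_graph(journeys):
    """Count how often each transition occurs across all user journeys."""
    graph = Counter()
    for steps in journeys:
        graph.update(journey_to_edges(steps))
    return graph


journeys = [
    ["ad_click", "page_view", "purchase"],
    ["ad_click", "page_view"],
]

# Each key is an edge; each value is how many users followed it.
for (source, target), weight in build_graph(journeys).items():
    print(f"{source} -> {target}: {weight}")
```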
As the AI assistants improve I’d expect you to be able to ask questions like:
- What is the most profitable way for me to acquire customers?
- If I increased my product line in this category would I sell more?
- Should we increase advertising in Germany?
And based on your own first party data the AI assistant can give you a definitive answer.
But what if your competitors can use this magic, because they bothered to get their customer data into a comprehensive graph, but your brand cannot? Can you afford to take that risk?
Sharing first party data with your marketing platforms
The other reason you need to own your customer data is so that you can share it with the marketing channels you use to acquire new customers.
Shopify launched Shopify Audiences in the US to enable brands to draw on pooled customer data across Shopify to build advertising audiences, but that creates even more lock in to Shopify.
As customer acquisition becomes ever more competitive it’s imperative that you can use all of the signals you gather from customer behaviour, either to build lookalike audiences for similar buyers or to exclude customers who have already purchased.
Having a backup system built on your own data store future-proofs your marketing needs.
Let’s imagine your company switches to a new email marketing tool next year. Could you stream all of the historical customer events into there to provide instant targeting for the emails? Or would you have to wait months for data to collect via their Shopify app?
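As one example of putting owned event data to work, the exclusion audience described above can be built directly from stored purchase events. The sketch below assumes events are dictionaries with `name` and `email` fields (a hypothetical shape) and hashes each email with SHA-256 after trimming and lowercasing, a common convention for customer list uploads. Each platform documents its own normalization rules, so treat those steps as an assumption to check.

```python
import hashlib


def hash_email(email: str) -> str:
    """Normalize an email (trim, lowercase) and return its SHA-256 hex digest."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def exclusion_audience(events):
    """Hashed emails of everyone with at least one purchase, deduplicated."""
    hashed = {
        hash_email(event["email"])
        for event in events
        if event["name"] == "purchase" and event.get("email")
    }
    return sorted(hashed)


stored_events = [
    {"name": "purchase", "email": "Ana@Example.com "},
    {"name": "purchase", "email": "ana@example.com"},
    {"name": "product_viewed", "email": "ben@example.com"},
]

# Two purchases by the same person collapse into one audience member.
print(len(exclusion_audience(stored_events)))
```

Because the list is rebuilt from your own event store, the same function works for whichever marketing tool you adopt next.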
There are many valuable uses of accurate first party data – from reporting to audience building to predictions – but the foundation of all of those is to gather the event data in a way you can control and tap into.
Shopify is an excellent ecommerce platform to scale an online store on, but it doesn’t provide the level of access to first party event data – or even capture all of your customers’ interactions – to be a data platform.
I also think data strategy is one area you can’t outsource to any tech partner. Carefully consider what your brand’s future data needs might be, and how you can prepare for that now.
Unlike oil, data won’t just sit in the ground until you are ready to tap it. You need to store it in the right format now so you – or a superpowered AI assistant – can access it later.