Keep Your Web Data Clean and Safe to Drink — Part II
In my previous article, I compared web data to water. Just as we can’t survive for long without safe drinking water, data-driven businesses can’t survive without good data. In this second part, I will focus on different considerations related to data validation.
If you have a pre-teen daughter, you’ll no doubt have the Hannah Montana song “Nobody’s Perfect” burned into your cerebral cortex. I have more than one tween daughter so just imagine its effect on my brain. Well … no implementation is ever perfect either. In fact, seeking perfection in your implementation can be a dangerous goal. You want your web data quality to be as complete and accurate as possible, but perfection or near-perfection can be costly to achieve.
The higher the level of accuracy that is required, the more time and effort the business needs to invest in calibrating its implementation. The targeted level of accuracy may require a quick cost-benefit analysis. Is your organization willing to invest more hours in calibrating your implementation and internal systems to gain additional benefits (e.g., executive confidence, user adoption, external reporting)? In some cases, small incremental changes in the margin of error can be a big deal. In other cases, chasing them yields diminishing returns and uses up time that could be spent on analysis, testing, or other high-value activities.
In most cases, you’re dealing with both an explainable and an unexplained margin of error. You typically want to reduce any large unexplained margin of error. If your margin of error is explainable, you have two options: close the gap or acknowledge it. For example, a retailer knows that its real-time SiteCatalyst data is consistently 10% higher in terms of revenue than its backend system. Its backend system removes fraudulent orders and product returns from its final revenue numbers. In this case, the retailer can decide to close the gap between SiteCatalyst and its backend system by feeding this post-sale data back into SiteCatalyst. Or the retailer can acknowledge the gap and move forward with optimizing its website and campaigns based on the understanding that its web data doesn’t factor out fraudulent and returned orders.
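To make the retailer example concrete, here is a minimal sketch of how you might split a revenue gap into its explained and unexplained parts. The function name, the dollar figures, and the adjustment categories are all hypothetical and chosen only for illustration; nothing here comes from SiteCatalyst itself.

```python
def split_revenue_gap(analytics_revenue, backend_revenue, known_adjustments):
    """Split the gap between two revenue figures into explained and unexplained parts.

    known_adjustments maps a reason (e.g. fraud, returns) to the revenue the
    backend system removed for that reason. All inputs are hypothetical.
    """
    total_gap = analytics_revenue - backend_revenue
    explained = sum(known_adjustments.values())
    unexplained = total_gap - explained
    return {
        "total_gap": total_gap,
        "explained": explained,
        "unexplained": unexplained,
        # Unexplained gap as a percentage of backend revenue
        "unexplained_pct": unexplained * 100 / backend_revenue,
    }


# Made-up numbers: web analytics reports 10% more revenue than the backend.
report = split_revenue_gap(
    analytics_revenue=110_000.0,
    backend_revenue=100_000.0,
    known_adjustments={"fraudulent_orders": 3_000.0, "product_returns": 6_000.0},
)
print(report)
```

In this made-up case most of the gap is accounted for by fraud and returns, so the part worth investigating is the small remainder rather than the full 10%.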
Jim Novo, Avinash Kaushik, and others have advocated for precision over accuracy. Precision focuses on reproducibility and repeatability, whereas accuracy focuses on obtaining the exact number. As long as your web data consistently falls within an acceptable threshold of accuracy, your business should be able to act on the data’s directional insights with confidence.
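One rough way to picture precision is to check whether the ratio between your web data and a reference system stays stable from day to day, even if it never equals exactly 1.0. The sketch below assumes you already have a list of daily ratios; the function name, the sample ratios, and the 0.02 tolerance are hypothetical.

```python
from statistics import mean


def is_consistent(ratios, tolerance=0.02):
    """Return True if every ratio stays within `tolerance` of the average ratio.

    A stable ratio suggests the data is precise (repeatable), even when it is
    not perfectly accurate. The tolerance is an assumed, adjustable threshold.
    """
    center = mean(ratios)
    return all(abs(r - center) <= tolerance for r in ratios)


# Hypothetical daily ratios of web analytics revenue to backend revenue.
steady = [1.10, 1.09, 1.11, 1.10]   # always about 10% high: precise
erratic = [1.10, 0.95, 1.25, 1.02]  # swings widely: not precise

print(is_consistent(steady))
print(is_consistent(erratic))
```

The steady series is consistently off by about the same amount, which is the kind of data you can still act on directionally.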
When it comes to data validation, you need to focus on two areas. First, were the original business requirements successfully met by the implementation? Hopefully, you have a measurement strategy or business requirement document in place that you can refer to. Your team needs to verify that the desired reports were set up properly and that they’re collecting data. Do you have all of the right buckets in place? Are the buckets capturing anything?
Second, is the data being collected correctly? To an untrained or unfamiliar eye, the collected data in the Pages report might look fine: “we’re collecting a bunch of page data, and it’s nicely formatted. Booyah!” A trained eye will spot the three or four instances of the same home page in the Pages report, which is a cause for concern.
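A quick check for the duplicate home page problem can be sketched as follows, assuming you have exported the page names from your Pages report into a list. The normalization rules (lowercasing, trimming trailing slashes, dropping `/index.html` or `/default.aspx`) and the sample page names are assumptions for illustration; your own site may fragment page names in other ways.

```python
from collections import defaultdict


def find_fragmented_pages(page_names):
    """Group page names that likely refer to the same page.

    Returns only the groups with more than one distinct spelling.
    The normalization rules below are illustrative assumptions.
    """
    groups = defaultdict(set)
    for name in page_names:
        key = name.strip().lower().rstrip("/")
        for suffix in ("/index.html", "/default.aspx"):
            if key.endswith(suffix):
                key = key[: -len(suffix)]
        groups[key or "/"].add(name)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


# Hypothetical page names exported from a Pages report.
pages = ["/", "/index.html", "/Index.html", "/products"]
duplicates = find_fragmented_pages(pages)
print(duplicates)
```

Any group that comes back with several spellings is a candidate for a page naming cleanup.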
Bad interpretation, not bad data
Once you’ve successfully validated your implementation, your job is not done. A few days, weeks, or months after launch, you may run into concerns from different end users that the data seems too low or too high. In many cases, the data is actually sound, but it is just being interpreted incorrectly or simply misunderstood.
For example, if you serialized a “lead completed” event so it was only counted once per visit and used that metric in your new lead conversion rate, the rate would be much lower than one based on a non-serialized “lead completed” event, which fires every time a visitor lands on a particular confirmation page. The serialized approach may be a better indicator of your true conversion rate, but it may also be different from how you were tracking it previously.
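The difference between the two counting methods can be illustrated with a small sketch. It assumes a simplified list of (visit ID, event name) hits; the data structure, function name, and sample hits are hypothetical and do not reflect how SiteCatalyst stores or processes events.

```python
def lead_conversion_rates(hits, total_visits):
    """Compare a lead conversion rate counted per hit with one counted once per visit.

    `hits` is a list of (visit_id, event_name) tuples. This structure is an
    assumption made for illustration only.
    """
    lead_visits = [visit_id for visit_id, event in hits if event == "lead completed"]
    every_hit = len(lead_visits)            # fires on each confirmation page load
    once_per_visit = len(set(lead_visits))  # counted at most once per visit
    return {
        "non_serialized_rate": every_hit / total_visits,
        "serialized_rate": once_per_visit / total_visits,
    }


# Hypothetical hits: visit v1 reloads the confirmation page once.
hits = [
    ("v1", "lead completed"),
    ("v1", "lead completed"),
    ("v2", "lead completed"),
    ("v3", "page view"),
]
rates = lead_conversion_rates(hits, total_visits=4)
print(rates)
```

Both numbers come from the same underlying hits, so neither one is bad data. They simply answer slightly different questions.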
Rather than jumping to the conclusion that the implementation is automatically flawed, seek to understand how the data is collected and what the data really means. Your assumptions about the data may be wrong. If you are introducing changes to the way your data is collected or how KPIs are calculated, communicate and explain those changes to the organization so they understand that the data isn’t bad — it’s just different (and hopefully better).
Good data today … but what about tomorrow?
After going through a thorough data validation effort, what ensures that your data remains accurate (precise) and complete in 3 months? 6 months? 18 months? There are many internal and external factors that can spoil good data over time:
- A partner fails to notify you of a significant change they made to your company’s JS file
- A new developer tags web pages without knowing the tagging standards
- A marketing team doesn’t include unique tracking codes in its email campaigns
- An IT team adds several redirects to your website, which now interfere with your campaign tracking
- Another IT team changes your CMS and your page naming goes awry
- A third-party vendor doesn’t understand how to set Omniture’s conversion variables for your site within its hosted online application
From time to time, you may need to spot check your implementation to ensure it’s as accurate and precise as possible. You may even want to schedule regular six-month “dental check-ups” to ensure your site implementation stays clean. If your senior executives are extra sensitive about certain key data points or several moving pieces were required to achieve specific reports, you may need to monitor those reports on a more frequent basis in order to maintain your company’s trust in the data. You can use SiteCatalyst’s Alerts feature to notify you of significant changes in your KPIs related to these key parts of your site. Use alerts like a check engine light for your implementation.
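The check engine light idea can be sketched as a simple comparison between the latest KPI value and its recent average. This is not the SiteCatalyst Alerts feature or its API, just an illustration of the logic; the function name, the 20% threshold, and the sample numbers are hypothetical.

```python
from statistics import mean


def kpi_alert(history, latest, threshold_pct=20.0):
    """Flag the latest KPI value if it deviates sharply from the recent average.

    Returns (should_alert, change_pct). The threshold is an assumed default
    that you would tune for each KPI.
    """
    baseline = mean(history)
    change_pct = (latest - baseline) / baseline * 100
    return abs(change_pct) >= threshold_pct, change_pct


# Hypothetical daily order counts for the past four days.
recent_orders = [100, 110, 90, 100]

print(kpi_alert(recent_orders, latest=70))   # sharp drop, worth a look
print(kpi_alert(recent_orders, latest=105))  # normal fluctuation
```

A sudden drop like the first case doesn’t tell you what broke, only that someone should look under the hood.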
I recently heard of an e-commerce team that stopped using its analytics reports for several months when it questioned the IT team’s ability to accurately tag its web pages. Rather than settling for no data, the team should have confronted these concerns head on. Clean and safe web data is the goal, so let’s get proper training for the IT team and a data validation process in place. A data-shaken state can’t persist if you want your organization to be data-driven.
Your implementation needs to evolve
The completeness of the data over time is another issue. Most of the companies my consulting team works with are not static and neither are their online properties.
- New websites are being created
- New online strategies are being formulated
- New online marketing campaigns are being launched
- New website features and technologies are being introduced
- New online management teams are being formed
All of these factors can make an existing implementation feel incomplete to a company. Some clients like to blame the tool or implementation when they are not receiving the right data. The implementation is simply guilty of standing still in a fast-paced, constantly changing business environment. A perfectly good implementation can be knocked out of alignment with the business needs of an organization when a website is redesigned, an online strategy shifts, or a new senior executive is introduced. It is like blaming our tailor-made suit for no longer fitting after we’ve gained or lost 30 pounds. We either need a significant alteration to the suit or an entirely new suit. Your implementation needs to evolve with your business. Don’t forget to re-visit your web measurement strategy if any major changes impact your organization or website.
I gotta work it
Again and again
’Til I get it right
Sage advice from a fifteen-year-old. Thank you, Hannah Montana!
In the final post in this series, I will cover the importance of having accountability throughout the organization.