Keep Your Web Data Clean and Safe to Drink — Part I
Continuing my series on the “Seven Keys to Creating a Data-Driven Organization”, I’d like to now focus on the actual web data, which underlies everything. In my last two-part article, I talked about the importance of delivering quick wins to your organization. However, without adequate and reliable data, quick wins are not even possible. It’s important that analysts, management, and everybody else in the company can trust the data.
When I think of data, I often compare it to water. In the real world, water is a necessity of life. Data is a necessity of business optimization. Just like drinking water, you want your web data to be clean and safe for consumption. Companies typically collect large volumes of data. Remarkably, only 0.007% of all the water on Earth is accessible for direct use by people. I think of custom reporting as the consumable subset of all your web data. Most people in your company don’t want to drink an ocean of data (they would drown trying), but they would find a pitcher of ice-cold KPIs very refreshing on a regular basis.
Now what if that pitcher of KPIs were dirty or empty from time to time? People would either stop drinking entirely or look for alternative sources of data. That’s what you want to avoid. Your data-driven initiatives can quickly unravel if your company suspects the web data is bad or incomplete.
Why is data validation important?
Just as unsafe drinking water causes serious issues in developing nations, bad or unreliable data can cause a number of organizational problems. I’ve identified four problems that can be avoided with proper data validation. Each of these problems can push your company off the path to becoming a data-driven organization.
- Avoid costly mistakes: You want to prevent incorrect data from misleading your company’s decision making. Whether it’s a false positive or a false negative, bad data can lead to erroneous conclusions and costly missteps by a business. Analysts can misinterpret even good data, but you’re already behind the eight ball if you’re unknowingly analyzing bad data.
- Avoid missing insights: If a company fails to collect all of the necessary data, it can miss valuable opportunities to improve the business. From a user adoption perspective, the web analytics team can also pass up good opportunities to answer important business questions coming from senior management and the rest of the company. Rather than being the hero, you end up being the zero.
- Avoid fire drills: If the data is shaky or unreliable, web analytics professionals can waste a significant portion of their time responding to fire drills. Rather than spending their time on high-value activities such as user education, analysis, or testing, they are forced to respond to random data discrepancies.
- Avoid flying blind: Internal doubts about data accuracy can quickly erode confidence in a web analytics program. When people within the organization decide it is safer to use no data than perceived bad data, you have a serious problem on your hands. When you’re trying to start a data-driven revolution at your company, the last thing you need is a “data-shaken” counter-revolution to undo all your progress.
Ensure data validation is baked into your process
When you learn that a new site is going live on a compressed timeline, the tendency is to rush the site tagging, and especially the QA process of the implementation. Sometimes the QA process is skipped entirely. Ironically, the same people who exerted pressure to get the site live “at all costs” are also frequently the first ones to ask about the site metrics after launch.
Surprise, surprise - garbage in, garbage out.
The goal shouldn’t just be about launching a new site quickly but also about gaining useful insights along the way, which will help to optimize different parts of the business. Was this micro-site worth launching in the first place, or should we scrap the next three we have lined up? There appears to be an issue with the registration page, so how can the micro-site be optimized? Deploy with good data and you can answer these questions. Deploy without data, or with dirty or incomplete data, and you will languish in ignorance.
Quality assurance or data validation should be baked into your web development and tagging process. It is less expensive and less time-consuming to fix an error before deployment than after the fact, when IT resources have rolled onto other projects. You need to ensure there’s ample time factored in for data validation because bad or useless data is the same as no data. Make sure that you lean on your executive sponsor for support when internal teams are pressuring for exceptions to the tagging standards or QA process.
In the next part of this article, I’ll examine important considerations for your data validation efforts.