Forecasting is not a new concept. All businesses have some level of forecasting ranging from inventory and revenue planning to staffing models. Those same forecasting methodologies are also applicable to web analytics. In this two part blog series, we will walk through how to apply forecasting in Digital Marketing using web analytics data.
This first part will review a scenario that uses a standard time-series forecasting model to recover missing data.
Forecasting Application: Fill in the blanks
Have you ever generated a report only to find that some of the data is missing? Instantly this sparks a series of questions from “Did I correctly pull the report?” to “Who took the code off the page?” During a previous engagement, I was reviewing a forecasting approach to help identify future trends and anomalies (see part 2 of this series coming soon) when the client stopped me and asked “We experienced a loss in data a few weeks ago, could you have predicted that?”
Since there was an element of human behavior involved in the data loss (someone did not tag the page right), it is difficult to predict an expected outcome because we do not have all of the data points relative to human behavior in web analytics. However, we can use a forecasting model to estimate the data that was lost, thus filling in the blanks in the report. Here is how we did it:
- First we identified the data we had – about 1 year prior to the data loss, and about 3 weeks past the data loss.
- Then we used a few time-series models in R (an open source statistical program) and took the 1 year of prior data to forecast about 3 months into the future. In this particular case, the 1 year into the future included 3 known weeks, and 1 unknown week (the data loss week).
- With the prediction, we checked some statistics to minimize error and get a good model fit, then used the 3 weeks of known data past the 1 week of data loss to validate the prediction.
- Since we were predicting what we already knew, the validation was quite robust. Once all parties were satisfied with the validation, the model was used to provide the data for the missing week.
For someone familiar with time-series modeling, this seems easy. For those that do not want to dig back to the stats or economics class they once took, there is still a way to do it. Instead of the statistical modeling, try using the regression feature in Excel. Or more simply, just add a trend line to a graph created in an Excel worksheet. Although regression assumes a linear relationship, you can still use the model created (trend line) to make estimates. Just remember the context of your data is important. Be aware of outside variables that could be influencing your data. Additionally, most time-series models tend to work best with about 18 months of data (try starting with an ARIMA or Holt-Winters time-series). At a minimum, make sure you have enough data to accurately represent your sample. Consider previous trends, sales cycles, or product release patterns.
Once the model is determined, consider applying it to dashboards for a perpetual forward look and to identify potential anomalies. Time-series forecasting models also provide a confidence band with an upper and lower bound – which will be discussed in the next blog post of this series.