Fore­cast­ing is not a new con­cept. All busi­nesses have some level of fore­cast­ing rang­ing from inven­tory and rev­enue plan­ning to staffing mod­els. Those same fore­cast­ing method­olo­gies are also applic­a­ble to web ana­lyt­ics. In this two part blog series, we will walk through how to apply fore­cast­ing in Dig­i­tal Mar­ket­ing using web ana­lyt­ics data.

This first part will review a sce­nario that uses a stan­dard time-series fore­cast­ing model to recover miss­ing data.

Fore­cast­ing Appli­ca­tion: Fill in the blanks

Have you ever gen­er­ated a report only to find that some of the data is miss­ing? Instantly this sparks a series of ques­tions from “Did I cor­rectly pull the report?” to “Who took the code off the page?” Dur­ing a pre­vi­ous engage­ment, I was review­ing a fore­cast­ing approach to help iden­tify future trends and anom­alies (see part 2 of this series com­ing soon) when the client stopped me and asked “We expe­ri­enced a loss in data a few weeks ago, could you have pre­dicted that?”

Since there was an ele­ment of human behav­ior involved in the data loss (some­one did not tag the page right), it is dif­fi­cult to pre­dict an expected out­come because we do not have all of the data points rel­a­tive to human behav­ior in web ana­lyt­ics. How­ever, we can use a fore­cast­ing model to esti­mate the data that was lost, thus fill­ing in the blanks in the report. Here is how we did it:

  1. First we iden­ti­fied the data we had – about 1 year prior to the data loss, and about 3 weeks past the data loss.
  2. Then we used a few time-series mod­els in R (an open source sta­tis­ti­cal pro­gram) and took the 1 year of prior data to fore­cast about 3 months into the future. In this par­tic­u­lar case, the 1 year into the future included 3 known weeks, and 1 unknown week (the data loss week).
  3. With the pre­dic­tion, we checked some sta­tis­tics to min­i­mize error and get a good model fit, then used the 3 weeks of known data past the 1 week of data loss to val­i­date the prediction.
  4. Since we were pre­dict­ing what we already knew, the val­i­da­tion was quite robust. Once all par­ties were sat­is­fied with the val­i­da­tion, the model was used to pro­vide the data for the miss­ing week.

Forecasting Graph

For some­one famil­iar with time-series mod­el­ing, this seems easy. For those that do not want to dig back to the stats or eco­nom­ics class they once took, there is still a way to do it. Instead of the sta­tis­ti­cal mod­el­ing, try using the regres­sion fea­ture in Excel. Or more sim­ply, just add a trend line to a graph cre­ated in an Excel work­sheet. Although regres­sion assumes a lin­ear rela­tion­ship, you can still use the model cre­ated (trend line) to make esti­mates. Just remem­ber the con­text of your data is impor­tant. Be aware of out­side vari­ables that could be influ­enc­ing your data. Addi­tion­ally, most time-series mod­els tend to work best with about 18 months of data (try start­ing with an ARIMA or Holt-Winters time-series). At a min­i­mum, make sure you have enough data to accu­rately rep­re­sent your sam­ple. Con­sider pre­vi­ous trends, sales cycles, or prod­uct release patterns.

Once the model is deter­mined, con­sider apply­ing it to dash­boards for a per­pet­ual for­ward look and to iden­tify poten­tial anom­alies. Time-series fore­cast­ing mod­els also pro­vide a con­fi­dence band with an upper and lower bound – which will be dis­cussed in the next blog post of this series.