Traditionally, and in many areas of modern science, one of the biggest obstacles in any experiment or analysis has been obtaining a data set large enough to meet the sample-size standards required by most statistical procedures. In recent years, and especially in digital marketing, this is far from the case. Many clients are inundated with data; so much, in fact, that it can be difficult to know what to look for or where to start. Given the computational costs of truly large data sets (gigabytes upon gigabytes and terabytes upon terabytes), we want to be strategic in the way we examine the data.

If you’re not entirely sure what to look for, a great place to begin is an Association & Affinity analysis. This is an extremely flexible type of analysis that uses basic data mining techniques to establish relationships between site metrics. Traditional marketing pros might know it better as “market basket” analysis, a term that comes from the simple example of buying items at a grocery store. With transaction-level data, we can determine which items are most likely to be purchased together. For example, if we know that 71% of transactions that include milk also include bread, we can start to make tactical decisions about when to put those two items on sale, where to physically place them in the store, and so on.
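
To make the milk-and-bread idea concrete, here is a minimal Python sketch of the three measures usually computed for a rule such as “milk → bread”: support (how often both items appear together), confidence (how often bread appears given milk, which is what the 71% figure above describes), and lift (confidence relative to how common bread is overall). The baskets are made-up toy data and `rule_metrics` is a hypothetical helper, not part of any Adobe product.

```python
def rule_metrics(baskets, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent.

    baskets is a list of sets, one set of items per transaction.
    """
    n = len(baskets)
    n_a = sum(1 for b in baskets if antecedent in b)   # baskets with the antecedent
    n_c = sum(1 for b in baskets if consequent in b)   # baskets with the consequent
    n_ac = sum(1 for b in baskets if antecedent in b and consequent in b)  # both
    if n_a == 0 or n_c == 0:
        raise ValueError("antecedent and consequent must each appear at least once")
    support = n_ac / n                 # share of all baskets containing both items
    confidence = n_ac / n_a            # share of antecedent baskets that also hold the consequent
    lift = (n_ac * n) / (n_a * n_c)    # confidence divided by the consequent's base rate
    return support, confidence, lift


# Made-up toy transactions, for illustration only.
BASKETS = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk", "bread", "butter"},
    {"milk", "eggs"},
    {"milk", "cereal"},
    {"bread", "butter"},
    {"eggs"},
    {"cereal"},
    {"butter"},
    {"eggs", "cereal"},
]

support, confidence, lift = rule_metrics(BASKETS, "milk", "bread")
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
# support=0.30 confidence=0.60 lift=1.50
```

Here 60% of milk baskets also contain bread, while bread appears in only 40% of all baskets, so a lift of 1.5 says the two items show up together more often than chance alone would suggest.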

Association Analysis finds underlying structures and relationships in large data sets.

A client contacted Adobe Consulting for help in determining whether a customer’s origin could help predict the location to which they were traveling. They were also interested in knowing whether there were any seasonal or brand factors to take into consideration. What we found was striking, surprising, and significant.

With the help of the Insight Consulting team, we looked at several years of transaction-level data, amounting to a little over 10 million transactions. Needless to say, this is far more than Excel can handle (a worksheet tops out at 1,048,576 rows), and a perfect opportunity to use some simple but very useful data mining techniques. One of Insight’s great strengths is its ability to handle large data sets, so we used it for all the number crunching. As mentioned previously, we took several variables into consideration for our analysis:

  • Origin – the place where a customer was physically located when they made the rental reservation
  • Destination – the place where a customer picked up the car for the rental reservation
  • Brand – this particular travel agency was a conglomerate of several brands, which operated separately but were still part of the same company
  • Date – the day, month, and year of travel
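
With roughly 10 million rows, the records are best processed as a stream rather than loaded all at once. The Python sketch below is not Insight (the post does not describe its internals); it assumes a hypothetical CSV export with columns named `origin`, `destination`, `brand`, and `date` (ISO format), reads it one row at a time, and maps each travel date to a season. The Northern Hemisphere meteorological seasons used here are also an assumption, since the post does not say how seasons were defined.

```python
import csv
import io
from datetime import date
from typing import Iterator, NamedTuple


class Rental(NamedTuple):
    origin: str        # where the customer was when booking
    destination: str   # where the car was picked up
    brand: str         # which brand in the group handled the booking
    travel_date: date  # day, month and year of travel


# Assumed meteorological seasons (Northern Hemisphere), for illustration only.
SEASONS = {12: "winter", 1: "winter", 2: "winter",
           3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer",
           9: "fall", 10: "fall", 11: "fall"}


def season_of(d: date) -> str:
    return SEASONS[d.month]


def read_rentals(fh) -> Iterator[Rental]:
    """Yield one Rental per CSV row without loading the whole file into memory."""
    for row in csv.DictReader(fh):  # column names are hypothetical
        yield Rental(
            origin=row["origin"],
            destination=row["destination"],
            brand=row["brand"],
            travel_date=date.fromisoformat(row["date"]),
        )


# Invented sample; a real run would pass open(path, newline="") instead.
sample = io.StringIO(
    "origin,destination,brand,date\n"
    "UT,UT,BrandA,2012-07-04\n"
    "UT,CA,BrandB,2012-12-20\n"
)
for rental in read_rentals(sample):
    print(rental.origin, rental.destination, rental.brand, season_of(rental.travel_date))
# UT UT BrandA summer
# UT CA BrandB winter
```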

We incorporated three distinct statistical measures of association: support, confidence, and lift ratio. With these measures, we created a dashboard that allowed analysts on the client side to interact with and understand the analysis. The end result is a set of tables and heat maps that identify the origin-destination pairs most likely to occur together, as seen below.
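
The post does not spell out the formulas, but the conventional definitions of these three measures, written for a rule “origin O implies destination D”, are given below. Here N is the total number of transactions, n(O, D) is the number with origin O and destination D, and n(O) and n(D) are the numbers with that origin or that destination, respectively.

```latex
\begin{aligned}
\operatorname{support}(O \Rightarrow D)    &= \frac{n(O, D)}{N} \\
\operatorname{confidence}(O \Rightarrow D) &= \frac{n(O, D)}{n(O)} \\
\operatorname{lift}(O \Rightarrow D)       &= \frac{\operatorname{confidence}(O \Rightarrow D)}{n(D)/N}
                                            = \frac{N \, n(O, D)}{n(O) \, n(D)}
\end{aligned}
```

A lift above 1 means the pair occurs together more often than it would if origin and destination were independent; a lift below 1 means less often.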

Example of a custom dashboard put together for a client. The heat map is a visual tool for determining the strongest origin-destination associations.
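
Each cell of a heat map like this holds one association measure for one origin-destination pair. As a minimal sketch (not the dashboard’s actual implementation, which the post does not describe), the Python below computes a grid of lift values from (origin, destination) pairs, treating each pair as one transaction. The state codes and counts are invented. The resulting nested dict could then be colored with any plotting tool, for example matplotlib’s `imshow`.

```python
from collections import Counter


def lift_matrix(pairs):
    """Build {origin: {destination: lift}} from (origin, destination) pairs.

    Cells with lift > 1 are pairs that occur more often than independence
    would predict, i.e. the "hot" cells of a heat map.
    """
    pairs = list(pairs)
    n = len(pairs)
    pair_counts = Counter(pairs)
    origin_counts = Counter(o for o, _ in pairs)
    dest_counts = Counter(d for _, d in pairs)
    return {
        o: {
            # Counter returns 0 for unseen pairs, so empty cells get lift 0.
            d: n * pair_counts[(o, d)] / (origin_counts[o] * dest_counts[d])
            for d in sorted(dest_counts)
        }
        for o in sorted(origin_counts)
    }


# Invented sample: 14 rentals between two states.
sample = [("UT", "UT")] * 6 + [("UT", "CA")] * 2 + [("CA", "CA")] * 5 + [("CA", "UT")]
for origin, row in lift_matrix(sample).items():
    print(origin, {dest: round(v, 2) for dest, v in row.items()})
# CA {'CA': 1.67, 'UT': 0.33}
# UT {'CA': 0.5, 'UT': 1.5}
```

Lift is used for the grid here because it corrects for how popular each destination is overall; a grid of raw counts would mostly highlight the busiest locations.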

Through the dashboard, we identified several insights that were very significant to the client. For example:

  • The vast majority of transactions occurred within a single state, meaning that most customers were staying local for their travel needs. This came as a big surprise.
  • The different brands showed very distinct behaviors. Some brands tended to attract customers who were more likely to travel outside their state, while others were used more often by customers traveling within their own state.
  • Seasonality had a significant effect on where people chose to travel. Overall, spring and summer months saw a more diverse range of destinations, while in fall and winter months customers tended to stay closer to home.
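
Findings like these come from slicing the same transactions by brand, by season, or both. A minimal Python sketch, assuming records carry hypothetical `origin`, `destination`, `brand`, and `season` fields, with origin and destination both recorded at the state level so that equality means an in-state trip. The data are invented and do not reflect the client’s results.

```python
from collections import defaultdict


def in_state_share(records, key):
    """Fraction of transactions whose origin equals destination, grouped by key(record)."""
    totals = defaultdict(int)
    in_state = defaultdict(int)
    for rec in records:
        group = key(rec)
        totals[group] += 1
        if rec["origin"] == rec["destination"]:  # same state at both ends
            in_state[group] += 1
    return {group: in_state[group] / totals[group] for group in totals}


# Invented records with hypothetical field names.
records = [
    {"origin": "UT", "destination": "UT", "brand": "BrandA", "season": "winter"},
    {"origin": "UT", "destination": "UT", "brand": "BrandA", "season": "winter"},
    {"origin": "UT", "destination": "CA", "brand": "BrandB", "season": "summer"},
    {"origin": "CA", "destination": "CA", "brand": "BrandB", "season": "summer"},
    {"origin": "CA", "destination": "NV", "brand": "BrandB", "season": "summer"},
    {"origin": "CA", "destination": "CA", "brand": "BrandA", "season": "fall"},
]

by_brand = in_state_share(records, key=lambda r: r["brand"])
by_season = in_state_share(records, key=lambda r: r["season"])
by_both = in_state_share(records, key=lambda r: (r["brand"], r["season"]))
print({k: round(v, 2) for k, v in by_brand.items()})
# {'BrandA': 1.0, 'BrandB': 0.33}
```

Passing the grouping as a `key` function keeps one helper for every slice: brand, season, or a (brand, season) tuple.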

Of course, there are literally thousands of individual origin-destination pairs once we consider the different granularities, seasons, and brands. Delivering the analysis via a dashboard ensured that the client could dive deeper into any pair at the granularity that fit the needs of the specific business units consuming the analysis.
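
Supporting several geographic granularities mostly comes down to re-aggregating counts. As a hedged sketch (the post does not describe how the dashboard does this), the Python below rolls city-level origin-destination counts up to the state level using a hypothetical city-to-state lookup; all names and numbers are invented.

```python
from collections import Counter


def roll_up(pair_counts, to_coarse):
    """Re-aggregate origin-destination counts to a coarser geography.

    to_coarse maps each fine-grained location (e.g. a city) to its parent (e.g. a state).
    """
    rolled = Counter()
    for (origin, dest), count in pair_counts.items():
        rolled[(to_coarse[origin], to_coarse[dest])] += count
    return rolled


# Invented city-level counts and a hypothetical city -> state lookup.
city_to_state = {"Salt Lake City": "UT", "Provo": "UT", "Los Angeles": "CA"}
city_pairs = Counter({
    ("Salt Lake City", "Provo"): 4,
    ("Provo", "Salt Lake City"): 3,
    ("Salt Lake City", "Los Angeles"): 2,
})
state_pairs = roll_up(city_pairs, city_to_state)
print(state_pairs[("UT", "UT")], state_pairs[("UT", "CA")])  # 7 2
```

Rolling up the raw counts and then recomputing support, confidence, and lift from them is the safe order; averaging city-level lift values would not, in general, give the state-level lift.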

With web analytics data, there is almost no limit to where an Association & Affinity analysis may apply. Contact an Adobe representative to explore the details of this type of analysis for your organization.