Opera’s MAMA is a “Metadata Analysis and Mining Application” from staffer Brian Wilson. It’s a project to explore the nature of HTML tags on today’s web, via a sample of 3.5 million URLs. There were some early reports in October, and recently they’ve released additional data on tags used for cross-browser plug-ins.
A few surprising things about plug-in tagging jumped out at me:
- The study found more EMBED tags than OBJECT tags. The difference was small, only 2%, but it indicates there are still some very old sites out there that specify media only through the Netscape-era EMBED tag.
- The APPLET tag, used by Java, is found on just over 50,000 of those pages… about 1.5% of the sample. (SWF is found on 34% of the sample set.) These numbers feel right to me, but it’s the first time in recent memory that someone has attempted to rigorously test the evidence, as Opera has done here.
- This MAMA study is the same one which ran the world’s webpages through the W3C Validators and found that “145,009 out of 3,509,180 URLs passed validation - only 4.13%!” (My takeaway from that is that there’s a significant difference between idealism and realism… telling people what they “should” do isn’t usually as straightforward as watching what they themselves choose to do.)
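For anyone who wants to sanity-check the percentages quoted above, here is a minimal sketch in Python. The validation figures come straight from the quote. The APPLET count is an assumption: the study only says "just over 50,000", so a round 50,000 stands in for it, and the `share` helper is my own naming, not anything from MAMA.

```python
def share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


SAMPLE_SIZE = 3_509_180   # URLs in the MAMA sample, per the quoted figure
VALIDATED = 145_009       # URLs that passed W3C validation, per the quote
APPLET_PAGES = 50_000     # assumption: stand-in for "just over 50,000"

validated_pct = share(VALIDATED, SAMPLE_SIZE)
applet_pct = share(APPLET_PAGES, SAMPLE_SIZE)

print(f"Passed validation: {validated_pct}%")
print(f"Pages with APPLET: {applet_pct}%")
```

The validation share matches the quoted 4.13%. With the round 50,000 stand-in, APPLET comes out a little under 1.5%, which fits "just over 50,000" pages landing at about 1.5%.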
Brian Wilson has tons more MAMA data on the Opera site… really valuable in understanding the tagging structure of the real web today, and I appreciate that they made this info public!