Opera’s MAMA on plugins

Opera’s MAMA is a “Metadata Analysis and Mining Application” from staffer Brian Wilson. It’s a project to explore the nature of HTML tags on today’s web, via a sample of 3.5 million URLs. There were some early reports in October, and recently they’ve released additional data on tags used for cross-browser plug-ins.

A few surprising things about plug-in tagging jumped out at me:

  • The study found more EMBED tags than OBJECT tags. The difference was small, only 2%, but because the usual cross-browser markup pairs every OBJECT with an EMBED, the surplus suggests there are still some very old sites out there which specify media only in the Netscape format.
  • Their sample of 3.5 million pages had about a half-million OBJECT/EMBED pairs, and just over a half-million additional pages used external JavaScript to add these tags. This sounds a little strange to me… on the sites I visit I see much more external-JS use than in-page tagging, partly as a response to Microsoft’s pre-Silverlight handling of the EOLAS patent, partly as a by-product of advertising network requirements. Anyway, these results seem to imply that only about half of SWF use is via external scripting, which feels low to me.
  • The APPLET tag, used by Java, is found on just over 50,000 of those pages… about 1.5% of the sample. (SWF is found on 34% of the sample set.) These numbers feel right to me, but it’s the first time in recent memory that someone has attempted to measure them rigorously, as Opera has done here.
  • This MAMA study is the same one which ran its sample of pages through the W3C validator and found that “145,009 out of 3,509,180 URLs passed validation - only 4.13%!” (My takeaway is that there’s a significant gap between idealism and realism… telling people what they “should” do rarely tells you as much as watching what they actually choose to do.)
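For readers curious how a tag survey like this works in principle, here is a minimal sketch in Python. It is not Opera’s MAMA code; the `PLUGIN_TAGS` set and the helper names `count_plugin_tags` and `share` are hypothetical. It tallies EMBED, OBJECT and APPLET start tags in one page’s static HTML and computes a percentage the way the validation figure above was reported. Note that a static parse like this cannot see tags that external JavaScript writes into the page, which is why script-inserted plug-ins have to be counted separately.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical set of plug-in-related tags to tally (not MAMA's actual list).
PLUGIN_TAGS = {"embed", "object", "applet"}


class PluginTagCounter(HTMLParser):
    """Counts start tags for plug-in elements in one static HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so EMBED and embed count the same.
        # Self-closing forms like <embed ... /> also reach this method via the
        # default handle_startendtag implementation.
        if tag in PLUGIN_TAGS:
            self.counts[tag] += 1


def count_plugin_tags(html: str) -> Counter:
    """Return per-tag counts; tags injected later by JavaScript are not seen."""
    parser = PluginTagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


def share(part: int, total: int) -> float:
    """Percentage of `total` represented by `part`, rounded to two decimals."""
    return round(100 * part / total, 2)


page = (
    '<OBJECT classid="clsid:example"><param name="movie" value="a.swf">'
    '<EMBED src="a.swf"></OBJECT>'
    '<applet code="Demo.class"></applet>'
)
print(count_plugin_tags(page))   # one each of object, embed, applet
print(share(145009, 3509180))    # 4.13
```

Running `count_plugin_tags` over every page in a sample and summing the counters gives the kind of EMBED-versus-OBJECT comparison described in the first bullet.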

Some history on EMBED vs OBJECT: Netscape 2.0 introduced browser extensions in mid-1995… a half-year before JavaScript, by the way. These were embedded within the page via the EMBED tag. The next year Microsoft announced similar capability within Internet Explorer 3.0, using ActiveX controls via the OBJECT tag, although early versions of IE could also invoke Netscape plugins via EMBED. In 1997, HTML 4.0 adopted the Microsoft tag and left the Netscape tag out of the specification, but offered no hints as to how the real world might reach such a purified state. Realistically, almost everyone followed the early Shockwave approach and nested OBJECT & EMBED to make the various browsers happy.
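To make the nesting concrete, here is a small Python sketch that generates that style of markup for a SWF file. The function name `nested_flash_markup` and its parameters are hypothetical, and real authoring tools of the era emitted more attributes (such as `codebase` and extra PARAMs) than shown here. The class ID is the commonly published one for the Flash Player ActiveX control. The idea is that IE instantiates the outer OBJECT, while a browser that can’t use that ActiveX OBJECT falls back to its contents and picks up the inner EMBED.

```python
from html import escape

# Commonly published ActiveX class ID for the Flash Player control in IE.
FLASH_CLASSID = "clsid:D27CDB6E-AE6D-11cf-96B8-444553540000"


def nested_flash_markup(movie_url: str, width: int, height: int) -> str:
    """Return an OBJECT element with an EMBED fallback nested inside it.

    IE reads the outer OBJECT and its "movie" PARAM; browsers that can't
    use the ActiveX OBJECT render its contents instead, i.e. the EMBED.
    """
    # Escape the URL once so characters like & are safe in both attributes.
    url = escape(movie_url, quote=True)
    return (
        f'<object classid="{FLASH_CLASSID}" width="{width}" height="{height}">\n'
        f'  <param name="movie" value="{url}">\n'
        f'  <embed src="{url}" width="{width}" height="{height}"\n'
        f'         type="application/x-shockwave-flash">\n'
        f'</object>'
    )


print(nested_flash_markup("intro.swf?lang=en&loop=1", 400, 300))
```

Keeping both tags in one block is exactly why a page using this pattern contributes one OBJECT and one EMBED to a tag count, while EMBED-only pages tip the balance toward EMBED.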

Brian Wilson has tons more MAMA data on the Opera site… really valuable in understanding the tagging structure of the real web today, and I appreciate that they made this info public!