The De Facto Web

Opera Software makes a web browser. They’ve accumulated a giant database of web URLs for testing purposes. Recently they analyzed the contents of those pages. The results are very surprising:

  • 33.5% of sites use Flash
  • 78% use JavaScript, but only 4% use Ajax
  • 8% of sites have little “W3C Validated” badges, but only 4% actually pass the W3C’s validators

There’s been a lot of lecturing over the past decade about “web standards” and “the open web” and “the proprietary unweb” and so on. What’s interesting is how much that rhetoric diverges from reality: eight times as much Flash as Ajax, or even “valid” HTML. Lots of JavaScript, but little of it advanced. A false sense of how many sites actually follow the W3C’s lengthy specifications.

What we’ve been told to think is different from how things are.

Web standards folks seek to dole out blame, in over-long, inaccessible, and even polarizing English.

HTML5 promises to make things more complex and unimplementable, instead of focusing on the basics like clicking. The discussion is lengthy and not very readable, even for fast-reading native English speakers, and is presided over by an uncredited Google staffer with an arbitrary manner.

The “open web” is just acting… strange.

Early web software acted upon Postel’s Law: “Be conservative in what you send; be liberal in what you receive.” Web discussions over the past few years have had a streak of intolerance, of not accepting what is seen as “impure”. But the Opera study doesn’t show such authoritarianism in The Real Web.

What we’ve been told to think is different from how things are. There’s a difference between the volume of speech and what the volume of people actually believe. That’s what the Opera survey seems to indicate.

Flash is a real part of the World Wide Web today. A major part. Bigger than “web standards”, bigger than Ajax. It doesn’t replace the web. It’s part of the real web, as real people use it.

But here’s the interesting thing. The conservative standardista/openWeb/inGroup position may be anti-Flash, but Adobe is more than Flash, and not anti-standardista.

Adobe’s about publishing — the ability for creative people to communicate. We love publishing. HTML and the W3C, as flawed as they are, are Adobe’s allies. We’re continuing to work on JavaScript. Improving the world toward the standardista’s goal is also Adobe’s goal.

Adobe makes Dreamweaver, not just Flash. InDesign and After Effects too. We enable publishing. The more that people can communicate, the better Adobe tends to do.

And we’re not embedding an advertising/surveillance network in your content. No intermediation between you and your audience. It’s free, unencumbered, open publishing… that’s the goal here.

Getting a predictable renderer atop the world’s desktops takes the heat off HTML. It lets HTML be HTML. But if HTML tries to be SWF, it won’t do as well.

My recommendation? Just keep things in a sensible proportion. We need to improve HTML. We also need to improve SWF. But both are real publishing options today. We need to acknowledge both, and not give in to prejudice. Stay open.

That’s my main takeaway from the Opera report. What we’ve been told to think is different from how things are.

(Sidenote on the Dreamweaver stats: Very few pages produced with Dreamweaver are actually identifiable as such. The Opera study says “MAMA looked at the META ‘Generator’ value to find popular CMS and editor”. But I can’t recall Dreamweaver ever identifying itself in the META Generator field. Even if they checked for JavaScript routines like mm_swapImage, not all Dreamweaver pages use it, and not all pages which use it came from Dreamweaver. Adobe just provides neutral publishing technology, and it’s up to each creator how they choose to use it. As for the survey’s material on editors, I’m not sure what it might really mean.)
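For the curious, the kind of detection pass the study describes might look something like the following. This is a minimal TypeScript sketch, not MAMA’s actual code; the function name, regexes, and fallback heuristic are all my own guesses.

```typescript
// A minimal sketch of an editor-detection pass over fetched markup.
// Not MAMA's actual code: heuristics and names are illustrative only.
function detectEditor(html: string): string | null {
  // First, the META "Generator" value the Opera study mentions.
  const meta = html.match(
    /<meta[^>]*name=["']?generator["']?[^>]*content=["']([^"']*)["']/i
  );
  if (meta) return meta[1];

  // Fallback: Dreamweaver's rollover behaviors emit MM_ routines,
  // but (as noted above) not all Dreamweaver pages include them,
  // and not all pages that do came from Dreamweaver.
  if (/MM_swapImage|MM_preloadImages/.test(html)) {
    return "possibly Dreamweaver (MM_ routines present)";
  }
  return null;
}
```

Anything this naive misses META tags whose content attribute precedes the name attribute, among other cases, which is part of why editor counts like these stay soft.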

More discussion on the still-unfolding Opera study is at Ars Technica and Slashdot today. Jens Brynildsen has a Flash-oriented perspective.

[ Afterword: A few hours later I re-read this, and realized I should provide some authoring context. That’s not “Adobe” talking, that’s me talking. I wrote it pretty much at a gulp this afternoon at the office. I sit near Scott Fegette, on the Dreamweaver team, and asked him to give it a quick read for any reasons to kill it, but that’s the limit of “the corporate voice” on the stuff above. (I didn’t ask Scott if he agreed. 😉) I had caught the news via the Ars Technica article last night, and made a quick Twitter post to stake my claim on the punchline, but then sat down and really read the initial Opera material, and was impressed. Took the day to digest it. Other people, both within Adobe and within the larger ecology, definitely shape my opinions. But many of them might disagree with parts of what I wrote. The words in the essay above are mine, and not Adobe’s.]

13 Responses to The De Facto Web

  1. Ian Hickson says:

    If you have input on HTML5, please don’t hesitate to send it to the list, I’ll make sure to take it into account. Your input would in particular be very welcome on the clickjacking issue; if there’s a solution that doesn’t involve making things more complicated I am definitely eager to adopt it.
    Hopefully your feedback can make my manner less arbitrary. 🙂

  2. Jeff Muir says:

    John,
    You have revealed that the imagined does not match the real when it comes to the web. I agree with your thinking about paying more attention to what exists instead of lecturing about what is proper.
    I couldn’t help but think about the garbology project at the University of Arizona (http://en.wikipedia.org/wiki/Garbology). It’s a bit of a stretch for a link but the connection is there.
    What was found is that people would often lie about, or underestimate, what was in their garbage. For example, it was determined that far more beer containers were found than people admitted to. People in general have an image of just about anything, but tend to shift their thinking when it suits them. Real measurement is the only way to know for sure, which makes your statistics really interesting.
    Thanks,
    Jeff

  3. John Dowdell says:

    Hi Ian, thanks for stopping by, hope I didn’t harsh on you too bad. 😉
    I know the process, and I’ve watched it this cycle too. I don’t think I, individually, have much leverage to simplify browser markup and make it accessible to more people. There’s too much pressure against it. Other people at Adobe are working at the various standards processes, in good faith, but me, I’ve seen the results, and am skeptical. My view is from someone on the outside, watching. (I don’t feel qualified to suggest how browser vendors should best handle clickjacking, but thanks.) (And I think it would be good to clearly disclose your corporate affiliation in your sig… protects you from readers not knowing the context.)
    Jeff, I think I’m guilty…. 😉
    jd/adobe

  4. Kim C. says:

    Always interesting to get your personal perspective, jd. I’m not sure how many people appreciate how many years you’ve spent participating in the study of how people use technology to communicate and publish. I haven’t always agreed with you, but your perspective is tempered with experience.
    So, my newest mantra at my enterprise location is simply this: The use of technology is like water. Users naturally follow the path of least resistance and greatest reward. You can guide and direct, but as we’ve seen with each emerging technology–be it social networking or video on demand or ubiquitous connections with multiple devices–users *always* find a way to have their needs met. We of the tech-driven and tech-obsessed forget far too quickly that our users care little about anything other than the path to the most engaging and least difficult experience. If that’s Flash, fine. If it’s AJAX, fine.
    But I have yet to meet an average consumer of networked information that cares one whit about the standards and protocols that are in place behind the curtain. They gravitate to technology that meets their needs and has the least entry-level cost.
    Example? The early Google: one box on the page where you entered your search term, and reliable results that more often than not gave you what you were after.
    Kim

  5. John, the data is interesting and your conclusion too. The Ars story you cited says the database is of 3.5 million pages. On one hand this seems large, but on the other hand it seems like a small fraction of the web. Is there anything you know about the study which would tend to alter your conclusions?
    Another thought — is there another indication of Ajax outside of XMLHttpRequest? Cheers!
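    [For readers wondering what the XHR test would even look for: about all a crawler can do is grep each page’s script text for the XMLHttpRequest constructor and its older Internet Explorer ActiveX equivalents. A hedged TypeScript sketch, not the study’s actual criteria:]

    ```typescript
    // One plausible Ajax signature test over a page's script text.
    // A sketch only; MAMA's real criteria may differ.
    function looksLikeAjax(scriptText: string): boolean {
      return (
        /new\s+XMLHttpRequest/.test(scriptText) ||  // the standard API
        /ActiveXObject\s*\(\s*["'](Msxml2|Microsoft)\.XMLHTTP/.test(scriptText)  // older IE
      );
    }
    ```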

  6. Ian Hickson says:

    jd: The HTML5 effort isn’t like other standardisation processes; if you have ideas that are practical and workable, then it doesn’t actually matter if people are opposed to them (unless they too have good technical arguments). You don’t need leverage in the WHATWG. You only need good technical arguments. So please don’t give up before trying — send us your feedback! 🙂

  7. John Dowdell says:

    Howdy…. 😉
    Up above, Kim had some thoughts on how people naturally focus on what they want to get done. I agree… each person has their own mix of short-term and long-term priorities, and each makes their own decision.
    George: I read the methodology page once, and it seemed reasonable, but I only skimmed the URL set documentation. I haven’t come across any quibbles from other readers yet.
    Ian, I prefer “small pieces, loosely joined” to “big spaghetti, tightly clumped”, and so am not personally interested in debating marinara vs pesto. HTML would be better if simpler. And ISO/OOXML opened my eyes too.

  8. Brian Wilson says:

    Hi John,
    Thanks for your thoughts on this issue.
    A little bit more clarity on the META generator thing –
    As popular as I know Dreamweaver is/was, the fact that it was only 10th in the list of such values (below Claris Home Page!) shows that whatever caused its trail was rare in the product…perhaps only in a single version or two. The actual values found indicated enough variance to suggest that the addition of the META was not so much of an isolated event.
    My attempt to tie validation and META Generator was simply to try and identify factors which could be influencing the validation outcome. With such low overall validation numbers, I’m not sure I succeeded, but I think it certainly generates discussion. =)
    BTW, MAMA also tracks the Macromedia functions in script, and those ARE very populous. I did not use that as the definitive sign of Dreamweaver usage because I was not sure which of Adobe’s products produced that code.
    [jd sez: Hi Brian, thanks for the word. I’ve been baffled how to measure Dreamweaver usage too — its pages look like whatever the author chooses them to look like. I didn’t know so many people had META GENERATOR info! 😉 Hope I didn’t put words in your mouth, but those were my takeaways when I read the first reports in your study. Looking forward to more, thanks!]

  9. Hi John,
    The technology improves because people who are dealing with the professional part of it are really involved. You spent time writing a blog post and giving your opinion, which definitely shows that you have an opinion. Participating in the W3C forums is exactly the same. Most of the technologies developed at W3C have *public fora*. The W3C specifications are open, royalty-free and, last but not least, published at a regular pace for review by the public and the industry at large.
    Please do send your comments, because it seems you have opinions on what should be done. For example, you say HTML should be simpler: what do you mean by that? The markup language? The APIs? The parsing algorithm, etc.?

  10. Robert O'Callahan says:

    The results on Ajax are hard to interpret. Most big Web apps are behind login prompts, which probably means they weren’t in this index.
    And how many of those Flash objects are animated ads? [jd sez: Five?]

  11. Brian Wilson says:

    Re: Robert, Ajax results
    [More data in the section on scripting should help some with the XMLHttpRequest data…that will be up in a few weeks.]
    Yeah, I agree that there needs to be some latitude on this point. MAMA’s current URL set is very top-page-of-site heavy (a drawback of DMoz), and Web apps that would use XHR tend to be Deep URLs…or behind passwords.
    I also found that a rather large number of scripting scenarios are not (yet) parsable by MAMA – external script files dynamically referenced by scripts on the fly. There are a number of issues here, including the fact that due to scripting logic, the mere detection of an object doesn’t tell you much of anything about how it is being used.
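    [To make that parsing problem concrete: a page can attach script files at runtime, so a static read of the markup never sees them. A small TypeScript sketch of the pattern; the URL is a made-up placeholder, not code from any surveyed site:]

    ```typescript
    // The pattern a static crawler can't follow: a script element
    // created and attached at runtime, with a URL chosen by logic
    // the crawler would have to execute to discover.
    const s = document.createElement("script");
    s.src = "/js/extra-behaviors.js";  // placeholder path
    document.getElementsByTagName("head")[0].appendChild(s);
    ```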
    Re: flash as ads
    I’d suspect a pretty high percentage, actually…judging by my personal experience. One thing I’m learning with MAMA is that The Web May Not Be What You Think It Is. Some of my own biases are being exposed, or at least questioned, as are those of some others. MAMA’s current DMoz URL set basis is not the be-all, end-all of the Web, but it is definitely a significant and publicly repeatable set (which was the point of this initial study).
    Until we can come up with one or more markers that will definitively point out Flash usage as an ad, we can only say that “Flash is being used in some manner.”
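    [What might such markers look like? One guess would be standard ad-banner dimensions plus known ad-server hostnames. A TypeScript sketch with purely illustrative sizes and hosts; none of this comes from the MAMA data:]

    ```typescript
    // Hypothetical "this SWF is probably an ad" markers: common IAB
    // banner sizes plus known ad-server hosts. Illustrative only.
    const AD_SIZES = new Set(["468x60", "728x90", "300x250", "160x600"]);
    const AD_HOSTS = [/(^|\.)doubleclick\.net$/i, /(^|\.)atdmt\.com$/i];

    function probablyAnAd(width: number, height: number, srcUrl: string): boolean {
      if (AD_SIZES.has(`${width}x${height}`)) return true;
      try {
        const host = new URL(srcUrl, "http://example.com/").hostname;
        return AD_HOSTS.some((re) => re.test(host));
      } catch {
        return false;  // malformed URL: no signal either way
      }
    }
    ```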

  12. Scott says:

    But what percentage of that 33.5% of sites use Flash only for (third party) advertising?
    [jd sez: Dunno. Don’t know if that was even measured. Difference still profound, though.]
