Rafe Needleman asks today “Why can’t they fix the Flash/Firefox bug?”, pointing to a lengthy set of Bugzilla comments about intermittent halts of audio/video streaming in some Firefox 2 and 3 installations. The problem is not yet reproducible on demand by others, and so has been difficult to address.
I don’t have the answer, but I do have some context and observations. (Warning: This is long, and the only useful info in it is how to think about problem description. If you’ve got real work to do, then don’t waste time reading my blogpost here….)
First and most important, the way to confirm that you have addressed an issue is by being able to make the problem occur on demand, and to tell others how they can also make it happen on demand. That way they can test whether they can stop it from occurring.
A “Steps to Reproduce” is not what you must do to see the problem. A “Steps to Reproduce” is what an engineer needs to do to see the problem. You may not be able to instruct others completely, and knowing what you did is certainly a first step, but a steps-to-repro description should be written from the reader’s point of view, not the writer’s.
And intermittent problems are certainly the most difficult and time-consuming to address. We need to be able to make the problem happen reliably, in order to be sure that it has been truly removed.
Check out the comments in the thread — “I read in a forum that someone else had the problem too” — that’s not a useful comment. If a capability is not working on your system, then we all believe you, and want to improve things. There’s no need to prove that other people have it. What we need to do is to be able to see it ourselves, so that we can test whether we can make it go away successfully. You don’t need to validate yourself. We all need to explore the argument. It’s not you, it’s it. Relax, we believe you.
There’s one comment from the original poster (identified solely as “M Z”) that “No, I have not managed to reproduce it in safe mode.” This is potentially a killer bit of info. I’m assuming he means “Windows Safe Mode”, an F8-key start which disables many system customizations. If the problem actually *never* occurs with system customizations turned off, then we know to look more closely at the system customizations. But unfortunately, the description is ambiguous… it might just as well mean that he rarely tests in OS “safe mode”. Tantalizing, but still less-than-fully-useful.
Someone asks “What Firefox extensions do you have installed?” and then various lists are produced. It’s more useful to know whether the problem still occurs with a stock Firefox installation. If you ever see the symptom on that system when not running any browser customizations, then we’ll have more info than knowing which brands and versions of extensions you’ve customized the browser with. Key refactoring: “Have you ever seen the problem when all Firefox extensions have been turned off?” Even one such incident would exculpate all extensions.
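The “even one such incident” logic can be sketched in a few lines of Python. This is only an illustration: the incident records and the `extensions_enabled` field are hypothetical, invented here, and nothing like them exists in the Bugzilla thread. The point is that a single failure seen on a stock browser answers the question, while any number of extension lists does not.

```python
def extensions_exculpated(incidents):
    """Return True if any observed failure happened with all extensions off.

    incidents: list of dicts, one per time the symptom was seen, each with a
    boolean 'extensions_enabled' key (a hypothetical record format).
    One failure on a stock browser is enough to show that extensions are
    not required for the problem to occur.
    """
    return any(not incident["extensions_enabled"] for incident in incidents)


# Invented example data, for illustration only.
reports = [
    {"extensions_enabled": True},
    {"extensions_enabled": True},
    {"extensions_enabled": False},  # symptom seen on a stock installation
]
print(extensions_exculpated(reports))
```

Note the asymmetry: a result of False does not convict the extensions, it only means nobody has yet reported a failure with them turned off.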
Someone identifying themself only as “email@example.com” offers another potentially useful bit of detail: “I encounter it only after FF has been running for a while (>60 min).” The original poster needs to be asked whether he has ever seen the problem immediately at system/browser startup, or whether it has only ever appeared after a significant period of browser use. If it only appears after extended use, I would also ask them to look at their Windows Task Manager, to see how much memory their copy of Firefox is using at that moment. In the past, media dropout has often been associated with low-memory situations. It wouldn’t be too hard to quantify the current reports, to see whether the well-known Firefox memory consumption issues are in play when they lose audio/video.
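Here is a rough sketch of what “quantify the current reports” might look like, assuming reporters could supply the Firefox memory figure from Task Manager both during normal playback and at the moment the audio/video halts. The function name, the input format, and the 1.5x threshold are all assumptions made up for this illustration; it is a quick screening check, not a diagnosis.

```python
from statistics import median


def memory_looks_implicated(normal_mb, dropout_mb, ratio=1.5):
    """Return True if memory use at dropout is typically much higher than normal.

    normal_mb:  memory readings (MB) noted while playback was working
    dropout_mb: memory readings (MB) noted when audio/video halted
    ratio:      hypothetical threshold, chosen only for illustration
    """
    if not normal_mb or not dropout_mb:
        raise ValueError("need at least one reading of each kind")
    # Medians keep a single odd reading from dominating the comparison.
    return median(dropout_mb) >= ratio * median(normal_mb)


# Invented numbers, for illustration only.
print(memory_looks_implicated([100, 120, 110], [300, 280]))
print(memory_looks_implicated([100, 120], [105, 115]))
```

If the dropout readings cluster well above the normal ones, memory pressure becomes a lead worth chasing; if they look the same, that lead can be set aside cheaply.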
Subsequent comments of “I have the same problem” do not help at all. No one doubts the original reporter. But these low-info confirmations just muddy the discussion, making a resolution more difficult to reach.
There are additional contributions such as “the problem is your adobe flash player version”. That’s it, no citation, no reason offered. This is a great example of why public bugbases should be scrubbed for buggy comments. The conversation may be open to this person’s participation, but he is increasing readership costs for everyone else. A group needs a smart mix of inclusion and exclusion in order to function well.
Mike Beltzner, a Mozilla staffer, makes a little progress with this comment: “Can we at least call this Windows only? I haven’t seen any reports of it happening in other OSes.” He’s trying to craft a recipe for reproduction of the problem. If we can be sure it’s Windows-only, then we’d know an engineer shouldn’t bother trying to reproduce it on a Mac. It’s a start.
Let’s switch back to Rafe’s article. It’s got the headline “Why can’t they fix the Flash/Firefox bug?” This sets off warning lights for me, because of his use of the word “the”… implies that there is only a single problem, and that it’s famous. In reality, it’s not even yet well-defined. There are also semantic issues with the rhetorical “why”, as well as the cognitive issue of knowing what the problem actually is before starting to think about whether it is possible to fix it. When I read a loaded phrase like that, I start wondering how well the writer has started thinking about what they will be writing about. Not a big flag, but a small warning flag of possible confusion ahead.
“Both Mozilla and Adobe have been aware of the issue since late May, but as yet no solution has been found.” It would be fairer to say that no way to reproduce it on demand has yet been found. The current stopping-block is in the original descriptions, not in any lack of effort by people writing code.
“One workaround solution is to install the Flash 10 player, which is still in beta.” I have no assurance that this changes the problem… in the Bugzilla talk we haven’t seen that anyone who has had the symptom has been able to make it go away by using Astro (the Player 10 beta), and make it return by going back to Player 9.
Matter of fact, after reading Rafe’s article we don’t know whether he has been able to see the problem himself. I sorta suspect he might have, which would explain his interest in this one not-yet-fully-formed Bugzilla entry, but there’s zip info on his personal work at reproduction of the issue.
(Later: Yes! Down at the very bottom he says that he sees the problem too, but that he doesn’t see it in Internet Explorer. Not much more detail, but it’d be useful if he contributed to the solution.)
There’s a quote from Mike Beltzner that implies “ah, if only all Player code were published, then the problem would be easy to solve.” Baloney. [expletives deleted] Mike should know, from Mozilla’s experience with the Tamarin Project, the massive study curve that even brilliant new engineers need to do to get up to speed, to understand what is going on. And Tamarin is just one small part of the tiny engineering marvel which is Adobe Flash Player.
“He also took a minute to trumpet Mozilla’s open-source philosophy. Since Firefox’s code is open, Adobe can look at it to try to determine what is going on. But Mozilla’s team can’t look into Flash. Beltzner didn’t blame Adobe for the bug itself, but he did say that Adobe’s traditional closed software architecture is slowing down their investigation. ‘We hit a wall when it’s a closed-source solution,’ he said.”
The truth is that you simply need to distill the public complaint into an actionable item. The problem actually lies in Bugzilla’s conversational style. Right now it’s just “Oh I saw someone on a forum describe a similar thing.” You need to show engineers how they can see it. Playing the “proprietary” card instead is just weak. I’m watching my language here, but….
Comments at Webware are interesting. Too bad they close it off by registration (yeah, like I’m going to open new accounts and track new passwords for each special little site), and too bad some commenters hide their identity when commenting on others. (Tip to indy Silverlight evangelists: Including a verifiable identity will reduce the taint of possible astroturfing.) The comments section is not very useful overall, but there’s some realistic thought in there, which I appreciate, thanks.
I’m with Rafe completely on his penultimate paragraph: “Finger pointing is common in software troubleshooting, and I give both Mozilla and Adobe credit for only generally waving, not pointing, their fingers at each other. Unfortunately, neither team seems to have developers who can reproduce this issue, which just keeps the ping-pong game going.” Making the problem occur on demand is the first necessary step in making sure the problem has really gone away.
But his final paragraph seems like rankest fantasy and fairytale to me: “What I find most interesting is the way the differing philosophies of Mozilla and Adobe are slowing down resolution of this issue. If both companies were open then any developer–at Mozilla, Adobe, or elsewhere–could get into things and start experimenting to find a fix. If both companies had closed philosophies then their engineers could swear each other to the secrecy, swap source code, and together fix the issue.”
To solve the problem quickly, focus on what it is.
Summary: From the little I can see in the descriptions, I’d really want to check reporters’ system memory consumption when the problem occurs… not a sure thing, but a quick and easy diagnostic that may zero-in on the cause of the problem. (To put it gently, Firefox is rather famous for its memory issues.)
Bugzilla needs (imho) to tighten down, get rid of the conversational bloat. Doing tech support is an acquired skill, and not everyone can think directly about a problem, but a good bugbase would instruct new contributors on how to help isolate the true problem, how to describe things so that others can usefully attempt to reproduce it. Readers should not have to read through stream-of-consciousness from strangers. Refactor it, make it functional.
And finally, that line “but it’s proprietary” needs to go away. It’s a replacement for branding issues. Even Apple, the most proprietary, closed, secretive company of them all, reflexively reaches for it when they don’t know what else to say. You and I have near-zero chance to influence the W3C or Mozilla to do something — they are not more “open” in process than Adobe, or even Microsoft. “Opensource” code tweakability means more for things which run on your own machines (Linux, Apache) rather than on everybody else’s machines, when “predictability” becomes more valuable. I’m tired of conversations getting derailed when someone resorts to this weak “proprietary” tactic. Think. We need you to think. Just be honest and think. Quit the blaming and think.
And check your system memory if video stops. Not a guarantee, but it’s a start.
Update Oct 13: I’ve closed comments on this. I was amazed that some of the early comments just added anecdotal noise to the basic “you have to break it before you can fix it” principle, but more amazed these kept coming in for months later — seems pretty obvious this post has become a top Google link for “firefox video problem”, and anonymous people just want to talk without listening. They’d do better by reading, and communicating.