The Sad State of Hardware

I (vaguely) remember when I was a young, naive programmer, and I assumed that compilers could essentially do no wrong. Gosh, if something was broken or didn’t work right, it _must_ be my fault. Of course, that assumption didn’t last long. I remember when I first realized that my program was compiling differently depending on whether and where it had comments – must have been some bug in the compiler's parser, who knows? Once that line was crossed, though, I began to get a sense of which problems might be mine, which might actually be the compiler's, when I should pull up the generated code to look for compiler faults, and how to work around them. And I’d get a sense of which compilers were the worst offenders. Heck, there was one compiler which seemed to have the development philosophy of shoving in every optimization they could think of with a major version release, then backing them out in patches as the code generation bug reports came in.

Hardware was different. Because hardware could always break, you’d always be on the lookout for it to go funny. From loose cables, to unseated cards in slots, to overheating problems. You’d expect something to eventually go bad because it lived in the physical world and eventually everything dies. But you didn’t expect design flaws – at least not flaky ones that affected specific software packages. If something was busted, it was pretty consistently busted. And there wasn’t nearly the variety of hardware that you could plug together – not to mention that tolerances for dodgy parts were looser.

Things have changed, and not for the better. It was always the case that chips might have errata – but they’d usually be covered up in the microcode or in the BIOS before things shipped. Devices would have issues, but the driver would be tweaked to compensate before the device shipped. In general, you could buy a box and have reasonable confidence that it was put together properly and would do what you wanted. Sure, there were the manufacturers who had the (very) annoying habit of putting in cheesed-up and dumbed-down OEM versions of certain boards, and it would be hard to get driver updates for them. You just learned to avoid them.

Then there was that first interesting case of a design flaw on the motherboard. An undersized capacitor specification on a reference motherboard design would allow the system memory bus to go undervolt if the bus was saturated for more than a certain length of time. Guess which program was best at managing to do that? I think for a lot of the users affected by that problem it was the first time they had encountered real hardware issues – and much like my early experience with compilers, they were hard-pressed to believe that it was a hardware issue. Heck, isn’t it always the fault of the software? There had always been bad RAM issues that Photoshop seemed to be the only piece of software able to tickle, but this was the first really widespread hardware issue. And worse, the only fix was a physical motherboard swap.

Now, of course, it’s even worse. With the internet being an accepted method of delivering drivers and BIOS updates and whatnot, I think most PC manufacturers have gotten lazy. Things no longer necessarily go out the door in a working state. Heck, the Media Center PC I have at home couldn’t burn DVDs for the first 10 months I owned it, until a BIOS update and a driver firmware update (and a re-image).

Don’t get me wrong – I think the internet updates are great for letting slightly older hardware adapt to new things without having to go replacing it all. Much better to just flash your WiFi card to add better security than it is to replace it.

But I think we’ve hit a point where things are just abusive for the average consumer. We now see RAM issues so frequently that they’re part of the FAQ list. Video drivers seem to be perpetually broken in some way or another. I simply don’t feel comfortable recommending machines from any name-brand PC makers anymore because they all seem to have serious weak points in their product lines. Now, one could argue that price pressures are at least partly to blame for this – if people didn’t try to save every last dollar on their purchases, the manufacturers could use more reliable and better tested parts.

I just nailed down an issue today having to do with Hyperthreading and memory allocation. Yes, a BIOS update solved it (as did turning off Hyperthreading). But should the average Photoshop user really have to know how to figure out the motherboard manufacturer, find the BIOS update and apply it? Or turn off Hyperthreading? And why wasn’t the BIOS update pushed out over Windows Update for such a serious issue? To the user, it just looked like Photoshop was freezing up in the middle of painting or other operations.

And, of course, it hits Photoshop more than other applications. It’s the nature of the beast – the data sets we’re dealing with are very large for an end-user application, and we move them around a lot faster than other applications do. Photoshop will expose marginal hardware more often than the best system diagnostic software.
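
To make that a bit more concrete, here is a rough sketch in Python of why large, fast data movement tends to shake out marginal memory. It is not anything Photoshop does, and it is no substitute for a real memory tester. The function names, buffer size, and round count are all made up for illustration. It fills a big buffer with random bytes, copies it over and over, and checks that each copy still hashes to the same value as the original.

```python
import hashlib
import os


def make_pattern(size_bytes: int) -> bytes:
    """Build a buffer of random bytes to serve as the reference data."""
    return os.urandom(size_bytes)


def count_copy_mismatches(size_bytes: int = 64 * 1024 * 1024, rounds: int = 8) -> int:
    """Copy a large buffer repeatedly and count copies whose hash differs.

    The default size and round count are arbitrary assumptions; a real
    stress test would run far longer and touch far more memory.
    """
    reference = make_pattern(size_bytes)
    expected = hashlib.sha256(reference).hexdigest()

    mismatches = 0
    current = reference
    for _ in range(rounds):
        # bytearray(...) allocates fresh memory and copies every byte into it.
        copy = bytearray(current)
        if hashlib.sha256(copy).hexdigest() != expected:
            mismatches += 1
        # Copy from the latest copy next time, so an error would persist.
        current = copy
    return mismatches


if __name__ == "__main__":
    print("mismatched copies:", count_copy_mismatches())
```

On healthy hardware this should report zero mismatches every time. The point is only that the more data you push through memory, and the faster you do it, the more chances a flaky part has to flip a bit.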

Some would argue that we should make Photoshop tolerant of bad hardware. On this I have to disagree. We’re talking about penalizing all users because some people bought dicey systems or cheap RAM. For those who get bad hardware, it sucks – but the right party to take that up with is the hardware manufacturer (yes, I know such things generally fall on deaf ears). But until they get some pushback, they’re going to continue to put out flaky stuff, crammed with shovelware, that’ll manage to run your MP3 player or browser, but gives up the ghost when trying to capture video or touch up your pictures. Or, let’s push the PC magazines to put together some real stress tests, and rate the hardware vendors on long-term stability – knowing what machine is fastest at Quake 4 is useless if it reboots after 9 hours of heavy use because of a thermal issue. I’d say start buying Macs, but things don’t seem to be too much better there, either. I think the hardware is (generally) in better shape, but I think the OS could use a bit more bug fixing.

Until then, just realize that marginal hardware can affect software, especially those programs which try to get the best performance out of your machine. Users shouldn’t have to become hardware diagnosticians just to remove red eye from their kids’ pictures, but that’s where we’re at.

Sucks but true.