February 11, 2009

What to release?

Out of the growing list of tests written: what do I pick to release?

Mostly, I pick tests that show the biggest problems. That could be huge problems on one compiler, or significant problems on many compilers. Some tests are intended to provide the developer with information instead of showing problems – those need to be released, but aren’t as urgent. I also have some tests that don’t show problems on any compilers tested – they’re the lowest priority, but will eventually be released as well.

Why only release a few new tests at a time? Because it takes time to clean up the code, time to review the code, time to verify the results, etc. Because people downloading the benchmarks are more likely to read 3 or 4 new files at a time than 50. Also, because compiler vendors are more likely to dig into the details of a few new bugs at a time than when they’re hit with hundreds of new bugs to track down. At least, that’s the theory.

I test the benchmark code (over 90 files right now) on as many compilers and platforms as I have available. Then I create a quick summary of each test’s results across those compilers (sort of a “good, bad or ugly” ranking). From that I select a few candidates for further investigation. I need to verify that the files are testing what I think they are testing, and that the problems shown are real. Now I may need to fix problems, or just clean up the source code. Sometimes I need to profile the code and figure out why unexpected results are occurring. Next comes peer review of the code, and more cleanup. Sometimes reviewers point out redundancies (stuff that could easily be removed), or new things that need to be added to complete the picture of a given test area. And that can lead to more investigation and more review.

Eventually the code settles down and is ready to go out the door. Then I zip it up, post it on the website, and create a blog entry announcing the new release. Then I just hope that someone else is going to read it.

January 22, 2009

What to test?

I have a running list of ideas available at http://stlab.adobe.com/wiki/index.php/Performance/WhatToTest . That list doesn’t include a lot of detail – I’m just trying to document what needs to be done at a high level.

How did I come up with the current list?

First I took a look at existing benchmark code that was meant to be used internally at Adobe. Some of the tests could be abstracted (removing all the Adobe specific bits), and some could not. Some were written for very specific compiler bugs and needed to be expanded. And some needed to be split into multiple, more specific benchmarks before they would be useful.

Many of the optimized versions of our code showed patterns in the techniques used to make them faster. If we had to repeatedly do an optimization ourselves, then that optimization is an opportunity to improve the compiler – and that needs a benchmark written for it.

I also went through my compiler, performance, and language reference books and thought about areas that could cause performance issues. That added a lot of ideas, but many of them are difficult to test accurately. Also, some of them are such well known and well solved problems that they may never cause a performance issue. But I don’t trust compilers – I prefer to verify that they are doing the right thing.

But the most useful way I come up with items to test is by example.

An application recently identified a slowdown when parsing a certain file type, but only on one platform. A quick profile showed that the function isdigit() was about 4 times slower on that platform than on others (relative to the other functions involved), and that it was being called frequently by the file parser. So I wrote a quick benchmark to test isdigit(), and found that the compiler in question shipped a very inefficient implementation of it.

We could have stopped after reporting the bug to the compiler vendor. But shouldn’t other developers know about this? What if related functions are slow and causing problems for other applications? What if the performance regressed on other compilers/platforms or in a later release of this compiler?

So, I expanded the quick benchmark into something more generalized, and added the rest of the common functions from ctype.h. Then I added baseline versions of a few routines for verification and comparison. That’s how I found that isspace() was another order of magnitude slower than isdigit() under that compiler, and that both are slower than the obvious lookup table approach to implementing the ctype functions.
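The "obvious lookup table approach" mentioned above can be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual code; the names are hypothetical, and it assumes plain ASCII and the default "C" locale (a real ctype implementation also has to handle locales and EOF).

```cpp
#include <cassert>
#include <cctype>

// Hypothetical sketch: a 256-entry table replacing isdigit().
// Built once; after that, each classification is a single table load.
struct DigitTable {
    bool entries[256];
    DigitTable() {
        for (int i = 0; i < 256; ++i)
            entries[i] = (i >= '0' && i <= '9');
    }
};

inline bool table_isdigit(unsigned char c) {
    static const DigitTable t;   // constructed on first use
    return t.entries[c];
}
```

The same table can carry bit flags for all of the ctype classes at once (digit, space, alpha, ...), which is how most fast implementations handle the whole family of functions.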

I wouldn’t have thought to look at the ctype functions for performance problems. They’re so old and well used that I didn’t immediately think they could cause application level slowdowns. Hmm, what other common C library functions might not be performing well? What other assumptions are we making about our compilers and support libraries that could be hiding important performance problems? What else should we be testing?

December 09, 2008

Second release

The second release of my benchmark is now available from http://stlab.adobe.com/performance/ .

Sorry for the delay. I meant to post new files every month or two, but Photoshop CS4 kept me busy for a while.

I’ve updated a few things in the existing files, and added 3 new files.

Function Objects
This is a benchmark for instantiation of simple functors, and partly a demonstration of the relative performance of function pointers, functors and inline operators. When a compiler works well, functors and inline operators should perform identically. Of course, there is some room for improvement.
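To make the comparison concrete, here is a minimal sketch of the three forms (not the benchmark's actual code; the names are mine). A function pointer usually forces an indirect call per element, while a functor's operator() is visible to the compiler and can be inlined:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Called through a pointer -- typically an indirect call per comparison.
bool less_ptr(int a, int b) { return a < b; }

// A simple functor -- the comparison is a candidate for inlining.
struct LessFunctor {
    bool operator()(int a, int b) const { return a < b; }
};

void sort_with_pointer(std::vector<int>& v) {
    std::sort(v.begin(), v.end(), less_ptr);
}

void sort_with_functor(std::vector<int>& v) {
    std::sort(v.begin(), v.end(), LessFunctor());
}
```

When the compiler does its job, the functor version (and a sort using the built-in `<` operator) should run at the same speed; the function-pointer version is allowed to be slower unless the compiler can prove the target and inline it.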

Simple Types Constant Folding
Most developers assume that their compiler will do a good job of folding constant math expressions on simple data types. But do developers verify that assumption? One compiler does a decent job of folding the constants, but sometimes issues empty loops after removing constant calculations from the loops. Other compilers simplify some calculations but not other, similar calculations.
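The pattern being tested looks something like this sketch (hypothetical code, assuming the shape of the test rather than quoting it). Everything in the expression is a compile-time constant, so a good compiler should fold it to a single value and then delete the now-empty loop entirely:

```cpp
#include <cassert>

// Should fold to the constant 5 at compile time: 1 + 6 - 2.
inline int folded_expression() {
    return 1 + 2 * 3 - 4 / 2;
}

// The per-iteration work is a constant, so the compiler should
// reduce this to a single multiply -- or at minimum, after pulling
// the constant out, not leave an empty loop spinning.
inline int folded_loop(int iterations) {
    int sum = 0;
    for (int i = 0; i < iterations; ++i)
        sum += 1 + 2 * 3 - 4 / 2;
    return sum;
}
```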

Stepanov Vector
What happens to performance when I replace a pointer with a vector iterator? And what happens if I use reverse iterators? This is a test of the compiler and of the STL implementation shipped with the compiler. It’s really sad to see good compilers brought to their knees by bad STL implementations.
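The comparison can be sketched like this (a simplified illustration, not the benchmark source): the same summation written against a raw pointer, a vector iterator, and a reverse iterator. With a good compiler and a good STL, all three should generate essentially identical inner loops:

```cpp
#include <cassert>
#include <vector>

double sum_pointer(const double* first, const double* last) {
    double s = 0.0;
    for (; first != last; ++first) s += *first;
    return s;
}

double sum_iterator(const std::vector<double>& v) {
    double s = 0.0;
    for (std::vector<double>::const_iterator i = v.begin(); i != v.end(); ++i)
        s += *i;
    return s;
}

double sum_reverse(const std::vector<double>& v) {
    double s = 0.0;
    for (std::vector<double>::const_reverse_iterator i = v.rbegin(); i != v.rend(); ++i)
        s += *i;
    return s;
}
```

A vector iterator is conceptually just a wrapped pointer, and a reverse iterator a wrapped pointer walking backwards; any slowdown comes from the wrapper not being optimized away.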

July 20, 2008

Analysis of abstraction penalty and MSVC

I’ve posted another analysis – this time for the Stepanov abstraction penalty test compiled with MSVC 2008 and 2005. Here we compare two releases of a single brand of compiler to look at the code generation improvements and regressions.

Example Analysis

June 29, 2008

Another Analysis Example

I’ve written up another analysis for loop invariant code motion and LLVM. The good news is that the LLVM team has already fixed some of the problems identified by my code. The bad news is that I discovered that I used the wrong code for one test (loop invariant integer division) and need to add something more suitable in a future release.

Example Analysis

May 29, 2008

Example Analysis

A few people have asked how to analyze the numbers that the benchmark code generates. Unfortunately, I don’t have any automated analysis yet. So I wrote up an example analysis of one test.
I’ll post more examples as I get time.

Example Analysis

May 13, 2008

Release now available

The initial release of my benchmark is available now from http://stlab.adobe.com/performance/.

May 04, 2008

Preparing the initial release

So far, I have written about 50 test files. I’m trying to explore the axes of my test space: what areas need to be tested, how deeply and specifically do they need to be tested, etc. The tests I’ve written cover C++ language concepts, simple idioms, common use idioms, runtime support, specification conformance, and compiler optimizations. Most of the tests have at least one major compiler performing badly. Of course, that is probably because I’m taking the tests from lists of things that I know compilers don’t do so well. On a positive note: all of the compilers I have tested are doing very well on dead code elimination.

I selected 3 of those tests to go out in the initial release — because people are more likely to read, understand, and discuss 20 pages of code than 2000. This will also give me a chance to get some verification on the approach and style, then clean up the remaining test files before sending those out.

Stepanov Abstraction
An expanded version of the original test, answering “what happens to performance when I wrap a value in curly braces?” Almost all compilers do well on the original summation tests, but they don’t do nearly so well on simple sort routines using the same abstractions.
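The kind of abstraction being measured can be sketched as follows (an illustrative reduction of the Stepanov benchmark's idea, with hypothetical names): a double wrapped in a class with forwarding operators, used by the same generic summation that works on the bare type. Ideally, the wrapper compiles to exactly the code the bare double would:

```cpp
#include <cassert>

// A double wrapped in a class -- the abstraction under test.
struct DoubleClass {
    double value;
    DoubleClass(double v = 0.0) : value(v) {}
};

inline DoubleClass operator+(DoubleClass a, DoubleClass b) {
    return DoubleClass(a.value + b.value);
}
inline bool operator<(DoubleClass a, DoubleClass b) {
    return a.value < b.value;
}

// The same accumulation, abstracted over the element type.
template <typename T>
T sum(const T* first, const T* last, T result) {
    while (first != last)
        result = result + *first++;
    return result;
}
```

The "abstraction penalty" is the ratio between the wrapped and unwrapped timings; a perfect compiler scores 1.0.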

Loop Invariant Code Motion
A test to see if the compiler will move loop invariant calculations out of the loop. This is something that a lot of developers assume that the compiler does for them. Unfortunately, the compilers I tested have a lot of room for improvement in this area.
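The pattern looks roughly like this (a hypothetical sketch of the kind of code tested, not the benchmark source). The product `x * y` never changes inside the loop, so the compiler should hoist it out, replacing a multiply per iteration with one multiply before the loop:

```cpp
#include <cassert>

// As written: x * y is recomputed every iteration unless hoisted.
int sum_with_invariant(const int* data, int count, int x, int y) {
    int result = 0;
    for (int i = 0; i < count; ++i)
        result += data[i] * (x * y);   // x * y is loop invariant
    return result;
}

// What the optimizer should effectively produce:
int sum_hoisted(const int* data, int count, int x, int y) {
    const int xy = x * y;              // computed once, outside the loop
    int result = 0;
    for (int i = 0; i < count; ++i)
        result += data[i] * xy;
    return result;
}
```

The benchmark compares the two forms: if the compiler does the motion, they run at the same speed; if not, the first form pays for the redundant work on every iteration.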

Loop Unrolling
This is almost a straightforward test to see if compilers will correctly unroll loops to hide instruction latency. “Almost” because if I hand unrolled the loops it would be several hundred pages of source (I did it, it’s big). So, I used templates to do the unrolling — and found that some compilers have problems with such templates (which is yet another performance bug). Every compiler I’ve tested has a long way to go on correctly unrolling loops.
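The template trick can be sketched like this (a simplified, hypothetical version of the approach, not the benchmark source): a recursive template expands to N explicit additions, just as hand-unrolled source would, without writing out hundreds of pages by hand:

```cpp
#include <cassert>

// Recursive template: Unroll<N>::sum expands to N explicit adds.
template <int N>
struct Unroll {
    static double sum(const double* p) {
        return p[0] + Unroll<N - 1>::sum(p + 1);
    }
};

template <>
struct Unroll<0> {
    static double sum(const double*) { return 0.0; }
};

// Sum a buffer in unrolled chunks of 8, then finish the remainder.
double sum_unrolled(const double* p, int count) {
    double s = 0.0;
    int i = 0;
    for (; i + 8 <= count; i += 8)
        s += Unroll<8>::sum(p + i);    // 8 adds, no per-element loop test
    for (; i < count; ++i)
        s += p[i];                     // leftover elements
    return s;
}
```

A compiler that handles this well flattens the recursion completely; the performance bug mentioned above is when the template machinery itself defeats the optimizer.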

Machine Info
Plus one utility program to print out information about the compiler version, OS, and machine environment – because it’s nice to know which build of your compiler generated a particular report, and which of the 30 machines in your lab it was run on.

A few weeks ago, I sent the initial release out to select compiler vendors for review. I received responses from 3 of the major compiler vendors, and 2 compiler teams have already found and fixed a couple of bugs based on my code (Yea!).

My next step is to set up a web site for downloads and send the initial release out to a larger audience of compiler writers.

March 16, 2008

Some Background

I joined Adobe to work on Photoshop in 1996. Since that time, I’ve been working on Photoshop performance: making sure we test it correctly, finding the problem spots in the code, tuning the code for optimal performance, researching new techniques, testing new compilers, understanding new processors, teaching my coworkers how to test and tune for performance, etc.

Over the years, I’ve collected a lot of code showing particular problems we’ve found, and solutions we have found. But that code has only been shared inside Adobe or with compiler vendors with whom Adobe has a Non Disclosure Agreement (NDA).

Several months back, I was writing up some notes on the concept of abstraction in computer science and revisited Alex Stepanov’s abstraction penalty benchmark. Alex wrote this benchmark at a time (circa 1994) when C++ compilers weren’t doing a great job of optimizing even basic C++ abstraction. Within a few years of the benchmark’s release, most compiler vendors had identified and fixed their performance problems that the benchmark exposed.

I wondered if the compiler writers had really done the job well or taken shortcuts, and what the penalties would be for using more complex C++ abstraction. So, I asked Alex (who happens to work just down the hall from me) if anyone had updated his benchmark or was doing active research on it and related penalties. Alex said “No, I don’t know of anyone working on that. Why don’t you take it over?”.

Alex then proceeded to convince me that we really need better benchmarks – ones that don’t try to sum up the whole world in one number, but tests that probe specific areas and patterns. They should answer questions such as: “What is the penalty for using X?” or “Does my compiler perform optimization Y?” Alex argued that such a set of benchmarks, if released as open source (without all the NDA hassles), would benefit all of Adobe’s applications by improving the compilers, and benefit all C++ users the same way. He said that it was almost the same thing I had been doing with our internal code, but with wider exposure, and hopefully a few more people contributing. Of course, then we had to convince my manager — but the idea was good (and backed by several senior researchers who had joined the discussion), so she agreed that I could spend part of my time creating benchmarks.

Now I’ve got a blank slate, a lot of historical code that can’t go out as-is, a long list of complaints about compilers, a longer list of suspicions about compilers, and a lot of claims and complaints about C++ and compilers that I’ve heard from other people (many of which I know not to be true).

Where should I start?

Copyright © 2012 Adobe Systems Incorporated. All rights reserved.