In this post I would like to give a short overview of WebKit’s performance and memory testing framework. Along with a bunch of WebKit geeks, I have been involved in its development for a while, mostly by contributing to the memory-measurement side.
Let’s distinguish between online benchmarks for browser-level performance (e.g. Chromium performance testing) and engine-level performance tests. Engine-level performance tests do not exercise the browser or the underlying platform’s graphical performance; they target a specific part of a web-engine component (e.g. the parse time of the HTML parser or the runtime of the layout system for floating containers). Browser-level performance tests usually do more complex things and therefore exercise more components of the web-engine at once. The goal of these tests is usually to reproduce some kind of real browsing behavior in order to measure the browser’s response time. This post is only about engine-level testing.
Engine-level performance tests have been in WebKit trunk for more than a year now. We have improved the system a lot over the last year (e.g. we added support for memory measurements in WebKit Bug #78984), so I think it is time to blog about it to a wider audience. Although work on the system is still in progress, it is already capable both of demonstrating improvements and of catching performance and memory regressions.
In the first part of this entry, I want to give you a short introduction to the system and a short description of how you can use the testing infrastructure. In the second part I intend to talk about the continuous performance measuring system and its online visualizer, the performance-test website. If you follow along carefully, I’m sure you’ll pick up some information about our super-exciting future plans!
What is a performance test in WebKit?
Our performance tests measure the run-time performance and the memory usage of WebKit. In every test case we start the measurement before the concrete test (the function, the animation, the page loading) starts and stop it when it finishes. Although measuring performance and memory sounds pretty straightforward, it can be deceptively difficult. For example, what do we mean by run-time performance? The right answer depends on the test itself. The animation performance tests produce frames-per-second values (fps). The tests that measure the runtime of different JS functions (DOM manipulation, page loading, etc.) can produce either milliseconds (ms) or runs per second (runs/s). If we really want to understand the meaning of a specific set of performance measurements, we need to look deeper at the actual test cases, but that’s not the goal of this post. C’mon, this is just an introduction!
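Since the units can be confusing at first, here is a tiny sketch of how they relate. This is plain Python with made-up numbers, not WebKit code; it only illustrates the arithmetic behind the reported values.

```python
# A test that completes one iteration in 1.5 ms is equivalently
# running at 1000 / 1.5 iterations per second.
ms_per_run = 1.5                           # made-up example value
runs_per_second = 1000.0 / ms_per_run

# Likewise, an animation that renders a frame every 16.7 ms
# achieves 1000 / 16.7 frames per second.
ms_per_frame = 16.7                        # made-up example value
frames_per_second = 1000.0 / ms_per_frame
```

So a lower ms value and a higher runs/s (or fps) value describe the same improvement, just from opposite directions.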
How to run performance tests in WebKit?
It’s time to do some actual experiments! We have a script for running performance tests; it is located in the trunk/Tools/Scripts directory and is called run-perf-tests. I assume you have a WebKit build (if you don’t, there is documentation on how to set one up), so to run all performance tests (located under trunk/PerformanceTests) you only need to run that script (different ports require additional parameters; for the details, check out the --platform parameter). Because running all the tests can take a long time, you can restrict the set of tests run by passing some directories or a list of tests as parameters to run-perf-tests.
The script produces pretty straightforward output:
Running Bindings/set-attribute.html (18 of 115)
DESCRIPTION: This benchmark covers 'setAttribute' in Dromaeo/dom-attr.html and other DOM methods that return an undefined.
RESULT Bindings: set-attribute= 670.696533834 runs/s
median= 670.967741935 runs/s, stdev= 4.72565174943 runs/s, min= 663.265306122 runs/s, max= 677.966101695 runs/s
RESULT Bindings: set-attribute: JSHeap= 57804.8 bytes
median= 57832.0 bytes, stdev= 1283.97383151 bytes, min= 54112.0 bytes, max= 59296.0 bytes
RESULT Bindings: set-attribute: Malloc= 1574148.0 bytes
median= 1572772.0 bytes, stdev= 3346.56713349 bytes, min= 1568992.0 bytes, max= 1584504.0 bytes
Finished: 16.537399 s
The output contains everything we are interested in: test names, descriptions (where they exist), and the performance and memory results. By default, after a warm-up run, we run each test 20 times (in DumpRenderTree / WebKitTestRunner) to produce a stable result.
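The RESULT lines follow a simple "name= value unit" pattern, so they are easy to process with a script. Here is a minimal Python sketch that pulls the statistics out of the measurement text shown above; the regular expression is my own simplification, not part of the WebKit tooling.

```python
import re

# A summary assembled from the console output above (runs/s result plus
# its statistics, joined into one string for convenience).
line = ("RESULT Bindings: set-attribute= 670.696533834 runs/s "
        "median= 670.967741935 runs/s, stdev= 4.72565174943 runs/s, "
        "min= 663.265306122 runs/s, max= 677.966101695 runs/s")

# Grab every "name= value" pair; names may contain hyphens.
pairs = re.findall(r"([\w-]+)= ([\d.]+)", line)
values = {name: float(value) for name, value in pairs}
```

With the numbers in a dictionary, sanity checks (e.g. that the median lies between min and max) or custom comparisons become one-liners.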
After the script has finished testing, in addition to the console output we get a nice HTML-based table with all of our results. The results in the table show the average value and its deviation for each test. If a result is not stable enough, the table flags it with a warning sign as well.
It’s possible to switch between the Time and the Memory view and to reorder the results. If we are interested in a specific test, we can select it to see more details, as in the screenshot below.
How to compare performance results?
Let’s walk through a simple workflow: we have a clean WebKit repository with a build. We run all the performance tests and want to compare the results with our modified WebKit. Since all performance results are stored by default in the WebKitBuild/Release/PerformanceTestsResults.json file, comparing results is very simple: you just apply your patch to the repository, rebuild WebKit, and run the performance tests again. After the rerun, the results HTML page contains a new column and shows all the new and old results in the same table. You can repeat the measurement as many times as you wish; each run adds a new column to your results. Additionally, you can compare different repositories or specify another file for storing the results. You can find additional details by running run-perf-tests --help.
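To illustrate what such a before/after comparison boils down to, here is a minimal Python sketch. The test names and numbers are made up, and the real PerformanceTestsResults.json stores much more detail per test than these simplified dictionaries; this only shows the core percent-change arithmetic.

```python
# Made-up averaged runs/s results for a baseline build and a patched build.
baseline = {"Bindings/set-attribute": 670.7, "Parser/example-test": 412.0}
patched  = {"Bindings/set-attribute": 702.1, "Parser/example-test": 398.5}

def percent_change(old, new):
    """For runs/s, positive means faster and negative means a regression."""
    return (new - old) / old * 100.0

report = {test: percent_change(baseline[test], patched[test])
          for test in baseline}
```

In this made-up data the patch would show roughly a 4.7% improvement on the first test and about a 3.3% regression on the second, which is exactly the kind of delta the results table highlights for you.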
This way you can easily test the effect of your changes. Furthermore, the results table will point out your improvements and regressions in a simple and clear way.
Continuous performance and memory results online
You can find several useful pieces of information in the chart above. We always want to know the name of the test, the port that was tested, the difference from the previous measurement, the SVN revision, and the date of the measurement. All of this information is shown on a nice zoomable chart. Take some time to play with the charts; I’m pretty sure you will find them useful too. We are planning to update this system to a newer version soon, so more useful features are coming!
Looking into the future of performance measurements
After we further stabilize the continuous performance testing system (WebKit Bug #77037) and can identify a well-defined set of tests with low deviation and meaningful results (WebKit Bug #105003), we are planning to add performance-testing Early Warning System support to Bugzilla. The system will automatically report possible performance and memory regressions for uploaded patches in a Bugzilla comment, so it will work just like the build/layout Early Warning System. It will help improve the quality of WebKit development a lot. Sounds cool, doesn’t it?
I hope you enjoyed this little introduction to the WebKit Performance Test system and its online visualizer, and that you will try some of these tools.