[Thanks to Steve Hoeg for doing the hard parts of this post.]
A lot of people are talking about CUDA and the GPU in the context of Premiere Pro. But the talk is almost always about speed, speed, and more speed. Yeah, using CUDA on the GPU to process a lot of effects and such does speed things up (a lot!) in many cases, but that’s not the whole story.
Moving a lot of processing to the GPU can also make things better, not just faster.
A good example is scaling. There are lots of different scaling algorithms, and they each have their pros and cons. Some are better for scaling things up, some are better for scaling things down; some are better for sharp graphics, and some are better for gradual changes in color across an image. The real tradeoff, though, is that the high-quality algorithms are also—in general—the slow algorithms.
However, these higher-quality algorithms are only really slow if you are forced to execute them serially; they are relatively fast when you can run them in parallel. One of the huge advantages of GPU processing is that GPUs are massively parallel, with hundreds of parallel processing units. A lot of pixel operations are very amenable to parallel processing, since you don’t need to know the result of the operation on one pixel to do the same operation on its neighbor in the same image. Scaling is just such an operation. When you move scaling operations to the GPU, you get to take advantage of scaling algorithms that were just plain infeasible on the CPU.
So, scaling using CUDA can be better. And faster. In some tests done here, scaling was more than 40 times faster on the GPU than on the CPU at maximum quality.
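To make the "each pixel is independent" point concrete, here is a minimal sketch in Python. It is not Premiere Pro's code and it is not CUDA; the function names (`bilinear_sample`, `scale_row`, `scale_serial`, `scale_parallel`) are made up for illustration, the image is a plain list of rows of gray values, and a CPU thread pool stands in for the GPU's many processing units. The only thing it is meant to show is that every output pixel reads the source image and nothing else, so the rows can be computed in any order, or all at once, and the result is the same.

```python
from concurrent.futures import ThreadPoolExecutor


def bilinear_sample(src, x, y):
    """Bilinearly interpolate a grayscale image (list of rows) at (x, y)."""
    h, w = len(src), len(src[0])
    x0, y0 = int(x), int(y)
    # Clamp the right/bottom neighbors so we never read past the image edge.
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = src[y0][x0] * (1 - fx) + src[y0][x1] * fx
    bottom = src[y1][x0] * (1 - fx) + src[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def scale_row(src, out_w, out_h, row):
    """Compute one output row. It reads only src, never other output pixels."""
    h, w = len(src), len(src[0])
    # Map output corners onto source corners (one of several possible mappings).
    sx = (w - 1) / (out_w - 1) if out_w > 1 else 0.0
    sy = (h - 1) / (out_h - 1) if out_h > 1 else 0.0
    return [bilinear_sample(src, col * sx, row * sy) for col in range(out_w)]


def scale_serial(src, out_w, out_h):
    """One row after another, the way a single core would do it."""
    return [scale_row(src, out_w, out_h, r) for r in range(out_h)]


def scale_parallel(src, out_w, out_h, workers=4):
    """Same rows handed to a pool of workers (a stand-in for GPU threads)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda r: scale_row(src, out_w, out_h, r),
                             range(out_h)))
```

Scaling the 2x2 image `[[0, 10], [20, 30]]` up to 3x3 gives identical results from `scale_serial` and `scale_parallel`. On a GPU the same idea is taken much further, with a thread per output pixel rather than a worker per row.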
When Premiere Pro is just using the CPU for the processing of scaling operations, it uses the following scaling methods:
- playback: bilinear
- paused: Gaussian low-pass sampled with bilinear
- high-quality export (Maximum Render Quality off): Gaussian low-pass sampled with bilinear
- Maximum Render Quality export: variable-radius bicubic
The variable-radius bicubic scaling done on the CPU is very similar to the standard bicubic mode in Photoshop, though the Premiere Pro version is multi-threaded and optimized with some SSE instructions. Even with these optimizations, it is still extremely slow. For high-quality scaling at faster-than-real-time processing, you need to use a CUDA card.
When Premiere Pro is using CUDA on the GPU to accelerate the processing of scaling operations, it uses the following scaling methods:
- playback: bilinear
- paused: Lanczos 2 low-pass sampled with bicubic
- export: Lanczos 2 low-pass sampled with bicubic
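For readers who haven't met these filter names before, here is a rough Python sketch of the textbook Lanczos kernel with a radius of 2, next to the triangle kernel that underlies bilinear interpolation. This is only the standard mathematical definition; exactly how Premiere Pro combines the Lanczos low-pass step with bicubic sampling isn't spelled out in this post, so don't read this as the actual implementation. The helper `tap_weights` is a hypothetical name used only for this example.

```python
import math


def lanczos(x, a=2):
    """Textbook Lanczos kernel: sinc(x) * sinc(x / a) for |x| < a, else 0."""
    if x == 0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)


def triangle(x):
    """Kernel behind bilinear interpolation: a simple tent of radius 1."""
    return max(0.0, 1.0 - abs(x))


def tap_weights(kernel, offset, radius):
    """Normalized weights for the source pixels around a fractional position.

    offset is the sample position relative to source pixel 0 (0 <= offset < 1);
    the taps are the 2 * radius nearest source pixels.
    """
    taps = range(-radius + 1, radius + 1)
    weights = [kernel(t - offset) for t in taps]
    total = sum(weights)
    return [w / total for w in weights]
```

Halfway between two source pixels, `tap_weights(triangle, 0.5, 1)` uses two pixels with equal weight, while `tap_weights(lanczos, 0.5, 2)` uses four pixels and gives the outer two a small negative weight. Those negative lobes are what help keep edges crisp, and the wider footprint is part of why this kind of filter costs more per pixel.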
For export, scaling with CUDA is always at maximum quality, regardless of quality settings. (This only applies to scaling done on the GPU.) Maximum Render Quality can still make a difference with CUDA-accelerated exports for any parts of the render that are processed on the CPU. Over time, we are working on reducing the list of exceptions to what can be processed on the GPU. For an example of a limitation that can cause some rendering to fall back to the CPU, see this article: “Maximum dimensions in Premiere Pro CS5”.
When rendering is done on the CPU with Maximum Render Quality enabled, processing is done in a linear color space (i.e., gamma = 1.0) at 32 bits per channel (bpc), which gives more realistic output, finer gradations in color, and better results for midtones. CUDA-accelerated processing is always performed in a 32-bpc linear color space. To have results match between CPU rendering and GPU rendering, enable Maximum Render Quality.
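Here is a small Python sketch of why the color space matters when pixels get mixed, which is what scaling and blending do. It assumes a simple power-law gamma of 2.2 purely for illustration; the actual transfer functions Premiere Pro uses depend on the footage and aren't covered here, and the function names are hypothetical.

```python
GAMMA = 2.2  # assumed display gamma, for illustration only


def to_linear(v):
    """Convert a gamma-encoded value in [0, 1] to linear light."""
    return v ** GAMMA


def to_encoded(v):
    """Convert a linear-light value in [0, 1] back to gamma-encoded."""
    return v ** (1.0 / GAMMA)


def blend_encoded(a, b):
    """Average two pixel values directly, ignoring gamma."""
    return (a + b) / 2.0


def blend_linear(a, b):
    """Average two pixel values in linear light, then re-encode."""
    return to_encoded((to_linear(a) + to_linear(b)) / 2.0)
```

Averaging a black pixel (0.0) and a white pixel (1.0) gives 0.5 with `blend_encoded` but roughly 0.73 with `blend_linear`. The second is closer to what mixing those two amounts of light really looks like, which is why midtones come out noticeably different when processing is done at gamma = 1.0.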
Note: There are two places to enable or disable Maximum Render Quality: in the sequence settings and in the export settings. The sequence setting only applies to preview renders; the export setting (which defaults to the sequence setting) overrides the sequence setting.
One final note, as long as I have your attention:
I’ve noticed a lot of people—the vast majority, really—using the term ‘Mercury’ or ‘Mercury playback engine’ as if it refers specifically to CUDA processing. Not true. The term ‘Mercury playback engine’ refers to a whole set of performance improvements in Premiere Pro, including the port to a 64-bit application, the multi-threaded nature of the application, and the use of CUDA on the GPU to accelerate some things. Anyone using Premiere Pro CS5 or later is getting all but one of these advantages; people with certified CUDA cards are getting one additional advantage.