Posts in Category "Software Engineering"

September 4, 2011

Speaking once again on Parallelism and Computer Science Education at the Intel Developer Forum

As a hiring manager building teams working on modern computer software; I’ve often been disappointed in the lack of a proper foundation in parallel algorithms and architectures being taught in current Computer Science curricula. To that end, I’ve been working with a group called the Educational Alliance for a Parallel Future that aims to improve Computer Science curricula in this critical area. The EAPF is once again convening a panel of educators and industry representatives to talk about this important issue and once again I am delighted to participate.

The panel is entitled: Parallel Education Status Check – Which Programming Approaches Make the Cut for Parallelism in Undergraduate Education? Unlike previous iterations of this panel where we spoke in generalities, this time we’ll be diving a bit deeper into specific technologies that we think are good starting places for educators to introduce to their students.

Here is an excerpt of the abstract:
The industry and research communities face increasing workforce preparedness challenges in parallel (and distributed) computing, due to today’s ubiquitous multi-/many-core and cloud computing. Underlying the excitement over technical details of the newest platforms is one of the thorniest questions facing educators and practitioners — What languages, libraries, or programming models are best suited to make use of current and future innovations? This panel will confront this conundrum directly through discussions with technical managers and academics from different perspectives. The session is convened by the Educational Alliance for a Parallel Future (EAPF), an organization with wide-ranging industry/academia/research membership, including Intel, ACM, AMD, and other prominent technology corporations.

The panel will be presented on September 15th, 2011 at 10:15am as part of the Intel Developer Forum 2011 at the Moscone Center in San Francisco, California. There are free passes for interested educators. Register now for a free IDF day pass using promo code DCPACN1.

My specific take has always been that I am not as interested in grounding in a specific parallelism library or abstraction. The pace of change in this area has only increased over the last few years with the rise of multi-core, GPGPU, HPC and heterogenous computing. Techniques and libraries have arisen, gained adoption, and fallen out of favor one after another.

A developer who only understands how algorithms can be mapped to OpenMP-style libraries is not as useful once the team moves to Grand Central Dispatch or OpenCL. A grounding in traditional task-level parallelism as well as data-parallelism techniques is a starting point. It is important not only to understand what each of them are but the different types of problems that they are each applicable to.

Higher level abstractions like OpenMP are good for introductory courses. However, it is important to understand fully how high-level abstractions map to lower level implementations and even the hardware itself. Understanding the hardware your software runs on is critical to find the best performance for your code. It is also critical to understanding why one particular higher level library might work better than another for a particular task on specific hardware.

Once you understand things like hyperthreading, pThreads, locking mechanisms, and why OpenCL or CUDA maps really well to specific problems, but not to others, then you can return to using higher level abstractions that let you focus on your algorithm and not the details.

If I was a Dean of Computer Science somewhere, I’d look to creating a curriculum where parallel programming using higher-level abstractions was part of the introductory courses using something like C++11, OpenMP or TBB. Mid-level requirements would include some computer architecture instruction. Specifically, how computer architecture maps to the software that runs on top of it. This may also include some lower level instruction in things like pThreads, Race conditions, lock-free programming or even GPU or heterogenous programming techniques using OpenCL. In later courses focused more on software engineering, specific areas like graphics, or larger projects: I’d encourage the students to use whichever tools they found most appropriate to the tasks at hand. This might even include very high level proprietary abstractions like DirectCompute or C++AMP as long as the students could make the tradeoffs intelligently because of their understanding of the area from previous courses.

Given that the panel consists of representatives from Intel, AMD, Microsoft, Georgia Tech as well as myself, I’m expecting this to be a very spirited conversation. I hope to see you there.

More information:
Paul Steinberg’s blog post about the panel
Ben Gaster’s post 

8:56 AM Permalink
October 8, 2010

My talk from the 2010 GTC Conference

Go here to see the slides on

9:32 PM Permalink
September 28, 2010

Recordings from the Intel Developer Forum

Here is the recording from the panel I was on: Parallelism and Education: Navigating through a Sea of Cores with Dr. Daniel Ernst (University of Winsconsin-Eau Claire), Dr. Ryan Newton (Intel), Dr. Mathew Wold (Intel), Dr. Michael McCool (Intel), and Thomas Murphy (Contra Costa College)

After the Panel session, I was on an episode of Teach Parallel on Intel Software Network TV with Dr. McCool, Professor Murphy and Paul Steinberg.

There were a lot of other good panels on educating developers for the current parallel architectures that are archived on the Intel site.

10:21 PM Permalink
September 26, 2008

CPU, GPU, multi-core

There has been a lot of confusion around Pixel Bender and GPUs in CS4. Admittedly, some of it was caused by me :). I wanted to do a clarifying post about GPU, Pixel Bender, and multi-core and how apps in CS4 do different things.

One thing I wanted to correct is the assumption that GPU = FASTER. I’ve seen this misconception a lot, and I think it is confusing some people. The chips on graphics cards (GPUs) are extremely efficient processors capable of doing lots of math in parallel and have the benefits of fast local memory with a super fast connection to the processor. This makes them ideal for the kinds of things that Pixel Bender does. However, this super-efficient processor is connected to the main computer processor by a not-so-fast connection, the bus. Moving data on and off of the GPU is expensive relative to doing things on the GPU directly. What this means is that if you want to do something on the CPU with some data then do something on the GPU and then use the output of the GPU on the CPU again, it might be more expensive than having just done the whole thing on the CPU in the first place. The overhead of the bus transfers can overwhelm the benefits of the fast GPU computation. The busses are getting faster, and when things will work better in one place vs. another is very different from machine to machine. There are a ton of other details I’m glossing over. I’m just trying to make a central point here: that the GPU is not always faster than the CPU.

Pixel Bender is designed to run very efficiently on the GPU, but that design also allows it to execute extremely efficiently on a multi-core CPU. In Flash Player 10, Pixel Bender does not run on the GPU, it does run multi-threaded and executes really fast, especially on multi-core and multi-processor chips (see Tinic’s post for more info). The Flash team really has done an outstanding job with their JITter and their multi-threading and Pixel Bender runs pretty darn fast on every machine I’ve tried (from a lowly single core based laptop to an 8-core Penryn MacPro).

In After Effects CS4, all the OpenGL effects including the new Cartoon Effect, Turbulent Noise, and Bilateral Blur effects are written in Pixel Bender and can run on the GPU or CPU. When don’t they run on the GPU? When you have a non-GPU effect following them in the effects chain on the layer. In those cases, it isn’t clear if you would have a performance gain by running on the GPU. Cartoon is the exception. The algorithm is complex enough that AE assumes it is always faster on the GPU. All 3rd party Pixel Bender filters run multi-threaded on the CPU. This was an architectural decision.

In the Photoshop plug-in, Pixel Bender filters always run on the GPU if you have a graphics card that is supported by CS4. In other cases, the filters run multi-core. The new canvas rotate-pan-and-zoom and the gigantor image support are all done using the GPU. John Nack has lots of details on his blog. One thing I wanted to correct about Photoshop CS4: it is not using CUDA. Not sure how this rumour got out there, but it isn’t true. Not that we aren’t fans of CUDA, we just aren’t shipping anything that uses it in CS4.

There are other apps in CS4 with GPU support, but I wanted to keep this post to the ones that support Pixel Bender, just to clear up the confusion.

5:40 PM Permalink
November 22, 2005

String encodings

I thought I was pretty up on localization. I understand locales and code pages. I know the difference between Unicode and ASCII. My strings are stored as resources. My UIs are dynamic enough to resize themselves to fit various languages. I have released several products for multiple world markets… These were all on the Windows Platform, however.

When I came to Adobe and started doing cross platform development, it really woke me up. This kick to the head came when I started having to deal with string encodings.

If you know windows development, you will know MBCS and Unicode (UCS-2). Most windows developers I’ve met are unaware of any other string encodings. They have the rule in their heads that you use CHAR * for English-only apps and WCHAR * for international ones. However, there are several standard string encodings as well (MBCS is not a standard, it is a Microsoft-only thing that can really cause problems if you are doing xPlatform dev work). UTF-8 is the best known since XML and HTML documents are encoded as UTF-8 by default. UTF-8, like MBCS, can represent a single character in multiple bytes (which is why you need to be very careful if you think a CHAR represents a character and can be treated like such). We tend to use UTF-8 encodings for our strings since it is the closest thing to a cross-platform string encoding OS support-wise.

Microsoft Unicode (UCS-2) represents each character as two bytes and only two bytes, therefore you can do all the nice character comparison tricks you want to. Java uses UCS-2 as its standard encoding.

Unfortunately, UCS-2 cannot represent all the characters that are used in the world. There are some characters in Chinese and other languages that cannot be encoded with the fixed-byte system of UCS-2. UCS-2 is a variant and subset of UTF-16. In UTF-16 character is encoded by a minimum of two bytes, but a single characters encoding can stretch to 4 bytes or more (although not really more since Unicode is currently capped out at 21 bits).

Some developers I work with are now using UTF-32 encodings for simplicity, but someday we may end up with UTF-64 or even UTF-128 (maybe after the aliens conquer us or every atom in the universe will have its own character).

There is a pretty good Unicode Tutorial over at jBrowse which I wished I’d found when I was first learning about this stuff.

[Updated this blog to reflect a discussion with my co-worker Bob, thanks Bob!]

3:49 PM Permalink