Archive for May, 2007

Welcome To The Jungle

I have been programming multimedia-type stuff on Linux since 1999. I have long observed that there are many, many, many methods of programming audio. By “programming audio”, I generally refer to the act of programmatically taking an array of numbers that represent an audio wave’s discrete amplitude levels over time, and shoving that array out to a digital-to-analog converter (DAC). This is how a modern computer makes noise, cooling systems notwithstanding.

There are 2 primary methods of sending audio data to a DAC under Linux: OSS and ALSA. OSS came first; ALSA supplanted OSS. Despite this, and as stated above, there are numerous different ways to do the DAC send. There are libraries and frameworks that live at a higher level than OSS and ALSA. In the end, they all just send the data out through OSS or ALSA.

The zaniest part is that some of these higher level libraries can call each other, sometimes in a circular manner. Library A supports sending audio through both OSS and ALSA, and library B does the same. But then library A also has a wrapper to send audio through library B and vice versa. For that matter, OSS and ALSA both have emulation layers for each other. I took the time to map out all of the various libraries that I know of that operate on Linux and are capable of nudging that PCM data out to a DAC:


Click for larger image

graph source, if you’re interested

I love you, Graphviz!

The green cell is the audio data’s final destination. The three colored boxes that live one layer removed from that green box depict the subsystems that actually send data to audio hardware. I admit that this exercise educated me about an output system I had not yet heard of — FFADO, for FireWire-based audio devices. All of the remaining boxes depict the higher layered libraries. I’m pretty sure each has been recommended by blog readers as the Flash audio output system.

Methodology: I downloaded the source code for each library contender; grep’d through the filenames for “alsa” or “oss” based on the assumptions that A) these libraries would all support one or the other, and B) that these libraries tend to be modular and this would lead me to where the output modules congregate; investigated the output modules that lived alongside the OSS and/or ALSA modules.

Why would a higher level library be useful? One oft-cited rationale is to enable playing more than one audio stream out of a device that only supports one audio stream. The library will handle the multi-stream mixing in software and send the final wave out to the device. However, ALSA also handles this.

Another major alleged selling point for many of these libraries is the promise of cross-platform compatibility. I.e., code to one of the libraries and it will make noise on lots of systems, not just Linux. Windows and Mac are often on the list, among many others. However, Flash has longstanding and well-debugged platform-specific code for pushing out that array of audio samples for Windows and Mac, so that’s not exactly a concern for this project.

May 12, 2007 Update: Thanks to all the commenters who have helped me recognize that this web is even more tangled that I had first thought possible. Added libao, added more lines from GStreamer’s box to other boxes; I also know of Phonon but I’m not convinced that it’s designed for the simple purpose that Flash Player would need.