Welcome To The Jungle

I have been programming multimedia-type stuff on Linux since 1999. I have long observed that there are many, many, many methods of programming audio. By “programming audio”, I generally refer to the act of programmatically taking an array of numbers that represent an audio wave’s discrete amplitude levels over time, and shoving that array out to a digital-to-analog converter (DAC). This is how a modern computer makes noise, cooling systems notwithstanding.
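
To make that concrete, here is the kind of array I am talking about: one second of a 440 Hz tone as signed 16-bit samples. This is purely an illustrative sketch (the rate, pitch, format, and helper name are arbitrary choices), but every method discussed below ultimately exists to move a buffer like this to the DAC.

    /* Illustrative sketch: build one second of a 440 Hz tone as signed
     * 16-bit PCM at 44100 Hz, mono.  Compile with -lm for sin(). */
    #include <math.h>
    #include <stdint.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    #define SAMPLE_RATE 44100
    #define TONE_HZ     440.0

    static int16_t samples[SAMPLE_RATE];   /* one second of audio */

    static void fill_sine(void)
    {
        for (int i = 0; i < SAMPLE_RATE; i++) {
            double t = (double)i / SAMPLE_RATE;
            /* scale the -1.0..1.0 sine wave into the 16-bit sample range */
            samples[i] = (int16_t)(32767.0 * sin(2.0 * M_PI * TONE_HZ * t));
        }
    }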

There are two primary methods of sending audio data to a DAC under Linux: OSS and ALSA. OSS came first; ALSA supplanted it. Despite this, and as stated above, there are numerous different ways to perform the DAC send. There are libraries and frameworks that live at a higher level than OSS and ALSA. In the end, they all just send the data out through OSS or ALSA.
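
To give a flavor of the lower of those levels, here is roughly what the old OSS method looks like, feeding a signed 16-bit mono buffer like the one above to /dev/dsp. Consider it a sketch with the error handling trimmed (and an illustrative function name), not production code:

    /* A rough sketch of OSS playback: open /dev/dsp, describe the sample
     * format, then write() the raw samples.  Error checks on the ioctl()s
     * are omitted for brevity. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <sys/soundcard.h>
    #include <unistd.h>

    int play_oss(const int16_t *buf, size_t nsamples)
    {
        int fd = open("/dev/dsp", O_WRONLY);
        if (fd < 0)
            return -1;

        int fmt = AFMT_S16_LE, channels = 1, rate = 44100;
        ioctl(fd, SNDCTL_DSP_SETFMT, &fmt);        /* 16-bit little-endian */
        ioctl(fd, SNDCTL_DSP_CHANNELS, &channels); /* mono */
        ioctl(fd, SNDCTL_DSP_SPEED, &rate);        /* 44100 Hz */

        write(fd, buf, nsamples * sizeof(int16_t)); /* push the PCM out */
        close(fd);
        return 0;
    }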

The zaniest part is that some of these higher-level libraries can call each other, sometimes in a circular manner. Library A supports sending audio through both OSS and ALSA, and library B does the same. But then library A also has a wrapper to send audio through library B, and vice versa. For that matter, OSS and ALSA both have emulation layers for each other. I took the time to map out all of the various libraries I know of that operate on Linux and are capable of nudging that PCM data out to a DAC:

[Figure: linuxaudio.png, a Graphviz-generated map of the Linux audio output landscape; click for a larger image, and the graph source is available if you’re interested.]

I love you, Graphviz!

The green cell is the audio data’s final destination. The three colored boxes one layer removed from that green box depict the subsystems that actually send data to audio hardware. I admit that this exercise educated me about an output system I had not yet heard of: FFADO, for FireWire-based audio devices. All of the remaining boxes depict the higher-level libraries. I’m pretty sure each one has been recommended by blog readers as the audio output system that Flash should use.

Methodology: I downloaded the source code for each library contender; grep’d through the filenames for “alsa” or “oss”, based on the assumptions that A) these libraries would all support one or the other, and B) these libraries tend to be modular, so this would lead me to where the output modules congregate; then I investigated the output modules that lived alongside the OSS and/or ALSA modules.

Why would a higher-level library be useful? One oft-cited rationale is to enable playing more than one audio stream out of a device that only supports a single stream. The library handles the multi-stream mixing in software and sends the final wave out to the device. However, ALSA also handles this on its own, via its dmix plugin.
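
For comparison with the OSS sketch above, here is roughly what the ALSA path looks like; opening the “default” device is what lets dmix (where it is configured) mix this stream with whatever else is playing. This sketch uses alsa-lib’s snd_pcm_set_params() convenience call, which newer versions of the library provide; the long-hand hw_params dance accomplishes the same thing. Again, illustrative only; link with -lasound.

    /* A rough sketch of ALSA playback through the "default" PCM device,
     * which routes through dmix on software-mixed setups. */
    #include <alsa/asoundlib.h>
    #include <stdint.h>

    int play_alsa(const int16_t *buf, snd_pcm_uframes_t nframes)
    {
        snd_pcm_t *pcm;

        if (snd_pcm_open(&pcm, "default", SND_PCM_STREAM_PLAYBACK, 0) < 0)
            return -1;

        /* 16-bit little-endian, interleaved, mono, 44100 Hz,
         * allow resampling, about half a second of buffering */
        if (snd_pcm_set_params(pcm, SND_PCM_FORMAT_S16_LE,
                               SND_PCM_ACCESS_RW_INTERLEAVED,
                               1, 44100, 1, 500000) < 0) {
            snd_pcm_close(pcm);
            return -1;
        }

        snd_pcm_sframes_t written = snd_pcm_writei(pcm, buf, nframes);
        if (written < 0)
            snd_pcm_recover(pcm, written, 0);  /* try to recover from an underrun */

        snd_pcm_drain(pcm);   /* let the buffered samples finish playing */
        snd_pcm_close(pcm);
        return 0;
    }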

Another major alleged selling point for many of these libraries is the promise of cross-platform compatibility. That is, code to one of the libraries and it will make noise on lots of systems, not just Linux. Windows and Mac are often on the list, among many others. However, Flash already has longstanding and well-debugged platform-specific code for pushing out that array of audio samples on Windows and Mac, so that’s not exactly a concern for this project.

May 12, 2007 Update: Thanks to all the commenters who have helped me recognize that this web is even more tangled than I had first thought possible. I added libao and more lines from GStreamer’s box to other boxes. I also know of Phonon, but I’m not convinced that it’s designed for the simple purpose that Flash Player would need.
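
For the curious, libao is a good example of what these wrapper libraries buy you: it picks an available output driver at runtime and accepts the same kind of sample buffer. The sketch below is illustrative only (format values and the function name are arbitrary, error handling is thin); link with -lao.

    /* A rough sketch of libao playback: let libao pick whichever output
     * driver is available, then hand it the samples. */
    #include <ao/ao.h>
    #include <stdint.h>
    #include <string.h>

    int play_libao(int16_t *buf, size_t nsamples)
    {
        ao_initialize();

        ao_sample_format fmt;
        memset(&fmt, 0, sizeof(fmt));
        fmt.bits = 16;
        fmt.channels = 1;
        fmt.rate = 44100;
        fmt.byte_format = AO_FMT_LITTLE;

        /* ao_default_driver_id() is where "use whatever is available" happens */
        ao_device *dev = ao_open_live(ao_default_driver_id(), &fmt, NULL);
        if (!dev) {
            ao_shutdown();
            return -1;
        }

        ao_play(dev, (char *)buf, nsamples * sizeof(int16_t));
        ao_close(dev);
        ao_shutdown();
        return 0;
    }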

34 Responses to Welcome To The Jungle

  1. abe says:

    How about GStreamer? I know you hated it a few years ago, but it’s quite good now; I’m using 0.10 in Ubuntu Feisty Fawn. Lots of non-linear editors use it with gnonlin too. The reason I ask is because I have two sound cards (an onboard and a SoundBlaster Live) and since Flash calls on ALSA directly and doesn’t use GStreamer, it cannot detect which sound card I’ve set my machine to use. See this Launchpad bug: https://bugs.launchpad.net/bugs/109485

  2. Christian says:

    Do you know Phonon? It’s the KDE answer to that jungle. It’s included in the just-released KDE 4 Alpha 1: http://phonon.kde.org/ [ I should have mentioned Phonon. I know of it but I couldn’t find any source code downloads or any documentation on which output libraries it supports. -Mike M. ]

  3. Zbigniew L. says:

    OpenAL can talk directly to hardware. Hardware-accelerated OpenAL is present on the following Linux hardware:
    * Nvidia nForce2 MCP-T (Nvidia proprietary OpenAL 1.0 library in the closed-source nForce Linux driver, full hardware acceleration)
    * Creative SB X-Fi (a hardware-accelerated proprietary OpenAL 1.1+EAX library for Linux will be available starting in late autumn 2007)
    * Creative SB Live!/Audigy (hw-accelerated, open source OpenAL library; uses the ALSA engine and special EMU10K DSP driver capabilities: http://www.lost.org.uk/openal.html)
    No other audio card/chip can do OpenAL in hardware in Linux. [ I looked into it but the OpenAL hardware acceleration appears to be an extension to ALSA. -Mike M. ]

  4. Anonymous says:

    Creative talks about future Linux hw OpenAL+EAX here: http://opensource.creative.com/ The OpenAL library from Nvidia has the file name libnvopenal.a and is present in the older closed-source nForce driver pack NFORCE-Linux-x86-1.0-0310-pkg1.run.

  5. Moe says:

    Wow, I can’t believe how many morons are wandering these vast nets of the intarweb. Yet another round of recommendations is exactly what he was trying to prevent from happening, I suppose… and yet, you’re all here again proclaiming “Yet, framework A works absolutely flawlessly over here (on Ubuntu ShoveItUpYourAss)!” and “Framework B supersedes framework A”. Give that poor soul a break, guys! He’s been through that and more. I can’t believe how stressful it might be to bake a cake that kind of suits everyone in _the whole Linux community_. Moreover, as he stated: ALSA is all you need for basic audio playback (DMIX, anyone?). And that’s all the Flash player needs. No mixing, no distorting, no other frills. And especially no affiliation with the outside world that involves koo-koo pants like you guys. PS: GStreamer guy… have you ever seriously considered configuring ALSA the right way? You might want to look into the magic you can do with asoundrc files (hint: just Google!)

  6. Anon says:

    OSS should be considered deprecated, so that should be dropped from consideration. GStreamer+PulseAudio is pretty bloated for embedded systems, and aRts will be dropped after the KDE 3 generation. Dropping those, the only reasonable ways to me are:
      1. OpenAL-ALSA
      2. OpenAL-SDL-ALSA
      3. GStreamer-ALSA
      4. GStreamer-PulseAudio-*
    ordered from best to worst (* = all possibilities, excluding OSS). PS: Obviously I prefer OpenAL over all other choices… the Allegro and ClanLib routes suck big time on code quality/bloat.

  7. Mike: You can find all the source and information about Phonon at http://www.kde.org. KDE 4.0 Alpha 1 was released yesterday, with packages for several distributions. There is more info at http://phonon.kde.org, and the Phonon project leader, Matthias Kretz (AKA Vir), can be found in #phonon @ freenode. Phonon has several backends: GStreamer, Xine, Network Multimedia Middleware, CoreAudio/CoreVideo (Mac OS X), and it will also have a DirectX backend (for Windows). [ Thanks. I found the Phonon source in the SVN repository. But I’m still not dignifying it on the updated graph because I’m not convinced that it can serve as a simple audio output mechanism. -Mike M. ]

  8. What is all this FUD about? There once was OSS. Linux abandoned it for ALSA. Like you said, there are numerous libraries that overcome this gap in a platform-neutral way. You can statically compile them into the Flash plugin. Pick your choice and quit whining.

  9. gregj says:

    Phonon lives in the KDE source code only, so look for it there.

  10. I believe you’re missing two lines for GStreamer: it can output to ESD and, I believe, even OSS directly. Thanks for the excellent chart! Can you post the source to it? [ Thanks, I missed those. Graph updated and source posted below the thumbnail. -Mike M. ]

  11. Kevin Krammer says:

    The “jungle” is a bit less confusing if one takes the different layers of responsibility into account.

    ALSA and OSS are the driver frameworks, more or less just concerned with getting user-space data into the hardware.

    PulseAudio (I think PortAudio is just an older name of the same thing), ESD, Jack, NAS and to some extent aRts are audio routing. They are not strictly necessary, though they allow greater flexibility in how the user can make up the sound path. Good examples would be sending it through some effect process, or over a network connection.

    The others (and to some extent aRts again) are more or less the audio producers, transforming whatever format the actual data is in into the simple form the lower layers can handle. Since the routing layer is mostly optional, the producer layer usually has options for talking to the drivers directly. Since developers of frameworks on the same layer tend to be interested in some kind of interoperability, they have connections between them, though those are even less likely to be needed than the other type of connections.

    Since Flash itself seems to be able to decode audio data into the basic format, it is more or less on the same layer as GStreamer and friends. It doesn’t have to support a routing-layer method since that is optional, though it might be a nice thing to have for thin-client scenarios or terminal servers.

    As for the recommendations based on cross-platform capability: people do this from an ideal-world point of view, i.e. assuming that the developer/software vendor wants to focus their resources on domain-specific problems rather than solving platform-specific ones. However, this works only for software that does not have a large legacy/internal codebase to build upon, e.g. a totally new product. As you said yourself, Flash already has platform adapters for some platforms, so those have to be maintained anyway, and delegating to a new cross-platform framework would also require quite some work. Still, it might be an option to consider for the “Linux” adapter, because there can be several different Unix variants under the X11 window system, and doing a platform adapter for each combination might be a little too much work. [ About PulseAudio and PortAudio: PulseAudio was previously known as Polypaudio rather than PortAudio, which is a distinct project. -Mike M. ]

  12. Dark Phoenix says:

    > OSS should be considered deprecated, so that should be dropped from consideration.

    OSS is only “deprecated” in Linux; *BSD still uses it to handle audio, and I believe Solaris also uses it to some extent. I know this’ll bring the long list of “who cares about BSD/Solaris” posts, but seriously, most of Flash’s other requirements work on BSD/Solaris just as well as on Linux, and if the proper sound framework were used, then it could work on most of the UNIX-like OSes. In fact, this is generally why libraries like GStreamer and Phonon and ESD are pushed as sound solutions. They abstract the differences in sound systems between the UNIX-like OSes (in fact, I imagine Adobe’s already seen this in coding sound for both Linux and Mac OS X, both of which are UNIX-like but use different sound systems). And in fact, the main reason the various sound systems can be hooked into each other is so it becomes possible to designate one as the main sound system and route all the others through it (I route most of my sound stuff through ESound, including systems using aRts and systems using SDL).

  13. Gravis says:

    I have no problem with the current ALSA output style. Why is it being changed again? Anyway, you may not have considered a nice little library called libao that can detect and use what is available. The one drawback is that you can’t go ultra-nitty-gritty on the output, which shouldn’t be a problem. libao is used in MPlayer. “libao: a cross platform audio library” http://xiph.org/ao/ [ We’re not changing from ALSA. Thanks for the reminder about libao; I had completely forgotten about that one. -Mike M. ]

  14. ALSA has always been able to emulate OSS. The latest version of OSS is able to emulate ALSA.

  15. Zbigniew L. says:

    Say what you want. The truth is: all this sound mess will some day end up as OpenAL. One API for everything. Hardware designers have the choice of using a full software library for onboard codecs or providing transparent hw acceleration like in the Nvidia MCP-T. The same goes for OpenGL for graphics. Everyone is happy:
    - the manufacturer writes one OpenAL/GL/ML library and then recompiles it for Linux/Windows/other OSes
    - the application programmer uses one API for all OSes, no big code rewrite needed
    - the user just installs the OpenAL library or uses the manufacturer’s custom hw-accelerated OpenAL
    - manufacturers can use extensions to provide additional capabilities while keeping the basic caps compatible
    - programmers can use extensions or not
    Everyone is happy and saves money or time. However, the world has to grow up to see this easy solution. The media future of open source OSes is OpenAL/GL/ML for media content. [ But what about legacy distribution installations? They still need support. -Mike M. ]

  16. dacer says:

    Surely there are a lot of experts who can help you with this web. But they need a GPL licence on the code.

  17. Skc says:

    You said nothing about network features. Exporting a display is easy on Linux thanks to the X protocol; but for sound, it’s a nightmare. KDE and Gnome each have their own solutions, but every application, like the Flash player, may cause additional problems.

  18. Vincenzo says:

    I think that PortAudio is the library that meets Flash Player’s requirements. It’s cross-platform (it works with OSS, ALSA, JACK, ASIO, CoreAudio and so on), it’s small, it has a small number of dependencies, you can use it in commercial applications, and it’s mature (see Audacity, Wired and other commercial applications).

  19. jbus says:

    What’s going on with Apollo support on Linux??? Who is the developer in charge of Linux Apollo support? It feels like a situation similar to the whole Linux Flash fiasco brewing all over again with Apollo.

  20. Steve says:

    “However, Flash has longstanding and well-debugged platform-specific code” Is that the code that segfaults, takes down my browser, and hasn’t yet been written for 64-bit? I think there’s a reason that the rest of the world is tending AWAY from platform-specific code.

  21. Hi Mike, considering you’ve waded through the Linux Sound Jungle, perhaps you’d like to take a look at this bug report currently sitting in Ubuntu: https://bugs.launchpad.net/ubuntu/+source/firefox/+bug/104470 Many users, myself included, are having issues where closing a Firefox window with a Flash player running and utilizing sound causes the whole of Firefox to crash, with the only recourse being kill -9 `pidof firefox-bin`. Perhaps this has something to do with our particular sound chips (snd-hda-intel seems to be a culprit), or interactions of the libraries in this jungle? Just giving you the heads up. Andrew

  22. Bill says:

    While all you gents are pontificating over which interface to use and which legacy (what, on Linux?) sound interface to support, users are running for their lives away from Linux. Why? Because it is so fragmented that only a developer can figure out how to use it. Further, such fragmentation makes adding a new application virtually impossible without a major research project. Has anyone ever considered the notion that an operating system is only useful as a standard interface from which applications can be easily installed? Does this fit Linux, audio in particular? Read the discussion above and note the myriad of major audio platforms. Likely none of them are totally satisfactory, because they duplicate and emulate each other yet fail to deliver the functionality that is needed in the first place. They do accomplish the goal of totally bewildering “newbies,” and provide fodder for any number of misdirected guides of assistance to those brave enough to attempt to wade through the morass and actually get a Linux application to do something. All this could be summarized in a quick reflection on KISS, and before one basks in the presence of the next great tweak, please consider the bigger picture and think about whether the poor user has any hope of ever appreciating just a simple, well-functioning operating system.

  23. new says:

    I believe you’re missing two lines for GStreamer: it can output to ESD and, I believe, even OSS directly. Thanks for the excellent chart! Can you post the source to it?

  24. falde says:

    ALSA is probably best for local audio, and I’m currently evaluating PortAudio for remote audio. ESD, OSS, NAS and aRts seem to be all dead. Both Red Hat and Ubuntu are focusing on PortAudio, so my guess is that ALSA+PortAudio is the future. As there is a PortAudio plugin for ALSA, there would be no need to move any app from ALSA to PortAudio. The only reason would be if ALSA is discontinued somewhere in the future, but if that happens the porting can be done then.

  25. Anonymous says:

    If I’m not mistaken, you can compile ESD with ALSA support. May want to update the graph.

  26. nacho says:

    Phonon is a way of abstracting the abstraction. This will never be used outside the KDE world. GStreamer is more than a simple abstraction layer and much more powerful, but since audio production is already implemented in Flash, it also doesn’t make sense to put it on top of GStreamer. What does really make sense is a PulseAudio output for the sound. It should become a standard sound daemon on Linux; PulseAudio provides much more flexibility than just stream mixing. But ALSA used directly is also a good choice. There are a lot of audio libraries, and this provides a lot of flexibility for one’s needs, but it makes sound configuration on Linux a real pain. Abstracting the abstraction is just a bad idea, though.

  27. James says:

    And PulseAudio emulates ESD, OSS and ALSA, and there are plugins to act as a sink for Jack and GStreamer, so the arrows should go both ways. It also has Solaris, Windows and FreeBSD support and can move sound across the network. Not that this will help you in any way, but it makes the diagram more complex 😛

  28. Sitsofe says:

    The jungle grows! I believe that SDL has an ESD backend ( http://packages.debian.org/stable/libs/libsdl1.2debian-esd ). libao will apparently output to NAS ( http://xiph.org/ao/ ). My understanding is that some backends are starting to lose favour with non-legacy distro software (ESD is sort of on the wane but does seem to still have close ties with GNOME, aRts is going to be superseded by Phonon, and on Linux the ALSA folks are trying to push out OSS). The big sticking points seem to be legacy distributions and network sound. Oh, and thanks for switching Linux Flash to ALSA from ESD. It made things much better on the newer systems that were being rolled out (and the latency was vastly improved over ESD!). (Great blog, please keep posting stuff like this.)

  29. Sitsofe says:

    Here’s an Ottawa Linux Symposium PDF paper talking about “Cleaning Up the Linux Desktop Audio Mess”. It only seems to concentrate on one particular “layer” (no mention of OpenAL, Allegro, GStreamer, libao, SDL, NAS, ClanLib or FFADO) and it can feel a bit like a PulseAudio propaganda piece in places. However, it’s worth a skim, and it was interesting to find out that there’s a PulseAudio Adobe Flash plugin.

  30. JonhA says:

    Pffft you think that is a jungle … haven’t seen much of Linux yet … Welcome to the jungle.

  31. Funklord says:

    I’ve longed for better audio support in Linux ever since the terrible _mess_ of OSS, which has made me follow the progression of the different audio libs/servers quite closely.

    OSS: deprecated
    ALSA: direct driver (should always be optional, since this is the only way to talk directly to Linux with a minimum amount of middleware)
    GStreamer: unclear goals
    libao: middleware libs like this and GStreamer are only suitable for blocking I/O, i.e. very high latency; the real benefits are questionable
    OpenAL: supported by some hw manufacturers; I still don’t think it will ever catch on since it has nothing new to offer
    aRts: unpopular/deprecated
    ESD: unpopular/deprecated
    Jack: best latency behaviour, network support, and easy/logical to set up; has a lot of application support
    PulseAudio: nice community, and has a lot of features built in, but is difficult to set up, with cryptic terminology, and its low-latency behaviour is questionable (this may change since it’s relatively new)

    In my opinion it is imperative to also have support for at least one of the low-latency audio servers such as Jack or PulseAudio, since these are the only ones to bridge the gap into environments that many people use. Such as:
    - *BSD/9P/Windows/other users
    - people who need to route their sound outside a single machine
    - sound that may need to be processed due to specialty sound equipment or faults in it (for example, I _need_ to process sound before sending it to a pair of Bluetooth headphones, since there will be a very annoying noise otherwise)
    And no, ALSA is neither designed for nor suitable for any of these tasks.

  32. Sean M. says:

    One thing you might want to add to the jungle is a bunch of arrows coming *out* of ALSA and back in to the other libraries. Although you get diminishing returns and less likelihood of a successful audio path as you increase the number of “hops” your sound has to take (not to mention latency), it IS possible to do something heinous like this, thanks to the advent of ALSA plugins that wax your floor: client app -> ALSA pulse plugin -> pulseaudio daemon -> JACK -> ALSA oss plugin -> NAS -> libaudiooss -> aoss -> ALSA hw. Whew!! Obviously, the sooner you break the chain and cut to the hw, the better it’ll be, but chains like this may be necessary when e.g. you need network transparency and your client application is only written to one of the interfaces.

  33. espipi says:

    And PulseAudio emulates ESD, OSS and ALSA, and there are plugins to act as a sink for Jack and GStreamer, so the arrows should go both ways. It also has Solaris, Windows and FreeBSD support and can move sound across the network. Not that this will help you in any way, but it makes the diagram more complex 😛

  34. Michael TD Nelson says:

    So, I’ve read this discussion, and I have a question: is there any way I can get Flash web content to output audio through JACK? If it’s possible, that would be great!