Windows audio APIs

Posted this in another thread - thought it might be good here too:

Just a little tutorial: Since Vista SP1, Windows has two primary audio interfaces, DirectSound and Wasapi (Windows Audio Session Application Programming Interface). The latter was a replacement for XP's Kernal Streaming mode. DirectSound acts as a program-friendly middle layer between the program and the audio driver, which in turn speaks to the audio hardware. With DS, Windows controls the sample rate, channel layout and other details of the audio stream. Every program using sound passes it's data to DS, which then resamples as required so it can mix audio streams from any program together with system sounds. The advantages are that programs don't need resampling code or other complexities, and any program can play sounds at the same time as others, or the same time as system sounds, because they are all mixed to one format. The disadvantages are that other programs can play at the same time, and that a program's output gets mixed to whatever the system's settings are. This means the program cannnot control the sampling rate, channel count, format, etc. Even more important for this thread is that you cannot pass through encoded formats, as DS will not decode them and it would otherwise bit-mangle them, and there is a loss of sonic quality involved in the mixing and resampling. Partly to allow for cleaner, uncompromised or encoded audio, and for low-latency requirements like mixing and recording, MS re-vamped their Kernal Streaming mode from XP and came up with WASAPI. WASAPI itself has two modes, shared and exclusive. Shared mode is in many ways similar to DS, so I won't cover it here. WASAPI exclusive mode bypasses the mixing/resampling layers of DS, and allows the application to negotiate directly with the audio driver what format it wishes to present the data in. This often involves some back-and-forth depending on the format specified and the device's capabilities. Once a format is agreed upon, the application decides how it will present the data stream. The normal manner is in push mode - a buffer is created which the audio device draws from, and the application pushes as much data in as it can to keep that buffer full. To do this it must constantly monitor the levels in the buffer, with short "sleeps" in between to allow other threads to run. WASAPI, and most modern sound devices, also support a "pull" or "event-driven" mode. In this mode two buffers are used. The application gives the audio driver a call-back address or function, fills one buffer and starts playback, then goes off to do other processing. It can forget about the data stream for a while. Whenever one of the two buffers is empty, the audio driver "calls you back", and gives you the address of the empty buffer. You fill this and go your way again. Between the two buffers there is a ping-pong action: one is in use and draining, the other is full and ready. As soon as the first is emptied the buffers are switched, and you are called upon to fill the empty one. So audio data is being "pulled" from the application by the audio driver, as opposed to "pushed" by the application. WASAPI data is passed-through as-is, which is why you must negotiate capabilities with the audio driver (i.e. it must be compatible with the format you want to send it as there is no DS between to convert it), and why encoded formats like DTS can reach the receiver unchanged for decoding there. Because WASAPI performs no mixing or resampling, it is best used in the exclusive mode, and as a result the application gets the exclusive rights to the audio buffers, to the exclusion of all other sounds or players. WASAPI shared mode does allow this, but that's not a common mode and not what we want for an HTPC. I myself have a dislike of Window's cutesy system sounds happening at 110db

Hope some of you found today's primer of use. Please pick up a scorecard from the desk and drop it in the big round "collection box" on your way out

Cheers, Damian