Windows audio APIs

From Official Kodi Wiki
Revision as of 02:58, 17 December 2021 by RogueScholar (talk | contribs) (Clean up page and make it read like a wiki article with appropriate outlinks for technical terms)

Since the release of Windows Vista Service Pack 1, Windows has had a single primary audio interface: WASAPI (Windows Audio Session API). The audio interfaces used by the Xbox, Windows XP and earlier versions (DirectSound, XAudio2, et al.) have since existed only as emulated session instances atop WASAPI, with no direct communication with the audio drivers, and are formally deprecated despite their continuing widespread use. This consolidation was part of Microsoft's adoption of the Universal Audio Architecture audio standards initiative.

1 DirectSound

DirectSound (DS) acts as a program-friendly middle layer between the program and the audio driver, which in turn speaks to the audio hardware. With DS, Windows controls the sample rate, channel layout and other details of the audio stream. Every program using sound passes its data to DS, which then resamples as required so that it can mix the audio streams from all programs together with system sounds.
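The resample-then-mix behaviour described above can be illustrated with a small simulation. This is only a sketch of the general idea, not DirectSound's actual algorithm: the function names, the linear interpolation, the hard clamping and the 48000 Hz mix rate are all assumptions chosen for illustration.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono float stream with linear interpolation (illustrative only)."""
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    out_len = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(out_len):
        pos = i * src_rate / dst_rate          # position in the source stream
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


def mix(streams):
    """Sum equal-rate streams sample by sample and clamp to [-1.0, 1.0]."""
    length = max(len(s) for s in streams)
    mixed = []
    for i in range(length):
        total = sum(s[i] for s in streams if i < len(s))
        mixed.append(max(-1.0, min(1.0, total)))
    return mixed


MIX_RATE = 48000                    # assumed system output rate
music = [0.5, 0.5, 0.5, 0.5]        # hypothetical program audio at 24000 Hz
system_beep = [0.75] * 8            # hypothetical system sound, already at 48000 Hz

# The program's audio is converted to the system rate, then mixed with the beep.
output = mix([resample_linear(music, 24000, MIX_RATE), system_beep])
```

Neither program chose the output rate, and the combined signal had to be clamped: the program's samples no longer reach the device in the form they were submitted.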

1.1 Pros and cons

The advantages are that programs don't need resampling code or other complexities, and that any program can play sound at the same time as other programs or system sounds, because everything is mixed to a single format. The disadvantages are that other programs can play at the same time, and that every application's output is mixed to whatever the system's audio output settings are, so the program cannot control the sample rate, channel count, format, etc. Even more important where Kodi is concerned, there is no way to pass through encoded formats: DS does not decode them, and its mixing and resampling stages mangle the encoded bits. The mixing and resampling process also adds latency and degrades sound quality.
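Why a mixer "bit-mangles" an encoded stream can be shown with a toy example. The sketch below treats a few bytes of encoded data as if they were 16-bit PCM samples and applies a volume change, which is one of the things a mixing stage may do. The byte values and the function name are arbitrary placeholders, not a real encoded frame or a DirectSound call.

```python
import struct


def apply_gain_pcm16(raw, gain):
    """Interpret raw bytes as little-endian 16-bit PCM and scale every sample."""
    count = len(raw) // 2
    samples = struct.unpack("<%dh" % count, raw[:count * 2])
    scaled = [max(-32768, min(32767, int(s * gain))) for s in samples]
    return struct.pack("<%dh" % count, *scaled)


# Placeholder bytes standing in for part of an encoded (non-PCM) bitstream.
encoded = bytes([0x72, 0xF8, 0x1F, 0x4E, 0x01, 0x00, 0x20, 0x00])

untouched = apply_gain_pcm16(encoded, 1.0)   # bit-exact copy
mangled = apply_gain_pcm16(encoded, 0.5)     # "half volume" rewrites the bits
```

With PCM audio the second result would simply sound quieter. With an encoded stream the bytes are not samples, so any arithmetic on them destroys the data the decoder needs.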


2 WASAPI

Partly to allow for cleaner, uncompromised or encoded audio, and partly to meet low-latency requirements such as mixing and recording, Microsoft reengineered the Kernel Streaming mode from Windows XP. The result was WASAPI, which addresses these issues by offering two modes: shared and exclusive. Shared mode is in many ways similar to DS, while exclusive mode bypasses the mixing and resampling layers and lets the application negotiate directly with the audio driver the format in which it will present the data. This usually involves an initial handshake in which the formats and bit depths the application would like to use are compared with those the device supports. Once a format is agreed upon, the application decides how it will present the data stream.
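The negotiation step can be sketched as walking a list of preferred formats until the device accepts one. The `AudioFormat` type, the `negotiate` function and the format lists below are hypothetical and exist only for this illustration; in real WASAPI code the per-format check is made with `IAudioClient::IsFormatSupported`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioFormat:
    """Hypothetical description of a stream format."""
    sample_rate: int
    bit_depth: int
    channels: int


def negotiate(preferred, device_supported):
    """Return the first preferred format the device accepts, or None."""
    supported = set(device_supported)
    for fmt in preferred:
        if fmt in supported:
            return fmt
    return None


# The application lists what it would like, best first (assumed values).
wanted = [
    AudioFormat(96000, 24, 6),
    AudioFormat(48000, 24, 6),
    AudioFormat(48000, 16, 2),
]
# What the (simulated) device reports it can accept.
device = [AudioFormat(48000, 24, 6), AudioFormat(48000, 16, 2)]

agreed = negotiate(wanted, device)
```

If no format matches, an application has to convert the audio itself or fall back to another output mode, because in exclusive mode nothing sits between it and the driver to do the conversion.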

2.1 Push and pull modes

The more common approach is a "push" relationship, where a buffer is created which the audio device draws from, and the application attempts to fill it with sufficient data to keep that buffer full. To do this, it must constantly monitor the buffer's contents, with short "sleeps" in between to allow other threads to run.
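A push-style loop can be simulated without any audio hardware. In the sketch below, `SimulatedDevice` and `push_playback` are invented names, and the device "plays" a fixed number of frames per tick. It shows only the shape of the loop (check free space, top up, sleep), not actual WASAPI calls.

```python
import time
from collections import deque


class SimulatedDevice:
    """Stand-in for an audio device that drains a fixed-size buffer."""

    def __init__(self, capacity, drain_per_tick):
        self.capacity = capacity
        self.drain_per_tick = drain_per_tick
        self.buffer = deque()
        self.played = []

    def free_frames(self):
        return self.capacity - len(self.buffer)

    def write(self, frames):
        if len(frames) > self.free_frames():
            raise ValueError("buffer overrun")
        self.buffer.extend(frames)

    def tick(self):
        # Stands in for the hardware consuming audio while the app sleeps.
        for _ in range(min(self.drain_per_tick, len(self.buffer))):
            self.played.append(self.buffer.popleft())


def push_playback(device, source, sleep_seconds=0.0):
    """Keep the device buffer topped up until the whole source has been played."""
    pos = 0
    while pos < len(source) or device.buffer:
        free = device.free_frames()
        if free and pos < len(source):
            chunk = source[pos:pos + free]   # never write more than fits
            device.write(chunk)
            pos += len(chunk)
        time.sleep(sleep_seconds)            # short sleep lets other threads run
        device.tick()
    return device.played


played = push_playback(SimulatedDevice(capacity=4, drain_per_tick=2), list(range(10)))
```

The sleep interval is the critical tuning value in this model: too long and the buffer runs dry, too short and the thread wastes time polling.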

WASAPI, and most modern sound devices, also support a "pull" or "event-driven" mode in which two buffers are used. The application gives the audio driver a callback address or function, fills one buffer and starts playback, then goes off to do other processing. It can now forget about the data stream for a while. Whenever one of the two buffers is empty, the audio driver calls the application back with the address of the empty buffer, which the application fills before returning to its other work. Between the two buffers there is a ping-pong action: at any point in time only one is in use and draining, while the other is full or being filled. The handoff between the states is triggered each time a buffer is fully emptied and the application is called back to refill it. This is, in simple terms, the difference between the two relationships: in pull mode the audio data is "pulled" by the audio driver, and in push mode it is "pushed" by the application.
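The ping-pong handoff can likewise be modelled in a few lines. Everything here is a simplified simulation under assumed names (`PingPongDevice`, `make_filler`): both buffers are primed up front, and the "driver" runs in the same thread as the application, whereas a real driver drains one buffer concurrently while the other is refilled.

```python
class PingPongDevice:
    """Simulated driver that drains two buffers alternately."""

    def __init__(self, fill_callback):
        # fill_callback(index) must return the frames for buffer `index`,
        # or an empty list once the source is exhausted.
        self.fill_callback = fill_callback
        self.buffers = [[], []]
        self.played = []

    def run(self):
        for index in (0, 1):                        # prime both buffers
            self.buffers[index] = self.fill_callback(index)
        active = 0
        while self.buffers[active]:
            self.played.extend(self.buffers[active])             # drain the active buffer
            self.buffers[active] = self.fill_callback(active)    # call back for a refill
            active = 1 - active                                   # hand off to the other buffer
        return self.played


def make_filler(source, buffer_size):
    """Build the application-side callback plus a record of which buffer was requested."""
    state = {"pos": 0, "calls": []}

    def fill(index):
        chunk = source[state["pos"]:state["pos"] + buffer_size]
        state["pos"] += len(chunk)
        state["calls"].append(index)
        return chunk

    return fill, state


fill, state = make_filler(list(range(10)), buffer_size=4)
played = PingPongDevice(fill).run()
```

The application code contains no polling loop or sleep: it runs only when the driver asks for data, which is the practical appeal of the pull relationship.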

In exclusive mode, WASAPI passes data through "as-is," which is why the application must negotiate capabilities with the audio driver (i.e. the device must be compatible with the format being sent, as there is no DS mixer in between to convert it), and why encoded formats like DTS can reach the receiver unchanged for decoding there.

WASAPI is best used by Kodi in exclusive mode, where it performs no mixing or resampling: Kodi receives exclusive rights to the audio buffers, and all other sounds and players are muted. WASAPI shared mode is also available, but it is less commonly used and not ideal for an HTPC; many users intensely dislike having Windows system event sounds mixed into the audio stream when it is playing at 110 dB during media playback.

3 See also