ModPlug Central

OpenMPT => Development Corner => Topic started by: nyanpasu64 on August 28, 2019, 23:05:33

Title: How does OpenMPT's audio pipeline work?
Post by: nyanpasu64 on August 28, 2019, 23:05:33
I'm working on a new cross-platform tracker in Rust, using SDL to output audio. Currently, its audio pipeline is loosely based on 0CC-FamiTracker's, whose source code I spent weeks studying in an attempt to understand it (link to Google Doc (https://docs.google.com/document/d/1TnDWCuFXqN0POK66_PHzaRT-HJMCb-H5zFSXNSWtu3Q/edit)). Does anyone want to discuss how OpenMPT's audio pipeline works, and whether it's worth cloning? (As a user, I only have 0CC-FT experience, not OpenMPT or others.)
Is this a good design? One noticeable flaw is that interesting things (no audio output, repeated blocks) happen if the size of the circular buffer is too small (below 30-ish milliseconds).

How does OpenMPT's audio pipeline differ? How much audio does it render at a time? Do different output device types have significantly different pipelines? Is OpenMPT worth copying?
Title: Re: How does OpenMPT's audio pipeline work?
Post by: Saga Musix on August 29, 2019, 17:34:18
First off: I don't think SDL is necessarily a good idea for audio output. It has very few configuration options, which can be vital for high-quality, low-latency audio playback. A library like PortAudio or RtAudio may be more suitable if you don't want to roll the low-level API code yourself. OpenMPT uses PortAudio (and optionally RtAudio, mostly for its Wine support) with some custom patches to iron out some bugs, plus custom ASIO/DirectSound/WaveOut implementations. For a modern tracker, you will probably only care about ASIO, WASAPI and WaveRT if you're on Windows. DirectSound has only been an emulation layer on top of WASAPI since Windows Vista.

How audio threading works also largely depends on the API you are using; audio APIs are either push APIs or pull APIs, so either you have to actively and regularly feed them with audio data from your own thread, or they ask you to deliver a specific amount of data in their own thread. In either case, OpenMPT renders a variable amount of audio data directly in one of those two threads. Whether or not that's a good idea very much depends on the rest of the architecture, I'd say. You have to consider that many choices in OpenMPT are historical ones and it could very well be that some things should be done differently these days, e.g. having the actual rendering happen in a separate thread.
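For illustration, the two paradigms look roughly like this (just a sketch with made-up names, not OpenMPT's actual code):

Code:
#include <cstddef>

// Hypothetical synth type standing in for whatever renders your audio
// (it renders silence here just to keep the sketch self-contained).
struct Synth
{
    void Render(float *out, std::size_t frames)
    {
        for (std::size_t i = 0; i < frames * 2; i++) out[i] = 0.0f; // stereo
    }
};

// Pull ("callback") API: the device calls this on ITS audio thread and tells
// you how many frames it wants right now; you render exactly that amount.
void AudioCallback(float *out, std::size_t frames, void *user)
{
    static_cast<Synth *>(user)->Render(out, frames);
}

// Push ("synchronous") API: YOUR thread loops, renders a block and hands it to
// a blocking write call; the device's buffering paces the loop.
void PushLoop(Synth &synth, bool (*writeBlocking)(const float *, std::size_t))
{
    float block[256 * 2];
    while (true)
    {
        synth.Render(block, 256);
        if (!writeBlocking(block, 256))
            break;
    }
}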


Now regarding the tracker engine itself:
Whether anything OpenMPT does or what your bullet points describe is a good idea very much depends on what your goal is. Should it work exactly like an oldskool tracker but not be a direct clone (i.e. building on top of existing formats)? Then my next question would be: Why?
If you want a modern tracker: Scrap the idea of having a low number of ticks ("engine frames") per second. Offer much higher granularity. For example, one approach could be to divide every row into 256 ticks for fine-grained delays, and not make this amount variable. This, combined with how OpenMPT's modern tempo mode works, would offer very flexible timing and more understandable effect behaviour. Generally I would say that a modern implementation that doesn't have to support legacy formats should have envelopes / slides on a per-frame (not engine frames, but audio frames) basis, i.e. do not increment/decrement volume on every engine frame but on every audio frame (sampling point). CPUs are powerful enough for that these days. OpenMPT doesn't do that as it's mostly building on top of legacy formats, but I'd want to have this kind of granularity in the future at least for its own MPTM format. It's not easy to have all of that in the same engine though, so you should choose wisely before writing a single line of code and be sure what kind of tracker you want to build.
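To make the per-audio-frame granularity concrete, a trivial sketch (illustration only, not how OpenMPT implements it):

Code:
#include <cstddef>

// Apply a volume envelope over one tick's worth of samples.
// perFrame == false: classic per-tick stepping (one value for the whole tick).
// perFrame == true:  ramp the volume on every sampling point towards the next tick's value.
void ApplyVolume(const float *in, float *out, std::size_t samplesPerTick,
                 float tickVolume, float nextTickVolume, bool perFrame)
{
    for (std::size_t i = 0; i < samplesPerTick; i++)
    {
        float v = tickVolume;
        if (perFrame)
            v += (nextTickVolume - tickVolume) * float(i) / float(samplesPerTick);
        out[i] = in[i] * v;
    }
}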
Title: Re: How does OpenMPT's audio pipeline work?
Post by: nyanpasu64 on August 29, 2019, 22:10:28
I intend to produce a new format which allows placing notes at rational fractions of a "beat", with no fixed "row" duration. This matches neither MIDI nor tracker paradigms, but I have implemented it in a format which compiles to MML, and I found it useful and easy to understand.

However my tracker is intended to compile to the NSF format (NES/Famicom music), where the driver only runs once every vblank or "engine frame" (but can be customized using delay loops instead of vblank). During the compilation process, all gaps between notes will be precomputed into a MIDI/PPMCK (note, delay) stream. My sequencer will probably allow users to move notes later or earlier in increments of "engine frames".

Thanks for warning me about SDL. Is SFML (which has a Rust wrapper) good at low-latency audio playback? It has some 3D-positioned audio that I definitely don't need.
Seems https://github.com/RustAudio/rust-portaudio exists, and I may look into it. Don't see any RtAudio Rust wrappers.
https://github.com/RustAudio/cpal hmmmmm
Is outputting to JackAudio a good choice? Too Linux-centric? Unnecessary for monolithic programs where I don't need an audio routing graph?

I tried OpenMPT on Wine a few months back, and the "alsa passthrough" was impossible to prevent from stuttering (I think PulseAudio was running), whereas Wine playback was smooth.

(Not sure how much of this message is worth responding to.)
Title: Re: How does OpenMPT's audio pipeline work?
Post by: manx on August 30, 2019, 07:06:44
Quote from: nyanpasu64 on August 29, 2019, 22:10:28
Thanks for warning me about SDL.
Well, even with all its quirks, SDL should be fine to get you started. Its (default) callback-based paradigm is suitable for music production applications (less so for games, which is awkward, given SDL's mission). Locking is kind of strange and inflexible, though. Avoid the newer non-callback-based API (SDL_QueueAudio), because it introduces yet another buffering layer internally in SDL.
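For reference, a minimal callback-based SDL setup looks roughly like this (sketch, error handling omitted):

Code:
#include <SDL.h>
#include <cstring>

// Runs on SDL's audio thread; must fill exactly `len` bytes.
static void FillAudio(void *userdata, Uint8 *stream, int len)
{
    std::memset(stream, 0, len); // render from your synth here; silence for the sketch
}

int main(int, char **)
{
    SDL_Init(SDL_INIT_AUDIO);
    SDL_AudioSpec want = {}, have = {};
    want.freq = 48000;
    want.format = AUDIO_S16SYS;
    want.channels = 2;
    want.samples = 512;        // callback period in sample frames
    want.callback = FillAudio; // callback API; do not use SDL_QueueAudio
    SDL_AudioDeviceID dev = SDL_OpenAudioDevice(nullptr, 0, &want, &have, 0);
    SDL_PauseAudioDevice(dev, 0); // start playback
    SDL_Delay(2000);
    SDL_CloseAudioDevice(dev);
    SDL_Quit();
    return 0;
}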

Quote from: nyanpasu64 on August 29, 2019, 22:10:28
Seems https://github.com/RustAudio/rust-portaudio exists, and I may look into it. Don't see any RtAudio Rust wrappers.
PortAudio has a severe limitation on modern Linux systems in that it does not feature a native PulseAudio backend. PulseAudio's ALSA emulation is fragile at best. A major part of that is due to the sheer complexity and awkwardness of the ALSA API. Sadly, this is the single most important reason for PulseAudio's bad reputation.

Quote from: nyanpasu64 on August 29, 2019, 22:10:28
https://github.com/RustAudio/cpal hmmmmm
Looks fine, even though the choice of API backend on Linux is questionable. Writing anything against plain ALSA is a bad decision since PulseAudio exists and is the default on every major distribution.

Quote from: nyanpasu64 on August 29, 2019, 22:10:28
Is outputting to JackAudio a good choice? Too Linux-centric? Unnecessary for monolithic programs where I don't need an audio routing graph?
If you want your users to not be able to use your program, use Jack. More seriously though: install 10 random Linux distributions and see that Jack is neither configured nor even installed by default on a single one of them. Jack still does not work properly with PulseAudio on the same system, which further limits its applicability to standard Linux setups. It is only used on special installs or distributions geared towards audio production.

Quote from: nyanpasu64 on August 29, 2019, 22:10:28
I've tried OpenMPT on Wine a few months back, and the "alsa passthrough" was impossible to prevent from stuttering (I think PulseAudio was running), whereas Wine playback was smooth.
Well, do not use ALSA if you have PulseAudio running. It's that simple. ALSA will either use PulseAudio's emulation, or fight with PulseAudio for exclusive access to a single device, or fight with PulseAudio while sharing a single device.

Altogether, in the long term, you will probably be better off implementing multiple backends. The SoundDevice abstraction layer in OpenMPT is probably a good example of how to do that.
However, just to get started, that will be too much work. If you are using SDL for your graphics output anyway, just stick to SDL for audio for now. Otherwise, use cpal or PortAudio if you are developing primarily on Windows, or, if you do not care about compatibility with other systems at the beginning, even just the PulseAudio Simple API for now if you are developing on Linux.
Title: Re: How does OpenMPT's audio pipeline work?
Post by: nyanpasu64 on August 30, 2019, 10:40:10
I think that picking Rust has definitely made it harder to find audio libraries. RtAudio seems to support PulseAudio in addition to ALSA, but is C++ and has no Rust wrapper currently. However, Rust comes with easier/safer concurrency and match{}, "move by default", no need to edit header files separately, and no copy construction by default. And SDL is sufficient as a baseline (and the latency can't be worse than FamiTracker?).

On Windows, a 5ms sleep during each SDL callback causes intermittent gaps in the audio, even when I set each callback to supply 4096 samples (48000 smp/s). Maybe SDL calls the callback when there are <5ms * 48000smp/s = <240smp or 128smp left in the buffer? Or maybe at an arbitrary point? I assume the inability to tweak this parameter is what you mean by "SDL isn't configurable for low latency".

My GUI is currently built in GTK (because Qt is hard to use with Rust), not SDL.

What files does OpenMPT use for its sequencer and synth? Other than that, there isn't much more for me to ask, and I should probably pick a design myself (or stick to SDL).

Surprisingly, OpenMPT WaveRT 2ms plays smoothly (never mind, I heard a pop) with "0%" CPU usage, on Realtek, Windows 10, and Microsoft drivers (not Realtek). But apparently it breaks other audio devices trying to play (or Audacity trying to record via WASAPI loopback).
Title: Re: How does OpenMPT's audio pipeline work?
Post by: manx on August 30, 2019, 11:24:13
Quote from: nyanpasu64 on August 30, 2019, 10:40:10
I think that picking Rust has definitely made it harder to find audio libraries. RtAudio seems to support PulseAudio in addition to ALSA, but is C++ and has no Rust wrapper currently. However, Rust comes with easier/safer concurrency and match{}, "move by default", no need to edit header files separately, and no copy construction by default.
I won't argue against Rust. It's a great language.

Quote from: nyanpasu64 on August 30, 2019, 10:40:10
And SDL is sufficient as a baseline (and the latency can't be worse than FamiTracker?).

On Windows, a 5ms sleep during each SDL callback causes intermittent gaps in the audio, even when I set each callback to supply 4096 samples (48000 smp/s). Maybe SDL calls the callback when there are <5ms * 48000smp/s = <240smp or 128smp left in the buffer? Or maybe at an arbitrary point? I assume the inability to tweak this parameter is what you mean by "SDL isn't configurable for low latency".
Last time I looked at its source, SDL's implementation was not really suitable for low latency for various reasons. However, you should not estimate actual real-world performance by simulating it with a sleep. Every operating system scheduler interprets an actual sleep as a "this thread is unimportant, not latency sensitive, and not high priority with regard to compute"-hint.

Quote from: nyanpasu64 on August 30, 2019, 10:40:10
My GUI is currently built in GTK (because Qt is hard to use with Rust), not SDL.
In that case, PortAudio or cpal should probably be better choices, even if they do not support PulseAudio natively. PortAudio generally does work ok-ish with the ALSA emulation of PulseAudio, albeit not with as good latencies as native PulseAudio is able to achieve.

Just to summarize again: the go-to audio backend APIs for initial development should be WASAPI on Windows, PulseAudio on Linux, CoreAudio on Mac, and OSS on FreeBSD; I have no idea about Android (probably not your focus anyway though).
Other APIs are either outdated and nowadays emulated (WaveOut/MME, DirectSound, OpenAL on Windows; OpenAL, OSS on Linux), or low-level device-hogging APIs (WaveRT on Windows, ALSA on Linux), or special purpose APIs (ASIO for direct device access on Windows, Jack for audio session management and routing on Linux and Mac).

Quote from: nyanpasu64 on August 30, 2019, 10:40:10
What files does OpenMPT uses for its sequencer and synth? Other than that, there isn't much more for me to ask, and I should probably pick a design myself (or stick to SDL).
Most aspects are spread out over various files. Mainly Sndfile.* Snd_fx.* Sndmix.* pattern.* RowVisitor.* for the "sequencer" (pattern playback and effect interpretation), Mixer.* Resampler.*, IntMixer.* Sndmix.* Sndflt.* Tables.* WindowedFIR.* for the "synth" (sampler). Various other aspects (like plugin handling) are in even other files, and I might have also missed some files right now.

Quote from: nyanpasu64 on August 30, 2019, 10:40:10
Surprisingly, OpenMPT WaveRT 2ms plays smoothly (never mind, I heard a pop) with "0%" CPU usage, on Realtek, Windows 10, and Microsoft drivers (not Realtek). But apparently it breaks other audio devices trying to play (or Audacity trying to record via WASAPI loopback).
Yes, WaveRT bypasses all upper audio layers and is thus not generally useful in a desktop context.

Title: Re: How does OpenMPT's audio pipeline work?
Post by: nyanpasu64 on August 31, 2019, 09:07:35
What are the reasons that SDL is unsuitable for real-time audio?

Currently my synth can be called in a separate thread (pushing data to a channel). Maybe I can also make it work as a generator/coroutine (called by the audio callback: whenever it synthesizes enough data, it yields the data to the audio thread and suspends its stack frame until the audio callback runs again). I could synthesize 1 engine frame of audio at once, or incrementally. But I don't see an easy way to run emulation logic over time, and the original FamiTracker updates rightwards channels a bit later than leftwards channels, to simulate the slow NES CPU updating channels in turn.

I've been looking into rust cpal, and it's worse to use than SDL. Its macro-laden design makes IDE autocompletion fail. Also the API picks a sample format, rate, and channel count for you at runtime, expecting your code to know how to render to any of u16, i16, or f32 (though you can override its choices at the risk of them being rejected, I picked i16 and 2 channels which should work everywhere). I think I cannot pick the size that the callback is expected to fill, nor how much buffering is done by cpal or the host API. Is this also unsuitable for real-time audio? What features should I expect from a good API?

I have some extra notes from their issue tracker at https://docs.google.com/document/d/149xFMivBZGAXRCEze1UUUhAKayucMPXFRehjYeSHj24/edit#heading=h.mbtlgdwouzoh which seem to indicate that cpal uses mutexes during audio processing at one point (supposedly fixed in master, but I can't find any commits from the author at the time he said so, and unsure if 0.10.0 has no mutexes), and picks a buffer size for you (10ms on WASAPI). Maybe not good signs.

Thanks for the OpenMPT file list, I'll look when I have time and the motivation to do it.
Title: Re: How does OpenMPT's audio pipeline work?
Post by: manx on August 31, 2019, 09:58:13
Quote from: nyanpasu64 on August 31, 2019, 09:07:35
What are the reasons that SDL is unsuitable for real-time audio?
It ties the callback period (SDL_AudioSpec.samples) either to the buffer size (implying a simple double-buffering scheme, which SDL's implementation is not fit for when using a small callback buffer), or it introduces an unknown amount of additional internal buffering. Combined with no API to determine the overall latencies, this IMHO makes it unfit for realtime use.

Quote from: nyanpasu64 on August 31, 2019, 09:07:35
Also the API picks a sample format, rate, and channel count for you at runtime, expecting your code to know how to render to any of u16, i16, or f32 (though you can override its choices at the risk of them being rejected, I picked i16 and 2 channels which should work everywhere).
I think I cannot pick the size that the callback is expected to fill, nor how much buffering is done by cpal or the host API. Is this also unsuitable for real-time audio? What features should I expect from a good API?
I have some extra notes from their issue tracker at https://docs.google.com/document/d/149xFMivBZGAXRCEze1UUUhAKayucMPXFRehjYeSHj24/edit#heading=h.mbtlgdwouzoh which seem to indicate that cpal uses mutexes during audio processing at one point (supposedly fixed in master, but I can't find any commits from the author at the time he said so, and unsure if 0.10.0 has no mutexes), and picks a buffer size for you (10ms on WASAPI). Maybe not good signs.
I have not looked at Rust cpal in detail, thus I am not really qualified to comment. However, if the points you mention are true, I would not want to use it.

Quote from: nyanpasu64 on August 31, 2019, 09:07:35
What features should I expect from a good API?
Difficult question ;).
First off, having worked with all kinds of different audio APIs in the past 20 years, I can say that I like no single one of them. They all have their own particular quirks or problems.
Second, every time an API tries to support both pull and push (or better refer to these variants as "callback" ("pull") vs. "synchronous" ("push"), as this makes the concept clearer when also considering recording) at the same time, they fail, and at least one of the variants is far from perfect. Some APIs emulate one on top of the other, however in my experience, one is far better off with implementing that emulation oneself. In particular, converting a synchronous API to a callback one is as simple as doing the synchronous calls in a separate thread. Converting the other way around is less simple as it involves implementing a properly synchronized buffer between the callback thread and the thread that calls synchronous functions. Both conversions induce additional implicit or explicit buffering (and thus latency) respectively.
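To illustrate the easy direction (synchronous to callback), a rough sketch with placeholder types (SyncDevice and RenderFn are made up):

Code:
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical blocking device: Write() returns once the device accepted the block.
struct SyncDevice { void Write(const float *data, std::size_t frames); };

using RenderFn = void (*)(float *out, std::size_t frames, void *user);

// Synchronous -> callback emulation: run the blocking write loop on a dedicated
// thread and invoke the "callback" from there. The device's own buffering paces
// the loop, but that buffering is also where the additional output latency hides.
std::thread EmulateCallback(SyncDevice &dev, RenderFn render, void *user,
                            std::atomic<bool> &quit)
{
    return std::thread([&dev, render, user, &quit]
    {
        std::vector<float> block(256 * 2); // 256 stereo frames per write
        while (!quit.load(std::memory_order_relaxed))
        {
            render(block.data(), 256, user);
            dev.Write(block.data(), 256);
        }
    });
}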
A good audio API also provides a clearly defined way to determine timing information: either an instantaneous current output sample position (example: MME/WaveOut), or a precise amount of latency at a precisely defined timepoint such as callback begin or write position (example: PulseAudio), or correlated timestamps of the sample clock vs. some system clock (example: ASIO). PortAudio tries to do all three variants together, which mainly leads to confusion. SDL and DirectSound provide none, which is awful and requires the application to somehow guess, based on buffer sizes.
A good audio API also abstracts away sample format and sample rate, lets the application choose whatever suits it best, and handles all conversion internally. The only exception to this rule should be low-level hardware APIs (like ASIO, WaveRT, ALSA), which need to give fine-grained control to the application so that it can configure the hardware exactly as needed. Low-level APIs should *NEVER* be the default for any application, as that breaks the "casual user" use case because low-level APIs tend to interfere with system audio for other applications.
In any kind of even halfway serious music production application, the audio rendering requires somewhat low latencies, which a GUI eventloop thread cannot provide (because it might be handling some GUI interaction/drawing). This implies having to do the rendering in a separate thread, in which case callback-based APIs are far more suitable than synchronous ones.

Having ruled out cpal and SDL, the solution for you is probably: Use PortAudio until it causes problems with PulseAudio on Linux for you.
Title: Re: How does OpenMPT's audio pipeline work?
Post by: nyanpasu64 on August 31, 2019, 12:45:16
>A good audio API also provides a clearly defined way to determine timing information.
>SDL and DirectSound provide none
Q: Does IDirectSoundBuffer::GetCurrentPosition() not count? 0CC-FamiTracker uses this function to monitor how full its chunked buffer is.

0CC is based around "the synth thread renders entire engine frames at once, splits them into fixed-size chunks and writes them to a queue or 'chunked circular buffer', which is read by the audio output", and I approximately emulated that in my tracker. (I noticed my current SDL code behaves well at 11ms-ish latencies plus SDL buffering, whereas 0CC's audio output malfunctions at 20ms and below, possibly because DirectSound functions poorly on Vista+. Also my sound synthesis is just 2 white-noise generators, far simpler than 0CC's chip emulation.)

Also 0CC's latency slider is a lie. CDSound::OpenChannel() increments the latency 1ms at a time until the audio buffer can be divided evenly into 2 or more blocks. And no other code uses the latency.

(My code renders simulated "engine frames" of 800 samples which is 60/second at 48000Hz, sends them to a length-1 queue of 512 samples/frames, followed by SDL's internal buffering. My 11ms calculation may be wrong or too low, but may be wrong in the same way as 0CC is wrong.)

I think 0CC was (from above post) "implementing a properly synchronized buffer between the callback thread and the thread that calls synchronous functions" (and I copied that decision). I think it makes the synth code easier to read, more straightforward, and eliminates issues and edge cases around the "gap between callbacks". However, it was hard for me to discover what prevented the synth thread from running ahead (it was queue backpressure).

Q: Is this a reasonable arrangement, or does it introduce too much latency, or am I just perpetuating bad decisions and writing more and more code with those assumptions in mind?

(i swear i'll port my program to portaudio someday, but not today)

Q: Is cubeb good? It was suggested by someone on Discord saying "it may have been fixed" (but didn't specify the past issues), it supports Pulse, Firefox uses it, Dolphin uses it (>50ms of latency on Windows, but someone on Discord said that's good for Windows), and there's a Rust wrapper with active development but only 5 github stars.

Q: Is latency actually a problem in a tracker? I usually enter notes into 0CC when the tracker is paused and not playing. I've tried entering notes in real time, but ended up spending time fixing note placement afterwards. But I hear some people play MIDI keyboards while 0CC-famitracker is playing. Entering notes quickly and in time would require good piano skills (which i lack), or computer keyboard input skills (which I either lack or my ergonomic split keyboard makes it more difficult for me) or maybe the latency on my computers is too high. (Using 0CC on Wine requires latencies of 70ms or so, which some people would likely cringe at.)

(Should I keep making these posts, or are they annoying or too long or irrelevant?)
Title: Re: How does OpenMPT's audio pipeline work?
Post by: nyanpasu64 on September 05, 2019, 06:57:48
bump... no reply?
Title: Re: How does OpenMPT's audio pipeline work?
Post by: Saga Musix on September 05, 2019, 17:52:14
I'll leave the first few questions to manx as he's more experienced in those regards.

Quote
Is latency actually a problem in a tracker? I usually enter notes into 0CC when the tracker is paused and not playing. I've tried entering notes in real time, but ended up spending time fixing note placement afterwards. But I hear some people play MIDI keyboards while 0CC-famitracker is playing. Entering notes quickly and in time would require good piano skills (which i lack), or computer keyboard input skills (which I either lack or my ergonomic split keyboard makes it more difficult for me) or maybe the latency on my computers is too high. (Using 0CC on Wine requires latencies of 70ms or so, which some people would likely cringe at.)
Latency is a problem whenever you want to do realtime recording or play along with a song. The default options used to be a lot worse (in particular MME on Windows added a lot of latency, which made these things impractical); these days it's a lot better with WASAPI on Windows, especially for casual use, but still not quite perfect. The lower the latency, the more precisely you can place recorded notes, and in particular if you are good at live playing, this also means that fewer notes need to be fixed afterwards. Depending on your audience this may or may not be relevant, but I for example do use low-latency (5ms) ASIO with OpenMPT because I can perceive the difference when recording compared to, say, 30ms latency with WASAPI. High latency can be quite confusing to the brain in this scenario. If you're not aiming for MIDI support, it's probably less relevant for you.
Title: Re: How does OpenMPT's audio pipeline work?
Post by: manx on September 05, 2019, 18:28:21
Quote from: nyanpasu64 on August 31, 2019, 12:45:16
>A good audio API also provides a clearly defined way to determine timing information.
>SDL and DirectSound provide none
Q: Does IDirectSoundBuffer::GetCurrentPosition() not count? 0CC-FamiTracker uses this function to monitor how full its chunked buffer is.
No, IDirectSoundBuffer::GetCurrentPosition() is the most awkward interface ever invented to query the amount of buffer space that is writable for the application. It is also the only way to guess latency in DirectSound (and it fails at that for various reasons, for example because it does not (and cannot, by the design of its interface) represent additional latency added by lower layers). It is unsuitable for tying a sample output position to the wall clock.

Quote from: nyanpasu64 on August 31, 2019, 12:45:16
0CC is based around "synth thread renders entire engine frames at once, splits it into fixed-size chunks and writes to a queue or "chunked circular buffer", which is read by the audio output", and I approximately emulated that in my tracker.
I'm confused about what this is trying to tell me. In any case, it sounds overly complicated. Note that whatever structure you choose, if the amount of PCM data rendered by your synth is not directly driven by the audio callback, but instead works in some kind of unrelated chunking, you will either introduce additional latency and/or limit your maximum available CPU time to less than 100%. Also note that some synthesis algorithms imply internal chunking (e.g. an FFT) and thus by necessity add additional latency.

Quote from: nyanpasu64 on August 31, 2019, 12:45:16
(My code renders simulated "engine frames" of 800 samples which is 60/second at 48000Hz, sends them to a length-1 queue of 512 samples/frames, followed by SDL's internal buffering. My 11ms calculation may be wrong or too low, but may be wrong in the same way as 0CC is wrong.)
Q: Is this a reasonable arrangement, or does it introduce too much latency, or am I just perpetuating bad decisions and writing more and more code with those assumptions in mind?
Well, you render to an 800-sample-frame buffer (1), submit that to a 512-sample-frame buffering layer (2), send that in whatever chunk size SDL uses (let's assume 256 or 1024, just to make things more interesting) (3), which on Linux talks to PulseAudio, which in turn has its own internal buffering (4), which then sends the data to the soundcard via ALSA with its own ringbuffer (5).
So, 5 layers of buffering. At the *very* least, you should get rid of that 800-to-512 layering. It serves no purpose whatsoever. Unless you are required to process in chunks (i.e. because you are using an FFT or something like that), I highly suggest getting rid of any synth-internal chunking completely. And even if you are required to do internal chunking, abstract it away at the interface level (of the synth), i.e. by introducing *internal* (internal to the synth) buffering and exposing its latency.

Quote from: nyanpasu64 on August 31, 2019, 12:45:16
Q: Is cubeb good? It was suggested by someone on Discord saying "it may have been fixed" (but didn't specify the past issues), it supports Pulse, Firefox uses it, Dolphin uses it (>50ms of latency on Windows, but someone on Discord said that's good for Windows), and there's a Rust wrapper with active development but only 5 github stars.
Never used it. At least it should be very compatible with various system setups, as Firefox relies on it. 50ms seems weird on Windows, WASAPI should trivially provide 20..30ms. 50ms is to be expected with MME/WaveOut however. cubeb supports both.
Title: Re: How does OpenMPT's audio pipeline work?
Post by: nyanpasu64 on September 08, 2019, 06:30:50
Quote
submit that to a 512 sample frames buffering layer (2), send that in whatever chunk size SDL uses (let's assume 256 or 1024, just to make things more interesting) (3)...
My buffering layer always uses the same chunk size as the audio callback.

I switched to PortAudio, which I had to build manually on Windows and place the .lib in an undocumented path, but it works fine. Problem is, it supports nothing but MME! Do I need the DirectX headers to get WASAPI?
I can specify how many samples to be generated by each callback, but not how many buffers are used (double-buffer or more?). And if I ask for 1 second of audio and sleep half a second, there's no stuttering at all (unlike SDL).
Q: Is PortAudio an acceptable low-latency API if I can't control how it buffers audio? Does it always double-buffer and is that good enough?
Q: Is it normal that PortAudio works fine on Windows with non-power-of-2 buffer sizes (accidentally set buffer size, not sampling rate, to 48000)?
Q: Should I file a feature request in CPAL for an API to configure "how many samples are generated per callback"? The maintainer of rust-portaudio has moved on to CPAL (and you say its API is awful) and stopped working on PortAudio (https://github.com/RustAudio/rust-portaudio/issues/177).

Cubeb's API seems unstable. The current example code in cubeb-rs crashes on Windows with a COM-related error (https://github.com/djg/cubeb-rs/issues/45) due to upstream cubeb changes which are still in flux (I think cubeb is a firefox library with no stable release cycle, and cubeb-rs just imports master as a submodule). Fixing the code requires me to add Windows-specific code (or maybe revert to older cubeb-rs/cubeb that doesn't make the user manage COM threading.)

Quote
50ms seems weird on Windows, WASAPI should trivially provide 20..30ms.
https://dolphin-emu.org/blog/2017/06/03/dolphin-progress-report-may-2017/#50-3937-add-cubeb-audio-backend-by-ligfx (https://dolphin-emu.org/blog/2017/06/03/dolphin-progress-report-may-2017/#50-3937-add-cubeb-audio-backend-by-ligfx) claims that XAudio2 has 62-68ms of latency (possibly some from Dolphin).

Latency, buffering, and NES hardware
Quote
if the amount of rendered PCM data by your synth is not directly influenced by the audio callback, but instead works in any kind of unrelated chunking, you will either introduce additional latency, and/or limit your maximum available CPU time to less than 100%. And even if you are required to do internal chunking, abstract it away at the interface level (of the synth), i.e. by introducing *internal* (internal to the synth) buffering and exposing its latency.
...you should get rid of that 800-to-512 layering. It serves no purpose whatsoever.
I'm ripping out the queue soon. I think that even a 0-length queue introduces latency, by allowing the synth to run ahead of the callback up to 1 chunk of audio (the synth blocks trying to push to the queue, until the callback tries to pull from the queue).

In the NES, all audio chips run in lockstep off the master clock, which also controls vblank. Most audio engines including Famitracker only run once per vblank (though Famitracker/NSF allows the engine to be called at a custom rate). I think there's nothing wrong with synthesizing new audio once per engine frame. Even if I were to render audio more finely, I wouldn't get any latency advantages (since all inputs must be quantized to 1/60 of a second), I think.

My idea is for the callback's "persistent object" to own the synth, and I only synthesize audio within the callback. Whenever the synth is out of audio, the callback runs the synth for 1 (or more?) frames into a buffer until I have 1 or more chunks of audio. Then each subsequent callback will pull audio out of the buffer, until it's empty.

However this will result in some callbacks running engine logic and synthesis (high CPU usage), while some don't (minimal CPU usage). Q: Does OpenMPT also behave that way, but maintain low latency anyway? (I haven't read OpenMPT's code yet since I was busy with classes and other tracker research, should I read it?) (I assume with a 5ms period or block size, the synth function can take up to 5ms to complete without stuttering.)

Alternatively, the callback could run engine logic once per vblank, but synthesize audio on demand: look at the buffer size, ask the library "how many clock cycles should I advance CPU time so that X samples of audio are available?", and run all sound chips (and possibly the engine) for that many cycles. Q: How much will this spread out CPU usage between callbacks? Should I run a profiler on FamiTracker and see where most CPU is being spent (probably drawing the GUI, not running audio)?

Q: How is latency computed? Are there any techniques for this, like concurrency/timing diagrams on paper? Note that user inputs can happen at any time within a NES frame (the timing granularity of NES sound engines tied to vblank), leading to an inherent 16ms of variance.

I can assign inputs to either the next NES frame, or the previous NES frame to hide latency. It's possible to "insert note into pattern" as if the key was pressed earlier, but I can't retroactively play audio as if the key was pressed earlier. And I have no clue how famitracker combines "when playing a pattern, use whatever channel the note is located in" and "in edit mode, add user input to the pattern and play in current channel only" and "in read-only mode, shove newly played notes into the cursor channel, but look in the next channel modulo N if this one's occupied, and steal channels too".

NES Audio Synthesis

All NES audio (except for a FM chip called VRC7 used in 1 game) is made of a series of flat lines separated by steps (though the FDS has extra audio filtering after the steps). Famitracker uses the blip_buffer library (by blargg) for all chips (except FM) to generate audio out of bandlimited steps (positioned at CPU clocks), and I think that's a reasonable design to keep.

Unfortunately blip_buffer is (unnecessarily) heavily templated C++, making it hard to wrap in Rust, and the audio processing is incomprehensible. I'm planning to use the blip_buf library (also by blargg), which is written in C and has a Rust wrapper. Incidentally, "how many clocks do I need" is bugged in blip_buf for 4096 or more audio samples. I can fix it by vendoring the dependency and patching the C. (I removed some discussion related to synthesis and not latency.)
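For reference, the render-on-demand loop I have in mind would look roughly like this (RunChips is a placeholder for my chip emulation; the blip_* names are written from memory, so double-check them against blip_buf.h):

Code:
#include "blip_buf.h" // blargg's blip_buf

// Placeholder: run the sound chips (and possibly the engine) for `clocks` CPU
// cycles, pushing amplitude steps into `blip` via blip_add_delta().
void RunChips(blip_t *blip, int clocks);

// Fill the callback's request exactly, advancing emulation only as far as needed.
void RenderOnDemand(blip_t *blip, short *out, int framesWanted)
{
    while (blip_samples_avail(blip) < framesWanted)
    {
        int clocks = blip_clocks_needed(blip, framesWanted - blip_samples_avail(blip));
        RunChips(blip, clocks);
        blip_end_frame(blip, clocks); // makes those clocks' worth of samples readable
    }
    blip_read_samples(blip, out, framesWanted, 0 /*mono*/);
}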
Title: Re: Re: How does OpenMPT's audio pipeline work?
Post by: manx on September 08, 2019, 07:28:25
Quote from: nyanpasu64 on September 08, 2019, 06:30:50
I switched to PortAudio, which I had to manually build on Windows, and place the .lib an undocumented path, but works fine. Problem is, it supports nothing but MME! Do I need the DirectX headers to get WASAPI?
PortAudio supports MME, WASAPI, DirectSound on Windows (and ASIO, which you should not care about). All required headers come with any supported Windows SDK. I have no idea what went wrong for your setup.

Quote from: nyanpasu64 on September 08, 2019, 06:30:50
I can specify how many samples to be generated by each callback, but not how many buffers are used (double-buffer or more?). And if I ask for 1 second of audio and sleep half a second, there's no stuttering at all (unlike SDL).
Q: Is PortAudio an acceptable low-latency API if I can't control how it buffers audio? Does it always double-buffer and is that good enough?
You can: PaStreamParameters::suggestedLatency.
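Roughly like this (sketch, error checking omitted):

Code:
#include <portaudio.h>

// Open a 48kHz stereo int16 output stream with an explicit latency suggestion.
PaStream *OpenStream(PaStreamCallback *callback, void *userData)
{
    Pa_Initialize();
    PaStreamParameters out = {};
    out.device = Pa_GetDefaultOutputDevice();
    out.channelCount = 2;
    out.sampleFormat = paInt16;
    // This is the knob: suggest ~10ms of output buffering, or use
    // Pa_GetDeviceInfo(out.device)->defaultLowOutputLatency.
    out.suggestedLatency = 0.010;
    PaStream *stream = nullptr;
    Pa_OpenStream(&stream, nullptr, &out, 48000.0,
                  256 /*frames per callback*/, paNoFlag, callback, userData);
    Pa_StartStream(stream);
    return stream;
}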

Quote from: nyanpasu64 on September 08, 2019, 06:30:50
Q: Is it normal that PortAudio works fine on Windows with non-power-of-2 buffer sizes (accidentally set buffer size, not sampling rate, to 48000)?
Sure, there is no reason whatsoever why any API should even require power-of-2 buffer sizes. That's a totally arbitrary limitation.

Quote from: nyanpasu64 on September 08, 2019, 06:30:50
Quote
50ms seems weird on Windows, WASAPI should trivially provide 20..30ms.
https://dolphin-emu.org/blog/2017/06/03/dolphin-progress-report-may-2017/#50-3937-add-cubeb-audio-backend-by-ligfx (https://dolphin-emu.org/blog/2017/06/03/dolphin-progress-report-may-2017/#50-3937-add-cubeb-audio-backend-by-ligfx) claims that XAudio2 has 62-68ms of latency (possibly some from Dolphin).
XAudio2 is yet another audio API which we have not talked about yet. You probably should not care. It's a higher level API on top of WASAPI, and also available on XBox. Not sure what contributes to those given latency numbers. WASAPI for sure works completely fine with 20ms..30ms latency.

Quote from: nyanpasu64 on September 08, 2019, 06:30:50
Quote
if the amount of rendered PCM data by your synth is not directly influenced by the audio callback, but instead works in any kind of unrelated chunking, you will either introduce additional latency, and/or limit your maximum available CPU time to less than 100%. And even if you are required to do internal chunking, abstract it away at the interface level (of the synth), i.e. by introducing *internal* (internal to the synth) buffering and exposing its latency.
...you should get rid of that 800-to-512 layering. It serves no purpose whatsoever.
I'm ripping out the queue soon. I think that even a 0-length queue introduces latency, by allowing the synth to run ahead of the callback up to 1 chunk of audio (the synth blocks trying to push to the queue, until the callback tries to pull from the queue).
Yes, even a "0-length-queue" introduces latency, implicitly, because you calculate your chunk beforehand.

Quote from: nyanpasu64 on September 08, 2019, 06:30:50
In the NES, all audio chips run in lockstep off the master clock, which also controls vblank. Most audio engines including Famitracker only run once per vblank (though Famitracker/NSF allows the engine to be called at a custom rate). I think there's nothing wrong with synthesizing new audio once per engine frame. Even if I were to render audio more finely, I wouldn't get any latency advantages (since all inputs must be quantized to 1/60 of a second), I think.
My idea is for the callback's "persistent object" to own the synth, and I only synthesize audio within the callback. Whenever the synth is out of audio, the callback runs the synth for 1 (or more?) frames into a buffer until I have 1 or more chunks of audio. Then each subsequent callback will pull audio out of the buffer, until it's empty.
However this will result in some callbacks running engine logic and synthesis (high CPU usage), while some don't (minimal CPU usage).
Q: Does OpenMPT also behave that way, but maintain low latency anyway? (I haven't read OpenMPT's code yet since I was busy with classes and other tracker research, should I read it?) (I assume with a 5ms period or block size, the synth function can take up to 5ms to complete without stuttering.)
Alternatively, the callback could run engine logic once per vblank, but synthesize audio on demand: look at the buffer size, ask the library "how many clock cycles should I advance CPU time so that X samples of audio are available?", and run all sound chips (and possibly the engine) for that many cycles.
Q: How much will this spread out CPU usage between callbacks? Should I run a profiler on FamiTracker and see where most CPU is being spent (probably drawing the GUI, not running audio)?
OpenMPT also *wants* to render audio in chunks of a given length (the tick duration), which however can change during playback. However, it doesn't. It renders precisely as much audio as is requested by the callback, and remembers how much audio is yet to be rendered to complete the current tick. Input processing also only happens on tick boundaries as necessary. Compared to generating the actual audio, tick processing has negligible CPU requirements, which results in an almost constant CPU requirement per callback (which is good). Pre-rendering a complete tick would introduce a complete tick's worth of additional latency.
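As an illustration of that structure (a sketch with made-up names, not OpenMPT's actual code):

Code:
#include <algorithm>
#include <cstddef>

// Hypothetical engine: ProcessTick() advances sequencer/effect state and returns
// the tick length in frames, RenderAudio() mixes part of the current tick.
struct Engine
{
    std::size_t framesLeftInTick = 0;
    std::size_t ProcessTick();
    void RenderAudio(float *out, std::size_t frames);
};

// Render exactly what the callback asks for and remember how much of the
// current tick is still outstanding; no pre-rendering, no extra queue.
void AudioCallback(Engine &engine, float *out, std::size_t framesWanted)
{
    while (framesWanted > 0)
    {
        if (engine.framesLeftInTick == 0)
            engine.framesLeftInTick = engine.ProcessTick(); // cheap compared to rendering
        std::size_t n = std::min(framesWanted, engine.framesLeftInTick);
        engine.RenderAudio(out, n);
        engine.framesLeftInTick -= n;
        out += n * 2; // stereo interleaved
        framesWanted -= n;
    }
}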

Quote from: nyanpasu64 on September 08, 2019, 06:30:50
Q: How is latency computed? Are there any techniques for this, like concurrency/timing diagrams on paper? Note that user inputs can happen at any time within a NES frame (the timing granularity of NES sound engines tied to vblank), leading to an inherent 16ms of variance.
OutputLatency = worst-case sum of all output buffering.
InputLatency = processing chunk size
RoundtripLatency = InputLatency + OutputLatency
Yes, diagrams do help.

Quote from: nyanpasu64 on September 08, 2019, 06:30:50
I can assign inputs to either the next NES frame, or the previous NES frame to hide latency. It's possible to "insert note into pattern" as if the key was pressed earlier, but I can't retroactively play audio as if the key was pressed earlier.
That's precisely why all output buffering (which constitutes audio that has already been rendered) contributes to output latency. If you need to react to input faster, reduce the latency.
Title: Re: How does OpenMPT's audio pipeline work?
Post by: nyanpasu64 on September 08, 2019, 11:28:23
Quote
PortAudio supports MME, WASAPI, DirectSound on Windows (and ASIO, which you should not care about). All required headers come with any supported Windows SDK. I have no idea what went wrong for your setup.
I accidentally printed the wrong variable (default MME twice). 🤦‍♀️ Actually, portaudio-rs only picked up MME and WDM-KS. I built PortAudio in CLion using CMake and MSVC (2019?) x64. And WASAPI was explicitly disabled when running cmake, because the author couldn't get it to build (even though it builds fine for me). They also ship a project in some ancient Visual Studio format which I could try building instead.

Quote
Compared to generating the actual audio, tick processing has negligible CPU requirements, which results in an almost constant CPU requirement per callback (which is good).
Good to know, thanks!

Quote
Pre-rendering a complete tick would introduce a complete tick's worth of additional latency.
Famitracker encodes instrument volumes as a per-tick sequence of volume levels (volumes[tick]). Assume that I receive a new note halfway into a tick and need to play a preview of that note. If I were to match Famitracker behavior, the actual sound driver (both the NSF driver and the software player) only runs once per tick, so I'd have to wait until the tick ends before triggering the new note. I could deviate from "behavior when playing a pattern" and trigger a new note right away, where its initial volume would be volumes[0]. Half a tick later, when the engine actually runs, does it stay at volumes[0] or switch to volumes[1]? It's probably doable, but is it a good idea? Does OpenMPT preview audio immediately when keys are pressed, even in the middle of a tick?

I'm probably going to "run engine logic once per vblank, but synthesize audio on demand" at some point. I'll look into PAStreamParameters::suggestedLatency later. But I'll first build a prototype of my new note-placement system (which differs from other trackers), before working more on audio.
Title: Re: How does OpenMPT's audio pipeline work?
Post by: Saga Musix on September 08, 2019, 13:52:36
Quote from: nyanpasu64 on September 08, 2019, 11:28:23
Does OpenMPT preview audio immediately when keys are pressed, even in the middle of a tick?
OpenMPT immediately allocates a channel and fills it with the required information, but the list of active voices (which is probably not something you have in the NSF scenario) that is used by the mixer is not updated to contain this new channel until the next tick is processed. This is mostly for simplicity reasons because in reality a bit more has to be done than just inserting the channel into that list, and the code that does this "a bit more" stuff is not meant to be run more than once per tick (because it does all sorts of updates to all active channels).
As a result, if you have very long ticks you will not hear the previewed note instantly, but given that this is a rather unlikely scenario to happen, this architectural trade-off is probably not too bad.
Title: Re: How does OpenMPT's audio pipeline work?
Post by: nyanpasu64 on September 16, 2019, 00:13:04
- I tried building OpenMPT. I had to upgrade the Windows 10 SDK to 17763.
- Also, I disabled Spectre mitigations using sed, since it's extra work to install Spectre-mitigated libraries. I tried and failed to install Spectre-mitigated libraries and MFC, as (if I recall) Spectre-mitigated MFC didn't exist for the latest SDK or compiler I was using. I can't imagine that OpenMPT could be a useful target (or attacker?) for Spectre attacks.

Also how does OpenMPT allow entering notes into patterns, while the pattern is being read by the sequencer/synth? Does it use locks to ensure only the audio thread is reading, or UI is reading or writing? I'm reading Sndmix.cpp now.
Title: Re: How does OpenMPT's audio pipeline work?
Post by: Saga Musix on September 16, 2019, 11:47:44
Quote from: nyanpasu64 on September 16, 2019, 00:13:04
- I tried building OpenMPT. I had to upgrade the Windows 10 SDK to 17763.
I suppose you mean you had to update the SDK in the project file? Yes, this is a bit messy because I think you cannot just tell MSVC to use any Windows 10 SDK available. It's simpler to build the Windows 7 variant of OpenMPT as there is no SDK version ambiguity in that case.

Quote from: nyanpasu64 on September 16, 2019, 00:13:04
- Also I disabled Spectre mitigations using sed. It's extra work to install Spectre-mitigated libraries. Also, I tried and failed to install Spectre-mitigated libraries and MFC, as (if I recall) Spectre-mitigated MFC didn't exist for the latest SDK or compiler I was using. I can't imagine that OpenMPT could be a useful target (or attacker?) for Spectre attacks.
While it might not be a very realistic target, it makes sense to deploy Spectre mitigations in all software, plus libopenmpt may be used in contexts where Spectre mitigation does matter - we have no control over that.

Quote from: nyanpasu64 on September 16, 2019, 00:13:04
Also how does OpenMPT allow entering notes into patterns, while the pattern is being read by the sequencer/synth? Does it use locks to ensure only the audio thread is reading, or UI is reading or writing? I'm reading Sndmix.cpp now.
There is a critical section (mutex) around the audio rendering (see CMainFrame::SoundSourceLock / CMainFrame::SoundSourceUnlock) and around any editing actions that may modify the CSoundFile object in a way that touches any internal pointers (e.g. moving / deleting child objects such as instruments). Editing simple attributes such as sample volume or similar does not require a mutex.

Note that depending on the data access scheme (i.e. if a lot of concurrent reads are expected from more than one thread, but only few writes), a shared mutex with the option of exclusive locking may be more efficient. This means that concurrent reads won't have to wait for each other, they would just have to wait if some write operation locks the mutex exclusively. OpenMPT's CSoundFile lock may move into that direction in the future in particular due to the planned scripting API.
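A sketch of that direction (using std::shared_mutex for illustration, not OpenMPT's actual CriticalSection code):

Code:
#include <shared_mutex>

std::shared_mutex soundFileMutex; // guards the module data

// Audio thread and other readers take a shared lock, so they never block each other.
void RenderCallback()
{
    std::shared_lock lock(soundFileMutex);
    // ... read pattern/instrument data and mix audio ...
}

// Structural edits (moving/deleting child objects, touching internal pointers)
// take an exclusive lock, which waits for all readers and blocks new ones.
void DeleteInstrument()
{
    std::unique_lock lock(soundFileMutex);
    // ... modify pointers safely ...
}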
Title: Re: How does OpenMPT's audio pipeline work?
Post by: nyanpasu64 on September 23, 2019, 07:34:04
I may be dropping Rust and using C++ instead. Rust and Qt do not interact well, and I don't know if I can have a hybrid Rust/C++ codebase and pass a BTreeMap or std::map (a critical component of my new pattern note storage format) between Rust and C++. It might be possible, but I assume templates/generics rule it out.

Sidenote: I haven't been working as actively on the tracker recently. These last few days, I've been struggling with C++ CMake Windows development. OpenMPT bundles PortAudio but hard-codes it to assume Windows, which I think is unacceptable. Alternatively, I can have the user install PortAudio systemwide (which is the norm on Linux) and pick it up via CMake... But I want to avoid a hard-coded PATH for each library I add... vcpkg is a "compiled library" manager which installs multiple libraries into a single tree, so I only need one PATH entry, and vcpkg has a mechanism so I can configure that PATH entry on each individual machine. Unfortunately vcpkg's portaudio is broken and doesn't install a find_package(portaudio) target (I may report a vcpkg bug). In CLion, I tried to add Qt to PATH, but I touched the config wrong and now it replaces PATH (if I edit the text box) instead of appending/prepending (if I edit via the dialog). (I fixed this issue for now, but I may report a CLion bug.)

Q: RtAudio (C++, not C, maybe no bindings) has no package in vcpkg, but supports PulseAudio natively. Is it a good library to use, if I'm not using Rust anymore?




I decided to write another post here because Four common mistakes in audio development (http://atastypixel.com/blog/four-common-mistakes-in-audio-development/) (unfortunately Apple-centric) popped up on Hacker News (https://news.ycombinator.com/item?id=21043113), and claims that acquiring locks on the audio thread is bad, because if the GUI thread acquires the lock (even briefly) but the OS scheduler suspends the GUI thread, the audio thread cannot run until the GUI thread releases the lock.

I looked into 0CC because I'm partially familiar with its codebase (I don't know OpenMPT's at all). It has multiple CCriticalSection locks acquired in the audio path: m_InstrumentLock for editing instruments, m_csAPULock for the sound synth (only contended when the user reconfigures audio settings or resets the APU), some GUI-only locks, and...
Most notably, for each channel (CTrackerChannel), m_csNoteLock guards reading/writing pattern row-events (stChanNote) containing note, volume, and effects. This means that if you type notes into the tracker (so the GUI thread acquires the lock) and the OS scheduler suspends the GUI thread, the synth thread cannot access the contents of that channel. Personally I never "enter notes while the tracker is running", but if some people play MIDI instruments while the tracker is running, copying 0CC's design may (in theory) cause stuttering for them. Q: Does this seem like a design flaw? I don't know if it's a real or only theoretical problem. I've never seen note entry cause stuttering in 0cc, but 0cc requires latencies of 30+ ms to even operate properly.

Q: In OpenMPT, does the GUI thread only acquire the lock when editing instruments? Does the GUI not acquire the lock when the user enters notes into the pattern? If not, how do you atomically mutate multiple fields (note, instrument, volume) at the same time?

CPortaudioDevice::StreamCallback() is called when I begin playback or enter a note into a module. It creates a SourceLockedGuard which calls SoundSourceLock()... Since both initiating playback and entering notes cause CPortaudioDevice (not the GUI) to lock, I'm guessing that SoundSourceLock is not responsible for protecting the pattern from being read and mutated at the same time. Q: Which lock/mutex is acquired by "any editing actions that may modify the CSoundFile object in a way that touches any internal pointers"?

Q: Should I use a persistent data structure (common in Clojure) to store pattern data? (Persistent data structures are immutable, but I can perform a copy-with-mutation which reuses existing RAM for most unedited fields.) This way, the audio thread can atomically get a pointer to the entire module data. When the user enters notes, the main thread can create a mutated copy of the entire module data, and atomically replace the pointer.
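Concretely, I imagine something like this (a sketch in C++ since I might switch; shared_ptr stands in for a real persistent structure, and note that atomic shared_ptr operations are usually not lock-free, which matters on the audio thread):

Code:
#include <atomic>
#include <memory>

struct Module { /* patterns, instruments, ... ideally persistent/structure-sharing */ };

std::shared_ptr<const Module> g_module = std::make_shared<Module>();

// Audio thread: grab an immutable snapshot; it stays alive for this callback
// even if the GUI publishes a new version in the meantime.
void AudioCallback()
{
    std::shared_ptr<const Module> snapshot = std::atomic_load(&g_module);
    // ... render from *snapshot ...
}

// GUI thread: copy-with-mutation, then publish the new version atomically.
void EnterNote(/* note data */)
{
    auto next = std::make_shared<Module>(*std::atomic_load(&g_module));
    // ... apply the edit to *next ...
    std::atomic_store(&g_module, std::shared_ptr<const Module>(next));
}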

I actually came up with this idea months ago for implementing an undo system, where I just keep around "old versions of the entire module data", which share the same storage for unmodified parts of the module, leading to less memory overhead than keeping around "copies of the entire module". I scrapped this idea and decided that a "transaction system where I save the before/after state and mutate the module in-place" was a simpler approach. However, the GUI and synth threads must lock the module, to stop the synth thread from running halfway through the user-input thread, or the user-input thread running halfway through the synth thread, etc. (Even with a concurrent hashmap, will I still get exotic forms of execution interleaving?)

One thing that immutable data structures won't help with... The synth thread has a "current playback pointer". If I edit pattern lengths or add/remove patterns while the synth thread is playing, this could create interesting failure modes. 0CC handles them fine, but I'd have to make sure my program doesn't misbehave.

https://stackoverflow.com/q/4394399/ I found via search
https://sinusoid.es/immer/ is a C++ library implementing persistent data structures
https://clojure.org/about/state#_clojure_programming describes the conceptual model of "immutable values" and "identities which can point to different structures at different times" (I'm not a Clojure user, I'm not sure if I'd use Agent or Ref. Clojure uses MVCC, does it have latency? Clojure uses JVM. 😂 In Rust, this might be crossbeam::atomic::AtomicCell<Arc<PersistentThingy>>?)




Q: What's the difficulty and maintenance burden if I or an existing dev were to take OpenMPT and add a mode where, instead of placing notes and effects on a fixed pattern grid (fixed row duration), rows/events can be placed at any rational fraction of a "beat" (quarter note)? Patterns would not store arrays of row events, but instead sorted std::map<fraction, row event>.

Optionally use a struct{fraction; signed int offset;} to allow placing multiple events at the same fraction, but delayed/early by different numbers of ticks. Might be useful, might be a worse version of a "release note early before next note" effect.
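A sketch of what such a pattern container could look like (hypothetical types; normalization and overflow of the fraction are ignored here):

Code:
#include <cstdint>
#include <map>

// Position of an event within a pattern: a rational number of beats,
// optionally nudged early/late by a signed number of engine ticks.
struct EventTime
{
    std::int64_t num = 0, den = 1; // beat fraction, den > 0
    std::int16_t tickOffset = 0;
    bool operator<(const EventTime &rhs) const
    {
        if (num * rhs.den != rhs.num * den)       // compare num/den without floats
            return num * rhs.den < rhs.num * den;
        return tickOffset < rhs.tickOffset;
    }
};

struct RowEvent { std::uint8_t note, instrument, volume; /* effects ... */ };

// A pattern is no longer a fixed grid of rows but a sorted, sparse map of events.
using Pattern = std::map<EventTime, RowEvent>;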

The "row duration fraction" (beats/row) is a property of the pattern editor (not the pattern), and controls what timestamps I can add, edit, or remove events. This will be useful as a non-hack approach to triplets and mixed rhythms, and for adding fine detail to a song without changing the layout of all existing notes in that pattern. Decreasing row duration will increase rows/beat, and if row height (px/row) is unchanged, will make each beat look taller on-screen.

I feel it would be difficult to add this functionality into 0CC (though I don't understand 0CC fully). If implemented, this may reduce or eliminate the need for me to write a tracker from scratch.
Title: Re: How does OpenMPT's audio pipeline work?
Post by: Saga Musix on September 26, 2019, 15:12:30
Quote from: nyanpasu64 on September 23, 2019, 07:34:04
OpenMPT bundles PortAudio but hard-codes it to assume Windows, which I think is unacceptable.
Which part of that exactly do you consider to be unacceptable, and why? Since libopenmpt does not use PortAudio, there is currently little sense (read: it would waste time better spent on other issues) in configuring PortAudio in a more flexible way, and we bundle our own version for two reasons:
- It contains modifications in particular in its Windows implementation.
- We want OpenMPT development to be simple without having to clone and configure dozens of dependencies on Windows.

Quote from: nyanpasu64 on September 23, 2019, 07:34:04
Q: RtAudio (C++, not C, maybe no bindings) has no package in vcpkg, but supports PulseAudio natively. Is it a good library to use, if I'm not using Rust anymore?
I haven't really used it but given that it supports WASAPI these days (IIRC it didn't back when PortAudio was first implemented in OpenMPT) it should at least be a safe choice on Windows.

Quote from: nyanpasu64 on September 23, 2019, 07:34:04
I decided to write another post here because Four common mistakes in audio development (http://atastypixel.com/blog/four-common-mistakes-in-audio-development/) (unfortunately Apple-centric) popped up on Hacker News (https://news.ycombinator.com/item?id=21043113), and claims that acquiring locks on the audio thread is bad, because if the GUI thread acquires the lock (even briefly) but the OS scheduler suspends the GUI thread, the audio thread cannot run until the GUI thread releases the lock.
That is correct, hence my remark on the shared mutex in my previous post. In practice, OpenMPT locks the mutex for a very short period though, and mutex implementations are very fast these days, so it has never really been an issue in practice. I think this should also answer your question regarding live note entering.

Quote from: nyanpasu64 on September 23, 2019, 07:34:04
Q: In OpenMPT, does the GUI thread only acquire the lock when editing instruments? Does the GUI not acquire the lock when the user enters notes into the pattern? If not, how do you atomically mutate multiple fields (note, instrument, volume) at the same time?
As mentioned before, OpenMPT mostly only locks the mutex when it is really required (e.g. when modifying pointers). There is no need for atomic mutation of multiple pattern fields at the same time. While it may be possible that some pattern edit commands do that, in practice it won't matter if it's done atomically or not. Note that there is also no lock or specific atomic operation required to update properly-aligned integers on x86, so it's not like the pattern or instrument data would be potentially full of garbage while edits are written to memory.

Quote from: nyanpasu64 on September 23, 2019, 07:34:04
CPortaudioDevice::StreamCallback() is called when I begin playback or enter a note into a module. It creates SourceLockedGuard which calls SoundSourceLock()... Since both initiating playback and entering notes cause CPortaudioDevice (not the GUI) to lock, I'm guessing that SoundSourceLock is not responsible for protecting the pattern from being read and mutated at the same time. Q: Which lock/mutex is acquired by "any editing actions that may modify the CSoundFile object in a way that touches any internal pointers"?
Also as said in my previous post, yes, SoundSourceLock is responsible for locking the data. SourceLockedGuard uses the RAII pattern (https://en.wikipedia.org/wiki/RAII) to automatically unlock the CriticalSection once it goes out of scope (when its destructor is invoked automatically). SourceLockedGuard does not hold the lock indefinitely.
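For readers unfamiliar with the idiom, the general shape (not OpenMPT's actual SourceLockedGuard code) looks like this:

    // Generic sketch of the RAII locking idiom, not OpenMPT's actual code.
    #include <mutex>

    std::mutex g_source_mutex;   // stands in for the sound source's CriticalSection

    void apply_edit_that_touches_pointers() {
        std::lock_guard<std::mutex> guard(g_source_mutex);  // locked here
        // ... modify data the audio thread may also be reading ...
    }   // guard's destructor unlocks here, even if an exception is thrown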

Quote from: nyanpasu64 on September 23, 2019, 07:34:04
Q: Should I use a persistent data structure (common in Clojure) to store pattern data? (Persistent data structures are immutable, but I can perform a copy-with-mutation which reuses existing RAM for most unedited fields.) This way, the audio thread can atomically get a pointer to the entire module data. When the user enters notes, the main thread can create a mutated copy of the entire module data, and atomically replace the pointer.
I am sure there are both reasons for and against using it, but I haven't spent any thought on it so far, so I'll leave it up to you to find the answer. I doubt it's feasible for all module data (e.g. samples) though.

Quote from: nyanpasu64 on September 23, 2019, 07:34:04
Q: What's the difficulty and maintenance burden if I or an existing dev were to take OpenMPT and add a mode where, instead of placing notes and effects on a fixed pattern grid (fixed row duration), rows/events can be placed at any rational fraction of a "beat" (quarter note)? Patterns would not store arrays of row events, but instead a sorted std::map<fraction, row event>.
I cannot give you an estimate but I know that the maintenance burden (which would be entirely on me) would be high enough that I would not accept such a contribution. I know there are lots of deficiencies in the current system for more modern and easy music production but having your system cooperate with anything OpenMPT is doing at the moment would be way too much work. Some new ideas are best left to be implemented in new software.
Title: Re: How does OpenMPT's audio pipeline work?
Post by: nyanpasu64 on October 07, 2019, 00:35:39
Quote from: Saga Musix on September 26, 2019, 15:12:30
Quote from: nyanpasu64 on September 23, 2019, 07:34:04
OpenMPT bundles PortAudio but hard-codes it to assume Windows, which I think is unacceptable.
Which part exactly of that do you consider to be unacceptable, and why? Since libopenmpt does not use PortAudio, there is currently little sense (read: it would waste time better spent on other issues) in configuring PortAudio in a more flexible way, and we bundle our own version for two reasons:
- It contains modifications in particular in its Windows implementation.
- We want OpenMPT development to be simple without having to clone and configure dozens of dependencies on Windows.
I didn't mean that it was a problem for OpenMPT. But my tracker is planned to be multi-platform, so I can't do the same thing and apply the same Windows-only patch.

I ended up bundling PortAudio without modifications, writing a CMakeLists.txt for the portaudiocpp bindings, and porting my program to portaudiocpp. Maybe I could switch to RtAudio... that'll take hours or days to get working, though...

Q: Is portaudiocpp a good library to use, or should I learn portaudio's C API? portaudiocpp seems less "guaranteed safe via lifetimes and RAII" than Rust portaudio bindings.

Quote from: Saga Musix on September 26, 2019, 15:12:30I haven't really used [RtAudio] but given that it supports WASAPI these days (IIRC it didn't back when PortAudio was first implemented in OpenMPT) it should at least be a safe choice on Windows.
Sidenote: BambooTracker just switched from running audio in Qt's event loop (if GUI lags, audio stops running) to RtAudio.

Quote from: Saga Musix on September 26, 2019, 15:12:30
As mentioned before, OpenMPT mostly only locks the mutex when it is really required (e.g. when modifying pointers). There is no need for atomic mutation of multiple pattern fields at the same time. While it may be possible that some pattern edit commands do that, in practice it won't matter if it's done atomically or not. Note that there is also no lock or specific atomic operation required to update properly-aligned integers on x86, so it's not like the pattern or instrument data would be potentially full of garbage while edits are written to memory.

Maybe this is OK? Microsoft (https://docs.microsoft.com/en-us/windows/win32/dxtecharts/lockless-programming?redirectedfrom=MSDN#non-atomic-operations) claims that x86 processors have atomic integer writes, and MSVC promises not to miscompile them. And https://preshing.com/20130618/atomic-vs-non-atomic-operations/ says that "in the games industry, I can tell you that a lot of 32-bit integer assignments rely on [the assumption that plain 32-bit integer assignment is atomic as long as the target variable is naturally aligned]."

Maybe this is not OK? According to https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf , data races are illegal in C11 and C++11 and can be miscompiled into malfunctioning asm at the compiler's whim (even if the CPU has atomic reads/writes). (Ironically, this article is written by Hans-J. Boehm, the creator of the Boehm GC, which is a pile of undefined-behavior hacks that in practice often compile to correct machine code on some compilers.)
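As far as I understand it, the standards-blessed way to get the same "plain aligned store" behaviour without a formal data race is a relaxed atomic; a quick sketch (this only covers a fixed-size cell, not my event map):

    // Sketch: making the "benign race" formal with relaxed atomics.
    // On x86 a relaxed load/store of an aligned 32-bit value compiles to a plain
    // MOV, but unlike a raw int, C11/C++11 no longer considers it a data race.
    #include <atomic>
    #include <cstdint>

    std::atomic<std::uint32_t> cell{0};       // e.g. one packed pattern cell

    void gui_write(std::uint32_t packed) {
        cell.store(packed, std::memory_order_relaxed);
    }

    std::uint32_t audio_read() {
        return cell.load(std::memory_order_relaxed);
    }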

For my program, "writing into an array" isn't a powerful enough operation, since I store events in a sorted map/list of timestamped events, and allocation and shuffling are needed when appending or deleting at the end, or (worse yet) in the middle. I'm going down the "persistent data structures" path and I'll see how well it turns out.

My earlier idea for the synth was to compute, imperatively, where vblanks occur before the callback ends.

0CC's player simulates the time taken for the driver to process each channel. It updates each channel 150-200 clock cycles after the previous one (but I think it writes all registers of each channel simultaneously?). In 0CC, this is fairly straightforward since the synth runs in a separate thread and generates 1 tick of audio at a time (though this creates latency).

Q: Is this behavior worth adding to my new tracker? (Or maybe I shouldn't be asking OpenMPT devs about this.) This will add enough complexity that I'd rather write a "priority-queue scheduling system" with tick, update-next-channel, and end-callback events. (I actually find this idea more elegant. It'll still be bounded-runtime; see my Google Doc with notes (https://docs.google.com/document/d/17g5wqgpUPWItvHCY-0eCaqZSNdVKwouYoGbu-RTAjfo/edit?usp=sharing).)
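A rough sketch of what I mean by that scheduling system (invented names; this is not how 0CC-FamiTracker or OpenMPT actually work):

    // Rough sketch of the priority-queue scheduler idea.
    #include <cstdint>
    #include <queue>
    #include <vector>

    enum class EventKind { Tick, UpdateNextChannel, EndOfCallback };

    struct Event {
        std::int64_t time;   // absolute time in samples (or emulated clock cycles)
        EventKind kind;
    };

    struct Later {           // earliest event first
        bool operator()(const Event& a, const Event& b) const { return a.time > b.time; }
    };

    using EventQueue = std::priority_queue<Event, std::vector<Event>, Later>;

    // Called once per audio callback; events beyond the callback's end stay queued,
    // which is what keeps the runtime bounded.
    void run_callback(std::int64_t now, std::int64_t frames, EventQueue& queue) {
        const std::int64_t end = now + frames;
        queue.push({end, EventKind::EndOfCallback});
        while (!queue.empty() && queue.top().time <= end) {
            const Event e = queue.top();
            queue.pop();
            // (render audio from the previous event's time up to e.time here)
            switch (e.kind) {
            case EventKind::Tick:
                // run one engine tick, queue the next Tick, and queue
                // UpdateNextChannel events ~150-200 cycles apart
                break;
            case EventKind::UpdateNextChannel:
                // write this channel's registers to the emulated sound chip
                break;
            case EventKind::EndOfCallback:
                return;      // remaining events are handled by the next callback
            }
        }
    }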
Title: Re: How does OpenMPT's audio pipeline work?
Post by: Saga Musix on October 08, 2019, 15:16:24
Quote from: nyanpasu64 on October 07, 2019, 00:35:39
Q: Is portaudiocpp a good library to use, or should I learn portaudio's C API? portaudiocpp seems less "guaranteed safe via lifetimes and RAII" than Rust portaudio bindings.
A general answer: Even if not perfect, I'd usually prefer a C++ API over a C API when writing C++, unless there are very good reasons not to use it. Even if it does not support RAII by itself, you can still build your own RAII patterns on top, which is what OpenMPT does for various C libraries that it uses.
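As an example of what I mean, a hand-rolled RAII wrapper over a C API can look roughly like this. Pa_Initialize()/Pa_Terminate() are real PortAudio functions; the wrapper class itself is just an illustration, not code from OpenMPT:

    // Hand-rolled RAII over a C API: scoped PortAudio initialisation.
    #include <stdexcept>
    #include <portaudio.h>

    class PortAudioSession {
    public:
        PortAudioSession() {
            if (Pa_Initialize() != paNoError)
                throw std::runtime_error("Pa_Initialize failed");
        }
        ~PortAudioSession() { Pa_Terminate(); }
        PortAudioSession(const PortAudioSession&) = delete;
        PortAudioSession& operator=(const PortAudioSession&) = delete;
    };

    // Usage: construct one PortAudioSession at startup; Pa_Terminate() is then
    // guaranteed to run when it goes out of scope, whatever the exit path.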

Quote from: nyanpasu64 on October 07, 2019, 00:35:39
Sidenote: BambooTracker just switched from running audio in Qt's event loop (if GUI lags, audio stops running) to RtAudio.
Qt sadly isn't really good at playing audio; as soon as you want to do anything slightly more complex than playing simple sound effects, you either have to write very complex code or you cannot do it at all.

Quote from: nyanpasu64 on October 07, 2019, 00:35:39
Maybe this is not OK? According to https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf , data races are illegal in C11 and C++11 and can be miscompiled into malfunctioning asm at the compiler's whim (even if the CPU has atomic reads/writes). (Ironically, this article is written by Hans-J. Boehm, the creator of the Boehm GC, which is a pile of undefined-behavior hacks that in practice often compile to correct machine code on some compilers.)
The point is that on x86 in particular there is no reason for a compiler to miscompile this kind of code - doing so would make the generated code more complex than the version that the platform guarantees to be safe.

In your specific case you may either have to use locks or look into lock-free data structures, but be warned that lock-free data structures are a very complex topic and often not worth the effort.

Quote from: nyanpasu64 on October 07, 2019, 00:35:39
My earlier idea for the synth was to compute, imperatively, where vblanks occur before the callback ends.
Obviously this won't work outside of the console or emulated environment. Apart from the fact that there is typically no easy way of accessing this information on modern platforms, the user can set the refresh rate to anything they want (or their device supports), and then of course there is stuff like FreeSync these days which abandons the concept of a constant refresh rate. On an oldskool console like the NES it makes a lot more sense of course.

Quote from: nyanpasu64 on October 07, 2019, 00:35:39
Q: Is this behavior worth adding to my new tracker?
This sounds like something very specific to that platform so I don't have any input on that.
Title: Re: How does OpenMPT's audio pipeline work?
Post by: nyanpasu64 on October 10, 2019, 01:41:37
Thanks for the help.

Quote from: Saga Musix on October 08, 2019, 15:16:24
A general answer: Even if not perfect, I'd usually prefer a C++ API over a C API when writing C++, unless there are very good reasons not to use it. Even if it does not support RAII by itself, you can still build your own RAII patterns on top, which is what OpenMPT does for various C libraries that it uses.
Forgot to mention, portaudiocpp takes PortAudio return codes and converts them into C++ exceptions. If I don't religiously add catch-all clauses, can this cause audio errors (like unplugging an audio output) to bring down my entire audio thread or program? (Apparently an uncaught exception calls std::terminate(), which aborts the whole process, not just the thread.) Is this a reason to avoid portaudiocpp?

Quote
Obviously this won't work outside of the console or emulated environment. Apart from the fact that there is typically no easy way of accessing this information on modern platforms, the user can set the refresh rate to anything they want (or their device supports), and then of course there is stuff like FreeSync these days which abandons the concept of a constant refresh rate. On an oldskool console like the NES it makes a lot more sense of course.

I said vblank, but I meant tracker ticks. FamiTracker (and my new tracker) are designed to behave the same in the C++ emulation and in the 6502 asm driver (used for NSF file export). As a result, tracker ticks are usually aligned with vblanks (NTSC/PAL), which the C++ code emulates based on how many cycles have been run by the audio thread. But FamiTracker and NSF files have an option to customize the tick rate. (Fun fact: one particular C++ synthesizer had a bug which caused vblank to be delayed every time you switched instruments.)
Title: Re: How does OpenMPT's audio pipeline work?
Post by: Saga Musix on October 12, 2019, 13:07:01
Quote from: nyanpasu64 on October 10, 2019, 01:41:37
Forgot to mention, portaudiocpp takes PortAudio return codes and converts them into C++ exceptions. If I don't religiously add catch-all clauses, can this cause audio errors (like unplugging an audio output) to bring down my entire audio thread or program? (Apparently an uncaught exception calls std::terminate(), which aborts the whole process, not just the thread.) Is this a reason to avoid portaudiocpp?
Generally if a function is documented to throw something you should of course expect to catch it. I don't know if this was a reason for choosing the PortAudio C API in OpenMPT, but in general it should be said that exception handling in C++ is much more efficient in modern compilers than it used to be.
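Concretely, that means putting a catch-all around the callback body, because an exception escaping through PortAudio's C code is not something you can recover from. A sketch using PortAudio's plain C callback signature (renderAudio() is a placeholder for your own synth):

    // Sketch: never let an exception escape the audio callback into C code.
    #include <cstring>
    #include <portaudio.h>

    void renderAudio(float* out, unsigned long frames);   // your synth (placeholder)

    int audioCallback(const void* /*input*/, void* output, unsigned long frameCount,
                      const PaStreamCallbackTimeInfo* /*timeInfo*/,
                      PaStreamCallbackFlags /*statusFlags*/, void* /*userData*/) {
        try {
            renderAudio(static_cast<float*>(output), frameCount);
            return paContinue;
        } catch (...) {
            // Output silence and stop the stream rather than terminating the program.
            std::memset(output, 0, frameCount * 2 * sizeof(float)); // assumes stereo float32
            return paAbort;
        }
    }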

Quote from: nyanpasu64 on October 10, 2019, 01:41:37
I said vblank, but I meant tracker ticks.
I tried reading your post again with that perspective in mind but I'm still not quite sure how it would help you or simplify anything. As mentioned before, OpenMPT doesn't attempt to align tracker ticks with anything else in the audio pipeline, in particular since their duration can vary.
Title: Re: How does OpenMPT's audio pipeline work?
Post by: nyanpasu64 on January 24, 2020, 20:47:47
"libopenmpt does not talk directly to an output device, but merely exposes a callback api with no knowledge of locks or portaudio. (OpenMPT allows simple edits to patterns without locks! Complex edits require locking though.) libopenmpt can be called via ffmpeg or foobar2000, which have their own non-speaker output mechanisms."

Is this correct? I'm going to design my API similarly, with the actual synth being unaware of locks.

The document storage will use a fancy double-buffered page-flip mechanism so the audio thread will always have at least one of the two documents not locked by the GUI. The atomic operations were complicated to get right, and I had to read https://www.hpl.hp.com/techreports/2012/HPL-2012-68.pdf to make sure. I still haven't actually implemented editing... I probably should add a unit test for that once I get the editing API down :S
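To give an idea of the invariant (this is not my actual implementation, just the smallest shape of it I can write down): the GUI only ever holds one of the two slots' mutexes at a time, so the audio thread never has to wait behind a GUI edit; at worst it retries a try_lock while the GUI is inside its short copy.

    // Rough sketch of the double-buffer invariant (NOT the actual implementation).
    #include <atomic>
    #include <mutex>

    struct Document { /* song data */ };

    struct Slot {
        std::mutex m;
        Document doc;
    };

    Slot slots[2];
    std::atomic<int> front{0};              // index of the slot the audio thread reads

    // GUI thread: copy/apply the edit into the back slot, then flip.
    void gui_publish(const Document& edited) {
        const int back = 1 - front.load(std::memory_order_acquire);
        {
            std::lock_guard<std::mutex> lock(slots[back].m);  // may wait on audio; that's fine
            slots[back].doc = edited;
        }
        front.store(back, std::memory_order_release);
    }

    // Audio thread: try the current front slot; if the GUI just grabbed it for its
    // next edit, front has already been flipped, so reloading it finds a free slot.
    // Returns the locked slot's index; the caller reads slots[i].doc, then unlocks.
    // (The slot may be one edit stale if a flip raced the load, but it is always a
    // complete, consistent document.)
    int audio_acquire() {
        for (;;) {
            const int i = front.load(std::memory_order_acquire);
            if (slots[i].m.try_lock())
                return i;
        }
    }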
Title: Re: How does OpenMPT's audio pipeline work?
Post by: Saga Musix on January 24, 2020, 20:55:16
libopenmpt doesn't use callbacks (you simply call into the library to get audio); the rest looks correct to me.
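For illustration, the pull model looks roughly like this with libopenmpt's C++ API (error handling omitted; where the rendered audio goes is entirely up to the caller):

    // Minimal example of the pull model, using libopenmpt's C++ API.
    #include <cstddef>
    #include <cstdint>
    #include <fstream>
    #include <vector>
    #include <libopenmpt/libopenmpt.hpp>

    int main() {
        std::ifstream file("song.it", std::ios::binary);
        openmpt::module mod(file);                  // parse the module

        const std::int32_t samplerate = 48000;
        std::vector<float> left(1024), right(1024);

        // The caller drives rendering: request audio whenever the output device
        // (or an encoder, or a player like foobar2000/ffmpeg) needs more.
        while (std::size_t frames = mod.read(samplerate, left.size(), left.data(), right.data())) {
            // ... hand `frames` frames of audio to whatever output is being used ...
            (void)frames;
        }
        return 0;
    }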