Author Topic: How does OpenMPT's audio pipeline work?  (Read 1370 times)

Offline Saga Musix

  • OpenMPT Developers
  • *****
  • Posts: 6,772
  • aka Jojo
    • Download music, samples, VST plugins: Saga Musix Website
  • Operating System: Windows 10 x64
Re: How does OpenMPT's audio pipeline work?
« Reply #15 on: September 08, 2019, 13:52:36 »
Does OpenMPT preview audio immediately when keys are pressed, even in the middle of a tick?
OpenMPT immediately allocates a channel and fills it with the required information, but the list of active voices (which is probably not something you have in the NSF scenario) that is used by the mixer is not updated to contain this new channel until the next tick is processed. This is mostly for simplicity: in reality a bit more has to be done than just inserting the channel into that list, and the code that does this "a bit more" stuff is not meant to be run more than once per tick (because it does all sorts of updates to all active channels).
As a result, if you have very long ticks you will not hear the previewed note instantly, but given that this is a rather unlikely scenario to happen, this architectural trade-off is probably not too bad.
» No support, bug reports, feature requests via private messages - they will not be answered. Use the forums and the issue tracker so that everyone can benefit from your post.

Offline nyanpasu64

  • Active artist
  • *
  • Posts: 12
  • Operating System: Windows 10 x64
Re: How does OpenMPT's audio pipeline work?
« Reply #16 on: September 16, 2019, 00:13:04 »
- I tried building OpenMPT. I had to upgrade the Windows 10 SDK to 17763.
- I also disabled Spectre mitigations using sed, since it's extra work to install Spectre-mitigated libraries. I tried and failed to install Spectre-mitigated libraries and MFC, as (if I recall) Spectre-mitigated MFC didn't exist for the latest SDK or compiler I was using. I can't imagine that OpenMPT could be a useful target (or attacker?) for Spectre attacks.

Also how does OpenMPT allow entering notes into patterns, while the pattern is being read by the sequencer/synth? Does it use locks to ensure only the audio thread is reading, or UI is reading or writing? I'm reading Sndmix.cpp now.
« Last Edit: September 16, 2019, 00:22:23 by nyanpasu64 »

Offline Saga Musix

Re: How does OpenMPT's audio pipeline work?
« Reply #17 on: September 16, 2019, 11:47:44 »
- I tried building OpenMPT. I had to upgrade the Windows 10 SDK to 17763.
I suppose you mean you had to update the SDK in the project file? Yes, this is a bit messy because I think you cannot just tell MSVC to use any Windows 10 SDK available. It's simpler to build the Windows 7 variant of OpenMPT as there is no SDK version ambiguity in that case.

- I also disabled Spectre mitigations using sed, since it's extra work to install Spectre-mitigated libraries. I tried and failed to install Spectre-mitigated libraries and MFC, as (if I recall) Spectre-mitigated MFC didn't exist for the latest SDK or compiler I was using. I can't imagine that OpenMPT could be a useful target (or attacker?) for Spectre attacks.
While it might not be a very realistic target, it makes sense to deploy Spectre mitigations in all software, and libopenmpt may be used in contexts where Spectre mitigation does matter - we have no control over that.

Also how does OpenMPT allow entering notes into patterns, while the pattern is being read by the sequencer/synth? Does it use locks to ensure only the audio thread is reading, or UI is reading or writing? I'm reading Sndmix.cpp now.
There is a critical section (mutex) around the audio rendering (see CMainFrame::SoundSourceLock / CMainFrame::SoundSourceUnlock) and any editing actions that may modify the CSoundFile object in a way that touches any internal pointers (e.g. moving / deleting child objects such as instruments). Editing simple attributes such as sample volume or similar does not require a mutex.

Note that depending on the data access scheme (i.e. if a lot of concurrent reads are expected from more than one thread, but only few writes), a shared mutex with the option of exclusive locking may be more efficient. This means that concurrent reads won't have to wait for each other, they would just have to wait if some write operation locks the mutex exclusively. OpenMPT's CSoundFile lock may move into that direction in the future in particular due to the planned scripting API.
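The shared-mutex idea above maps directly onto std::shared_mutex in C++17. A minimal sketch (variable names are illustrative, not OpenMPT's): readers take a shared lock and don't block each other; a writer takes an exclusive lock and waits for all readers to drain.

```cpp
#include <mutex>
#include <shared_mutex>

std::shared_mutex g_songMutex;
int g_sampleVolume = 64;

// Readers (audio thread, GUI queries) can hold the lock concurrently.
int ReadVolume() {
    std::shared_lock lock(g_songMutex);
    return g_sampleVolume;
}

// Writers lock exclusively; they wait until all shared locks are released,
// and block new readers until the write is done.
void WriteVolume(int v) {
    std::unique_lock lock(g_songMutex);
    g_sampleVolume = v;
}
```

With mostly-read access patterns this removes reader-vs-reader contention entirely; only the (rare) exclusive write ever makes anyone wait.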
« Last Edit: September 16, 2019, 11:53:52 by Saga Musix »

Offline nyanpasu64

Re: How does OpenMPT's audio pipeline work?
« Reply #18 on: September 23, 2019, 07:34:04 »
I may be dropping Rust and using C++ instead. Rust and Qt do not interact well, and I don't know if I can have a hybrid Rust-C++ codebase and pass around BTreeMap or std::map (a critical component of my new pattern note storage format) between Rust and C++. It might be possible, but I assume the template/generic mismatch rules it out.

Sidenote: I haven't been working as actively on the tracker recently; these last few days I've been struggling with C++/CMake development on Windows. OpenMPT bundles PortAudio but hard-codes it to assume Windows, which I think is unacceptable. Alternatively, I can have the user install PortAudio systemwide (which is the norm on Linux) and pick it up via CMake... but I want to avoid a hard-coded path for each library I add. vcpkg is a "compiled library" manager which installs multiple libraries into a single tree, so I only need one path entry, and it has a mechanism to configure that entry on each individual machine. Unfortunately vcpkg's portaudio is broken and doesn't install a find_package(portaudio) target (I may report a vcpkg bug). In CLion, I tried to add Qt to PATH, but I touched the config wrong and now it replaces PATH (if I edit the text box) instead of appending/prepending (if I edit via the dialog). (I fixed this issue for now, but I may report a CLion bug.)

Q: RtAudio (C++, not C, maybe no bindings) has no package in vcpkg, but supports PulseAudio natively. Is it a good library to use, if I'm not using Rust anymore?



I decided to write another post here because Four common mistakes in audio development (unfortunately Apple-centric) popped up on Hacker News; it claims that acquiring locks on the audio thread is bad, because if the GUI thread acquires the lock (even briefly) but the OS scheduler suspends the GUI thread, the audio thread cannot run until the GUI thread releases the lock.

I looked into 0CC because I'm partially familiar with its codebase (I don't know OpenMPT's at all). It has multiple CCriticalSection locks acquired in the audio path: m_InstrumentLock for editing instruments, m_csAPULock for the sound synth (only contended when the user reconfigures audio settings or resets the APU), some GUI-only locks, and...
Most notably, for each channel (CTrackerChannel), m_csNoteLock guards reading/writing pattern row-events (stChanNote) containing note, volume, and effects. This means that if you type notes into the tracker (so the GUI thread acquires the lock) and the OS scheduler suspends the GUI thread, the synth thread cannot access the contents of that channel. Personally I never "enter notes while the tracker is running", but if some people play MIDI instruments while the tracker is running, copying 0CC's design may (in theory) cause stuttering for them. Q: Does this seem like a design flaw? I don't know if it's a real or only a theoretical problem; I've never seen note entry cause stuttering in 0CC, but 0CC requires latencies of 30+ ms to even operate properly.

Q: In OpenMPT, does the GUI thread only acquire the lock when editing instruments? Does the GUI not acquire the lock when the user enters notes into the pattern? If not, how do you atomically mutate multiple fields (note, instrument, volume) at the same time?

CPortaudioDevice::StreamCallback() is called when I begin playback or enter a note into a module. It creates SourceLockedGuard which calls SoundSourceLock()... Since both initiating playback and entering notes causes CPortaudioDevice (not the GUI) to lock, I'm guessing that SoundSourceLock is not responsible for protecting the pattern from being read and mutated at the same time. Q: Which lock/mutex is acquired by "any editing actions that may modify the CSoundFile object in a way that touches any internal pointers"?

Q: Should I use a persistent data structure (common in Clojure) to store pattern data? (Persistent data structures are immutable, but I can perform a copy-with-mutation which reuses existing RAM for most unedited fields.) This way, the audio thread can atomically get a pointer to the entire module data. When the user enters notes, the main thread can create a mutated copy of the entire module data, and atomically replace the pointer.
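The pointer-swap part of this idea can be sketched in portable C++ (all names here are made up; also note that copying a std::map is a deep copy, whereas a real persistent structure such as immer's would share unmodified nodes):

```cpp
#include <map>
#include <memory>

struct Module { std::map<int, int> events; };  // toy stand-in for module data
using ModulePtr = std::shared_ptr<const Module>;

ModulePtr g_module = std::make_shared<Module>();

// Audio thread: grab an immutable snapshot; it stays valid for the whole
// callback even if the GUI publishes a new version meanwhile.
ModulePtr AudioSnapshot() {
    return std::atomic_load(&g_module);
}

// GUI thread: copy, mutate the copy, then publish it atomically.
// Here the copy is a full deep copy; a persistent data structure would
// make this cheap by sharing the unedited parts.
void GuiInsertNote(int row, int note) {
    auto next = std::make_shared<Module>(*std::atomic_load(&g_module));
    next->events[row] = note;
    std::atomic_store(&g_module, ModulePtr(next));
}
```

(The std::atomic_load/atomic_store overloads for shared_ptr are C++11; C++20 deprecates them in favor of std::atomic<std::shared_ptr<T>>.) Old snapshots are freed automatically once the last reader drops its reference, which is exactly the property an undo history or a lock-free audio reader wants.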

I actually came up with this idea months ago for implementing an undo system, where I just keep around "old versions of the entire module data", which share the same storage for unmodified parts of the module, leading to less memory overhead than keeping around "copies of the entire module". I scrapped this idea and decided that a "transaction system where I save the before/after state and mutate the module in-place" was a simpler approach. However, the GUI and synth threads must lock the module, to stop the synth thread from running in the middle of a user edit, or the user-input thread from running in the middle of the synth thread's work. (Even with a concurrent hashmap, will I still get exotic forms of execution interleaving?)

One thing that immutable data structures won't help with... The synth thread has a "current playback pointer". If I edit pattern lengths or add/remove patterns while the synth thread is playing, this could create interesting failure modes. 0CC handles them fine, but I'd have to make sure my program doesn't misbehave.

I found https://stackoverflow.com/q/4394399/ via search
https://sinusoid.es/immer/ is a C++ library implementing persistent data structures
https://clojure.org/about/state#_clojure_programming describes the conceptual model of "immutable values" and "identities which can point to different structures at different times" (I'm not a Clojure user; I'm not sure if I'd use Agent or Ref. Clojure uses MVCC; does it have latency? Clojure uses the JVM. 😂 In Rust, this might be crossbeam::atomic::AtomicCell<Arc<PersistentThingy>>?)



Q: What's the difficulty and maintenance burden if I or an existing dev were to take OpenMPT and add a mode where, instead of placing notes and effects on a fixed pattern grid (fixed row duration), rows/events can be placed at any rational fraction of a "beat" (quarter note)? Patterns would not store arrays of row events, but instead sorted std::map<fraction, row event>.

Optionally use a struct{fraction; signed int offset;} to allow placing multiple events at the same fraction, but delayed/early by different numbers of ticks. Might be useful, might be a worse version of a "release note early before next note" effect.

The "row duration fraction" (beats/row) is a property of the pattern editor (not the pattern), and controls at which timestamps I can add, edit, or remove events. This will be useful as a non-hack approach to triplets and mixed rhythms, and for adding fine detail to a song without changing the layout of all existing notes in that pattern. Decreasing row duration will increase rows/beat, and if row height (px/row) is unchanged, will make each beat look taller on-screen.
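The proposed storage could look something like this sketch (a hypothetical design, not existing tracker code; a real Fraction type would normalize and guard against overflow in the cross-multiplied comparison):

```cpp
#include <map>

// Exact rational beat position; comparison by cross-multiplication is
// exact for positive, normalized, small-enough values.
struct Fraction {
    long num, den;
    bool operator<(const Fraction &o) const { return num * o.den < o.num * den; }
};

struct RowEvent { int note, instrument, volume; };

// Sorted map from beat position to event replaces the fixed row array;
// a triplet is just an event at 1/3 of a beat, no hacks required.
using Pattern = std::map<Fraction, RowEvent>;
```

Iterating the map visits events in strict time order, which is what the playback engine needs; the editor's "row duration fraction" only decides which keys the cursor can land on.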

I feel it would be difficult to add this functionality into 0CC (though I don't understand 0CC fully). If implemented, this may reduce or eliminate the need for me to write a tracker from scratch.
« Last Edit: September 23, 2019, 11:52:25 by nyanpasu64 »

Offline Saga Musix

Re: How does OpenMPT's audio pipeline work?
« Reply #19 on: September 26, 2019, 15:12:30 »
OpenMPT bundles PortAudio but hard-codes it to assume Windows, which I think is unacceptable.
Which part exactly of that do you consider to be unacceptable, and why? Since libopenmpt does not use PortAudio, there is currently little sense (read: it would waste time better spent on other issues) in configuring PortAudio in a more flexible way, and we bundle our own version for two reasons:
- It contains modifications in particular in its Windows implementation.
- We want OpenMPT development to be simple without having to clone and configure dozens of dependencies on Windows.

Q: RtAudio (C++, not C, maybe no bindings) has no package in vcpkg, but supports PulseAudio natively. Is it a good library to use, if I'm not using Rust anymore?
I haven't really used it but given that it supports WASAPI these days (IIRC it didn't back when PortAudio was first implemented in OpenMPT) it should at least be a safe choice on Windows.

I decided to write another post here because Four common mistakes in audio development (unfortunately Apple-centric) popped up on Hacker News, and claims that acquiring locks on the audio thread is bad, because if the GUI thread acquires the lock (even briefly) but the OS scheduler suspends the GUI thread, the audio thread cannot run until the GUI thread releases the lock.
That is correct, hence my remark on the shared mutex in my previous post. In practice, OpenMPT locks the mutex only for a very short period, and mutex implementations are very fast these days, so it has never really been an issue. I think this should also answer your question regarding live note entering.

Q: In OpenMPT, does the GUI thread only acquire the lock when editing instruments? Does the GUI not acquire the lock when the user enters notes into the pattern? If not, how do you atomically mutate multiple fields (note, instrument, volume) at the same time?
As mentioned before, OpenMPT mostly only locks the mutex when it is really required (e.g. when modifying pointers). There is no need for atomic mutation of multiple pattern fields at the same time. While it may be possible that some pattern edit commands do that, in practice it won't matter if it's done atomically or not. Note that there is also no lock or specific atomic operation required to update properly-aligned integers on x86, so it's not like the pattern or instrument data would be potentially full of garbage while edits are written to memory.

CPortaudioDevice::StreamCallback() is called when I begin playback or enter a note into a module. It creates SourceLockedGuard which calls SoundSourceLock()... Since both initiating playback and entering notes causes CPortaudioDevice (not the GUI) to lock, I'm guessing that SoundSourceLock is not responsible for protecting the pattern from being read and mutated at the same time. Q: Which lock/mutex is acquired by "any editing actions that may modify the CSoundFile object in a way that touches any internal pointers"?
Also as said in my previous post, yes, SoundSourceLock is responsible for locking the data. SourceLockedGuard uses the RAII pattern to automatically unlock the CriticalSection once it goes out of scope (when its destructor is invoked automatically). SourceLockedGuard does not hold the lock indefinitely.
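The RAII idiom described here can be reduced to a few lines (a hypothetical stand-in for SourceLockedGuard, not OpenMPT's actual class; the depth counter exists only to make the behavior observable):

```cpp
#include <mutex>

std::mutex g_soundSourceMutex;
int g_lockDepth = 0;  // illustration only: tracks whether the lock is held

class SourceGuard {
public:
    // Constructor acquires the lock...
    SourceGuard() { g_soundSourceMutex.lock(); ++g_lockDepth; }
    // ...and the destructor releases it when the guard leaves scope,
    // even if the guarded code returns early or throws.
    ~SourceGuard() { --g_lockDepth; g_soundSourceMutex.unlock(); }
};
```

Because the unlock is tied to scope exit, the lock can never be held "indefinitely" by accident; the callback holds it exactly for the duration of one render call.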

Q: Should I use a persistent data structure (common in Clojure) to store pattern data? (Persistent data structures are immutable, but I can perform a copy-with-mutation which reuses existing RAM for most unedited fields.) This way, the audio thread can atomically get a pointer to the entire module data. When the user enters notes, the main thread can create a mutated copy of the entire module data, and atomically replace the pointer.
I am sure there are both reasons for and against using it, but I haven't spent any thought on it so far, so I'll leave it up to you to find the answer. I doubt it's feasible for all module data (e.g. samples) though.

Q: What's the difficulty and maintenance burden if I or an existing dev were to take OpenMPT and add a mode where, instead of placing notes and effects on a fixed pattern grid (fixed row duration), rows/events can be placed at any rational fraction of a "beat" (quarter note)? Patterns would not store arrays of row events, but instead sorted std::map<fraction, row event>.
I cannot give you an estimate but I know that the maintenance burden (which would be entirely on me) would be high enough that I would not accept such a contribution. I know there are lots of deficiencies in the current system for more modern and easy music production but having your system cooperate with anything OpenMPT is doing at the moment would be way too much work. Some new ideas are best left to be implemented in new software.

Offline nyanpasu64

Re: How does OpenMPT's audio pipeline work?
« Reply #20 on: October 07, 2019, 00:35:39 »
OpenMPT bundles PortAudio but hard-codes it to assume Windows, which I think is unacceptable.
Which part exactly of that do you consider to be unacceptable, and why? Since libopenmpt does not use PortAudio, there is currently little sense (read: it would waste time better spent on other issues) in configuring PortAudio in a more flexible way, and we bundle our own version for two reasons:
- It contains modifications in particular in its Windows implementation.
- We want OpenMPT development to be simple without having to clone and configure dozens of dependencies on Windows.
I didn't mean that it was a problem for OpenMPT. But my tracker is planned to be multi-platform, so I can't do the same thing and apply the same Windows-only patch.

I ended up bundling PortAudio without modifications, writing a CMakeLists.txt for the portaudiocpp bindings, and porting my program to portaudiocpp. Maybe I could switch to RtAudio... that'll take hours or days to get working, though...

Q: Is portaudiocpp a good library to use, or should I learn portaudio's C API? portaudiocpp seems less "guaranteed safe via lifetimes and RAII" than Rust portaudio bindings.

I haven't really used [RtAudio] but given that it supports WASAPI these days (IIRC it didn't back when PortAudio was first implemented in OpenMPT) it should at least be a safe choice on Windows.
Sidenote: BambooTracker just switched from running audio in Qt's event loop (if GUI lags, audio stops running) to RtAudio.

As mentioned before, OpenMPT mostly only locks the mutex when it is really required (e.g. when modifying pointers). There is no need for atomic mutation of multiple pattern fields at the same time. While it may be possible that some pattern edit commands do that, in practice it won't matter if it's done atomically or not. Note that there is also no lock or specific atomic operation required to update properly-aligned integers on x86, so it's not like the pattern or instrument data would be potentially full of garbage while edits are written to memory.

Maybe this is OK? Microsoft claims that x86 processors have atomic integer writes, and MSVC promises not to miscompile them. And https://preshing.com/20130618/atomic-vs-non-atomic-operations/ : "In the games industry, I can tell you that a lot of 32-bit integer assignments rely on [the fact that plain 32-bit integer assignment is atomic as long as the target variable is naturally aligned]."

Maybe this is not OK? According to https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf , data races are illegal in C11 and C++11, and can be miscompiled to malfunctioning asm on the compiler's whim (even if the CPU has atomic reads/writes). (Ironically this article is written by Hans-J. Boehm, the creator of the Boehm GC which is a pile of undefined-behavior hacks which in practice often generates correct machine code on some compilers.)
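One way to square these two views in portable C++ (a sketch with invented names): use std::atomic with relaxed ordering. On x86 a relaxed store/load compiles to the same plain MOV as an ordinary aligned write, but the program no longer contains a data race in the C++11 sense, so the compiler is not allowed to "miscompile on a whim".

```cpp
#include <atomic>

// Shared between GUI and audio thread. The hardware store is as cheap as
// a plain int write; std::atomic just forbids the compiler from tearing,
// caching, or reordering it in race-visible ways.
std::atomic<int> g_channelVolume{64};

void GuiSetVolume(int v) {
    g_channelVolume.store(v, std::memory_order_relaxed);
}

int AudioGetVolume() {
    return g_channelVolume.load(std::memory_order_relaxed);
}
```

Relaxed ordering only makes the single variable race-free; it gives no ordering guarantees between different variables, so it does not replace a lock for multi-field updates.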

For my program, "writing into an array" isn't a powerful enough operation, since I store events in a "sorted map/list of timestamped events", and allocation/shuffling is needed if I append/delete to the end, or (worse yet) in the middle. I'm going down the "persistent data structures" path and I'll see how well it turns out.



My earlier idea for the synth was computing where vblanks occur before the callback ends, in an imperative manner.

0CC's player simulates the time taken for the driver to process each channel. It updates each channel 150-200 clock cycles after the previous one (but I think it writes all registers of each channel simultaneously?). In 0CC, this is fairly straightforward since the synth runs in a separate thread and generates 1 tick of audio at a time (though this creates latency).

Q: Is this behavior worth adding to my new tracker? (Or maybe I shouldn't be asking OpenMPT devs about this.) This will add enough complexity that I'd rather write a "priority-queue scheduling system" with tick, update-next-channel, and end-callback events. (I actually find this idea more elegant. It'll still be bounded-runtime; see my Google Doc with notes.)
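The priority-queue idea can be sketched as follows (hypothetical names; this is a design sketch, not code from any existing tracker): every tick, staggered per-channel register write, and end-of-callback is just an event with a clock-cycle timestamp, popped in order.

```cpp
#include <queue>
#include <vector>

enum class EventKind { Tick, UpdateChannel, EndCallback };

struct Event {
    long long clock;  // absolute time in emulated CPU cycles
    EventKind kind;
    bool operator>(const Event &o) const { return clock > o.clock; }
};

// Min-heap ordered by timestamp: the earliest event is always on top.
using Scheduler =
    std::priority_queue<Event, std::vector<Event>, std::greater<Event>>;

// Pop and return all events due up to `now` (e.g. the end of the current
// audio callback), in timestamp order. Runtime is bounded by the number
// of events scheduled, so the callback stays bounded too.
std::vector<Event> RunUntil(Scheduler &q, long long now) {
    std::vector<Event> fired;
    while (!q.empty() && q.top().clock <= now) {
        fired.push_back(q.top());
        q.pop();
    }
    return fired;
}
```

Staggering channel updates 150-200 cycles apart then just means pushing UpdateChannel events with incrementing timestamps instead of hard-coding the interleaving into the tick handler.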
« Last Edit: October 07, 2019, 01:06:39 by nyanpasu64 »

Offline Saga Musix

Re: How does OpenMPT's audio pipeline work?
« Reply #21 on: October 08, 2019, 15:16:24 »
Q: Is portaudiocpp a good library to use, or should I learn portaudio's C API? portaudiocpp seems less "guaranteed safe via lifetimes and RAII" than Rust portaudio bindings.
A general answer: Even if not perfect, I'd usually prefer a C++ API over a C API when writing C++, unless there are very good reasons not to use it. Even if it does not support RAII by itself, you can still build your own RAII patterns on top, which is what OpenMPT does for various C libraries that it uses.
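Building RAII on top of a C API usually needs nothing more than std::unique_ptr with a custom deleter. A sketch (the pa_open/pa_close pair below is an invented stand-in for a C library's open/close functions, not a real PortAudio API):

```cpp
#include <memory>

struct pa_stream;  // opaque handle, as a C library would declare it

// Toy C-style API; the counter just makes open/close observable.
int g_openStreams = 0;
pa_stream *pa_open() {
    ++g_openStreams;
    return reinterpret_cast<pa_stream *>(&g_openStreams);
}
void pa_close(pa_stream *) { --g_openStreams; }

// RAII wrapper: the deleter guarantees pa_close runs exactly once,
// on every exit path, when the Stream goes out of scope.
struct StreamCloser {
    void operator()(pa_stream *s) const { pa_close(s); }
};
using Stream = std::unique_ptr<pa_stream, StreamCloser>;
```

The wrapper costs nothing at runtime and removes an entire class of leak/double-close bugs, which is presumably why OpenMPT wraps its C dependencies this way.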

Sidenote: BambooTracker just switched from running audio in Qt's event loop (if GUI lags, audio stops running) to RtAudio.
Qt sadly isn't really good at playing audio; as soon as you want to do anything slightly more complex than playing simple sound effects, you either have to write very complex code or you cannot do it at all.

Maybe this is not OK? According to https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf , data races are illegal in C11 and C++11, and can be miscompiled to malfunctioning asm on the compiler's whim (even if the CPU has atomic reads/writes). (Ironically this article is written by Hans-J. Boehm, the creator of the Boehm GC which is a pile of undefined-behavior hacks which in practice often generates correct machine code on some compilers.)
The point is that on x86 in particular there is no reason for a compiler to miscompile this kind of code - it would make the generated code more complex than the version that the platform guarantees to be safe.

In your specific case you may either have to use locks or look into lock-free data structures, but be warned that lock-free data structures are a very complex topic and often not worth the effort.

My earlier idea for the synth was computing where vblanks occur before the callback ends, in an imperative manner.
Obviously this won't work outside of the console or emulated environment. Apart from the fact that there is typically no easy way of accessing this information on modern platforms, the user can set the refresh rate to anything they want (or their device supports), and then of course there is stuff like FreeSync these days which abandons the concept of a constant refresh rate. On an oldskool console like NES it makes a lot more sense of course.

Q: Is this behavior worth adding to my new tracker?
This sounds like something very specific to that platform so I don't have any input on that.

Offline nyanpasu64

Re: How does OpenMPT's audio pipeline work?
« Reply #22 on: October 10, 2019, 01:41:37 »
Thanks for the help.

A general answer: Even if not perfect, I'd usually prefer a C++ API over a C API when writing C++, unless there are very good reasons not to use it. Even if it does not support RAII by itself, you can still build your own RAII patterns on top, which is what OpenMPT does for various C libraries that it uses.
Forgot to mention, portaudiocpp takes PortAudio return codes and converts them into C++ exceptions. If I don't religiously add catch-all clauses, can this cause audio errors (like unplugging an audio output) to bring down my entire audio thread or program? (Apparently an uncaught exception ends in std::terminate(), which aborts the whole process.) Is this a reason to avoid portaudiocpp?

Quote
Obviously this won't work outside of the console or emulated environment. Apart from the fact that there is typically no easy way of accessing this information on modern platforms, the user can set the refresh rate to anything they want (or their device supports), and then of course there is stuff like FreeSync these days which abandons the concept of a constant refresh rate. On an oldskool console like NES it makes a lot more sense of course.

I said vblank, but I meant tracker ticks. FamiTracker (and my new tracker) are designed to behave the same in the C++ emulation and in the 6502 asm driver (for NSF file export). As a result, tracker ticks are usually aligned with vblanks (NTSC/PAL), which the C++ code emulates based on how many cycles were run by the audio thread. But FamiTracker and NSF files have an option to customize the tick rate. (Fun fact: one particular C++ synthesizer had a bug which caused vblank to be delayed every time you switched instruments.)
« Last Edit: October 10, 2019, 01:47:28 by nyanpasu64 »

Offline Saga Musix

Re: How does OpenMPT's audio pipeline work?
« Reply #23 on: October 12, 2019, 13:07:01 »
Forgot to mention, portaudiocpp takes PortAudio return codes and converts them into C++ exceptions. If I don't religiously add catch-all clauses, can this cause audio errors (like unplugging an audio output) to bring down my entire audio thread or program? (Apparently an uncaught exception ends in std::terminate(), which aborts the whole process.) Is this a reason to avoid portaudiocpp?
Generally if a function is documented to throw something you should of course expect to catch it. I don't know if this was a reason for choosing the PortAudio C API in OpenMPT, but in general it should be said that exception handling in C++ is much more efficient in modern compilers than it used to be.
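The usual defensive shape for this, sketched with invented names (real audio-callback signatures take buffers and frame counts, omitted here): put a catch-all at the top of the callback so an exception from the rendering code is converted into an error return instead of escaping into the C audio driver and terminating the process.

```cpp
#include <stdexcept>

// Stand-in for the actual rendering code, which may throw (as
// portaudiocpp-style wrappers do on device errors).
int RenderAudio(bool deviceLost) {
    if (deviceLost)
        throw std::runtime_error("audio device unplugged");
    return 0;  // "continue" status, by analogy with PortAudio's paContinue
}

// The callback itself is noexcept: exceptions must not cross the C boundary.
// Catch everything and translate it into an "abort stream" status code.
int StreamCallback(bool deviceLost) noexcept {
    try {
        return RenderAudio(deviceLost);
    } catch (...) {
        return 1;  // "stop/abort" status instead of std::terminate
    }
}
```

The GUI thread can then notice the stopped stream and report the error, which is friendlier than the whole program disappearing when a headphone cable is pulled.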

I said vblank, but I meant tracker ticks.
I tried reading your post again with that perspective in mind but I'm still not quite sure how it would help you or simplify anything. As mentioned before, OpenMPT doesn't attempt to align tracker ticks with anything else in the audio pipeline, in particular since their duration can vary.