My project: ProTracker mod player, in Kotlin

Started by mabersold, September 21, 2021, 16:33:14


mabersold

I've gotten far enough that I think it is presentable to others: https://github.com/mabersold/kotlin-protracker-demo

I wanted to see if I could write a mod player, and I wanted to do it in Kotlin. This is the result. My goal was to get it to play one specific song: Space Debris (included with the code, no command-line arguments needed). I also had a secondary goal of making the code as easy to understand as possible (looking at other GitHub repos, I noticed that it was pretty common to have zero explanation of how the code worked - I wanted mine to be well documented).

My resampling algorithm basically calculates a "step" variable (a double) and continually adds it to an index reference to find the correct index in an instrument's audio data array. It also tracks how many steps remain before it reaches the next index, and does a simple slope calculation to interpolate. Because playback always reduces the period relative to the base instrument audio data, I do not implement any anti-aliasing.
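For anyone curious, the step-and-slope approach described above can be sketched roughly like this (function and parameter names here are illustrative, not the actual identifiers from the repo):

```kotlin
// Minimal sketch of step-based resampling with linear ("slope") interpolation.
// `audioData` holds the instrument's samples, `step` is derived from the note's
// period relative to the output rate, `outputLength` is how many samples to emit.
fun resample(audioData: DoubleArray, step: Double, outputLength: Int): DoubleArray {
    val output = DoubleArray(outputLength)
    var position = 0.0
    for (i in 0 until outputLength) {
        val index = position.toInt()
        val next = minOf(index + 1, audioData.size - 1)
        val fraction = position - index        // how far we are between the two source samples
        // simple slope calculation between adjacent source samples
        output[i] = audioData[index] + (audioData[next] - audioData[index]) * fraction
        position += step                       // advance by the period-derived step
    }
    return output
}
```

This is just a sketch of the general technique; the real player also has to handle looping and channel state.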

I should note that I only implemented enough to play one specific song. Any features that were not used for this song were not implemented, such as fine-tuning, arpeggio effect, and others.

For the most part this seems to work pretty well. I'm not sure my vibrato implementation is 100% correct, but it sounds good, at least. I also start playing all instruments at index 2, rather than index 0, because I read in various documents that the first two bytes are actually supposed to be looping data - I'm not sure if this is what I should have done, but it sounds fine.

Additional note: in the code and documentation, I use the word "sample" only to refer to the individual elements of a PCM stream, not the instruments. This is to avoid confusion and to not overload the word "sample."

A few ways this player could be modified:
-Implement remaining ProTracker features
-Change the audio generator to retrieve more than one sample at a time
-Identify and remove unnecessary calculations
-Use coroutines (Kotlin's lightweight concurrency mechanism) while generating audio or sending it to output
-Resample to 16-bit instead of 8-bit (probably not necessary unless I want to support other formats that use 16-bit instruments)
-Extract constants to a separate file
-Global volume control
-Reduce stereo panning separation
-General reorganization, refactoring, and documentation improvements

Enjoy. Any feedback is appreciated (keeping in mind that this is a demo, not an actual product).

Saga Musix

Nice little project.

Quote-Resample to 16-bit instead of 8-bit (probably not necessary unless I want to support other formats that use 16-bit instruments)
This got me a bit curious, and I had a look at the interpolation code. It seems like a rather convoluted way of expressing linear interpolation and could probably be simplified and sped up a lot (yeah yeah, any PC can play a 4-channel MOD these days... but where does it stop?). Anyway, since you are already dealing with double-precision floating-point numbers there it makes little sense to convert the result back to 8-bit audio. You throw away a lot of precision from the interpolation process and get stair steps in your interpolated data, so it would make more sense to keep the whole audio path in floating-point after interpolation. In the end, that's what modern audio APIs expect as input, anyway.
» No support, bug reports, feature requests via private messages - they will not be answered. Use the forums and the issue tracker so that everyone can benefit from your post.

mabersold

Quote from: Saga Musix on September 22, 2021, 19:31:34
This got me a bit curious, and I had a look at the interpolation code. It seems like a rather convoluted way of expressing linear interpolation and could probably be simplified and sped up a lot (yeah yeah, any PC can play a 4-channel MOD these days... but where does it stop?). Anyway, since you are already dealing with double-precision floating-point numbers there it makes little sense to convert the result back to 8-bit audio. You throw away a lot of precision from the interpolation process and get stair steps in your interpolated data, so it would make more sense to keep the whole audio path in floating-point after interpolation. In the end, that's what modern audio APIs expect as input, anyway.

Interesting, so are you saying the resulting PCM data can be represented as a collection of floats rather than a collection of bytes (or words/shorts if I'm in 16-bit)? I just assumed that since all the inputs are byte arrays, I might as well just keep the output as a collection of bytes as well.

Saga Musix

Well, just for the sake of an example, let's assume you play a sample so slowly that it takes 10 steps to go from the first sample to the second, and the first sample has value 1 and the second sample has value 0.

By representing the interpolated data as 8-bit values, you get this:
1 1 1 1 1 0 0 0 0 0
But by representing it as floating-point value, you get this:
1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1

Obviously the latter is a "real" linear interpolation, while the former is a very distorted variant of that. The final mixed result (the sum of the 4 channels) should also not be represented as an 8-bit signal - that is not at all what the Amiga (or even most PC trackers, except maybe for very early ones that could only use 8-bit soundcards) did. In addition to the 8-bit sample data you still have 6 bits of volume, so every channel effectively has a 14-bit resolution (just because a sample is played at volume 1 doesn't mean that it only alternates between on and off).
By doing all the mixing in floating-point, you are completely independent of the original sample resolution and the mixing is done exactly the same for all sample types, no matter if 8, 16 or 24 bits. Only the initial interpolation differs between those because the integer data has to be changed to floating-point first (usually by converting it to a nominal -1...+1 range, so -128 in an 8-bit sample turns into -1.0, and -32768 in a 16-bit sample turns into -1.0 as well).
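The normalization and float mixing described above could be sketched in Kotlin roughly like this (the function names and the equal-weight scaling are my own illustrative choices, not anything from the actual player):

```kotlin
// Convert a signed 8-bit sample to a nominal -1.0..+1.0 float: -128 -> -1.0.
fun byteToFloat(sample: Byte): Float = sample / 128.0f

// Mix several channels entirely in floating point. Dividing by the channel
// count is one simple way to keep the sum inside the -1.0..+1.0 range.
fun mix(channels: List<FloatArray>): FloatArray {
    val length = channels.minOf { it.size }
    val mixed = FloatArray(length)
    for (i in 0 until length) {
        mixed[i] = channels.sumOf { it[i].toDouble() }.toFloat() / channels.size
    }
    return mixed
}
```

A 16-bit sample would be normalized the same way with a divisor of 32768.0f, after which the mixing code is identical regardless of the source bit depth.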

Many similar questions have already been asked on this topic so you can find various posts about this here in the development corner.

mabersold

That makes sense - I'm aware of a stair-step issue when I keep the output as bytes, but I didn't know the PCM output could potentially be represented as floats. I'll have to see how this would work with the class I'm using for audio output. Converting to a -1.0 to +1.0 range doesn't sound particularly difficult (maybe I could even do part of this while loading the module: just make a separate list of audio data in this form for each instrument when I load, and refer to that during playback instead of the byte array). What I'm not sure about is how I'll need to modify my AudioFormat so the SourceDataLine will accept floating-point data - I guess I have some research to do.

I did notice that when I changed the sampling rate from 44100 to 48000, the song definitely had some distortion, probably a result of the stair-step issue you described.

mabersold

Update: Looks like it's not possible to write to audio output in PCM_FLOAT - at least on the system I'm using right now. I guess this is a limitation of the JVM.

That being said, I'll still make a change anyway - probably will convert to float when loading the mod, handle most of the sample generation as float within a -1 to 1 range, and convert to short before writing to audio output (I'm not 100% sure yet where the conversion to short will happen). It won't necessarily be efficient, but that's not really the goal anyway.
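The final float-to-short step could look something like this (a sketch under my own assumptions - the function name and clamping strategy are illustrative, not decided yet):

```kotlin
// Clamp each float sample to -1.0..+1.0 and scale it to the signed 16-bit
// range expected by a standard PCM_SIGNED output format.
fun floatToPcm16(samples: FloatArray): ShortArray {
    return ShortArray(samples.size) { i ->
        val clamped = samples[i].coerceIn(-1.0f, 1.0f)   // guard against overflow clicks
        (clamped * Short.MAX_VALUE).toInt().toShort()
    }
}
```

The resulting shorts would then be packed into a little-endian byte buffer before being written to the SourceDataLine, matching an AudioFormat along the lines of AudioFormat(44100f, 16, 2, true, false).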

I am curious how this would work in a different language. I may later attempt to re-implement this in a more efficient language like Rust - I don't know Rust very well yet, but I do find it intriguing and think this could be a good project for learning the language.

Saga Musix

In the end, the JVM always has to talk to the host operating system for sound output, so it's not completely technically impossible, but maybe not with the "classic" Java APIs that most probably were not written with modern multimedia functionality in mind. There are PortAudio bindings for Java, which Kotlin should be fully interoperable with from my understanding. That way, floating-point output should be easily doable on all platforms PortAudio is supported on. Rust definitely feels like a more suitable language for this kind of software.