Unstable volume when using WAVE_FORMAT_IEEE_FLOAT

TheRealByteraver · December 03, 2019, 19:28:37

Hi, I changed the type of the output that I send to the Windows audio driver through the WAVE_MAPPER interface from 32-bit integer to 32-bit floating point. This works fine; however, I noticed the following:
- The music sounds a lot louder, even though I did not change the gain internally or anything.
- The volume varies so much during replay it is quite noticeable. It seems like the audio driver / soundcard / windows is doing some kind of on-the-fly compression of the sound. Does anybody have any experience with this?

Saga Musix · December 03, 2019, 19:31:45

Quote from: TheRealByteraver on December 03, 2019, 19:28:37
- The music sounds a lot louder, even though I did not change the gain internally or anything.

You most likely changed the scale. If you were rendering 32-bit integer audio, your 0dB level was probably not at int32_max, was it?

Quote from: TheRealByteraver on December 03, 2019, 19:28:37
- The volume varies so much during replay it is quite noticeable. It seems like the audio driver / soundcard / windows is doing some kind of on-the-fly compression of the sound. Does anybody have any experience with this?

Yes, the Windows audio backend applies a limiter unless you are using the device in exclusive mode. The fact that it does this with only your own audio playing further confirms that you are rendering your audio louder than intended, i.e. your output exceeds 0dB.

TheRealByteraver · December 04, 2019, 10:10:34

OK, so I had a look at it again today with a fresh mind and fixed it in about a minute. I am ashamed to admit that the issue was as trivial as you suggested: I overshot the (-1, +1) range. I divided the 24-bit integer values (the mixer is still fixed point internally) by 32768 instead of by (32768 * 256). It is interesting though how the soundcard / driver (?) adjusts the gain for you, without causing crackle or anything, which is kinda cool. So I still learned something. It did mask the problem a bit for me, though.
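To make the scaling mistake concrete, here is a minimal Python sketch that maps signed 24-bit fixed-point samples to float. The function names are hypothetical, not taken from anyone's actual mixer. Dividing by 2^23 (= 32768 * 256) puts full scale at ±1.0; dividing by 32768 instead overshoots by a factor of 256, roughly 48dB.

```python
FULL_SCALE_24 = 1 << 23  # 8388608 == 32768 * 256

def int24_to_float(sample: int) -> float:
    """Map a signed 24-bit sample in [-2^23, 2^23 - 1] to [-1.0, 1.0)."""
    return sample / FULL_SCALE_24

def int24_to_float_buggy(sample: int) -> float:
    """The mistake described above: dividing by the 16-bit full scale."""
    return sample / 32768

peak = (1 << 23) - 1                # largest positive 24-bit value
print(int24_to_float(peak))         # just under 1.0
print(int24_to_float_buggy(peak))   # about 256.0, far above 0dB
```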

Thank you for replying - and for your patience.

Saga Musix · December 04, 2019, 10:13:04

Quote
It is interesting though how the soundcard / driver (?) adjusts the gain for you, without causing crackle or anything, which is kinda cool.

Well, it sounds nice in theory (for daily "consumer" usage), but in practice, when producing music, I think it's a huge problem. That is why OpenMPT clips floating-point output by itself, to keep the sound consistent between integer and floating-point output.
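A minimal sketch of what clipping floating-point output yourself can look like, assuming the mixed buffer is a plain list of floats. The helper name is hypothetical and this is not OpenMPT's code: every sample is hard-clipped to [-1, +1] before it is handed to the device, so the level reaching the Windows mixer never exceeds 0dB, and the result matches what an integer output path produces when it saturates.

```python
def clip_to_unit(samples):
    """Hard-clip each float sample to the nominal [-1.0, 1.0] range."""
    return [max(-1.0, min(1.0, s)) for s in samples]

mixed = [0.25, 1.5, -2.0, -0.75]  # made-up mixer output; two samples exceed 0dB
print(clip_to_unit(mixed))        # [0.25, 1.0, -1.0, -0.75]
```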

TheRealByteraver · December 04, 2019, 20:48:52

I agree. I found the volume change quite unsettling.

Does ModPlug clip its output at [-1/16, +1/16] then, or does that question make no sense?

Saga Musix · December 04, 2019, 22:18:20

The output is clipped to the range [-1.0, 1.0] after conversion. I assume you think there is a 1/16 factor there because the internal format is 4.28 fixed point, but no: those extra 4 bits of headroom are essentially clipped away if they are used, just like with integer output formats.

jmkz · December 07, 2019, 10:07:10

As a side(?) note: almost all vendor-integrated sound devices have sound processing enabled by default in their drivers (look for APO filters, i.e. Audio Processing Objects). This may lower or raise the volume or otherwise alter the intended sound, and automatic impedance detection on the device or motherboard can do the same. The best way around this is to use WASAPI (exclusive mode), WaveRT or ASIO (if suitable drivers are available), which gives you a short audio path.

TheRealByteraver · December 07, 2019, 15:38:53

Thanks for the replies. I suppose the 4 bits of headroom you mention are located in the exponent part of the 32-bit float value, then?

Saga Musix · December 07, 2019, 18:10:37

It simply means that the internal full range of -2^31...+2^31 is equal to -16...+16 in float, but the effective output is clipped to -1...+1.
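As an illustration, here is a sketch assuming a signed 32-bit mixer sample with 4 bits of headroom, so that nominal 0dB sits at 2^27. The constant and function names are made up for this example. The full int32 range maps to [-16, +16], and the final clip brings it back to [-1, +1].

```python
HEADROOM_BITS = 4                 # bits available above nominal 0dB
UNITY_SHIFT = 31 - HEADROOM_BITS  # 27: nominal 0dB sits at 2^27

def mixer_to_float(sample: int) -> float:
    """Convert a signed 32-bit mixer sample to float so that 2^27 maps to 1.0."""
    return sample / (1 << UNITY_SHIFT)

def mixer_to_output(sample: int) -> float:
    """Convert, then hard-clip to the nominal output range [-1.0, 1.0]."""
    return max(-1.0, min(1.0, mixer_to_float(sample)))

print(mixer_to_float(-(1 << 31)))      # -16.0
print(mixer_to_float((1 << 31) - 1))   # just under 16.0
print(mixer_to_output((1 << 31) - 1))  # 1.0 after clipping
```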

TheRealByteraver · December 08, 2019, 16:55:50

Hi again! Sorry for being so ignorant, but I just don't get it. I watched a few videos on the IEEE 754 float format (which was interesting), and I can't figure out where the 28 significant bits come from, or where the [-16, 16] range you mentioned comes from. I understand it might take us a little too far to explain this in detail, but do you have some more information on this subject that I could read up on? For context, I watched the following videos explaining the floating-point format (the Indian kid explains it very well):
https://www.youtube.com/watch?v=8afbTaA-gOQ
https://www.youtube.com/watch?v=LXF-wcoeT0o

At first, I naïvely thought that, to represent the first digit of a decimal number in scientific notation, you need 4 bits, hence the -16...+16 range, but after watching the above vids I realised that was just silly.

Quote
It simply means that the internal full range of -2^31...+2^31 is equal to -16...+16 in float

--> Is this some convention that was made in the context of digital audio? Do you know how this gets converted?

edit: typo

Saga Musix · December 08, 2019, 17:27:47

Okay, I think to understand this you need to throw everything you know about the internal representation of floating-point numbers overboard, because it doesn't matter here. OpenMPT doesn't do any bit-hacking tricks to convert integers to floats, so exponents and mantissas and however many bits they have are completely irrelevant to this discussion. All that is relevant here is that floating-point numbers can represent fractional values, such as numbers between 0 and 1.

With that said:
1. By convention, a range of [-1,+1] is used in the world of floating-point audio to designate unclipped audio, that is, 0dB is the maximum volume. If you exceed this range, you get volumes above 0dB. This range can be exceeded temporarily (e.g. while the audio is being processed in, say, a VST plugin), but at some point the audio data has to be converted from digital to analog by a DAC, and this DAC will only output a signal that is no louder than 0dB. So if the signal has not been clipped before, it will be clipped at this point.
2. In the integer audio world there is no such nominal range, because different applications have different needs with regard to headroom and precision. OpenMPT's choice of 4 bits of headroom (i.e. possible volume above 0dB) is completely arbitrary in this sense. The headroom of a floating-point implementation, on the other hand, would be virtually unlimited (limited only by the largest number your floating-point type can express).
3. 2^4=16, hence the signal can exceed 0dB by a factor of 16 during processing (its amplitude can be 16 times the nominal 0dB level, or about 24dB: 4 bits * 6dB/bit = 24dB). Translated to the floating-point world, this means that the possible range in float is [-16,+16].
4. Conversion from integer to floating point is simply done by dividing so that the nominal 0dB of the integer range corresponds to the +1/-1 float range. In OpenMPT's case, this means dividing the integers from the mixer by 2^27 (the 32 bits are 1 sign bit + 4 bits of headroom + 27 precision bits). Any sample value outside of [-2^27,+2^27] in the integer range will yield a floating-point number outside of [-1,+1], i.e. volume above 0dB. As the full range of a 32-bit integer is [-2^31,+2^31-1], which is 16 times larger than the [-2^27,+2^27] range, the floating-point output of OpenMPT cannot exceed the range [-16,+16].
5. Due to the aforementioned Windows mixer effects, anything outside of [-1,+1] is clipped to that range, even if it is not necessary from a technical point of view.
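The headroom arithmetic in points 3 and 4 can be checked in a few lines. This is just a sketch with a hypothetical helper name; the exact figure is 20*log10(2), about 6.02dB per bit, which the "6dB/bit" rule of thumb rounds down.

```python
import math

def headroom_db(bits: int) -> float:
    """Level above 0dB gained from `bits` bits of integer headroom."""
    return 20 * math.log10(2 ** bits)

print(round(headroom_db(1), 2))  # 6.02 dB per bit
print(round(headroom_db(4), 2))  # 24.08 dB for 4 bits of headroom
print(2 ** 4)                    # 16: the float range becomes [-16, +16]
```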

Maybe that clears it up?

TheRealByteraver · December 08, 2019, 18:13:49

That absolutely, positively clears it up!

Thank you for your fast and elaborate answer. As you could probably guess, I was confused by the 28 bits of precision, as a 32-bit float only has 24 (useful) bits of precision (which is more than enough, of course). I couldn't figure out how to fit 28 bits inside 24 bits.

I'll edit my mixer to clip/clamp the float values as well, rather than making sure I never go outside of the [-1,+1] range. I should probably convert it entirely to floating point; I might do so when the time comes to optimize the mixer for speed.

Saga Musix · December 08, 2019, 18:18:39

Quote
I couldn't figure out how to fit 28 bits inside 24 bits

This leads to an interesting point, actually: the answer is that you can't, not without a loss of precision. Within the range it supports, a 32-bit integer has a higher precision than a 32-bit floating-point number, obviously at the expense of a smaller range of values. Hence, in the professional signal processing world (and by that I don't mean VST plugins or similar consumer audio stuff), you will sometimes see integers being used rather than floats, because sometimes this extra precision and the lack of rounding errors count for more than the increased dynamic range. It's either precision or dynamic range, and in the audio world dynamic range is typically more important.
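A quick way to see the precision loss, using Python's `struct` module to round a value to IEEE 754 single precision (the helper name is made up for this sketch): 2^24 + 1 is exact as a 32-bit integer, but a float32 has only 24 significant bits, so it rounds to 2^24.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# 2^24 + 1 fits exactly in a 32-bit integer, but float32 keeps only
# 24 significant bits, so the lowest bit is lost in the conversion.
n = (1 << 24) + 1
print(n, to_float32(float(n)))  # 16777217 16777216.0
```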

TheRealByteraver · December 08, 2019, 19:24:52

The extra bits in the 32 bit integer representation leave more room for (lossless) maneuvering when producing. For the final result / rendering / output, 32-bit floats should be more than precise enough. To be honest, I'm not sure I can even hear the difference between 12-bit and 16-bit audio. I should give that a try.

Maybe volume changes would be more abrupt.

About dynamic range: have you ever tried listening to Carl Orff's Carmina Burana? Like from a CD? In the beginning you're like "weird, there is no sound coming from my speakers", so you glue your ears to the loudspeakers just to make sure. Then the sound explodes and, hop, hearing damage.

Maybe a higher dynamic range would make sense when recording this type of classical music. I wonder how many of the available 16 bits get used on a CD recording of this piece, in the silent part at the beginning I mean. Definitely not all 16.
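A back-of-the-envelope estimate, assuming roughly 6.02dB per bit: a passage whose peaks sit at L dBFS (L negative) spans about 16 + L/6.02 of the 16 bits. The helper name is hypothetical, and the -40 dBFS figure below is a made-up example level, not a measurement of any actual recording.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB

def bits_used(peak_dbfs: float, word_length: int = 16) -> float:
    """Rough number of bits spanned by a passage peaking at `peak_dbfs` (<= 0)."""
    return word_length + peak_dbfs / DB_PER_BIT

print(round(bits_used(0.0), 1))    # 16.0: full scale
print(round(bits_used(-40.0), 1))  # 9.4: a hypothetical quiet passage
```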

Saga Musix · December 10, 2019, 16:54:39

Quote
The extra bits in the 32 bit integer representation leave more room for (lossless) maneuvering when producing

No, that's exactly the opposite of what I tried to explain in my previous post: you actually have less "room" (i.e. dynamic range), so if you are not extremely careful, you will easily end up with clipped or, even worse, wrapped-around signals. Common integer-based DSPs luckily have saturating arithmetic in their instruction sets, but common desktop CPUs lack this, which makes integer audio very difficult to work with if you expect audio above 0dB, even just momentarily. This is why floating point is much easier to work with during production. On the other hand, for the final result, dithered 16-bit integer is more than enough in most cases (even though audiophiles might try to convince you otherwise).
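To illustrate the difference between wrap-around and saturation, here is a sketch that emulates 32-bit two's-complement addition in Python. Python's own integers never overflow, so the wrapping is simulated with a mask, and the function names are made up. Note that in C, signed integer overflow is formally undefined behavior; wrapping is simply what typical hardware does.

```python
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1

def add_wrapping(a: int, b: int) -> int:
    """32-bit two's-complement addition, wrapping around on overflow."""
    r = (a + b) & 0xFFFFFFFF
    return r - (1 << 32) if r > INT32_MAX else r

def add_saturating(a: int, b: int) -> int:
    """Saturated addition: out-of-range results stick to the limits."""
    return max(INT32_MIN, min(INT32_MAX, a + b))

loud = 1_500_000_000
print(add_wrapping(loud, loud))    # -1294967296: a huge jump to negative
print(add_saturating(loud, loud))  # 2147483647: clipped at full scale
```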
If you are doing extremely high-quality digital signal processing, the small rounding and re-quantization errors introduced by floating-point processing might be relevant, but for a typical audio mixer in a tracker or DAW it really doesn't matter: the distortion introduced by those errors is far below the threshold of hearing. It's much more important to have the increased dynamic range available there.

You should really watch the classic Digital Show & Tell video to get a good understanding of this, it's 24 minutes well spent if you are interested in the topic of audio processing.

Quote
Maybe a higher dynamic range would make sense when recording this type of classical music.

As pointed out above, it may make sense to use equipment with higher sensitivity during recording, but keep in mind that even with the best equipment in the world, a 24-bit ADC or DAC is not really 24-bit under real-world circumstances. I don't recall exact numbers or sources, but I think that with common audio equipment you will in practice never get above 20 bits of resolution. Even 20 bits would be plenty of dynamics, though. For the final result on CD, you can be sure that all 16 bits will be used. That of course does not mean that only the lowest bit is used in the quietest parts: even the quietest part of that piece will have several bits of resolution, which is required because otherwise it would just be a distorted mess of 1s and 0s. Again, dithering helps increase the perceived dynamic range there (see the video above).
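For the dithering point, here is a minimal sketch of quantizing float samples to 16-bit with TPDF (triangular probability density function) dither, which is one common choice, not necessarily the one shown in the video. The helper name, the 32767 scale factor and the test level are assumptions for illustration. A constant signal of 0.3 LSB disappears entirely without dither, but survives on average with it.

```python
import random

def quantize_16(x: float, dither: bool = True, rng=random) -> int:
    """Quantize a float sample in [-1.0, 1.0] to signed 16-bit, optionally with TPDF dither."""
    scaled = x * 32767.0  # assumption: full scale mapped to 32767
    if dither:
        # Triangular (TPDF) noise spanning +/-1 LSB: the sum of two uniform values.
        scaled += rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
    return max(-32768, min(32767, round(scaled)))

rng = random.Random(1)
level = 0.3 / 32767.0  # constant signal of 0.3 LSB, below one 16-bit step
plain = [quantize_16(level, dither=False) for _ in range(10_000)]
dithered = [quantize_16(level, rng=rng) for _ in range(10_000)]
print(sum(plain) / len(plain))        # 0.0: the signal vanishes
print(sum(dithered) / len(dithered))  # close to 0.3: preserved on average
```

The dither adds a little broadband noise, but in exchange the quantization error no longer depends on the signal, which is what keeps very quiet detail audible instead of turning it into distortion.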