Thank you so much for this explanation!
Also the video was very, very well made. I'll have to watch it a few times because the guy goes
fast. I was shocked to hear that cassette tapes were probably no better than 6 bits.

I always wondered what dithering meant, and now I have an idea. If I understood it correctly, you could in theory remove some noise from 8-bit samples by converting them to 16 bit and then adding a (high-frequency?) dithering signal with a maximum amplitude of 256, which is one 8-bit step at 16-bit scale. Could that be used as an alternative to interpolation? I suppose interpolation would be faster, though, even the sinc variant.
I was planning to do some programming, but here I am, two hours into reading Wikipedia again.
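To make the question concrete, here is a rough Python sketch of the conversion I have in mind. It only shows the arithmetic; I am not claiming it actually reduces noise, since as far as I understand dither is normally added when going down in bit depth, not up. The function names are made up, the 8-bit input is assumed to be unsigned (0..255, as in 8-bit WAV), and the dither is plain triangular random noise rather than anything shaped towards high frequencies.

```python
import random

def widen_8_to_16(sample_8bit):
    """Scale an unsigned 8-bit sample (0..255) to signed 16-bit range."""
    # Assumption: unsigned 8-bit input with 128 as the zero level.
    return (sample_8bit - 128) * 256

def add_dither(sample_16bit, max_amplitude=256, rng=random):
    """Add triangular (TPDF) noise within +/- max_amplitude, then clamp."""
    # The difference of two uniform values has a triangular distribution.
    noise = (rng.random() - rng.random()) * max_amplitude
    value = sample_16bit + int(round(noise))
    return max(-32768, min(32767, value))

def convert(samples_8bit, seed=0):
    """Widen every sample and dither it, using a seeded generator."""
    rng = random.Random(seed)
    return [add_dither(widen_8_to_16(s), rng=rng) for s in samples_8bit]
```

Calling convert([0, 128, 255]) gives three 16-bit values that each stay within 256 of the plain widened sample, except where clamping at the bottom of the range cuts the noise off.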

About the bit depth: after posting my previous reply, it occurred to me that if you use 32-bit floats, your signal will never be
less precise than 24 bits, since a 32-bit float has a 24-bit significand. It is indeed a lot easier to work with than fixed-point integers for mixing. All you need to do is scale the result down again at the end of the mixing process (or perhaps up a little).
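A small Python sketch of both points, in case it helps. The helper names are my own invention, and Python's built-in float is 64 bit, so to_float32 goes through struct to get real 32-bit rounding. The mixing part only illustrates the workflow (sum in floating point, scale down once at the end); it assumes equal-length tracks of signed 16-bit samples.

```python
import struct

def to_float32(value):
    """Round a Python float (64 bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]

def int16_to_float(sample):
    """Map a signed 16-bit sample to roughly -1.0..1.0."""
    return sample / 32768.0

def float_to_int16(value):
    """Map a float back to signed 16-bit, clamping out-of-range values."""
    scaled = int(round(value * 32767.0))
    return max(-32768, min(32767, scaled))

def mix(tracks):
    """Sum equal-length tracks in floating point, scaling down only at the end."""
    length = len(tracks[0])
    mixed = [sum(int16_to_float(t[i]) for t in tracks) for i in range(length)]
    peak = max((abs(v) for v in mixed), default=0.0)
    # Only reduce the gain when the sum would otherwise clip.
    gain = 1.0 / peak if peak > 1.0 else 1.0
    return [float_to_int16(v * gain) for v in mixed]
```

With this, to_float32(float(2**24 - 1)) comes back unchanged, while 2**24 + 1 does not survive the round trip, which matches the 24-bit significand. Mixing two full-scale tracks, mix([[32767, 0], [32767, 0]]), is scaled back to [32767, 0] instead of clipping in the middle of the sum.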