Linux Audio

Started by Brozilla, September 02, 2016, 02:12:00


Brozilla

As a small C++ project, I'm looking to create a basic PCM player on Linux and move up from there (perhaps into a mixer). For the task, ALSA seems like the best way to go, but there's one thing that's been troubling me: how does panning work?
Assume I'd use the channel mapping feature defined here: http://www.alsa-project.org/alsa-doc/alsa-lib/group___p_c_m.html#ga7e082d9ea701709270b0674a0be23b09

The only issue, with my meager audio programming knowledge, is that it appears I'd need 2 PCM streams, representing Left and Right, to generate stereo. So let's say we represent panning as a uint8, 0->255, and we have volume as a uint8 as well. To get a center pan (127) we'd have L64 and R64.
If we want to move 50% to the right, the volumes would look like L48 and R80, while the pan value would be ~192.

The math might be butchered, but as stated: how would I implement panning? The best way to find out might be just writing the application, but I'm certain there's no need for 2 PCM streams to represent both channels. My understanding is that panning controls the L & R (or whatever) volumes, but I don't see a way to do that unless I'm interpreting the API incorrectly.

manx

I'm reordering the quotes

Quote from: Enumeratingw7 on September 02, 2016, 02:12:00
How does panning work?

So let's say we represent panning as a uint8, 0->255, and we have volume as a uint8 as well. To get a center pan (127) we'd have L64 and R64.
If we want to move 50% to the right, the volumes would look like L48 and R80, while the pan value would be ~192.

The math might be butchered but as stated, how would I implement panning?

There are actually 2 aspects to consider:
1.: The way you scale your input parameter (i.e. a GUI slider) to the actual input parameter of the panning algorithm itself (this mapping need not be linear)
2.: The panning algorithm itself
In most literature, these two just get squashed together and called the "panning law".
Some people also make a distinction between "panning" (roughly meaning positioning a sound source) and "balance" (relative adjustment of speaker volumes); however, these terms are mostly used sloppily and interchangeably in the wild.
I'll also simplify here (as is mostly done) and mostly talk about aspect 2 for both panning and balance; however, keep in mind that you may still additionally map the input parameter non-linearly.
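To make aspect 1 concrete, here is a minimal sketch mapping a 0..255 GUI slider (as in your question) to a pan parameter x in [-1..1]. The cubic taper is just one arbitrary choice of non-linear mapping, and slider_to_pan is a made-up name, not part of any API:

```cpp
#include <cmath>

// Map a 0..255 slider to the pan parameter x in [-1..1].
// The cubic taper makes positions near the center less sensitive;
// drop it for a purely linear mapping.
double slider_to_pan(unsigned char slider) {
    double x = (slider / 255.0) * 2.0 - 1.0; // linear part: [0..255] -> [-1..1]
    return x * x * x;                        // optional non-linear taper
}
```

The taper only reshapes the slider's feel; the panning law itself (aspect 2) is applied afterwards to the resulting x.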

So, we now have an input parameter x [-1..1] (or scaled y = ( x + 1.0 ) * 0.5; [0..1]) to map to 2 amplitude factors l and r [0..1].
There are basically 2 very simple options (and countless variations of these 2):
A.: l = ( 1.0 - y ) * 2.0 ; r = ( 0.0 + y ) * 2.0; (the resulting factors are actually in [0..2] in this case)
B.: l = ( x < 0.0 ) ? 1.0 : 1.0 - x; r = ( x > 0.0 ) ? 1.0 : 1.0 + x; (this, or similar, is what the "balance" knob on some HI-FIs may do)
For both of these, the resulting factor may additionally be scaled logarithmically/exponentially before being applied to the PCM signal.
However, note that both of these will not preserve the overall perceived volume for a listener being positioned at equal distance from both speakers (because the ear measures energy and not amplitude).
Better in that regard are:
C.: l = ( 1.0 - y ) ^ 0.5; r = ( 0.0 + y ) ^ 0.5;
D.: l = cos( pi * ( x + 1.0 ) * 0.25 ); r = sin( pi * ( x + 1.0 ) * 0.25 );
(I did not verify all the formulas thoroughly just now, so I might have gotten the math wrong in some places ;-)
There are lots of other options, just do a search for "panning", "balance", "panning law", "panlaw", "pan rule", "constant power panning", "equal power panning".
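The options above can be sketched directly in C++; the function names here are made up for illustration. Note how option D keeps l*l + r*r == 1 for every x, which is exactly the constant-power property:

```cpp
#include <cmath>
#include <utility>

const double pi = std::acos(-1.0);

// Option A: linear crossfade; factors end up in [0..2], unity gain at center.
std::pair<double, double> pan_linear(double x) {
    double y = (x + 1.0) * 0.5;
    return { (1.0 - y) * 2.0, (0.0 + y) * 2.0 };
}

// Option B: balance-style; only the side opposite the pan direction
// is attenuated (like the balance knob on some hi-fis).
std::pair<double, double> pan_balance(double x) {
    double l = (x < 0.0) ? 1.0 : 1.0 - x;
    double r = (x > 0.0) ? 1.0 : 1.0 + x;
    return { l, r };
}

// Option D: constant-power panning; cos^2 + sin^2 == 1, so the total
// energy (what the ear perceives) stays constant across the pan range.
std::pair<double, double> pan_constant_power(double x) {
    double a = pi * (x + 1.0) * 0.25;
    return { std::cos(a), std::sin(a) };
}
```

At center (x == 0), option D gives l == r == sqrt(0.5) ≈ 0.707, whereas A and B give 1.0; that difference is the "pan law" center attenuation you'll find discussed under the search terms above.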

Also note that the perceived meaning of "totally left" (x == -1) actually depends on the listening situation. If you have both speakers far away and close to each other (i.e. a very narrow angle), the difference might even be totally unnoticeable (the effect of not preserving equal power in the panning algorithm is also most obvious in this situation). When using headphones, "totally left" results in an unnatural audio signal hitting both ears (for natural sound sources, such a signal almost never occurs), which can totally freak out the brain for some listeners.

In a playback situation (in contrast to a production situation), adjusting balance instead of panning is generally more common and IMHO more useful and easier to adjust for the user (however, preferences in that regard may differ from one person to another). When doing audio production, balance is close to useless.

In general, placing a balance knob into each and every playback application is conceptually wrong, as this is a parameter of the playback system and should be controlled system-wide. Modern Windows as well as modern Linux (via PulseAudio) do it that way (macOS probably does too, but I cannot verify that). OpenMPT itself does not offer such a parameter either.


Quote from: Enumeratingw7 on September 02, 2016, 02:12:00
As a small C++ project I'm looking to create a basic PCM player in linux and move up from there (into perhaps a mixer.) For the task it seems like ALSA is the best way to go about it however there is one thing that's been troubling.

Assume I'd use the channel mapping feature on defined here: http://www.alsa-project.org/alsa-doc/alsa-lib/group___p_c_m.html#ga7e082d9ea701709270b0674a0be23b09

The only issue, with my meager audio programming knowledge, is it appears I'd need 2 PCM streams to represent Left and Right to generate stereo.

The best way might be writing the application but I'm certain there is no need for 2 PCM streams to represent both channels. My knowledge is that panning controls L & R/whatever volumes but I don't foresee a way to do that unless I'm interpreting the API incorrectly.

I'm not sure if I'm following the overall question correctly, so I may be stating obvious stuff here. You do not want to open 2 separate streams to do stereo with any API (ALSA included). What you want to do is open a single stream with 2 channels (the exact naming of all these aspects differs from one API to the other).
For most APIs (and I think also ALSA), you do not have to specify the channel layout explicitly if you want a standard one like mono or stereo. You just set the number of channels for your stream to 1 or 2, respectively.
With regard to applying the balance/panning, you can either do that yourself (by multiplying the individual sample values by l or r before sending them to the output API), or, if the API provides it, use the API to apply per-channel volume. Some APIs even provide a way to set balance directly, which then of course will use the panning algorithm of their choosing.
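Doing it yourself is straightforward. Here is a minimal sketch, assuming interleaved 16-bit stereo (each frame is an [L, R] pair, which is the usual layout for a 2-channel PCM stream); apply_pan is a made-up helper name:

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scale the left and right samples of an interleaved 16-bit stereo
// buffer by the amplitude factors l and r, before handing the buffer
// to the output API. Frames are laid out as [L, R, L, R, ...].
void apply_pan(std::vector<int16_t>& interleaved, double l, double r) {
    for (std::size_t i = 0; i + 1 < interleaved.size(); i += 2) {
        interleaved[i]     = static_cast<int16_t>(std::lround(interleaved[i]     * l));
        interleaved[i + 1] = static_cast<int16_t>(std::lround(interleaved[i + 1] * r));
    }
}
```

This is also why you don't need two streams: the panning lives entirely in the per-channel gains applied to one interleaved buffer.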

Also, I would advise against using ALSA directly unless you have to interface with the kernel sound-card drivers directly (especially if you are unsure or do not know, you almost certainly don't want to use ALSA, IMHO). ALSA is a very complex interface that exposes far more features than a simple playback application needs, and also far more complexity that you absolutely have to deal with yourself to even get it working at all across the various different system setups. If you are doing this just to learn, you may of course choose ALSA precisely for its complexity in order to learn that stuff; otherwise, there are far simpler and better-fitting alternatives available:
PortAudio and RtAudio are the most common low-level-ish playback and recording libraries. If you target Linux Desktop only, PulseAudio (via libpulse) is also an option.
libpulse-simple is probably the best thing to use if you are targeting the Linux desktop only. It supports precise latency information and configuration, is dead simple to use, and is probably the one that works most flawlessly on modern systems.
If you do not care about recording or precise playback latency or buffering requirements, libsdl or libao are even simpler.
In the Pro-Audio field there is also JACK.
And there are also various others that I won't name right now.

Brozilla

Quote from: manx on September 02, 2016, 11:09:08

So, we now have an input parameter x [-1..1] (or scaled y = ( x + 1.0 ) * 0.5; [0..1]) to map to 2 amplitude factors l and r [0..1].
There are basically 2 very simple options (and countless variations of these 2):
A.: l = ( 1.0 - y ) * 2.0 ; r = ( 0.0 + y ) * 2.0; (the resulting factors are actually in [0..2] in this case)
I'll read up on panning laws and research balance. Essentially, what I don't understand is how that translates to a position between the speakers, but clearly I didn't do enough research beforehand. I thought those looked like cos & sin, and your example D confirms that.

Quote from: manx
I'm not sure if I'm following the overall question correctly, so I may be stating obvious stuff here. You do not want to open 2 separate streams to do stereo with any API (ALSA included). What you want to do is open a single stream with 2 channels (the exact naming of all these aspects differs from one API to the other).
For most APIs (and I think also ALSA), you do not have to specify the channel layout explicitly if you want a standard one like mono or stereo. You just set the number of channels for your stream to 1 or 2, respectively.


PortAudio and RtAudio are the most common low-level-ish playback and recording libraries. If you target Linux Desktop only, PulseAudio (via libpulse) is also an option.
libpulse-simple is probably the best thing to use if you are targeting the Linux desktop only. It supports precise latency information and configuration, is dead simple to use, and is probably the one that works most flawlessly on modern systems.
If you do not care about recording or precise playback latency or buffering requirements, libsdl or libao are even simpler.
In the Pro-Audio field there is also JACK.
And there are also various others that I won't name right now.

ALSA wasn't my first choice, but I couldn't wrap my head around how things would work otherwise. Learning JACK is what I originally wanted. So if I'm correct, what determines the pan of a sound source is how the waveform itself is constructed?