I tried to change the MQTT input node settings as you recommended, but the audio quality has not changed.
I can hear what others say, but the sound is clipping. After checking the MQTT data that I send from the Raspberry Pi Zero, I found that sometimes the data is full of zero bytes. Maybe that is the reason for the clipping.
I don't think your problem has anything to do with MQTT. The dashboard's audio-out node can simply only play a single audio buffer, for example an mp3 file. You push an infinite stream of audio fragments to it, which is not what this node is designed for...
I have tried in the past to implement this feature, but it was just too hard in my limited spare time.
You can easily fool your eyes by sending them separate images: your eyes will believe they see fluent video.
It is very hard to fool your ears: they will hear every abnormal transition from one segment to the next.
At one moment the audio chunks will arrive too soon, so some of them will be skipped by the audio node. At another moment the audio chunks won't arrive fast enough, which means there will be a gap of silence. In both cases you get steep edges as transitions between successive audio fragments. And that is what your ears hear.
The audio node immediately plays a fragment as soon as you pass it one. I tried to add a buffer to it in the past (i.e. an array where N input fragments are stored), and then it was already much better: if the buffer is large enough, the node plays all the segments nicely after each other, so there are no steep edges between the fragments.
However the buffer size needs to stay small, otherwise the delay becomes too noticeable. And then you get other distortions again.
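To illustrate the idea (a Python-style sketch, not the dashboard node's actual JavaScript code): playback only starts once N fragments are queued, and silence is emitted on underrun:

```python
import queue

PREBUFFER_CHUNKS = 8      # N: larger = smoother playback, but more delay
CHUNK_BYTES = 4096        # size of one audio fragment (assumption)
SILENCE = b"\x00" * CHUNK_BYTES

incoming = queue.Queue()  # fragments arriving from the network
started = False           # playback only starts once the buffer is filled

def next_chunk():
    """Return the next fragment the player should output."""
    global started
    if not started:
        if incoming.qsize() < PREBUFFER_CHUNKS:
            return SILENCE        # still pre-filling: output silence
        started = True
    try:
        return incoming.get_nowait()  # normal case: play a queued fragment
    except queue.Empty:
        return SILENCE            # underrun: a gap, i.e. a steep edge
```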
When you search on this forum, you will quickly find that others have already tried this before you (e.g. here)...
Thank you so much!
Your explanation is very clear and intelligible. I learned so much from it.
As I said above, I also tested streaming live audio via MQTT between 2 Raspberry Pi Zeros, and the output sound is quite clear.
In the picture below, when I wanted to send raw audio from the microphone (plugged into the machine where Node-RED is installed) to the Raspi Zero, I got steep edges as you said.
I mean that in the first case the Raspi Zero can reproduce the audio nicely, but not in the second case.
I think the audio is transferred in the same way in both cases: the raw data from the microphone is split into separate buffers (the buffer size can be set, as you said) before being sent via MQTT.
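For reference, the sending side looks roughly like this (a simplified sketch; I'm assuming PyAudio and paho-mqtt here, and the broker address and topic name are just placeholders):

```python
import pyaudio
import paho.mqtt.client as mqtt

CHUNK = 1024              # frames per buffer (the size I can change)
RATE = 44100
BROKER = "192.168.1.10"   # placeholder broker address
TOPIC = "audio/raw"       # placeholder topic name

client = mqtt.Client()
client.connect(BROKER, 1883)
client.loop_start()

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                 input=True, frames_per_buffer=CHUNK)

try:
    while True:
        # one raw PCM buffer from the mic (CHUNK frames = 2048 bytes here)
        data = stream.read(CHUNK, exception_on_overflow=False)
        client.publish(TOPIC, data)   # send it as a binary MQTT payload
finally:
    stream.stop_stream()
    stream.close()
    pa.terminate()
    client.loop_stop()
```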
I can't understand what the difference is here.
This is something I cannot really understand. First of all, what do you mean by "quite clear"? During my experiments I had quite some irritating sounds; based on those experiments there was no way to get a clear audio signal on my speakers this way.
The only solution (for the time being) was smoothing the transitions between (out-of-order) audio segments, to get rid of the steep edges.
I don't know whether it is possible to share a short audio sample here, so that we can hear the result of your test?
Although your question is very well documented, I don't understand the difference between the experiments:
A live audio stream via MQTT between 2 Raspberry Pi Zeros gives good quality.
A live audio stream between Node-RED and a Raspberry Pi is full of noise.
Isn't that the same? I assume you also run Node-RED on both Raspberry Pi Zeros?
A quick sketch of both setups would probably help, so that I can understand which nodes are running where...
In both cases (PC and Raspberry Pi) you listen to the audio via your Node-RED dashboard? Do you mean that in the second case you run the browser on the Raspberry Pi itself? It would also help if you explained this in your sketch...
Can you explain a bit more why you think this could be the cause?
From your screenshots I assume that you use the microphone node in all your experiments to capture the voice? Or is that not correct? And is it this microphone node?
Note that you could perhaps use my node-red-contrib-msg-speed and node-red-contrib-msg-size nodes to calculate statistics about how many segments are travelling through your wires, and the sizes of those segments. If you think the data is chunked somewhere, you should see different statistics in your two experiments...
To understand where the chunking could get messy, it would be very useful to see the entire trajectory (from microphone to speakers) in a single (manually drawn) sketch...
From this we can conclude that Python does it very well, both for capturing the audio from the USB microphone and for playing that audio on your speaker. So Python somehow reassembles the audio chunks fine. Is there perhaps a large delay? That would indicate that they use a buffer that is large enough. If that is not the case, they must do some clever processing of the chunks.
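For comparison, a minimal Python receiver of that kind (just a sketch, assuming PyAudio and paho-mqtt, with placeholder broker and topic names) relies on the blocking stream.write(), which paces the chunks by itself:

```python
import pyaudio
import paho.mqtt.client as mqtt

RATE = 44100
TOPIC = "audio/raw"   # placeholder: same topic the sender publishes to

pa = pyaudio.PyAudio()
# frames_per_buffer controls PyAudio's internal output buffer
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                 output=True, frames_per_buffer=1024)

def on_message(client, userdata, msg):
    # write() blocks until the PCM data is handed to the audio device,
    # so successive chunks are played back-to-back without steep edges
    # (as long as they arrive on time)
    stream.write(msg.payload)

client = mqtt.Client()
client.on_message = on_message
client.connect("192.168.1.10", 1883)  # placeholder broker address
client.subscribe(TOPIC)
client.loop_forever()
```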
So we need to find out which of those two parts (capturing or playing) fails in Node-RED, by doing two extra experiments.
By doing this, we only replace the Python library (which captures the USB microphone audio) with the node-red-contrib-micropi node. If this works well, then we at least know that the micropi node is doing a good job.
By doing this, we only replace the Python library (which plays the audio on the speaker) with the ui-audio-out node. I don't expect this to give good audio quality, due to the web-audio issue that I described above...
From your experiment (let's call it experiment 4) I conclude that the node-red-contrib-micropi node captures the audio very well, since it results in good audio after playback.
For some reason the dashboard audio-out ui node is able to play it rather well: I assume the chunks arrive in a pretty decent order, otherwise I cannot explain this. But that won't always be the case.
You could also do experiment 5:
Because it seems that MQTT introduces timing issues, which is just too much for the dashboard audio-out node to be able to play the stream correctly.
That could indeed be a quick workaround.
But if nobody digs into this problem, we will keep getting the same issue until eternity.
Would be nice if we could have a complete Node-RED solution...
Because there are enough use cases for this.
We need a player that can handle infinite streams of audio chunks and can seamlessly play them sequentially. But it should also support smoothing of steep edges when the next chunk does not arrive in time.
Because it takes too much time to develop something like that yourself. Some things I tried in the past:
Mostly they refer to solutions where you schedule the chunks via the Web Audio API. But I still got way too much distortion.
I tried a data source that continuously created zero values, and then overwrote those zeros with real audio values when available. This was much better compared to solution 1, but still bad on Android.
I haven't tried smoothing the edges due to a lack of free time.
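Just to show what I mean by smoothing: an untested numpy sketch that ramps a chunk in over a few milliseconds when it follows a gap, instead of jumping straight to full amplitude:

```python
import numpy as np

RATE = 44100
RAMP_MS = 5   # ramp length: a few milliseconds is usually inaudible

def smooth_edge(chunk_bytes, after_gap):
    """Apply a short linear fade-in to a 16-bit mono PCM chunk
    that follows a gap, to avoid a steep edge."""
    samples = np.frombuffer(chunk_bytes, dtype=np.int16).astype(np.float32)
    if after_gap:
        n = min(len(samples), RATE * RAMP_MS // 1000)
        # scale the first n samples from 0 up to full amplitude
        ramp = np.linspace(0.0, 1.0, n, dtype=np.float32)
        samples[:n] *= ramp
    return samples.astype(np.int16).tobytes()
```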
But I'm not sure whether a library like e.g. pcm-player can solve our issue, because (when looking at the code) it only applies fading when it is flushed (by default every second) but not in between, and steep edges can happen anywhere. But I haven't analyzed it in detail...
I tested this case.
I found an important thing which you said before: if I change the buffer size (frames_per_buffer) to the "proper value", the sound is smoother than with other values. And I got the same quality as in Experiment 1.
Maybe the issue here is the "proper" buffer size?
I've also changed that value (1024 bytes, 2048 bytes, 4096 bytes, ...) for Experiment 2, but I couldn't get clear sound...
I assume frames_per_buffer is a setting of the Python player? And what do you mean by "proper value"?
A larger player buffer means that usually all segments will already be available by the time the player starts playing them, so there are fewer steep edges...
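For clarity, in PyAudio that parameter is passed when the stream is opened, and it trades latency for smoothness (the numbers below are just an example):

```python
import pyaudio

pa = pyaudio.PyAudio()
# A larger frames_per_buffer gives the player more headroom (fewer steep
# edges), at the cost of extra latency: 4096 frames at 44100 Hz is ~93 ms.
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=44100,
                 output=True, frames_per_buffer=4096)
```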