Issues on converting OGG to WAV

Hi everyone!

Node-RED newbie here! If I make any obvious mistakes, please bear with me.

My project idea

After playing around with some flows, I wanted to create my first "real" flow. I'd like to create a flow that receives voice messages (ogg format) from a Matrix room and uses a transcription service to transcribe them.

First transcription

I set up the flow at the bottom of this thread and was able to successfully transcribe a wav sample.
I downloaded the German news show Tagesschau from YouTube and cut out the greeting phrase 'Guten Abend, willkommen zur Tagesschau.' For the transcription I used the Microsoft Azure Speech Recognition service with a free subscription. Right now it doesn't actually matter which service is used, as I just want to try it out. I have already found some self-hosted alternatives that also work offline.

The problem

It seems that all of the transcription services require the audio sample to be in wav format. Unfortunately, Matrix uses ogg as the default format for voice messages. Therefore, I need a flow that converts an ogg-formatted buffer to a wav-formatted buffer. I have found the following flows that seem to do the job:

  • sox - convert from node-red-contrib-sox-utils 0.5.2
  • ffmpeg conversion (1) -> from node-red-contrib-audio-convert 0.0.7
  • ffmpeg conversion (2) -> from node-red-contrib-media-utils 0.0.8

With both ffmpeg conversion flows I get the error message: FFmpeg failed to perform the conversion.

The sox convert node gives me 'couldnt get tmp file after conversion'.
The SoX response debug node returns this message:

sox FAIL formats: can't open input file `/dev/shm/f28272dd69304585input.ogg': Input not an Ogg Vorbis audio stream

The input for the conversion flow is in both cases (sox or ffmpeg) the msg object containing these two properties:

  • payload: byte array of the downloaded message
  • format: the format specifier of the source buffer 'ogg'

I was able to convert the ogg file on my local laptop via ffmpeg; the console output is below. The converted wav can be opened and played back with VLC with no issues. Looking at the output, it seems that the ogg sample contains Opus-encoded audio. I'm not sure if this is the same thing?

ffmpeg version n5.0 Copyright (c) 2000-2022 the FFmpeg developers  
 built with gcc 11.2.0 (GCC)  
 configuration: --prefix=/usr --disable-debug --disable-static --disable-stripping --enable-amf --enable-avisynth --enable-cuda-llvm --enable-lto --enable-fontconfig --enable-gmp --enable-gnutls --enable-gpl --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libdav1d --enable-libdrm --enable-libfreetype --enable-libfribidi --enable-libgsm --enable-libiec61883 --enable-libjack --enable-libmfx --enable-libmodplug --enable-libmp3lame --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-librav1e --enable-librsvg --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtheora --enable-libv4l2 --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxcb --enable-libxml2 --enable-libxvid --enable-libzimg --enable-nvdec --enable-nvenc --enable-shared --enable-version3  
 libavutil      57. 17.100 / 57. 17.100  
 libavcodec     59. 18.100 / 59. 18.100  
 libavformat    59. 16.100 / 59. 16.100  
 libavdevice    59.  4.100 / 59.  4.100  
 libavfilter     8. 24.100 /  8. 24.100  
 libswscale      6.  4.100 /  6.  4.100  
 libswresample   4.  3.100 /  4.  3.100  
 libpostproc    56.  3.100 / 56.  3.100  
Input #0, ogg, from 'voice-message.ogg':  
 Duration: 00:00:04.32, start: 0.000000, bitrate: 33 kb/s  
 Stream #0:0: Audio: opus, 48000 Hz, mono, fltp  
Stream mapping:  
 Stream #0:0 -> #0:0 (opus (native) -> pcm_s16le (native))  
Press [q] to stop, [?] for help  
Output #0, wav, to 'voice-message.wav':  
 Metadata:  
   ISFT            : Lavf59.16.100  
 Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, mono, s16, 768 kb/s  
   Metadata:  
     encoder         : Lavc59.18.100 pcm_s16le  
size=     404kB time=00:00:04.31 bitrate= 768.1kbits/s speed= 394x       
video:0kB audio:404kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.018836%
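
On the "is this the same?" question: Ogg is only the container, and the codec inside can be Vorbis, Opus, or something else. The SoX error above complains specifically about Ogg Vorbis, while ffmpeg reports Opus. A rough way to tell the two apart in code is to look at the first packet of the first Ogg page, which starts with `OpusHead` for Opus and with the byte 0x01 followed by `vorbis` for Vorbis. The sketch below assumes the buffer holds the start of an unencrypted Ogg file; `detectOggCodec` is a made-up helper name, not part of any of the nodes mentioned here.

```javascript
// Sketch: identify the codec carried in the first page of an Ogg stream.
// Assumption: `buf` is a Node.js Buffer holding the beginning of the file.
function detectOggCodec(buf) {
    // Every Ogg page starts with the capture pattern "OggS".
    if (buf.length < 27 || buf.toString('latin1', 0, 4) !== 'OggS') {
        return 'not-ogg';
    }
    // Byte 26 is the number of entries in the segment table; the packet
    // data begins right after the 27-byte page header and that table.
    const payloadStart = 27 + buf[26];
    const payload = buf.subarray(payloadStart, payloadStart + 8);
    if (payload.toString('latin1') === 'OpusHead') {
        return 'opus';
    }
    if (payload[0] === 0x01 && payload.toString('latin1', 1, 7) === 'vorbis') {
        return 'vorbis';
    }
    return 'unknown';
}
```

In a function node this could be called as `detectOggCodec(msg.payload)` right after the download, to see what the converter is actually being handed.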

FFMPEG flow as image

SoX flow as image

Does anyone have an idea or a good example of how to convert ogg to wav? I feel I'm really close to what I want to achieve, but I'm just missing this little piece of the puzzle.

Thanks!

Node-RED info

  • Node-RED v2.2.2 running as a service within an Ubuntu 20.04 LTS LXC container on Proxmox
  • ffmpeg version 4.2.4-1ubuntu0.1 (installed via apt)
  • SoX v14.4.2 (installed via apt)
  • Anything else missing?

Edit: minor rewording

@mfreudenberg Michael, welcome to the forum, and that is a well-presented bit of information!!

Do you have a small buffer of data you could provide for testing? If you can't attach it to a reply, try clicking on my icon and sending me a message with the file attached.

And also export your flow and attach it to a reply.

Hi, I'm currently on mobile. It seems I cannot download the flow. I'll post it as soon as I am at my laptop.


Here you go:

My Flow as JSON

[
    {
        "id": "46213a2c45827dd0",
        "type": "tab",
        "label": "Matrix STT",
        "disabled": false,
        "info": "this flow contains nodes, that allows to transscribe speech messages to text in matrix rooms",
        "env": []
    },
    {
        "id": "703ac45acd9d36ac",
        "type": "function",
        "z": "46213a2c45827dd0",
        "name": "ReceiveVoiceMessage",
        "func": "",
        "outputs": 1,
        "noerr": 0,
        "initialize": "\nlet matrixClient = global.get(\"matrixClient['@nodered:mydomain.de']\");\nnode.warn(\"Hello There!\");\nlet initializedAt = new Date();\nmatrixClient.on('Room.timeline', async function(event, room, toStartOfTimeline, removed, data) {\n                \n                if (toStartOfTimeline) {\n                    return; // ignore paginated results\n                }\n                if (!event.getSender() || event.getSender() === node.userId) {\n                    return; // ignore our own messages\n                }\n                if (!data || !data.liveEvent) {\n                    return; // ignore old message (we only want live events)\n                }\n                if(initializedAt > event.getDate()) {\n                    return; // skip events that occurred before our client initialized\n                }\n\n                try {\n                    await matrixClient.decryptEventIfNeeded(event);\n                } catch (error) {\n                    node.error(error);\n                    return;\n                }\n\n                const isDmRoom = (room) => {\n                    // Find out if this is a direct message room.\n                    let isDM = !!room.getDMInviter();\n                    const allMembers = room.currentState.getMembers();\n                    if (!isDM && allMembers.length <= 2) {\n                        // if not a DM, but there are 2 users only\n                        // double check DM (needed because getDMInviter works only if you were invited, not if you invite)\n                        // hence why we check for each member\n                        if (allMembers.some((m) => m.getDMInviter())) {\n                            return true;\n                        }\n                    }\n                    return allMembers.length <= 2 && isDM;\n                };\n\n                let msg = {\n                    encrypted : event.isEncrypted(),\n                    redacted  : event.isRedacted(),\n                    content   : event.getContent(),\n                    type      : (event.getContent()['msgtype'] || event.getType()) || null,\n                    payload   : (event.getContent()['body'] || event.getContent()) || null,\n                    isDM      : isDmRoom(room),\n                    userId    : event.getSender(),\n                    topic     : event.getRoomId(),\n                    eventId   : event.getId(),\n                    event     : event\n                };\n                \n                // node.send(msg);\n                // return;\n                \n                // only look for m.audio\n                if(msg.type !== 'm.audio' || !msg.content.file.url) {\n                    // node.warn(\"I received something else: \" + msg.type + \": \" + msg.payload);\n                    // node.warn(\"The msg.content.url is \" + msg.content.url);\n                    // node.warn(\"The msg.content is \" + JSON.stringify(msg));\n                    node.warn(\"received something else but audio\");\n                    return;\n                }\n                \n                try {\n                    node.warn(\"trying to download audio from \" + msg.content.file.url);\n                    msg.url = matrixClient.mxcUrlToHttp(msg.content.file.url);\n                } catch (error) {\n                    node.error(error);\n                    return;\n                }\n    \n                node.send(msg);\n});",
        "finalize": "",
        "libs": [],
        "x": 180,
        "y": 100,
        "wires": [
            [
                "eec932d4460fcd79",
                "e39069f08097901d"
            ]
        ]
    },
    {
        "id": "4f539da8da81cc7f",
        "type": "http request",
        "z": "46213a2c45827dd0",
        "name": "HTTP-Request-Azure",
        "method": "POST",
        "ret": "txt",
        "paytoqs": "ignore",
        "url": "https://germanywestcentral.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=de-DE",
        "tls": "",
        "persist": false,
        "proxy": "",
        "authType": "",
        "senderr": false,
        "x": 740,
        "y": 400,
        "wires": [
            [
                "9f870d8a52ece901",
                "0729aa08becce716"
            ]
        ]
    },
    {
        "id": "73907480c98b76ca",
        "type": "debug",
        "z": "46213a2c45827dd0",
        "name": "HTTP body with headers",
        "active": true,
        "tosidebar": true,
        "console": false,
        "tostatus": false,
        "complete": "true",
        "targetType": "full",
        "statusVal": "",
        "statusType": "auto",
        "x": 790,
        "y": 260,
        "wires": []
    },
    {
        "id": "9a55ac6fe26e16b7",
        "type": "debug",
        "z": "46213a2c45827dd0",
        "name": "FFMPEG Output",
        "active": true,
        "tosidebar": true,
        "console": false,
        "tostatus": false,
        "complete": "true",
        "targetType": "full",
        "statusVal": "",
        "statusType": "auto",
        "x": 690,
        "y": 180,
        "wires": []
    },
    {
        "id": "9f870d8a52ece901",
        "type": "debug",
        "z": "46213a2c45827dd0",
        "name": "HTTP reponse with transscript",
        "active": true,
        "tosidebar": true,
        "console": false,
        "tostatus": false,
        "complete": "true",
        "targetType": "full",
        "statusVal": "",
        "statusType": "auto",
        "x": 930,
        "y": 340,
        "wires": []
    },
    {
        "id": "3c92a5c1e3cc42f8",
        "type": "debug",
        "z": "46213a2c45827dd0",
        "name": "Matrx-Success",
        "active": true,
        "tosidebar": true,
        "console": false,
        "tostatus": false,
        "complete": "payload",
        "targetType": "msg",
        "statusVal": "",
        "statusType": "auto",
        "x": 1380,
        "y": 500,
        "wires": []
    },
    {
        "id": "1ba2b7e3e2ea8169",
        "type": "debug",
        "z": "46213a2c45827dd0",
        "name": "Matrx-Failure",
        "active": true,
        "tosidebar": true,
        "console": false,
        "tostatus": false,
        "complete": "true",
        "targetType": "full",
        "statusVal": "",
        "statusType": "auto",
        "x": 1370,
        "y": 620,
        "wires": []
    },
    {
        "id": "0729aa08becce716",
        "type": "function",
        "z": "46213a2c45827dd0",
        "d": true,
        "name": "Format Matrix Message",
        "func": "msg.payload = msg.payload.text;\nmsg.topic = \nreturn msg;",
        "outputs": 1,
        "noerr": 2,
        "initialize": "",
        "finalize": "",
        "libs": [],
        "x": 970,
        "y": 480,
        "wires": [
            [
                "52f1bb72273c201d"
            ]
        ]
    },
    {
        "id": "091569939ebd355e",
        "type": "function",
        "z": "46213a2c45827dd0",
        "name": "Add HTTP Header",
        "func": "msg.headers = {};\nmsg.headers[\"Ocp-Apim-Subscription-Key\"] = \"INSERT-YOUR-SUBSCRIPTION-KEY-HERE\";\nmsg.headers[\"Content-Type\"] = \"audio/wav\";\nreturn msg;",
        "outputs": 1,
        "noerr": 0,
        "initialize": "",
        "finalize": "",
        "libs": [],
        "x": 590,
        "y": 320,
        "wires": [
            [
                "73907480c98b76ca",
                "4f539da8da81cc7f"
            ]
        ]
    },
    {
        "id": "52f1bb72273c201d",
        "type": "matrix-send-message",
        "z": "46213a2c45827dd0",
        "d": true,
        "name": "Respond with Text",
        "server": "55dcf44d3591365d",
        "roomId": "",
        "message": "",
        "messageType": "m.notice",
        "messageFormat": "",
        "replaceMessage": true,
        "x": 1130,
        "y": 560,
        "wires": [
            [
                "3c92a5c1e3cc42f8"
            ],
            [
                "1ba2b7e3e2ea8169"
            ]
        ]
    },
    {
        "id": "eec932d4460fcd79",
        "type": "debug",
        "z": "46213a2c45827dd0",
        "name": "Received message obj",
        "active": true,
        "tosidebar": true,
        "console": false,
        "tostatus": false,
        "complete": "true",
        "targetType": "full",
        "statusVal": "",
        "statusType": "auto",
        "x": 440,
        "y": 80,
        "wires": []
    },
    {
        "id": "e39069f08097901d",
        "type": "http request",
        "z": "46213a2c45827dd0",
        "name": "Download Message",
        "method": "GET",
        "ret": "bin",
        "paytoqs": "ignore",
        "url": "",
        "tls": "",
        "persist": false,
        "proxy": "",
        "authType": "",
        "senderr": false,
        "x": 310,
        "y": 160,
        "wires": [
            [
                "86ccf98bef9c7946",
                "73a78d9087911276"
            ]
        ]
    },
    {
        "id": "86ccf98bef9c7946",
        "type": "debug",
        "z": "46213a2c45827dd0",
        "name": "Downloaded Message",
        "active": true,
        "tosidebar": true,
        "console": false,
        "tostatus": false,
        "complete": "true",
        "targetType": "full",
        "statusVal": "",
        "statusType": "auto",
        "x": 680,
        "y": 120,
        "wires": []
    },
    {
        "id": "73a78d9087911276",
        "type": "change",
        "z": "46213a2c45827dd0",
        "name": "Set format OGG",
        "rules": [
            {
                "t": "set",
                "p": "format",
                "pt": "msg",
                "to": "ogg",
                "tot": "str"
            }
        ],
        "action": "",
        "property": "",
        "from": "",
        "to": "",
        "reg": false,
        "x": 340,
        "y": 220,
        "wires": [
            [
                "f28272dd69304585"
            ]
        ]
    },
    {
        "id": "f28272dd69304585",
        "type": "sox-convert",
        "z": "46213a2c45827dd0",
        "name": "",
        "conversionType": "wav",
        "outputToFile": "buffer",
        "manualPath": "",
        "wavMore": false,
        "wavByteOrder": "-L",
        "wavEncoding": "signed-integer",
        "wavChannels": 1,
        "wavRate": 16000,
        "wavBits": 16,
        "flacMore": false,
        "flacCompression": 8,
        "flacChannels": 1,
        "flacRate": 16000,
        "flacBits": 16,
        "mp3More": false,
        "mp3Channels": 2,
        "mp3Rate": 44100,
        "mp3BitRate": 128,
        "oggMore": false,
        "oggCompression": 3,
        "oggChannels": 2,
        "oggRate": 44100,
        "debugOutput": false,
        "x": 370,
        "y": 320,
        "wires": [
            [
                "091569939ebd355e"
            ],
            [
                "43ce7e98138c6894"
            ]
        ]
    },
    {
        "id": "43ce7e98138c6894",
        "type": "debug",
        "z": "46213a2c45827dd0",
        "name": "SoX response",
        "active": true,
        "tosidebar": true,
        "console": false,
        "tostatus": false,
        "complete": "true",
        "targetType": "full",
        "statusVal": "",
        "statusType": "auto",
        "x": 500,
        "y": 420,
        "wires": []
    },
    {
        "id": "55dcf44d3591365d",
        "type": "matrix-server-config",
        "name": "nodered",
        "autoAcceptRoomInvites": true,
        "enableE2ee": true,
        "global": true
    }
]

The Buffer aka msg.payload, that i took from the "Downloaded Message" debug node.

[58,237,57,232,3,39,110,23,61,247,226,185,206,48,17,108,18,96,227,53,100,56,117,56,214,151,152,189,70,250,161,85,19,163,189,233,102,137,15,47,120,96,90,25,183,137,145,132,80,60,217,99,98,81,134,254,228,30,208,54,230,169,10,44,84,98,253,240,210,179,109,22,186,93,249,31,60,85,236,26,83,190,175,233,81,181,197,155,49,126,27,196,112,174,219,228,232,59,93,44,247,84,223,118,240,117,125,247,79,33,147,79,31,246,75,4,116,44,215,16,180,115,64,110,73,137,57,179,132,105,118,243,99,213,109,90,64,28,187,84,27,201,181,40,227,78,39,135,18,58,220,50,118,3,84,151,193,199,179,75,165,31,243,242,172,168,224,121,136,228,96,229,133,204,35,169,110,0,132,237,134,6,93,155,117,175,12,135,24,66,213,69,142,163,88,238,108,58,106,116,241,50,158,241,67,136,144,204,180,105,80,249,146,214,31,183,47,195,68,6,4,83,122,134,23,18,50,35,221,128,60,75,121,112,131,86,178,197,223,132,189,201,209,122,59,141,232,180,139,49,124,174,24,151,220,49,241,119,82,92,165,250,39,46,117,177,151,179,92,17,105,45,106,34,232,128,77,152,100,71,46,48,234,219,187,125,168,250,215,73,12,8,142,69,67,240,102,120,25,124,172,60,68,91,90,189,110,2,163,205,58,120,84,247,210,36,171,15,63,103,35,49,7,10,180,72,61,155,254,95,190,145,186,253,103,238,94,218,61,32,138,58,8,117,49,237,17,199,164,252,34,108,55,119,205,95,200,81,100,35,42,212,157,124,15,26,59,95,255,32,247,206,216,55,115,211,186,68,216,43,245,102,172,127,237,23,171,104,212,71,59,7,1,31,253,105,163,24,152,42,131,80,149,45,0,23,182,221,151,133,179,173,39,158,125,241,7,254,187,193,31,88,96,214,244,162,81,168,137,224,128,72,112,216,248,111,193,99,251,137,36,46,229,218,101,17,18,191,185,41,154,65,102,174,22,17,98,194,99,210,30,30,118,149,181,231,234,93,205,163,129,192,218,156,219,66,165,143,35,74,66,155,255,254,179,247,128,149,194,174,236,60,98,103,26,42,153,93,246,121,159,220,77,57,50,181,154,81,9,69,131,54,59,60,5,80,68,83,190,123,135,51,198,13,91,72,176,133,51,123,25,35,35,197,248,102,54,68,51,202,7,78,239,168,83,249,255,196,85,212,64,147,13,97,200,137,130,237,158,157,242,129,11
2,12,10,46,195,163,82,136,51,113,252,128,65,81,144,246,167,7,240,188,136,21,182,142,99,116,204,176,79,123,234,131,60,181,227,128,207,68,150,129,23,27,224,185,177,31,188,178,31,126,253,139,227,205,99,199,180,216,68,31,214,95,185,235,139,115,45,135,71,123,41,103,107,249,163,171,121,86,199,214,34,208,25,242,244,183,238,2,43,51,129,1,239,133,39,199,169,102,79,39,246,66,236,253,187,143,27,91,200,56,122,8,152,23,147,74,31,108,90,4,233,26,99,8,166,93,145,116,231,44,185,244,225,209,205,19,243,232,56,184,35,123,155,54,220,225,228,174,34,18,152,166,196,160,156,156,131,130,71,5,170,115,155,20,115,15,149,218,25,144,134,34,83,162,25,151,99,41,141,116,146,210,30,184,166,144,63,217,250,244,10,167,88,210,190,134,109,227,146,9,119,142,99,103,134,198,195,83,0,222,139,172,135,86,231,156,92,84,3,34,112,40,143,59,119,173,9,192,78,249,189,126,201,186,242,105,95,246,46,100,235,3,144,43,41,141,91,193,149,60,178,139,193,25,22,169,160,50,85,247,186,7,126,65,93,131,19,97,253,194,154,62,111,61,199,178,223,90,244,156,172,92,106,239,219,134,169,166,235,244,164,156,94,172,157,40,157,40,50,108,16,166,30,52,65,229,8,98,156,70,142,95,208,7,37,220,161,77,183,215,40,117,169,30,128,0,29,102,89,119,144,123,232,218,201,54,121,15,231,193,90,226,221,35,224,250,33,209,0,2,58,192,241,22,90,78,104,166,13,58,109,106,152,21,191,250,12,116,88,12,226,194,254,115,175,169,153,201,90,105,222,155,145,56,110,144,119,252,228,47,111,146,81,12,221,65,172,177,97,227,250,45,5,127,47,37,190,121,138,12,201,54,182,3,38,203,242,171,93,202,45,117,137,82,210,11,85,230,35,47,228,202,216,79,86,14,107,125]

BR;

So if I add an inject node with your buffer, it fails. From what I see online (I'm no audio expert), 'ogg' is a container, and the buffer you provided doesn't seem to be in the correct format.
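
A quick way to sanity-check a buffer like this one: a valid Ogg file begins with the four ASCII bytes `OggS`, so comparing the first four bytes already tells you whether a converter has any chance of accepting it. A minimal sketch (the helper name `startsWithOggMagic` is just for illustration):

```javascript
// Returns true if the data starts with the Ogg capture pattern "OggS".
function startsWithOggMagic(data) {
    const buf = Buffer.from(data); // accepts a Buffer or a plain byte array
    const magic = Buffer.from('OggS', 'latin1');
    return buf.length >= 4 && buf.subarray(0, 4).equals(magic);
}

// First four bytes of the buffer posted above
console.log(startsWithOggMagic([58, 237, 57, 232])); // false

// A buffer that really starts with "OggS"
console.log(startsWithOggMagic(Buffer.from('OggS\u0000\u0002'))); // true
```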

Hi @zenofmud,

That's what I suspected. What bugs me is that when I convert the ogg via ffmpeg on the CLI, it works. In addition, ffmpeg shows me that the content is Opus (there is sample CLI output in my first post).
My assumption was that if I can convert the ogg/opus via the ffmpeg CLI, it should also work with a Node-RED node.
Something is odd.
Maybe I need to install the Opus codec on the host. I'll check that.
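
For comparison with the contrib nodes, the CLI conversion can also be driven directly from Node.js by piping the buffer through ffmpeg. This is only a sketch under some assumptions: ffmpeg is on the PATH of the Node-RED host, the input really is a valid Ogg stream, and the target is 16 kHz mono 16-bit PCM (the same settings as the sox-convert node in the flow). `buildFfmpegArgs` and `convertOggToWav` are hypothetical helper names. How `child_process` is made available inside a function node depends on your Node-RED settings.

```javascript
const { spawn } = require('child_process');

// Arguments for: read Ogg from stdin, write mono 16-bit PCM WAV to stdout.
function buildFfmpegArgs(sampleRate = 16000) {
    return [
        '-hide_banner', '-loglevel', 'error',
        '-f', 'ogg', '-i', 'pipe:0',            // input: Ogg container on stdin
        '-ac', '1', '-ar', String(sampleRate),  // mono, requested sample rate
        '-c:a', 'pcm_s16le',                    // 16-bit little-endian PCM
        '-f', 'wav', 'pipe:1',                  // output: WAV on stdout
    ];
}

// Resolves with a Buffer containing the WAV data, rejects if ffmpeg fails.
function convertOggToWav(oggBuffer, ffmpegPath = 'ffmpeg') {
    return new Promise((resolve, reject) => {
        const proc = spawn(ffmpegPath, buildFfmpegArgs());
        const out = [];
        const err = [];
        proc.stdout.on('data', (chunk) => out.push(chunk));
        proc.stderr.on('data', (chunk) => err.push(chunk));
        proc.on('error', reject); // e.g. ffmpeg binary not found
        proc.on('close', (code) => {
            if (code === 0) {
                resolve(Buffer.concat(out));
            } else {
                reject(new Error('ffmpeg exited with code ' + code + ': ' + Buffer.concat(err).toString()));
            }
        });
        proc.stdin.on('error', () => {}); // ignore EPIPE if ffmpeg exits early
        proc.stdin.end(oggBuffer);
    });
}
```

Because the WAV is written to a pipe, ffmpeg cannot seek back to fill in the final length fields of the header, so some strict consumers may complain; writing to a temporary file avoids that. If ffmpeg rejects the input here as well, the problem is in the buffer rather than in the conversion node.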

Hi,

just a small update. I did some testing, and I suspect that the downloaded buffer is still encrypted. This article on Wikipedia states that every ogg container starts with the magic bytes "OggS". This is not the case when I download the voice message via the Node-RED node, but it is the case when I download it using my Matrix client. I guess there is some code missing for decrypting the attachment. I have added my findings to this bug request :grinning:
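
For anyone landing here with the same problem, here is a rough sketch of what the missing decryption step could look like. It assumes that `msg.content.file` follows the encrypted-attachment layout from the Matrix client-server specification (AES-256-CTR, key in `file.key.k`, IV in `file.iv`, SHA-256 of the ciphertext in `file.hashes.sha256`, all base64-encoded without padding) and that `msg.payload` holds the raw downloaded bytes. `decryptMatrixAttachment` is a made-up name and this is not how the matrix nodes themselves handle it.

```javascript
const crypto = require('crypto');

// Sketch: decrypt an attachment downloaded from an encrypted Matrix room.
// Assumption: `file` has the shape { key: { k }, iv, hashes: { sha256 } }.
function decryptMatrixAttachment(ciphertext, file) {
    // Check the ciphertext against the hash from the event before decrypting.
    const expected = Buffer.from(file.hashes.sha256, 'base64');
    const actual = crypto.createHash('sha256').update(ciphertext).digest();
    if (!actual.equals(expected)) {
        throw new Error('SHA-256 of downloaded data does not match the event');
    }
    // Node's 'base64' decoder also accepts the URL-safe alphabet used for
    // the JWK key, and tolerates missing padding.
    const key = Buffer.from(file.key.k, 'base64'); // 32 bytes for AES-256
    const iv = Buffer.from(file.iv, 'base64');     // 16 bytes
    const decipher = crypto.createDecipheriv('aes-256-ctr', key, iv);
    return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}
```

In a function node placed between "Download Message" and the converter, this might be used as `msg.payload = decryptMatrixAttachment(msg.payload, msg.content.file);`. After that the buffer should start with "OggS" if the assumption about the encryption is right.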
