Technical details about decoding rtsp streams via node-red-contrib-ffmpeg-spawn

Hi folks,

Trying very hard to find some free time for Node-RED again...

Some time ago @kevinGodell made a tremendous effort to teach me the basics of decoding rtsp streams in another discussion:

I really appreciate all his hard labour!!
But now I am struggling to get a hold on the technical details.

My first problem is about the output chunks. The output pipes of the child process have OS limitations, so as a result the output data is split into chunks (e.g. 65K on a Raspberry Pi). Two questions about this:

  1. If the output is jpeg images, then I need to use a node-red-contrib-pipe2jpeg node to join those chunks again into a complete jpeg image. I see here in the pipe2jpeg dependency that the EOF can be anywhere inside a chunk. Which means that such a chunk has to be split in two again: the first part belongs to the previous jpeg, while the second part belongs to the current jpeg. Which results in a lot of searching through the buffers and repetitive buffer concatenations, i.e. a lot of cpu and ram usage.

    But when I look at this code from the rtsp-ffmpeg library, I see that it simply detects the EOF at the end of the chunk. So no searching for the EOF inside the chunks is required, and no buffer concatenations. Which requires much less cpu and ram.

    When I use that code snippet from the rtsp-ffmpeg library in my flow, it seems to be working very well. I.e. I see fluid images arriving in my flow.

    So now I am wondering if I am just lucky that my EOFs always end up at the end of my chunks, and not somewhere inside them? I really have no clue...

  2. As explained above, node-red-contrib-pipe2jpeg does a very good job of joining chunks of data into complete jpeg images. However, when the output of my spawn node contains h.264 segments (instead of images), then those segments are also split into chunks by the OS limitations. Not sure at the moment how I can combine those chunks to get the original complete segments back. When I send those chunks straight to the mp4 frag node, the separate chunks seem to be no problem for that node. Again, no clue how this works...
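To make question 1 concrete, here is a minimal Python sketch (not the actual pipe2jpeg code; the class name and API are made up) of reassembling complete jpegs from arbitrary chunks by searching for the EOI marker, including the splitting-a-chunk-in-two case described above:

```python
# Hypothetical sketch: reassemble complete jpegs from arbitrarily split
# chunks by scanning for the EOI marker (ff d9).
SOI = b'\xff\xd8'  # jpeg start-of-image marker
EOI = b'\xff\xd9'  # jpeg end-of-image marker

class JpegAssembler:
    def __init__(self):
        self._pending = b''  # bytes carried over from previous chunks

    def feed(self, chunk):
        """Feed one chunk; return a list of complete jpegs found so far."""
        data = self._pending + chunk
        jpegs = []
        while True:
            end = data.find(EOI)
            if end == -1:
                break
            # everything up to and including the EOI marker is one jpeg
            jpegs.append(data[:end + 2])
            # the remainder after the EOI belongs to the next jpeg
            data = data[end + 2:]
        self._pending = data
        return jpegs
```

Note that this assumes the stream starts at an image boundary, and a jpeg with an embedded EXIF thumbnail could in principle contain an extra EOI marker, so treat it as an illustration of the chunk-splitting problem rather than a robust parser.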

Hopefully somebody can answer my question, because my head is exploding :exploding_head:

Warning: as soon as these chunk related questions are answered, I have 'some' other technical questions about the rtsp streams :wink:

Thanks!!
Bart

If you keep the frames per second of the jpeg output from ffmpeg at a reasonably low rate, the end of the final chunk should be ff d9. If you increase the fps, ffmpeg will pack more data before flushing its buffers, which causes jpegs to be spread across chunks in unexpected ways.

For example, on my mac using ffmpeg with a low fps, a jpeg that is 20k will arrive in 3 chunks. The first 2 are at the mac's system limit of 8192 bytes, and the 3rd chunk will be about 4k. The following chunk would be the first piece of the next 20k jpeg, sized at 8192. If you increase the fps, then ffmpeg will usually pack the buffers and the jpegs can begin and end anywhere. As we know, people usually try to have unnecessarily high fps for such things.

The size of an mp4 is a little easier to determine than that of a jpeg. Each mp4 box indicates its byte size, which makes it easy to figure out where it ends. For jpeg, I don't think there is any header info to indicate its byte size, so we have the tedious EOI (end of image) search.
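As an illustration of the mp4 side (a sketch, not mp4frag's actual code): each box starts with a 4-byte big-endian size followed by a 4-byte type, so a parser knows exactly how many bytes to collect before a box is complete.

```python
import struct

def parse_box_header(data, offset=0):
    """Read the (size, type) header of the mp4 box starting at offset.

    The size field includes the 8 header bytes. Special cases not
    handled here: size == 1 means a 64-bit size follows, and
    size == 0 means the box extends to the end of the file.
    """
    size, = struct.unpack_from('>I', data, offset)
    box_type = data[offset + 4:offset + 8].decode('ascii')
    return size, box_type
```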

For pipe2jpeg or mp4frag, the Buffer.concat only occurs after the piece is complete. Any buffer slicing is actually keeping a reference to the original buffer without using new memory space. buffer.slice does not increase ram usage as Buffer.concat would.

Side note, I just read that buffer.slice is now deprecated as of node.js v17 and that I should use buffer.subarray. Buffer | Node.js v18.2.0 Documentation.
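A Python analogy for that distinction (the rest of this thread's code is Python): a memoryview slice, like buffer.subarray in Node, is a zero-copy view of the original bytes, while concatenation allocates a fresh buffer.

```python
# zero-copy view vs. copying concatenation
data = bytearray(b'abcdefgh')

view = memoryview(data)[2:6]   # no copy; still references `data`
data[2] = ord('X')             # mutate the underlying buffer
assert bytes(view) == b'Xdef'  # the view sees the change

joined = bytes(data[:4]) + bytes(data[4:])  # concatenation copies into new memory
assert joined == bytes(data)
```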

All the info that I needed. As usual... Thanks!!

Just out of curiosity, where in your mp4 frag code do you join the chunks?

There are multiple locations where buffer concatenation may occur, depending on the size, how the chunks were split, and whether it is the initialization fragment or a media segment. Here is 1 example: mp4frag/index.js at master · kevinGodell/mp4frag · GitHub


Kevin,

I still have some other rtsp related questions. I did quite some googling, but I find it very hard to get decent, understandable information. Perhaps you might have some more info about some of the questions below... Don't hesitate to answer "don't know either" :wink:

  1. Do you know whether it is easy to do digest authentication instead of basic authentication?

  2. You can change the resolution via the scaling parameter -s 100x200, but it can also be done via a filter -vf scale=320:240. Do you know which is the most optimal way of working?

  3. Suppose you change the resolution; I assume the rtsp stream keeps transferring data at the original resolution (as set up in the camera for that main/sub stream), i.e. that the resolution is only changed after being received by ffmpeg. So no adaptive streaming?

  4. I have installed the full version of ffmpeg. But do you know how a minimal version can be installed, which contains enough libraries to do rtsp decoding? And do I need to build a version with hardware acceleration for an rtsp stream?

  5. There are a couple of buffers involved (socket buffer, packet reorder buffer, ...) which all need to be big enough to avoid artifacts, but small enough to avoid latency issues. Have you ever seen some kind of diagram where those buffers are visualized? Because the ffmpeg flow is not really clear to me.

  6. Am I correct that these are the most commonly used codecs: libx264 / libx265 for video, and libfdk_aac / libmp3lame for audio?

  7. Besides udp, ffmpeg also supports multicast udp for rtsp. Do you know whether some cams support the latter transport protocol?

Don't know. I only connect ffmpeg to the cams using the url such as rtsp://admin:password1234@192.168.1.32:554/cam/realmonitor?channel=1&subtype=0

Don't know. I wouldn't be surprised if both of those settings mapped to the same internal scaling lib.
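For what it's worth, the ffmpeg documentation says that -s, as an output option, just inserts the scale video filter into the filtergraph, which supports that suspicion. As argv lists for a spawn config, the two forms from the question would look roughly like this (the URL is a placeholder):

```python
url = 'rtsp://user:pass@192.168.1.32:554/stream'  # placeholder URL

# -s shorthand vs. an explicit scale filter; per the ffmpeg docs the
# former is rewritten into the latter internally
args_s = ['-i', url, '-s', '100x200', '-f', 'image2pipe', 'pipe:1']
args_vf = ['-i', url, '-vf', 'scale=320:240', '-f', 'image2pipe', 'pipe:1']
```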

I am not sure if you mean changing the resolution from the camera's settings gui. I did that just now (changing the resolution from the camera's gui web page) and noticed that ffmpeg drops the connection.

You would have to set up a build environment and set all of the correct build flags for ffmpeg to have exactly what you need. If the goal is to be minimal, you would start off by disabling everything. Then slowly add new build flags for the things you need and keep trying ffmpeg until it does what you want. Much trial and error. android - Compiling FFmpeg without ALMOST everything - Stack Overflow

I don't know if hardware acceleration would do anything for rtsp transport, but if you are going to be decoding or encoding any of its h.264/h.265 video content, then yes, hardware acceleration could help (with limitations). But you are going to have to find all of these (possibly system specific) libraries and build them so that they can be included with ffmpeg.

No, I have not. I had experimented with those settings and never noticed any differences. At one time, I was on the hunt for the most optimal settings for low latency streaming and met with many dead ends. Accepting a little latency for the sake of low cpu usage was a good trade off for me.

Sounds right. h264 encoded mp4 works in most browsers, so that makes it a good choice.

I have never tried multicast udp, nor do I know what it is. I can say that I use -rtsp_transport tcp, otherwise I get visual artifacts from packets arriving out of order. I suppose that the other settings for reorder buffers may help with that, but if a packet arrives late, then it probably should be discarded if the goal is to view live video instead of just recording it. I just tried udp_multicast with my amcrest cam and it was just as bad as I remember, with video distortion, etc. In the past, I did have a low end cam that did not support tcp, but it has long been retired in the trash.

Final thoughts (and I know I am repeating myself).

If you can avoid doing any decoding/encoding, then you are in good shape. If you must generate jpegs, etc. from an h.264 encoded video source, then try to use hardware acceleration (but it will have limits on the input video resolution and will also be limited by how much gpu memory is available).

In my opinion, it is best to use a cam's main and sub streams. All video sizing should be done by the camera. So, if you must generate a jpeg (or any other format) from h.264, then use the sub stream, which will take less effort to decode/encode. 12 of my cams support a sub stream and I take advantage of them. If I need to view a cam on my mobile phone, then relaying the sub stream is so much faster and good enough for the small screen.

Priceless info again. Way better than what I found on the www...

Yes of course. How stupid of me. When I don't do decoding or re-encoding, of course there is no need for hardware acceleration.

Ok, at this moment things are getting clear about RTSP. So I need to get back on track with your mp4frag nodes. I did some (successful) experiments with them in the past. However, when I now feed my h.264 chunks (from ffmpeg) into the node-red-contrib-mp4frag node, the spinner of the node-red-contrib-ui-mp4frag node keeps spinning. I see this in my browser console:

When I click on such an error message, I see stuff like this:

[image]

When I remove the last part of the URL, I can see some information:

[image]

I 'think' that my chunks are ok, because I can view the I-frames without problems in an image-output node. So I assume that I have done something wrong with the mp4 frag nodes. This is my setup:

Do you have any tips of things I could check?

What I don't really get:

It still doesn't ring a bell to me what I am doing wrong...

Interesting thread :slight_smile:

Just to add: h264_omx, when you need to decode/encode, uses the GPU; at least on my RPi it makes a huge difference in CPU load compared to using libx264

Sorry for the code example below, it is in Python and taken from the YOLO analyzer I use for object detection in the incoming images from my cameras. But it shows how I check whether the data is a complete image or just a chunk, and how I then build the complete image from those chunks. It (seems to) have been working fine for more than a year; it provides complete images that are then analyzed

        if len(msg.payload) > 1000:
            img = msg.payload
            cam = msg.topic.split('/')[1]
            if images.qsize() <= 300:
                start = img[0:2]   # jpeg SOI marker is ff d8
                end = img[-2:]     # jpeg EOI marker is ff d9

                if start == b'\xff\xd8' and end == b'\xff\xd9':
                    # we have a complete jpeg
                    # convert the image bytes to uint8
                    # (np.fromstring is deprecated; use np.frombuffer)
                    nparr = np.frombuffer(img, np.uint8)
                    # decode image
                    decodeImg(nparr, cam)

                elif start == b'\xff\xd8':
                    print("we have a first chunk")
                    chnks[cam] = img

                elif end != b'\xff\xd9':
                    print("we have a middle chunk")
                    if cam in chnks:  # guard against a missed first chunk
                        chnks[cam] += img

                else:
                    print("we have a final chunk")
                    if cam in chnks:
                        chnks[cam] += img
                        if chnks[cam][0:2] == b'\xff\xd8' and chnks[cam][-2:] == b'\xff\xd9':
                            # we have a complete jpeg, convert the collected chunks to uint8
                            nparr = np.frombuffer(chnks[cam], np.uint8)
                            # decode image
                            decodeImg(nparr, cam)
                        del chnks[cam]


Morning Walter,

Thanks for the sharing! I will put your gpu codec on my list, if I ever need it.

I had issues last night with the libfdk_aac codec. It seems it is not included in the standard ffmpeg pre-builds, due to license issues. Since I don't want to start building ffmpeg myself at this moment, I have replaced it with the free aac codec. Not sure if that is worse...

That python code snippet is very interesting!!!

  • So it first looks at the last characters of a chunk, to see whether a complete image has arrived.
  • And only if that isn't the case will it start searching for an EOF inside the chunk. Which is of course much faster than always searching through the entire chunk.

I assume it will only malfunction when a chunk ends with an EOI and also contains a second EOI somewhere inside. But based on your positive experience, I assume that rarely happens...

@kevinGodell: after a 'quick' look at your pipe2jpeg code, I 'think' your library always searches through the entire chunk (using indexOf). It would be nice if you would consider adding such a fast-search feature! And if you don't want to include it as the general way of working, an optional feature (that can be explicitly activated via a parameter) would be sufficient for me. Thanks!!!

I still don't get the mp4 frags working :scream:. I assume I am doing something completely wrong. But due to my lack of knowledge about the topic, I am running around like a headless chicken :roll_eyes:

Well, first I check the first and the last characters; if they are correct, it means it is a complete image and nothing else is needed for that image. Otherwise I keep the chunks until a complete image has been "collected"


Nailed it. Now it works :champagne:

Found an older piece of advice from our ffmpeg wizard Kevin, to get info about my rtsp stream using ffprobe. This was the output of my probe:

    Metadata:
        title : Media Presentation
    Duration: N/A, start: 0.000000, bitrate: N/A
    Stream #0:0: Video: mjpeg (Baseline), yuvj420p(pc, bt470bg/unknown/unknown), 800x600 [SAR 1:1 DAR 4:3], 29.92 tbr, 90k tbn
    Stream #0:1: Audio: pcm_mulaw, 8000 Hz, mono, s16, 64 kb/s

I suddenly realized that I had converted the audio to AAC, but for video I used the 'clone codec' option. As a result I stored mjpeg chunks in mp4 containers, because I thought that my cam was sending an H.264 stream. Apparently not...

After I instructed ffmpeg to convert the video to H.264 (using ... -c:v libx264 ...), everything seems to work fine in my dashboard. Will have a look tonight to see whether my old cam can send h.264, to avoid having to do this conversion in my Node-RED flow.

@kevinGodell: your mp4frag nodes are a real piece of art. It must have been an awful lot of development work to get them running. If you ever have some ideas to troubleshoot the root causes of issues (like the one I had) more easily, please share them!


Did you check your cpu load during this?

I think I just used "-c:a copy" for the audio, maybe works for you as well?

https://trac.ffmpeg.org/wiki/Encode/HighQualityAudio

Not sure about that. Will test tonight.

Not yet. At this moment I am just trying to get something working. Performance tuning will be needed afterwards indeed...

Now I have some other weird effect:

  • Last night I saw that my video has a large delay of about 10-12 seconds via the mp4 frag nodes. On the other hand, my I-frames immediately show the correct images. So I assume there is some kind of delay or buffering inside the mp4frag nodes?
  • This morning when I started my laptop, the mp4frag ui node automatically started playing video again. But it started showing video from last night. So it must be buffered somewhere. Weird to see myself waving my hands at the cam. My first reaction was: what a weirdo :joy:

Anybody any idea about what could cause this?

I'm afraid it's genetics. Nothing we can do about it :grin:
