High CPU usage when extracting I-frames from an RTSP stream

Hi folks,

I finally bought some decent cameras...
Handling the RTSP stream - from a cam that sends H.264 for video and AAC for audio - works like a charm, since no re-encoding is required. Almost no CPU usage!

But my troubles started when I tried to extract the I-frames from the stream for image processing. Our ffmpeg guru @kevinGodell already described here very clearly that HIGH CPU usage is normal for encoding jpegs (since no hardware acceleration is applicable). But I am quite flabbergasted by the amount of CPU being used for only 1 image per second on a Raspberry Pi 4:

[screenshot: CPU usage on the Raspberry Pi 4]

Very weird, because I-frames are already complete image frames, so I am very surprised that extracting them still consumes such a large amount of CPU... Therefore I am still hoping that I am doing something wrong with my ffmpeg parameters, and that things can be improved. This is my command:

[
    "-loglevel",
    "+level+fatal",
    "-nostats",
    "-rtsp_transport",
    "tcp",
    "-i",
    "rtsp://my_cams_rtsp_stream",
    "-f",
    "mp4",
    "-c:v",
    "copy",
    "-c:a",
    "copy",
    "-movflags",
    "+frag_keyframe+empty_moov+default_base_moof",
    "pipe:1",
    "-progress",
    "pipe:3",
    "-f",
    "image2pipe",
    "-vf",
    "select='eq(pict_type,PICT_TYPE_I)'",
    "-vsync",
    "vfr",
    "pipe:4"
]
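
For readability, here is the same command as a single shell line (the url is a placeholder for my cam; pipe:3 and pipe:4 only work when the spawning process opens those file descriptors, as node-red's ffmpeg-spawn node does):

    # output 1 (pipe:1): fragmented mp4 with video and audio copied, so no re-encoding
    # output 2 (pipe:4): one jpeg per selected I-frame (I believe mjpeg is image2pipe's default encoder)
    ffmpeg -loglevel +level+fatal -nostats \
        -rtsp_transport tcp -i rtsp://my_cams_rtsp_stream \
        -f mp4 -c:v copy -c:a copy \
        -movflags +frag_keyframe+empty_moov+default_base_moof pipe:1 \
        -progress pipe:3 \
        -f image2pipe -vf "select='eq(pict_type,PICT_TYPE_I)'" -vsync vfr pipe:4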

But I could not find decent information about image2pipe. If I understand it correctly, image2pipe will extract only the I-frames from the stream, based on the PICT_TYPE_I argument.

Not sure where all the CPU is being eaten:

  1. Does it take lots of CPU time to find the I-frames in the stream for some reason?

  2. Does it take lots of CPU time to encode the I-frame to a jpeg image?

    To determine this, I tried this tip from StackOverflow: use uncompressed BMP images. Such BMPs are not useful for me, but it is a way to do a root cause analysis. However, when I add "-c:v bmp" or "-c:v rawvideo" (as sketched after this list), I had expected the CPU usage to drop, because no jpeg encoding would be required. But it doesn't change anything.

  3. Does it perhaps select too many frames, so a lot of extra unnecessary encoding is required?

    I don't think this is the problem, because my camera has a framerate of 1 frame/sec, and when I add a node-red-contrib-msg-speed node at the output, it also indicates about 1 message per second.
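
For reference, this is how I plugged the bmp variant from point 2 into the image output (only the encoder changes; the rest of the command stays as above):

    ... -f image2pipe -c:v bmp -vf "select='eq(pict_type,PICT_TYPE_I)'" -vsync vfr pipe:4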

I have completely run out of ideas. If anybody has tips, all help is appreciated!!!!

P.S. I am aware that I could lower the CPU usage by e.g. adjusting the frame rate, resolution, ... on the ip camera web interface. But I would like to understand why the performance is so horrible with my current settings.

Thanks!!
Bart

I tried to play with the quality (not in the ip cam but in ffmpeg), but the results make no sense to me...
The lower the quality factor in ffmpeg, the better the quality (range 2 to 32, fractions allowed).

So I added "-q:v 32" as an extra parameter for my image2pipe output, but the CPU usage doesn't change. Whether I use 2, 16 or 32, the CPU stays high.
I must be doing something ridiculously wrong, because the CPU for jpeg encoding should be much lower for a low-quality image.

Or perhaps I should put that parameter before my image2pipe?
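
If I read the documentation correctly, output options only have to appear somewhere before the output they apply to, so the position relative to -f image2pipe should not matter:

    # these should be equivalent; -q:v just has to precede pipe:4
    ... -q:v 32 -f image2pipe pipe:4
    ... -f image2pipe -q:v 32 pipe:4

But I am not sure about that either...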

Hopefully some folks can join here; otherwise this will become my N-th contribution that never sees the npm daylight ...

Here is a demo flow that takes an image once every 4 seconds. The first example puts a bit more load on the CPU, since it uses image2pipe to pipe the image data to an output. The second one just saves to a file directly, and this does not seem to cause any high load.

It also seems that saving the image as .png causes a lower CPU load than saving it as .jpg.

In the image from my desktop below, I am running the first example, piping the image data.

[
    {
        "id": "333e1b6ed681368c",
        "type": "ffmpeg-spawn",
        "z": "0eb3c44ad5147396",
        "name": "",
        "outputs": 2,
        "cmdPath": "ffmpeg",
        "cmdArgs": "[\"-i\",\"rtsp://rtsp.stream/pattern\",\"-r\",\"0.25\",\"-f\",\"image2pipe\",\"pipe:1\"]",
        "cmdOutputs": 1,
        "killSignal": "SIGTERM",
        "x": 540,
        "y": 1410,
        "wires": [
            [
                "c8def1c6a8ddd793"
            ],
            [
                "abb5502f624a81d3"
            ]
        ]
    },
    {
        "id": "c8def1c6a8ddd793",
        "type": "debug",
        "z": "0eb3c44ad5147396",
        "name": "debug 17",
        "active": true,
        "tosidebar": true,
        "console": false,
        "tostatus": false,
        "complete": "true",
        "targetType": "full",
        "statusVal": "",
        "statusType": "auto",
        "x": 790,
        "y": 1380,
        "wires": []
    },
    {
        "id": "c3fabf8f72779d2a",
        "type": "inject",
        "z": "0eb3c44ad5147396",
        "name": "Start",
        "props": [
            {
                "p": "action",
                "v": "{\"command\":\"start\"}",
                "vt": "json"
            },
            {
                "p": "topic",
                "vt": "str"
            }
        ],
        "repeat": "",
        "crontab": "",
        "once": false,
        "onceDelay": 0.1,
        "topic": "",
        "x": 300,
        "y": 1370,
        "wires": [
            [
                "333e1b6ed681368c"
            ]
        ]
    },
    {
        "id": "156fc1f6122ae9a6",
        "type": "inject",
        "z": "0eb3c44ad5147396",
        "name": "Stop",
        "props": [
            {
                "p": "action",
                "v": "{\"command\":\"stop\"}",
                "vt": "json"
            },
            {
                "p": "topic",
                "vt": "str"
            }
        ],
        "repeat": "",
        "crontab": "",
        "once": false,
        "onceDelay": 0.1,
        "topic": "",
        "x": 300,
        "y": 1450,
        "wires": [
            [
                "333e1b6ed681368c"
            ]
        ]
    },
    {
        "id": "abb5502f624a81d3",
        "type": "image",
        "z": "0eb3c44ad5147396",
        "name": "",
        "width": "150",
        "data": "payload",
        "dataType": "msg",
        "thumbnail": false,
        "active": true,
        "pass": false,
        "outputs": 0,
        "x": 810,
        "y": 1450,
        "wires": []
    },
    {
        "id": "1b6e412c52c08817",
        "type": "ffmpeg-spawn",
        "z": "0eb3c44ad5147396",
        "name": "",
        "outputs": 2,
        "cmdPath": "ffmpeg",
        "cmdArgs": "[\"-i\",\"rtsp://rtsp.stream/pattern\",\"-r\",\"0.25\",\"output_%04d.png\"]",
        "cmdOutputs": 1,
        "killSignal": "SIGTERM",
        "x": 540,
        "y": 1690,
        "wires": [
            [
                "9fb981abee8cf839"
            ],
            [
                "1f8a82bcc624f08f"
            ]
        ]
    },
    {
        "id": "9fb981abee8cf839",
        "type": "debug",
        "z": "0eb3c44ad5147396",
        "name": "debug 18",
        "active": true,
        "tosidebar": true,
        "console": false,
        "tostatus": false,
        "complete": "true",
        "targetType": "full",
        "statusVal": "",
        "statusType": "auto",
        "x": 790,
        "y": 1660,
        "wires": []
    },
    {
        "id": "738c11e7cb9f4894",
        "type": "inject",
        "z": "0eb3c44ad5147396",
        "name": "Start",
        "props": [
            {
                "p": "action",
                "v": "{\"command\":\"start\"}",
                "vt": "json"
            },
            {
                "p": "topic",
                "vt": "str"
            }
        ],
        "repeat": "",
        "crontab": "",
        "once": false,
        "onceDelay": 0.1,
        "topic": "",
        "x": 300,
        "y": 1650,
        "wires": [
            [
                "1b6e412c52c08817"
            ]
        ]
    },
    {
        "id": "b02490128127d690",
        "type": "inject",
        "z": "0eb3c44ad5147396",
        "name": "Stop",
        "props": [
            {
                "p": "action",
                "v": "{\"command\":\"stop\"}",
                "vt": "json"
            },
            {
                "p": "topic",
                "vt": "str"
            }
        ],
        "repeat": "",
        "crontab": "",
        "once": false,
        "onceDelay": 0.1,
        "topic": "",
        "x": 300,
        "y": 1730,
        "wires": [
            [
                "1b6e412c52c08817"
            ]
        ]
    },
    {
        "id": "1f8a82bcc624f08f",
        "type": "image",
        "z": "0eb3c44ad5147396",
        "name": "",
        "width": "150",
        "data": "payload",
        "dataType": "msg",
        "thumbnail": false,
        "active": true,
        "pass": false,
        "outputs": 0,
        "x": 810,
        "y": 1730,
        "wires": []
    }
]

Hi Walter, thanks (again) for joining!!

I will have a look tonight at your flow. Now at work...
A quick question: I see that you write to png files. Would be nice if you could share what happens if you write to jpeg files? Until now I thought that the jpeg encoding was consuming most of the CPU. But png is also (lossless) compression, and I am not sure about the CPU comparison between both formats. If you can write .jpg files with less CPU, then I assume the image2pipe module is what is eating so much CPU?

My brain cannot process the fact that piping data in-memory would be worse for the system, compared to using IO to store the same information in a file :exploding_head:

Saving as .jpg causes the CPU load to be higher than saving as .png, at least on my RPi4.
Here we have 2 instances of ffmpeg running in parallel; the one with the higher CPU load is saving as .jpg, the other as .png.

Best regards, Walter

EDIT: The size of the .png is slightly larger than the .jpg (23k compared to 18k), which means that ffmpeg is also doing some compression there; this should explain why more CPU power is required.

Hey Walter,

Your flow was very illuminating. Indeed, writing to png files uses the least CPU.

But when I replace the online test rtsp url (320x240 - 30 fps) with the rtsp stream of my cam (3840x2160 - 15 fps), the results are completely different: piping via image2pipe now uses about the same CPU as writing to jpeg files (about 35%). However, writing to png files now uses much more CPU (about 75%, at times almost 100%). Very weird.

Some other tests (on my camera's rtsp stream):

  1. When I skip the non-I-frames (keeping all the I-frames, i.e. the keyframes) like this:

    -skip_frame nokey -i <my_cam_rtsp_url> -r 1 -f image2pipe pipe:1

    Then this seems to be the least-bad solution:
    [screenshot: CPU usage with -skip_frame nokey]

  2. Because when I keep only the I-frames using a filter:

    -i <my_cam_rtsp_url> -vf select='eq(pict_type,PICT_TYPE_I)' -r 1 -f image2pipe pipe:1

    Then the result is much worse:

    [screenshot: CPU usage with the select filter]

    Not sure why there is a difference with the skip_frame nokey solution from the previous step.

  3. When I move the filter after the image2pipe:

    -i <my_cam_rtsp_url> -r 1 -f image2pipe -vf select='eq(pict_type,PICT_TYPE_I)' pipe:1

    Then this does not seem to make much of a difference, compared to the previous test:
    [screenshot: CPU usage with the filter after -f image2pipe]

    Again, not really clear to me.

Not really sure about the next steps...

It's a huge difference in the amount of data to handle between the sources: 320x240 compared to 3840x2160.

Have you tried to reduce the resolution of the images sent from your camera, just to check how this influences the cpu load?

No, that is another test on my todo list. But trying to find out how the ffmpeg parameters work is very time consuming :face_with_diagonal_mouth:

Last night I was struggling with something else. My camera sends 15 frames per second and 1 I-frame (= keyframe) per 15 frames:

[screenshot: camera settings - 15 fps, I-frame interval 15]

Which means it sends 1 I-frame (= keyframe) per second.
So in my ffmpeg arguments:

-skip_frame nokey -i <my_cam_rtsp_url> -r 1 -f image2pipe pipe:1

The -r 1 (to have 1 frame per second) is quite useless, since the -skip_frame nokey already only passes through 1 I-frame per second. However, when I removed the -r 1, the CPU usage changed from 1 spike per second to a continuous load:

[screenshot: continuous CPU load without -r 1]

Moreover, when using a node-red-contrib-msg-speed node, the number of decoded jpegs per second increased:

Then I came across this discussion, which contained the answer.
Summarized: image2pipe is a constant frame-rate muxer, so it will attempt to maintain the stream frame rate when the number of supplied frames is lower than that frame rate. So when you pass only 1 I-frame per second to image2pipe, it will create extra duplicate frames (from that I-frame) to reach the original framerate again.
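
With my camera's numbers that works out as:

    input: 15 fps, of which -skip_frame nokey passes 1 I-frame per second
    the constant frame-rate muxer duplicates that frame to get back to ~15 fps
    => ~15 jpeg encodes per second instead of 1

Which matches the increased rate that the speed node showed.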

Using -vsync 0 you can tell image2pipe to forget about that:

-skip_frame nokey -i <my_cam_rtsp_url> -vsync 0 -f image2pipe pipe:1

Then indeed the speed node indicates that I have again 1 frame per second (i.e. the I-frame):

And the continuous CPU load disappears again:

[screenshot: CPU spikes once per second with -vsync 0]

This -vsync 0 looks more interesting to me than -r 1, because when you change your camera settings (frames per second or frame interval) you don't want to have to change your ffmpeg command as well. But not sure yet...

Note that the vsync parameter is deprecated, so I think we will need to use the fps_mode parameter instead (see the ffmpeg documentation).

Still hoping that some extra folks can join, otherwise I won't be able to finish my rtsp node...

FFmpeg is a complete black box to me. It seems that I get different CPU usage depending on the location of the -vsync 0 argument in my command:

[screenshot: CPU usage for both -vsync 0 placements]

Very weird...

Possibly in the former the vsync is being applied before the images are converted to a stream, and in the latter, after conversion. Not that convincing an idea, but it is all I can come up with. Do they appear to give identical results?

Hi Colin,
Thanks for joining this technical discussion!

Yes I think you are right.
The FFmpeg documentation, though, only mentions that options are applied to the next specified input or output (where both input and output can be a file, pipe or stream). Something like this:

ffmpeg [global options] [input options] -i input [output1 options] output1 [output2 options] output2

I don't see anything mentioned about e.g. the sequence of the output options being important. But indeed it smells like it is important anyway.

BTW the above graphs were generated with my cpu node being triggered once a second, which is not quite accurate when there is also 1 image per second being decoded. So I now call the cpu node 10 times per second, and I get a more detailed view of the CPU spikes:

[screenshot: CPU spikes sampled 10 times per second]

This is even worse than what I had expected :scream:

Moreover, when I tell image2pipe to output (raw) bmp images - instead of encoding compressed jpeg images:

... -c:v bmp -f image2pipe ...

Then the CPU usage increases even more:

[screenshot: CPU usage with -c:v bmp]

I don't understand this anymore, because there is not even any compression required.

I am having large doubts at the moment that I will ever be able to extract I-frames from multiple RTSP streams on my Raspberry Pi 4 ...

Extracting 1 frame per second, checking overall CPU load 10 times/second: I would say the "average" overall CPU load on my RPi4 is around some 10%. Spikes, I think, are just normal and should be accepted; what matters is where the "main body" of the load sits.

I guess the difference could be the sources and resolutions we are using, and maybe the ffmpeg params. I use these simple ones in the ffmpeg-spawn node:

[
    "-i",
    "rtsp://rtsp.stream/pattern",
    "-r",
    "0.99",
    "-f",
    "image2pipe",
    "pipe:1"
]

It's hard to say where the problem is if we are not looking at the individual process' cpu load, but here is a list of things that should be considered...

  • Always better to use the lower resolution/fps stream from your cam when decoding/encoding.
  • If we are piping data into the node-red process, then more memory should be given to node.js.
  • I have had better performance on the pi by turning off the swap.

Having that before the input makes sense: it should use less CPU than having the -vf select='eq(pict_type,PICT_TYPE_I)' applied to the output, because it affects the decoding of the rtsp stream. Chunks of input video can be discarded before being decoded, since ffmpeg can detect whether an input chunk has a keyframe. Applying the vf only to the output means that all of the input video will be unnecessarily decoded before the filtering is attempted.
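
Roughly, the two placements compare like this (using your url placeholder):

    # input side: the decoder skips non-keyframe data before decoding it -> cheap
    ffmpeg -skip_frame nokey -i rtsp://my_cams_rtsp_url -vsync 0 -f image2pipe pipe:1

    # output side: every frame is fully decoded, then the select filter throws most of them away -> expensive
    ffmpeg -i rtsp://my_cams_rtsp_url -vf "select='eq(pict_type,PICT_TYPE_I)'" -vsync 0 -f image2pipe pipe:1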

Doing that means that you are piping in images sized about 3840x2160x3 bytes, which is way too big. The node.js process has to handle all of that and constantly allocate memory for it. Remember, the piped data is broken up into about 65k chunks on the pi, meaning that a single image arrives in nearly 400 chunks (my math may be off). So, while you are only doing that once per second, you are causing the node.js process to do quite a bit more work (~400x). That would also help explain why having ffmpeg write directly to disk causes much less system load. As an alternative, you could create a temp dir and have ffmpeg write there. Then you could use a node that monitors file creation and read the files back into node-red as needed. I can't predict if that will be better, but it's possible.
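
A minimal sketch of that idea (the dir and filename pattern are just examples):

    mkdir -p /tmp/cam_frames
    ffmpeg -rtsp_transport tcp -skip_frame nokey -i rtsp://my_cams_rtsp_url \
        -vsync 0 -q:v 4 /tmp/cam_frames/frame_%06d.jpg
    # then watch /tmp/cam_frames from node-red and read each new jpeg as it appears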

Back to the idea of using the lower res input for decoding/encoding...
You may be able to take advantage of hardware accelerated decoding if the input resolution is low enough. I use h264_mmal to decode some of my rtsp streams and the cpu load is quite low.
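
For example, something along these lines (assuming an ffmpeg built with mmal support and a low-res substream url on your cam; both are assumptions on my part):

    # force the pi's hardware h264 decoder on the input
    ffmpeg -c:v h264_mmal -rtsp_transport tcp -i rtsp://my_cams_low_res_substream \
        -vf fps=1 -q:v 4 -f image2pipe pipe:1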

Side note: I have not been doing any dev lately due to time constraints, and my 7-year-old macbook was non-functional for about 2 months.

Anyway great to see you back again!!

Yes indeed. Very nice that you could find some time to join this discussion :pray:
It seems that a lack of free time is a global problem on this blue planet...
Our community really needs an ffmpeg teacher :wink:

Yes, you taught me that good practice, and I am certainly going to set up my new Amcrest cams that way.

But at the same time you taught me that I could extract I-frames to grab images from the stream with 'minimal' (which is not the same as 'low', of course) CPU usage. Which was a brilliant idea. So instead of using a low-resolution and low-quality RTSP stream, I also wanted to experiment a bit more with I-frames. That means setting up only a single (high-resolution and high-quality) stream on my camera, and using that stream both to store high-resolution video AND to capture I-frames from it:

[diagram: one high-res stream used for both video storage and I-frame capture]

Based on the 'theory' I learned from you, I thought that should also be possible. But I failed to do it. And even worse: the more experiments failed, the more I realized how bad my FFmpeg knowledge is :frowning_face:

Yes that is something that confuses me. When I write this:

[
    "-i",
    "rtsp://my_cams_url",
    "-vf",
    "select='eq(pict_type,PICT_TYPE_I)'",
    "-r",
    "1",
    "-f",
    "image2pipe",
    "pipe:1"
]

Not sure how to interpret this:

  • Only the I-frames are extracted from the stream, and sent to image2pipe?
  • Or image2pipe extracts all the frames from the stream, but only creates a jpeg for the I-frames?

Yes indeed, Walter uses it like that, and it seems very optimal to me.
But in my case (like in my diagram sketch) I use the same high-resolution stream both for getting high-resolution video and for extracting I-frames. So I can't use the skip_frame nokey, I think, because then I wouldn't have all the frames in my video anymore.

Ah ok, so that is a major difference. So you think FFmpeg drops entire chunks (which don't contain keyframes)? Interesting thoughts.
I read on a number of StackOverflow posts that other users had similar problems: FFmpeg will decode ALL the frames instead of only the I-frames. Perhaps that is my major problem. So I need to find a way to filter the I-frames without decoding all the frames. Will need to google the I-frame detection in more detail...

That I don't understand either. I thought that this line:

... -c:v bmp -f image2pipe ...

would mean that image2pipe encodes the frames to bitmaps instead of jpeg images. I did this test because I read here that this should improve performance. It was only a simple test to determine whether my CPU usage was due to the jpeg encoding.

Do I understand you correctly: I would gain some performance by avoiding the jpeg encoding, but on the other hand much more CPU is required to pipe the large amount of bitmap data?

I was thinking all the time: how on earth can a single image (per second) consume so much CPU?
But when I read your sentence, I start to understand that my Raspberry might not be very happy with what I was doing.

Do you mean the disk IO doesn't have the 65K limitation, so it is much faster?

So you mean to decode ALL frames, instead of grabbing only the I-frames?

Sorry for all the questions. I have been reading through LOTS of online material, but I find it very difficult to understand how FFmpeg works. For example, image2pipe is a black box to me...

With the only rtsp source I have, the following seems to work very well on my RPi:

[
    "-skip_frame",
    "nokey",
    "-i",
    "rtsp://rtsp.stream/pattern",
    "-vsync",
    "0",
    "-f",
    "image2pipe",
    "pipe:1"
]

However, now I see that the stream I'm using seems to provide only one I-frame per 10 seconds, so this fact and the lower resolution may explain the rather low CPU usage on my RPi4 when using that rtsp source. (Be aware that if you set your cpu node to check 10 times per second, this alone adds some 20% to the total CPU load.)

I have additionally tried to use fps_mode (-fps_mode:v passthrough), but it seems it is not supported; maybe my version of ffmpeg is just a bit too old: Unrecognized option 'fps_mode:v'.

If we want to compare CPU loads, I think it is necessary that we use the same type of computer and the same rtsp source, or at least a source with equal properties.

Best regards, Walter

I think the bmp is just a full array of pixels with a small header, so length x width x bytes per pixel (assuming there are 3 bytes per pixel, rgb, but perhaps it is different). Actually, I just looked it up and there might be an extra byte per pixel for an alpha channel. That makes the bmp even bigger and more taxing on the node.js process when being piped.
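
For the resolution in this thread, that works out to roughly:

    3840 x 2160 x 3 bytes ≈ 24.9 MB per bmp frame
    24.9 MB / 65 KB per pipe chunk ≈ 380 chunks per image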

The performance gain or loss will come down to the byte size of the file being piped and the complexity of how the image is generated. Another thing to consider is that if you are using a high-resolution input and then scaling down the jpeg, that will be yet one more thing for ffmpeg to do, which will add to its burden.

I can't say for sure, but it seems like that. I did these tests extensively in the past, when I was trying to come up with a cheap way to detect pixel changes between frames. Just the simple task of receiving the piped data caused more burden, while the actual parsing of the data was nearly trivial.

It depends. Like you mentioned, if you are just getting jpegs, then you can use the input-side option to discard non-I-frames. If you are copying the h264 content from rtsp to mp4 AND also getting jpegs, then the approach will have to be balanced in some way so that it can work on the pi. I have found that when using hardware-accelerated decoding, any error occurring in ffmpeg is not well reported. I had to do trial and error just to figure out that there is an undocumented limit to the input resolution.
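
An untested sketch of such a balanced command (whether -skip_frame on the decoder really leaves the packet-copied mp4 output untouched is exactly the kind of thing that would need verifying):

    ffmpeg -rtsp_transport tcp -skip_frame nokey -i rtsp://my_cams_rtsp_url \
        -map 0 -c copy -f mp4 \
        -movflags +frag_keyframe+empty_moov+default_base_moof pipe:1 \
        -map 0:v:0 -vf scale=640:-2 -vsync 0 -q:v 4 -f image2pipe pipe:4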

The hardest part of all of this, in my experience over the years, is convincing another person that they don't need to use a large image resolution for motion/object detection. Start small and work your way up to larger sizes until you get to the point that you are satisfied with the results.

Indeed. Most ML models are trained on sets of images of a fixed size, e.g. I think yolo5 was 640px resolution and imagenet was originally 320. So unless you are going to use a sliding window across your higher-resolution source, you may as well feed it at close to the resolution it was trained at.

Indeed, I assume it was added recently. I haven't analyzed the pull requests to find out since which version. My ffmpeg version supports both fps_mode and vsync, but I did my own build on my Raspberry from the latest sources...

Yes indeed. That is why I have also executed your commands on my Raspberry Pi 4. I hadn't noticed that your online stream had such a low I-frame rate.

Oh no!!! Seems I need to add some feedback here, to make sure you see you are not talking to the walls ;-). From your feedback (and from others like Walter, who do a lot of AI on images) I learned that a high resolution for object detection is not wanted. So I will capture lower-resolution images from my new Amcrest cameras, e.g. to do 'car' recognition and license plate recognition.

However, in this particular discussion I use high-resolution images for a couple of reasons:

  1. When I am experimenting to optimize my FFmpeg commands, I find it easier to work with high-resolution images, because then incorrect or non-optimal arguments immediately lead to a clearly visible performance drama. That way I can step back and say: "oops, that particular argument I just added seems to be very bad (or very good)".

  2. As mentioned above, this discussion is not about getting a low-resolution stream from my camera and skipping all the non-keyframes. Instead I wanted to experiment with extracting (high-res or low-res) images from a high-res stream, for use cases where a high-res stream is already running (and you want to avoid setting up a separate stream only to grab some images). Since I-frames are already complete images, I had really hoped/thought this would be possible.

  3. Adding a filter to scale down the images is on my todo list. I had not tried that yet because - as you also mention - this will again result in extra CPU usage, which I wanted to avoid.

So you are not talking to the Discourse walls at all :joy:

Apologies if I have overlooked your answer. But have you any idea about this question:

I assume it is the second option, because in the documentation I see that the select option is used to "Select frames to pass in output". That would be awful for performance, of course. I have been looking at whether I could pass only the I-frames to the image2pipe, by creating some kind of chain:

Rtsp decode -> intermediate -> image2pipe

So that the intermediate step does the I-frame filtering, by applying select='eq(pict_type,PICT_TYPE_I)' to its intermediate output (which would be the input of the image2pipe). But no clue whether ffmpeg supports something like this...

Thanks!!

One additional observation when using "-vsync", "0" versus "-r", "1": the latter gives many more messages at the output of the ffmpeg-spawn node.

So, for instance, when I use vsync I only get one complete msg on the output for each I-frame, but when I use r I get many messages. That seems a bit over-reaching and maybe not optimal; at the least, Node-RED will have to handle many more messages for nothing.

Maybe fps_mode is as good as vsync?

Best regards, Walter

Using vsync

Using r