If I remember correctly, those ESP32 cams serve a Motion JPEG (MJPEG) stream. So saving the "image" in the browser won't work, as it tries to save the stream indefinitely.
I have been using this approach for my Motion-based webcams, which serve MJPEG as well.
Basically, it uses ffmpeg via an exec node. It captures exactly one frame (a single JPEG), which is sent back to Node-RED via stdout.
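
To illustrate the idea outside of Node-RED, here is a rough Python sketch of the same single-frame grab. The exact ffmpeg arguments are my assumption rather than a copy of the flow, and the function names and the stream URL are made up for the example. The command it builds should be roughly what goes into the exec node.

```python
import subprocess
from typing import List


def build_ffmpeg_command(stream_url: str) -> List[str]:
    """Build an ffmpeg command that writes exactly one JPEG frame to stdout.

    The flag set is an assumption for illustration, not the original flow.
    """
    return [
        "ffmpeg",
        "-loglevel", "error",   # keep ffmpeg's log output quiet
        "-i", stream_url,       # the MJPEG stream served by the camera
        "-frames:v", "1",       # stop after a single video frame
        "-f", "image2pipe",     # image output suitable for a pipe
        "-c:v", "mjpeg",        # encode that frame as JPEG
        "pipe:1",               # write to stdout instead of a file
    ]


def looks_like_jpeg(data: bytes) -> bool:
    """Check for the JPEG start (FF D8) and end (FF D9) markers."""
    return len(data) >= 4 and data[:2] == b"\xff\xd8" and data[-2:] == b"\xff\xd9"


def capture_frame(stream_url: str, timeout_s: float = 10.0) -> bytes:
    """Run ffmpeg and return the captured frame as raw JPEG bytes."""
    result = subprocess.run(
        build_ffmpeg_command(stream_url),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout_s,
        check=True,             # raise if ffmpeg exits with an error
    )
    if not looks_like_jpeg(result.stdout):
        raise ValueError("ffmpeg output does not look like a JPEG image")
    return result.stdout
```

Calling `capture_frame()` with the camera's stream URL should return the bytes of one image, which can then be written to a file. If the command is used in an exec node, stdout has to be handled as a buffer rather than a string, otherwise the binary image data gets mangled.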