ffmpeg. RTSP can carry audio along with the video. The only likely problem is that the audio codec used by the IP cam may not be one the browser can play. In that case the audio has to be re-encoded, which uses a bit of CPU, but not too much. Some of my newer cams already deliver audio encoded with the AAC codec, which can easily be played in the browser. Maybe yours does?
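As a rough sketch of that decision, the snippet below asks ffprobe for the codec of the first audio stream and picks ffmpeg audio options from the answer. The function names and the set of "playable" codecs are my own assumptions for illustration, not something taken from any project; adjust them to what your player actually handles.

```python
import subprocess

# Assumption: AAC is the only codec treated as playable without re-encoding.
BROWSER_FRIENDLY_AUDIO = {"aac"}


def probe_audio_codec(rtsp_url: str) -> str:
    """Return the codec name of the first audio stream, or "" if there is none."""
    result = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-select_streams", "a:0",
            "-show_entries", "stream=codec_name",
            "-of", "default=noprint_wrappers=1:nokey=1",
            rtsp_url,
        ],
        capture_output=True, text=True, check=True, timeout=30,
    )
    return result.stdout.strip()


def audio_args(codec_name: str) -> list[str]:
    """Pick ffmpeg audio options for a given source codec name."""
    if not codec_name:
        return ["-an"]            # no audio stream: disable audio output
    if codec_name.lower() in BROWSER_FRIENDLY_AUDIO:
        return ["-c:a", "copy"]   # already AAC: pass it through untouched
    return ["-c:a", "aac"]        # anything else: re-encode to AAC (costs some CPU)
```

Calling `audio_args(probe_audio_codec(url))` with your cam's RTSP URL would give the audio part of the ffmpeg command line.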
The best way would be to use the second video stream that should be available on your RTSP IP cam. That is the least resource-hungry option, since you can just stream copy it without re-encoding. I always use both the main stream and the sub stream from my cams so that I can stream the lower-resolution version when I am on a slow wireless network.
You really don't need the socket.io player. You can just use hls.js to play the hls.m3u8 file. It is much more stable than my custom player.
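For what it's worth, here is a minimal sketch of how the playlist for hls.js could be produced by stream copying a cam's sub stream. The RTSP URL, the output path, and the segment settings are placeholders I made up, and the audio default assumes the cam already sends AAC; it only builds the argument list and leaves running it to you.

```python
def build_hls_command(rtsp_url, playlist_path, audio_args=("-c:a", "copy")):
    """Build an ffmpeg argument list that repackages an RTSP stream as HLS.

    The video is stream copied, so no video re-encoding takes place.
    """
    return [
        "ffmpeg",
        "-rtsp_transport", "tcp",          # RTSP over TCP, avoids UDP packet loss
        "-i", rtsp_url,
        "-c:v", "copy",                    # stream copy the video as-is
        *audio_args,                       # e.g. ("-c:a", "aac") to re-encode audio
        "-f", "hls",
        "-hls_time", "2",                  # target segment length in seconds
        "-hls_list_size", "5",             # keep only the newest 5 segments listed
        "-hls_flags", "delete_segments",   # remove segments that left the playlist
        playlist_path,
    ]


# Hypothetical sub stream URL and output path, for illustration only.
command = build_hls_command("rtsp://192.168.1.10:554/sub", "hls.m3u8")
print(" ".join(command))
```

The list can be handed to `subprocess.Popen(command)`, and the resulting playlist served over HTTP for hls.js to load.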
As for the other questions, I don't have any good answers, since I only do 24/7 recording without any motion/object detection. ffmpeg takes care of writing mp4 video all day long, broken into individual videos of 15 minutes each. I have it written to a 6TB WD Purple in an external enclosure. Once a day a script runs to unlink videos that are more than 2 weeks old.
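A setup along those lines could look roughly like the sketch below. It is not my actual script: the function names, the file naming pattern, and the use of ffmpeg's segment muxer with stream copy are assumptions for illustration. Note that copying audio into mp4 only works if the cam's audio codec is allowed in mp4 (AAC is).

```python
import time
from pathlib import Path

SEGMENT_SECONDS = 15 * 60              # 15-minute videos
RETENTION_SECONDS = 14 * 24 * 60 * 60  # keep 2 weeks of recordings


def build_record_command(rtsp_url, out_dir):
    """Build an ffmpeg argument list for continuous, segmented mp4 recording."""
    pattern = str(Path(out_dir) / "%Y-%m-%d_%H-%M-%S.mp4")
    return [
        "ffmpeg",
        "-rtsp_transport", "tcp",
        "-i", rtsp_url,
        "-c", "copy",                           # no re-encoding (assumption)
        "-f", "segment",
        "-segment_time", str(SEGMENT_SECONDS),  # start a new file every 15 minutes
        "-segment_format", "mp4",
        "-reset_timestamps", "1",               # each file starts at timestamp 0
        "-strftime", "1",                       # expand the date pattern in the name
        pattern,
    ]


def prune_old_videos(out_dir, now=None, max_age=RETENTION_SECONDS):
    """Unlink .mp4 files whose modification time is older than max_age seconds."""
    now = time.time() if now is None else now
    removed = []
    for path in sorted(Path(out_dir).glob("*.mp4")):
        if now - path.stat().st_mtime > max_age:
            path.unlink()
            removed.append(path)
    return removed
```

Running `prune_old_videos` once a day from cron or a systemd timer would match the daily cleanup described above.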