AI hardware for camera stream processing

Hi folks,

There are some folks in this community who are very experienced with AI processing of camera images on external hardware: object detection, face recognition, ... For example our friend @wb666greene has already put quite some effort into sharing his knowledge and Python-based setup with the community.

But at the moment it is all kind of mind-blowing for me. I just don't understand how it works. Due to very limited spare time, I was looking for a solution that:

  1. Is easy to understand for AI noobs like me.
  2. Is easy to set up, without having to go through complex installation procedures. I just want to send images to the device from my Node-RED flow and get the required information in the response.
  3. Should be affordable.

Lots of questions about the subject:

  1. Which device should I buy? There seem to be a number of devices available (e.g. here), so I'm not sure which one is good enough to buy.
  2. Is it easily extendable? Suppose I have more cameras in the future, can I e.g. plug multiple of those devices into a single RPi (e.g. via a USB hub)?
  3. Should it be via USB, or can I simply push images to it via the ethernet cable?
  4. Do I really need to start playing with Python? Not that I don't like Python, but I already know that it will then never happen due to free-time constraints.
  5. Do I need to decode the rtsp streams on my rpi and send the images to the devices, or can such a device read rtsp streams directly?
  6. ...

Summarized: is it possible to easily configure a device to do e.g. "face recognition", train it easily for a limited number of known faces, and simply send images to it from my Node-RED flow and get the result back?

Thanks !!!

Hi @BartButenaers , I have become quite interested in this subject recently. M5Stack recently launched a couple of cameras with embedded AI functions. They seem to match most of your requirements. What is nice is that the image processing takes place in the camera, so you will not need to capture RTSP streams nor overload the Raspberry Pi to process them. As far as I could understand, the camera will continuously output a JSON string saying what it is detecting. Additionally you can access the camera's web server. I have not taken the time to fully understand their capabilities though.
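As a sketch of how such a continuous JSON detection stream could be consumed in a flow, assuming newline-delimited JSON output (the field names "class" and "score" are my assumption here, not taken from the M5Stack docs):

```python
# Sketch: filter a newline-delimited JSON detection stream by confidence.
# Field names are hypothetical -- check the M5Stack docs for the real schema.
import json

def parse_detections(lines, min_score=0.5):
    """Parse newline-delimited JSON detections, keeping confident ones."""
    hits = []
    for line in lines:
        try:
            det = json.loads(line)
        except json.JSONDecodeError:
            continue                    # skip partial or garbled lines
        if det.get("score", 0) >= min_score:
            hits.append(det["class"])
    return hits
```

In Node-RED the equivalent would be a small function node after whatever node receives the camera's output.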

Link to the documentation

Link to a demo video


I will try to reflect my experience with each of your questions.

  1. Right now, price/performance-wise it's one of the Coral TPU variations; unfortunately they all seem to be very hard to find in stock at present.
    The Movidius devices have the most pre-trained models available, as they support Tensorflow and other frameworks, while the Coral is Tensorflow-lite. The Jetson Nano is an ARM Pi-like system with the lowest-end Nvidia GPU that is useful for running AI models. Training models is a full-time job, but so-called "transfer learning" can work if you have a good collection of images to train with -- the killer is "annotating" the images so the training "knows" if it's going in the right direction or not. This requires a pretty deep dive into XML, as that is how many if not most of the systems read the annotations.

  2. The short answer is the AI devices are like a software function done in hardware: you set it up, feed it input data, and retrieve output data. My main contribution is Python code that launches a thread per camera to read the cameras (rtsp streams are the most common) and write the frames to a queue (one per camera). Another thread (one per AI co-processor device) is launched to read the camera queues round-robin and write detection results to a common output queue. The main program thread sets everything up, launches the worker threads, reads the output queue, and takes action (in my code, passing the frames to Node-RED for the final actions). Once I get my head around the contributed "share your project" HSS, it may be "easy" to have it feed frames to my Python code via MQTT and receive the processed frames via MQTT. I've already written MQTT virtual camera support, which worked well in development where one system could do all the rtsp decoding and send the frames to the virtual cameras running the AI on a different system. The problem was that the Pi4B network layer was not up to snuff, but it worked great if, say, a recent i3-class machine was running the AI. In the end I don't use my MQTT cams and just run all 14 cameras (seven 4K and seven 1080p) on an i7 miniPC, getting about 3 fps per camera.

  3. USB3 is the starting point; USB3 is faster than Gigabit Ethernet, but the "new" MPCIe and M.2 interface Coral TPU devices are even better if you have an IOT-class system that supports them (the Jetson Nano can, but its location causes potential thermal issues). At the end of the day you are pushing images from the cameras over Ethernet cables via rtsp or http.

  4. Python is currently the development language of most AI, as it provides a much easier to learn and use "wrapper" over the underlying C/C++ code that does the real work.

  5. You have to decode rtsp streams somewhere to use an IP camera. If you can find a camera that sends a jpeg frame in response to an http request (so-called Onvif snapshots), you can make a very good system without the overhead of rtsp, which lets a Pi4 support more cameras. The problem is I've only been able to find one brand of 720p and one brand of 1080p cameras with full-resolution snapshots, and both seem to have been discontinued shortly after I'd found them :frowning: What is really frustrating is a 4K camera that returns either a lame 704x480 pixel snapshot, or a full-resolution rtsp "key frame" image, which means it can only reply once every ~1-4 seconds.
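The threaded pipeline described in point 2 (one reader thread per camera feeding a per-camera queue, one AI worker draining those queues round-robin into a common output queue) can be sketched roughly like this. This is a minimal illustration, not the actual code; the frame source is a stand-in for a real rtsp reader:

```python
# Sketch: thread-per-camera readers fill per-camera queues; one AI worker
# drains them round-robin into a common results queue.
import queue
import threading

def camera_reader(cam_id, frames, cam_queue):
    """Stand-in for an rtsp reader thread."""
    for frame in frames:
        cam_queue.put((cam_id, frame))
    cam_queue.put((cam_id, None))          # sentinel: stream ended

def ai_worker(cam_queues, results):
    """Round-robin over camera queues, run (fake) inference, post results."""
    live = set(range(len(cam_queues)))
    while live:
        for i in list(live):
            try:
                cam_id, frame = cam_queues[i].get(timeout=0.1)
            except queue.Empty:
                continue
            if frame is None:
                live.discard(i)
                continue
            # real code would run the detector here, e.g. a Coral TPU model
            results.put((cam_id, f"detected-in-{frame}"))

def run_demo():
    cam_queues = [queue.Queue(), queue.Queue()]
    results = queue.Queue()
    readers = [
        threading.Thread(target=camera_reader, args=(0, ["f0", "f1"], cam_queues[0])),
        threading.Thread(target=camera_reader, args=(1, ["g0"], cam_queues[1])),
    ]
    worker = threading.Thread(target=ai_worker, args=(cam_queues, results))
    for t in readers + [worker]:
        t.start()
    for t in readers + [worker]:
        t.join()
    out = []
    while not results.empty():
        out.append(results.get())
    return out
```

The main thread would then read the results queue and hand detections to Node-RED (e.g. via MQTT), as described above.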

I can't really comment on face recognition, I've played with a few, but for my purposes the need for full frontal "mug shot" images is pretty much a show stopper for downward looking (above the reach of vandals) security cameras although they could be helpful for "video doorbell" type applications.

It would be nice to encapsulate my Python code into a node, but its many-in, one-out design is contrary to Node-RED. My MQTT virtual cameras could be a solution; I'll revisit this once I get my head around the very advanced Node-RED used in the HSS project. Unfortunately time is very tight for me for the next few weeks and I've not got as far along with HSS as I'd hoped.

You might get @dynamicdave to comment on the ESP-cam face recognition as he has lots of ESP32 support in HSS.

As to AI in the camera, it's inevitable eventually, but right now I think it's premature given the current state of the AI object detection art. For example MobilenetSSDv2-coco dropped my false positive rate by nearly a factor of ten over the initial MobilenetSSD model I started with, meaning the cameras are likely to quickly become "obsolete". When you are climbing ladders and drilling holes to mount cameras, you want as long a life as possible for the cameras!

If you are happy with sending all your comings and goings to a cloud service, like with license plate reading, one of Google's or Amazon's cloud AI services could be a solution.

Hi @Andrei,
I assume that will become common in the near future. But at the moment most cameras don't support it, or only with very basic support. So for now I will go for a centralized solution, and send all my images to that hardware.

Very useful info for me. Thanks! Now at least I can focus on that device. Especially because you mention that other brands require XML manipulation: I don't have time to do that kind of stuff.

Ok, that is also important info. I need to have a look at which limitations that has... So this could be a reason to choose Movidius?

Ok, now it is becoming a bit clearer what you are doing. Do you think it would be possible for me to create nodes for that? I mean in plain JavaScript, without Python? Or did you perhaps experience issues with that? Since you use a separate thread per cam, perhaps we could use Node.js workers or something else. It would be nice if you could share some details. I started using Kevin's ffmpeg-spawn node to capture the rtsp stream, and it would be nice if I could feed the images via some new node directly to the Coral USB stick...

Now I'm lost again :exploding_head:

Ah, is that required? I thought it was the opposite: that you needed to downscale high-resolution images before AI processing...

Can't we have one queue node per cam? Perhaps I have misinterpreted your explanation...

The XML stuff is if you are training or re-training (transfer learning) a model.

I'd suggest finding a publicly available pre-trained model that meets your needs and then get the AI co-processor it uses to run it.

Kevin's ffmpeg-spawn nodes would seem to handle getting the camera images, I hope to learn more of the details about these as I investigate the HSS project.

If you can find javascript wrapper for the AI co-processor driver library then it should be possible to make a node like node-red-contrib-tfjs-coco-ssd that uses the AI co-processor.

This is just a different physical and electrical interface to the AI co-processor. USB3 is about everywhere; the others usually exist on IOT-class hardware (but not the Pi, XU4, etc.) in place of a WiFi module. IMHO WiFi is counter-productive here: do you want your security cameras competing with Netflix streaming etc. for your available WiFi bandwidth? So these other modules are less expensive and "cleaner", without dangling dongles to interfere with packaging, if your machine can use them instead.

This was counter-intuitive: the version 1 model didn't perform well with higher-resolution cameras (I tried a few 5 Mpixel models), but the v2 model came out about the time I upgraded to some 4K cameras, and I was blown away by the improvement over 1080p cameras when I tried it. I wasn't expecting much -- resizing 3840x2160 to 300x300 -- but the fact is it seems to actually work better in my experience. The friend-or-foe decision is much quicker and more certain with a 4K image compared to a D1 image when "pinch zooming" on a cell phone. But it's not required, and you can certainly save money by using 480p cameras if that works for your needs.

Perhaps, but the AI processing node would need a way to read the queue nodes in round-robin sequence. You can have multiple instances of the tfjs-coco AI node in a flow, but you only have one hardware instance of the AI co-processor, so how do you share it "fairly" among the N cameras? At present this is above my Node-RED pay grade. But it is only a few extra lines of Python to maintain a "nextQueue" thread-safe variable and round-robin sample the list of camera input queues.
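The "nextQueue" idea mentioned here might look roughly like this in Python (a sketch, not the actual code): a lock-protected index that hands each AI thread the next camera queue to sample.

```python
# Sketch: a thread-safe round-robin index over N camera queues,
# so a single AI co-processor is shared "fairly" among cameras.
import threading

class RoundRobin:
    def __init__(self, n_queues):
        self._n = n_queues
        self._next = 0
        self._lock = threading.Lock()

    def next_index(self):
        """Return the index of the next camera queue to sample."""
        with self._lock:
            i = self._next
            self._next = (self._next + 1) % self._n
            return i
```

Each AI worker thread would call `next_index()` before each inference to pick which camera queue to read from.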

Thanks for sharing your thoughts! Will need some time to digest it....

Seems that Coral TPU support is not available yet in Tensorflow.js. If we could have such a 100% JavaScript solution, we could develop a node for it, without having to start using Python scripts. But it is not supported yet :frowning:

I have sent them a reminder, and hopefully we get an answer. A positive one...

I have also filled in their survey, to let them know we are using TensorFlow in Node.js. At the end I noted that I regret that Tensorflow.js has no Coral TPU support. It could perhaps be useful if others also fill in this survey!

Bart, tell me, what would be the benefit of using TensorFlow? Isn't that using the SSD model? Compared with YOLO I do not think it is as good at detecting objects. Look at the example below. Here I am using the full version of YOLO v4 and the person is detected with 90% probability. TensorFlow fails completely to detect the person in the same image; it just finds... a bench and a boat.

Hi @krambriw,
I was hoping you would join this discussion :wink:
That is interesting. And this YOLO, can it easily be integrated in Node-RED? And do you use hardware acceleration? Other noob info perhaps?

Yes, the original v1 SSD model is not good compared to YOLOv4, but it is about the same as the "tiny" YOLO variations. MobilenetSSDv2_coco is much better than the SSD v1 and still able to run on the current crop of AI co-processors.

YOLO is only giving about 4-7 fps on my Jetson Nano GPU. A GPU that can match the ~40 fps (14 cameras) that I get with the TPU and an old i7-4500 "industrial PC" will cost more than my i7-4500 and TPU combo, and still need a computer to plug it into.

Right now all is pretty much moot as TPU and higher end GPU are pretty much out of stock everywhere :frowning:

Nice, the guys from the Tensorflow.js team have been very helpful already:


I am very much looking forward to that! But it will take some time ...
Hopefully they will also work on better pre-trained models in the near future.

I see here that Tensorflow.js also has a MobileNet model. Is that somehow related to the model you are talking about?

Besides the pre-trained models for Tensorflow.js, it should also be possible to train your own models. For example here, where they say this:

Train a MobileNetV2 using the TensorFlow 2 Object Detection API and Google Colab, convert the model, and run real-time inferences in the browser through TensorFlow.js.

Could something like that be useful, or totally not for some reason?


MobilenetSSD is a Convolutional Neural Network that was designed for "weak" systems. It would definitely be good to have MobilenetSSD_V2 available in a version of the tfjs node, but the problem is that performance on "IOT class" systems without AI co-processor support will be poor.

For example, on a Pi3B MobilenetSSD_v1 gave me about one frame (inference) every 1.8 seconds, or ~0.6 fps, using the PiCamera module as source. That can be useful in confined spaces, but the problem is, in my tests MobilenetSSD_v2 has pretty consistently been about half the frame rate of v1 on the same hardware. However, the performance in detecting people without false detections is about 10 times better. It's still not perfect, and there is room for improvement, which is why I think AI chips in cameras/DVR/NVR are premature for the time being.

Tensorflow MobilenetSSD_V2 trained on coco is available, so training on this image collection has already been done for you:

For the Coral TPU (tensorflow-lite) this model "compiled" for the TPU is also available, and is what I am using. I also used the OpenVINO "compiler" to make it for the Movidius NCS2, and the OpenCV DNN functions using the CPU & GPU (Intel HD integrated graphics). My i7-4500 "industrial PC" integrated graphics is too old to be supported by the OpenVINO GPU backend.

On a six-core i7-8750H (12 threads), with the MobilenetSSD_v2 model compiled by OpenVINO R2021.1 on Ubuntu-Mate 20.04, I get:

  • Myriad NCS2 USB3 : ~14 fps (DNN_TARGET_MYRIAD)
  • OpenCV 4.5.0-openvino GPU : ~30 fps (DNN_TARGET_OPENCL_FP16)
  • OpenCV 4.5.0-openvino CPU : ~41 fps (DNN_TARGET_CPU)

With the Coral TPU I get ~70+ fps.

For the AI purists: the TPU is 8-bit and the OpenVINO versions are 32-bit or 16-bit, but my code can run all the AI models in "parallel" with separate TPU, NCS2, CPU, & GPU threads. Despite the TPU processing more of the frames, it gave fewer false person detections than the others in a test I ran for a few days (14 outdoor cameras, 7 UHD and 7 1080p, all set for 5 fps). Sometimes less is more.

Google supplies object detection models compiled for the TPU here:

@wb666greene, thanks for the extensive overview!!!

Wow, that seems like a lot of models to me. On that page I see that the example code is Python. But do I understand this correctly: as soon as they have Coral TPU hardware acceleration support in Tensorflow.js, we can also use those models?? I mean without having to retrain or recompile or ... ??

Yes, the models are sets of coefficients and parameters that are fed into the underlying "engine" (library): tensorflow, tensorflow-lite, OpenCV DNN, pytorch, etc.

Usually the libraries are in C/C++ and the wrappers to use them are Python, which is great for creating "wrappers" around libraries to make it faster and easier to write code that lets the various libraries interact.

So if a JavaScript wrapper appears for the TPU library (edgetpu-runtime), then all the Google TPU models, along with any other tensorflow-lite model that can be compiled for the TPU, potentially become available to use in Node-RED nodes.

For my purposes node-red launching my Python AI code --> MQTT --> node-red works fine. In fact, what may be an issue for using HSS with a node-red --> MQTT --> python AI code --> MQTT --> node-red system is that my Python code, in addition to doing the AI inference, has threads to handle the rtsp stream connections and automatically enter a loop to reconnect to the stream when it inevitably goes away from time to time (power glitches, temporary network issues, etc.). So far in my (more limited than I'd have liked) playing around with HSS, when a stream dies it doesn't seem to ever reconnect automatically.

I currently have an experiment running based on Kevin's node-red-contrib-ffmpeg-spawn node: every spawn node starts a separate Linux process to decode an RTSP cam stream, and they all run in parallel. So I assume that will also run well as soon as the tensorflow.js team has implemented Coral TPU support...


The key issue is: will this spawned ffmpeg process automatically reconnect to the camera should it temporarily be disconnected? In 24/7 operation rtsp streams fail from time to time for whatever reason, and need to be reconnected automatically for robust operation.

I actually think a separate ffmpeg process for each camera (as created by spawn, which I'm pretty sure is what OpenCV is doing internally) could be more efficient than having multiple Python threads reading the streams with OpenCV, if the auto-reconnect issue can be solved.

Easy to test: start an rtsp stream, unplug the network cable or power supply from the camera, then plug it back in. If the spawned process starts delivering camera images again, the problem is solved.

If you have an unreliable camera or network and need to tell ffmpeg to exit if it has not received video in x amount of time, use -stimeout in front of the rtsp input parameters (when using tcp). I use this along with a function node to check the exit code and then automatically restart ffmpeg when some of my video feeds become stagnant and have made no progress, for whatever unknown reason the camera is misbehaving.

kevinGodell ~ $ ffmpeg -hide_banner -h demuxer=rtsp|grep stimeout
  -stimeout          <int>        .D....... set timeout (in microseconds) of socket TCP I/O operations (from INT_MIN to INT_MAX) (default 0)
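Outside Node-RED, the same restart idea can be sketched in Python: build the ffmpeg command with -stimeout before the input so a stalled TCP rtsp stream makes ffmpeg exit, then relaunch it in a loop. The rtsp URL is a placeholder, and 5_000_000 microseconds = 5 seconds:

```python
# Sketch: ffmpeg with -stimeout, relaunched whenever it exits.
import subprocess
import time

def build_ffmpeg_cmd(rtsp_url, timeout_us=5_000_000):
    return [
        "ffmpeg", "-hide_banner",
        "-rtsp_transport", "tcp",
        "-stimeout", str(timeout_us),   # input option: must come before -i
        "-i", rtsp_url,
        "-f", "rawvideo", "-pix_fmt", "bgr24", "pipe:1",
    ]

def run_forever(rtsp_url, restart_delay=5):
    """Relaunch ffmpeg whenever it exits (stalled stream, camera reboot...)."""
    while True:
        proc = subprocess.run(build_ffmpeg_cmd(rtsp_url))
        print(f"ffmpeg exited with code {proc.returncode}; restarting")
        time.sleep(restart_delay)       # give the camera a moment to recover
```

This mirrors what the restart subflow below does with Node-RED messages: detect an unrequested exit, wait a few seconds, then send a new start command.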

The restart subflow also keeps track of how many times it had to restart ffmpeg and the last time it occurred.


Aside from the performance: what I really like about Kevin's approach is that it is nicely integrated into Node-RED. Each ffmpeg process is mapped 1-to-1 with a Node-RED node, and both the input and output of that process are simply Node-RED messages.
And when Tensorflow has Coral support, then I assume we won't need Python at all anymore...

For HSS, I just need to clarify that the handling of rtsp cameras is also done within subflows that can easily be modified according to the user's needs, including the parameters used when starting ffmpeg processes, which makes it very flexible.

I will check when I’m back if I can update the subflows in HSS so we get the same recovery function in there as well


Could you share the flow above for the restart?
Best regards, Walter

This will probably need to be tweaked for your intentions, since you may be injecting the args whereas I have mine set in the ffmpeg-spawn node. The important thing to handle is the status === 'close' and killed === false values, which indicate that the ffmpeg process closed by itself as opposed to the end user sending a kill code. Then a simple delay node waits 5 seconds before sending a start command, to give the cam a chance to rest before reconnecting to it.

[{"id":"7abea2a.3635c5c","type":"subflow","name":"restart","info":"","category":"","in":[{"x":60,"y":80,"wires":[{"id":"71100764.d230a8"}]}],"out":[{"x":480,"y":60,"wires":[{"id":"f51055b5.29d948","port":0}]}],"env":[],"color":"#DDAA99","status":{"x":400,"y":140,"wires":[{"id":"71100764.d230a8","port":1}]}},{"id":"f51055b5.29d948","type":"delay","z":"7abea2a.3635c5c","name":"delay","pauseType":"delayv","timeout":"1","timeoutUnits":"milliseconds","rate":"1","nbRateUnits":"1","rateUnits":"second","randomFirst":"1","randomLast":"5","randomUnits":"seconds","drop":false,"x":330,"y":60,"wires":[[]]},{"id":"71100764.d230a8","type":"function","z":"7abea2a.3635c5c","name":"restart","func":"const { status, code, signal, killed, pid } = msg.payload;\n\nif (status === 'close' && killed === false) {\n    \n    const restart_count = context.get('restart_count') + 1;\n    \n    context.set('restart_count', restart_count);\n\n    node.warn(`ffmpeg pid: ${pid} close with code: ${code}, signal: ${signal}`);\n\n    node.send([{ action: { command: 'start' }, delay: 5000 }, { payload: {fill: 'green', text: `${restart_count}: ${new Date().toString()}` } } ]);\n\n}","outputs":2,"noerr":0,"initialize":"context.set('restart_count', 0);","finalize":"context.set('restart_count', undefined);","x":190,"y":70,"wires":[["f51055b5.29d948"],[]]},{"id":"195dfffe.8dca7","type":"subflow:7abea2a.3635c5c","z":"846423c8.7aedd","name":"","env":[],"x":270,"y":100,"wires":[["f0719352.0b9db"]]}]