I've been building a Node-RED/Raspberry Pi data collection ecosystem based on the idea of independent JSON configuration files and Python scripts that are all stored within a git repo. Node-RED also runs within a project and uses an initialization routine that can figure out which config file corresponds to the device it is running on and load the relevant flows. This means I can run any number of Pis, each of which falls into one of half a dozen data collection/control roles, and all of their code remains in lockstep (updates are only a git pull away!). When I bring a new device online, as long as it uses one of the available roles (temperature monitor, etc.), all I need to do is set up a static IP and write a configuration file with the same name as the hostname.
Thus far everything is working swimmingly, but I'm moving out of development and into administration and deployment, so I'll start seeing many more devices come online soon. Because the architecture is inherently scalable I'm not worried about the actual processes, but I am starting to think hard about how to monitor the health and welfare of the devices. So far I've been using the Catch node to collect errors and then the MQTT node to report them to a central location. However, I can't figure out how to subscribe to N devices while staying within Node-RED. I can write a Python script to crawl my configuration files and collect all of the IPs/hostnames, but I'm not sure how to get Node-RED to automatically add/remove subscriptions as devices are added to or removed from the ecosystem.
Is there a way to do this, or should I be looking at some other messaging protocol or approach? Ideally I could use a similar messaging approach to ping all the devices on a regular interval to make sure they're still up.
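The config-crawling script mentioned above is short enough to sketch. This assumes, per the naming convention described earlier in the thread, that each device's config file is named <hostname>.json; the "ip" field is a hypothetical key for illustration:

```python
import json
import pathlib

def collect_devices(config_dir):
    """Map hostname -> IP for every device config in config_dir.

    Assumes each config file is named <hostname>.json (per the naming
    convention above) and, hypothetically, carries an "ip" field.
    """
    devices = {}
    for path in sorted(pathlib.Path(config_dir).glob("*.json")):
        with open(path) as f:
            cfg = json.load(f)
        devices[path.stem] = cfg.get("ip")
    return devices
```

The resulting dict is the "expected fleet" that a monitoring flow can compare against whatever actually reports in.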
When you say 'devices' do you mean you have multiple MQTT brokers you are trying to connect to, or do you mean a number of topics on one broker?
If the latter, then by organising the topics carefully you can subscribe to them all using the wildcards # and + in the topic. Then you can interrogate the results based on your configuration.
Also, if you need to know whether devices are connected or not, you can use a Last Will and Testament (LWT) to do that. Again, using wildcard topics you can build a dynamic list of devices that are online and compare that with the set you expect.
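That dynamic online-list can be sketched in a few lines of Python. The status/<hostname> topic layout is an assumption for illustration: it presumes each device publishes a retained "online" birth message there and registers an LWT payload of "offline" on the same topic, so the broker announces the death for you:

```python
def update_presence(online, topic, payload):
    """Maintain the set of online hostnames from presence messages.

    Assumes each device publishes 'online' to status/<hostname> on
    connect and registers an LWT of 'offline' on the same topic
    (the status/ layout is an assumption, not part of the thread).
    """
    parts = topic.split("/")
    if len(parts) != 2 or parts[0] != "status":
        return online  # not a presence topic, ignore it
    if payload == "online":
        online.add(parts[1])
    else:
        online.discard(parts[1])
    return online

# Compare what is online against the fleet you expect from your configs:
expected = {"pi1", "pi2", "pi3"}
online = set()
for topic, payload in [("status/pi1", "online"),
                       ("status/pi2", "online"),
                       ("status/pi2", "offline")]:  # pi2's LWT fired
    online = update_presence(online, topic, payload)
missing = expected - online  # pi2 dropped off, pi3 never came up
```

A single mqtt-in node subscribed to status/+ feeds the same logic in Node-RED, so no per-device subscriptions are needed.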
// Fragment of a Node-RED settings.js functionGlobalContext:
require: require, // DANGER Will Robinson!
_env: process.env, // Pass environment vars to Node-RED
NB: Obviously sharing the require function into Node-RED is not a good thing for a production setup! But hopefully it gives you the idea.
The former - for example, I may have 10 Raspberry Pis now but decide I want to add 5 more to the ecosystem. Currently it appears that I need an independent "MQTT in" node for each of these devices, which means every time I add a new device I have to add a new "MQTT in" node to my administrative flow.
You could have each device publish a topic with multiple subtopics, and in the administrator flow you would subscribe to the main topic:
the administrator flow's mqtt-in subscribes to the topic 'mypi/#'
pi1 publishes using a topic of 'mypi/pi1/whatever'
pi2 publishes using a topic of 'mypi/pi2/whatever'
pi3 publishes using a topic of 'mypi/pi3/whatever'
piN publishes using a topic of 'mypi/piN/whatever'
Of course, you will need code in the administrator flow to differentiate between the incoming messages.
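That differentiation is mostly a matter of splitting the topic. A sketch in Python, assuming the 'mypi/<device>/<reading>' layout above (in Node-RED the same logic would live in a function node operating on msg.topic):

```python
def device_from_topic(topic, prefix="mypi"):
    """Return the device name from a topic like 'mypi/pi2/whatever'.

    Returns None for topics outside the agreed prefix, so unrelated
    traffic can be filtered out in the same step.
    """
    parts = topic.split("/")
    if len(parts) >= 3 and parts[0] == prefix:
        return parts[1]
    return None

# e.g. routing an incoming message back to its originating device:
device = device_from_topic("mypi/pi3/temperature")  # "pi3"
```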
Can I ask for the reasoning behind running brokers on each of those devices if the goal is to talk to all of them through what appears to be a single Node-RED install? Have you thought about running a single broker instead, for example on the Node-RED device, and connecting all other devices to it? Utilising LWT would then solve the last bit of your question.
Furthermore, if they’re all on the same broker you can subscribe through a wildcard topic following the structure referenced by Paul, as long as you use proper prefixing for each device.
Can you explain more about your use case that requires the use of brokers on each pi?
Perhaps I'm missing something in the MQTT architecture, but I understood that MQTT is publisher-subscriber, where information flows from the publisher to the subscriber and the general idea is one publisher to many subscribers (one-to-many).
I'm using this approach on all of my devices to publish process data to sources that need it (databases for storage, other active devices for control). For example, one device may have 10-30 temperature measurement points and I publish the temperatures as individual topics along with identifying information for subscription by other systems that may need it (one to many). Since this functionality is already built into the devices my intention was to leverage this to create a topic that's specifically status/health related on each device. Errors and/or a status heartbeat would be published by each of the devices for consumption by the administrative device (singular).
This "extra" use of MQTT is a bit backwards, since I want to send information from N devices to a single administrator (many-to-one). Steve's Internet Guide mentions how to do this with Python loops, but it'd be nice to keep it inside Node-RED.
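Inside Node-RED the heartbeat side is just an inject node feeding an mqtt-out node, but the message each device would publish is simple enough to sketch in Python too. The <hostname>/Status topic and the JSON payload shape here are my assumptions for illustration:

```python
import json
import time

def heartbeat(hostname, now=None):
    """Build the (topic, payload) pair for one status heartbeat.

    The <hostname>/Status topic and payload shape are assumptions;
    the administrator would subscribe to +/Status and flag any
    device whose last timestamp goes stale.
    """
    ts = now if now is not None else time.time()
    return f"{hostname}/Status", json.dumps({"status": "ok", "ts": ts})

topic, payload = heartbeat("Pi1", now=1000.0)
```

The timestamp in the payload lets the administrator detect silence without pinging: no fresh heartbeat within, say, two intervals means the device is presumed down.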
In simple terms, MQTT uses a publisher/broker/subscriber model. You set up a broker (yes, you can have more than one, but let's use one for now) and now you can have many devices publish to that broker.
You could have 5 Pis and 10 WeMos boards all publishing messages to that broker using one or more topics.
Then you have the subscribers. You can have as many as you want, and they subscribe to topics (and point to that broker). When a message is published it goes to the broker, and the broker looks to see who has subscribed to that topic and sends the message to them.
So the topic is what allows a subscriber to receive a message from a publisher.
In your case you seem to want a bunch of Pis to publish messages and a single Pi to receive them. You could easily install an MQTT broker on your administrator Pi alongside Node-RED; you would not need any other brokers. As I said, each Pi could publish its messages using the topic 'mypi/piN/whatever' (where N is the number of the Pi; instead of 'piN' you could use the name of the Pi. My Pi names are colour coded based on the colour of filament I used to print each case, so I have yellow, red, etc. In my case I could publish using 'mypi/yellowpi/temperature' and 'mypi/redpi/humidity'.)
The administrator Pi would subscribe to the topic 'mypi/#' and get the messages from all of the Pis.
Hopefully this has cleared things up, but go read the HiveMQ blog posts. I think they will make things clearer.
Sorry, but I only scanned the previous posts. What I did not find mentioned is the theory of operation of MQTT itself:
MQTT is like Twitter for machines: everybody can post something and anyone can subscribe to read anything of interest.
In most cases you have a single broker that all devices (collecting data) send their data to, and any device interested in the data can subscribe. Multiple brokers should be used only for redundancy; that setup is complex and brings no advantage in most cases.
Everything is organized by topics, similar to directory folders.
The broker forwards data to anybody who subscribes to a certain topic, or to a group of topics using wildcards.
For example, device A sends to allTheData/fromA/currentTemperature and allTheData/fromA/currentHumidity.
Device B does the same, only using allTheData/fromB/...
A device can subscribe to exactly this data,
but it can also subscribe to allTheData/# to get everything published under allTheData,
or it can subscribe to allTheData/+/currentTemperature to get all temperature readings.
There is no need to subscribe to each and every device or datapoint: just subscribe to # and see what the broker has to offer;
if that is too much, narrow it down with the wildcards + and #.
The best tool to get an inside view is MQTT Explorer, available for every platform.
The best place to start learning the basics, and to find in-depth information, is here
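The wildcard rules described in this thread can be captured in a small matching function. This is a sketch of the standard MQTT filter semantics ('+' matches exactly one level, '#' matches the remainder), not any particular broker's implementation:

```python
def topic_matches(filter_, topic):
    """Check a topic against an MQTT subscription filter.

    Sketch of the standard wildcard semantics: '+' matches exactly
    one level; '#' (valid only as the final level) matches the rest.
    $-prefixed system topics and filter validation are ignored here.
    """
    f_parts = filter_.split("/")
    t_parts = topic.split("/")
    for i, f in enumerate(f_parts):
        if f == "#":
            return True
        if i >= len(t_parts):
            return False
        if f != "+" and f != t_parts[i]:
            return False
    return len(f_parts) == len(t_parts)

# The patterns from the examples above:
assert topic_matches("allTheData/#", "allTheData/fromA/currentHumidity")
assert topic_matches("allTheData/+/currentTemperature",
                     "allTheData/fromB/currentTemperature")
assert not topic_matches("allTheData/+/currentTemperature",
                         "allTheData/fromB/currentHumidity")
```

This is the routing the broker performs for you; having it in a function makes it easy to test a proposed topic scheme before committing the fleet to it.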
So to be clear, the preferred MQTT architecture is to have a standalone, centralized broker server, publish all data to this broker, and then have individual nodes subscribe to topics that are managed by this broker.
This means we do not have a truly distributed system, because all data transfer relies on the central server to route the data where it needs to go. I think I now see the rationale more clearly, and it makes more sense why the node-red module is set up the way it is: it assumes either that you have a central server somewhere that you already knew about (because you must have set it up), or that you have so few devices that the default, a localhost broker, works just fine.
Unfortunately this means you need to manage a standalone broker server somewhere else on the network rather than operating strictly peer to peer. I'll have to think a bit more carefully about how to set this up, because I'm not tremendously keen on relying on a central node to pass all the data through, as a single failure could cause cascading failures...
Perhaps I could set up the administrative node as the "error broker" and have it subscribe to itself; that way each process node could continue to function as its own broker for "personal business" and then publish failures through the central error broker.
Perhaps some background would help: my experience with messaging protocols started with ZeroMQ and nanomsg, both of which are brokerless by design. This was compounded by the fact that the node-red approach allows for brokerless-like communication, where the existence of a broker is somewhat hidden from the user (clearly from me); because you're connecting to a device with a topic, the fact that there is a middleman is somewhat veiled.
No, mqtt is not peer to peer and makes no pretence to be, it is client to broker. The provider of the data knows nothing about any clients that may subscribe to that topic, there may be none, there may be 1000.
Are you defining topics to indicate where you want the data to be used? If so, that is not the normal pattern (which does not mean it is not a valid technique, just unusual). Normally the topic will indicate what the data is, not what device it has come from (though that may often be implied) nor where it is intended to be used. Then any client that needs to know, for example, the temperature of machine 7 might subscribe to machine7/temperature.
You might like to look through this guide which is a good description of what MQTT is all about. https://www.hivemq.com/mqtt-essentials/
I define topics by what the data is, but usually this context includes information about which device collected it. For instance, if Pi #1 has a temperature collection point on the top shelf of Furnace #2, the topic would look like Pi1/Furnace2/TopShelf, as this fully defines that data point. Error data would be similar: Pi1/Error.
I'll be reviewing MQTT in more detail; it may or may not be the right communication architecture for this application. The only reason I started with MQTT is that another developer in our system uses the Node-RED MQTT library, and his approach appears to be more peer-to-peer (again with a hidden broker). I'll look into the Node-RED ZeroMQ library; maybe that's a better fit.
If you are looking into MQTT, perhaps take a look at the Homie convention as a top-layer “protocol” too. It comes with a standardized and (in my opinion) well-thought-out way of communicating for sensor nodes and actuators. Perhaps most interesting is the auto-detection feature, which enables your devices to announce their capabilities to clients/controllers. And the way it handles actuators, with acknowledgement and using the broker as a central, controller-independent state store, is very interesting when it comes to mixed heterogeneous installations.