Hey there!
I would really appreciate it if the experts could give me a sanity check on this one.
I have a small fleet of about 30 devices deployed remotely. They are running a Node-RED project that tracks a GitHub repo.
Whenever I have an update to the flow file, I push it to GitHub from the dev/test environment. Then I remote into the deployed machines and pull the new flow file from the repository. After that, I would normally restart the Node-RED service right away (node-red-restart) so the new flow file takes effect, in essence updating the application. This is how I have been doing it for some time.
However, since people are using the application on the remote side, they don't always want the service restarted while something is going on, as it interrupts their work. As you can imagine, it has also proven difficult to coordinate an exact time for the restart.
My new idea is to remote in and do the "git fetch/git pull" inside the project folder, but not restart the service right away. That would allow the flow file to be technically the new version, but without interruption to the currently running service. Whenever the end-user decides they want to do that update, they just have to reboot the device and it will auto-start with the new flow file after reboot.
In summary, the procedure is:
- A machine is running a Node-RED project
- I remote in, run a git fetch/git pull to update the flow file while the project is running
- The project continues running for some time, could be days, could even be months
- Next time the device reboots, the project will start with the updated flow file
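To make the steps above concrete, here is a rough Python sketch of what I would run on each device. The function name `stage_update`, the `origin` remote, and the assumption that the checked-out branch has an upstream configured are all mine, not anything Node-RED provides. It records the current commit first so there is something to roll back to, and it deliberately never restarts the service.

```python
import subprocess


def stage_update(project_dir, run=subprocess.run):
    """Pull the new flow file without restarting Node-RED.

    Returns (previous_commit, new_commit) so the previous commit
    can be used later for a rollback. `run` is injectable only so
    the function can be exercised without a real repository.
    """
    def git(*args):
        # `git -C <dir>` runs the command inside the project folder.
        result = run(
            ["git", "-C", str(project_dir), *args],
            check=True,
            capture_output=True,
            text=True,
        )
        return result.stdout.strip()

    previous_commit = git("rev-parse", "HEAD")
    git("fetch", "origin")  # assumption: the remote is named "origin"
    # Fast-forward only, so a diverged local branch fails loudly
    # instead of producing a merge commit on a deployed device.
    git("merge", "--ff-only", "@{u}")
    new_commit = git("rev-parse", "HEAD")
    return previous_commit, new_commit
```

Calling `stage_update("/path/to/project")` would leave the running service untouched; the new flow file would only be picked up on the next restart or reboot.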
This has worked for me on the dev/test station, but is it guaranteed to always work? My head is spinning trying to think of all the ways this could go wrong. I can't see any issue at the conceptual level, but I have not tested it thoroughly or at scale.
Also, with unattended updates I usually like to have a rollback procedure, but I'm not sure what that would look like in this case. If the flow file is intact on the machine, and Node-RED starts up as it always does, what additional safeguards could even exist?
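One safeguard I can imagine is a sanity check on the pulled flow file before I log out, so a corrupted or half-written file is caught while I am still connected rather than at the next reboot. Below is a minimal Python sketch. The function name `flow_file_looks_valid` is made up, and the check relies on my understanding that a flow file is a JSON array of node objects that each carry an `id` and a `type`. It only checks the file's shape, not whether the flow behaves correctly.

```python
import json
from pathlib import Path


def flow_file_looks_valid(flow_path):
    """Return True if the file parses as a JSON array of node-like objects."""
    try:
        nodes = json.loads(Path(flow_path).read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing or unreadable file, bad encoding, or malformed JSON.
        return False
    if not isinstance(nodes, list):
        return False
    # Assumption: every entry in a flow file has "id" and "type" keys.
    return all(
        isinstance(node, dict) and "id" in node and "type" in node
        for node in nodes
    )
```

If the check fails, the rollback could be as simple as running `git reset --hard` to the commit that was checked out before the pull, which would put the old flow file back before anyone reboots.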
Thanks in advance, and open to suggestions!