Robust remote update procedure and best practices?

Hey there!

I would really appreciate it if the experts could give me a sanity check on this one.

I have a small fleet of about 30 devices deployed remotely. They are running a Node-RED project that tracks a GitHub repo.

Whenever I have an update to the flow file, I push it to GitHub from the dev/test environment. Then I remote into the deployed machines and pull the new flow file from the repository. After that, I normally restart the Node-RED service right away (node-red-restart) so the new flow file takes effect, in essence updating the application. This is how I have been doing it for some time.

However, since people are using the application on the remote side, they don't always want the service restarted while something is in progress, as it interrupts their work. As you can imagine, it has also proven difficult to coordinate an exact time for the restart.

My new idea is to remote in and do the git fetch/git pull inside the project folder, but not restart the service right away. That way the flow file on disk is the new version, but the currently running service is not interrupted. Whenever the end users decide they are ready for the update, they just reboot the device and it auto-starts with the new flow file.
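
Concretely, the remote step is just something like the sketch below (the project path and branch name are placeholders for my actual setup, and --ff-only is only there to refuse anything that isn't a fast-forward):

```bash
#!/usr/bin/env bash
set -euo pipefail

# Placeholder path: with the projects feature, the flow file lives under
# <userDir>/projects/<project-name> - adjust for the real project.
PROJECT_DIR="$HOME/.node-red/projects/my-project"

cd "$PROJECT_DIR"
git fetch origin
git pull --ff-only origin main   # "main" is a placeholder branch name

# Deliberately NOT calling node-red-restart here: Node-RED keeps the old flow
# in memory and only reads the flow file again the next time the service
# starts, e.g. after a reboot.
echo "Flow file updated on disk; takes effect on next restart/reboot."
```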

In summary, the procedure is:

  • A machine is running a Node-RED project
  • I remote in and run a git fetch/git pull to update the flow file while the project is running
  • The project continues running for some time - could be days, could even be months
  • The next time the device reboots, the project starts with the updated flow file

This has worked for me on the dev/test station, but is it guaranteed to always work? My head is spinning trying to think of all the ways this could go wrong. I can't see any issue at the conceptual level, but I have not tested it thoroughly or at scale.

Also, with unattended updates I usually like to have a roll-back procedure, but I'm not sure what that would look like in this case. If the flow file is intact on the machine and Node-RED starts up as it always does, what additional safeguards could even exist?
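
The only thing I can think of so far is recording the commit that was running before the pull, so there is at least something concrete to reset back to - a rough sketch, with the same placeholder path and branch as above:

```bash
# Remember the commit that is currently running before pulling.
cd "$HOME/.node-red/projects/my-project"
git rev-parse HEAD > "$HOME/flow-last-known-good"   # keep it outside /tmp so it survives a reboot

git pull --ff-only origin main

# If the updated flow misbehaves after the next restart, roll back with:
#   git reset --hard "$(cat "$HOME/flow-last-known-good")" && node-red-restart
```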

Thanks in advance, and open to suggestions!

Since Node-RED only reads the flow file on startup (unless you use one of the admin APIs), and only writes it if you do a deploy or, again, use one of the admin APIs, you should be pretty safe as long as you don't allow any changes to be made locally - which I hope you are already blocking?

You might also consider doing an npm update in the userDir folder so that each time you run an update, you are picking up the current releases of the in-use nodes (well, it might not be the very latest release of course, since npm will only update to the latest minor release within the current major).
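
Something like this, assuming the default userDir of ~/.node-red (adjust if yours is different):

```bash
cd ~/.node-red
npm outdated || true   # optional preview; exits non-zero when anything is outdated
npm update             # stays within the semver ranges declared in package.json
```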

You should also consider trying to keep the remote devices' OSes updated as well. That is slightly more risky, so I'd recommend having a "local" remote device and running a test update on it before touching any of the live devices. But generally, keeping the devices updated is less risky than letting them fall behind and then hitting a major issue when you finally try to update.
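
For a Debian-based device (Raspberry Pi OS and the like - I'm assuming that's what your fleet runs), the basic loop is just:

```bash
# Run this on the "local" test device first, then roll it out to the fleet.
sudo apt-get update
sudo apt-get upgrade -y   # use dist-upgrade if you also want dependency changes
sudo reboot               # pick a quiet moment; this also picks up any staged flow update
```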

Hi Julian, really appreciate the input and suggestions. There will be no writes/deploys to the project during normal operation, so I think that will be fine.

I think for long-term support it would make more sense to Dockerize the project and its dependencies. For the near term, I'm left with the functionality of git and Node-RED projects, which are powerful in their own right, so not a bad starting point. I just wanted to make sure what I was doing made sense to those more experienced than I am.
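
For future reference, the quick-start run command from the Node-RED Docker docs, with a named volume for /data so flows and installed nodes persist across container upgrades, looks like the sensible starting point when I get there (container and volume names below are placeholders):

```bash
docker run -d -p 1880:1880 -v node_red_data:/data --name mynodered nodered/node-red

# Updating then becomes: docker pull a newer image, remove the old container,
# and re-run it against the same /data volume.
```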