Ok, a couple of minutes ago node-red crashed, I was just finishing up a rather complex new flow... and poof! The flow file under .node-red directory is gone. The cred file is there but the actual flow file is gone.
There is a backup file, .flows.json.backup. Do I just copy the backup file to the correct flow file name and restart node-red? I am running 1.0.6 and have not moved to 1.1.0 yet.
Yes, you can make a copy of that file and rename the copy (keep the backup just in case...)
Your latest changes might not be in the file though
Worth trying anyway
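For anyone landing here later, the copy-and-rename step can be sketched in Python. This is only an illustration: the function name `restore_from_backup` is made up, and it assumes the flow file is called flows.json, so the backup is .flows.json.backup as in the post above. Stop node-red before running anything like this.

```python
import shutil
from pathlib import Path


def restore_from_backup(user_dir: Path, flow_name: str = "flows.json") -> Path:
    """Copy the hidden backup to the flow file name, leaving the backup in place.

    Hypothetical helper; flow_name is an assumption and must match your setup.
    """
    backup = user_dir / f".{flow_name}.backup"
    target = user_dir / flow_name
    if not backup.is_file():
        raise FileNotFoundError(f"no backup found at {backup}")
    if target.exists():
        # Refuse to overwrite: a surviving flow file may hold newer changes.
        raise FileExistsError(f"{target} already exists; not overwriting it")
    # copy2 copies rather than moves, so the backup remains as a fallback.
    shutil.copy2(backup, target)
    return target
```

Called as `restore_from_backup(Path.home() / ".node-red")`, it returns the path of the restored file.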
To be honest, I am not sure how or why it crashed; I am still looking at the logs. The fact is it happened. Fortunately the backup file is only a few minutes off from the point of the crash, so I may not have lost much.
Not looking good, the editor is not coming up, using a renamed copy of the backup file.
That's the way to recover that I know of. Could it be that some more files are damaged?
For the next time, I always make a copy to my NAS of the files in .node-red (except for the modules folder) after major editing. Then I make a full sd-card copy "when I feel it's time". In each RPi I have an extra SD-card attached for this purpose.
But if you just have one instance of NR, I would copy the complete .node-red folder to have just in case. Like now
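A minimal sketch of that kind of copy, assuming the modules folder is the node_modules directory inside .node-red and that the destination (a NAS mount, for example) is just a normal path. The function name and the timestamped folder naming are my own choices, not anything node-red provides.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_user_dir(user_dir: Path, dest_root: Path) -> Path:
    """Copy the whole user directory, minus node_modules, into a timestamped folder."""
    dest = dest_root / f"node-red-{datetime.now():%Y%m%d-%H%M%S}"
    # ignore_patterns skips any directory or file named node_modules;
    # hidden files such as the .backup file are copied like any other file.
    shutil.copytree(user_dir, dest, ignore=shutil.ignore_patterns("node_modules"))
    return dest
```

Skipping node_modules keeps the copy small; the modules can be reinstalled from package.json, which is included in the copy.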
Oh, I have a backup from this morning, a complete copy of the MicroSD card. So worst case, I have lost everything I did today. Not horrible, but the flow I was working on most of the day is very likely gone. Really odd, given I am running on a new Pi4 with a new SD card. But it is what it is, I guess.
Oh, excellent, thank you for the suggestion... I basically did the same thing, the long way: I removed the flow file and left an empty one. And from this point is where the story gets interesting.
Completely copied the SD card to new SD card, never used, no issues, so do not suspect original SD card. All testing below done with new SD card.
Booted Pi, no issues
Validated various functions and applications, no issues
Small MariaDB stable and valid
No failing services or daemons
At this point, suspect nodejs or node-red, core to the issue, i.e. flow file.
Load flow file into a JSON validator, passed, no obvious JSON structure issues
Load each flow back from alternate backup, no issues... until boom.
The last flow I created, loads fine, even runs, but if I edit anything on that flow, and then deploy... boom. Nothing is right, the current flow file never loads again, hangs up the editor. But I know the other flows are running.
So it is some combination of the one flow and node-red and/or nodejs interaction that is the issue. Odd that editing and deploying that specific flow blows things up.
Given I am running 1.0.6 at the moment, I plan to build a clean image, then test again using 1.1.0. Also, going to try to split the flow file, via explicit export of the suspect flow, and then try to edit in isolation, see if that blows things up.
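The validation and the split can also be done outside the editor, which helps when the editor will not load. A rough sketch, assuming the usual flow file layout: a JSON array of objects, where tabs have type "tab" and each node points at its tab through the "z" property. The function names are made up for illustration.

```python
import json
from pathlib import Path


def load_flows(path: Path) -> list:
    """Parse the flow file and check its basic shape, beyond plain JSON validity."""
    nodes = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(nodes, list):
        raise ValueError("flow file should contain a JSON array")
    for n in nodes:
        if not isinstance(n, dict) or "id" not in n or "type" not in n:
            raise ValueError(f"unexpected entry: {n!r}")
    return nodes


def split_out_tab(nodes: list, tab_label: str) -> tuple:
    """Return (suspect, rest): the named tab with its nodes, and everything else."""
    tab_ids = {n["id"] for n in nodes
               if n["type"] == "tab" and n.get("label") == tab_label}
    if not tab_ids:
        raise ValueError(f"no tab labelled {tab_label!r}")
    suspect, rest = [], []
    for n in nodes:
        # A node belongs to the tab if it is the tab itself or its z points at it.
        in_tab = n["id"] in tab_ids or n.get("z") in tab_ids
        (suspect if in_tab else rest).append(n)
    return suspect, rest
```

Config nodes with no "z" (an MQTT broker, for example) stay in `rest`, so the suspect list may need them added back before it can be imported on its own. Write each list out with `json.dumps` to get two separate files.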
This is all on Pi 4 Model B with 4 GB RAM, the only customization is that I remove the host name from the default flow file, via the settings.js option.
Often that sort of issue is caused by a loop of some sort. MQTT loops are one example. If you are publishing to a topic, but also subscribe to that topic and that message runs through and gets published again then it goes round and round and clobbers the system. Sometimes that is because there is a ui_switch (or similar) with Pass Through enabled so messages run through it unexpectedly.
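If the editor will not stay up long enough to inspect the flow, the flow file itself can be scanned for this pattern. The sketch below assumes the standard "mqtt in" and "mqtt out" node types with "topic" and "broker" properties; the function names are invented. It only flags candidates: a match means a published topic is also subscribed to on the same broker, not that the messages are actually wired back round.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """MQTT-style match: '+' is one level, '#' is the rest of the topic."""
    f_parts = topic_filter.split("/")
    t_parts = topic.split("/")
    for i, f in enumerate(f_parts):
        if f == "#":
            return True
        if i >= len(t_parts):
            return False
        if f != "+" and f != t_parts[i]:
            return False
    return len(f_parts) == len(t_parts)


def find_loop_candidates(nodes: list) -> list:
    """Return (out_id, in_id, topic) where an mqtt out topic is also subscribed to."""
    subs = [n for n in nodes if n.get("type") == "mqtt in" and n.get("topic")]
    pubs = [n for n in nodes if n.get("type") == "mqtt out" and n.get("topic")]
    return [
        (p["id"], s["id"], p["topic"])
        for p in pubs
        for s in subs
        if p.get("broker") == s.get("broker") and topic_matches(s["topic"], p["topic"])
    ]
```

An mqtt out node with an empty topic takes its topic from the message at runtime, so it cannot be checked this way and is skipped.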
There is no need to start again with a clean image, it won't make any difference. You just need to work out what on that flow is causing the hang. If it isn't an MQTT loop or something else obvious then start node red in a terminal and look in the log to see if there is anything helpful there. If not then you can individually disable nodes on the flow (open the node then there is a button down the bottom) till you find it.
Even when it is hung you might find that the command node-red-stop manages to stop it, though it might take a while to get through. Then start it with --safe each time till you find the problem.
Yup, it is definitely a recursive loop scenario in the flow in question. What I don't get is why the flows.json file disappears when the issue occurs. Does the node-red code not do a 'safe' file update during the deployment? Rename the original, say with a time stamp in the name, then write to a new file, for example? Before any malformed flow causes any issues, the safe save of the json flow file should happen, no?
What?? Do you mean the flows file is deleted? That is very odd, to put it mildly
Are you able to reproduce this? If so then please run a directory listing of the .node-red folder, start node-red in a terminal and leave that open, trigger the problem and then do another listing. Post the directory listings and copy/paste the complete log here please.
on deploy the existing flow file gets "saved" as .flows_name.json.backup (but based on your flow name of course) - i.e. a one level backup, so that should also still exist.
Right, you are in fact doing what is called a 'safe' save. So in my specific case, node-red must have crashed at just the point where the existing file had been renamed as the backup but creation of the revised flows file was interrupted, given that flows are only saved to file when a deploy is explicitly invoked.
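To make the timing window concrete, here is a sketch of the two orderings in Python. This is not node-red's actual code; the first function only mirrors the rename-then-write sequence guessed at above, and the second is a common alternative that never leaves the flow file missing. All names, including the .tmp suffix, are illustrative.

```python
import os
import shutil
from pathlib import Path


def save_rename_then_write(flow_file: Path, new_content: str) -> None:
    """Sequence guessed at in the thread: rename to backup, then write the new file."""
    backup = flow_file.with_name(f".{flow_file.name}.backup")
    if flow_file.exists():
        os.replace(flow_file, backup)
    # A crash at this point leaves only the backup on disk.
    flow_file.write_text(new_content, encoding="utf-8")


def save_write_then_swap(flow_file: Path, new_content: str) -> None:
    """Alternative: write a temp file first, copy the backup, then swap in one step."""
    tmp = flow_file.with_name(f".{flow_file.name}.tmp")
    backup = flow_file.with_name(f".{flow_file.name}.backup")
    with open(tmp, "w", encoding="utf-8") as fh:
        fh.write(new_content)
        fh.flush()
        os.fsync(fh.fileno())  # push the data to disk before the swap
    if flow_file.exists():
        shutil.copy2(flow_file, backup)  # copy, so the original stays in place
    # os.replace swaps the new file in as a single rename.
    os.replace(tmp, flow_file)
```

With the second ordering a crash at any step leaves either the old or the new flow file under its normal name.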
As a result of the above, to me, the file just went poof, disappeared. I found the backup file, in due course, thinking there has to be a safe 'save' implementation. Loading the backup file initially crashed node-red, because I forgot to use the --safe option.
I can see a future feature, maybe, where something similar to log-rotate, might be implemented, to keep more backups, longer, as a user configurable feature, maybe even remote versus local.
Is the hidden backup file, outlined in the official documentation somewhere? I did not seem to find such via quick google search. Actually the existence of the backup file I did not find in any google search I did... could be just the way I tried to find it.
There is always the built-in projects option, which gives you access to git, which can be used fully or just for local commits. We don't really want to get into too many options as then they start getting platform/os dependent and require more and more overhead vs using already existing backup systems.
I echo Dave's comments, but I do also think we could do a little bit more around the backups we generate - such as keeping the last 5 (for example) to get past the issue where someone hits deploy whilst trying to salvage something and unknowingly overwrites what may have been the only good backup available.
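As a rough picture of what keeping the last 5 could look like, here is a rotation sketch that could be run by hand or from a script before each risky deploy. Nothing like this exists in node-red as far as this thread says; the numbered .backup.N naming is purely an assumption for illustration.

```python
import shutil
from pathlib import Path


def rotate_backups(flow_file: Path, keep: int = 5) -> None:
    """Keep numbered copies: .<name>.backup.1 is the newest, .<keep> the oldest."""
    def slot(n: int) -> Path:
        return flow_file.with_name(f".{flow_file.name}.backup.{n}")

    # Drop the oldest copy, then shift every remaining copy up by one slot.
    if slot(keep).exists():
        slot(keep).unlink()
    for n in range(keep - 1, 0, -1):
        if slot(n).exists():
            slot(n).rename(slot(n + 1))
    # Copy (not move) the current flow file into the newest slot.
    if flow_file.exists():
        shutil.copy2(flow_file, slot(1))
```

With this in place, an accidental deploy on top of a bad flow only pushes the good copy one slot further down rather than overwriting it.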
But of course, it's all academic until someone chooses to actually do something about it.
In short, to do a bit more, is a great benefit to newer users of node-red, who need the most support. What is added, when added is not a replacement for more robust solutions, but to provide a bit more than a basic safety net.
When I get some more experience with writing custom nodes, I am seriously thinking about writing one that supports a local git server, or such, to make it almost painless to have node-red protect itself.
There are a number of existing flows out in the wild that do archival in various ways. Maybe we could find one, or model an example flow, and suggest that flow as a recommended option from us? Please forgive me if I am overstepping here by using the terms we or us.
That doesn't hang together. The backup should have been the version that was working, not the new one so shouldn't have crashed node-red.
But something else doesn't seem right either. A couple of posts ago you said
but hanging up the editor is not the same as the file being completely lost. Are you able to replicate the lost file situation or are we actually talking about two separate issues? The hangup caused by some sort of loop and the crash which caused a lost file. Did you possibly do a hard power down after the first hangup which might have caused the lost file?
When I initially loaded the backup, the editor was still choking. At first I thought the backup was corrupted somehow, but it passed JSON validation checks. So, I considered that I had a corrupted SD card, so I replicated the SD card, the copy seemed fine, until the bad flow started, but at the time I did not realize the bad flow was an endless loop. So I then created a clean OS image, a clean install of node-red, and I still had issues with backup flow file. Of course, it was not until I ensured that the specific flow that had the endless loop, did not execute (removed it from the flows file via edit) that my clean OS, clean node-red instance was stable.
The original flow file, did in fact disappear from the file system, when I looked for it right after node-red went unresponsive, editor timed out, dashboard lost connection, etc. I did not power off, but did make sure that node-red process was completely dead, stopped. I was looking for the file, and copied the backup file to be the active flow file, while node-red was completely stopped. It was only when I decided the SD card might be corrupted or failed, that I powered down the device.
I had not thought to try to recreate the exact scenario, since I believe I know what had happened. I did create an endless loop yesterday in error, while reworking the broken flow. But I was careful to explicitly stop node red before it crashed. Stopping node red while it was struggling to complete a deployment, during the endless loop, I did not see the flow file disappear again.
OK, understood. It is difficult to see how the loop and the crash and the missing file were related, as node-red will do all the file writing before starting the flow up, but it is an odd coincidence. If you can remember when it was, it might not be too late to look back through the syslog files to see what is there. It might be too late though; it depends how much history it keeps before deleting old syslog files.
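A small sketch for searching the current and rotated syslog files in one go. It assumes the logs live in a directory such as /var/log as syslog, syslog.1, syslog.2.gz and so on, and that node-red entries contain the text "Node-RED"; both are assumptions about a typical Raspberry Pi OS setup and may differ on yours.

```python
import gzip
from pathlib import Path


def grep_syslog(log_dir: Path, needle: str = "Node-RED") -> list:
    """Return (file name, line) pairs for every syslog line containing needle."""
    hits = []
    for path in sorted(log_dir.glob("syslog*")):
        # Older rotated files are usually gzip-compressed.
        opener = gzip.open if path.suffix == ".gz" else open
        with opener(path, "rt", encoding="utf-8", errors="replace") as fh:
            for line in fh:
                if needle.lower() in line.lower():  # case-insensitive match
                    hits.append((path.name, line.rstrip("\n")))
    return hits
```

Reading /var/log usually needs elevated permissions, so this may have to be run with sudo.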