Ok, something is going on

I was mucking about with some new stuff.

Suddenly: host not responding.

Waiting waiting waiting.

Nothing.

Then I get this error:

ui/css/app.min.less' wasn't found

What does that mean?

I powered up another RPI and it is doing the same.

Googling it doesn't show me anything I can understand at this point.

Further investigation reveals this:

Logged into machine.

cd .node-red
node-red-stop
node-red-start

This is what I am seeing:

14 Apr 10:50:52 - [warn] [function:Button Colour] **  Reset button colours **
      pi : TTY=unknown ; PWD=/home/pi ; USER=root ; COMMAND=/sbin/hwclock -r
pam_unix(sudo:session): session opened for user root by (uid=0)
14 Apr 10:50:58 - [info] [udp out:1095c327.4fb3f5] udp ready: 192.168.0.21:6723
pam_unix(sudo:session): session closed for user root
14 Apr 10:51:01 - [info] [mqtt-broker:MQTT host] Connection failed to broker: mqtt://192.168.0.99:1883
      pi : TTY=unknown ; PWD=/home/pi ; USER=root ; COMMAND=/sbin/hwclock -r
pam_unix(sudo:session): session opened for user root by (uid=0)
14 Apr 10:51:06 - [info] [mqtt-broker:MQTT host] Connected to broker: mqtt://192.168.0.99:1883
pam_unix(sudo:session): session closed for user root
14 Apr 10:51:08 - [info] [mqtt-broker:MQTT host] Connection failed to broker: mqtt://192.168.0.99:1883
      pi : TTY=unknown ; PWD=/home/pi ; USER=root ; COMMAND=/sbin/hwclock -r
pam_unix(sudo:session): session opened for user root by (uid=0)
14 Apr 10:51:15 - [info] [mqtt-broker:MQTT host] Connected to broker: mqtt://192.168.0.99:1883
pam_unix(sudo:session): session closed for user root
      pi : TTY=unknown ; PWD=/home/pi ; USER=root ; COMMAND=/sbin/hwclock -r
pam_unix(sudo:session): session opened for user root by (uid=0)
pam_unix(sudo:session): session closed for user root
      pi : TTY=unknown ; PWD=/home/pi ; USER=root ; COMMAND=/sbin/hwclock -r
pam_unix(sudo:session): session opened for user root by (uid=0)
pam_unix(sudo:session): session closed for user root
      pi : TTY=unknown ; PWD=/home/pi ; USER=root ; COMMAND=/sbin/hwclock -r
pam_unix(sudo:session): session opened for user root by (uid=0)
pam_unix(sudo:session): session closed for user root
      pi : TTY=unknown ; PWD=/home/pi ; USER=root ; COMMAND=/sbin/hwclock -r
pam_unix(sudo:session): session opened for user root by (uid=0)
pam_unix(sudo:session): session closed for user root

That looks worrying.

Though I think I see a problem: the last part, with COMMAND=/sbin/hwclock -r, is something I was doing.

May have to roll back to an earlier flow.

(Luckily I had made a backup 2 days ago.)

Rolled back and NR seems to start ok.
The errors shown are not there and though MQTT doesn't come up straight away, it does get itself working.

But now I can't get to the NR page for that machine.

To the best of my knowledge it is OK otherwise.

Powered this one down and got that machine working with a display and keyboard.

Booting is painful.

NR loads, but it is slow.

As you can see it has been restarting for reasons unknown.

The machine is an RPi B (single core), so it is really running flat out.

I tried a recent backup.

Still fails.

An earlier backup (say from March)?

Fails.

Here is some stuff I got from it while I was looking:
The screenshot shows nothing new really, only that it seems to be booting, then failing and restarting.

The other is just more from the CLI when I start Node-RED.


Log1.txt (3.0 KB)

I have finally rolled it back a fair way in time.
I don't think the date matters, but what I can see is a lot of traffic.

I went to one of the main timer parts and I think it has gone mad.

Very quickly I see the default 10 seconds flash up.

But before I can do much else (like press one of those STOP buttons to stop the loop driving the rest of the flow nuts), it suddenly goes to 0.02 seconds.
(See latest picture)

How can I get to the code and maybe stop one of the two blocking nodes I have just near it?

Assuming you are on Node-RED v0.20.x, you can start Node-RED with the --safe flag: it will start the editor but not the flows.

cd ~/.node-red
node-red-stop
node-red --safe
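A slightly fuller version of that routine, as a sketch: keep a timestamped copy of the flows file before starting in safe mode, so there is always something to roll back to whatever you change in the editor. The `backup_flows` function is a hypothetical helper, not part of Node-RED, and the file name is an assumption: 0.20-era installs usually keep `flows_<hostname>.json` in `~/.node-red`, but check the `flowFile` setting in `settings.js`.

```shell
#!/usr/bin/env bash
# Sketch only: keep a timestamped copy of a Node-RED flows file before
# recovering a misbehaving flow. backup_flows is a hypothetical helper,
# not part of Node-RED; the flows file path is an assumption (check the
# flowFile setting in ~/.node-red/settings.js for the real name).

backup_flows() {
  local flows="$1"
  if [ ! -f "$flows" ]; then
    echo "no such flows file: $flows" >&2
    return 1
  fi
  local copy
  copy="${flows}.bak-$(date +%Y%m%d-%H%M%S)"
  cp -p -- "$flows" "$copy" || return 1
  printf '%s\n' "$copy"   # print where the copy went
}

# Typical use on a Pi (left as comments so sourcing this file does nothing):
#   node-red-stop
#   backup_flows ~/.node-red/flows_"$(hostname)".json
#   node-red --safe    # editor starts, flows are not started
```

Deploying from the safe-mode editor starts the flows again, so finish every change before pressing Deploy.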

Thanks.

In the meantime, I found that (somehow) the loop time was being set to 0.02 seconds.

It kind of killed the machine's processor.

Luckily, after about 20 attempts, I got in quickly enough to delete an offending node before the flow ran away again.

I am now going through re-updating the flows with all the stuff done since then, as the flow on which I caught it was a bit behind with updates.

Wouldn't the command be:

Just checking.

No.
Once you have edited the flow, it will start running when you deploy it, so don't deploy till you have done everything you need to do.
By the way, one should try to use a meaningful title for threads. "Something is going on" is not a useful title: someone seeing it in the thread list gets no idea of what the thread is about. You can edit it if you can think of a better title.

Yeah, but you are missing an important bit.

For whatever reason the loop/delay went from 10 seconds to 0.02 seconds.

When I boot and NR loads, it has precious little time before it just screams to death, with the delay so small and the work to be done so heavy.

I had to stop NR, start it from the CLI to see what was going on.
That took about 20 minutes.

Then I saw the loop and posted.

Subsequently when I tried to edit the value it just didn't work.
So I tried to stop it getting too deep into the flow - you see the traffic lights?

That wasn't quick enough either.

I don't know how, but after a lot of effort I got in as the page loaded, deleted one of the nodes that was causing the runaway, and got it DEPLOYED before the machine died.

I think I have it working now, but need to look at that part of the code again and see if I can see what is causing it.

Weird, as it used to work and I did nothing to change that part of the flow.

I was answering your question "How can I get to the code and maybe stop one of the two blocking nodes I have just near it?". The fact that in the meantime you managed to achieve it by other means does not mean I was missing an important bit.

I needed to edit it outside Node-red.

As soon as it loaded - say from boot - it would hang within about 10 seconds.

Perhaps it’s time you reread the blog post about v0.20

It contains details about what the safe flag does (in the runtime section). https://nodered.org/blog/2019/03/12/version-0-20-released


Not if you started it with --safe. That is the whole point.

Yeah, hindsight is wonderful.

If you read other people's posts more carefully you could use foresight instead.


Yeah, well....... Dunno about you, but my life is far from perfect.

Hi, I'm not sure how long it took you to write your original post, but having spent the time to write it, it would do you well to read the responses you get. It just causes frustration for those who have to read the thread and are trying to help when you carry on regardless, or divert onto other topics halfway through (though not in this instance). If you don't actually want help then don't post. I know there will be a delay in responding because a) timezones and b) people have other things to do, but that is the nature of online forums.

I guess in this instance your device may not have been running 0.20, as you said you had reverted to a previous version, but that is easy to check as the version is logged in the console at start. If that was the case then yes, either you have to "get in quick" as you managed to, or hand-edit the flow file and remove the offending node or wire (fraught with danger)... so you were lucky. If you were running 0.20 then there is no excuse for not reading the responses to your own question.
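If you do go the hand-editing route, one cheap safeguard is to confirm the file still parses as JSON before restarting, since a stray comma can stop the flows from loading at all. A minimal sketch, assuming `python3` is available (it is on most Raspberry Pi OS images); `check_flows_json` is a hypothetical helper name, not a Node-RED command.

```shell
#!/usr/bin/env bash
# Sketch: check that a hand-edited flows file is still valid JSON.
# Assumes python3 is installed; "python3 -m json.tool" exits non-zero
# when the file is missing or its contents do not parse.

check_flows_json() {
  local flows="$1"
  if python3 -m json.tool "$flows" > /dev/null 2>&1; then
    echo "OK: $flows parses as JSON"
    return 0
  fi
  echo "BROKEN: $flows is missing or does not parse; restore your backup" >&2
  return 1
}

# Example (flows file name is an assumption):
#   check_flows_json ~/.node-red/flows_"$(hostname)".json && node-red-start
```

Parsing only proves the syntax is intact; it does not tell you whether the wires you left behind still make sense.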

DCeeJay,

I agree, but am torn between:
1: Getting a problem, posting a request for help and simply sitting there with my hands out expecting help.
This was an epic failure and it hit me broadside from nowhere.
It took me... (I posted at 10:38 board time, and I had been at it for about 40 minutes before that.)

2: Having posted, I pull my finger out and try to fix it.
So is it good that I found the problem myself?

It has been about 9 hours. I have hardly moved from this seat, except to power up other machines and do the odd quick task.

I now know about the --safe option, but I am not sure if it is invoked by:

cd .node-red
node-red-stop
node-red-start --safe

Or the original post.
Granted, I can look it up. But I am still tidying up the mess.
The problem took down 3 other RPis and I need to edit them and make them all safe from such a problem in the future.

Should have added this in the first reply:
You saw the screen grabs with the time on the delay node.
My best guess is 0.02 seconds.

Rolling back to an older version didn't remove the problem.
It re-infected the older version(s) too.
I found that the MQTT messages were (possibly) causing the problem: message persistence.
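If the message persistence above is a retained MQTT message, that could explain why rolling back the flows didn't help: the broker hands the retained value to every client that subscribes, so even an old flow picks up the bad value as soon as it connects. A sketch for clearing one, assuming a Mosquitto (or compatible) broker and the `mosquitto_pub` client; `clear_retained`, the `DRY_RUN` switch and the example topic are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: clear a retained MQTT message so the broker stops replaying it
# to every client that (re)subscribes. Assumes mosquitto_pub is installed;
# clear_retained and DRY_RUN are hypothetical names for this example.

clear_retained() {
  local host="$1" topic="$2" port="${3:-1883}"
  # -r marks the publish as retained; -n sends a zero-length payload.
  # A retained zero-length message makes the broker delete the retained
  # message it currently holds for that topic.
  local cmd=(mosquitto_pub -h "$host" -p "$port" -t "$topic" -r -n)
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"   # show what would run, without running it
  else
    "${cmd[@]}"
  fi
}

# Example (broker address from the log above; the topic is hypothetical):
#   DRY_RUN=1 clear_retained 192.168.0.99 home/timer/interval
#   clear_retained 192.168.0.99 home/timer/interval
```

This only helps with retained messages. If the persistence is instead a persistent session with queued QoS 1/2 messages, the fix lies in the client's session settings rather than in a retained-message clear.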

The fact I could delete the node and DEPLOY it was more luck than anything.
But: I had stopped the problem, I was not going to go back to a later version that would have the same problem, and I got the node deleted and deployed before the machine "spat the dummy" with an overload (shown in posts). I was just lucky.

The command is as Colin typed it: node-red --safe
NOT node-red-start --safe; that is something you have made up.

Yes - well done for fixing it yourself - but if you manage that before anyone replies then either delete the post or mark it solved so no-one has to reply.

But yes, you may have to wait for a reply, especially when you a) write such a long topic and b) don't title it in any useful way to indicate what it is about. It takes time for people to absorb what the actual point of the post is and, as I said, a lot of users are not in your timezone, so it could take overnight to get a reply. Also, this is a public forum and no-one is paid to respond in any timeframe; your urgent problem is not someone else's priority. So yes, in the meantime you may have fixed it (good), but acknowledging others' efforts to help goes a long way to keep them on board for the next problem you may have.

Glad it's now recovered. All the best

I just wanted to clarify that.

So: node-red --safe is it if anything spits the dummy and I need to start NR safely.

About the reply timing.

I had maybe fixed it about 10 minutes after the reply.

I was not going to stop chasing the problem while I waited for a reply.

I muddled on and got it worked out about that time.

You haven't been reading the replies again, I already confirmed what it should be when you queried it the first time.

Is that a question? I already told you what it does, and it is also explained in the notes linked to earlier. It starts the Node-RED editor but does not start the flows. So if your flow is crashing, hanging or otherwise misbehaving, you can get in and edit it without it crashing.