Help tracking this error - oops

This is a repost of a question I accidentally deleted.

I had found the problem and edited the node to stop the error, then deleted the question a bit too quickly.
The question that remains: why wasn't the node's name shown in the error?

(This is the original post)
I thought I had error catching working, but I don't.

When I boot up a machine an error is generated and I get this in my error catch routine:

{"topic":"","payload":"go","_msgid":"1817f8a6.03a9e7","error":"go","time":"1/18/2020, 12:53:27 PM","ttl":0,"_queuetimestamp":1579312407818,"_queueCount":1}

Usually when I get an error, the message includes a source node field, or something to help me find out where it came from.
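Usually it looks something like this (the values here are made up for illustration; the structure follows the Catch node's help text):

{"topic":"","payload":"go","error":{"message":"some error text","source":{"id":"abc123.def456","type":"function","name":"name of the node that threw it"}}}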

Anyone?

This is the part of the error catch flow:

[{"id":"f255b915.f898f8","type":"catch","z":"5b95490e.a46ab8","name":"","scope":null,"x":3545,"y":279,"wires":[["d73967aa.b378b8"]]},{"id":"d73967aa.b378b8","type":"function","z":"5b95490e.a46ab8","name":"Name flow","func":"var device_name =global.get('myDeviceName');\nvar flow_name = flow.get('name');\nmsg.topic =\"ERROR_REPORT/\" + device_name + \"/\" + flow_name + \"/\" + msg.topic;\nreturn msg;\n","outputs":1,"noerr":0,"x":3695,"y":279,"wires":[["9db5c552.31a0d","e342bbda.9915"]]}]

I presume you mean a Node-RED error. As always, please post the Node-RED startup log, including the error.

Looking where I believe the logs to be, this is all I get - and I know it is too old/wrong:

pi@PIFACE:/var/log $ cat node-red.log

Node-RED service start: Sun Jun 5 13:59:12 AEST 2016


Welcome to Node-RED
===================

5 Jun 13:59:16 - [info] Node-RED version: v0.12.1
5 Jun 13:59:16 - [info] Node.js  version: v0.10.29
5 Jun 13:59:16 - [info] Loading palette nodes
5 Jun 13:59:25 - [info] Settings file  : /usr/lib/node_modules/node-red/settings.js
5 Jun 13:59:25 - [info] User directory : /home/pi/.node-red
5 Jun 13:59:25 - [info] Flows file : /home/pi/.node-red/flows_PIFACE.json
5 Jun 13:59:25 - [info] Server now running at http://127.0.0.1:1880/
5 Jun 13:59:25 - [info] Creating new flow file
5 Jun 13:59:25 - [info] Starting flows
5 Jun 13:59:25 - [info] Started flows

Node-RED service stop: Sun Jun 5 14:29:18 AEST 2016
pi@PIFACE:/var/log $ 

So I can't paste anything useful.

I guess the question now is: Why isn't NR logging?
I haven't turned it off since 5 June (year?)

And to show what is in the directory:

pi@PIFACE:/var/log $ lf
apt/                   alternatives.log.3.gz  daemon.log.1     dpkg.log.3.gz   messages.2.gz        syslog.6.gz
cups/                  alternatives.log.4.gz  daemon.log.2.gz  dpkg.log.4.gz   messages.3.gz        syslog.7.gz
fsck/                  aptitude               daemon.log.3.gz  dpkg.log.5.gz   messages.4.gz        user.log
lightdm/               auth.log               daemon.log.4.gz  faillog         nodered-install.log  user.log.1
mosquitto/             auth.log.1             debug            fontconfig.log  node-red.log         user.log.2.gz
ntpstats/              auth.log.2.gz          debug.1          kern.log        rbfeeder.log         user.log.3.gz
samba/                 auth.log.3.gz          debug.2.gz       kern.log.1      regen_ssh_keys.log   user.log.4.gz
teamviewer11/          auth.log.4.gz          debug.3.gz       kern.log.2.gz   syslog               wtmp
teamviewer13/          boot.log               debug.4.gz       kern.log.3.gz   syslog.1             wtmp.1
watchdog/              bootstrap.log          dmesg            kern.log.4.gz   syslog.2.gz          Xorg.0.log
alternatives.log       btmp                   dpkg.log         lastlog         syslog.3.gz          Xorg.0.log.old
alternatives.log.1     btmp.1                 dpkg.log.1       messages        syslog.4.gz
alternatives.log.2.gz  daemon.log             dpkg.log.2.gz    messages.1      syslog.5.gz
pi@PIFACE:/var/log $ 

So it isn't that the log has been rotated and zipped.

Node-RED generally logs to syslog. To get the startup log, run node-red-stop then node-red-start. I think you have done that numerous times before.
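That is, on the standard Pi install:

node-red-stop
node-red-start

node-red-start should then show the startup log in the terminal.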

I totally agree, Colin. But until I fully understand what I am doing, I am going to have to live with some ignorance.

I have a text file which tells me that Node-RED logs are in /var/log/
(This is the whole file I have)

#  Most logs are:
NR logs are in /var/log/
#  also NPM logs are in:
~/.npm/_logs/......

# Node-Red install log errors, etc are in:
/var/log/nodered-install.log

# to start logging in the CLI enter this:
sudo journalctl -f -u nodered -o cat

#  Or this one.
sudo journalctl -f -n 25 -u nodered -o cat

If these notes are wrong I should maybe get them updated first.

I just looked at the syslog file in that directory:

pi@PIFACE:/var/log $ ll syslog*
-rw-r----- 1 root adm 177252025 Jan 19 08:49 syslog
-rw-r----- 1 root adm  17928282 Apr  8  2017 syslog.1
-rw-r----- 1 root adm   1143972 Nov 29  2016 syslog.2.gz
-rw-r----- 1 root adm    256323 Oct  3  2016 syslog.3.gz
-rw-r----- 1 root adm    262707 Sep 21  2016 syslog.4.gz
-rw-r----- 1 root adm     72148 Sep  8  2016 syslog.5.gz
-rw-r----- 1 root adm    631442 Sep  4  2016 syslog.6.gz
-rw-r----- 1 root adm   1396698 Aug 16  2016 syslog.7.gz
pi@PIFACE:/var/log $ tail syslog
Jan 19 08:17:01 PiFace CRON[3305]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jan 19 08:24:46 PiFace Node-RED[416]: 19 Jan 08:24:46 - [info] Stopping modified nodes
Jan 19 08:24:46 PiFace rsyslogd-2007: action 'action 17' suspended, next retry is Sun Jan 19 08:25:16 2020 [try http://www.rsyslog.com/e/2007 ]
Jan 19 08:24:46 PiFace Node-RED[416]: 19 Jan 08:24:46 - [info] Stopped modified nodes
Jan 19 08:24:46 PiFace Node-RED[416]: 19 Jan 08:24:46 - [info] Starting modified nodes
Jan 19 08:24:46 PiFace Node-RED[416]: 19 Jan 08:24:46 - [info] [python-function:Python code in this node  NEW] Python function 'Python code in this node  NEW' running on PID 3563
Jan 19 08:24:46 PiFace Node-RED[416]: 19 Jan 08:24:46 - [info] Started modified nodes
Jan 19 08:49:44 PiFace rsyslogd-2007: action 'action 17' suspended, next retry is Sun Jan 19 08:50:14 2020 [try http://www.rsyslog.com/e/2007 ]
Jan 19 08:49:44 PiFace systemd[1]: Starting Session c7 of user pi.
Jan 19 08:49:44 PiFace systemd[1]: Started Session c7 of user pi.
pi@PIFACE:/var/log $ 

As I have found and fixed the problem, I was really asking why the error report wasn't telling me which node caused the error.
On most (all?) other error reports I get, there is an origin field - or something - which tells me which node caused the problem.

Granted, there was the recent problem which was driving me crazy. The error was in the part of the flow which caught the errors, so it just looped so fast that it locked up the machine, and it took HOURS to find.

Thanks for the update on the syslog part.
I'll include it in my notes.

But I guess I will just have to chalk this up to unknown.

If it is coming from a node inside a subflow which is inside another subflow, then it won't have the source property.

I only fixed that yesterday.

Thanks Nic.

The problem was stupid me.
I was using a gate node and hadn't set the topic on the message meant to turn the gate on.
So the node was throwing an invalid-command error, I think.

The payload was simply go, and that is what is shown in the error message.

As soon as I tracked down all the go messages in that flow and found one, without a topic set, which was controlling the gate, the problem stopped.
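Going by the gate config in my flow, a control message needs both parts, something like this (my understanding of the node, so treat it as a sketch):

msg.topic = "control";  // must match the gate node's Control Topic
msg.payload = "go";     // the open command - "stop" closes, "toggle" flips

A go payload arriving without that topic seems to be what triggered the error.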

So: The village idiot strikes again.

Now I am trying to get my head around python in a node and what is going on.
(But now I am getting off topic)

But thanks to you and Colin for the help.

Wow! Can that be done? Subflows in subflows!??

(It's subflows all the way down)
A poor attempt at humour

Hi again.

Ok, today - this morning - I rebooted one of my machines.

I am getting the same vague error.

{"topic":"","payload":"go","_msgid":"a27c8451.d305d8","error":"go","time":"2020-1-25 07:33:48","ttl":0,"_queuetimestamp":1579898028962,"_queueCount":0}

This is the part of the flow which captures the error and a bit more which I thought - with the other machine - was the cause of the error.
Alas it isn't.

External required nodes:
node-red-contrib-fan node-red-contrib-simple-gate node-red-contrib-simple-message-queue

Screen shot (and explanation):

Left to right:
Catch the error. Name the flow. Gate (to stop messages if/when needed). Move msg.error to msg.payload. (FAN) Split the flow: 1 - to the local queue node (back to the left); 2 - get ready to send via MQTT, through another gate allowing errors to be kept local while testing, then the MQTT send.
Quickly back to #1: time stamp the message and add it to the queue.
The Next and Wipe buttons control which messages from the queue node are displayed in the debug node.

Oh, you can see the error I just posted above in the debug column on the right of the screen.

So that message was displayed in the local readings error list debug node.

My ORIGINAL suspicion was that the GO(1) node (top line of nodes, about the middle) was the problem.
The gate node is set for the topic to be CONTROL. The inject node is sending control (lower case). I try to keep topics like that uppercase because... well, just to keep a standard.

The node injects a go after 1 second of booting/deployment.

Yeah, ok. I know. The node is set to default open (which means it allows messages to pass).

For the sake of checking, I pressed the STOP node - not shown here - and the gate node changed state. (It is in the screen grab.)
The acid test was pressing the GO button and seeing if I got an error.
The gate changed and there is/was no error.

So to me the question remains: who generated this error?

This is the code:

[{"id":"b416af8c.231478","type":"catch","z":"675e227d.d158b4","name":"","scope":null,"x":3550,"y":310,"wires":[["a0c40b2c.3900a"]]},{"id":"a0c40b2c.3900a","type":"function","z":"675e227d.d158b4","name":"Name flow","func":"var device_name =global.get('myDeviceName');\nvar flow_name = flow.get('name');\nmsg.topic =\"ERROR_REPORT/\" + device_name + \"/\" + flow_name + \"/\" + msg.topic;\nreturn msg;","outputs":1,"noerr":0,"x":3700,"y":310,"wires":[["841acbda.0ae54","e782ee14.d63f08"]]},{"id":"e782ee14.d63f08","type":"gate","z":"675e227d.d158b4","name":"","controlTopic":"control","defaultState":"open","openCmd":"go","closeCmd":"stop","toggleCmd":"toggle","defaultCmd":"default","persist":false,"x":3890,"y":310,"wires":[["fbab1d7d.c9ba5"]]},{"id":"fbab1d7d.c9ba5","type":"change","z":"675e227d.d158b4","name":"","rules":[{"t":"set","p":"error","pt":"msg","to":"payload","tot":"msg"}],"action":"","property":"","from":"","to":"","reg":false,"x":4115,"y":311,"wires":[["71a23f53.7e46a8"]]},{"id":"71a23f53.7e46a8","type":"fan","z":"675e227d.d158b4","x":4335,"y":310,"wires":[["610ba2c8.37e164","b664e555.9b8ef8","1c8fd235.c2c3ce","d87b4467.fe187"]]},{"id":"610ba2c8.37e164","type":"fan","z":"675e227d.d158b4","x":4500,"y":310,"wires":[["6b92a3f6.f002c4"]]},{"id":"6b92a3f6.f002c4","type":"gate","z":"675e227d.d158b4","name":"","controlTopic":"CONTROL","defaultState":"open","openCmd":"GO","closeCmd":"STOP","toggleCmd":"toggle","defaultCmd":"default","persist":false,"x":4640,"y":310,"wires":[["a66da046.dbc09","65e3d118.a54d08"]]},{"id":"454aa1a7.126448","type":"inject","z":"675e227d.d158b4","name":"Go","topic":"control","payload":"GO","payloadType":"str","repeat":"","crontab":"","once":true,"onceDelay":"3","x":4330,"y":270,"wires":[["610ba2c8.37e164","66cff3f0.59f8dc"]]},{"id":"1c8fd235.c2c3ce","type":"function","z":"675e227d.d158b4","name":"Time stamp","func":"msg.time = new Date().toLocaleString();\nreturn msg;","outputs":1,"noerr":0,"x":3700,"y":360,"wires":[["8cc00760.5d3c2"]]},{"id":"8cc00760.5d3c2","type":"simple-queue","z":"675e227d.d158b4","name":"queue1","firstMessageBypass":false,"bypassInterval":"0","x":3895,"y":361,"wires":[["600e4c9e.063f0c"]]},{"id":"600e4c9e.063f0c","type":"debug","z":"675e227d.d158b4","name":"Local Readings error list","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"true","targetType":"full","x":4125,"y":361,"wires":[]},{"id":"bf4463e5.7fe6b8","type":"inject","z":"675e227d.d158b4","name":"Read","topic":"","payload":" ","payloadType":"str","repeat":"","crontab":"","once":false,"onceDelay":"","x":3555,"y":400,"wires":[["7cd91921.194678"]]},{"id":"7cd91921.194678","type":"change","z":"675e227d.d158b4","name":"Read","rules":[{"t":"set","p":"trigger","pt":"msg","to":"1","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":3685,"y":400,"wires":[["8cc00760.5d3c2"]]},{"id":"c7530652.e9b6d","type":"change","z":"675e227d.d158b4","name":"Wipe","rules":[{"t":"set","p":"reset","pt":"msg","to":"1","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":3685,"y":440,"wires":[["8cc00760.5d3c2"]]},{"id":"804d7e70.a33be","type":"inject","z":"675e227d.d158b4","name":"Wipe","topic":"","payload":" ","payloadType":"str","repeat":"","crontab":"","once":false,"onceDelay":"","x":3555,"y":440,"wires":[["c7530652.e9b6d"]]}]

(Screenshot: Screen Shot 2020-01-24 at 4.12.19 PM)
See the top line of your debug? What does it say?

Yes, that is who is showing the message - isn't it?

I want to know who made the error.

That is my problem. I can't find who is making it, so I can't fix it.
Of course, now I am also running around with boot times, which are (or seem to be) in Z (UTC) time, when the time stamps show T, not Z.

But that's another story.

@Trying_to_learn from the help text of the Catch node, the message sent on its output will have an error property with:

error.message - the error message
error.source.id - the id of the node that threw the error
error.source.type - the type of the node that threw the error
error.source.name - the name, if set, of the node that threw the error

So have you looked at the value of msg.error.source coming straight out of the Catch node? In your flow you have a change node overwriting msg.error, so you are losing the information you are looking for.

Ok. Indulge the village idiot:

The catch node catches the error. (error.message etc)
The next node names the flow - that isn't a problem.
(Skipping the gate)
The change node:
That sets the payload to the error. I am not sure if that takes all the other branches with it or not.
(Skipping the FAN and GATE nodes.)
Looping back to the function node named Time stamp.
All that does is set the msg.time. Again: I don't think that would change any of the existing error message parts.

Then it is sent into the queue node and ultimately to the debug node.

If originally there was an error.source.id, will it be destroyed by me moving (or, as the node says, setting) the error to payload?

A long time ago I asked for an error-generating bit of code to help me test my error catching routine, and suffice to say it works. It tells me the name of the node which made the error.

Stupid me: looking at it, it seemed to work because I could identify the node making the error.
But this was a mistake, for reasons I don't want to go into.

So - I'll stop waffling on - the change node destroys the error.x parts of the message when it sets msg.payload to msg.error.

That would explain a lot.

Semi-rhetorical, but: why?
To me there is no reason you would destroy/wipe the error.x parts.
(I'd better stop now. I detect I am looping with what I am saying)

But that isn't what your node is doing. Your change node is overwriting msg.error with the value of msg.payload.
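In the export above the change node's rule is:

{"t":"set","p":"error","pt":"msg","to":"payload","tot":"msg"}

i.e. set msg.error to the value of msg.payload, wiping out the error details. To keep them, the rule needs to be the other way around:

{"t":"set","p":"payload","pt":"msg","to":"error","tot":"msg"}

i.e. set msg.payload to the value of msg.error.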

ARGH!

I read it as "set msg.payload to msg.error".

SET, MOVE ...... Sorry. I know I've fallen for this before. I should have learnt by now.

Obviously not.

Ok. That does explain a lot. Though luckily I don't get that many errors, and the ones I get are usually informative enough for me to find the error.
(That is after it has been through MQTT and on a remote machine.)

So if I want to send the error from the local machine (via MQTT) to another machine.....
Only msg.payload is sent.
So I have to move msg.error (and all its branches) to msg.payload.

So..... if I reverse what is in the change node, will all the branches of msg.error be put in the payload?
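For what it's worth, the same move in a Function node would be (a sketch, not tested):

// Copy the whole error object (message, source.id, source.type,
// source.name) into the payload so the MQTT Out node publishes it.
msg.payload = msg.error;
return msg;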

@Trying_to_learn Andrew, add a debug node to the output of the catch node and set it to display the Complete msg object

Thanks. I did and that was the bingo moment when I kind of saw what was happening.

Though I thought (yes: dangerous for me) that all the error stuff was put in payload.
I had it the wrong way around.

Wow! That was fun.

In one flow I had the two names in the change node the right way around.
Silly me: I hadn't yet got to the stage of learning where I would copy an entire working block of code onto the other tabs.
On all the other tabs I goofed and had them the wrong way around. Thus the problem.

I have just had to go through about 20 tabs on 2 machines (40-ish in total) and fix that up.
Now I have another 8 RasPies to check.

Oh well. I hope that is the end of that problem.

Thanks Nic and Paul.