Problem: RPi + node-red-node-serialport + FTDI dropping complete TX Message randomly

Hi Forum. Sorry I don't know where to start, so trying here first.

I have a Raspberry Pi 3 Model B running the latest Raspberry Pi image and the latest Node-RED 4.1.0 with node-red-node-serialport, connected to my device via a USB-serial FTDI FT234XD chip.
The serial port is /dev/ttyUSB0 and the speed is 115200 baud.
I'm sending and receiving LF-terminated ASCII strings using the "serial request" node. My device responds to each request with a string that is also LF-terminated. I'm sending a few requests every couple of seconds.

The problem I'm facing is that the FTDI chip randomly drops one complete request, and since my device does not see it, the serial request node times out. I can't understand how this could happen. I don't use any flow control, as my serial bus is not even 1% utilized. The FTDI simply ignores one message and the next message is transmitted normally, so I had to build in a retransmission mechanism as a workaround.

I don't know where the problem is (in Node-RED, the Linux USB driver, or the FTDI chip itself), so I need help on how to investigate this issue. I can't change the FTDI chip, as it is built into my device, so I have to use it.

The flow is attached. It's a simple TCP to Serial Converter.
flows.json (10.1 KB)

Thanks.

How do you know that it is the transmitter dropping it, rather than the receiver ignoring it?

Also are you sure that the FTDI chip does not get a random message it does not understand?

How about the power supply?
Is your Pi's power adapter good enough to also support the extra load from the FTDI?

I connected an oscilloscope with a UART decoder to the TX and RX pins of the FTDI. It clearly shows that the FTDI dropped the packet.

How can I check that? FTDI talks to RPi over USB. Shall I try Wireshark on USB?

Thanks for the hint. Currently the RPi powers the FTDI over USB and is itself supplied by a not very good power adapter. The USB cable is about 2 m long. I will add a powered USB hub in between and use a shorter cable. The question in this case is: why are there no error messages from the USB bus? Or how can I check this?

Here is an example of the problem. This is a serial log, written by node-red flow:


Request is what is being sent out to the device and response is the answer from the device. The ReadTimer request reads an internal counter of the device to check that it is running, and SetValue just instructs it to generate a short +- 1.5 V pulse on the device output. The device responds to each request with the request command + 00000, where 00000 is the execution status.
So now here is the oscilloscope picture:

The packets are visible as single pulses, but this is because the scope is zoomed out. The actual sampling frequency is much higher, so the scope decodes each pulse correctly.
As one can see, it's pretty much the same as the log file above, with the small difference that the last message from the log simply doesn't appear on the TXD line at all.

Looking at your graph (polling ~25ms) I would put money on this being SW/HW limitations.

The OS is non-deterministic (think OS context switching), there will be internal buffers on the FTDI and polling of the USB bus, long wires, protocol overhead, driver overhead.

In my head, you would do well to achieve 50ms polling.

Try slowing it down to see if stability improves. I would personally recommend you start with 100ms and trim it down over time.

Also, don't poll individual addresses - do batch reads slower (see this article I wrote on this subject)

Better still, condition your operations to only operate when required (think using getter and setter over polling operations)

The increased polling rate is actually there in order to get to the problem quicker. In the normal situation polling is slower, but the problem persists. It just takes hours instead of minutes to hit the error.
Also, the polling is limited by how fast the serial request node receives the response. The flow just waits for it. It then sends the response to the TCP client and the TCP client sends the next request. So nothing is really stressing the system here.

That is a clue. So either there is noise in the system or perhaps OS issues or some blocking operation (like hard loop processing starving the node event loop) or other issue exists or ...<insert cosmic-random-event-here>.

You might want to look at using different hardware - perhaps an ethernet based serial server.
You might want to monitor CPU on the Pi (or even consider upgrading to something a bit more performant than the Pi 3).
If you are running a desktop on the Pi, you might want to switch to headless.

Some other things to check/try.

  • Does the protocol have a fixed delimiter (like CR, CRLF or ETX)? If so, instead of stream, delimit the TCP data to ensure complete strings are transmitted.

  • Use separate serial-in and serial-out nodes to act like a true proxy

  • Do sanity checking on the incoming TCP data (assuming you know the protocol on the wire)
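To illustrate the delimiting point above: TCP is a byte stream, so one request can arrive split across several chunks, or several requests can arrive in one chunk. A minimal sketch in Python of splitting such a stream on LF follows. The class name `LineFramer` and the sample request strings are made up for illustration; this is not how any Node-RED node is implemented.

```python
class LineFramer:
    """Collects TCP chunks and hands back only complete, delimited frames."""

    def __init__(self, delimiter: bytes = b"\n") -> None:
        self._delimiter = delimiter
        self._buffer = bytearray()

    def feed(self, chunk: bytes) -> list:
        """Add a received chunk; return every complete frame, delimiter included."""
        self._buffer.extend(chunk)
        frames = []
        while True:
            index = self._buffer.find(self._delimiter)
            if index < 0:
                break  # the rest is a partial frame; keep it for the next chunk
            end = index + len(self._delimiter)
            frames.append(bytes(self._buffer[:end]))
            del self._buffer[:end]
        return frames

    def pending(self) -> bytes:
        """Bytes received so far that do not yet form a complete frame."""
        return bytes(self._buffer)
```

Only frames returned by `feed` would be passed on to the serial side; anything left in `pending()` is an incomplete request that should not be written out yet.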

Happy hunting :slight_smile:

The protocol has an LF delimiter. Replacing serial request with serial in and serial out nodes does not change anything. I also don't see any performance issues. I could easily simulate much faster traffic than on the screenshot, and the probability of the issue appearing just increases slightly.

That suggests the issue is TCP data not arriving or arriving incomplete?

No. If you look at the flow I've uploaded, you will see that I log everything that comes in on TCP, which is the same data that goes to the serial request node.

I saw that, but there is no confirmation from you (unless I missed it) that you have verified the data in a hex editor to ensure it is perfect, i.e. that the LFs are present.
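If a hex editor is not handy, the same check can be scripted. A minimal sketch in Python, assuming each logged request is available as raw bytes and is supposed to be printable ASCII ending in a single LF; the function names are made up for illustration.

```python
def hex_dump(frame: bytes) -> str:
    """Render the frame as space-separated hex bytes, e.g. '41 0a'."""
    return " ".join(f"{byte:02x}" for byte in frame)


def frame_problems(frame: bytes) -> list:
    """Return a list of reasons why the frame is not a clean LF-terminated line."""
    problems = []
    if not frame.endswith(b"\n"):
        problems.append("missing trailing LF")
    if b"\r" in frame:
        problems.append("contains CR")
    if b"\n" in frame[:-1]:
        problems.append("embedded LF before the end")
    # Anything outside printable ASCII (other than CR/LF, reported above)
    if any((b < 0x20 or b > 0x7E) and b not in (0x0A, 0x0D) for b in frame):
        problems.append("non-printable byte")
    return problems
```

An empty list from `frame_problems` means the bytes look like one clean LF-terminated line; `hex_dump` shows exactly what was captured when they do not.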

Is there a reason you are using serial rather than TCP? TCP should be much more reliable.

I'm not sure where you want me to check that the data is correct. In Node-RED the data comes from the TCP Simulator flow, which looks similar to:
[image]
So there should be LFs in the requests, unless Node-RED is really broken here.

I've also checked the TXD and RXD data using the protocol decoder in the oscilloscope, and it is sent/received perfectly at 115200 baud, with all characters recognized correctly, including the LF at the end of each request/response.

The reason is exactly this: my device is equipped only with a USB port with an embedded FTDI chip, and I need to establish TCP communication with it.
An RPi with a Node-RED flow consisting of just three nodes would be a perfect protocol converter, but unfortunately I have this unreliability issue, which I have never seen before.

I am confused, I thought the FTDI chip was plugged into the usb port on the pi, with a serial output which goes to the device.
Are you saying that the connection from the pi to the device is actually usb, and the ftdi chip is in the device? In which case, what have you got the scope connected to?

Yeah, exactly. I thought I had mentioned that correctly in my first post. Sorry for the confusion.

The scope is, of course, connected to internal signals on my PCB between the FTDI chip and the µC.

You said

Which I incorrectly interpreted as meaning that the FTDI chip is attached to the USB port and the device is the other end of the serial line.

Later you said that it is built into the device, but I had already got the wrong impression in my head.

Have a look in /var/log/syslog at the time at which you get the failure and see if there are any messages there. Possibly relating to USB. It is quite common for USB to disconnect and reconnect for no obvious reason, it may be that is what is happening here.
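A small Python sketch for pulling the USB-related lines out of the log, so they can be compared against the time of a failure. The keyword pattern is an assumption about what such messages typically contain, not an exhaustive list, and the function names are made up. On systems that log only to the systemd journal the file may not exist; `dmesg` or `journalctl -k` would be the place to look instead.

```python
import re
from typing import Iterable, List

# Assumed keywords: USB core messages, the FTDI driver, and the tty device name.
USB_PATTERN = re.compile(r"usb|ftdi|ttyUSB", re.IGNORECASE)


def usb_related_lines(lines: Iterable[str]) -> List[str]:
    """Keep only the log lines that mention USB, FTDI or a ttyUSB device."""
    return [line.rstrip("\n") for line in lines if USB_PATTERN.search(line)]


def scan_log_file(path: str = "/var/log/syslog") -> List[str]:
    """Read a log file and return its USB-related lines."""
    with open(path, errors="replace") as handle:
        return usb_related_lines(handle)
```

Calling `scan_log_file()` right after a timeout and checking the timestamps of the last few returned lines would show whether a disconnect/reconnect coincides with the dropped request. Reading the file may need root privileges.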

Whatever the cause, USB and serial are prone to such issues, so personally I would not worry about it, and just accept the need to retry occasionally. It should be very easy to implement a queue/retry flow using, for example @colinl/node-red-guaranteed-delivery, which is designed for exactly this sort of situation.
Disclaimer: I am the author of that node.
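For reference, the retry idea itself is small. A minimal sketch in Python, which is not the implementation of the node mentioned above; `transact` is a hypothetical callable that writes one request and returns the response, or `None` on timeout.

```python
from typing import Callable, Optional, Tuple


def request_with_retry(
    transact: Callable[[bytes], Optional[bytes]],
    request: bytes,
    max_attempts: int = 3,
) -> Tuple[Optional[bytes], int]:
    """Repeat the request until a response arrives or attempts run out.

    Returns (response, attempts_used); response is None if every attempt timed out.
    """
    for attempt in range(1, max_attempts + 1):
        response = transact(request)
        if response is not None:
            return response, attempt
    return None, max_attempts


def make_flaky_transport(dropped_calls: set) -> Callable[[bytes], Optional[bytes]]:
    """Hypothetical stand-in for the serial link that drops the given call numbers."""
    state = {"calls": 0}

    def transact(request: bytes) -> Optional[bytes]:
        state["calls"] += 1
        if state["calls"] in dropped_calls:
            return None  # simulate a request that never reached the device
        return b"OK\n"  # placeholder reply, not the real device format

    return transact
```

One thing to keep in mind: a blind retry is only safe when repeating a request is harmless. If it was the response that got lost rather than the request, a command with a side effect would be executed twice.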

It looks like there is an error on the last pulse

What is that?