TCP In/TCP out and char "§"

node-guy · 1 October 2018 07:43

Good morning all,

I have a question about the TCP in/out nodes. When my Node-RED flow receives the "§" character, the debug output on the console shows a non-printable character. When I call String.charCodeAt(...) on it, it prints 65533.
Why?

cymplecy · 1 October 2018 07:56

JavaScript uses 16 bits to represent characters in strings.

node-guy · 1 October 2018 08:02

OK. But var res = "§".charCodeAt(0); in JavaScript always returns 167. Node-RED returns 65533...

cymplecy · 1 October 2018 08:59

Well, I'm no JS expert, but

implies that it returns 16 bits

Maybe the JS interpreter (is this in a browser?) you're using for comparison does some extra stuff behind the scenes?

Hopefully a JS expert will be along to explain the discrepancy
Simon

dceejay · 1 October 2018 12:12

As the TCP node returns a buffer, why not just look at the character value directly? Why are you trying to make it into a string and then back?

node-guy · 1 October 2018 12:46

I can't look at the character directly. My Node-RED flow needs to read a string from network port 10000 and process it. When the string contains characters such as "a(bc?§d", how can I react to the "§" character?

dceejay · 1 October 2018 18:20

You can use Buffer.indexOf to find the location of that character in a buffer
https://nodejs.org/api/buffer.html#buffer_buf_includes_value_byteoffset_encoding

Or you could convert to a string using a different encoding... maybe binary
https://nodejs.org/api/buffer.html#buffer_buffers_and_character_encodings
as that would handle non-printable characters better.
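
A minimal sketch of both suggestions, assuming the sender encodes "§" as the single byte 0xA7 (ISO-8859-1 / Latin-1) and that the TCP in node is set to output a Buffer. The sample bytes are made up for illustration; in a function node the buffer would normally be msg.payload.

```javascript
// Assumption: the sender uses Latin-1, where "§" is the single byte 0xA7.
// These bytes spell "a(bc?§d" in Latin-1 and stand in for msg.payload.
const payload = Buffer.from([0x61, 0x28, 0x62, 0x63, 0x3f, 0xa7, 0x64]);

// Option 1: search the raw bytes, no string conversion involved.
const pos = payload.indexOf(0xa7); // -1 if the byte is not present

// Option 2: decode with the 'latin1' encoding ('binary' is an alias for it),
// which maps every byte 0x00-0xFF to the code point of the same value.
const text = payload.toString('latin1');
const code = text.charCodeAt(pos);

console.log(pos, code, text);
```

If the sender turns out to use a different encoding, the byte value to search for will differ, so it is worth checking the raw bytes first.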

shrickus · 1 October 2018 19:47

Looks like you are getting the Unicode Replacement character passed in through your tcp connection -- why? only you can figure that out, i suspect... but there are some clues on that Wikipedia page:

The replacement character � (often a black diamond with a white question mark or an empty square box) is a symbol found in the Unicode standard at code point U+FFFD in the Specials table. It is used to indicate problems when a system is unable to render a stream of data to a correct symbol. It is usually seen when the data is invalid and does not match any character

So it probably depends upon what the raw data is in the sending code, what language and "encodings" were used to serialize the data onto the socket, AND what language and encodings were used at the receiving side (node-red). However, it seems safe to say that at least 1 of those components is not using the correct encoding.

If you have access to the source code that is sending the data, and/or the raw data itself, that would be a good place to start debugging.
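
One mismatch that would produce exactly 65533 is a Latin-1 sender combined with UTF-8 decoding on the receiving side. This is only a guess at the cause here, but the effect is easy to reproduce in plain Node.js:

```javascript
// "\u00a7" is the "§" character, written as an escape so the example does not
// depend on the encoding of this source file.
const section = '\u00a7';

// The same character serialises to different bytes depending on the encoding.
const asLatin1Bytes = Buffer.from(section, 'latin1'); // one byte: a7
const asUtf8Bytes = Buffer.from(section, 'utf8');     // two bytes: c2 a7

// A lone 0xA7 byte is not valid UTF-8, so decoding it as UTF-8 yields the
// replacement character U+FFFD, which is 65533 in decimal.
const wrongDecode = asLatin1Bytes.toString('utf8').charCodeAt(0);

// Decoding the same byte as Latin-1 gives the expected 167.
const rightDecode = asLatin1Bytes.toString('latin1').charCodeAt(0);

console.log(asLatin1Bytes, asUtf8Bytes, wrongDecode, rightDecode);
```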

node-guy · 2 October 2018 06:42

Hi,

Thanks to all for your answers. That helps me. It seems that it really does arrive as Unicode.

dceejay · 2 October 2018 07:53

well it may well not be unicode on the wire / in the buffer... but as soon as you turn it into a string (or print it) it will be... hence my repeated asking you to go look at the raw buffer.
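
A small sketch of looking at the raw buffer, assuming the TCP in node is configured to output a Buffer rather than a String. hexDump is a hypothetical helper name, not part of Node-RED or Node.js:

```javascript
// Hypothetical helper: render a Buffer as space-separated hex bytes so the
// bytes on the wire can be read before any string decoding happens.
function hexDump(buf) {
    return Array.from(buf, (b) => b.toString(16).padStart(2, '0')).join(' ');
}

// Sample data for illustration: "a(bc?§d" as a Latin-1 sender and as a
// UTF-8 sender would put it on the wire ("\u00a7" is "§").
const fromLatin1Sender = Buffer.from('a(bc?\u00a7d', 'latin1');
const fromUtf8Sender = Buffer.from('a(bc?\u00a7d', 'utf8');

console.log(hexDump(fromLatin1Sender)); // "§" shows up as a single a7
console.log(hexDump(fromUtf8Sender));   // "§" shows up as the pair c2 a7
```

In a function node, something like node.warn(hexDump(msg.payload)) should show the same kind of output in the debug sidebar, which makes it possible to tell which encoding the sender is actually using.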
