TCP In/TCP out and char "§"

#1

Good morning everyone,

I have a question about the TCP in/out nodes. When my Node-RED flow receives the "§" character, the debug output on the console shows a non-printable character. When I call String.charCodeAt(...) on it, I get 65533.
Why?

#2

JavaScript uses 16 bits to represent characters in strings.

#3

OK, but var res = "§".charCodeAt(0); in JavaScript always returns 167. Node-RED returns 65533...
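
For anyone who wants to reproduce the two numbers, here is a minimal sketch in plain Node.js. The variable names are mine and not from the original flow. It shows that both 167 and 65533 fit in 16 bits, so the 16-bit width on its own does not explain the difference; 65533 is 0xFFFD, the code point of the Unicode replacement character.

```javascript
// "\u00A7" is the section sign "§", written as an escape so that the
// encoding of this source file cannot influence the result.
const section = "\u00A7";
const sectionCode = section.charCodeAt(0);         // 167 (0x00A7)

// "\uFFFD" is the Unicode replacement character.
const replacement = "\uFFFD";
const replacementCode = replacement.charCodeAt(0); // 65533 (0xFFFD)

// Both values are valid 16-bit code units.
const bothFitIn16Bits = sectionCode <= 0xffff && replacementCode <= 0xffff;

console.log(sectionCode, replacementCode, bothFitIn16Bits);
```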

#4

Well, I'm no JS expert, but

implies that it returns 16 bits

Maybe the JS interpreter (is this in a browser?) you're using for comparison does some extra stuff behind the scenes?

Hopefully a JS expert will be along to explain the discrepancy :)
Simon

#5

As the TCP node returns a buffer, why not just look at the character value directly? Why are you trying to make it into a string and then back?

#6

I can't look at the character directly. My Node-RED flow needs to read a string from network port 10000 and process it further. When the string contains characters such as "a(bc?§d", for example, how can I react to the "§" character?

#7

You can use Buffer.indexOf to find the location of that character in a buffer
https://nodejs.org/api/buffer.html#buffer_buf_includes_value_byteoffset_encoding

Or you could convert to a string using a different encoding, maybe binary (an alias for latin1 in Node.js), as that would handle non-printable characters better:
https://nodejs.org/api/buffer.html#buffer_buffers_and_character_encodings
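
Both suggestions can be sketched roughly like this. Note the assumption: I am guessing that the sender uses a single-byte encoding such as Latin-1, where "§" is the byte 0xA7. The sample payload is made up from the "a(bc?§d" example and the real bytes on your socket may differ.

```javascript
// Hypothetical payload: "a(bc?§d" as a Latin-1 sender would put it on the
// wire, with "§" as the single byte 0xA7. Buffer is a Node.js global.
const payload = Buffer.from([0x61, 0x28, 0x62, 0x63, 0x3f, 0xa7, 0x64]);

// Option 1: search for the raw byte value, no string conversion involved.
const position = payload.indexOf(0xa7);

// Option 2: decode as "latin1", which maps every byte 0x00-0xFF to the
// code point with the same value, so no byte is ever replaced.
const text = payload.toString("latin1");
const code = text.charCodeAt(position);

console.log(position, text, code);
```

If the sender turns out to use UTF-8 instead, the byte search would need to look for the two-byte sequence 0xC2 0xA7 rather than a single 0xA7.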

#8

Looks like you are getting the Unicode replacement character passed in through your TCP connection. Why? Only you can figure that out, I suspect, but there are some clues on the Wikipedia page:

The replacement character � (often a black diamond with a white question mark or an empty square box) is a symbol found in the Unicode standard at code point U+FFFD in the Specials table. It is used to indicate problems when a system is unable to render a stream of data to a correct symbol. It is usually seen when the data is invalid and does not match any character

So it probably depends upon what the raw data is in the sending code, what language and "encodings" were used to serialize the data onto the socket, AND what language and encodings were used at the receiving side (node-red). However, it seems safe to say that at least 1 of those components is not using the correct encoding.
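
One way such a mismatch can produce 65533 is sketched below. This is only an illustration under an assumption, namely that the sender writes "§" as the single Latin-1 byte 0xA7 and the receiver decodes it as UTF-8; the actual encodings in this setup are not known from the thread.

```javascript
// "§" encoded two different ways (Buffer is a Node.js global).
const latin1Bytes = Buffer.from([0xa7]);          // Latin-1: one byte
const utf8Bytes = Buffer.from("\u00A7", "utf8");  // UTF-8: bytes c2 a7

// A lone 0xA7 is not a valid UTF-8 sequence, so the decoder substitutes
// U+FFFD, the replacement character.
const wrong = latin1Bytes.toString("utf8").charCodeAt(0);

// Decoding each buffer with the encoding it was written in gives 167.
const rightLatin1 = latin1Bytes.toString("latin1").charCodeAt(0);
const rightUtf8 = utf8Bytes.toString("utf8").charCodeAt(0);

console.log(wrong, rightLatin1, rightUtf8, utf8Bytes.toString("hex"));
```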

If you have access to the source code that is sending the data, and/or the raw data itself, that would be a good place to start debugging.

#9

Hi,

Thanks to all for your answers. That helps me. It seems it really does come in as Unicode.

#10

Well, it may well not be Unicode on the wire / in the buffer, but as soon as you turn it into a string (or print it) it will be. Hence my repeatedly asking you to go look at the raw buffer.
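
A rough sketch of what looking at the raw buffer could mean in practice. The helper name describeBuffer is hypothetical, and the two sample buffers are made up to show how the same visible text looks under two encodings. In a function node you would pass msg.payload instead, assuming the TCP in node is configured to output a Buffer rather than a String.

```javascript
// Hypothetical helper: summarise a Buffer without decoding it as text.
function describeBuffer(buf) {
  return {
    length: buf.length,
    hex: buf.toString("hex"),
    bytes: Array.from(buf),
  };
}

// Made-up samples: the text "a§" written with two different encodings.
const asLatin1 = describeBuffer(Buffer.from("a\u00A7", "latin1"));
const asUtf8 = describeBuffer(Buffer.from("a\u00A7", "utf8"));

console.log(asLatin1.hex); // 61a7
console.log(asUtf8.hex);   // 61c2a7
```

If the hex dump shows a7 on its own, the sender is most likely using a single-byte encoding; c2a7 points to UTF-8; and efbfbd (the UTF-8 form of U+FFFD) would mean the character was already replaced before it reached the buffer.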