How to support Extended UTF8 characters in MQTT Payload

Steph · 15 September 2023 05:56

Hello everyone,
I am developing a dashboard in Node-RED that receives MQTT messages.
This is an example of a message I can receive:

There is a field, _name, that I cannot control and that can contain extended UTF8 characters (such as é, ê, à, ...).
In that case, Node-RED gives me this:

and it is not able to filter on payload keys, etc.
I am using the Node-RED MQTT node, version 3.0.2.
I have tried all the Output options of the MQTT node without success.
Do you have any idea how I could receive MQTT messages containing extended UTF8 characters and have Node-RED decode them correctly?
Many thanks in advance.

Steve-Mcl · 15 September 2023 06:46

When this happens, could you copy the payload as JSON by using the copy value button that appears over the payload when you hover over it in the debug panel AND tell us what that value should be?

Steph · 15 September 2023 07:07

When I try to expand the msg, I get this:

When there is no special character in the _name field (UTF8 only):

Here is the full buffer :
[123,34,36,116,97,103,34,58,48,44,34,36,116,105,109,101,115,116,97,109,112,34,58,34,49,54,57,52,55,54,48,54,53,50,46,53,56,54,51,49,51,52,53,50,34,44,34,65,114,109,78,105,103,104,116,34,58,48,44,34,65,115,115,111,99,105,97,116,101,100,80,97,114,116,73,100,34,58,91,93,44,34,66,121,112,97,115,115,34,58,48,44,34,67,104,105,109,101,34,58,48,44,34,68,105,115,112,84,111,107,101,110,34,58,91,50,48,48,48,44,48,44,56,53,56,93,44,34,77,101,109,111,34,58,34,34,44,34,78,97,109,101,34,58,34,34,44,34,78,117,109,34,58,49,44,34,80,97,114,116,34,58,49,44,34,82,101,112,111,114,116,34,58,49,44,34,83,117,112,101,114,118,105,115,101,100,34,58,49,44,34,85,115,101,114,34,58,48,44,34,90,111,110,101,84,121,112,101,34,58,49,44,34,95,83,116,97,116,117,115,34,58,123,34,65,108,109,67,111,34,58,48,44,34,65,108,109,70,105,114,101,34,58,48,44,34,65,108,109,77,101,100,105,99,97,108,34,58,48,44,34,65,108,109,84,97,109,112,101,114,34,58,48,44,34,65,108,109,90,111,110,101,34,58,48,44,34,70,97,117,108,116,101,100,34,58,48,44,34,84,98,108,65,67,76,111,115,115,34,58,48,44,34,84,98,108,67,111,34,58,48,44,34,84,98,108,67,114,111,115,115,34,58,48,44,34,84,98,108,69,110,100,79,102,76,105,102,101,34,58,48,44,34,84,98,108,70,105,114,101,34,58,48,44,34,84,98,108,76,111,119,66,97,116,34,58,48,44,34,84,98,108,77,97,105,110,116,101,110,97,110,99,101,34,58,48,44,34,84,98,108,83,117,112,101,114,118,105,115,105,111,110,34,58,48,44,34,84,98,108,84,97,109,112,101,114,34,58,48,44,34,84,98,108,90,111,110,101,34,58,48,125,44,34,95,110,97,109,101,34,58,34,80,111,114,116,101,32,69,110,116,114,233,101,32,92,110,80,111,114,116,101,34,44,34,105,100,34,58,49,44,34,117,114,105,34,58,34,64,92,47,82,70,54,92,47,68,101,118,105]

As you can see in the buffer, I have :
95, _
110, n
97, a
109, m
101, e
34, "
58, :
34, "
80, P
111, o
114, r
116, t
101, e
32, (space)
69, E
110, n
116, t
114, r
233, é
101, e
32, (space)
92, \
110, n
80, P
111, o
114, r

The _name field contains "Porte Entrée". The "é" is not encoded as UTF8 here: it arrives as the single byte 233 (extended ASCII characters).

TotallyInformation · 15 September 2023 18:40

The encoding for extended characters is most commonly UTF-8 (a Unicode encoding), which is a bit of a dog's breakfast since each character can, as I understand it, take between 1 and 4 bytes instead of the fixed 1 byte of ASCII. I think that character can also be represented in UTF-16 (2 bytes), as it is one of the extended Latin characters.
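To make the byte counts concrete, here is a small Node.js sketch. The sample characters are my own choice for illustration and are not taken from the MQTT message, apart from é.

```javascript
// Sample characters (illustrative choice): e, é, €, and an emoji.
const samples = ["e", "\u00e9", "\u20ac", "\u{1F600}"];

// UTF-8 uses 1 to 4 bytes per character depending on the code point.
const utf8Lengths = samples.map((ch) => Buffer.byteLength(ch, "utf8"));

// UTF-16 uses 2 bytes for most characters and 4 for those outside the BMP.
const utf16Lengths = samples.map((ch) => Buffer.byteLength(ch, "utf16le"));

console.log("utf8:", utf8Lengths, "utf16le:", utf16Lengths);
```

So é takes 2 bytes in both UTF-8 and UTF-16, whereas the buffer above carries it as a single byte.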

Hmm, it IS listed in the UTF-8 character set, however, as the two bytes c3 a9. A single byte 233 (0xe9), as in your buffer, is not valid UTF-8 on its own.

Yes, 233 is how é is defined in the ISO/IEC 8859-1 (Latin-1) extended ASCII code page.
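The difference can be checked in plain Node.js. This is only a sketch using the six bytes for "Entrée" copied from the buffer in the earlier post; the variable names are made up for the example.

```javascript
// Bytes for "Entr", 233, "e" as they appear in the posted buffer.
const raw = Buffer.from([69, 110, 116, 114, 233, 101]);

// Decoded as UTF-8, the lone byte 233 is not a valid sequence, so Node.js
// substitutes U+FFFD (the replacement character) for it.
const asUtf8 = raw.toString("utf8");

// Decoded as Latin-1, every byte maps directly to one character.
const asLatin1 = raw.toString("latin1");

// Encoding é both ways shows the two byte representations.
const utf8Hex = Buffer.from("\u00e9", "utf8").toString("hex");
const latin1Hex = Buffer.from("\u00e9", "latin1").toString("hex");

console.log(asUtf8, asLatin1, utf8Hex, latin1Hex);
```

This suggests the sender is publishing Latin-1 text rather than UTF-8, which would explain why the payload does not come through as a readable string.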

And here is the relevant extract of the MQTT standard (note that it covers topic names, not payloads):

Topic Names and Topic Filters are case sensitive
Topic Names and Topic Filters can include the space character
Topic Names and Topic Filters are UTF-8 encoded strings, they MUST NOT encode to more than 65535 bytes

TotallyInformation · 15 September 2023 19:01

And this seems to do the job for you. In a function node:

// Decode the raw payload Buffer as Latin-1 instead of UTF-8
msg.payload = msg.payload.toString("latin1");
return msg;

It decodes the input buffer as latin1 (ISO/IEC 8859-1) and gives you an ordinary JavaScript string.
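If you also want to filter on the payload keys, the decoded string still has to be parsed as JSON. Below is a rough sketch of how that could look. The helper name decodeLatin1Json is hypothetical, and the sample message is a shortened stand-in, not the full payload from the post. It assumes the MQTT node delivers the payload as a Buffer.

```javascript
// Hypothetical helper: decode a Latin-1 MQTT payload and parse it as JSON.
function decodeLatin1Json(payload) {
    // Only Buffers need decoding; pass anything else through unchanged.
    if (!Buffer.isBuffer(payload)) {
        return payload;
    }
    const text = payload.toString("latin1");
    return JSON.parse(text); // throws if the text is not valid JSON
}

// Inside a function node, the body would be roughly:
//   msg.payload = decodeLatin1Json(msg.payload);
//   return msg;

// Standalone check with a small Latin-1 encoded sample (assumed shape).
const sample = Buffer.from('{"_name":"Porte Entr\u00e9e","id":1}', "latin1");
const decoded = decodeLatin1Json(sample);
console.log(decoded._name, decoded.id);
```

In a real flow you would probably wrap the JSON.parse call in a try/catch so that a malformed message does not stop the function node.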

Steph · 19 September 2023 15:30

Thanks a lot !!!!

