XML data returned as UTF-16 from HTTP Request doesn't parse in the XML node

I'm pretty new to Node-RED, and I'm trying to parse an XML file retrieved over HTTP. It's from a lightning detector. I just want to extract a few items and place them in a database, but the XML converter errors out.

Flow and XML are pasted in below.

When I get the XML and pass it directly to the XML converter, I get this error:
Error: Non-whitespace before first tag.Line: 0Column: 1Char: �

I added a change to remove the two characters at the beginning.
Now the XML converter returns Error: Unencoded <Line: 0Column: 2Char:

Oddly if I copy the output from the debug window directly into an inject string, it works perfectly.

Help?

The URL is: http://ccofmobile.thormobile14.net/AL0024.xml

Here's the flow:

[{"id":"4632da77.b89094","type":"tab","label":"Lightening Data","disabled":false,"info":""},{"id":"8a4c73b4.a0395","type":"http request","z":"4632da77.b89094","name":"","method":"GET","ret":"txt","paytoqs":false,"url":"http://ccofmobile.thormobile14.net/AL0024.xml","tls":"","persist":false,"proxy":"","authType":"","x":290,"y":60,"wires":[["482ae5d6.04da0c"]]},{"id":"50bdbdf2.d0c424","type":"debug","z":"4632da77.b89094","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload","targetType":"msg","x":710,"y":220,"wires":[]},{"id":"db7af6bf.0b2f38","type":"inject","z":"4632da77.b89094","name":"","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":120,"y":60,"wires":[["8a4c73b4.a0395"]]},{"id":"31743a29.79ec76","type":"xml","z":"4632da77.b89094","name":"","property":"payload","attr":"","chr":"","x":470,"y":220,"wires":[["50bdbdf2.d0c424"]]},{"id":"482ae5d6.04da0c","type":"change","z":"4632da77.b89094","name":"Remove Bad Chars","rules":[{"t":"change","p":"payload","pt":"msg","from":"��","fromt":"str","to":"","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":150,"y":220,"wires":[["31743a29.79ec76"]]},{"id":"6a17140f.44280c","type":"inject","z":"4632da77.b89094","name":"Inject xml","topic":"","payload":"<?xml version=\"1.0\" encoding=\"utf-16\" standalone=\"yes\"?><loadmovie>  <moviename>ThorNet</moviename>  <thordata>    <displayname>Country Club of Mobile</displayname>    <rawstring>;2011301000710000000B25001OAL002400.</rawstring>    <localtime>02:39:36 AM 06/01/2020</localtime>    <emergencystate>None</emergencystate>    <sirenstate>NotAvailable</sirenstate>    <testresult>Pass</testresult>    <testcode>B</testcode>    <testfailcount>0</testfailcount>    <unitname>L75</unitname>    <unitcode>7</unitcode>    <range>12</range>    <stringrevision>2</stringrevision>    <chiprevision>O</chiprevision>    <revisionfield>1O</revisionfield>    <switch>Front</switch>   
 <switchvalue>2</switchvalue>    <isatminingsite>False</isatminingsite>    <isoldunit>False</isoldunit>    <uniqueid>AL0024</uniqueid>    <lightningalert>Unknown</lightningalert>    <ad>0</ad>    <di>0.0</di>    <lhl>0</lhl>    <fcc>0</fcc>    <fccrate>0</fccrate>    <energypolarity>+</energypolarity>    <energylevel>275</energylevel>    <latitude>30.6873</latitude>    <longitude>88.15454</longitude>    <latdirection>N</latdirection>    <londirection>W</londirection>    <sensortype>Unknown</sensortype>    <sensorside>Left</sensorside>    <isdataold>False</isdataold>  </thordata></loadmovie>","payloadType":"str","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":100,"y":400,"wires":[["31743a29.79ec76"]]}]

The XML it returns:

<?xml version="1.0" encoding="utf-16" standalone="yes"?><loadmovie>  <moviename>ThorNet</moviename>  <thordata>    <displayname>Country Club of Mobile</displayname>    <rawstring>;2011301000710000000B25001OAL002400.</rawstring>    <localtime>02:39:36 AM 06/01/2020</localtime>    <emergencystate>None</emergencystate>    <sirenstate>NotAvailable</sirenstate>    <testresult>Pass</testresult>    <testcode>B</testcode>    <testfailcount>0</testfailcount>    <unitname>L75</unitname>    <unitcode>7</unitcode>    <range>12</range>    <stringrevision>2</stringrevision>    <chiprevision>O</chiprevision>    <revisionfield>1O</revisionfield>    <switch>Front</switch>    <switchvalue>2</switchvalue>    <isatminingsite>False</isatminingsite>    <isoldunit>False</isoldunit>    <uniqueid>AL0024</uniqueid>    <lightningalert>Unknown</lightningalert>    <ad>0</ad>    <di>0.0</di>    <lhl>0</lhl>    <fcc>0</fcc>    <fccrate>0</fccrate>    <energypolarity>+</energypolarity>    <energylevel>275</energylevel>    <latitude>30.6873</latitude>    <longitude>88.15454</longitude>    <latdirection>N</latdirection>    <londirection>W</londirection>    <sensortype>Unknown</sensortype>    <sensorside>Left</sensorside>    <isdataold>False</isdataold>  </thordata></loadmovie>

Hi, in order to make code more readable and importable it is important to post it between two sets of three backticks - ``` - see this post for more details - How to share code or flow json

Thanks for the tip. Fixed.

The XML is being returned as UTF16

The HTTP Request node isn't able to handle that out of the box, so set the Request node to return a binary buffer, then pass that through a function node and call .toString("utf16le") on the payload.

e.g.

msg.payload = msg.payload.toString('utf16le')
return msg;
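
To see why the text mode trips up the parser, here is a minimal sketch that runs in plain Node.js outside Node-RED. The sample bytes are made up for illustration (a UTF-16LE BOM followed by a tiny tag) and are not the real response from the detector.

```javascript
// Made-up sample: a UTF-16LE BOM (FF FE) followed by "<a/>" encoded as UTF-16LE.
const sample = Buffer.concat([
    Buffer.from([0xff, 0xfe]),
    Buffer.from("<a/>", "utf16le"),
]);

// Decoding the same bytes as UTF-8: the two BOM bytes are invalid UTF-8,
// and every ASCII character is followed by a NUL byte.
const asUtf8 = sample.toString("utf8");

// Decoding as UTF-16LE, as in the function node above.
const asUtf16 = sample.toString("utf16le");

console.log(JSON.stringify(asUtf8));
console.log(JSON.stringify(asUtf16));
```

The UTF-8 version starts with two replacement characters and has NULs between the letters, which lines up with the "Non-whitespace before first tag" error and with stripping the first two characters not being enough.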

There also seem to be some garbage characters at the beginning, and your suggestion didn't work out for me. Instead I used a conversion function found here: https://stackoverflow.com/questions/14592364/utf-16-to-utf-8-conversion-in-javascript

...and after that I cleaned the garbage from the beginning using a regex. The flow below seems to work.

[{"id":"f7d1c7f0.c233e8","type":"xml","z":"3aa64a19.dfaf66","name":"","property":"payload","attr":"","chr":"","x":130,"y":300,"wires":[["2c15d138.957e9e"]]},{"id":"112b6e90.080601","type":"inject","z":"3aa64a19.dfaf66","name":"","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":100,"y":40,"wires":[["c81ea16a.a0ad8"]]},{"id":"c81ea16a.a0ad8","type":"http request","z":"3aa64a19.dfaf66","name":"","method":"GET","ret":"txt","paytoqs":false,"url":"http://ccofmobile.thormobile14.net/AL0024.xml","tls":"","persist":false,"proxy":"","authType":"","x":150,"y":120,"wires":[["e90bba21.848538"]]},{"id":"e90bba21.848538","type":"function","z":"3aa64a19.dfaf66","name":"decode UTF16LE","func":"msg.payload = decodeUTF16LE(msg.payload);\n\nreturn msg;\n\nfunction decodeUTF16LE( binaryStr ) {\n    var cp = [];\n    for( var i = 0; i < binaryStr.length; i+=2) {\n        cp.push( \n             binaryStr.charCodeAt(i) |\n            ( binaryStr.charCodeAt(i+1) << 8 )\n        );\n    }\n\n    return String.fromCharCode.apply( String, cp );\n}\n","outputs":1,"noerr":0,"x":170,"y":180,"wires":[["653e68c5.fce768"]]},{"id":"653e68c5.fce768","type":"change","z":"3aa64a19.dfaf66","name":"remove garbage characters","rules":[{"t":"set","p":"payload","pt":"msg","to":"$replace(msg.payload, /^.*(<.*)/, '$1')","tot":"jsonata"}],"action":"","property":"","from":"","to":"","reg":false,"x":200,"y":240,"wires":[["f7d1c7f0.c233e8"]]},{"id":"2c15d138.957e9e","type":"debug","z":"3aa64a19.dfaf66","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","x":290,"y":300,"wires":[]}]

Worked for me, no regex required, just as I said:


[{"id":"892858e7.6ff808","type":"http request","z":"9b9939a6.d57948","name":"","method":"GET","ret":"bin","paytoqs":false,"url":"http://ccofmobile.thormobile14.net/AL0024.xml","tls":"","persist":false,"proxy":"","authType":"","x":830,"y":120,"wires":[["f8cae4a0.ab98f8","38edee63.455c02"]]},{"id":"757bf4d0.1fbdec","type":"debug","z":"9b9939a6.d57948","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload","targetType":"msg","x":1030,"y":200,"wires":[]},{"id":"676f2737.127b28","type":"inject","z":"9b9939a6.d57948","name":"","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":640,"y":120,"wires":[["892858e7.6ff808"]]},{"id":"b0332d44.9fbc2","type":"xml","z":"9b9939a6.d57948","name":"","property":"payload","attr":"","chr":"","x":810,"y":200,"wires":[["757bf4d0.1fbdec"]]},{"id":"f8cae4a0.ab98f8","type":"debug","z":"9b9939a6.d57948","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"true","targetType":"full","x":1010,"y":120,"wires":[]},{"id":"38edee63.455c02","type":"function","z":"9b9939a6.d57948","name":"utf16 string","func":"msg.payload = msg.payload.toString('utf16le')\nreturn msg;","outputs":1,"noerr":0,"x":650,"y":200,"wires":[["b0332d44.9fbc2"]]}]

PS, that garbage is the BOM
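
If anyone does need to drop it by hand: once the bytes are decoded as UTF-16, the BOM is a single U+FEFF character at the start of the string. A minimal sketch, where `stripBom` is just an illustrative helper name and not something Node-RED provides:

```javascript
// Hypothetical helper: remove a leading byte order mark (U+FEFF) if present.
function stripBom(text) {
    return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

// Made-up decoded string with a BOM in front of the XML declaration.
const decodedText = "\uFEFF<?xml version=\"1.0\"?><a/>";
const cleanedText = stripBom(decodedText);

console.log(cleanedText.startsWith("<?xml")); // true
```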

Ah, I had the http request node set to its default utf-8 encoding. Nice catch! So @garryadkins, see Steve's example for the cleanest solution unless you already got it working.

It's surprised me how much JavaScript I've learned from this forum even after using it for both work and personal projects for 5-6 years. In the type of work I've done I've never encountered the need for bit operations or character conversion for example.

TBH, the solution should really check the BOM for endianness

e.g.

  • a BOM of 0xFE 0xFF indicates big endian (Node's Buffer has no "utf16be" encoding, so the bytes need swapping first, e.g. with .swap16(), before calling .toString("utf16le"))
  • a BOM of 0xFF 0xFE indicates little endian (so should call .toString("utf16le"))
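
A rough sketch of that check, for use on the buffer returned by the request node. `decodeByBom` is a hypothetical helper name, and the UTF-8 fallback when no BOM is present is my assumption, not current Node-RED behaviour. Node's Buffer does not accept "utf16be" as an encoding name, so the big-endian branch swaps the byte pairs and then decodes as little endian.

```javascript
// Hypothetical helper: pick a decoding based on the buffer's byte order mark.
function decodeByBom(buf) {
    if (buf.length >= 2 && buf[0] === 0xff && buf[1] === 0xfe) {
        // FF FE: little endian. Skip the 2-byte BOM and decode directly.
        return buf.subarray(2).toString("utf16le");
    }
    if (buf.length >= 2 && buf[0] === 0xfe && buf[1] === 0xff) {
        // FE FF: big endian. Copy the body (so the caller's buffer is untouched),
        // dropping any trailing odd byte because swap16() needs an even length.
        const evenEnd = buf.length - (buf.length % 2);
        const body = Buffer.from(buf.subarray(2, evenEnd));
        return body.swap16().toString("utf16le");
    }
    if (buf.length >= 3 && buf[0] === 0xef && buf[1] === 0xbb && buf[2] === 0xbf) {
        // EF BB BF: UTF-8 with a BOM.
        return buf.subarray(3).toString("utf8");
    }
    // No BOM found: assume UTF-8 (an assumption for this sketch).
    return buf.toString("utf8");
}

// Made-up big-endian sample: BOM FE FF followed by "<a/>".
const bigEndianSample = Buffer.from([0xfe, 0xff, 0x00, 0x3c, 0x00, 0x61, 0x00, 0x2f, 0x00, 0x3e]);
console.log(decodeByBom(bigEndianSample));
```

In a function node this would be used as `msg.payload = decodeByBom(msg.payload); return msg;` with the request node set to return a binary buffer.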

Now I feel the question is: should the HTTP Request node support some kind of auto string mode that detects UTF-8 and UTF-16 BE/LE (from the presence and value of a BOM)? Or should the XML node detect the BOM and parse the string accordingly? Or should we just handle it when encountered?

@dceejay any thoughts?

This worked perfect for me. Set the request to binary, change the encoding to utf16 with your line of code and BOOM. Worked perfect.

Thanks!
-Garry

Well (my thoughts given 2 minute read time) - this is the first time (I can recall) that it has come up in 5 years... (for http node - have seen it for file node), so not keen on adding an option to support it manually, but if autodetect can be seamless and not cause further issues then yes I'd vote for that.
