XML-node XML->JSON Error: Invalid character in name

Hi folks!

Today I encountered a new problem.
I get XML data from a web service request.
I tried to convert the incoming data from XML into JSON.

So far I have always received a medium-sized XML response of about 300k of data.
Today I raised the amount of data to about 650k, and now I get errors in the XML node that I have never seen before:

I caught the Exception and analyzed the stacktrace I got:

{
    "message": "Error: Invalid character in name",
    "source": {
        "id": "ab2f05cb.afd8a8",
        "type": "xml",
        "count": 1
    },
    "stack": "Error: Invalid character in name
    at XMLStringifier.module.exports.XMLStringifier.assertLegalName (/usr/local/lib/node_modules/node-red/node_modules/@node-red/nodes/node_modules/xml2js/node_modules/xmlbuilder/lib/XMLStringifier.js:213:15)
    at XMLStringifier.assertLegalName (/usr/local/lib/node_modules/node-red/node_modules/@node-red/nodes/node_modules/xml2js/node_modules/xmlbuilder/lib/XMLStringifier.js:4:59)
    at XMLStringifier.module.exports.XMLStringifier.name (/usr/local/lib/node_modules/node-red/node_modules/@node-red/nodes/node_modules/xml2js/node_modules/xmlbuilder/lib/XMLStringifier.js:29:19)
    at new XMLElement (/usr/local/lib/node_modules/node-red/node_modules/@node-red/nodes/node_modules/xml2js/node_modules/xmlbuilder/lib/XMLElement.js:26:34)
    at XMLElement.module.exports.XMLNode.node (/usr/local/lib/node_modules/node-red/node_modules/@node-red/nodes/node_modules/xml2js/node_modules/xmlbuilder/lib/XMLNode.js:304:15)
    at XMLElement.module.exports.XMLNode.element (/usr/local/lib/node_modules/node-red/node_modules/@node-red/nodes/node_modules/xml2js/node_modules/xmlbuilder/lib/XMLNode.js:236:28)
    at XMLElement.module.exports.XMLNode.ele (/usr/local/lib/node_modules/node-red/node_modules/@node-red/nodes/node_modules/xml2js/node_modules/xmlbuilder/lib/XMLNode.js:531:19)
    at /usr/local/lib/node_modules/node-red/node_modules/@node-red/nodes/node_modules/xml2js/lib/builder.js:108:37
    at Builder.exports.Builder.Builder.buildObject (/usr/local/lib/node_modules/node-red/node_modules/@node-red/nodes/node_modules/xml2js/lib/builder.js:120:14)
    at XMLNode._inputCallback (/usr/local/lib/node_modules/node-red/node_modules/@node-red/nodes/core/parsers/70-XML.js:22:37)"
}

Is it a memory fault, or might this really be a content problem?

Excuse my lack of sample data, but I don't know which part of the 650k I should post.

Do you have any suggestions how I can dig into my problem?

Cheerio
Swen

(note: edited to make the trace readable - zenofmud)
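
One detail in the trace may help narrow this down: the failing frames run through xml2js's `Builder.buildObject`, which is the object-to-XML direction rather than the XML-to-object parser. Assuming the XML node chooses the direction from the payload type (build for objects, parse for strings), the sketch below reports what kind of payload is being handed over and lists object keys that would not be legal XML element names. `inspectPayload`, `findIllegalKeys` and `NAME_RE` are illustrative names, not part of Node-RED, and the regular expression is a simplified ASCII-only approximation of the XML name rule.

```javascript
// Simplified, ASCII-only approximation of a legal XML name (assumption:
// the real rule also permits many non-ASCII characters).
const NAME_RE = /^[A-Za-z_:][A-Za-z0-9_:.\-]*$/;

// Recursively collect object keys that would not be legal XML element names.
function findIllegalKeys(value, path = "", found = []) {
    if (value === null || typeof value !== "object") {
        return found;
    }
    for (const key of Object.keys(value)) {
        const here = path ? `${path}.${key}` : key;
        // Array indices are positions, not element names, so they are not checked.
        if (!Array.isArray(value) && !NAME_RE.test(key)) {
            found.push(here);
        }
        findIllegalKeys(value[key], here, found);
    }
    return found;
}

// Summarise what would be handed to the XML node.
function inspectPayload(payload) {
    const kind = Buffer.isBuffer(payload) ? "buffer" : typeof payload;
    return {
        kind,
        illegalKeys: kind === "object" ? findIllegalKeys(payload) : []
    };
}
```

In a function node placed just before the XML node, something like `node.warn(inspectPayload(msg.payload)); return msg;` would show the result in the debug sidebar. If `kind` is not `string`, the payload is probably not the raw XML text. Note that xml2js uses `$` as its default attribute key, so a reported `$` key may be a false positive.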

Paste your XML into one of the many online XML validators.

If there is still an issue, let us know.

No errors... I tried that before.
What next?
I cannot copy the data here because it has personal content, which by law is not shareable.

Cheers Swen

If you manually split the file into two (adding the xml stuff to close the first one and open the second) does it fail on one of those? If so you could keep dissecting it till you find the offending content. It is surprising how few iterations it takes when dissecting.
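
The dissection can also be automated. This is a minimal sketch, assuming the records can be separated into an array and that a caller-supplied `fails(records)` function (hypothetical, not shown) wraps a subset in the root element and reports whether the conversion throws. It further assumes that a single record is responsible and that the failure is reproducible.

```javascript
// Binary search for the one record that makes `fails` return true.
// Assumptions: `fails(records)` is true for the full list, exactly one
// record is responsible, and `fails` is deterministic.
function findOffendingIndex(records, fails) {
    let lo = 0;               // inclusive lower bound
    let hi = records.length;  // exclusive upper bound
    while (hi - lo > 1) {
        const mid = lo + Math.floor((hi - lo) / 2);
        if (fails(records.slice(lo, mid))) {
            hi = mid;         // the culprit is in the first half
        } else {
            lo = mid;         // otherwise it must be in the second half
        }
    }
    return lo;
}
```

For 1000 records this needs at most ten rounds, which is why dissecting converges so quickly. If the error depends on the total size rather than on one record, the search will not converge on anything meaningful, and that in itself would be a useful finding.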

Could you sanitise the data then post it?

Or perhaps send it in a PM?

Sorry... IF there is a problem with the data, sanitizing may sanitize the problem away and nothing is gained.
Actually, I can't manually sanitize 650K of data at all, sorry. I will go with Colin's proposal.

Thanks and Cheers
Swen

But it would rule out tags/elements/syntax - vs - content.

Hi Steve!

You are absolutely correct, but the structural difference between 300k and 600k of my data is nothing more than maybe 100 bytes.
One is 500 sets of data, the other one 1000 sets.

But I really appreciate your efforts to help very much!
thank you!

Hi there!

I made an interesting discovery!

The XML data mentioned above is the output of an https-request node.
That node is configured to return a UTF-8 string.

If I try to parse that apparently error-free XML string directly with an XML node to transform the message into JSON, it throws the error described above.

To split the big message into several parts and find the allegedly invalid character, I stored the message in a big file.

I built the small flow shown here to read a UTF-8 string from that file and get the same error again.
The plan was then to reduce the amount of data by repeatedly halving the content, like a binary search.

Guess what? The XML node did not throw the above error anymore. Even three and four times the amount of data did not show that error again!

So my next question:
What is the difference between a UTF-8 string from a file node's output and a UTF-8 string from an https-get node?

[{"id":"361d304c.fa833","type":"inject","z":"e7462388.94e93","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"","payloadType":"date","x":310,"y":1100,"wires":[["e9fdd5a1.77bdd8"]]},{"id":"e9fdd5a1.77bdd8","type":"file in","z":"e7462388.94e93","name":"","filename":"/home/probierstuebchen/test1.xml","format":"utf8","chunk":false,"sendError":false,"encoding":"none","x":540,"y":1100,"wires":[["ab2f05cb.afd8a8"]]},{"id":"ab2f05cb.afd8a8","type":"xml","z":"e7462388.94e93","name":"","property":"payload","attr":"","chr":"","x":830,"y":1100,"wires":[["fe5d3bad.1ed1c8"]]},{"id":"fe5d3bad.1ed1c8","type":"debug","z":"e7462388.94e93","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":1040,"y":1100,"wires":[]},{"id":"a3127cbd.5ee65","type":"catch","z":"e7462388.94e93","name":"","scope":["ab2f05cb.afd8a8"],"uncaught":false,"x":830,"y":1160,"wires":[["2338a58e.f4208a","4a06bf9c.fc349"]]},{"id":"2338a58e.f4208a","type":"debug","z":"e7462388.94e93","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"true","targetType":"full","statusVal":"","statusType":"auto","x":1010,"y":1160,"wires":[]},{"id":"5c165cd9.8a8704","type":"file","z":"e7462388.94e93","name":"","filename":"/home/probierstuebchen/stack.txt","appendNewline":true,"createDir":false,"overwriteFile":"true","encoding":"none","x":1100,"y":1220,"wires":[[]]},{"id":"4a06bf9c.fc349","type":"function","z":"e7462388.94e93","name":"","func":"return {\n    \"payload\": msg.error\n};","outputs":1,"noerr":0,"initialize":"","finalize":"","libs":[],"x":840,"y":1220,"wires":[["5c165cd9.8a8704"]]}]

Cheerio
Swen
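
One way to approach the question of what differs between the two strings is to compare them directly instead of by eye. The sketch below is only an illustration: `describe` and `firstDifference` are hypothetical helper names, and a leading byte order mark is just one possible difference worth ruling out, not a confirmed cause.

```javascript
// Describe a payload: its type, its length, and whether it starts with a byte order mark.
function describe(value) {
    const isString = typeof value === "string";
    const isBuffer = Buffer.isBuffer(value);
    return {
        type: isBuffer ? "buffer" : typeof value,
        length: isString || isBuffer ? value.length : undefined,
        startsWithBom: isString && value.charCodeAt(0) === 0xFEFF
    };
}

// Return the index of the first differing UTF-16 code unit,
// or -1 if the two strings are identical.
function firstDifference(a, b) {
    const n = Math.min(a.length, b.length);
    for (let i = 0; i < n; i++) {
        if (a.charCodeAt(i) !== b.charCodeAt(i)) {
            return i;
        }
    }
    return a.length === b.length ? -1 : n;
}
```

Keeping the https response in one message property and the file content in another would allow a function node to call `firstDifference` on the pair. A result of `-1` together with identical `describe` output would suggest the difference lies somewhere other than in the text itself.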

If you feed it into a file and read exactly that file back in, is it OK? If so, what settings have you got on the file nodes?

Same as on my https node: UTF-8 string as output, and UTF-8 string to write to and read from the file nodes.
And yes, it is okay if I do so.

And the rest of the settings in the file nodes?

as in the flow posted above

[{"id":"361d304c.fa833","type":"inject","z":"e7462388.94e93","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"","payloadType":"date","x":300,"y":1280,"wires":[["e9fdd5a1.77bdd8"]]},{"id":"e9fdd5a1.77bdd8","type":"file in","z":"e7462388.94e93","name":"","filename":"/home/probierstuebchen/test1.xml","format":"utf8","chunk":false,"sendError":false,"encoding":"none","x":530,"y":1280,"wires":[["ab2f05cb.afd8a8"]]},{"id":"ab2f05cb.afd8a8","type":"xml","z":"e7462388.94e93","name":"","property":"payload","attr":"","chr":"","x":820,"y":1280,"wires":[["fe5d3bad.1ed1c8"]]},{"id":"fe5d3bad.1ed1c8","type":"debug","z":"e7462388.94e93","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":1030,"y":1280,"wires":[]},{"id":"a3127cbd.5ee65","type":"catch","z":"e7462388.94e93","name":"","scope":["ab2f05cb.afd8a8"],"uncaught":false,"x":820,"y":1340,"wires":[["2338a58e.f4208a","4a06bf9c.fc349"]]},{"id":"2338a58e.f4208a","type":"debug","z":"e7462388.94e93","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"true","targetType":"full","statusVal":"","statusType":"auto","x":1000,"y":1340,"wires":[]},{"id":"5c165cd9.8a8704","type":"file","z":"e7462388.94e93","name":"","filename":"/home/probierstuebchen/stack.txt","appendNewline":true,"createDir":false,"overwriteFile":"true","encoding":"none","x":1090,"y":1400,"wires":[[]]},{"id":"4a06bf9c.fc349","type":"function","z":"e7462388.94e93","name":"","func":"return {\n    \"payload\": msg.error\n};","outputs":1,"noerr":0,"initialize":"","finalize":"","libs":[],"x":830,"y":1400,"wires":[["5c165cd9.8a8704"]]},{"id":"62e77862.541b38","type":"https-node","z":"e7462388.94e93","name":"hole Bestellung von Olav-System \n(https)","method":"GET","ret":"txt","url":"","authorized":true,"agent":true,"x":440,"y":1100,"wires":[["81c61acf.420f48"]]},{"id":"81c61acf.420f48","type":"function","z":"e7462388.94e93","name":"process-data-backup-file","func":"var flowvals = flow.get([\"processid\", \"basedir\"]);\nvar processid = flowvals[0];\nvar basedir = flowvals[1];\nif (!msg.payload) {return null}\n\nvar filename = basedir + \"Persistence/\"+processid+\".xml\";\nreturn {\n    \"filename\": filename,\n    \"payload\": msg.payload\n};","outputs":1,"noerr":0,"initialize":"","finalize":"","libs":[],"x":730,"y":1100,"wires":[["c471fc4e.dd6ca"]],"icon":"node-red/status.svg"},{"id":"c471fc4e.dd6ca","type":"file","z":"e7462388.94e93","name":"sichere Auftrag als Datei","filename":"","appendNewline":false,"createDir":true,"overwriteFile":"true","encoding":"utf8","x":970,"y":1100,"wires":[[]]}]

Now both: the save part and the read part.

You have not specified utf8 encoding in the File In node, though it may not make any difference.

Of course I did... the second file node is just there to save the enormous stack trace, which I cannot copy from the debug tab within Node-RED.

All are utf8.

I don't understand. You said that if you save the response to a file and read that file in and pass it to the xml node then it works ok, but if you feed the response directly to xml then it fails. I asked you to show us the config of the two file nodes and you posted a flow. I pointed out that the File In node did not have utf8 specified for the encoding field, but you say that of course it has. However:
(image: screenshot of the File In node configuration)

Encoding is set to default, not utf8.

Also show us what the first couple of lines of the xml look like.

My fault... the file-in node returns a single UTF-8 string.
I changed the encoding to utf8 as well and there's no difference. It worked from the file reader output but not from the https-node output.

The file-out node which wrote the file from the https-get node uses utf8 encoding as well.

Here is part of my XML response.
The HVVBestellung entity occurs thousands of times.

<dataroot generated="2021-06-30T16:54:44.558Z">
	<HVVBestellung>
		<Action>NEW</Action>
		<Bestellreferenz>16</Bestellreferenz>
		<Schulnummer>880013</Schulnummer>
		<Klassenbezeichnung>5</Klassenbezeichnung>
		<Anrede>Frau</Anrede>
		<Name>Musterfrau</Name>
		<Vorname>Bärbel</Vorname>
		<Geburtsdatum>02.12.2001</Geburtsdatum>
		<Strasse>Beispielstraße</Strasse>
		<Postleitzahl>12345</Postleitzahl>
		<Wohnort>Hausen</Wohnort>
		<SchuelerId>1</SchuelerId>
		<Startdatum>01.08.2021</Startdatum>
		<Aenderungsdatum/>
		<Kuendigungsdatum/>
		<ProduktkatalogID>11</ProduktkatalogID>
		<Bilddatei>2647.zrQ8i-637563477479585592</Bilddatei>
		<Lichtbildspeicherung>NEIN</Lichtbildspeicherung>
		<Berechtigungsnummer/>
		<Kundennr-FKI/>
	</HVVBestellung>
</dataroot>

There might be critical characters within the names,
but nothing complicated, I think.

I wonder what happens if you add this at the front:
<?xml version="1.0" encoding="UTF-8"?>
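
To try this without editing the response by hand, a function node between the https node and the XML node could prepend the declaration. This is a minimal sketch: `ensureXmlDeclaration` is a hypothetical helper name, and whether the declaration changes the outcome here is untested.

```javascript
const XML_DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>';

// Prepend the declaration unless the text already starts with one.
function ensureXmlDeclaration(xml) {
    // Drop a leading byte order mark and whitespace, because nothing
    // may precede the declaration.
    const body = xml.replace(/^\uFEFF/, "").trimStart();
    // Match "<?xml" followed by whitespace so that "<?xml-stylesheet"
    // is not mistaken for a declaration.
    return /^<\?xml\s/.test(body) ? body : XML_DECLARATION + "\n" + body;
}
```

Inside the function node this would be used as `msg.payload = ensureXmlDeclaration(msg.payload); return msg;`, assuming `msg.payload` is a string at that point.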

Thanks Colin,

I changed my strategy to save the data first and then carry on with the flow steps.
Thanks for the help, but I don't have the time to stay stuck on that problem now.

Thank You
Cheers
Swen