HTTP request with windows-1250 encoding problem

I am having trouble loading an HTTP request when the page is in windows-1250 encoding. The http request node decodes the response as UTF-8, which is wrong for this page, and that seems to be irreversible. Any output after that is gibberish instead of accented characters. I am not sure how to prevent it from decoding as UTF-8, or how to specify the input encoding so it is decoded correctly. I could probably save the response to a file with some external tool and work with the file, but I would prefer to use the http request node.

There must be something I am missing, otherwise lots of websites would be unreadable, right?
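A small sketch of why the UTF-8 decoding described above cannot be undone afterwards. It runs in plain Node.js, outside Node-RED, and the two sample bytes are chosen only for illustration: in windows-1250, 0xE8 is "č" and 0xF8 is "ř", and neither is a valid UTF-8 sequence on its own.

```javascript
// Two different windows-1250 bytes: 0xE8 is "č", 0xF8 is "ř".
const cBytes = Buffer.from([0xe8]);
const rBytes = Buffer.from([0xf8]);

// Neither byte is valid UTF-8 by itself, so decoding as UTF-8
// substitutes the replacement character U+FFFD for each of them.
const cAsUtf8 = cBytes.toString('utf8');
const rAsUtf8 = rBytes.toString('utf8');

// Both results are the same one-character string, so there is no way
// to tell from the decoded text which byte was originally there.
console.log(cAsUtf8 === rAsUtf8); // true
```

This is why the fix has to happen before the bytes are turned into a string, not after.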

[{"id":"613da0f5dc72ceef","type":"debug","z":"a2875b641ea93857","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload","targetType":"msg","statusVal":"","statusType":"auto","x":890,"y":400,"wires":[]},{"id":"dfbc815d2b09e165","type":"inject","z":"a2875b641ea93857","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"","payloadType":"date","x":340,"y":400,"wires":[["253669a404640f8b"]]},{"id":"253669a404640f8b","type":"http request","z":"a2875b641ea93857","name":"","method":"GET","ret":"txt","paytoqs":"ignore","url":"https://www.menicka.cz/mobilni/praha-8.html","tls":"","persist":false,"proxy":"","authType":"","senderr":false,"x":510,"y":400,"wires":[["685d722826f55028"]]},{"id":"685d722826f55028","type":"html","z":"a2875b641ea93857","name":"html","property":"payload","outproperty":"payload","tag":"#up > div.vypis_menicek > div.menicka_detail > div.menicka","ret":"text","as":"single","x":690,"y":400,"wires":[["613da0f5dc72ceef"]]}]

Have you tried setting headers to specify the encoding? Have you tried setting the HTTP request node to return a buffer and then feeding it into an iconv node? Did you search the forum? For example: Read windows-1250 encoded data - #4 by TotallyInformation

Hi, thanks for the quick reply. FYI, it's my first day with Node-RED.

Setting the encoding: yes, I have tried that. I set up a function node that returns msg with an 'Accept-Encoding' header in msg.headers and fed it into the http request node, but it didn't help.

Searching the forum: yes, I read all the posts that talk about encoding. The one you linked is one of the closest to a solution, but it does not help with the http request node.

Buffer: yes, I was thinking about streams, which I guess are called buffers here, but as I mentioned it is my first day and I am not sure how buffers work here. Node-RED looks amazing so far, but for learning it is still a bit abstract for me; I don't know exactly what the nodes are doing internally. Buffers look like the only way to go, so I am going to investigate. I suppose I just do the http request with the output set to a buffer, pass it through the iconv node, and then use some kind of parser node for HTML or whatever format I need, right?
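A side note on the header attempt above: 'Accept-Encoding' negotiates content codings such as gzip, not the character set, which is why setting it changes nothing here. When a server declares a character set, it does so in the charset parameter of the response's Content-Type header. The sketch below uses a hypothetical helper, charsetFromContentType, and made-up header values (not taken from the site in question) to show how that parameter could be read.

```javascript
// Hypothetical helper: pull the charset parameter out of a Content-Type value.
// Returns the charset in lower case, or null when none is declared.
function charsetFromContentType(contentType) {
    const match = /charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType || '');
    return match ? match[1].toLowerCase() : null;
}

// Example header values, made up for illustration.
const declared = charsetFromContentType('text/html; charset=windows-1250');
const missing = charsetFromContentType('text/html');

console.log(declared); // the declared charset
console.log(missing);  // null: the server did not declare one
```

When no charset is declared in the header, the page may still name one in an HTML meta tag, so the encoding sometimes has to be supplied by hand.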

Buffers are similar to byte arrays with helper functions, provided by the Node.js runtime. See the Docs

Yes, that should get you usable data.
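To make the "byte array with helper functions" description concrete, here is a minimal sketch in plain Node.js. The sample bytes are made up for illustration: the ASCII word "menu" followed by one windows-1250 byte.

```javascript
// Bytes of the ASCII word "menu" followed by one windows-1250 byte (0xE8).
const buf = Buffer.from([0x6d, 0x65, 0x6e, 0x75, 0xe8]);

console.log(buf.length);            // number of bytes, not characters
console.log(buf[4]);                // indexed like an array, values 0..255
console.log(buf.toString('hex'));   // helper: hex dump of the raw bytes
console.log(Buffer.isBuffer(buf));  // helper: type check

// subarray() returns a view onto the same bytes without copying them.
const ascii = buf.subarray(0, 4).toString('latin1');
console.log(ascii);
```

Nothing is interpreted as text until toString() (or a decoder) is called with an encoding, which is what makes a buffer the right output type for a page in a non-UTF-8 encoding.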

I recommend watching this playlist: Node-RED Essentials. The videos are made by the developers of Node-RED. They're short and to the point. You will understand a whole lot more in about 1 hour. A small investment for a lot of gain.

It works! Just set the http request node to return a binary buffer and pass its output to the converter node (node-red-contrib-iconv (node) - Node-RED) with the encoding set to windows-1250 (the source web page's encoding), and the text is displayed correctly.

[{"id":"36193ce86ca865ca","type":"debug","z":"b65eba7d5d414a94","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload","targetType":"msg","statusVal":"","statusType":"auto","x":890,"y":860,"wires":[]},{"id":"c35b554dd338f807","type":"inject","z":"b65eba7d5d414a94","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"","payloadType":"date","x":120,"y":860,"wires":[["f53db24882819fe6"]]},{"id":"f53db24882819fe6","type":"http request","z":"b65eba7d5d414a94","name":"","method":"GET","ret":"bin","paytoqs":"ignore","url":"https://www.menicka.cz","tls":"","persist":false,"proxy":"","authType":"","senderr":false,"x":350,"y":860,"wires":[["560b7612633a3853"]]},{"id":"560b7612633a3853","type":"converter","z":"b65eba7d5d414a94","name":"","from":"windows-1250","x":520,"y":860,"wires":[["36193ce86ca865ca"]]}]
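For anyone curious what the conversion step amounts to, here is a rough sketch in plain Node.js, outside Node-RED. It is not the converter node's actual implementation. It assumes a Node.js build with full ICU, which the built-in TextDecoder needs in order to know 'windows-1250'. The helper name decodeWindows1250 and the sample bytes are made up for illustration.

```javascript
// Hypothetical helper: decode a Buffer of windows-1250 bytes into a string.
// Assumes a Node.js build with full ICU so that 'windows-1250' is available.
function decodeWindows1250(buffer) {
    return new TextDecoder('windows-1250').decode(buffer);
}

// Sample bytes standing in for the binary payload of the http request node.
// In windows-1250 they spell "Řízek": 0xD8 is "Ř" and 0xED is "í".
const payload = Buffer.from([0xd8, 0xed, 0x7a, 0x65, 0x6b]);

const text = decodeWindows1250(payload);
console.log(text);
```

The important part is the order of operations: the raw bytes go straight into a decoder that knows the source encoding, and only the resulting string is handed to later nodes such as the html parser.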
