For each Line in a text file do

Hi guys,
Sorry, noob question here. I would like to know how I can read a txt file and run a wget for each line in the file. (Basically, the file will contain EANs for different products, and I want to look each one up on a website.) Since I don't know where the file ends, how can I do this without getting an error?

Please post an example of the first 4 lines of the file.
Example:

http://lineone.com
https://line2.com
http://line3.com
https://ww1.line4.net

Better yet, just post a link to the file in question, if you are allowed to share that info.

Hi @LucasSaraivaAzevedo

The File In node can be configured to send one message per line of your file. From there you can create a flow that passes each message to an HTTP Request node to make the request you need.

How exactly you get the EAN from the line in the file into the url of the HTTP Request will depend on what format the line of the file is in.

If each line contains only the EAN, so that you can use it directly in the URL, then you can wire the File In node straight to the HTTP Request node and configure its URL field with the required URL, using {{payload}} as a placeholder for where the EAN should be inserted.

If each line needs some additional parsing first, then you'd need a Function node between them to do that parsing to get the EAN into the payload.
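As a rough illustration of that parsing step, here is a minimal sketch of what such a Function node could do. It assumes a line may carry extra text around a 13-digit EAN (for example `6901443290734;some label`); the helper name `extractEan` and that line format are made up for the example, not taken from the actual file.

```javascript
// Hypothetical helper: pull a 13-digit EAN out of one line of the file.
// Assumes the EAN stands on its own, separated from other text by
// non-word characters such as ";" or a space.
function extractEan(line) {
    const match = String(line).match(/\b\d{13}\b/);
    return match ? match[0] : null;
}

// Inside a Function node the body would then be roughly:
//   const ean = extractEan(msg.payload);
//   if (ean === null) { return null; }   // returning null drops the message
//   msg.payload = ean;
//   return msg;
```

With the EAN in msg.payload, the HTTP Request node can pick it up through the {{payload}} placeholder described above.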

Hi guys, I tried your methods but they didn't work.

Example of the file :

6901443290734
6901443290710
6901223312921

and the page is www.radiopopular.pt/pesquisa/, where I can get the product name and the price.

Maybe we can go from here?

[{"id":"3149f240.c0e25e","type":"inject","z":"ac14500e.2c57d","name":"Array of decimals","topic":"","payload":"[1.67,2.98,3.12,4.99,5.50]","payloadType":"json","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":120,"y":960,"wires":[["bd57baa6.00f998"]]},{"id":"bd57baa6.00f998","type":"split","z":"ac14500e.2c57d","name":"Split array","splt":"\\n","spltType":"str","arraySplt":"1","arraySpltType":"len","stream":false,"addname":"","x":200,"y":1020,"wires":[["7ab9e9ed.d514b8"]]},{"id":"7ab9e9ed.d514b8","type":"range","z":"ac14500e.2c57d","minin":"0","maxin":"10","minout":"0","maxout":"10","action":"scale","round":true,"property":"payload","name":"Round value","x":350,"y":1020,"wires":[["f26660ab.007b3"]]},{"id":"f26660ab.007b3","type":"join","z":"ac14500e.2c57d","name":"","mode":"auto","build":"string","property":"payload","propertyType":"msg","key":"topic","joiner":"\\n","joinerType":"str","accumulate":"false","timeout":"","count":"","reduceRight":false,"x":490,"y":1020,"wires":[["f9b5abac.f13828"]]},{"id":"f9b5abac.f13828","type":"debug","z":"ac14500e.2c57d","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","x":550,"y":1080,"wires":[]}]


Not working for me

Hi Lucas,

One easy way to do this is to simply:

Write a short script in your favorite scripting language (Python, PHP, bash, sh, etc.) and call it from an exec node (for example, php myscript.php or python3 myscript.py). Then process the results.
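A minimal sketch of such a script, written for Node.js here only because Node-RED already needs it installed. The script name `ean-fetch.js`, the base URL constant and the one-JSON-object-per-line output are assumptions for illustration; the global `fetch` needs Node.js 18 or newer. Reading the whole file and splitting it into lines also sidesteps the "where does the file end" worry.

```javascript
// ean-fetch.js (hypothetical name). Run as: node ean-fetch.js eans.txt
const fs = require("fs");

// Assumed base URL, taken from the page mentioned earlier in the thread.
const BASE_URL = "https://www.radiopopular.pt/pesquisa/";

// Split file contents into trimmed, non-empty lines (handles \n and \r\n).
function parseLines(text) {
    return text.split(/\r?\n/).map((l) => l.trim()).filter((l) => l.length > 0);
}

// Build one URL per line of the file.
function buildUrls(text) {
    return parseLines(text).map((ean) => BASE_URL + encodeURIComponent(ean));
}

// Fetch every URL in turn and print one JSON object per line on stdout.
async function main(path) {
    const urls = buildUrls(fs.readFileSync(path, "utf8"));
    for (const url of urls) {
        const res = await fetch(url);
        const body = await res.text();
        console.log(JSON.stringify({ url, status: res.status, length: body.length }));
    }
}

// Only run when started as a script with a file argument.
if (typeof module !== "undefined" && require.main === module && process.argv[2]) {
    main(process.argv[2]).catch((err) => {
        console.error(err.message);
        process.exit(1);
    });
}
```

An exec node configured with something like `node ean-fetch.js eans.txt` would then receive that output on stdout, and any further text processing can be added to the script before it prints.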

Generally speaking, it's tricky to advise further, because you have not yet said what you expect your wget calls to return (a single text or number? a basic string? a well-formatted JSON string? a messy web page source full of HTML, CSS and JavaScript?) or how you would like to process that returned data.

However, in general, if you have a lot of messy data to process from wget (for example, when screen-scraping a web page, as I did recently for a coronavirus stats app), you can do the text-processing "dirty work" in your script before returning the data; in my case the script sent it to MQTT and then on to Node-RED.

This is, obviously, a matter of taste and preference (since there are myriad ways to do this). Some people prefer to do it all in "standard or third-party nodes", others might prefer to do the text processing via an exec node, and others might prefer to process the text in a function node.

My preference is to choose the approach based on the text. If the data is very clean (like a well-formatted JSON response), then I often just use an http request node. However, if the data is super messy, I might do this on the metal with a Python or PHP script, using the exec node.

Everyone has their own favorites and their own environment (and experiences), so normally I tend to fall into a camp guided by a kind of loose "coding and solutions code of conduct", which goes something like this:

  • Explore welcoming and inclusive coding techniques
  • Be respectful of differing viewpoints, solutions and favorite technologies (including your own).

So, you can process text in myriad ways, for example, in the shell (using exec) or in a function node or your favorite built-in Node-RED or contrib node.

So Lucas, let me ask you, how do you normally process wget return data when you write code?

this should get you started

[{"id":"96cf7976.efb998","type":"file","z":"f8b2fc74.51bde8","name":"","filename":"/tmp/EAN.txt","appendNewline":true,"createDir":true,"overwriteFile":"false","encoding":"none","x":810,"y":460,"wires":[["fd7c5dbf.74111"]]},{"id":"9b737977.b367d8","type":"inject","z":"f8b2fc74.51bde8","name":"","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":140,"y":460,"wires":[["3ff7ff37.ceb29","e4927636.ce16a8","93619c97.c03998"]]},{"id":"3ff7ff37.ceb29","type":"change","z":"f8b2fc74.51bde8","name":"","rules":[{"t":"set","p":"payload","pt":"msg","to":"6901443290734","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":520,"y":460,"wires":[["96cf7976.efb998"]]},{"id":"26e81659.86f9a2","type":"change","z":"f8b2fc74.51bde8","name":"","rules":[{"t":"set","p":"payload","pt":"msg","to":"6901443290710","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":520,"y":520,"wires":[["96cf7976.efb998"]]},{"id":"e4927636.ce16a8","type":"delay","z":"f8b2fc74.51bde8","name":"","pauseType":"delay","timeout":"1","timeoutUnits":"seconds","rate":"1","nbRateUnits":"1","rateUnits":"second","randomFirst":"1","randomLast":"5","randomUnits":"seconds","drop":false,"x":320,"y":520,"wires":[["26e81659.86f9a2"]]},{"id":"c0041c08.951b5","type":"change","z":"f8b2fc74.51bde8","name":"","rules":[{"t":"set","p":"payload","pt":"msg","to":"6901223312921","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":520,"y":580,"wires":[["96cf7976.efb998"]]},{"id":"93619c97.c03998","type":"delay","z":"f8b2fc74.51bde8","name":"","pauseType":"delay","timeout":"2","timeoutUnits":"seconds","rate":"1","nbRateUnits":"1","rateUnits":"second","randomFirst":"1","randomLast":"5","randomUnits":"seconds","drop":false,"x":320,"y":580,"wires":[["c0041c08.951b5"]]},{"id":"d4deda7c.7a1ac","type":"file in","z":"f8b2fc74.51bde8","name":"","filename":"/tmp/EAN.txt","format":"utf8","chunk":false,"sendError":false,"encoding":"none","x":350,"y":660,"wires":[["d86991d9.03ff7"]]},{"id":"e57093bf.3f69d","type":"inject","z":"f8b2fc74.51bde8","name":"","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":140,"y":660,"wires":[["d4deda7c.7a1ac"]]},{"id":"d86991d9.03ff7","type":"debug","z":"f8b2fc74.51bde8","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","x":570,"y":660,"wires":[]},{"id":"fd7c5dbf.74111","type":"debug","z":"f8b2fc74.51bde8","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"true","targetType":"full","x":1050,"y":460,"wires":[]},{"id":"18d2e9c4.496c86","type":"comment","z":"f8b2fc74.51bde8","name":"create file EAN.txt","info":"","x":150,"y":420,"wires":[]},{"id":"51aeb14f.fa6028","type":"comment","z":"f8b2fc74.51bde8","name":"Read file EAN.txt","info":"","x":140,"y":620,"wires":[]},{"id":"b0e06491.256f68","type":"file in","z":"f8b2fc74.51bde8","name":"","filename":"/tmp/EAN.txt","format":"lines","chunk":false,"sendError":false,"encoding":"none","x":350,"y":780,"wires":[["6a33a116.e7076"]]},{"id":"e399c794.a1a56","type":"inject","z":"f8b2fc74.51bde8","name":"","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":140,"y":780,"wires":[["b0e06491.256f68"]]},{"id":"ab060979.5469a","type":"debug","z":"f8b2fc74.51bde8","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"true","targetType":"full","x":990,"y":780,"wires":[]},{"id":"3944be9d.2ecbe2","type":"comment","z":"f8b2fc74.51bde8","name":"fetch from website","info":"","x":150,"y":740,"wires":[]},{"id":"6a33a116.e7076","type":"http request","z":"f8b2fc74.51bde8","name":"","method":"GET","ret":"txt","paytoqs":false,"url":"https://www.radiopopular.pt/pesquisa/{{payload}}#","tls":"","persist":false,"proxy":"","authType":"","x":550,"y":780,"wires":[["ab060979.5469a"]]},{"id":"6ed95bee.cbf9d4","type":"html","z":"f8b2fc74.51bde8","name":"","property":"payload","outproperty":"payload","tag":"","ret":"html","as":"single","x":130,"y":880,"wires":[[]]},{"id":"a17e5de7.61748","type":"comment","z":"f8b2fc74.51bde8","name":"Scraping Proper Data is up to you","info":"","x":190,"y":840,"wires":[]}]

FYI, the data you're trying to get is messy! You will need to scrape it. This can be time-consuming, and if they change their page (i.e. they notice people scraping data), your code will fail.

Good luck. This is meant to point you in the right direction; it's not a complete solution.

Oh, and for the really tough ones I have a server running on Proxmox with https://ui.vision/. It's an easy way to navigate to the really tough locations: write a script and automate the clicks.

Best of Luck

I do have a server with Proxmox, but I had never heard of ui.vision. I'm going to try it, @meeki007. Tell me just one thing about your code above: I can't create a file named after msg.payload. For example:

msg.payload is ' 783236726732' and I want to create a (txt) file named after this msg.payload. If I use {{payload}} or {{msg.payload}} in the filename, it doesn't work. Do you have a better solution?

I believe that in @meeki007's example you will want to edit the http request node and get rid of the pound sign (#) at the end of the URL.

Why doesn't it work with files?

07/03/2020, 12:57:47node: ee355129.eac95
msg.payload : string[45]
"C:\Users\Utilizador\.node-red\{{msg.payload}}"
07/03/2020, 12:57:47node: ee355129.eac95
msg.payload : string[45]
"C:\Users\Utilizador\.node-red\{{msg.payload}}"
07/03/2020, 12:57:47node: ee355129.eac95
msg.payload : string[45]
"C:\Users\Utilizador\.node-red\{{msg.payload}}"
07/03/2020, 12:57:47node: ee355129.eac95
msg.payload : string[45]
"C:\Users\Utilizador\.node-red\{{msg.payload}}"
07/03/2020, 12:57:47node: ee355129.eac95
msg.payload : string[45]
"C:\Users\Utilizador\.node-red\{{msg.payload}}"
07/03/2020, 12:57:47node: ee355129.eac95
msg.payload : string[45]
"C:\Users\Utilizador\.node-red\{{msg.payload}}"

What does your flow that produces that output look like? Please export it.

Hi zenofmud,

[{"id":"d25240ac.1b669","type":"file in","z":"ef5a8c5d.8ab15","name":"","filename":"C:\Users\Utilizador\.node-red\eans1.txt","format":"lines","chunk":false,"sendError":false,"encoding":"none","x":430,"y":700,"wires":[["a964212b.46502"]]},{"id":"5a83b51f.b8f72c","type":"inject","z":"ef5a8c5d.8ab15","name":"","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":180,"y":700,"wires":[["d25240ac.1b669"]]},{"id":"ee355129.eac95","type":"debug","z":"ef5a8c5d.8ab15","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","x":1210,"y":700,"wires":},{"id":"a964212b.46502","type":"file","z":"ef5a8c5d.8ab15","name":"","filename":"C:\Users\Utilizador\.node-red\{{payload}}","appendNewline":true,"createDir":false,"overwriteFile":"false","encoding":"none","x":920,"y":700,"wires":[["ee355129.eac95"]]}]

Your flow is not importable. Please read this thread and edit your flow.

Try this: add a change node between the two file nodes with this JSONata expression:
"C:/Users/Utilizador/.node-red/"&payload
and in the `file-out` node remove the entry for the file name.
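For anyone who prefers a function node to JSONata, a minimal sketch of the same idea follows. It assumes the result is stored in msg.filename, which the file node uses when its own Filename field is left empty; the helper name `withFilename` is made up for the example, and the directory is simply the one from the expression above.

```javascript
// Directory taken from the JSONata expression above.
const BASE_DIR = "C:/Users/Utilizador/.node-red/";

// Hypothetical helper: derive the target file name from the payload.
function withFilename(msg) {
    // trim() removes the leading space and any trailing \r from the line.
    const name = String(msg.payload).trim();
    if (name === "") {
        return null; // skip empty lines instead of writing to the directory itself
    }
    msg.filename = BASE_DIR + name;
    return msg;
}

// Inside a function node the last line would be:
//   return withFilename(msg);
```

If the file should end in .txt, append the extension where msg.filename is built.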

I don't think you can use mustache syntax in the file node's filename field (the http request node's URL field does accept it).

@LucasSaraivaAzevedo

This flow will get the URLs:

[{"id":"c5cf143a.94e328","type":"inject","z":"5dc05a86.d84bcc","name":"","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":132,"y":216,"wires":[["a69c89c5.69982"]]},{"id":"a69c89c5.69982","type":"file in","z":"5dc05a86.d84bcc","name":"","filename":"/home/administrator/ean.txt","format":"lines","chunk":false,"sendError":false,"encoding":"none","x":336,"y":216,"wires":[["ff652c9e.2b8f38"]]},{"id":"4c6ad38c.58f374","type":"debug","z":"5dc05a86.d84bcc","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"true","targetType":"full","x":1010,"y":216,"wires":[]},{"id":"ff652c9e.2b8f38","type":"function","z":"5dc05a86.d84bcc","name":"","func":"m = msg.payload\nif(m!==\"\"){\n u = \"https://www.radiopopular.pt/pesquisa/\"\n return {url:u+m};\n}","outputs":1,"noerr":0,"x":530,"y":216,"wires":[["fecded6a.78f2c8"]]},{"id":"fecded6a.78f2c8","type":"delay","z":"5dc05a86.d84bcc","name":"","pauseType":"rate","timeout":"5","timeoutUnits":"seconds","rate":"1","nbRateUnits":"3","rateUnits":"second","randomFirst":"1","randomLast":"5","randomUnits":"seconds","drop":false,"x":680,"y":216,"wires":[["205af.149a8251e"]]},{"id":"205af.149a8251e","type":"http request","z":"5dc05a86.d84bcc","name":"","method":"GET","ret":"txt","paytoqs":false,"url":"","persist":false,"authType":"","x":862,"y":216,"wires":[["4c6ad38c.58f374"]]}]
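The function node in that flow is hard to read inside the exported JSON, so here is roughly the same logic written out, as a sketch. The wrapper name `toRequest` is made up for the example, and the trim() call is an addition that the flow itself does not have; it is there on the assumption that the file may use Windows line endings.

```javascript
// Same idea as the function node in the flow above: turn one line of the
// file into a message whose url property the http request node will use
// (its own URL field is left empty in that flow).
function toRequest(msg) {
    const ean = String(msg.payload).trim(); // added: strips spaces and \r
    if (ean === "") {
        return null; // empty line: send nothing on
    }
    return { url: "https://www.radiopopular.pt/pesquisa/" + ean };
}

// Inside the function node the last line would be:
//   return toRequest(msg);
```

Declaring the variables with const also avoids the undeclared m and u used in the exported version.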

But the page is loaded via a POST request, and the output is dynamically generated via an AJAX/XHR request.

The http request node/wget/curl will not help with this. You first have to reverse engineer the behaviour of the site, then you can put it into a flow.

This is the POST data, which should be posted via the http request node; obviously the "where" value should be the dynamic EAN number.
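A very rough sketch of how a function node could prepare such a POST for the http request node. Only the "where" field comes from the post above; the endpoint is a placeholder that has to be replaced with the real XHR URL from the browser's network tab, the form-encoded content type is a guess, and any other fields the site expects are not shown. The http request node's Method has to be set to "set by msg.method" and its URL left empty for msg.method and msg.url to take effect.

```javascript
// Placeholder only: the real endpoint is not given in this thread.
const SEARCH_ENDPOINT = "https://example.invalid/search";

// Hypothetical helper: build the POST message for one EAN.
function toPostRequest(msg) {
    const ean = String(msg.payload).trim();
    if (ean === "") {
        return null; // empty line: send nothing on
    }
    return {
        method: "POST",
        url: SEARCH_ENDPOINT,
        // Assumption: the site expects form-encoded data.
        headers: { "content-type": "application/x-www-form-urlencoded" },
        // "where" carries the EAN, as described above.
        payload: { where: ean }
    };
}

// Inside a function node the last line would be:
//   return toPostRequest(msg);
```

Wired between the file in node (one message per line) and the http request node, this would send one POST per EAN, ideally with a delay node in between to keep the request rate low.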

@unixneo - it is an XHR object, curl/wget will not help you. When you make the request via wget/curl you only get the main page, no xhr request forwarding. I mean, once you have the data, you are good to go, in either curl or node-red, same work for both.

That is correct.

This is an easy problem to solve if you move the wget/curl to the shell and do the processing in the shell with exec and return the results to Node-RED.

However, I do realize that folks want to do this in a Node-RED node so go for it and enjoy!

But to be clear, this is a trivial problem to solve if you do the work in the shell and use exec. On the other hand, I think @LucasSaraivaAzevedo is running on Windows, so I had better bug out of this discussion before I summon the sandworms of Arrakis upon me :)

Cool. Generally, I don't use curl/wget even in the shell; I only use Python, PHP, JavaScript, jQuery, etc. for these kinds of objects.

Honestly, I thought @LucasSaraivaAzevedo had requested a curl/wget approach, so I was replying to that.

I do not recall @LucasSaraivaAzevedo asking for a different approach, maybe I missed it?

In fact, his original question was:

@LucasSaraivaAzevedo

Sorry noob question here. I would like to know how can i read a txt file and for each line that the file have do a wget

So, that is why I am responding ... to the OP's original requirement :)

asking for a different approach, maybe I missed it?

I think he is not aware that he needs a different approach.

This is exactly what he asked for in his first post:

Sorry noob question here. I would like to know how can i `read a txt file` and for each line that the file have do a `wget`

I did not read where he defined what data he needed...

Did you?