Hi guys,
Sorry, noob question here. I would like to know how I can read a txt file and, for each line in the file, do a wget. (Basically the file will contain EANs of different products, and I want to check each one on a website.) Since I don't know where the file ends, how can I do this without an error?
Please post an example of the first 4 lines of the file.
Example:
http://lineone.com
https://line2.com
http://line3.com
https://ww1.line4.net
Better yet, just post a link to the file in question, if you are allowed to share that info.
The File In node can be configured to send one message per line of your file. From there you can create a flow that passes each message to an HTTP Request node to make the request you need.
How exactly you get the EAN from the line in the file into the url of the HTTP Request will depend on what format the line of the file is in.
If each line contains only the EAN, so you can use it directly in the url, then you can wire the File In node straight to the HTTP Request node and configure its url field with the required url, using {{payload}} as a placeholder for where the EAN should be inserted.
If each line needs some additional parsing first, then you'd need a Function node between them to do that parsing to get the EAN into the payload.
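As an illustration of that parsing step, here is a rough sketch of the logic. In a Function node it would be written in JavaScript; it is shown in Python here only to keep it short and testable. It assumes the EAN is a run of exactly 13 digits somewhere in the line, which may not match your file, and `extract_ean` is a made-up helper name.

```python
import re

# Assumption: an EAN is a run of exactly 13 digits that is not part of a longer number.
EAN13_PATTERN = re.compile(r"(?<!\d)\d{13}(?!\d)")


def extract_ean(line):
    """Return the first 13-digit run found in the line, or None if there is none."""
    match = EAN13_PATTERN.search(line)
    return match.group(0) if match else None
```

Lines without a match return `None`, so they can be skipped instead of producing a bad request.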
Hi guys, I tried your methods but they didn't work.
Example of the file:
6901443290734
6901443290710
6901223312921
and the page is www.radiopopular.pt/pesquisa/, where I can get the product name and the price.
Maybe we can go from here?
[{"id":"3149f240.c0e25e","type":"inject","z":"ac14500e.2c57d","name":"Array of decimals","topic":"","payload":"[1.67,2.98,3.12,4.99,5.50]","payloadType":"json","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":120,"y":960,"wires":[["bd57baa6.00f998"]]},{"id":"bd57baa6.00f998","type":"split","z":"ac14500e.2c57d","name":"Split array","splt":"\\n","spltType":"str","arraySplt":"1","arraySpltType":"len","stream":false,"addname":"","x":200,"y":1020,"wires":[["7ab9e9ed.d514b8"]]},{"id":"7ab9e9ed.d514b8","type":"range","z":"ac14500e.2c57d","minin":"0","maxin":"10","minout":"0","maxout":"10","action":"scale","round":true,"property":"payload","name":"Round value","x":350,"y":1020,"wires":[["f26660ab.007b3"]]},{"id":"f26660ab.007b3","type":"join","z":"ac14500e.2c57d","name":"","mode":"auto","build":"string","property":"payload","propertyType":"msg","key":"topic","joiner":"\\n","joinerType":"str","accumulate":"false","timeout":"","count":"","reduceRight":false,"x":490,"y":1020,"wires":[["f9b5abac.f13828"]]},{"id":"f9b5abac.f13828","type":"debug","z":"ac14500e.2c57d","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","x":550,"y":1080,"wires":[]}]
Not working for me
Hi Lucas,
One easy way to do this is to simply write a short script using your favorite scripting language (`python`, `php`, `bash`, `sh`, etc.) and call your script with an `exec` node (for example `php myscript.php` or `python3 myscript.py`). Then, process the results.
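To make that concrete, here is a minimal sketch of such a script in Python 3, using only the standard library. The URL pattern is the one mentioned in this thread; the function names and the tab-separated output are my own assumptions, and it only prints the length of each response rather than parsing anything.

```python
#!/usr/bin/env python3
"""Read EANs from a text file, one per line, and fetch a page for each."""
import sys
import urllib.request

# Base URL taken from the thread; appending the EAN directly to it is an assumption.
BASE_URL = "https://www.radiopopular.pt/pesquisa/"


def read_eans(path):
    """Yield each non-empty line of the file, stripped of surrounding whitespace."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:  # iteration ends cleanly at end of file
            ean = line.strip()
            if ean:
                yield ean


def build_url(ean):
    """Build the request URL for one EAN."""
    return BASE_URL + ean


def fetch(url, timeout=10):
    """Fetch the URL with a GET request and return the body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def main(path):
    for ean in read_eans(path):
        try:
            body = fetch(build_url(ean))
        except OSError as exc:  # URLError and timeouts are OSError subclasses
            print(f"{ean}\tERROR\t{exc}")
            continue
        print(f"{ean}\t{len(body)}")


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

From an `exec` node this could be called as `python3 myscript.py eans.txt` (file name hypothetical). Looping over the file object stops on its own at the last line, so there is no need to know the length of the file in advance.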
Generally speaking, it's tricky to earnestly advise further because you have not yet identified what you expect to get back from your `wget` calls (a single text or number? a basic string? a well-formatted JSON string? a messy web page source full of HTML, CSS and JavaScript?) and how you would like to process that return data.
However, in general again, if you have a lot of messy data to process from `wget` (for example, you might be screen-scraping a web page), you can do what I did recently for a coronavirus stats app: I did a lot of the text processing "dirty work" in my script before returning the data to, in this example case, MQTT and then on to Node-RED.
This is, obviously, a matter of taste and preference (since there are myriad ways to do this). Some people prefer to do it all in "standard or third-party nodes", while others might prefer to do the text processing via an `exec` node, and others might prefer to process the text in a `function` node.
My preference is to choose the approach based on the text. If the data is very clean (like a well-formatted JSON return), then I often just use an `http request` node. However, if the data is super messy, I might do this on the metal with a `python` or `php` script, using the `exec` node.
Everyone has their own favorites and their own environment (and experiences), and so normally I tend to fall into a camp guided by a kind of loose "coding and solutions code of conduct", which goes something like this:
- Explore welcoming and inclusive coding techniques
- Be respectful of differing viewpoints, solutions and favorite technologies (including your own).
So, you can process text in myriad ways, for example, in the shell (using `exec`), in a `function` node, or in your favorite built-in Node-RED or contrib node.
So Lucas, let me ask you, how do you normally process wget return data when you write code?
This should get you started:
[{"id":"96cf7976.efb998","type":"file","z":"f8b2fc74.51bde8","name":"","filename":"/tmp/EAN.txt","appendNewline":true,"createDir":true,"overwriteFile":"false","encoding":"none","x":810,"y":460,"wires":[["fd7c5dbf.74111"]]},{"id":"9b737977.b367d8","type":"inject","z":"f8b2fc74.51bde8","name":"","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":140,"y":460,"wires":[["3ff7ff37.ceb29","e4927636.ce16a8","93619c97.c03998"]]},{"id":"3ff7ff37.ceb29","type":"change","z":"f8b2fc74.51bde8","name":"","rules":[{"t":"set","p":"payload","pt":"msg","to":"6901443290734","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":520,"y":460,"wires":[["96cf7976.efb998"]]},{"id":"26e81659.86f9a2","type":"change","z":"f8b2fc74.51bde8","name":"","rules":[{"t":"set","p":"payload","pt":"msg","to":"6901443290710","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":520,"y":520,"wires":[["96cf7976.efb998"]]},{"id":"e4927636.ce16a8","type":"delay","z":"f8b2fc74.51bde8","name":"","pauseType":"delay","timeout":"1","timeoutUnits":"seconds","rate":"1","nbRateUnits":"1","rateUnits":"second","randomFirst":"1","randomLast":"5","randomUnits":"seconds","drop":false,"x":320,"y":520,"wires":[["26e81659.86f9a2"]]},{"id":"c0041c08.951b5","type":"change","z":"f8b2fc74.51bde8","name":"","rules":[{"t":"set","p":"payload","pt":"msg","to":"6901223312921","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":520,"y":580,"wires":[["96cf7976.efb998"]]},{"id":"93619c97.c03998","type":"delay","z":"f8b2fc74.51bde8","name":"","pauseType":"delay","timeout":"2","timeoutUnits":"seconds","rate":"1","nbRateUnits":"1","rateUnits":"second","randomFirst":"1","randomLast":"5","randomUnits":"seconds","drop":false,"x":320,"y":580,"wires":[["c0041c08.951b5"]]},{"id":"d4deda7c.7a1ac","type":"file in","z":"f8b2fc74.51bde8","name":"","filename":"/tmp/EAN.txt","format":"utf8","chunk":false,"sendError":false,"encoding":"none","x":350,"y":660,"wires":[["d86991d9.03ff7"]]},{"id":"e57093bf.3f69d","type":"inject","z":"f8b2fc74.51bde8","name":"","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":140,"y":660,"wires":[["d4deda7c.7a1ac"]]},{"id":"d86991d9.03ff7","type":"debug","z":"f8b2fc74.51bde8","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","x":570,"y":660,"wires":[]},{"id":"fd7c5dbf.74111","type":"debug","z":"f8b2fc74.51bde8","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"true","targetType":"full","x":1050,"y":460,"wires":[]},{"id":"18d2e9c4.496c86","type":"comment","z":"f8b2fc74.51bde8","name":"create file EAN.txt","info":"","x":150,"y":420,"wires":[]},{"id":"51aeb14f.fa6028","type":"comment","z":"f8b2fc74.51bde8","name":"Read file EAN.txt","info":"","x":140,"y":620,"wires":[]},{"id":"b0e06491.256f68","type":"file in","z":"f8b2fc74.51bde8","name":"","filename":"/tmp/EAN.txt","format":"lines","chunk":false,"sendError":false,"encoding":"none","x":350,"y":780,"wires":[["6a33a116.e7076"]]},{"id":"e399c794.a1a56","type":"inject","z":"f8b2fc74.51bde8","name":"","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":140,"y":780,"wires":[["b0e06491.256f68"]]},{"id":"ab060979.5469a","type":"debug","z":"f8b2fc74.51bde8","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"true","targetType":"full","x":990,"y":780,"wires":[]},{"id":"3944be9d.2ecbe2","type":"comment","z":"f8b2fc74.51bde8","name":"fetch from website","info":"","x":150,"y":740,"wires":[]},{"id":"6a33a116.e7076","type":"http request","z":"f8b2fc74.51bde8","name":"","method":"GET","ret":"txt","paytoqs":false,"url":"https://www.radiopopular.pt/pesquisa/{{payload}}#","tls":"","persist":false,"proxy":"","authType":"","x":550,"y":780,"wires":[["ab060979.5469a"]]},{"id":"6ed95bee.cbf9d4","type":"html","z":"f8b2fc74.51bde8","name":"","property":"payload","outproperty":"payload","tag":"","ret":"html","as":"single","x":130,"y":880,"wires":[[]]},{"id":"a17e5de7.61748","type":"comment","z":"f8b2fc74.51bde8","name":"Scraping Proper Data is up to you","info":"","x":190,"y":840,"wires":[]}]
FYI, the data you're trying to get is messy! You will need to scrape the data. This can be time consuming, and if they change their page (i.e. they notice people scraping data) then your code will fail.
Good luck. This is meant to point you in the right direction; it's not a complete solution.
Oh, and for the really tough ones I have a server running on Proxmox with https://ui.vision/. It's an easy way to navigate to the really tough locations: write a script and automate the clicks.
Best of Luck
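For the scraping part, a rough sketch of the idea using Python's standard `html.parser`. The markup and the class names `name` and `price` are invented for illustration; the real page will look different and has to be inspected first.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are ignored when tracking nesting.
VOID_TAGS = frozenset({"area", "base", "br", "col", "embed", "hr", "img",
                       "input", "link", "meta", "source", "track", "wbr"})


class ClassTextExtractor(HTMLParser):
    """Collect the text inside every element that carries one of the wanted classes."""

    def __init__(self, wanted_classes):
        super().__init__()
        self.wanted = set(wanted_classes)
        self.current = None  # class currently being captured, if any
        self.tag = None      # tag name of the element being captured
        self.depth = 0       # nesting depth of same-named tags inside it
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.current is not None:
            if tag == self.tag:
                self.depth += 1
            return
        for name in (dict(attrs).get("class") or "").split():
            if name in self.wanted:
                self.current, self.tag, self.depth = name, tag, 1
                self.found.setdefault(name, []).append("")
                break

    def handle_endtag(self, tag):
        if self.current is not None and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.found[self.current][-1] += data


def extract(html, wanted_classes):
    """Return {class name: [text, ...]} for the wanted classes found in the HTML."""
    parser = ClassTextExtractor(wanted_classes)
    parser.feed(html)
    parser.close()
    return {name: [text.strip() for text in texts]
            for name, texts in parser.found.items()}


# Hypothetical markup, not copied from any real page.
SAMPLE = ('<div class="product"><span class="name">Example Phone</span>'
          '<span class="price"><b>199</b>,99</span></div>')
```

Calling `extract(SAMPLE, ["name", "price"])` returns the text of both spans. Any change to the site's class names breaks this, which is exactly the fragility mentioned above.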
I do have a server with Proxmox, but I had never heard of ui.vision. I'm going to try it, @meeki007. Tell me just one thing about your code above: I can't make a file with the msg.payload. For example, msg.payload is ' 783236726732' and I want to create a file (txt) with this msg.payload. If I use {{payload}} or {{msg.payload}}, it doesn't work. Do you have a better solution?
I believe in @meeki007's example you will want to edit the `http request` node and get rid of the pound sign (#) at the end of the url.
Why doesn't it work with files?
07/03/2020, 12:57:47node: ee355129.eac95
msg.payload : string[45]
"C:\Users\Utilizador\.node-red\{{msg.payload}}"
07/03/2020, 12:57:47node: ee355129.eac95
msg.payload : string[45]
"C:\Users\Utilizador\.node-red\{{msg.payload}}"
07/03/2020, 12:57:47node: ee355129.eac95
msg.payload : string[45]
"C:\Users\Utilizador\.node-red\{{msg.payload}}"
07/03/2020, 12:57:47node: ee355129.eac95
msg.payload : string[45]
"C:\Users\Utilizador\.node-red\{{msg.payload}}"
07/03/2020, 12:57:47node: ee355129.eac95
msg.payload : string[45]
"C:\Users\Utilizador\.node-red\{{msg.payload}}"
07/03/2020, 12:57:47node: ee355129.eac95
msg.payload : string[45]
"C:\Users\Utilizador\.node-red\{{msg.payload}}"
What does your flow that produces that output look like? Please export it.
Hi zenofmud,
[{"id":"d25240ac.1b669","type":"file in","z":"ef5a8c5d.8ab15","name":"","filename":"C:\Users\Utilizador\.node-red\eans1.txt","format":"lines","chunk":false,"sendError":false,"encoding":"none","x":430,"y":700,"wires":[["a964212b.46502"]]},{"id":"5a83b51f.b8f72c","type":"inject","z":"ef5a8c5d.8ab15","name":"","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":180,"y":700,"wires":[["d25240ac.1b669"]]},{"id":"ee355129.eac95","type":"debug","z":"ef5a8c5d.8ab15","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","x":1210,"y":700,"wires":},{"id":"a964212b.46502","type":"file","z":"ef5a8c5d.8ab15","name":"","filename":"C:\Users\Utilizador\.node-red\{{payload}}","appendNewline":true,"createDir":false,"overwriteFile":"false","encoding":"none","x":920,"y":700,"wires":[["ee355129.eac95"]]}]
Your flow is not importable. Please read this thread and edit your flow.
Try this: add a change node between the two file nodes that sets msg.filename using this JSONata expression:
"C:/Users/Utilizador/.node-red/"&payload
and in the `file-out` node, remove the entry for the file name.
I wasn't sure you can use mustache syntax in the http request node, so this flow builds the url in a function node instead:
[{"id":"c5cf143a.94e328","type":"inject","z":"5dc05a86.d84bcc","name":"","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":132,"y":216,"wires":[["a69c89c5.69982"]]},{"id":"a69c89c5.69982","type":"file in","z":"5dc05a86.d84bcc","name":"","filename":"/home/administrator/ean.txt","format":"lines","chunk":false,"sendError":false,"encoding":"none","x":336,"y":216,"wires":[["ff652c9e.2b8f38"]]},{"id":"4c6ad38c.58f374","type":"debug","z":"5dc05a86.d84bcc","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"true","targetType":"full","x":1010,"y":216,"wires":[]},{"id":"ff652c9e.2b8f38","type":"function","z":"5dc05a86.d84bcc","name":"","func":"m = msg.payload\nif(m!==\"\"){\n u = \"https://www.radiopopular.pt/pesquisa/\"\n return {url:u+m};\n}","outputs":1,"noerr":0,"x":530,"y":216,"wires":[["fecded6a.78f2c8"]]},{"id":"fecded6a.78f2c8","type":"delay","z":"5dc05a86.d84bcc","name":"","pauseType":"rate","timeout":"5","timeoutUnits":"seconds","rate":"1","nbRateUnits":"3","rateUnits":"second","randomFirst":"1","randomLast":"5","randomUnits":"seconds","drop":false,"x":680,"y":216,"wires":[["205af.149a8251e"]]},{"id":"205af.149a8251e","type":"http request","z":"5dc05a86.d84bcc","name":"","method":"GET","ret":"txt","paytoqs":false,"url":"","persist":false,"authType":"","x":862,"y":216,"wires":[["4c6ad38c.58f374"]]}]
But the page is loaded via a POST request, and the output is dynamically generated via an ajax/XHR request.
On their own, the http request node/wget/curl will not help with this. You first have to reverse engineer the behaviour of the site; then you can put it into a flow.
This is the post data, which should be posted via the http request node. Obviously the "where" should be the dynamic EAN number.
@unixneo: it is an XHR object, so curl/wget will not help you. When you make the request via wget/curl you only get the main page, with no XHR request forwarding. I mean, once you have the data, you are good to go in either curl or Node-RED; the work is the same for both.
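A hedged sketch of what building such a POST could look like in Python. Only the `where` field comes from this thread; the endpoint below is a placeholder, and any other form fields or headers the site expects have to be taken from the browser's network tab. The sketch only builds the request and does not send it.

```python
import urllib.parse
import urllib.request

# Placeholder only: the real XHR endpoint is not given in the thread.
SEARCH_ENDPOINT = "https://example.invalid/search"


def build_search_request(ean, endpoint=SEARCH_ENDPOINT):
    """Return a POST request whose form body carries the EAN in the "where" field."""
    body = urllib.parse.urlencode({"where": ean}).encode("ascii")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. In a flow, the equivalent is an http request node set to POST with the same form data in msg.payload.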
That is correct.
This is an easy problem to solve if you move the `wget`/`curl` to the shell, do the processing in the shell with `exec`, and return the results to Node-RED.
However, I do realize that folks want to do this in a Node-RED node, so go for it and enjoy!
But to be clear, this is a trivial problem to solve if you do the work in the shell and use `exec`; but on the other hand, I think @LucasSaraivaAzevedo is running on Windows, so I better bug out of this discussion before I summon the sandworms of Arrakis upon me.
Cool. Generally, I don't use `curl`/`wget` even in the shell, and only use `python`, `php`, JavaScript, jQuery, etc. for these kinds of objects.
Honestly, I thought @LucasSaraivaAzevedo requested a `curl`/`wget` approach, so I was replying to that.
I do not recall @LucasSaraivaAzevedo asking for a different approach, maybe I missed it?
In fact, his original question was:
Sorry noob question here. I would like to know how can i `read a txt file` and for each line that the file have do a `wget`
So, that is why I am responding: to the OP's original requirement.
asking for a different approach, maybe I missed it?
I think he is not aware that he needs a different approach.
This is exactly what he asked for in his first post:
Sorry noob question here. I would like to know how can i `read a txt file` and for each line that the file have do a `wget`
I did not read where he defined what data he needed...
Did you?