Problems retrieving info from a website (I'm just a rookie)


#1

Hello everyone!
I'm completely new to this, but I am very much in need of help.
I'm currently working on a project where I retrieve the sunrise and sunset times from a website (https://soltider.dk/loc/6100-haderslev-danmark#) so that I can transfer them to an Arduino and use them to turn lights on and off.
I am currently using 4 nodes, as I saw in a guide (https://cookbook.nodered.org/http/simple-get-request); however, being very green on the topic, I'm utterly unsure what to do.

I've managed to get the third node to output a time by copying the Outer HTML (<span id="sunset">16:07</span>) into it and printing it. Unfortunately, I need this time to update every day, since the sun rises and sets at different times, and I don't know how to do that.

Please help!


#2

There are several different nodes that will trigger a message at sunrise and sunset (based on the lat/long that you supply).
Try searching on http://flows.nodered.org


#3

You can make your flow run each day by configuring the Inject node to repeat once a day at the time you want. You probably also want to configure it to run at startup.

However, I suggest you search the flows site flows.nodered.org for sunrise in case you find something that would save you effort. There are several that look potentially useful; this one might work, for example: https://flows.nodered.org/node/node-red-node-suncalc


#4

It is honestly kind of handy to simply get a node to do it, thank you! But what I'm hoping for instead is to learn how to get the HTML node to find a specific part of an HTML page and print it again every day.

With the current setup I could have the Inject node simply send its signal every 24 hours, but the problem I face at the moment is that it'll print the 16:10 that I've put into the HTML node every 24 hours. Is there a way I can get the HTML request, or another function, to print the SPECIFIC part of the website that states sunrise into the debug or the HTML node?

Maybe get a Function node to search for the <span id="sunset">16:07</span> bit and have it print the number within?


#5

I am obviously missing something; I thought that is what you said you had already achieved. Can you explain again please?


#6

There is a core node called html that can pull strings out of a (static) html page -- according to its info:

Extracts elements from an html document held in msg.payload using a CSS selector.

Inputs

payload (string): the html string from which to extract elements.

select (string): if not configured in the edit panel the selector can be set as a property of msg.

Output

payload (array | string): the result can be either a single message with a payload containing an array of the matched elements, or multiple messages that each contain a matched element. If multiple messages are sent they will also have parts set.

Details

This node supports a combination of CSS and jQuery selectors. See the css-select documentation for more information on the supported syntax.

However, looking at the page, it seems to use a jQuery ajax call to get the actual data to fill into the empty elements on the page -- so this may not work for you.


#7

Unfortunately, what I did was go to the website, inspect its elements, find the time and copy the Outer HTML. That is how I get it once, but then it becomes static in the HTML node. I'd have to change it manually every day, whereas I hope to have it done automatically. I am having very little success, unfortunately.


#8

Ouf, that's not good. I'm unsure what else could be used. I'm not great with functions, but perhaps something will have to be done there?
I know I can use a .find function, but that would require that I have my computer download the website every day, for the function to then find it in the website and then send it on to the Arduino. A weekly or monthly purge would solve that problem, however.


#9

Another possible way could be getting Node Red to somehow follow the HTML URL (unsure what those are called.)
"html>body>div#container<div#mapcontainer>div#info>div#infotable>div.infotd>span#sunset" and get it to read the number in the sunset part? I'm completely blank in how to do it, however.

Thanks for all the responses thus far by the way, it's greatly appreciated.


#10

can you post your flow? From the image, I’m confused as to why it is giving the same value every day.


#11

Sure, but unfortunately I can only send one picture per post, so it'll crop it up a bit.


#12

And here.


I've tried a few different things here. Leaving it blank doesn't help (obviously), not having the time in returns an "empty" result, and I'm unsure what to write to ask it to spit out the sunset part.
I have tried putting in the 'path' on the inspector ( html>body>div#container<div#mapcontainer>div#info>div#infotable>div.infotd>span#sunset ) and this was to no avail either, even if I remove the > parts.


#13

You can export your flow by following the instructions here:
https://randomnerdtutorials.com/exporting-and-backing-up-your-node-red-nodes/


#14

[{"id":"48a16303.a5d7bc","type":"tab","label":"Flow 4","disabled":false,"info":""},{"id":"57d19d4a.deac34","type":"html","z":"48a16303.a5d7bc","name":"Time","property":"payload","outproperty":"payload","tag":"div.today_nowcard-temp","ret":"html","as":"single","x":550,"y":400,"wires":[["c0ac60b7.785f2"]]},{"id":"c0ac60b7.785f2","type":"debug","z":"48a16303.a5d7bc","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload","x":690,"y":400,"wires":[]},{"id":"65c6f1ba.2859b8","type":"inject","z":"48a16303.a5d7bc","name":"","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":200,"y":400,"wires":[["26d88a80.ef4fb6"]]},{"id":"d150aa16.de2f","type":"inject","z":"48a16303.a5d7bc","name":"","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":200,"y":480,"wires":[["28b4f89.d765f88"]]},{"id":"28b4f89.d765f88","type":"http request","z":"48a16303.a5d7bc","name":"Solopgang","method":"GET","ret":"txt","url":"https://soltider.dk/loc/haderslev-6100-haderslev-danmark","tls":"","x":370,"y":480,"wires":[["fc2cc7bc.f670b"]]},{"id":"fc2cc7bc.f670b","type":"html","z":"48a16303.a5d7bc","name":"Tid","property":"payload","outproperty":"payload","tag":"<span id=\"sunrise\">8:03","ret":"html","as":"single","x":530,"y":480,"wires":[["2bdce107.2508be"]]},{"id":"2bdce107.2508be","type":"debug","z":"48a16303.a5d7bc","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","x":690,"y":480,"wires":[]},{"id":"26d88a80.ef4fb6","type":"http request","z":"48a16303.a5d7bc","name":"Sunset","method":"GET","ret":"txt","url":"https://soltider.dk/loc/6100-haderslev-danmark#","tls":"","x":400,"y":400,"wires":[["57d19d4a.deac34"]]}]


#15

please read this post and then edit your flow


#16

I don't think you are understanding how CSS selectors work -- and from the description on the linked CSS Selectors page, frankly I'm not able to come up with a definitive answer, either!

However, having worked with css/html/jquery in the past, I would try something like this:
span[id="sunset"] or even #sunset...

That selector should locate a single span element with that id (and there should only be one within the html page, but that's not always true). Set the "Output" option to return "only the text content of the elements", and you should get the time string returned in your msg.payload.
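To make the idea concrete, here is a rough JavaScript sketch of what "find the span with id sunset and return only its text" amounts to. It uses a plain regular expression as a stand-in, so it is not how the html node works internally and it is not a real CSS selector engine. The function name extractSpanText and both sample strings are made up for illustration. The second sample assumes the server delivers the span empty, as suggested earlier in the thread.

```javascript
// Sketch only: a regex stand-in for the "#sunset" selector with text-only output.
// Assumes a simple <span id="..."> with no nested tags and an id that contains
// no regex special characters.
function extractSpanText(html, id) {
    // Match <span ... id="ID" ...>TEXT</span> and capture TEXT.
    const pattern = new RegExp('<span[^>]*\\bid="' + id + '"[^>]*>([^<]*)</span>', 'i');
    const match = pattern.exec(html);
    return match ? match[1].trim() : null;
}

// Hypothetical samples: what the browser inspector shows after scripts have run,
// versus what a plain HTTP GET might return before any script fills in the value.
const renderedHtml = '<div class="infotd"><span id="sunset">16:07</span></div>';
const rawHtml = '<div class="infotd"><span id="sunset"></span></div>';

console.log(extractSpanText(renderedHtml, 'sunset')); // 16:07
console.log(extractSpanText(rawHtml, 'sunset'));      // empty string: span found, no text
console.log(extractSpanText(rawHtml, 'sunrise'));     // null: no such span
```

Inside a Function node the same helper could be applied with msg.payload = extractSpanText(msg.payload, 'sunset'); return msg; but an empty result would then point at the page content rather than at the selector.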


#17

If you take a closer look at the payload returned by the site, you can see that there isn't any value between the span tags:
<span id="sunset"></span>
So there is nothing to extract.
There is JSON in the payload containing the information. If you're interested in how to extract elements, here is an example of how to do it with a regex:

[
    {
        "id": "77d66f26.fc3e5",
        "type": "inject",
        "z": "4a74382c.bbd2d8",
        "name": "",
        "topic": "",
        "payload": "",
        "payloadType": "date",
        "repeat": "",
        "crontab": "",
        "once": true,
        "onceDelay": "2",
        "x": 150,
        "y": 280,
        "wires": [
            [
                "63688402.1085ec"
            ]
        ]
    },
    {
        "id": "45f4b138.b57d7",
        "type": "http request",
        "z": "4a74382c.bbd2d8",
        "name": "",
        "method": "GET",
        "ret": "txt",
        "url": "https://soltider.dk/loc/6100-haderslev-danmark#",
        "tls": "",
        "x": 510,
        "y": 280,
        "wires": [
            [
                "ceee31fd.9de24",
                "3497b59c.3513ca"
            ]
        ]
    },
    {
        "id": "2dafc463.2fd38c",
        "type": "debug",
        "z": "4a74382c.bbd2d8",
        "name": "",
        "active": true,
        "tosidebar": true,
        "console": false,
        "tostatus": false,
        "complete": "true",
        "x": 950,
        "y": 280,
        "wires": []
    },
    {
        "id": "63688402.1085ec",
        "type": "change",
        "z": "4a74382c.bbd2d8",
        "name": "",
        "rules": [
            {
                "t": "set",
                "p": "headers",
                "pt": "msg",
                "to": "{\"User-Agent\":\"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0\"}",
                "tot": "json"
            }
        ],
        "action": "",
        "property": "",
        "from": "",
        "to": "",
        "reg": false,
        "x": 330,
        "y": 280,
        "wires": [
            [
                "45f4b138.b57d7"
            ]
        ]
    },
    {
        "id": "ceee31fd.9de24",
        "type": "change",
        "z": "4a74382c.bbd2d8",
        "name": "",
        "rules": [
            {
                "t": "set",
                "p": "payload",
                "pt": "msg",
                "to": "$replace(msg.payload, /([\\s\\S]+?<pre>\\s)([\\s\\S]+?)(<\\/pre>[\\s\\S]+)/, \"$2\")\t\t",
                "tot": "jsonata"
            }
        ],
        "action": "",
        "property": "",
        "from": "",
        "to": "",
        "reg": false,
        "x": 680,
        "y": 280,
        "wires": [
            [
                "dd68cea1.b403e"
            ]
        ]
    },
    {
        "id": "dd68cea1.b403e",
        "type": "json",
        "z": "4a74382c.bbd2d8",
        "name": "",
        "property": "payload",
        "action": "",
        "pretty": false,
        "x": 830,
        "y": 280,
        "wires": [
            [
                "2dafc463.2fd38c"
            ]
        ]
    }
]
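For readers who find the JSONata expression in the Change node hard to follow, the same step can be sketched in plain JavaScript: take the text between <pre> and </pre> and parse it as JSON. This is only an approximation of the flow above. The function name extractPreJson, the sample page, and the field names sunrise and sunset are assumptions for illustration; the real payload may be shaped differently.

```javascript
// Sketch: pull the text between <pre> and </pre> and parse it as JSON.
// Returns null when no <pre> block is present; JSON.parse throws if the
// captured text is not valid JSON.
function extractPreJson(html) {
    const match = /<pre>\s*([\s\S]+?)\s*<\/pre>/.exec(html);
    if (!match) {
        return null;
    }
    return JSON.parse(match[1]);
}

// Hypothetical sample page; the field names are invented for this example.
const samplePage =
    '<html><body><span id="sunset"></span>' +
    '<pre>\n{"sunrise":"8:03","sunset":"16:07"}\n</pre></body></html>';

const data = extractPreJson(samplePage);
console.log(data.sunset); // 16:07
```

In a Function node this would replace the Change and JSON nodes, for example msg.payload = extractPreJson(msg.payload); return msg;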

Or you can just go the easy way and get the data from their API:
https://soltider.dk/api?lat=55.663426&lng=12.542953
😀
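As a rough sketch of the API route outside Node-RED, the request could look like this in JavaScript. It assumes Node.js 18 or later (for the built-in fetch) and that the endpoint answers with JSON. The thread does not show the response format, so the sketch returns the parsed object as-is without guessing field names. The function names buildSoltiderUrl and fetchSunTimes are made up for this example.

```javascript
// Sketch, assuming Node.js 18+ (global fetch) and a JSON response.
// Builds the API URL from a latitude and longitude.
function buildSoltiderUrl(lat, lng) {
    const url = new URL('https://soltider.dk/api');
    url.searchParams.set('lat', String(lat));
    url.searchParams.set('lng', String(lng));
    return url.toString();
}

// Fetches and parses the response. The shape of the returned object is not
// documented in this thread, so inspect it before relying on any field.
async function fetchSunTimes(lat, lng) {
    const response = await fetch(buildSoltiderUrl(lat, lng));
    if (!response.ok) {
        throw new Error('Request failed with status ' + response.status);
    }
    return response.json();
}

console.log(buildSoltiderUrl(55.663426, 12.542953));
// https://soltider.dk/api?lat=55.663426&lng=12.542953
```

Within Node-RED, pointing the http request node at the same URL and setting its return type to a parsed JSON object should give a comparable result, with no html node involved.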


#18

Hello ladies and gentlemen!
Terribly sorry for the hiatus on this thread. I didn't make it in time and so took a small break from it before starting again. Having tried #sunset, even span#sunset, #span#sunset, span[id="sunset"], span#[id="sunset"], I can say that I've tried a fair bit and nothing is working.

I've asked one of my teachers, who can hopefully help me, and a classmate who has worked with a different website using the exact same method is having luck where I have none. Having copied his method, I'm still getting no cooperation, which is odd to say the least. He uses this website (https://weather.com/weather/today/l/55.40,10.40?par=google), and uses div.today_nowcard-temp to locate the value. I've done the exact same with mine, and still no response.

Ironically, we've just begun working with JSON in class, with an assignment where I basically have to do what I'm doing here, so I'll probably try it that way. I'm determined to finish it this way now that I have more time, just so I know what I'm doing wrong.

Thank you all for the help, and any continued assistance is very much appreciated.


#19

One of the things I do when looking for data on a website is to go to the site and then with the developer tools of the browser, view the source of the page. I then copy that source to a text file so I can examine it to see where exactly the data I want is.

If you do that for the site you are referring to, you may find some interesting things that may help you and explain the results you have already seen (hint hint).


#20

Hey there Zeno!

I found my solution - Just go to a website where using the CSS selector worked. I haven't tried what you suggested, as in class my friend suggested I try the website they were using - And there it instantly worked. I now have the data, and I have it update with 12 hour intervals.

The problem is now solved, and this post can be considered ended. Thank you all for the help!