Trying to scrape a website that contains only one table

Hello there, I'm trying to scrape a very basic site (it contains nothing but a table):

https://www.met.hu/idojaras/veszelyjelzes/hover.php?id=wbhx&kod=18

I'm using this node: https://flows.nodered.org/node/node-red-contrib-scrape-it

This is my flow:

[{"id":"ca5f692a.2854f8","type":"http request","z":"492c08b5.43dc68","name":"","method":"GET","ret":"txt","paytoqs":"ignore","url":"https://www.met.hu/idojaras/veszelyjelzes/hover.php?id=wbhx&kod=14","tls":"","persist":false,"proxy":"","authType":"","x":490,"y":1120,"wires":[["c85c33ed.77265"]]},{"id":"89e2d6f9.a1ee78","type":"inject","z":"492c08b5.43dc68","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"","payloadType":"date","x":260,"y":1120,"wires":[["ca5f692a.2854f8"]]},{"id":"93928a08.f8b2f8","type":"debug","z":"492c08b5.43dc68","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":1170,"y":1140,"wires":[]},{"id":"c85c33ed.77265","type":"scrape","z":"492c08b5.43dc68","name":"","mapping":"{\"data\":{\"listItem\":\"table tr td\"}}","x":790,"y":1220,"wires":[["93928a08.f8b2f8"]]}]

My goal is output similar to this:

{
    "Zivatar": [
        {
            "link1": "/images/warningb/ts1.gif"
        },
        {
            "link2": "/images/warningb/w1.gif"
        }
    ],
    "Felhőszakadás": [
        {
            "link1": "/images/warningb/ts1.gif"
        },
        {
            "link2": "/images/warningb/w1.gif"
        }
    ]
}

So the object name comes from the third column, and the first and second columns are the two values in the object. Each row in the table becomes a new object. Is it possible to do this somehow? I only have very basic programming knowledge...
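A minimal JavaScript sketch of that mapping (JavaScript because that is what a Node-RED Function node runs), assuming the cell values have already been reduced to plain strings, one `[column1, column2, column3]` array per table row. `rowsToWarnings` is a hypothetical helper name, not part of any node:

```javascript
// Hypothetical helper: turn table rows into the object shape described above.
// Assumption: each row is [firstColumn, secondColumn, thirdColumn], already plain strings,
// and the third column (the warning name) is used as the key.
function rowsToWarnings(rows) {
  const result = {};
  for (const [link1, link2, name] of rows) {
    // Each row becomes one key holding two single-property objects,
    // mirroring the requested output: [{ link1: ... }, { link2: ... }]
    result[name] = [{ link1 }, { link2 }];
  }
  return result;
}

const rows = [
  ["/images/warningb/ts1.gif", "/images/warningb/w1.gif", "Zivatar"],
  ["/images/warningb/tr1.gif", "/images/warningb/t1.gif", "Felhőszakadás"],
];
console.log(JSON.stringify(rowsToWarnings(rows), null, 4));
```

The array-of-single-key-objects shape just mirrors the requested output; a flatter `{ link1, link2 }` per key would carry the same data and may be simpler to use downstream.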

Why? As it is simple HTML, just use the built-in nodes: an HTTP Request node, plus the HTML node to select the items of interest.
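With the HTML node, the cells still contain markup. A sketch of reducing each cell to a plain value, assuming the node is configured as in the flows in this thread (`table tr td`, return `html`, as a single message), so `msg.payload` is an array of inner-HTML strings; the exact attribute quoting in those strings may differ from the example below. `cellValue` is a hypothetical helper, and its regex only targets this simple, fixed markup, not HTML in general:

```javascript
// Hypothetical helper: reduce one string from the HTML node's output array to a plain value.
// Assumption: strings look like '<img src="/images/warningb/ts1.gif" border="0">' or 'Zivatar'.
function cellValue(cellHtml) {
  // If the cell holds an image, keep only its src attribute (double, single or no quotes).
  const match = /<img[^>]*\ssrc=["']?([^"'\s>]+)/i.exec(cellHtml);
  // Otherwise treat the cell as plain text and trim surrounding whitespace.
  return match ? match[1] : cellHtml.trim();
}

// Inside a Function node this would be:
//   msg.payload = msg.payload.map(cellValue);
//   return msg;
const payload = [
  '<img src="/images/warningb/ts1.gif" border="0">',
  "<img src='/images/warningb/w1.gif' border=0>",
  "Zivatar",
];
console.log(payload.map(cellValue));
```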

@Steve-Mcl

Hmm, that's true, now it is better!

[{"id":"ca5f692a.2854f8","type":"http request","z":"492c08b5.43dc68","name":"","method":"GET","ret":"txt","paytoqs":"ignore","url":"https://www.met.hu/idojaras/veszelyjelzes/hover.php?id=wbhx&kod=14","tls":"","persist":false,"proxy":"","authType":"","x":490,"y":1120,"wires":[["b580bb0a.2e56a8"]]},{"id":"89e2d6f9.a1ee78","type":"inject","z":"492c08b5.43dc68","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"","payloadType":"date","x":260,"y":1120,"wires":[["ca5f692a.2854f8"]]},{"id":"93928a08.f8b2f8","type":"debug","z":"492c08b5.43dc68","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":1170,"y":1140,"wires":[]},{"id":"b580bb0a.2e56a8","type":"html","z":"492c08b5.43dc68","name":"","property":"payload","outproperty":"payload","tag":"table tr td","ret":"html","as":"single","x":850,"y":1140,"wires":[["93928a08.f8b2f8"]]}]

Any idea how to do what I want (with the third column as the object name)?

And with more than one row, like this:

		<table cellspacing=1 cellpadding=0 class='stbl' width='100%'>
		<tr><th colspan=3>Somogy megye</th></tr>
		<tr>
			<td class='row1' width=31><img src='/images/warningb/ts1.gif' border=0></td>
			<td class='row1' width=27><img src='/images/warningb/w1.gif' border=0></td>
			<td class='row1' width=170>Zivatar</td>
		</tr>
		<tr>
			<td class='row1' width=31><img src='/images/warningb/tr1.gif' border=0></td>
			<td class='row1' width=27><img src='/images/warningb/t1.gif' border=0></td>
			<td class='row1' width=170>Felhőszakadás</td>
		</tr>
	</table>
	<div class='kt-friss'><div>Kiadva: 2021-05-27 09:37 (07:37 UTC)</div><div>[wbhx]</div></div>

I think I need to search the array for a specific word and then output the result AND the two values that come BEFORE the searched word.

The search part is easy, but I don't know how to include only the two elements that come before the searched word.
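The "two elements before the searched word" step can be done with `indexOf` and `slice` on the flat array. A sketch, assuming the cells are already plain strings in table order (three per row); `withTwoBefore` is a hypothetical name:

```javascript
// Hypothetical helper: find `word` in a flat array and return it
// together with the two values immediately before it.
function withTwoBefore(cells, word) {
  const index = cells.indexOf(word);
  // indexOf returns -1 when the word is missing; index 0 or 1 has fewer than two predecessors.
  if (index < 2) return null;
  // slice(start, end) excludes `end`, so index + 1 keeps the word itself.
  return cells.slice(index - 2, index + 1);
}

const cells = [
  "/images/warningb/ts1.gif", "/images/warningb/w1.gif", "Zivatar",
  "/images/warningb/tr1.gif", "/images/warningb/t1.gif", "Felhőszakadás",
];
console.log(withTwoBefore(cells, "Felhőszakadás"));
```

`slice` does not modify the original array, so the same `cells` can be searched again for another word.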

I think I solved it (in a dirty way 🙂):

[{"id":"89e2d6f9.a1ee78","type":"inject","z":"492c08b5.43dc68","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"\t\t<table cellspacing=1 cellpadding=0 class='stbl' width='100%'> \t\t<tr><th colspan=3>Somogy megye</th></tr> \t\t<tr> \t\t\t<td class='row1' width=31><img src='/images/warningb/ts1.gif' border=0></td> \t\t\t<td class='row1' width=27><img src='/images/warningb/w1.gif' border=0></td> \t\t\t<td class='row1' width=170>Zivatar</td> \t\t</tr> \t\t<tr> \t\t\t<td class='row1' width=31><img src='/images/warningb/tr1.gif' border=0></td> \t\t\t<td class='row1' width=27><img src='/images/warningb/t1.gif' border=0></td> \t\t\t<td class='row1' width=170>Felhoszakadas</td> \t\t</tr> \t</table> \t<div class='kt-friss'><div>Kiadva: 2021-05-27 09:37 (07:37 UTC)</div><div>[wbhx]</div></div>","payloadType":"str","x":250,"y":1120,"wires":[["b580bb0a.2e56a8"]]},{"id":"93928a08.f8b2f8","type":"debug","z":"492c08b5.43dc68","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":1310,"y":1120,"wires":[]},{"id":"b580bb0a.2e56a8","type":"html","z":"492c08b5.43dc68","name":"","property":"payload","outproperty":"payload","tag":"table tr td","ret":"html","as":"single","x":700,"y":1120,"wires":[["aae312d8.bf4f3"]]},{"id":"aae312d8.bf4f3","type":"split","z":"492c08b5.43dc68","name":"","splt":"\\n","spltType":"str","arraySplt":"3","arraySpltType":"len","stream":false,"addname":"","x":930,"y":1120,"wires":[["c50ee6c8.34c378","f371f4bf.dd7e18"]]},{"id":"c50ee6c8.34c378","type":"switch","z":"492c08b5.43dc68","name":"","property":"payload[2]","propertyType":"msg","rules":[{"t":"eq","v":"Zivatar","vt":"str"}],"checkall":"true","repair":false,"outputs":1,"x":1090,"y":1120,"wires":[["93928a08.f8b2f8"]]},{"id":"f371f4bf.dd7e18","type":"switch","z":"492c08b5.43dc68","name":"","property":"payload[2]","propertyType":"msg","rules":[{"t":"eq","v":"Felhoszakadas","vt":"str"}],"checkall":"true","repair":false,"outputs":1,"x":1090,"y":1200,"wires":[["e581378a.de53f8"]]},{"id":"e581378a.de53f8","type":"debug","z":"492c08b5.43dc68","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":1310,"y":1200,"wires":[]}]
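In this flow, the split node with `arraySplt` set to 3 groups the flat array into one chunk per table row, and each switch node then picks a row by its third element. A sketch of a single Function node body that could do the same and build the object from the original goal directly; `chunk`, `cellValue` and `buildWarnings` are hypothetical helpers, and the input is assumed to be the HTML node's array with three cells per row:

```javascript
// Hypothetical helper: group items into arrays of `size`,
// the same idea as the split node with "arraySplt": "3".
function chunk(items, size) {
  const groups = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

// Keep the img src if the cell has one, otherwise the trimmed text.
function cellValue(cellHtml) {
  const match = /<img[^>]*\ssrc=["']?([^"'\s>]+)/i.exec(cellHtml);
  return match ? match[1] : cellHtml.trim();
}

// Build { name: [{ link1 }, { link2 }] } from the flat cell array.
function buildWarnings(cells) {
  const result = {};
  for (const row of chunk(cells.map(cellValue), 3)) {
    if (row.length < 3) continue; // ignore an incomplete trailing group
    const [link1, link2, name] = row;
    result[name] = [{ link1 }, { link2 }];
  }
  return result;
}

// In the Function node body:
//   msg.payload = buildWarnings(msg.payload);
//   return msg;
const htmlNodeOutput = [
  '<img src="/images/warningb/ts1.gif" border="0">',
  '<img src="/images/warningb/w1.gif" border="0">',
  "Zivatar",
  '<img src="/images/warningb/tr1.gif" border="0">',
  '<img src="/images/warningb/t1.gif" border="0">',
  "Felhoszakadas",
];
console.log(JSON.stringify(buildWarnings(htmlNodeOutput), null, 2));
```

Keying on the third column means a new warning type shows up in the output without adding another switch node.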

Open the page in a browser and use the dev tools to examine it; they will show you the selector you need to find a specific element.
