HTML Links (Table) from website to CSV or Excel

Hello,

Need some help to convert the table on this site: https://www.sounds-venlo.nl/verwachte-releases/?filter-type[]=vinyl to a CSV or Excel file.

Tried some things with the html node, but no result to retrieve the full table in a simple way.

Thanks.

One issue is that, though it LOOKS like a table, it really isn't.

It is a set of links containing some spans. And the spans are defined as display: bock :person_shrugging: A bit bonkers.

Your best bet is to extract the innerHTML from the <section> tag and then try to process it manually.

There is no simple way I don't think.

Unless you really have to do it in Node Red - i would suggest having a look at Power Query and the get data function in the newer versions of excel - it is a very easy and clean process to extract something into a useable excel file

Craig

I don't see how using PowerQuery would be any easier.

It is a matter of walking through the HTML, converting to something more useful.

This flow will produce csv output, which can be saved to a file.

It is not that hard. First get all ahrefs with an html node, which produces an array, use a split node and use another html node to get all spans, join them back into an array and pass it through a csv node.

[{"id":"e5c6200e558a48ac","type":"inject","z":"97d5eaac17934f34","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"","payloadType":"date","x":180,"y":280,"wires":[["b596b0e309430a73"]]},{"id":"b596b0e309430a73","type":"http request","z":"97d5eaac17934f34","name":"","method":"GET","ret":"txt","paytoqs":"ignore","url":"https://www.sounds-venlo.nl/verwachte-releases/?filter-type%5B%5D=vinyl","tls":"","persist":false,"proxy":"","insecureHTTPParser":false,"authType":"","senderr":false,"headers":[],"x":350,"y":280,"wires":[["3629ebe64a3b8d18"]]},{"id":"3629ebe64a3b8d18","type":"html","z":"97d5eaac17934f34","name":"","property":"payload","outproperty":"payload","tag":"body > main > section > a","ret":"html","as":"single","x":570,"y":280,"wires":[["7d336be5d72981d9"]]},{"id":"5051aa968880e841","type":"debug","z":"97d5eaac17934f34","name":"debug 595","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":1270,"y":280,"wires":[]},{"id":"466df08a13a86333","type":"html","z":"97d5eaac17934f34","name":"","property":"payload","outproperty":"payload","tag":"span","ret":"text","as":"single","x":870,"y":280,"wires":[["9a694c49dda06ad9"]]},{"id":"7d336be5d72981d9","type":"split","z":"97d5eaac17934f34","name":"","splt":"\\n","spltType":"str","arraySplt":1,"arraySpltType":"len","stream":false,"addname":"","x":750,"y":280,"wires":[["466df08a13a86333"]]},{"id":"9a694c49dda06ad9","type":"join","z":"97d5eaac17934f34","name":"","mode":"auto","build":"object","property":"payload","propertyType":"msg","key":"topic","joiner":"\\n","joinerType":"str","accumulate":"false","timeout":"","count":"","reduceRight":false,"x":990,"y":280,"wires":[["cd45a515197e7dde"]]},{"id":"cd45a515197e7dde","type":"csv","z":"97d5eaac17934f34","name":"","sep":",","hdrin":"","hdrout":"all","multi":"one","ret":"\\n","temp":"title,recordType,released,price","skip":"0","strings":true,"include_empty_strings":"","include_null_values":"","x":1110,"y":280,"wires":[["5051aa968880e841"]]}]
1 Like

Well Excel sucked it in in one go - no work - just told PQ to retrieve the URL

SO if the ultimate result is to get it to CSV looks like a pretty quick way to do so

Craig

1 Like

just told PQ to retrieve the URL

yeah excel / powerbi with powerquery works wonders, quite a strong/powerful parser

Yeah - my wife (accountant) is particularly liking the ability to parse PDFs straight in

Craig

1 Like

It's great when it works! I use PowerQuery a LOT at work to deal with large data CSV's. Didn't think to try sucking that page direct into Excel - good call.

Really great, works like a charm!

Is indeed simple and works too, but since I want to schedule this import/export on weekly basis, I will go for the bakman2 solution. However, this tip regarding the Excel option can be very handy in the future.

Although you have a working solution - if you look at Power Bi and Power Automate (both for Windows) you can schedule tasks (such as opening excel workbooks and updating them)

Craig

Assuming you have appropriate licensing of course. :frowning:

1 Like

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.