How do you check HTML as multiple messages?

#1

Hi, I am trying to process data from supremenewyork.com/shop. My flow so far is: HTTP input > HTTP request > HTML > HTML > join > HTTP response. The HTTP request fetches the site. The first HTML node extracts the HTML of the shop-scroller element as multiple messages (there's only one tag with that ID, but a single message would give me an array, and I've had trouble with that). The second HTML node extracts the HTML of every li tag inside it. What I'm left with looks like this:
<a href="/shop/shoes/ebnculszx"><span class="new_item_tag">new</span><img alt="" src="//assets.supremenewyork.com/161190/ma/e1TDTZjghDU.png" width="49" height="360"></a>
There are about 30 of these, and I need each one as a separate message so that I can check them individually. That's where I run into a problem. I want to use a Switch node to check whether the link contains a span, but I only get one output, even though the span element is in about a third of the links. How do I output all the links that contain the span element?
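
For reference, the per-message check described above can also be sketched in JavaScript, roughly as a Function node would do it. This is only a minimal sketch under assumptions: `keepIfHasSpan` is a hypothetical name, the second sample payload is made up, and the function is wrapped so it can run outside Node-RED. In a real Function node the body would operate on the incoming `msg`, and returning `null` drops the message.

```javascript
// Sketch of a Function-node style filter, wrapped as a plain function
// so it can be run outside Node-RED. keepIfHasSpan is a hypothetical name.
function keepIfHasSpan(msg) {
    // Only string payloads can be checked; anything else is dropped.
    if (typeof msg.payload !== "string") {
        return null;
    }
    // Pass the message on if its HTML contains a span tag, otherwise drop it.
    return msg.payload.includes("<span") ? msg : null;
}

// Two sample messages: the first is shaped like the payload above,
// the second is a made-up link without a span.
const samples = [
    { payload: '<a href="/shop/shoes/ebnculszx"><span class="new_item_tag">new</span><img alt=""></a>' },
    { payload: '<a href="/shop/hats/hypothetical"><img alt=""></a>' }
];

const kept = samples.map(keepIfHasSpan).filter((m) => m !== null);
console.log(kept.length);
```

Each incoming message is judged on its own, so every link with a span should come out as its own message.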

#2

If you show us the config of the Switch that is not working as you expect then we may be able to help.

#3

I have already deleted the node and am now trying to do it in JS, joining the string first, but it was configured like this:


Also (not shown in the image), it was set to check all rules, and "recreate message sequences" was unchecked.
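
The JS approach mentioned above, working on one joined string instead of separate messages, might look roughly like the sketch below. It is an assumption-laden illustration, not the code actually used: `newItemLinks` is a hypothetical helper, the second sample link is made up, and a regular expression is only a rough stand-in for a real HTML parser (it assumes double-quoted href attributes and no nested anchors).

```javascript
// Hypothetical helper: given one string holding many <a>...</a> snippets,
// return the href of every anchor whose content includes a span tag.
function newItemLinks(html) {
    const links = [];
    // Capture the href value and the inner HTML of each anchor (non-greedy).
    const anchorRe = /<a\s[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/g;
    let match;
    while ((match = anchorRe.exec(html)) !== null) {
        const href = match[1];
        const inner = match[2];
        if (inner.includes("<span")) {
            links.push(href);
        }
    }
    return links;
}

// Sample input: the first anchor is shaped like the one in the first post,
// the second is a made-up link without a span.
const joined = [
    '<a href="/shop/shoes/ebnculszx"><span class="new_item_tag">new</span><img alt=""></a>',
    '<a href="/shop/hats/hypothetical"><img alt=""></a>'
].join("\n");

const links = newItemLinks(joined);
console.log(links);
```

Because everything happens inside a single message, only one message would reach the HTTP response node.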

#4

That should be OK. But you say you sent it a number of messages, and only the first one got routed on while the others were dropped, even though they contained "span"?
That doesn't make sense.

#5

I started with this setup:
[{"id":"59cc9c97.a90eec","type":"http in","z":"2256f4fb.214d54","name":"","url":"/supreme","method":"get","upload":false,"swaggerDoc":"","x":120,"y":40,"wires":[["c98a81d1.aae12"]]},{"id":"d4577866.c7b788","type":"http response","z":"2256f4fb.214d54","name":"","statusCode":"","headers":{},"x":90,"y":360,"wires":[]},{"id":"c98a81d1.aae12","type":"http request","z":"2256f4fb.214d54","name":"","method":"GET","ret":"txt","url":"https://www.supremenewyork.com/shop","tls":"","x":110,"y":80,"wires":[["30487710.c7041"]]},{"id":"30487710.c7041","type":"html","z":"2256f4fb.214d54","name":"","property":"payload","outproperty":"payload","tag":"#shop-scroller","ret":"html","as":"multi","x":120,"y":120,"wires":[["7fc5b91c.f10378"]]},{"id":"7fc5b91c.f10378","type":"html","z":"2256f4fb.214d54","name":"","property":"payload","outproperty":"payload","tag":"li","ret":"html","as":"multi","x":90,"y":160,"wires":[["48fe7f5a.b9b35"]]},{"id":"7b1040ab.121d28","type":"join","z":"2256f4fb.214d54","name":"","mode":"auto","build":"string","property":"payload","propertyType":"msg","key":"topic","joiner":"\\n","joinerType":"str","accumulate":"false","timeout":"","count":"","reduceRight":false,"x":90,"y":320,"wires":[["d4577866.c7b788"]]},{"id":"67c0b78.874bcc8","type":"html","z":"2256f4fb.214d54","name":"","property":"payload","outproperty":"payload","tag":"a","ret":"attr","as":"multi","x":90,"y":240,"wires":[["9e1351f5.259b4"]]},{"id":"9e1351f5.259b4","type":"yaml","z":"2256f4fb.214d54","property":"payload","name":"","x":90,"y":280,"wires":[["7b1040ab.121d28"]]},{"id":"48fe7f5a.b9b35","type":"switch","z":"2256f4fb.214d54","name":"","property":"payload","propertyType":"msg","rules":[{"t":"cont","v":"span","vt":"str"}],"checkall":"true","repair":false,"outputs":1,"x":90,"y":200,"wires":[["67c0b78.874bcc8"]]}]

It's the same as the one mentioned in the original comment, but at a later stage, where it already outputs the href attribute I wanted.

#6

What matters is what was going into the Switch node; I don't need to see the rest of the flow. Put a debug node on the input to the Switch and check that it receives a sequence of separate messages, some with and some without span.

#7

What should I set the debug node to output?

#8

Just looking back at your question, are you actually asking how to split the string into multiple messages? If so you should be able to use the Split node to do that.

#9

Probably a bad choice of words on my part.
I have: multiple strings output by the HTML node.
I need: to let through only the ones that have a span tag in them.

#10

Out of the debug node I get strings like this:
<a href="/shop/tops-sweaters/x4qs8wlbi"><span class="new_item_tag">new</span><img alt="" width="49" height="360" src="//assets.supremenewyork.com/161162/ma/hqOU6k_J-p8.png"/></a>
along with a few "Can't set headers after they are sent" errors coming from the HTTP response node. It's as if msg.parts were lost after filtering with the Switch node.
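
That error usually means more than one message reached the HTTP response node for the same request. One possible reading of the msg.parts remark is that the filtered messages still carry the index and count of the original, longer sequence. The sketch below only illustrates what consistent parts would look like after filtering. `renumberParts` is a hypothetical helper and the sample values are made up; inside a flow, the Switch node's "recreate message sequences" option is meant for this job, so this is not a replacement for it.

```javascript
// Hypothetical helper: given the messages that survived a filter,
// rewrite msg.parts so index runs 0..n-1 and count matches the new length.
// Other parts fields (id, type, ch) are kept as they were.
function renumberParts(messages) {
    const count = messages.length;
    return messages.map((msg, index) => ({
        ...msg,
        parts: { ...msg.parts, index, count }
    }));
}

// Made-up survivors of a 30-message sequence: indexes 0, 4 and 7 remain.
const survivors = [
    { payload: "a", parts: { id: "seq1", index: 0, count: 30, type: "string", ch: "" } },
    { payload: "b", parts: { id: "seq1", index: 4, count: 30, type: "string", ch: "" } },
    { payload: "c", parts: { id: "seq1", index: 7, count: 30, type: "string", ch: "" } }
];

const repaired = renumberParts(survivors);
console.log(repaired.map((m) => m.parts.index + "/" + m.parts.count));
```

With parts like these, a Join node in automatic mode has a complete sequence to reassemble into a single message.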

#11

When you say multiple strings, do you mean multiple messages each containing a string in the payload? In which case the switch node should work.

#12

Have you actually tried it? I posted the exported flow earlier.

#13

No, I haven't got time to mess about importing the flow and seeing if I can make it go. There should be no need. If you want me to help then I am afraid you have to do it my way, by answering questions. If you don't want me to help then that is fine.
