Simple web scrape help needed

I can retrieve the web page (confirmed via a debug node), but I can't work out what to put in my html node's selector field to scrape the "state=" value from my Digital Loggers web power switch. Here's a sample of the returned HTML; I'm looking for the "state=" data right after </head>.

<html>
<head>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

<META HTTP-EQUIV="Refresh" CONTENT="300">
<title>Outlet Control  - sTower pSwitch</title>
<script language="JavaScript">
<!--
function reg() {
window.open('http://www.digital-loggers.com/register.html?SN=LPC721560');
}
//-->
</script>
</head>
<!-- state=0f lock=00 -->

<body alink="#0000FF" vlink="#0000FF">
<FONT FACE="Arial, Helvetica, Sans-Serif">
<table width="100%" cellspacing=0 cellpadding=0>
<tr>
<td valign=top width="17%" height="100%">

    <table width="100%" height="100%" align=center border=0 cellspacing=1 cellpadding=0>
    <tr><td valign=top bgcolor="#F4F4F4">
    <table width="100%" cellpadding=1 cellspacing=5>

    <tr><td align=center>
    
    <table><tr><td><a href="http://www.digital-loggers.com/1T.html"><img src="logo.gif" width=195 height=65 border=0 alt="Digital Loggers, Inc."></a></td>
    
    <td><b><font size=-1>Ethernet Power Controller</font></b></td></tr></table>
    <hr>
    </td></tr>



<tr><td nowrap><b><a href="/index.htm">Outlet Control</a></b></td></tr>
<tr><td nowrap><b><a href="/admin.htm">Setup</a></b></td></tr>
<tr><td nowrap><b><a href="/script.htm">Scripting</a></b></td></tr>

It's the hex value of my power switch's outlet statuses.

I am not sure you can easily extract that as HTML since it is in a comment. Output the scrape as text and process it that way.

In a change node, use a regex. Something like state=(.*) lock should give you the hex value in \1 (or is that $1? I can never remember the substitution codes between different systems).

Pardon my ignorance, I don't see the regex option in my change node.

I appreciate the help but I'm still struggling to figure out how to save that hex value to a new property to further process down the line. My change node is replacing state=0f lock with 0f but that doesn't help me yet. I tried a 2nd rule to "set" msg.state to "$1" but that doesn't work.


Why do you need the output on a different property than payload? Do you want to keep the original HTML? If not, then you don't need anything else. Simply add the next node to do whatever you want with the value.

If you wanted to keep the value in a global or flow variable, you just need to add another change entry in the same node, after the first one, that takes the payload as input and writes it to a variable.

As is I still have most of the html code in the payload. All that changes is <!-- state=1f lock=00 --> becomes <!-- 1f=00 -->

<html>
<head>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

<META HTTP-EQUIV="Refresh" CONTENT="300">
<title>Outlet Control  - sTower pSwitch</title>
<script language="JavaScript">
<!--
function reg() {
window.open('http://www.digital-loggers.com/register.html?SN=LPC721560');
}
//-->
</script>
</head>
<!-- 1f=00 -->

<body alink="#0000FF" vlink="#0000FF">
<FONT FACE="Arial, Helvetica, Sans-Serif">
<table width="100%" cellspacing=0 cellpadding=0>
<tr>

Were you expecting msg.payload to only hold the hex data? See my previous post.

If I'm being honest, I'm not sure what you are looking for. I've shown you how to extract the data, now you need to move on to using it how you want.

I want to get that value "1f" isolated into a msg property so that I can convert it to binary and set mqtt topics accordingly. The regex is catching the value but the change node is only replacing the matched regex with the value, leaving all the rest of the html in msg.payload. See my html snippet earlier.

[{"id":"f734492d.6f89d8","type":"inject","z":"acab87b6.abe428","name":"","props":[{"p":"payload"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"test test <!-- state=1f lock=00 --> hello world","payloadType":"str","x":80,"y":700,"wires":[["fdc1f0fe.52af98"]]},{"id":"fdc1f0fe.52af98","type":"change","z":"acab87b6.abe428","name":"","rules":[{"t":"change","p":"payload","pt":"msg","from":"state=(.*) lock","fromt":"re","to":"$1","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":340,"y":700,"wires":[["4c2e2457.420e84"]]},{"id":"4c2e2457.420e84","type":"debug","z":"acab87b6.abe428","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload","targetType":"msg","statusVal":"","statusType":"auto","x":550,"y":700,"wires":[]}]

Try this. I think the regex is working, but msg.payload still contains all the other text. I think I'll just use your regex in a function node, assign the extracted value to a var, and away I go.
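
For anyone hitting the same thing: the result above is what you would expect if the change node's Change rule behaves like JavaScript's String.replace (an assumption, not checked against the node's source). Only the matched substring is swapped for $1, so the surrounding text stays. The sketch below uses the sample string from the inject node and compares three approaches; the variable names are made up for illustration.

```javascript
// Sample text taken from the inject node in the flow above.
const html = "test test <!-- state=1f lock=00 --> hello world";

// replace() swaps only the matched substring; everything else is kept.
const replaced = html.replace(/state=(.*) lock/, "$1");

// Matching the whole string around the capture group leaves only the value.
const wholeReplaced = html.replace(/^[\s\S]*state=(\w+) lock[\s\S]*$/, "$1");

// match() returns the capture group directly (null when nothing matches).
const found = html.match(/state=(\w+) lock/);
const state = found ? found[1] : null;

console.log(replaced);      // test test <!-- 1f=00 --> hello world
console.log(wholeReplaced); // 1f
console.log(state);         // 1f
```

The second pattern might also work as the search value in a change node, since it consumes the whole payload, but the match() form in a function node is probably easier to read.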

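You could use a function node:

// Extract the hex outlet state from the HTML comment in msg.payload
const match = msg.payload.match(/state=(.*?) lock/);
if (!match) return null; // no state comment found, so drop the message
msg.match = match[1];
delete msg.payload;
return msg;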

returns:

{"_msgid":"4ed4950b.d90afc","topic":"","match":"0f"}
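
Since the goal mentioned earlier was to convert the hex value to binary and set MQTT topics, here is a minimal sketch of that next step. It assumes an 8-outlet switch where bit 0 is outlet 1, bit 1 is outlet 2, and so on; that mapping is a guess and should be checked against the real switch. The helper names and the topic names are hypothetical.

```javascript
// Hypothetical helper: turn a hex state string such as "0f" into per-outlet booleans.
// Assumption: bit 0 is outlet 1, bit 1 is outlet 2, and so on (verify on your switch).
function outletStates(hex, outletCount = 8) {
    const value = parseInt(hex, 16);
    if (Number.isNaN(value)) {
        throw new Error("Not a hex value: " + hex);
    }
    const states = [];
    for (let i = 0; i < outletCount; i++) {
        states.push(((value >> i) & 1) === 1);
    }
    return states;
}

// Build one message per outlet; the topic names are made up for illustration.
function toMqttMessages(hex) {
    return outletStates(hex).map((on, i) => ({
        topic: "powerswitch/outlet/" + (i + 1),
        payload: on ? "ON" : "OFF"
    }));
}

// Binary view of the same value, padded to 8 digits.
const binary = parseInt("0f", 16).toString(2).padStart(8, "0");
console.log(binary); // 00001111
console.log(toMqttMessages("0f"));
```

In a function node, something like return [toMqttMessages(msg.match)]; should send the eight messages one after another from the first output, which could then be wired to an mqtt out node with its topic field left blank.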

flow

[{"id":"55c8d417.443434","type":"inject","z":"c1e3e437.d45e1","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"<html> <head> <META NAME=\"ROBOTS\" CONTENT=\"NOINDEX, NOFOLLOW\">  <META HTTP-EQUIV=\"Refresh\" CONTENT=\"300\"> <title>Outlet Control  - sTower pSwitch</title> <script language=\"JavaScript\"> <!-- function reg() { window.open('http://www.digital-loggers.com/register.html?SN=LPC721560'); } //--> </script> </head> <!-- state=0f lock=00 -->  <body alink=\"#0000FF\" vlink=\"#0000FF\"> <FONT FACE=\"Arial, Helvetica, Sans-Serif\"> <table width=\"100%\" cellspacing=0 cellpadding=0> <tr> <td valign=top width=\"17%\" height=\"100%\">      <table width=\"100%\" height=\"100%\" align=center border=0 cellspacing=1 cellpadding=0>     <tr><td valign=top bgcolor=\"#F4F4F4\">     <table width=\"100%\" cellpadding=1 cellspacing=5>      <tr><td align=center>          <table><tr><td><a href=\"http://www.digital-loggers.com/1T.html\"><img src=\"logo.gif\" width=195 height=65 border=0 alt=\"Digital Loggers, Inc.\"></a></td>          <td><b><font size=-1>Ethernet Power Controller</font></b></td></tr></table>     <hr>     </td></tr>    <tr><td nowrap><b><a href=\"/index.htm\">Outlet Control</a></b></td></tr> <tr><td nowrap><b><a href=\"/admin.htm\">Setup</a></b></td></tr> <tr><td nowrap><b><a href=\"/script.htm\">Scripting</a></b></td></tr>","payloadType":"str","x":288,"y":384,"wires":[["ab6af4e6.b71b5"]]},{"id":"ab6af4e6.b71b5","type":"function","z":"c1e3e437.d45e1","name":"","func":"m = msg.payload\nmatch = m.match(/state=(.*?) lock/) \nmsg.match = match[1]\ndelete msg.payload\nreturn msg;","outputs":1,"noerr":0,"initialize":"","finalize":"","libs":[],"x":420,"y":384,"wires":[["a3ca3f69.888b98"]]},{"id":"a3ca3f69.888b98","type":"debug","z":"c1e3e437.d45e1","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"true","targetType":"full","statusVal":"","statusType":"auto","x":554,"y":384,"wires":[]}]
