Node-red-contrib-google-action 2.0.0-alpha released

[This was posted in another old thread but buried in the responses.]

I've released a version 2.0.0-alpha of the Google Action node - most simple conversation types work but not the more complex ones. You can install it using:

npm install node-red-contrib-google-action@alpha

Working with the Google Assistant backend API is really painful - it's one of those APIs that has been thrown together to support a variety of disparate user interface technologies for a range of applications that haven't been fully designed yet. It's really experimental.

This version of the Google Action node moves away from using the Actions SDK provided by Google because it was simply too incompatible with the Node Red framework - the Actions 2.0 SDK is written to act as a framework for an app in Node and smashing the two frameworks together was nasty.

Instead of using the Actions SDK, this version uses the webhooks provided by Google. This gives you a lot more flexibility to implement your conversation flow in Node Red.

Note that this doesn't support Dialogflow projects because a) the dialog flow can be easily implemented in Node Red, b) there wasn't much to be gained from Dialogflow for apps with one or two users (Dialogflow uses machine learning to understand semantics of what people are saying, which requires lots of people to use the app), and c) Dialogflow won't accept self-signed SSL certificates which makes it difficult for hobbyist and experimental developers.

There are three nodes in the Google Action package - start, ask, and tell.

start is the starting point for a new conversation. It listens for new conversation requests from Google Assistant and outputs a msg for each new conversation. Behind the scenes, it is keeping track of each conversation as it flows through Node Red and sends subsequent requests from Google Assistant to the appropriate Ask node.

ask nodes prompt the user for input and represent a turn in a conversation. There is a range of prompt types available, including Simple, Simple Selection, Date and Time, Confirmation, Address, and Name and Location. The more complex ones are handled by Google Assistant - for example, for Address the user can respond "McDonald's" and Google Assistant will automatically list all the nearby McDonald's for the user to select from, with the street address of the selection being returned to Node Red. The Ask node has two outputs - the first is for the user's response, the second is for when the user cancels the conversation or doesn't respond before the timeout.

tell nodes tell the user something and terminate the conversation. All conversation paths should end with a Tell node, even ones that have been cancelled or timed out.

The Google Action nodes work with all Google Home and Assistant devices such as speakers, smart phones and hubs.

I really like the way this version allows you to integrate a conversation with a Node Red flow - the user conversation becomes part of the flow, rather than the flow being about handling the conversation.

Anyhow, have a play with it and give me any feedback.

Dean

Hey Dean,
Could you please give a bit more explanation of how the Google Home device communicates with the Node-RED flow (and the other way around)? I see Actions SDK, webhooks, Dialogflow, Google Assistant backend API. But I cannot really see what the entire path looks like...

image

Some examples of things I'm struggling with:

  • Does the Ask node communicate directly with the Home device, or via the cloud service?
  • What are the Actions and webhooks used for in this setup?
  • Does the Node-RED flow only respond to requests from the cloud service, or does it also send requests to the cloud service?
  • Does the cloud service send a request to the Node-RED flow every time the Google Home device has received a speech-based question?
  • ...

Thanks !!!!
Bart

Hey Bart,

I couldn't reply without attempting to match your diagramming skills :slightly_smiling_face:


First to clear up some naming conventions:

Google refers to external app providers as fulfillment services - our Node Red flow in this case.

A conversation consists of a user initiation (talk to my test app), multiple prompt/response turns, and a completion which closes the conversation.

Initiation starts a new conversation by the user with the fulfillment service.

Prompts go to the user to request input.

Responses come from the user to supply information to the fulfillment service.

Completion returns a response to the user and closes the conversation between the user and the fulfillment service.

Google Assistant is the entire Google infrastructure incorporating the Google Home devices, the Google Assistant phone app, the speech to text recognition, the Google Actions conversation handler, and Google Dialogflow conversation sequencer.

Google Actions handles conversations by keeping track of whose turn it is to speak in a conversation and communicating between user devices and the fulfillment service. It does a very basic interpretation of some responses such as date/time and location, but knows nothing about the conversation sequence.

Google Dialogflow performs lexical analysis on the user responses and sequences the steps in a conversation flow and may direct a conversation down different paths depending on context. We are not using Dialogflow here.

Each prompt/response in a conversation is a disconnected transaction, meaning that there is no persistent connection maintained between a prompt and the response. The response may be cancelled, time out, or just never return. Conversations are identified by a consistent conversationId across all transactions in a specific conversation. It is also possible to attach conversation state information to a prompt which will be returned by the next response.
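To make the disconnected prompt/response idea concrete, here is a minimal sketch of what the JSON bodies sent back to the webhook might look like. The field names (conversationToken, expectUserResponse, expectedInputs, finalResponse) are assumed from Google's Actions SDK conversation webhook format and should be checked against Google's documentation. The helper names buildAsk and buildTell are made up for this example and are not part of the node.

```javascript
// Hypothetical helpers: build the JSON body returned to the webhook.
// Field names are assumed from the Actions SDK conversation webhook format.

// A prompt (one turn): asks a question and carries state to the next response.
function buildAsk(text, state) {
  return {
    conversationToken: JSON.stringify(state), // expected to come back with the next response
    expectUserResponse: true,
    expectedInputs: [{
      inputPrompt: {
        richInitialPrompt: {
          items: [{ simpleResponse: { textToSpeech: text } }]
        }
      },
      possibleIntents: [{ intent: 'actions.intent.TEXT' }]
    }]
  };
}

// A completion: says something and closes the conversation.
function buildTell(text) {
  return {
    expectUserResponse: false,
    finalResponse: {
      richResponse: {
        items: [{ simpleResponse: { textToSpeech: text } }]
      }
    }
  };
}

const ask = buildAsk('Which light?', { step: 1 });
const tell = buildTell('Done.');
console.log(JSON.stringify(ask, null, 2));
console.log(JSON.stringify(tell, null, 2));
```

The state object is serialised into the token so that the next response can be matched to where the conversation had got to, without any connection being held open in between.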

Transactions between Google Actions and the fulfillment service are done using a webhook with JSON. There is an Actions SDK for Node but it isn't being used because it is a pain to work with. We are working with the JSON received.

Here a Node Red flow represents a conversation, which can have multiple paths and loops if required. Each conversation starts with a Start node and ends with a Tell node, and can have any number of Ask nodes in between.

So a user initiates a conversation, usually by saying 'Hey Google, talk to my test app'. This is received by the Start node and generates a new message in the flow. Typically the msg.payload will be empty, but it may contain a user request if the user says something like 'Hey Google, tell my test app to turn on the kitchen light'.
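A function node placed after Start could pick a command out of that initiation text. This is only a sketch: it assumes the spoken request arrives as a plain string in msg.payload, and the function name parseLightCommand and the phrase pattern are invented for illustration.

```javascript
// Hypothetical parser for an initiation such as
// "turn on the kitchen light". Returns null when no command was given.
function parseLightCommand(payload) {
  if (typeof payload !== 'string' || payload.trim() === '') {
    return null; // plain "talk to my test app": nothing to parse
  }
  // Captures on/off and the room name between "the" and "light(s)".
  const match = /turn (on|off) (?:the )?(.+?) lights?$/i.exec(payload.trim());
  if (!match) {
    return null;
  }
  return { state: match[1].toLowerCase(), room: match[2].toLowerCase() };
}

const cmd = parseLightCommand('turn on the kitchen light');
console.log(cmd);
console.log(parseLightCommand(''));
```

In a function node you might set msg.command = parseLightCommand(msg.payload) and then use a switch node to go straight to a Tell node when a command was found, or to an Ask node when it was not.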

In this diagram, this msg is passed to an Ask node which prompts the user for more information (a selection perhaps). Ask nodes can ask simple questions, provide suggested responses, and offer a list of options. They can also ask for specific types of information like date/time and location using some of Google Assistant's built-in intelligence. Once the prompt is sent, the msg flow stops until a response is received. When the response is received, it is output from the response output of the Ask node. If the user cancels the conversation or doesn't respond within a timeout period, then a Cancel msg is output from the cancel output of the Ask node.

That response msg would typically pass through some sort of processing function like Ecolet and could trigger an action. In the diagram above the msg passes through Function A and then to another Ask node for a second prompt/response transaction.

The second response msg passes through Function B to a Tell node which returns a completion message to the user to close the conversation.

If a Cancel or Timeout is received, it is passed to the Cleanup function that cleans up any incomplete transaction before closing the conversation with a Tell node.

Multiple conversations could be occurring in a flow simultaneously and each can stall at any Ask node. In fact, conversations could overtake one another. This may need some special considerations but conversationId will help you keep track of what's what.

So, to specifically answer your questions:

  • Ask (and Tell) nodes communicate through the Google Assistant cloud service. Communication to the Google Assistant device is by a different API (Google Cast).

  • Behind the scenes there is an HTTP endpoint that receives the webhook request and passes it either to the Start node for new conversations, or to the Ask node that last sent a prompt in the existing conversation. Ask and Tell nodes use this endpoint to respond to the webhook request with a prompt or completion.

  • The Node Red flow is a fulfillment service and only responds to requests from the Google Assistant cloud service. The user side of the cloud service can receive raw speech requests (pretty useless unless you want your app to literally talk to Google Assistant - trigger routines perhaps?).

  • Yes, there is a request/response transaction between the Google Assistant cloud service and the Node Red flow for each time the user speaks. There is no persistent connection maintained between a prompt and the response in a conversation.
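The routing described in the second answer could be sketched roughly as follows. This is not the node's actual code: the names ConversationRouter, onStart and waitForResponse are hypothetical, and the request shape (conversation.conversationId) is an assumption about the webhook JSON.

```javascript
// Hypothetical sketch: route each webhook request either to the start
// handler (new conversation) or to the Ask that last sent a prompt.
class ConversationRouter {
  constructor(onStart) {
    this.onStart = onStart;   // called for conversations nobody is waiting on
    this.pending = new Map(); // conversationId -> { onResponse, timer }
  }

  // Called by an "Ask" after sending its prompt: wait for the next response.
  waitForResponse(conversationId, onResponse, onCancel, timeoutMs) {
    const timer = setTimeout(() => {
      this.pending.delete(conversationId);
      onCancel('timeout'); // would go to the Ask node's second output
    }, timeoutMs);
    this.pending.set(conversationId, { onResponse, timer });
  }

  // Called by the HTTP endpoint for every webhook request.
  handle(request) {
    const id = request.conversation.conversationId; // assumed field name
    const waiting = this.pending.get(id);
    if (!waiting) {
      this.onStart(request);
      return 'start';
    }
    clearTimeout(waiting.timer);
    this.pending.delete(id);
    waiting.onResponse(request); // would go to the Ask node's first output
    return 'ask';
  }
}

const log = [];
const router = new ConversationRouter((req) => log.push('start:' + req.conversation.conversationId));
const first = router.handle({ conversation: { conversationId: 'abc' } });
router.waitForResponse('abc', () => log.push('response:abc'), () => log.push('cancel:abc'), 60000);
const second = router.handle({ conversation: { conversationId: 'abc' } });
console.log(first, second, log);
```

Keying the pending prompts by conversationId is what lets several conversations sit at different Ask nodes at the same time without getting mixed up.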

So, the big difference between V1 and V2 is that in V1 the flow was transaction based - every initiation or response triggered an entire flow and a conversation would involve multiple passes through the flow. In V2, the flow is conversation based and can pause within the flow whilst waiting for a user response.

Hey Dean,
GREAT explanation! Thanks for spending your time...

That is a talent that makes me famous across the globe :wink:
But when I look at your diagram, I have met the real competition now ...

So the Node-RED flow is used instead of DialogFlow?

Have you developed a custom webhook for Node-RED in your contribution (which you register somehow)? Or is that an out-of-the-box feature from Google?

I have no Google Home device, so I cannot test yet. But am I correct that you can send a text 'Choose option A or B' into the Ask node, and that the user will hear this sentence spoken?

So I assume that this is pretty well secured? I mean someone else cannot send http requests (containing textual commands) to the endpoint to turn off my heating?

Ah I thought that the ecolet functionality was part of the Google cloud solution? So our input is a textual representation of the speech signal, without any preprocessing done by Google? Whatever you tell to the Home device will arrive 100% identical as a string in the output message of the Start node?

Bart

I'll warn you that I did use to be a professional diagrammer

That's right. Firstly, there are some complications to using Dialogflow which make it a bit of effort for not much gain. Secondly, it is more powerful to use Node Red to control the conversation flow as it can use other inputs as conditions. For example, you could set up a 'good morning' conversation that was different depending on the local weather.
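For instance, a function node could choose the prompt text from a weather reading before the msg reaches an Ask or Tell node. A minimal sketch, assuming some other node has already supplied a weather object; the { condition, tempC } shape and the function name morningGreeting are invented for this example.

```javascript
// Hypothetical: pick a greeting based on a weather reading supplied
// by some other node. The { condition, tempC } shape is an assumption.
function morningGreeting(weather) {
  if (!weather) {
    return 'Good morning.'; // no weather data available
  }
  if (weather.condition === 'rain') {
    return 'Good morning. It is raining, so take an umbrella.';
  }
  if (weather.tempC <= 0) {
    return 'Good morning. It is freezing outside, so wrap up warm.';
  }
  return 'Good morning. It looks like a nice day.';
}

const greeting = morningGreeting({ condition: 'rain', tempC: 9 });
console.log(greeting);
```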

No, it is the official Google webhook that uses JSON to pass transaction requests and responses. It's all documented (in a roundabout way) on Google's developer website.

Yes, so Ask nodes are for asking questions and Tell nodes are for telling results or messages. Essentially, Ask nodes say a prompt and wait for a response whereas Tell nodes say a message then close the conversation with the user.

Don't forget, this works with Google Assistant on your smart phone and there is a developer test console with a web interface.

Nope, not in the slightest (though there is a way of checking that a request has come from Google - I should probably implement that).

First of all, Google Assistant apps are generally meant to be open access so that anyone can use them. There is no built in security from Google's side.

There are some restrictions though. First, only devices linked to your Google account can invoke your test app, so it isn't open to everyone. However, anyone using one of your devices can issue commands to your server. And someone else could set up their own test app to point to your server.

Google Smart Home does have security built in, but that is a whole different ball game of external credential validation and authorization.

The ecolet functionality is part of what is provided by Dialogflow. Whatever the user says is translated to text and sent as is to the node. Note: only conversation initiations come out of Start nodes - later responses come out of the Ask node that prompted for the input.

There are some exceptions though with some prompt types. The Date and Time prompt will let the user say something like 'Tuesday next week at 4pm' and return the actual date and time. The Address prompt will allow the user to respond relative to their location. For example, they could say 'McDonalds' and Google will return the street address of the nearest McDonalds to the user's current location. The Name and Location prompt will ask the user for permission to query their phone for their current location and return that along with their registered name.

One other thing - this isn't purely speech based. Using Google Assistant on your phone or Google Hub allows you to type responses or select menu items, as well as speak.

If you are able to do that, that would be a nice addition!

Ok, so I can test your nodes without a Home device ...

P.S. Perhaps you could add a link in your readme file to your explanation above! Love your diagram (without crossing wires) !!!

Dean, I did use v1 for a while, but found having to first announce 'use my test app' a pain, even if I used an alias.
I don't have my flow any longer to test, but now that 'custom routines' have been introduced by Google, I wondered if it would be possible to start interactions with a routine, which prefixed 'use my test app' first, and then followed by the actual command?

Yes you can. I have a routine that shortcuts 'node red' to 'talk to my test app'. You can also give the command in the initiation such as 'Hey Google, tell my test app to turn on the kitchen lights'.

You still get Google rambling on about 'Alright, getting the test version of my test app'

I've just noticed that you can now deploy your Google Assistant app in alpha test mode, which does not require it to be reviewed and approved by Google beforehand. This is done through the Google Assistant developer console.

Alpha test mode lets you set a name for your app instead of using 'My Test App'. This is not as easy as it sounds as you cannot use trademark names, the name must be two or more words, and it must be unique in the Google Assistant ecosystem. So 'Node Red' is a trademark and most variations of 'home control' are taken.

In alpha test mode, you can invite up to 20 other users to access your app.

There are a few configuration items available including the type of voice, background colours, and icons.

I'm trying to find where you can update to alpha test, but I can't find it.

npm install node-red-contrib-google-action@alpha

or switch to the no-api branch on Github

In the Google Assistant console, it is under Release.

Hey Dean (@DeanC),

Hope you had a nice Christmas!
Got a Google Home device as a present, so I'm now trying to connect it to my Node-RED flow.
However I have some questions about your readme file.

  • I have implemented port forwarding (to port 8081) on my router and set up dynamic DNS.
  • Now I need to create a keystore and a self-signed certificate. I assume with openssl. Could you please add an example command to the readme page, to make sure we do it correctly?
  • Is it also possible to add a certificate to the Node-RED keystore and use that in your Start node? Or is that not secure for some reason?
  • What do I need to add in the URL field?
    image
  • I assume 'Use SSL' should always be checked, since you say that Google Actions requires SSL?
  • About the two SSL file fields: aren't the private key and the certificate both stored in the same keystore, so we only need to specify a single file path?
  • If you have some time left, could you please update the example to use your new nodes?
  • What is the best way to protect port 8081 from unwanted requests not sent by Google?
  • You say we cannot run private apps. Is the only disadvantage that we need to specify a worldwide unique name?

Thanks a lot !!!
Bart

I have mine in /home/pi/.node-red/ssl/mykey.key and have that url in the config.

That is up to you, you can call it what you wish so long as you've entered the same 'fulfillment url' in Google actions. For example, I use something similar to https://digitalnut.co.uk:8443/gg3265 and have /gg3265 in the config.

That depends where you have stored them! Most people would put both in the same folder, but having 2 config fields for them gives users the flexibility to use whatever names and paths they wish. The key doesn't have to be called certificate.key; it could be mykey.key.

Paul,

My second and third questions aren't relevant anymore. I was confusing this with keystores in Java, where both private keys and certificates are stored in the same secret keystore (and you can import/export from that store). Therefore I thought I had to import the Google Action related certificate into the 'same' keystore as my Node-RED certificates ...

But here indeed we are working with separate files, similar to standard Node-RED SSL settings:

image

And then of course it completely makes sense to also have two separate files for the Google Actions, and you need to specify of course both file paths in the node's config screen.

So let's act like I have never asked those two questions ...

What questions?? :sunglasses:

Still no luck. The gactions test ... command keeps showing:

Pushing the app for the Assistant for testing...
ERROR: Failed to test the app for the Assistant
ERROR: The caller does not have permission
2018/12/27 00:33:34 Server did not return HTTP 200

This is what I have done:

  1. Have created a private key (in the $HOME/.node-red) using this command:

    openssl genrsa -out google-actions-key.pem 2048
    
  2. Have created a certificate signing request (in the $HOME/.node-red) using this command:

    openssl req -new -sha256 -key google-actions-key.pem -out google-actions-csr.pem
    

    Remark: for common name I have entered the dynamic dns name (e.g. xxxxx.duckdns.org) of my router.

  3. Have created a self signed certificate (in the $HOME/.node-red) using this command:

    openssl x509 -req -in google-actions-csr.pem -signkey google-actions-key.pem -out google-actions-cert.pem
    
  4. The result is this:
    image

  5. Have added both paths in the config screen of my google-actions node:
    image

  6. When I navigate to port 8081 of my Raspberry, I get this:
    image
    I assume this is normal, since I don't POST any data to it. But at least the SSL connection has been setup correctly, and I can see in the browser the self-signed certificate that I have setup above ...

  7. On my router I have setup port-forwarding of router port 8081 to port 8081 of my Raspberry. I did a similar test as in the previous point, but now with https://<wan ip address of my router>:8081. The result was the same.

  8. And to test my dynamic dns I did a similar test as in the previous point, but now with https://<dynamic dns name of my router>:8081. The result was again the same.

  9. Have added the https://<dynamic dns name of my router>:8081 url in my action.json file. Remark: I haven't changed anything else in the action.json file!

  10. Have created a project on https://console.actions.google.com/ with name <my project name>.

  11. And then I call gactions test -preview_mins 9999999 -action_package action.json -project <my project name>, which fails ...
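The browser checks in steps 6 to 8 only send a GET. To see how the endpoint reacts to a POST, something like the following Node script might help. It is a sketch under assumptions: the request body is a guess at a minimal new-conversation request, the host and path are placeholders to replace with your own, and rejectUnauthorized: false is there only because the certificate is self-signed.

```javascript
const https = require('https');

// Placeholder target: replace with your own dynamic DNS name and URL path.
const target = { hostname: 'xxxxx.duckdns.org', port: 8081, path: '/' };

// Guess at a minimal "new conversation" request body (assumed field names).
const body = JSON.stringify({
  conversation: { conversationId: 'test-1', type: 'NEW' },
  inputs: [{ intent: 'actions.intent.MAIN', rawInputs: [{ query: 'talk to my test app' }] }]
});

// Build the options for https.request.
function buildOptions(t, payload) {
  return {
    hostname: t.hostname,
    port: t.port,
    path: t.path,
    method: 'POST',
    rejectUnauthorized: false, // accept the self-signed certificate (testing only)
    headers: {
      'Content-Type': 'application/json',
      'Content-Length': Buffer.byteLength(payload)
    }
  };
}

// Send the POST and print the status code and response body.
function postTest(t, payload) {
  const req = https.request(buildOptions(t, payload), (res) => {
    let data = '';
    res.on('data', (chunk) => { data += chunk; });
    res.on('end', () => console.log(res.statusCode, data));
  });
  req.on('error', (err) => console.error('request failed:', err.message));
  req.end(payload);
}

// Only contact the server when explicitly asked, e.g. `node post-test.js send`.
if (process.argv[2] === 'send') {
  postTest(target, body);
}
```

This only exercises the path from the internet to the node. It says nothing about the gactions permission error, which comes from Google's side before any request reaches the router.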

Does anybody have an idea what I have forgotten?
Bart

In your node config, your URL path is set to /
Isn't that the same URL path as your node-RED editor?
i.e. have you tried adding a URL sub directory, such as in my example above - /gg3265

Hi, thanks for the nodes :slight_smile: For the last few days I have been looking into Google Actions as well. Could you please provide a node that, instead of using an HTTP endpoint, just accepts a payload via input from another source?
I would want to use my other node-red-contrib-webhookrelay node so I don't have to expose nodered to the internet.

Paul (@Paul-Reed), I have now added a subfolder, but the same problem remains...

But when I navigate with my browser to https://<dynamic dns name of my router>:8081/<sub folder> then I see the error appearing in my Putty console:

So in that case I at least arrive in Dean's code. However, when I run the gactions test ... command nothing appears in my Putty console, so I assume Google doesn't even call my router? Therefore I think there is something wrong in my Google setup, since it doesn't know how to contact my router?

I have done only a few things to set up the link between Google Actions and my private installation:

  • Have created on https://console.actions.google.com/ a new project with name <my project name>:

    image

    I have not checked whether this name is unique (i.e. not used yet by other users), since I assume that Google will give an error if it is not unique? And I have NOTHING else set up here, since it looks like all the settings need to be done in the action.json file (see next step)? Is that correct?

  • I have copied Dean's action.json file to the same folder where the gactions.exe program is located, and only changed the url:

    image

    But I have not defined the nodeRedApp somewhere else, only in this file?

  • And then I run the command, with my project name at the end:

    gactions test -preview_mins 9999999 -action_package action.json -project <my project name>
    

    Which results in this:

    Pushing the app for the Assistant for testing...
    ERROR: Failed to test the app for the Assistant
    ERROR: The caller does not have permission
    2018/12/27 13:32:34 Server did not return HTTP 200

Is there perhaps anything else I should setup?

Hey Karolis, interesting question! I saw that @DeanC uses a separate ExpressJs webserver (instead of e.g. standard Node-RED HttpIn nodes), since those HttpIn nodes cannot use a different port number than the rest of Node-RED. And he doesn't want to expose the rest of Node-RED to the internet, only the Action listeners. Is there perhaps a similar security risk in your proposal?

Bart

I haven't tried Dean's latest version, but did have the previous version working ok for several months.
If I get a chance this evening I'll grab the alpha version and try it myself.