Node-red-contrib-google-action 2.0.0-alpha released


#1

[This was posted in another old thread but buried in the responses.]

I've released version 2.0.0-alpha of the Google Action node - the simpler conversation types work, but not the more complex ones yet. You can install it using:

npm install node-red-contrib-google-action@alpha

Working with the Google Assistant backend API is really painful - it's one of those APIs that has been thrown together to support a variety of disparate user interface technologies for a range of applications that haven't been fully designed yet. It's really experimental.

This version of the Google Action node moves away from the Actions SDK provided by Google because it was simply too incompatible with the Node-RED framework - the Actions 2.0 SDK is written to act as a framework for a Node app, and smashing the two frameworks together was nasty.

Instead of using the Actions SDK, this version uses the webhooks provided by Google. This gives you a lot more flexibility to implement your conversation flow in Node-RED.

Note that this doesn't support Dialogflow projects because a) the dialog flow can easily be implemented in Node-RED, b) there isn't much to be gained from Dialogflow for apps with one or two users (Dialogflow uses machine learning to understand the semantics of what people are saying, which requires lots of people to use the app), and c) Dialogflow won't accept self-signed SSL certificates, which makes it difficult for hobbyist and experimental developers.

There are three nodes in the Google Action package - Start, Ask, and Tell.

Start is the starting point for a new conversation. It listens for new conversation requests from Google Assistant and outputs a msg for each new conversation. Behind the scenes, it keeps track of each conversation as it flows through Node-RED and sends subsequent requests from Google Assistant to the appropriate Ask node.

Ask nodes prompt the user for input and represent a turn in a conversation. There is a range of prompt types available, including Simple, Simple Selection, Date and Time, Confirmation, Address, and Name and Location. The more complex ones are handled by Google Assistant - for example, with Address the user can respond "McDonald's" and Google Assistant will automatically list all the nearby McDonald's for the user to select from, with the street address of the selection being returned to Node-RED. The Ask node has two outputs - the first is for the user's response, the second for when the user cancels the conversation or doesn't respond before the timeout.

Tell nodes tell the user something and terminate the conversation. All conversation paths should end with a Tell node, even ones that have been cancelled or timed out.
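For example, a Function node wired between an Ask node's response output and a Tell node might look something like this (just a sketch - it assumes the spoken text travels in msg.payload):

    // Function node between an Ask node's response output and a Tell node.
    // Assumes the user's spoken response arrives as text in msg.payload.
    var text = String(msg.payload).toLowerCase();

    if (text.indexOf("kitchen") !== -1) {
        // ...trigger whatever action you like here...
        msg.payload = "OK, the kitchen light is on.";   // what the Tell node speaks
    } else {
        msg.payload = "Sorry, I don't know that light.";
    }
    return msg;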

The Google Action nodes work with all Google Home and Assistant devices such as speakers, smartphones, and hubs.

I really like the way this version allows you to integrate a conversation with a Node-RED flow - the user conversation becomes part of the flow, rather than the flow being about handling the conversation.

Anyhow, have a play with it and give me any feedback.

Dean


#2

Hey Dean,
Could you please give a bit more explanation of how the Google Home device communicates with the Node-RED flow (and the other way around)? I see Actions SDK, webhooks, Dialogflow, Google Assistant backend API, but I can't really picture how the whole path fits together...

[diagram]

Some examples of things I'm struggling with:

  • Does the Ask node communicate directly with the Home device, or via the cloud service?
  • What are the Actions and webhooks used for in this setup?
  • Does the Node-RED flow only respond to requests from the cloud service, or does it also send requests to the cloud service?
  • Does the cloud service send a request to the Node-RED flow every time the Google Home device receives a speech-based question?
  • ...

Thanks !!!!
Bart


#3

Hey Bart,

I couldn't reply without attempting to match your diagramming skills :slightly_smiling_face:

[diagram: example conversation flow - Start → Ask → Function A → Ask → Function B → Tell, with a Cleanup path for cancellations]


First, to clear up some naming conventions:

Google refers to external app providers as fulfillment services - our Node-RED flow, in this case.

A conversation consists of a user initiation (talk to my test app), multiple prompt/response turns, and a completion which closes the conversation.

An initiation starts a new conversation between the user and the fulfillment service.

Prompts go to the user to request input.

Responses come from the user to supply information to the fulfillment service.

Completion returns a response to the user and closes the conversation between the user and the fulfillment service.

Google Assistant is the entire Google infrastructure, incorporating the Google Home devices, the Google Assistant phone app, the speech-to-text recognition, the Google Actions conversation handler, and the Google Dialogflow conversation sequencer.

Google Actions handles conversations by keeping track of whose turn it is to speak in a conversation and communicating between user devices and the fulfillment service. It does a very basic interpretation of some responses such as date/time and location, but knows nothing about the conversation sequence.

Google Dialogflow performs lexical analysis on user responses, sequences the steps in a conversation flow, and may direct a conversation down different paths depending on context. We are not using Dialogflow here.

Each prompt/response in a conversation is a disconnected transaction, meaning that no persistent connection is maintained between a prompt and the response. The response may be cancelled, time out, or just never return. Conversations are identified by a consistent conversationId across all transactions in a specific conversation. It is also possible to attach conversation state information to a prompt, which will be returned with the next response.

Transactions between Google Actions and the fulfillment service are done using a webhook with JSON. There is an Actions SDK for Node, but it isn't being used because it is a pain to work with - we work with the received JSON directly.
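To give a feel for the payloads, a (heavily trimmed) new-conversation request looks roughly like this - the exact fields are in Google's conversation webhook documentation, so treat this as illustrative:

    {
      "user": { "locale": "en-AU" },
      "conversation": { "conversationId": "1234567890", "type": "NEW" },
      "inputs": [ {
        "intent": "actions.intent.MAIN",
        "rawInputs": [ { "inputType": "VOICE", "query": "talk to my test app" } ]
      } ]
    }

On later turns in the same conversation, conversation.type becomes "ACTIVE" and any conversation state you attached comes back in conversation.conversationToken.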

Here a Node-RED flow represents a conversation. Each conversation starts with a Start node and ends with a Tell node, and can have any number of Ask nodes in between; the flow can have multiple paths and loops as required.

So a user initiates a conversation, usually by saying 'Hey Google, talk to my test app'. This is received by the Start node and generates a new message in the flow. Typically the msg.payload will be empty, but it may contain a user request if the user says something like 'Hey Google, tell my test app to turn on the kitchen light'.

In this diagram, this msg is passed to an Ask node which prompts the user for more information (a selection, perhaps). Ask nodes can ask simple questions, provide suggested responses, and offer a list of options. They can also ask for specific types of information like date/time and location, using some of Google Assistant's built-in intelligence. Once the prompt is sent, the msg flow stops until a response is received. When the response arrives, it is output from the response output of the Ask node. If the user cancels the conversation or doesn't respond within the timeout period, a Cancel msg is output from the cancel output of the Ask node.

That response msg would typically pass through some sort of processing function like Ecolet and could trigger an action. In the diagram above, the msg passes through Function A and then to another Ask node for a second prompt/response transaction.

The second response msg passes through Function B to a Tell node which returns a completion message to the user to close the conversation.

If a Cancel or Timeout is received, it is passed to the Cleanup function, which cleans up any incomplete transaction before closing the conversation with a Tell node.

Multiple conversations can be occurring in a flow simultaneously, and each can stall at any Ask node. In fact, conversations could overtake one another. This may need some special consideration, but the conversationId will help you keep track of what's what.
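If you need to keep per-conversation data inside the flow itself, one way is to key flow context off the conversationId - a sketch (msg.conversationId is my assumption for the property name, so check what the nodes actually set):

    // Function node: stash per-conversation state in flow context, keyed by
    // the conversation's id. msg.conversationId is an assumed property name -
    // check what the nodes actually put on the msg.
    var sessions = flow.get("sessions") || {};
    var session = sessions[msg.conversationId] || { step: 0 };

    session.step += 1;               // e.g. count the turns so far
    session.lastHeard = msg.payload;

    sessions[msg.conversationId] = session;
    flow.set("sessions", sessions);

    msg.session = session;           // hand it to downstream nodes
    return msg;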

So, to specifically answer your questions:

  • Ask (and Tell) nodes communicate through the Google Assistant cloud service. Communication to the Google Assistant device is via a different API (Google Cast).

  • Behind the scenes there is an HTTP endpoint that receives the webhook request and passes it either to the Start node for new conversations, or to the Ask node that last sent a prompt in the existing conversation (see the sketch after this list). Ask and Tell nodes use this endpoint to respond to the webhook request with a prompt or completion.

  • The Node-RED flow is a fulfillment service and only responds to requests from the Google Assistant cloud service. The user side of the cloud service can receive raw speech requests (pretty useless unless you want your app to literally talk to Google Assistant - trigger routines perhaps?).

  • Yes, there is a request/response transaction between the Google Assistant cloud service and the Node-RED flow each time the user speaks. There is no persistent connection maintained between a prompt and the response in a conversation.
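To make that second bullet concrete, the dispatch inside the endpoint boils down to something like this (a simplified sketch of the idea, not the actual node source - the startNode hook and Express-style req/res are stand-ins):

    // Simplified sketch of the endpoint's dispatch logic. Pending Ask nodes
    // register a handler keyed by the conversationId they are waiting on.
    const pending = new Map();   // conversationId -> handler for the waiting Ask node

    function onWebhookRequest(req, res) {
        const conv = req.body.conversation;
        if (conv.type === "NEW") {
            startNode.handleNewConversation(req.body, res);   // hypothetical Start node hook
        } else if (pending.has(conv.conversationId)) {
            // Wake the Ask node that sent the last prompt in this conversation
            pending.get(conv.conversationId)(req.body, res);
            pending.delete(conv.conversationId);
        } else {
            res.status(404).end();   // nothing waiting for this conversation
        }
    }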

So, the big difference between V1 and V2 is that in V1 the flow was transaction based - every initiation or response triggered an entire flow, and a conversation would involve multiple passes through the flow. In V2, the flow is conversation based and can pause within the flow while waiting for a user response.


#4

Hey Dean,
GREAT explanation! Thanks for taking the time...

That is a talent that makes me famous across the globe :wink:
But when I look at your diagram, I have now met the real competition...

So the Node-RED flow is used instead of DialogFlow?

Have you developed a custom webhook for Node-RED in your contribution (which you register somehow)? Or is that an out-of-the-box feature from Google?

I have no Google Home device, so I can't test yet. But am I correct that if you send a text like 'Choose option A or B' into the Ask node, the user will hear this sentence spoken as voice?

So I assume that this is pretty well secured? I mean, someone else cannot send HTTP requests (containing textual commands) to the endpoint to turn off my heating?

Ah, I thought that the ecolet functionality was part of the Google cloud solution? So our input is a textual representation of the speech signal, without any preprocessing done by Google? Whatever you say to the Home device will arrive 100% identical as a string in the output message of the Start node?

Bart


#5

I'll warn you that I used to be a professional diagrammer.

That's right. Firstly, there are some complications to using Dialogflow which make it a fair bit of effort for not much gain. Secondly, it is more powerful to use Node-RED to control the conversation flow, as it can use other inputs as conditions. For example, you could set up a 'good morning' conversation that is different depending on the local weather.

No, it is the official Google webhook that uses JSON to pass transaction requests and responses. It's all documented (in a roundabout way) on Google's developer website.

Yes, so Ask nodes are for asking questions and Tell nodes are for telling results or messages. Essentially, Ask nodes say a prompt and wait for a response whereas Tell nodes say a message then close the conversation with the user.

Don't forget, this works with Google Assistant on your smart phone and there is a developer test console with a web interface.

Nope, not in the slightest (though there is a way of checking that a request has come from Google - I should probably implement that).

First of all, Google Assistant apps are generally meant to be open access so that anyone can use them. There is no built in security from Google's side.

There are some restrictions though. First, only devices linked to your Google account can invoke your test app, so it isn't open to everyone. However, anyone using one of your devices can issue commands to your server. And someone else could set up their own test app to point to your server.
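For what it's worth, the check that a request really came from Google would look something like this - as I understand it, Google sends a signed JWT in the Authorization header of each webhook request, which can be verified with the google-auth-library package (a sketch; 'my-project-id' is a placeholder for your Actions project id):

    // Sketch only - verify the Google-signed JWT that arrives in the
    // Authorization header of each webhook request. Assumes the
    // google-auth-library npm package is installed.
    const { OAuth2Client } = require('google-auth-library');
    const authClient = new OAuth2Client();

    async function isFromGoogle(req) {
        try {
            await authClient.verifyIdToken({
                idToken: req.headers.authorization,
                audience: 'my-project-id',   // placeholder project id
            });
            return true;
        } catch (err) {
            return false;   // bad signature or wrong audience - reject the request
        }
    }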

Google Smart Home does have security built in, but that is a whole different ball game of external credential validation and authorization.

The ecolet functionality is part of what is provided by Dialogflow. Whatever the user says is translated to text and sent as is to the node. Note: only conversation initiations come out of Start nodes - later responses come out of the Ask node that prompted for the input.

There are some exceptions with some prompt types, though. The Date and Time prompt will let the user say something like 'Tuesday next week at 4pm' and return the actual date and time. The Address prompt will allow the user to respond relative to their location - for example, they could say 'McDonald's' and Google will return the street address of the nearest McDonald's to the user's current location. The Name and Location prompt will ask the user for permission to query their phone for their current location and return that along with their registered name.
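For example, the raw argument Google attaches to the response for a Date and Time prompt looks roughly like this (illustrative - check the webhook docs for the exact field names):

    {
      "name": "DATETIME",
      "datetimeValue": {
        "date": { "year": 2018, "month": 11, "day": 13 },
        "time": { "hours": 16, "minutes": 0 }
      }
    }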

One other thing - this isn't purely speech based. Using Google Assistant on your phone or Google Hub allows you to type responses or select menu items, as well as speak.


#6

If you are able to do that, that would be a nice addition!

Ok, so I can test your nodes without a Home device ...

P.S. Perhaps you could add a link in your readme file to your explanation above! Love your diagram (without crossing wires)!!!


#7

Dean, I did use v1 for a while, but found having to first announce 'use my test app' a pain, even if I used an alias.
I don't have my flow any longer to test, but now that 'custom routines' have been introduced by Google, I wondered if it would be possible to start interactions with a routine which prefixes 'use my test app' first, followed by the actual command?


#8

Yes you can. I have a routine that shortcuts 'node red' to 'talk to my test app'. You can also give the command in the initiation, such as 'Hey Google, tell my test app to turn on the kitchen lights'.

You still get Google rambling on about 'Alright, getting the test version of my test app'.


#9

I've just noticed that you can now deploy your Google Assistant app in alpha test mode, which does not require it to be reviewed and approved by Google beforehand. This is done through the Google Assistant developer console.

Alpha test mode lets you set a name for your app instead of using 'My Test App'. This is not as easy as it sounds, as you cannot use trademarked names, the name must be two or more words, and it must be unique in the Google Assistant ecosystem. So 'Node Red' is a trademark, and most variations of 'home control' are taken.

In alpha test mode, you can invite up to 20 other users to access your app.

There are a few configuration items available including the type of voice, background colours, and icons.


#10

I'm trying to find where you can update to alpha test. I can't find it.


#11

npm install node-red-contrib-google-action@alpha

or switch to the no-api branch on GitHub.

In the Google Assistant console, it is under Release.