This depends. Have a read of the first two chapters in this section of our documentation and the included links: https://github.com/johanneskropf/node-red-contrib-voice2json#advanced-topics
All the nodes are made to work together without much additional configuration. You send a stream of raw audio buffers to the wait wake node, and as soon as it detects a wake word it will forward the buffers to its second output (if configured that way).
You connect that to the record command node, which, as soon as it detects no more voice activity, sends a single wav buffer that can be fed straight into the stt node (don't forget to use a change node to set the wait wake node back to listen mode so it stops forwarding the audio). The stt node emits a transcription, which can in turn be fed straight into the tti node for intent extraction.
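Roughly, the flow topology described above looks like this (just a sketch of the wiring — the exact control message the change node needs to put the wait wake node back into listen mode is described in the node's documentation):

```
[raw audio buffers]
        │
        ▼
[wait wake] ──(2nd output: audio after wake word)──▶ [record command]
     ▲                                                     │
     │                                       (single wav buffer)
     │                                                     ▼
[change node: set listen-mode control msg] ◀──────────── [stt]
                                                           │
                                                 (transcription)
                                                           ▼
                                                         [tti]
```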
Nice, thank you