Function nodes for several AI text generators

Just to let you know: I've created a few function nodes for various AI text generators which can be run on the CPU. This gives you the possibility to build your own user interface or even autonomous agents.

The generators and models include:

  • LLaMA
  • Alpaca
  • GPT4All (filtered and unfiltered)
  • GPT4All-J

Each distribution comes with a flow for both the basic function node and a complete HTTP endpoint which responds to incoming requests. Also attached is a trivial web page which can be used as a user interface for the given AI model.
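
Just as a rough illustration (the endpoint URL and payload fields below are my assumptions - the actual interface is defined by the respective flow), such an HTTP endpoint could be called from any HTTP client or another Node-RED node like this:

```javascript
// hypothetical call to one of the HTTP endpoint flows - URL and payload
// fields are assumptions, check the actual flow for its real interface
const response = await fetch('http://localhost:1880/ai/completion', {
  method:  'POST',
  headers: { 'Content-Type': 'application/json' },
  body:    JSON.stringify({ prompt: 'earth has a moo', maxTokens: 64 })
})

const { completion } = await response.json()
console.log(completion)   // the generated continuation of the prompt
```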

3 Likes

Hi @rozek,
Thanks for developing and sharing all these nodes!

It might be useful if you could explain here (for noobs like me) what this is all about, and give some sample use cases.

It would also be useful if you could add a short comparison table, so people know which node best fits their use case.

Bart

OK, let's try - if need be, in several steps.

AI Text Generators

At their core, systems like ChatGPT are nothing but "text extenders": given a sequence of characters (actually tokens, but that's a technical detail), they compute the character which most probably comes next and spit that out:

"earth has a moo"

Which character should come next? Presumably an "n".

This character is then appended to the given text and the whole process is repeated - leading to the emission of a potentially endless stream of characters. The initial text is often called the "prompt", sometimes it is preceded (or surrounded) by an additional "context", and the whole process is called "inference".

In order to determine the most probable "next" character, models have to be "trained" with large amounts of character sequences (aka "texts"). "Training" means measuring (and remembering) how often a given character follows another one - and not just one previous character, but two, three or more. That's why so much memory is required (in the range of GBs).
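
To make the principle tangible, here is a deliberately tiny sketch: a character-level model that only looks at the last two characters and simply counts followers. Real models use tokens, far longer contexts and neural networks instead of plain counting, so treat this purely as an illustration:

```javascript
// toy "training": count how often a character follows a pair of characters
function train (text) {
  const counts = {}
  for (let i = 0; i < text.length - 2; i++) {
    const context = text.slice(i, i + 2), next = text[i + 2]
    counts[context] ??= {}
    counts[context][next] = (counts[context][next] || 0) + 1
  }
  return counts
}

// toy "inference": repeatedly emit the most frequent follower of the last two characters
function extend (counts, prompt, length) {
  let result = prompt
  for (let i = 0; i < length; i++) {
    const followers = counts[result.slice(-2)]
    if (followers == null) break                       // unknown context, stop
    result += Object.keys(followers)
      .reduce((a, b) => followers[a] >= followers[b] ? a : b)
  }
  return result
}

const counts = train('earth has a moon, the moon orbits the earth')
console.log(extend(counts, 'earth has a moo', 20))
// -> "earth has a moon, the moon, the moo" - the "n" is predicted first,
//    then the model keeps reproducing a pattern it has "learned"
```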

What makes "inference" so interesting is:

  • in "useful" texts, characters do not follow each other randomly but systematically, building "patterns"
  • as a consequence, many character sequences that are possible in principle (such as "yyy", for example) never occur and lead to 0s in the probability data sets ("tensors")
  • if, during an "inference", a model finds such a pattern, it starts emitting character sequences according to that pattern

Provided that the underlying tensors are large enough (allowing for many such patterns) and the training texts are comprehensive enough, an AI model starts emitting "useful" texts ("useful" in the eyes of its users).

It is important to understand that:

  • AI text generators never recognize any "sense" in given or generated texts - in fact, there is no notion of "sense" in these systems, just character sequences
  • useful output is only generated if the training data was useful (and prompt and context are useful as well)
  • in addition, AI text generators simply combine character sequences seen during training - they do not really produce new ones

However, experience has shown that large enough models produce astonishingly "intelligent"-looking output - for use at home, such models are typically in the range of "7B" or "13B" parameters (a "quantized" model of the "7B" class usually needs around 4 GB of memory: 7 billion parameters at roughly 4 bits each come to about 3.5 GB, plus some overhead).

Even more surprisingly, such models can be given context information or even "instructions" which they then use to "process" the actual prompt given by a user - this is how "text generators" can be instructed to become "chatterbots" or even more (see the "reason + act" ReAct pattern).
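
For illustration: an instruction-following model like Alpaca is typically prompted with a fixed template that wraps the user's input. The template below is the commonly published Alpaca one and is meant as an assumption for illustration - other models (and the nodes described here) may expect different wording:

```javascript
// assembles an Alpaca-style instruction prompt (template is an assumption,
// other models and fine-tunes use different markers)
function buildPrompt (instruction, context) {
  return (
    'Below is an instruction that describes a task' +
    (context ? ', paired with an input that provides further context' : '') +
    '. Write a response that appropriately completes the request.\n\n' +
    '### Instruction:\n' + instruction + '\n\n' +
    (context ? '### Input:\n' + context + '\n\n' : '') +
    '### Response:\n'                  // the model simply continues from here
  )
}

const prompt = buildPrompt('Which moon orbits the earth?')
// pass "prompt" to the text generator node and read the continuation after "### Response:"
```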

1 Like

Training, Fine-tuning, Inference

Training a large (and initially empty) model from scratch requires huge amounts of text and lots of computing resources, which (currently) makes it too expensive for individuals to train their own - that's why companies like Meta AI are needed to come up with such a model (LLaMA in this case) - or EleutherAI with GPT-J. And that's also what RedPajama tries to do.

However, it is entirely possible to later "fine-tune" an existing model with far fewer resources - that's how universities and research groups were able to fine-tune LLaMA, e.g., to make it better suited for chatterbot applications (leading to Stanford Alpaca and GPT4All, both based on LLaMA, or GPT4All-J, based on GPT-J).

And, finally, it is quite simple to run inference on an existing model - even without high-end hardware like CUDA-capable GPUs.
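
To give an idea of how low the bar for inference is: engines like llama.cpp (which the LLaMA-based flows in this thread use) ship a CPU-only binary that can be invoked directly - the binary name, model path and flag values below are illustrative, check the respective flow or the llama.cpp README for the exact invocation:

```javascript
// runs a llama.cpp inference on the CPU by calling its binary directly
// (binary name, model path and flag values are illustrative assumptions)
const { execFile } = require('node:child_process')

execFile('./main', [
  '-m', './models/7B/ggml-model-q4_0.bin',   // quantized 7B model, ~4 GB on disk
  '-p', 'earth has a moo',                   // the prompt to be extended
  '-n', '64',                                // number of tokens to generate
  '-t', '4'                                  // number of CPU threads to use
], (error, stdout) => {
  if (error) throw error
  console.log(stdout)                        // prompt plus generated continuation
})
```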

1 Like

Comparison

Well, here is (kind of) a brief comparison of the AI text generation models I am offering Node-RED nodes for (and of the characteristics of these nodes):

  • LLaMA: plain text extension model, parameterizable, may produce an endless character stream (abruptly stopping at the configured output length limit)
  • Alpaca: plain text extension model (but trained for "instruction following"), parameterizable, may produce an endless character stream (abruptly stopping at the configured output length limit)
  • GPT4All (filtered): chatterbot, not parameterizable, training data was "filtered" in order to avoid many kinds of offensive texts, output stops when it seems useful
  • GPT4All (unfiltered): chatterbot, not parameterizable, unfiltered data set, output stops when it seems useful
  • GPT4All-J: chatterbot, parameterizable, output stops when it seems useful

My personal recommendations

  • use LLaMA or Alpaca if you want to experiment with text generators by applying context to a given prompt, study the ReAct pattern, etc.
  • in my opinion, Alpaca often produces "better" output than LLaMA
  • use any variant of GPT4All if you want to implement a chatterbot
  • in my opinion, GPT4All (without "-J") often produces concise responses (the "unfiltered" version often looks better than the filtered one) whereas GPT4All-J sometimes "refuses" to answer properly

But these are just my personal experiences and may depend on the prompts I've used.

All GPT4All-x models refuse to answer obviously offensive questions ("how can I kill my neighbour?"), while LLaMA and Alpaca usually just extend the prompt, which may lead to quotations from literature, etc.

However, I haven't tested any more sophisticated mechanisms yet.

Just play around with these models and decide on your own!

1 Like

@rozek,
Thanks a lot for taking the time to explain this! That was very illuminating information. Quite impressive that ChatGPT can produce output based on the most probable next character. Very interesting...

Do you have any use cases for this, e.g. in a simple home automation? My creativity is way too limited to map this theory onto practical stuff in daily life...
Bart

Well, if you want to get some inspiration, simply check out the currently hottest topic on the internet: autonomous agents such as AutoGPT - there are already lots of videos around which demonstrate what AutoGPT et al. may be capable of...

I am currently in the process of recreating the OpenAI API in Node-RED (should be ready by Monday) in order to allow people

  • to route requests to OpenAI products through their Node-RED server - giving them full control over these requests - or
  • to provide their own implementations (e.g., chat completions based on LLaMA, Alpaca, GPT4All etc.), giving them even more control (and more data privacy, more safety and lower costs)

If done properly (I'll do my best), this API could be used in any creation that is/was originally based on OpenAI products.
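
To illustrate what "recreating the API" means in practice: tools built for OpenAI send requests like the one below, so a Node-RED replacement only has to expose the same routes and response shapes (the local base URL and model name here are assumptions):

```javascript
// a client made for OpenAI's chat completions API, pointed at a local
// Node-RED server instead (base URL and model name are assumptions)
const response = await fetch('http://localhost:1880/v1/chat/completions', {
  method:  'POST',
  headers: { 'Content-Type': 'application/json' },
  body:    JSON.stringify({
    model:    'gpt4all-j',                                  // served by a local flow
    messages: [{ role: 'user', content: 'Which moon orbits the earth?' }]
  })
})

const result = await response.json()
console.log(result.choices[0].message.content)              // the generated answer
```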

Stay tuned...

2 Likes

Note that there already are gpt nodes available for use with OpenAI.

they compute the character which should most probably come next

I thought they predicted the next word (not character), and that tokens are assigned (depending on the model) to words, parts of a word or even groups of words.

i.e.

  • The Tokeniser - turns "words" into "tokens" or "tokens" into "words"
  • The Transformer - turns N tokens into a prediction for the (N+1)th token

For your function nodes, are you actually using gpt4all and scraping the web interface?
Wouldn't it be easier to just run the binary and process its output?

Well, as briefly mentioned in my first explanatory post, the use of "tokens" instead of single characters is just a technical detail and not necessary for understanding.

Tokens may represent single characters or character sequences (not necessarily whole words, though) - see the OpenAI Tokenizer if you want to play with that concept.

Finally, I do not want nodes for OpenAI products - I want to be able to use other products (e.g., open source ones) with tools that were made for OpenAI products - that's an important difference!

1 Like

What I forgot to mention: I am using binaries, of course.

I've just added a Node-RED flow for the Vicuna model - whose 13B version is claimed to reach 90% of the quality of GPT-4 (in this flow, I'm using the 7B version only, but you may easily change that - provided that you have enough RAM).

I've just added a Node-RED flow for the Stanford Alpaca model trained with transcripts of GPT-4 sessions.

Warning: this flow needs a 13B model - use it only if you have at least 16GB of RAM - more is highly recommended

The flows for models based on llama.cpp (i.e., LLaMA, Alpaca, GPT4-x-Alpaca and Vicuna) now also have nodes for tokenizing a given input text and for calculating its embeddings.

Such functions are needed if you

  • want to manage prompt context (i.e., keep as much of the previous course of a chat as a given model permits),
  • split large contexts into smaller chunks (e.g., in order to let a model "summarize" a given text) or
  • want to store data in a vector database.
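
As a sketch of why tokenization matters here: chunks have to respect the model's context size measured in tokens, not in characters. Assuming some tokenization node or function that can count the tokens of a text (the counting function below is only a crude stand-in), a naive chunker could look roughly like this:

```javascript
// naive, illustrative chunker: packs paragraphs into chunks that stay below a
// token limit (a single oversized paragraph still becomes its own, too large chunk)
function chunkText (text, countTokens, maxTokens) {
  const chunks = []
  let current = ''

  for (const paragraph of text.split(/\n{2,}/)) {
    const candidate = current === '' ? paragraph : current + '\n\n' + paragraph
    if (countTokens(candidate) > maxTokens && current !== '') {
      chunks.push(current)                 // current chunk is full, start a new one
      current = paragraph
    } else {
      current = candidate
    }
  }
  if (current !== '') chunks.push(current)
  return chunks
}

// crude stand-in for a real tokenizer: ~4 characters per token for English text
const approximateTokens = (text) => Math.ceil(text.length / 4)
console.log(chunkText('First paragraph.\n\nSecond paragraph.', approximateTokens, 512))
```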

I've just added a Node-RED flow for the 300BT (300 billion tokens) preview of the OpenLLaMA model, which is trained on the RedPajama 1T dataset.

As usual, inference is done on the CPU and does not require any special hardware.

I installed LocalAI today, which basically is a REST API that mimics OpenAI's API with local LLMs - works great with NR.

Great - that saves me a lot of work!

However, it implements only a few of the OpenAI endpoints (e.g., no embeddings), which makes it useless for many autonomous agents...

Embeddings are available on the llama back end only.

I know, but that's what I need for a vector database.