Hi, I'm new to Node-RED and have a question I could use help with. I originally posted it on Stack Overflow, but I'll paste it here as well:
I’m experimenting with generating Node-RED workflows using an LLM. As expected, the unstructured JSON responses are often messy or invalid.
To improve reliability, I'm trying to use OpenAI's Structured Outputs feature. It works nicely with simple, fixed schemas, for example:
```python
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

response = client.responses.parse(
    model="gpt-4o-2024-08-06",
    input=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
    ],
    text_format=CalendarEvent,
)

event = response.output_parsed
```
However, in my case, the target structure — a Node-RED flow — is much more complex and heterogeneous.
A Node-RED flow (or subflow) can contain many types of nodes, each with its own attributes. The Node-RED admin API docs describe the general structure, e.g.:
```json
{
  "id": "1234",
  "label": "Sheet1",
  "nodes": [ ... ],
  "configs": [ ... ],
  "subflows": [ ... ]
}
```
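A naive Pydantic mirror of that top-level structure is easy enough to write (a sketch; field names are taken from the snippet above, with plain `dict` as a placeholder for the node objects):

```python
from pydantic import BaseModel

class Flow(BaseModel):
    # Field names mirror the admin API snippet above
    id: str
    label: str
    nodes: list[dict]     # placeholder: each node type really has its own shape
    configs: list[dict]
    subflows: list[dict]

flow = Flow.model_validate({
    "id": "1234",
    "label": "Sheet1",
    "nodes": [{"id": "n1", "type": "inject"}],
    "configs": [],
    "subflows": [],
})
```

But as far as I can tell, free-form `dict` fields don't translate into a schema that Structured Outputs accepts (it wants every object fully specified), which is exactly where I'm stuck.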
My question is:
How should I define a Pydantic model (or a hierarchy of models) that can represent this flexible workflow structure, so that an LLM’s structured output can conform to it?
I understand that each node type may have its own schema, but I'm not sure how to model this polymorphism in a way that still works well with OpenAI's Structured Outputs / `responses.parse`.
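The pattern I've been experimenting with is a Pydantic discriminated (tagged) union keyed on the node's `type` field. Here is a sketch with a few made-up node models (the attribute lists are illustrative, not the real Node-RED node schemas; I've made all fields required since, as I understand it, Structured Outputs rejects optional fields):

```python
from typing import Annotated, Literal, Union
from pydantic import BaseModel, Field

# One model per node type; the `type` field is the discriminator.
class InjectNode(BaseModel):
    type: Literal["inject"]
    id: str
    name: str
    payload: str

class FunctionNode(BaseModel):
    type: Literal["function"]
    id: str
    name: str
    func: str

class DebugNode(BaseModel):
    type: Literal["debug"]
    id: str
    name: str

# Tagged union: Pydantic dispatches to the right model based on "type".
AnyNode = Annotated[
    Union[InjectNode, FunctionNode, DebugNode],
    Field(discriminator="type"),
]

class Flow(BaseModel):
    id: str
    label: str
    nodes: list[AnyNode]

flow = Flow.model_validate({
    "id": "1234",
    "label": "Sheet1",
    "nodes": [
        {"id": "n1", "type": "inject", "name": "tick", "payload": "42"},
        {"id": "n2", "type": "debug", "name": "out"},
    ],
})
print(type(flow.nodes[0]).__name__)  # InjectNode
```

This validates locally, but I don't know whether the `anyOf` schema Pydantic generates for the union is accepted as-is by `responses.parse`, or whether there's a better-supported pattern; that's part of what I'm asking.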
Any examples or design patterns for handling this kind of heterogeneous JSON structure would be appreciated.