How can I define a Pydantic schema for heterogeneous Node-RED workflows when using OpenAI structured outputs?

Hi, I am new to Node-RED and need help with a question. I have already posted it on Stack Overflow and am repeating it here:

I’m experimenting with generating Node-RED workflows using an LLM. As expected, the unstructured JSON responses are often messy or invalid.

To improve reliability, I’m trying to use OpenAI’s Structured Outputs feature. It works nicely with simple, fixed schemas, for example:

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

response = client.responses.parse(
    model="gpt-4o-2024-08-06",
    input=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
    ],
    text_format=CalendarEvent,
)

event = response.output_parsed

However, in my case, the target structure — a Node-RED flow — is much more complex and heterogeneous.

A Node-RED flow (or subflow) can contain many types of nodes, each with its own attributes. The Node-RED admin API docs describe the general structure, e.g.:

{
  "id": "1234",
  "label": "Sheet1",
  "nodes": [ ... ],
  "configs": [ ... ],
  "subflows": [ ... ]
}

My question is:
How should I define a Pydantic model (or a hierarchy of models) that can represent this flexible workflow structure, so that an LLM’s structured output can conform to it?

I understand that each node type may have its own schema, but I’m not sure how to model this polymorphism in a way that still works well with OpenAI’s Structured Outputs and responses.parse.
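
To make the polymorphism concrete: one common Pydantic pattern is a union of per-node-type models, each tagged with a Literal "type" field so validation picks the matching class. The following is only a minimal sketch under stated assumptions. The three node classes and their fields (payload, func, active) are heavily simplified placeholders rather than the full Node-RED node definitions, and the names BaseNode, Node and Flow are hypothetical. It checks local validation only; whether OpenAI's strict schema mode accepts the generated JSON schema has not been verified here.

```python
from typing import Literal, Union

from pydantic import BaseModel


class BaseNode(BaseModel):
    # Fields assumed to be shared by every node (simplified).
    id: str
    name: str
    wires: list[list[str]]


class InjectNode(BaseNode):
    type: Literal["inject"]
    payload: str  # simplified; real inject nodes carry more settings


class FunctionNode(BaseNode):
    type: Literal["function"]
    func: str  # JavaScript source of the function body


class DebugNode(BaseNode):
    type: Literal["debug"]
    active: bool


# Plain union: Pydantic tries each member and keeps the one that validates.
# The Literal "type" field makes exactly one member match per node.
Node = Union[InjectNode, FunctionNode, DebugNode]


class Flow(BaseModel):
    id: str
    label: str
    nodes: list[Node]


# Hand-written sample data, standing in for an LLM response.
raw = {
    "id": "1234",
    "label": "Sheet1",
    "nodes": [
        {"type": "inject", "id": "n1", "name": "tick",
         "wires": [["n2"]], "payload": "hello"},
        {"type": "function", "id": "n2", "name": "pass",
         "wires": [["n3"]], "func": "return msg;"},
        {"type": "debug", "id": "n3", "name": "out",
         "wires": [], "active": True},
    ],
}

flow = Flow.model_validate(raw)
node_kinds = [type(n).__name__ for n in flow.nodes]
print(node_kinds)
```

In principle, Flow could then be passed as text_format in the responses.parse call shown earlier, in place of CalendarEvent. Each additional node type would need its own class added to the union, so this approach scales with the number of node types the flow is expected to contain.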

Any examples or design patterns for handling this kind of heterogeneous JSON structure would be appreciated.
