Of course, this is a very large change and would need considerable, careful thought, but in general I agree it would be much better to have the flow split. It might even improve performance slightly in certain cases, though possibly not so much on Windows or a Pi. Performance would need to be looked at as well.
I do know that the focus at present is on getting v1 over the line so I don't imagine there is any bandwidth available right now but opening the dialogue would be good.
Perhaps someone could even work up a proof of concept if the wider consensus indicated this as a good idea?
There are merits in splitting up the flow file, but there are also limits to where it goes.
One of the reasons we made the storage layer pluggable was so that people could create their own ways of storing flows. So you can of course try a lot of this out for yourself - however that will mean you'd have to do the git management outside of the editor.
When we introduced the Projects feature, I spent a long time trying to work out how it could also be abstracted out so storage plugins could also provide the same level of version control and management. Ultimately it just wasn't feasible to do.
With the specifics of your proposal - I think splitting it down to individual jsonata expressions is way, way too far. You don't put individual JavaScript statements into individual files.
You say it becomes easier to push/publish only parts of a project. I don't see how that is true if you now have a few hundred files with no indication of which flow or subflow they belong to.
If (and it's still a big if) we were to look at splitting the flow file, it would at most be putting each flow/subflow into its own file.
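For illustration, the per-flow/subflow split could look something like the sketch below (my own code, not an existing Node-RED feature). It relies on the flow file format: tab and subflow definition nodes carry their own `id`, every other node carries a `z` property pointing at the tab or subflow that owns it, and config nodes without a `z` are global:

```javascript
// Group a flat flow array by owning tab/subflow, ready to be written out
// as one file per group. This is a sketch of the idea, not a real feature.
function splitByFlow(flows) {
  const groups = { _global: [] };
  // First pass: create a group per tab/subflow, headed by its definition node
  for (const node of flows) {
    if (node.type === "tab" || node.type === "subflow") {
      groups[node.id] = [node];
    }
  }
  // Second pass: attach every other node to its owner; config nodes
  // without a z property land in the shared global group
  for (const node of flows) {
    if (node.type === "tab" || node.type === "subflow") continue;
    (groups[node.z] || groups._global).push(node);
  }
  return groups; // e.g. write each entry to "<id>.json"
}
```

Concatenating the groups back together on load would recover the single array the runtime expects, which is what keeps this option at least plausible compared with finer-grained splits.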
There is also the option of saving as YAML rather than JSON, as that can help with diffing properties that span multiple lines.
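To illustrate why YAML helps here (the property shown is a hypothetical Function node body, shortened for the example): JSON has to store a multi-line script as one escaped string, so any edit shows up in `git diff` as a single changed line, while a YAML block scalar keeps each source line separate:

```yaml
# JSON stores the whole function body as one escaped line:
#   "func": "let total = 0;\nfor (const v of msg.values) {\n  total += v;\n}\nreturn msg;"
# The same property as a YAML block scalar diffs line by line:
func: |
  let total = 0;
  for (const v of msg.values) {
    total += v;
  }
  return msg;
```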
The problem with putting particular node properties into their own file (such as the JavaScript from the Function node) is we would not want to hardcode knowledge of which properties should be treated that way in the storage layer. For example, if we hardcode the Function node, then why not the Template node, then the UI Template node, then the Python Function node, and then... the list goes on and on. So whilst it is easy to say it should be stored in a separate file, in reality there are problems with how that works.
We are facing similar challenges with a larger project, currently with these stats:
~3.2 MB flow file (formatted with pretty-print option for Git)
33 tabs
4644 nodes
18 subflows
No problems with the runtime, everything works perfectly without any performance issues.
However, as described in the previous posts, it is really hard to track the changes made to the flows in Git.
Another problem arises with the flow editor. We have to coordinate the work, so only one person makes changes at a time, because the browser (Chrome) is no longer able to merge the changes within a reasonable time.
Maybe splitting up the flow file into smaller pieces (one file per tab) could prove beneficial in this area, so the editor would only have to merge the parts (tabs) that have actually changed... assuming this information were provided by the back-end API, of course.
No criticism here, just wanted to share our experience with a larger project.
How things get represented on disk has no bearing on the fact that the editor still loads the flows as a single JSON object. That is unlikely to change.
I know there are improvements to be made to the diff/merge tool. For example, being able to automatically merge changes in the background without interrupting the user. We also need to find ways to make the diff algorithm more efficient; but the fact remains that the more nodes you have, the more things it has to compare to find what has changed. If anyone wanted to help improve the diff algorithm, it would be a welcome contribution.
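For anyone curious where the cost comes from, here is a hedged sketch of the id-keyed comparison at the heart of such a diff (my own simplification, not Node-RED's actual implementation). Indexing both flow arrays by node id keeps the scan linear in the number of nodes, but every node still has to be compared:

```javascript
// Compare two flow arrays by node id and report added/removed/changed ids.
// Note: JSON.stringify comparison is order-sensitive on object keys, which
// is fine here only because Node-RED serializes nodes consistently.
function diffFlows(oldFlows, newFlows) {
  const oldById = new Map(oldFlows.map(n => [n.id, n]));
  const newById = new Map(newFlows.map(n => [n.id, n]));
  const added = [], removed = [], changed = [];
  for (const [id, node] of newById) {
    if (!oldById.has(id)) {
      added.push(id);
    } else if (JSON.stringify(node) !== JSON.stringify(oldById.get(id))) {
      changed.push(id);
    }
  }
  for (const id of oldById.keys()) {
    if (!newById.has(id)) removed.push(id);
  }
  return { added, removed, changed };
}
```

Even with the O(n) indexing, the per-node comparison work grows with node count, which matches the observation above that larger flows make the diff slower.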
Yes, you are right, of course. It was just an idea that, if the runtime knew about the different flow parts, this information could also be incorporated into the admin API and made available to the editor. But I can imagine that it would make flow handling a lot more complex on both ends.
Maybe I was going too far in suggesting that everything should be split into separate files. But I still think that splitting flows and subflows into files (could be YAML) would help versioning a lot.
Also, the merge tool in Node-RED is not the only option for managing versions. Personally, I never managed to make it work; it just doesn't in our environment. So we use TortoiseGit and it's fine. And even if I got it working, I'd probably still use TortoiseGit because it's a complete tool made for the purpose. There is no need to reinvent the wheel; there are tools to manage files in Git. But if Node-RED could be more friendly to these tools, that'd be great.
I also think it's just... not good practice to have such a large file.