ChatGPT losing its training data

Where is ChatGPT going to get future training data as people switch to AI for answers?
Stack Overflow's monthly question count is falling.

This also raises the question: is OpenAI stealing Stack Overflow's intellectual property?

A good question. Part of the answer, of course, is that as long as people keep publishing open repositories and documentation, there will still be new data to work with. The other data that the models will hopefully take on board is feedback from the chat itself. For example, when you tell the AI it is being stupid and the solution does not work, that is feedback it should take into account. Whether it does or not, I'm not sure.

But certainly, the future of LLMs is going to continue to evolve. One obvious way, and we are already starting to see this, is that the big models will be used privately with new data overlaid on top rather than updating the core model. This produces more specialised models, which is both exciting - since specialist models can be more accurate and less prone to making stuff up - and worrying - because it opens a lot of opportunities for proprietary, expensive models.

The easy answer to that is no, since the SO data is freely provided and freely shared. SO's IP lies in their website and processes, not in the information that the public provides.

Of course, IP questions rarely follow common sense and I'm no IP lawyer so who knows! But I think that book publishers have a better case.

There IS one thing that I think we should all note though. AIs are the exclusive club of rich billionaires (well, and the Chinese government). The opportunities for manipulation are immense and scary.

I mostly agree with you here, but taking another's answers and serving them up (although rewritten) as your own is a very grey area until litigation settles it. One argument would be that it's a program and not a sentient life form. Would it be OK for another search engine to serve up rewritten answers from another search engine?

I don't have an answer. But I can make a prediction. Based on my 2nd comment, you can bet that the AI services will win and everyone else will lose.

There are some popular sites that people use for information: Twitter, YouTube, Facebook, etc.

These have become almost useless in recent months as all of the information there is AI-generated tosh.

Far from taking over, if AI doesn't smarten up it's going to kill the internet.

...or itself?

Whose property is it? The people who provide answers? Are people stealing 'intellectual property' from SO when they copy/paste from it? I don't see much difference, tbh.

The models have already been trained; now they have a base model that can use tools (MCP, ACP) to gain further knowledge. If I download some C++ class somewhere, it can perfectly well 'reason' its way through it, and their context windows are so large they remember a lot without hallucinating. For coding it will be fine; other things may be a different story.

... currently especially healthcare. Which is particularly problematic when the AI summary is the first/main answer.

... or you

No, as that was the purpose the site was set up for. But was it set up so a third-party company could scrape all the answers and serve them up as their own, using a program to disguise the source of the original data? There is a nuance here and a line that will really need sorting out at some point.

but you get paid in "points" :slight_smile:

Please explain how ChatGPT pays the original question responders from SO, and what these points are worth.

sorry - I mean that SO "pays" people who answer questions with reputation points - https://stackoverflow.com/help/whats-reputation - so in some sense they could claim to own the answers - but yes, agreed, ChatGPT and the others just scrape them.

I may have some knowledge in this area. :wink:

In truth, this is a microcosm of the wider world. There are some amazing advances in AI use in clinical settings, and these are already beginning to show real benefits both for patients and clinical staff. These are the specialist AIs that I already mentioned, though.

Where use is not so good is in general administration, or with patients turning to generic AIs producing generic - and sometimes made-up - answers. Generic AIs are trained to suck up to you to try and hook you into continuing to use them. We can clearly see some very dangerous things happening here.

Also not so good is where we already see companies trying to lock up specialist AI tools. Since AI hasn't shown a profit anywhere yet - by a long way - you can expect this to get very much more expensive.

Although I probably don't understand the question, here's my take.

(And nothing against people here. It is a general statement.)

I am in a few forums, and a lot of the time when I ask questions the answers aren't always of benefit.

Asking ChatGPT - though not perfect - I get answers a lot quicker that are usually near what I want.
After a few re-askings, it nearly works.

Via forums it could take days.

Alas it worries me that we are getting hooked on using these things now, while they are free.

I'm not sure how good this is for us as people, or for our ability to interact with others - using AI (IMO) doesn't help.

I imagine AI bots will continue developing the dead internet theory, feeding themselves and spiraling into the abyss.

I love this question! I never gave that any thought, but I'm sure AI learns and will continue to learn from far more than just support forums.

I just started using ChatGPT a month ago and have been down rabbit-holes ever since. One of the things I used it for was a dashboard I recently converted from Dashboard 1 to uibuilder (with the help of Julian and others), which was fully functional. I fed it into ChatGPT and have used it to make my dashboard 100x better.

I do hope you can share at least some of it in the forum so as to inspire others and help them understand that maybe UIBUILDER isn't as hard as people sometimes think. :smiley:

Just come across this article in The Register...

Who invented AI? Humans. Who controls AI? Humans. Who uses AI? Humans. Who believes in AI? Humans.

Therefore that sentence should be "if humans don't smarten ...."

I think blaming complex algorithmic processes for one's own failings is counterproductive.

I'll be happy to, Julian!

My location and ride names are intentionally concealed.

Here's what I started with (before and after I converted to uibuilder):

Here's where I'm at now with the assistance of ChatGPT:

I used ChatGPT to improve what I build to make it look more appealing, improve existing and implement new functions, and learn from the code it generated. One of my "demands" from AI was to heavily document any and all code it generated to help me learn from it. It also found and corrected bugs in my code.

I'm in the process of taking this all a few steps further. What you see here is the only interface to the system. There's a lot that has to be done to add a ride. Each ride has a Raspberry Pi or VM running Node-RED with a flow that reports everything back to the system, which outputs this display. Everything is provisioned by Ansible and, when you get right down to it, the overall system (while fully functional and reliable) could be simplified. I decided to have ChatGPT assist me with that, and what it has done so far IS drastically simplify this infrastructure.

So far, it's down to a one-line install command that installs the server and all required software to get things running. The part of the project I'm working on right now is the admin interface. Specifically, I'm working on the part that imports status and fault messages.
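For anyone wondering what a "one-line install" of this sort usually boils down to: a single command fetches and runs a bootstrap script, e.g. `curl -fsSL https://example.com/install.sh | bash`. Below is a minimal sketch of what such a bootstrap script might contain. The URL, function, and package names are placeholders for illustration - this is not the actual installer from this project - and the installs are echoed rather than executed so the sketch is safe to run anywhere.

```shell
#!/usr/bin/env bash
# Hypothetical bootstrap installer sketch: set up a Node-RED based server.
# Real package installs are replaced with echo statements for safety.
set -euo pipefail

install_all() {
    # In a real installer, each of these would be an apt-get/npm install.
    for pkg in nodejs node-red node-red-contrib-uibuilder; do
        echo "installing ${pkg}"
    done
    # A real script would then register Node-RED as a service and start it.
    echo "done"
}

install_all
```

The appeal of the pattern is that the user needs no prior setup beyond `curl` and `bash`; the script itself carries all the installation logic and can be updated server-side without changing the one-liner.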




All of these screens are done with uibuilder with ChatGPT's assistance and I still have way more to do. The end goal is to have a simple installer that anyone can run and then configure the system. The flow will also control and monitor Ansible for remote provisioning of the ride flows.

In the end, it was AI that helped me make SIGNIFICANT improvements to a system I had already built and had running. You're going to have people who call it lazy or "not real programming", but the fact of the matter is AI is just another tool we can use not only to speed up development, but also to learn from, if we allow ourselves to.

For anyone who gets the wrong idea about programming using AI... It's not a matter of telling it what you want and walking away. You need to learn the idiosyncrasies of doing it this way, guiding it just as much as it guides you, and knowing it WILL get things wrong at times and you'll need to correct it. Think of yourself as the supervisor and AI as an employee.

You can tell it the end result you want and it will start asking you questions about your preferred language or other technologies. I've told it to ask me more questions to determine what's best, and I've also told it I'm set on what I have. Hell, I asked it if there were alternatives to Node-RED that it felt would be better for this particular project. It asked me a few questions. I answered them. It told me Node-RED was my best option and also compared the alternatives, explaining exactly why NR was the best candidate.

A lot of these systems don't have the backend resources to test code, and AI's code requires just as much debugging as your own would. It can be frustrating at times. You can spend a full day troubleshooting a problem before the bad code finally gets found. If you build your own in-house AI system, you could absolutely set it up to test and debug its own code, but most people aren't going to do that.

Chris
