ChatGPT losing its training data

Where is ChatGPT going to get future training data as people switch to AI for answers?
Stack Overflow's monthly questions are falling.

This also raises the question: is OpenAI stealing Stack Overflow's intellectual property?

A good question. Part of the answer, of course, is that as long as people keep publishing open repositories and documentation, there will still be new data to work with. The other data that hopefully the models take on board is feedback from the chat itself. For example, when you tell the AI it is being stupid and the solution does not work, that is feedback it should take into account. Whether it does or not, I'm not sure.

But certainly, the future of LLMs is going to continue to evolve. One obvious way - and we are already starting to see this - is that the big models will be used privately, with new data overlaid on top rather than updating the core model. This produces more specialised models, which is both exciting - since specialist models can be more accurate and less prone to making stuff up - and worrying - because it opens a lot of opportunities for proprietary, expensive models.
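To make that concrete, the "overlay" usually means retrieving your own documents and stuffing them into the prompt rather than retraining anything. A minimal sketch in Python, assuming a naive keyword-overlap retriever; the actual base-model call is left out, since that depends on whichever service you use:

```python
# Minimal sketch of "overlaying" private data on a fixed base model:
# retrieve relevant snippets and put them in the prompt instead of retraining.

def retrieve(question: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the question."""
    q_words = set(question.lower().split())
    scored = [(len(q_words & set(doc.lower().split())), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]

def build_prompt(question: str, documents: list[str]) -> str:
    """Overlay the private snippets on the base model via the prompt."""
    context = "\n".join(f"- {snippet}" for snippet in retrieve(question, documents))
    return (
        "Answer using only the context below; say 'unknown' if it isn't covered.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

if __name__ == "__main__":
    private_docs = [
        "Board rev C uses a 3.3 V regulator; rev B used 5 V.",
        "The bootloader lives at flash offset 0x08000000.",
    ]
    print(build_prompt("What voltage does board rev C run at?", private_docs))
```

The point is that the core model never changes; only the context it sees does, which is how "specialised" models can appear without anyone touching the underlying weights.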

The easy answer to that is no, since the SO data is freely provided and freely shared. SO's IP lies in their website and processes, not in the information that the public provides.

Of course, IP questions rarely follow common sense and I'm no IP lawyer, so who knows! But I think that book publishers have a better case.

There IS one thing that I think we should all note, though. AIs are the exclusive club of rich billionaires (well, and the Chinese government). The opportunities for manipulation are immense and scary.

I mostly agree with you here, but taking another's answers and serving them up (although rewritten) as your own is a very gray area until litigation settles it. One argument would be that it's a program and not a sentient life form. Would it be OK for one search engine to serve up rewritten answers from another search engine?

I don't have an answer. But I can make a prediction. Based on my 2nd comment, you can bet that the AI services will win and everyone else will lose.

There are some popular sites that people use for information: Twitter, YouTube, Facebook, etc.

These have become almost useless in recent months, as all of the information there is AI-generated tosh.

Far from taking over, if AI doesn't smarten up it's going to kill the Internet.

...or itself?

Whose property is it? The people who provide the answers? Are people stealing 'intellectual property' from SO when they copy/paste from it? I don't see much difference, tbh.

The models have already been trained; now there is a base model that can use tools, MCP, and ACP to gain further knowledge. If I download some C++ class from somewhere, it can 'reason' its way through it perfectly well, and the context windows are so large they remember a lot without hallucinating. For coding it will be fine; other things may be a different story.
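Roughly, "reasoning its way through a downloaded C++ class" just means the file gets packed into the prompt. A minimal sketch, assuming a 128k-token window and a crude 4-characters-per-token estimate (both assumptions, not the real limits of any particular model); the model call itself is omitted:

```python
# Rough sketch of handing a downloaded source file to a large-context model:
# read the file, estimate the token budget, truncate if it won't fit.

MAX_CONTEXT_TOKENS = 128_000      # assumed window size; varies by model
CHARS_PER_TOKEN_ESTIMATE = 4      # crude rule of thumb, not exact
RESERVED_FOR_ANSWER = 2_000       # leave room for the question and the reply

def pack_source_into_prompt(path: str, question: str) -> str:
    """Build one prompt carrying the downloaded source plus the question."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        source = handle.read()
    budget = (MAX_CONTEXT_TOKENS - RESERVED_FOR_ANSWER) * CHARS_PER_TOKEN_ESTIMATE
    if len(source) > budget:
        source = source[:budget] + "\n// ...truncated to fit the context window\n"
    return f"Here is a C++ file I downloaded:\n\n{source}\n\nQuestion: {question}"
```

Everything after that is ordinary prompting; the "tool use" part is mostly the model deciding when to read which file.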

... currently, especially healthcare. Which is particularly problematic when the AI summary is the first/main answer.

... or you

No, as that was the purpose the site was set up for. Was it set up so a third-party company could scrape all the answers and serve them up as theirs, using a program to disguise the source of the OG data? There is a nuance here, and a line that will really need sorting out at some point.

but you get paid in "points" :slight_smile:

Please explain how ChatGPT pays the OG question responders from SO, and what these points are worth.

Sorry - I mean that SO "pays" people who answer questions with reputation points - https://stackoverflow.com/help/whats-reputation - so in some sense they could claim to own the answers - but yes, I agree that ChatGPT and the others then just scrape them.

I may have some knowledge in this area. :wink:

In truth, this is a microcosm of the wider world. There are some amazing advances in AI use for clinical settings, and these are already beginning to show real benefits for both patients and clinical staff. These are the specialist AIs that I already mentioned, though.

Where use is not so good is in general administration, or with patients turning to generic AIs producing generic - and sometimes made-up - answers. Generic AIs are trained to suck up to you to try and hook you into continuing to use them. We can clearly see some very dangerous things happening here.

Also not so good is where we already see companies trying to lock up specialist AI tools. Since AI hasn't shown a profit anywhere yet - by a long way - you can expect this to get very much more expensive.

Although I probably don't understand the question, here's my take.

(And nothing against people here. It is a general statement.)

I am in a few forums, and a lot of the time when I ask questions the answers aren't of much benefit.

Asking ChatGPT - though not perfect - I get answers a lot quicker, and they are usually near what I want.
After a few re-askings, it nearly works.

Via forums it could take days.

Alas it worries me that we are getting hooked on using these things now, while they are free.

I'm not sure how good this is for us as people and for our ability to interact with others - using AI (IMO) doesn't help.

I imagine AI bots will continue developing the dead internet theory and feed on themselves, spiraling into the abyss.