Is AI Learning From Itself?

2 min read · Jan 8, 2025

The rise of AI tools like ChatGPT has sparked discussions about originality and feedback loops in machine learning.

A common concern is whether these models, by learning from web articles that might themselves have been generated by AI, could create a feedback loop that results in increasingly inaccurate information.

This is the ‘copy of a copy’ effect, where repeated replication leads to distortion. Think of Michael Keaton’s character in Multiplicity.

However, that’s not how ChatGPT works, and here’s why…

OpenAI trained its large language models (LLMs) like GPT on datasets such as Common Crawl, WebText2, Books1, Books2, Wikipedia, and Reddit before public LLMs were available.

The training process involves determining the statistical probabilities of word combinations from these datasets, a process that was completed before deployment.
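To make "statistical probabilities of word combinations" a little more concrete, here is a minimal sketch that counts which word follows which in a tiny made-up corpus. It is a toy bigram model, not how GPT is actually built (real LLMs use neural networks over tokens), and the function name `train_bigram_model` and the corpus are assumptions for illustration only.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus):
    """Count adjacent word pairs and turn the counts into probabilities."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        # Pair each word with the word that follows it.
        for current_word, next_word in zip(words, words[1:]):
            counts[current_word][next_word] += 1

    model = {}
    for current_word, followers in counts.items():
        total = sum(followers.values())
        # P(next word | current word) = pair count / all pairs starting here
        model[current_word] = {w: c / total for w, c in followers.items()}
    return model


# Hypothetical toy corpus standing in for a real training dataset.
toy_corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
]

model = train_bigram_model(toy_corpus)
print(model["the"])
```

Once `train_bigram_model` returns, the probabilities are fixed. Nothing in this sketch changes them later, which mirrors the "done before deployment" point above.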

Hence the ‘Pre-trained’ in Generative Pre-trained Transformer (GPT).

Once trained, the LLM powers applications like ChatGPT, which is a user interface with added features like instruction-following.

ChatGPT does not actively ‘train’ itself during interactions or by searching the web.

When versions with web-browsing capabilities pull in real-time context, it’s solely to provide relevant answers, not to update the model’s statistical understanding of language.
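The difference between reading fresh text and learning from it can be sketched in a few lines. This is a toy illustration under loud assumptions: `FrozenModel`, `build_prompt`, and the probability table are all hypothetical, and the toy only looks at the last word of the prompt, whereas a real LLM conditions on the whole prompt. The point is only that retrieved text goes into the input while the stored parameters stay the same.

```python
import copy
from types import MappingProxyType

# Hypothetical parameters, fixed when training finished.
TRAINED_PROBABILITIES = {
    "the": {"cat": 0.5, "mat": 0.25, "fish": 0.25},
    "cat": {"sat": 0.5, "ate": 0.5},
}


class FrozenModel:
    """Toy stand-in for a deployed model whose parameters are read-only."""

    def __init__(self, probabilities):
        # MappingProxyType gives a read-only view of the top-level table.
        self.params = MappingProxyType(copy.deepcopy(probabilities))

    def predict_next(self, prompt):
        words = prompt.lower().split()
        if not words:
            return None
        followers = self.params.get(words[-1], {})
        if not followers:
            return None
        # Pick the most probable follower; this only reads the parameters.
        return max(followers, key=followers.get)


def build_prompt(question, retrieved_text=""):
    """Place retrieved web text in front of the question as extra context."""
    return f"{retrieved_text} {question}".strip()


model = FrozenModel(TRAINED_PROBABILITIES)
before = copy.deepcopy(dict(model.params))

prompt = build_prompt(
    "describe the",
    retrieved_text="Fresh web page text about a fish market.",
)
prediction = model.predict_next(prompt)

after = copy.deepcopy(dict(model.params))
print(prediction, before == after)
```

The retrieved text lives in `prompt` for a single call and is then discarded; `before` and `after` hold the same table.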

While it’s possible that AI-generated articles could end up in future training datasets, leading to potential feedback loops, this isn’t an issue for a model like ChatGPT today, because its training phase is already complete.


Written by Orren Prunckun
