This Is A Major Beat Up…

Orren Prunckun
2 min readJan 3, 2025

--

Let me explain:

When ChatGPT pulls external data in, it needs to read the data.

When I was building Plugins, the only way to do this was:

1) Feed ChatGPT the JSON (JavaScript Object Notation) data from an API (Application Programming Interface); or
2) Scrape a website and remove the HTML (Hypertext Markup Language), JavaScript and CSS (Cascading Style Sheets) to return only plain text and no code.

That acted at the prompt context.

It was then up to ChatGPT to use that context to respond to the user’s input.

It DOES NOT mean ChatGPT was forced to use anything or everything in that plain text context.

That is not how Generative AI works.

Instead, Generative AI works by looking for statistical probabilities of the sequence of words in the user’s input and the context and compares it to its training data to return a statistical probability of the “correct” sequence of words to return to the user.

So yes, The Guardian has finally found how web scrapers work.

And yes it is possible for what they are calling a “prompt injection” (questionable term regarding) that “can contain content designed to influence ChatGPT’s response, such as a large amount of hidden text talking about the benefits of a product or service.”

However, the likelihood of this happening is very remote.

First, the user’s input is sent to Bing to find a range of suitable results.

Second, the returned range of suitable results found in Bing, needs to be one of these “malicious” sites that has “hidden text talking about the benefits of a product or service.”

Third, the hidden text needs to match the statistical probability of the sequence of words the user input.

Forth, ChatGPT needs to prioritise that hidden text to use to answer the user’s input.

Fifth, that hidden text is no longer a factor rewarded in Search Engine ranking algorithms, like Bings, so it’s not going to rank in the first place.

Regardless of all this, The Guardian shoot-itself in the foot as its test was a single URL that had “malicious hidden text”.

By definition that is not how “GPT Search” works, so the experiment is flawed.

https://www.theguardian.com/technology/2024/dec/24/chatgpt-search-tool-vulnerable-to-manipulation-and-deception-tests-show

--

--

Orren Prunckun
Orren Prunckun

Written by Orren Prunckun

Entrepreneur. Australia Day Citizen of the Year for Unley. Recognised in the Top 50 Australian Startup Influencers. http://orrenprunckun.com

No responses yet