The Misleading Myth Of AI Agents…
This weekend, I was exploring the concept of building “AI Agents.”
I dislike the term, though, for reasons you’ll soon see.
The term “AI Agents,” as it is often used now, simply refers to automated software with Generative AI capabilities tacked on.
It’s not as ground-breaking as the name might suggest.
When people talk about “AI Agents,” they are typically referring to software (not physical robots) that can interact with other software, often via web browsers.
Now, on a website via a web browser, users commonly perform actions such as:
1) Clicking — this includes single-clicking, double-clicking, and right-clicking on links, buttons, checkboxes/radio buttons, images, or icons etc;
2) Navigating — scrolling vertically or horizontally, triggering hover effects, dragging and dropping, zooming, etc;
3) Interacting — highlighting, copying, or dragging elements etc;
4) Inputting — typing into text fields, selecting options from dropdowns, checking checkboxes/radio buttons, picking dates, adjusting sliders, uploading files, pasting, etc.
5) Submitting — sending forms, refreshing pages, and moving forward or backward in the browser.
“AI Agents,” whether autonomous or not, aim to replicate these actions.
However, Generative AI itself cannot perform these tasks directly, nor is it designed to.
Autonomous AI Agents theoretically decide on actions like clicking, navigating, interacting, or submitting inputs on websites.
However, the technology for reliable, accurate, independent decision-making in such scenarios is still in its infancy.
Current “AI Agents” can only execute predefined automation tasks and are not autonomous.
Generative AI, on the other hand, specialises in creating unique outputs such as text, images, audio, or video.
It can also “highlight or copy” text to analyse it.
But that’s where its role ends.
It does not handle tasks like clicking buttons or navigating pages.
These actions require automation software, not generative capabilities.
In short, Generative AI generates; it doesn’t perform actions.
“AI Agents” combine automation and generative capabilities.
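To make that split concrete, here is a minimal sketch of how the two halves might fit together. Everything in it is an assumption for illustration: `FakeBrowser` is a stand-in that only records actions rather than driving a real browser, `stub_generate` is a placeholder where a real text-generation call would go, and the `#message` and `#submit` selectors are made up.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for an automation driver; it only logs actions."""
    page_text: str
    log: List[str] = field(default_factory=list)

    def read_text(self) -> str:
        self.log.append("read")
        return self.page_text

    def type_into(self, selector: str, text: str) -> None:
        self.log.append(f"type {selector}: {text}")

    def click(self, selector: str) -> None:
        self.log.append(f"click {selector}")


def stub_generate(context: str) -> str:
    """Placeholder for a generative model call; it returns text and nothing else."""
    return f"Hello! I read your page about: {context[:20]}"


def run_agent(browser: FakeBrowser, generate: Callable[[str], str]) -> List[str]:
    context = browser.read_text()            # automation: read the page
    message = generate(context)              # generation: produce text only
    browser.type_into("#message", message)   # automation: fill in the field
    browser.click("#submit")                 # automation: submit the form
    return browser.log


if __name__ == "__main__":
    for entry in run_agent(FakeBrowser("Handmade oak furniture in Leeds"), stub_generate):
        print(entry)
```

Note that the generative step never touches the page. It receives text and returns text, and every click or keystroke belongs to the automation layer.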
While this represents an incremental improvement, it’s hardly revolutionary, as web automation has existed for decades.
What makes Generative AI particularly useful in web automation is its ability to replace traditional “Regular Expressions” in specific contexts.
“Regular Expressions” (Regex) are a programming tool used to find text that matches a precisely specified pattern.
For example, if you needed to locate the word “find” in a document, Regex could identify it precisely but wouldn’t catch synonyms like “discover” or “locate.”
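As a small illustration of that limit, here is a sketch using Python’s built-in `re` module. The sample sentence is made up for the example.

```python
import re

text = "Please find the report, then discover what changed and locate the owner."

# \b marks a word boundary, so "find" only matches as a whole word.
pattern = re.compile(r"\bfind\b", re.IGNORECASE)
matches = pattern.findall(text)
print(matches)  # ['find']

# The synonyms are in the text, but the pattern has no notion of meaning.
print(pattern.search("discover"))  # None
print(pattern.search("locate"))    # None

# Regex can only catch synonyms if every one is listed by hand.
listed = re.compile(r"\b(?:find|discover|locate)\b", re.IGNORECASE)
all_matches = listed.findall(text)
print(all_matches)  # ['find', 'discover', 'locate']
```

The second pattern works, but only because the synonyms were written out in advance. Any wording not on the list is missed.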
With advancements in Large Language Models (LLMs) and Natural Language Processing (NLP), Generative AI can now perform tasks like synonym recognition and sentiment analysis, which Regex cannot.
This allows for a more nuanced and flexible approach to web automation, making tasks like navigating and interacting with web content easier and more efficient.
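Here is a rough sketch of what that flexibility might look like when an automation script has to choose which link to follow. It is an illustration only: the link labels are invented, and `stub_same_meaning` is a hand-written lookup standing in for a language model’s judgement, not a real model call.

```python
import re
from typing import Callable, List, Optional

labels = ["Home", "Pricing", "Get in touch", "About us"]


def pick_by_regex(labels: List[str], pattern: str) -> Optional[str]:
    """Return the first label matching the pattern, or None."""
    for label in labels:
        if re.search(pattern, label, re.IGNORECASE):
            return label
    return None


def pick_by_meaning(
    labels: List[str], goal: str, same_meaning: Callable[[str, str], bool]
) -> Optional[str]:
    """Return the first label the judge says means the same as the goal."""
    for label in labels:
        if same_meaning(label, goal):
            return label
    return None


def stub_same_meaning(label: str, goal: str) -> bool:
    """Hypothetical stand-in for a model's judgement; a fixed lookup table."""
    known = {"contact": {"contact", "contact us", "get in touch", "reach us"}}
    return label.lower() in known.get(goal.lower(), set())


print(pick_by_regex(labels, r"\bcontact\b"))                  # None
print(pick_by_meaning(labels, "contact", stub_same_meaning))  # Get in touch
```

The pattern finds nothing because no label contains the word “contact”, while the meaning-based check picks “Get in touch”. In a real setup the lookup table would be replaced by a model call, and the clicking would still be done by the automation software.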
In essence, Generative AI is a significant improvement to how software interprets content, but not to the automation itself.
Generative AI enables smarter automation and interpretation of context, but it is still part of a broader ecosystem of tools required for comprehensive web automation.
Don’t be fooled by the charlatans out there.
They want your money, not for you to understand or succeed.
Proceed with caution, as always.
P.S. Yes, I have an “AI Agent” up and running that can navigate to a contact form, read the site for context, create a personalised message then submit it without a human.
P.P.S. No doubt I’ll do a video walk-through at some point.