WHY THIS MATTERS IN BRIEF
Although OpenAI didn’t create a fully autonomous AI capable of creating, executing, and updating its own software, among many other things, developers have found a way to hack it and turn it autonomous, which will give future regulators and others big issues.
Multiple developers are trying to make the now infamous Artificial Intelligence (AI) known as GPT-4, released about a month ago by OpenAI, “fully autonomous” by stringing together multiple instances of it so that it can do all kinds of things on its own, such as execute a series of tasks without intervention, write, debug, and develop its own code, and critique and fix its own mistakes in written outputs.
As opposed to just prompting ChatGPT to produce code for an application, something that anyone with access to the public version of OpenAI’s system can currently do, these “autonomous” systems could potentially make multiple AI “agents” work in concert with one another to develop a website, create a newsletter, compile online pages in response to a user’s inquiry, auto post to social media accounts, and complete all manner of other tasks comprised of multiple steps and an iteration process.
“AutoGPT,” as it’s come to be known, is an application that was trending on GitHub and was made by a game developer named Toran Bruce Richards, who goes by the alias Significant Gravitas.
“AutoGPT is an experimental open source application showcasing the capabilities of the GPT4 language model. This program, driven by GPT4, autonomously develops and manages businesses to increase net worth,” the GitHub introduction reads. “As one of the first examples of GPT4 running fully autonomously, AutoGPT pushes the boundaries of what is possible with AI.”
According to its GitHub page, the program accesses the internet to search and gather information, uses GPT4 to generate text and code, and GPT3.5 to store and summarize files.
“Existing AI models, while powerful, often struggle to adapt to tasks that require long-term planning, or are unable to autonomously refine their approaches based on real-time feedback,” Richards told Motherboard. “This inspiration led me to develop AutoGPT – initially to email me the daily AI news so that I could keep up – which can apply GPT4’s reasoning to broader, more complex problems that require long-term planning and multiple steps.”
A video demonstrating AutoGPT shows the developer giving it goals: to demonstrate its coding abilities, make a piece of code better, test it, shut itself down, and write its outputs to a file. The program creates a to-do list – it adds reading the code to its tasks and puts shutting itself down after writing its outputs – and completes the tasks one by one. Another video posted by Richards shows AutoGPT Googling and ingesting news articles to learn more about a subject in order to make a viable business.
The program asks the user for permission to proceed to the next step while Googling, and the AutoGPT GitHub cautions against using “continuous mode” as it “is potentially dangerous and may cause your AI to run forever or carry out actions you would not usually authorize.”
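The difference between asking permission at each step and running in “continuous mode” can be illustrated with a minimal sketch. This is not AutoGPT’s actual code: the function `run_steps` and its parameters `execute`, `approve`, `continuous`, and `max_steps` are hypothetical names chosen for illustration, and the step cap is an assumed safeguard rather than something the project is known to implement this way.

```python
from typing import Callable, Iterable, List


def run_steps(
    steps: Iterable[str],
    execute: Callable[[str], str],
    approve: Callable[[str], bool],
    continuous: bool = False,
    max_steps: int = 10,
) -> List[str]:
    """Run planned steps, asking a human before each one unless continuous."""
    results: List[str] = []
    for count, step in enumerate(steps):
        if count >= max_steps:
            break  # hard cap so even continuous mode cannot run forever
        if not continuous and not approve(step):
            break  # the human declined, so stop before acting
        results.append(execute(step))
    return results


# Usage: the (hypothetical) reviewer allows searching and summarizing
# but refuses the step that would post to social media.
log = run_steps(
    ["search", "summarize", "post"],
    execute=lambda step: f"done:{step}",
    approve=lambda step: step != "post",
)
```

With `continuous=True` the approval callback is never consulted, which is exactly the situation the warning describes; only the step cap remains as a brake.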
AutoGPT isn’t the only effort in this vein. A venture capital partner at Untapped Capital and developer named Yohei Nakajima created a “task-driven autonomous agent” that uses GPT4, a vector database called Pinecone, and a framework for developing apps powered by LLMs called LangChain.
“Our system is capable of completing tasks, generating new tasks based on completed results, and prioritizing tasks in real-time,” Nakajima wrote in a blog post. “The significance of this research lies in demonstrating the potential of AI-powered language models to autonomously perform tasks within various constraints and contexts.”
A user provides the app with an objective and an initial task. Several agents within the program, including a task execution agent, a task creation agent, and a task prioritization agent, then complete tasks, send results, and reprioritize and send new tasks. All these agents are currently run by GPT4.
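The interplay of the three agents described above can be sketched as a simple loop over a task queue. This is an illustrative outline only, not Nakajima’s implementation: `run_task_loop` and the three callbacks are hypothetical names, the callbacks stand in for GPT4 calls, and the Pinecone storage step is omitted entirely.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


def run_task_loop(
    objective: str,
    first_task: str,
    execute: Callable[[str, str], str],
    create: Callable[[str, str, str], List[str]],
    prioritize: Callable[[str, List[str]], List[str]],
    max_iterations: int = 5,
) -> List[Tuple[str, str]]:
    """Execute a task, create follow-up tasks, reprioritize, and repeat."""
    queue: Deque[str] = deque([first_task])
    completed: List[Tuple[str, str]] = []
    for _ in range(max_iterations):
        if not queue:
            break  # nothing left to do
        task = queue.popleft()
        result = execute(objective, task)              # task execution agent
        completed.append((task, result))
        queue.extend(create(objective, task, result))  # task creation agent
        queue = deque(prioritize(objective, list(queue)))  # prioritization agent
    return completed


# Stand-ins for the model calls (assumptions for this demo only).
def fake_execute(objective: str, task: str) -> str:
    return f"result of {task}"


def fake_create(objective: str, task: str, result: str) -> List[str]:
    # Only the first task spawns follow-up work in this toy example.
    return ["create doc", "write paragraph"] if task == "search web" else []


def fake_prioritize(objective: str, tasks: List[str]) -> List[str]:
    # Writing must come before the document is created.
    return sorted(tasks, key=lambda t: not t.startswith("write"))


history = run_task_loop(
    "summarize a topic", "search web", fake_execute, fake_create, fake_prioritize
)
order = [task for task, _ in history]
```

The `max_iterations` bound is a design choice for the sketch: because the creation agent can keep adding tasks, an unbounded loop would have no guaranteed stopping point.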
Nakajima told Motherboard that the most complicated task his app was able to run was to research the web based on an input, write a paragraph based on the web search, and create a Google Doc with that paragraph.
“I am interested in learning about how to leverage technology to make the world a better place, such as using autonomous technology to scale value creation,” Nakajima said. “It’s important to have constant human supervision, especially as these agents are provided with increasing capabilities – such as accessing databases and communicating with people. The goal is not removing human supervision – the opportunity here is for many people to move from doing tasks to managing the tasks.”
Richards echoed Nakajima’s point that although these systems have autonomous capabilities, they still require human oversight.
“The ability to function with minimal human input is a crucial aspect of Auto-GPT. It transforms a large language model from what is essentially an advanced auto-complete, into an independent agent capable of carrying out actions and learning from its mistakes,” Richards told Motherboard. “However, as we move toward greater autonomy, it is essential to balance the benefits with potential risks. Ensuring that the agent operates within ethical and legal boundaries while respecting privacy and security concerns should be a priority. This is why human supervision is still recommended, as it helps mitigate potential issues and guide the agent towards desired outcomes.”
These attempts at autonomy are part of a long march in AI research to get models to simulate chains of thought, reasoning, and self-critique to accomplish a list of tasks and subtasks. As a recent paper from researchers at Northeastern University and MIT explains, LLMs tend to “hallucinate” – an industry term for making things up – the further down a list of subtasks they get. That paper used a “self-reflection” LLM to help another LLM-driven agent get through its tasks without losing the plot.
Eric Jang, the Vice President of AI at 1X Technologies, wrote a blog post following the release of that paper. Jang tried to take the paper’s thrust and turn it into an LLM prompt. He asked GPT4 to write a poem that does not rhyme, and when it produced a poem that did rhyme, he asked, “Did the poem meet the assignment?” GPT4 replied, “Apologies, I realize now that the poem I provided did rhyme, which did not meet the assignment. Here’s a non-rhyming poem for you.”
Jang presented a number of anecdotal examples in his blog post and concluded, “I’m fairly convinced now that LLMs can effectively critique outputs better than they can generate them, which suggests that we can combine them with search algorithms to further improve LLMs.”
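The generate-then-critique pattern in the poem example can be sketched roughly as follows. This is an assumption-laden illustration, not the paper’s method or Jang’s prompt: `refine`, `generate`, and `critique` are hypothetical names, the `"OK"` sentinel is an invented convention, and the two callbacks stand in for separate LLM calls.

```python
from typing import Callable


def refine(
    task: str,
    generate: Callable[[str], str],
    critique: Callable[[str, str], str],
    max_rounds: int = 3,
) -> str:
    """Draft an answer, ask a critic to check it, and retry with the feedback."""
    draft = generate(task)
    for _ in range(max_rounds):
        feedback = critique(task, draft)
        if feedback == "OK":
            break  # the critic accepts the draft
        # Feed the failed attempt and the critique back into the generator.
        draft = generate(
            f"{task}\nPrevious attempt:\n{draft}\nCritique:\n{feedback}"
        )
    return draft


# Stand-ins for the model calls (assumptions for this demo only):
# the generator rhymes on its first try and corrects itself after a critique.
def fake_generate(prompt: str) -> str:
    return "cat / river" if "Critique:" in prompt else "cat / hat"


def fake_critique(task: str, draft: str) -> str:
    return "The poem rhymes." if draft == "cat / hat" else "OK"


poem = refine("Write a poem that does not rhyme.", fake_generate, fake_critique)
```

Capping the number of rounds keeps the loop from cycling indefinitely if the critic never accepts a draft.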
Andrej Karpathy, a developer and co-founder at OpenAI, responded to Richards on Twitter, saying that he thinks “AutoGPTs” are the “next frontier of prompt engineering.”
“One GPT call is a bit like one thought. Stringing them together in loops creates agents that can perceive, think, and act, their goals defined in English in prompts,” he wrote. Karpathy went on to describe AutoGPT with psychological and cognitive metaphors for LLMs, while highlighting their current limitations.
“Interesting non-obvious note on GPT psychology is that unlike people they are completely unaware of their own strengths and limitations. E.G: that they have finite context window, that they can just barely do mental math, and that samples can get unlucky and go off the rails etc … ” he said, adding that prompts could mitigate this.
Stacking AI models on top of one another in order to complete more complex tasks does not mean we’re about to see the emergence of artificial general intelligence, but it does, as we’ve seen, let systems run continuously and accomplish tasks with less human intervention and oversight.
These examples don’t necessarily show that GPT4 is “autonomous,” but they do show that, with plug-ins and other techniques, it has greatly improved its ability to self-reflect and self-critique. They also introduce a new stage of prompt engineering that can result in more accurate responses from the language model, so it won’t be too long before it really could be fully autonomous – for better and worse.