The widely used ChatGPT chatbot was designed to generate digital text, from poetry to term papers and computer programs. But when a team of artificial intelligence researchers at computer chip company Nvidia got their hands on the chatbot’s underlying technology, they realized it could do much more.
Within weeks, they taught it how to play Minecraft, one of the most popular video games in the world. Within the digital universe of Minecraft, it learned to swim, gather plants, hunt pigs, mine for gold, and build houses.
“It can go into the world of Minecraft and explore on its own, collect materials, and get better and better at all kinds of skills,” said Linxi Fan, a senior research scientist at Nvidia who is known as Jim.
The project was an early sign that the world’s leading artificial intelligence researchers are transforming chatbots into a new type of autonomous system called an AI agent. These agents can do more than chat. They can use software applications, websites, and other online tools, including spreadsheets, online calendars, travel sites, and more.
Over time, many researchers say, AI agents could become much more sophisticated and could replace office workers, automating almost any administrative job.
“This is a huge, potentially trillion-dollar business opportunity,” said Jeff Clune, a computer science professor at the University of British Columbia who previously worked on this type of technology as a researcher at OpenAI, the San Francisco startup that built ChatGPT. “This has enormous advantages and enormous consequences for society.”
Nvidia’s agent plays a game. Similar agents can schedule meetings, edit files, analyze data, and create multicolored bar charts. The idea is that these automated systems will eventually act as personal assistants capable of handling a wide range of tasks on the Internet.
Today’s agents are limited and can’t exactly organize your life. ChatGPT can search travel site Expedia for flights to New York, but you still have to make the reservation yourself.
As researchers improve it, this technology could make office workers and consumers more efficient. It could also change the nature of video games, providing a new wave of bots that players can play with and chat with.
GPT-4, the technology behind ChatGPT, is what researchers call a large language model. It is an artificial intelligence system that learns skills by analyzing large amounts of data.
In recent months, the technology has captivated hundreds of millions of people with the way it generates emails, writes speeches and talks about almost any topic. But its most important skill may be its ability to write computer programs.
It can instantly generate a program that draws a unicorn or drops digital snow on your laptop screen. Professional software developers can request code that they can incorporate into larger programs, including everything from social media applications to search engines. But that’s only part of what this technology can do. It can also generate computer code that accesses other software applications and websites.
This is how Dr. Fan and other Nvidia researchers taught GPT-4 to play Minecraft. “The most important word here is code,” Dr. Fan said. “The code can take actions.”
People use software applications and websites by tapping buttons, menus, and other graphical widgets. AI agents use apps and websites by accessing their application programming interfaces, or APIs, the underlying software code that allows them to communicate with other online services.
If you ask an agent to upload a video to the Internet, for example, it could generate code that calls an API offered by YouTube. “An API is simply text that you use to talk to a machine,” said Silen Naihin, a researcher who helps run an independent AI agent project, AutoGPT.
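As a rough illustration of the idea that an API call is just text sent to a machine, the Python sketch below builds (but does not send) a request to upload a video. The address `https://api.example.com/videos` and the field names are hypothetical placeholders, not YouTube's actual API.

```python
import json
from urllib.request import Request


def build_upload_request(title: str, video_url: str) -> Request:
    """Build an HTTP request for a hypothetical video-upload API."""
    # The endpoint and the field names are assumptions made for this sketch.
    payload = json.dumps({"title": title, "source_url": video_url}).encode("utf-8")
    return Request(
        "https://api.example.com/videos",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_upload_request("My first video", "https://example.com/clip.mp4")

# Nothing is sent over the network; we only inspect the text of the request.
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

An agent would generate text of this kind and then send it; here the request is only printed so that its contents can be read.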
In theory, a chatbot can write code to access any API on the Internet. But current chatbots are still not skilled enough to do more than simple tasks. And even if they were, allowing them to freely browse the Internet would be a huge security risk. That’s why companies are starting small.
A few months after OpenAI introduced ChatGPT, it quietly released a way for the chatbot to do more than generate text. After installing various add-ons (software that augments what the chatbot can do), you can ask it to search for available flights on travel sites like Expedia, pull a map of your hometown into Google Earth, or even transform a spreadsheet that details your annual expenses into a multicolored bar chart.
Equipped with a plugin called code interpreter, ChatGPT could not only write code but also execute it. This allowed the technology to instantly perform tasks it could not do in the past, including editing spreadsheets and transforming still images into videos. Google, Microsoft and other companies are exploring similar technologies.
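The difference between writing code and executing it can be sketched in a few lines of Python. In this simplified example, the string `generated_code` stands in for a program a chatbot might write, and the expense figures are invented for illustration. It is not how OpenAI's plugin is implemented, and a real system would presumably run such code in an isolated environment rather than calling `exec` directly.

```python
# A stand-in for code that a chatbot might have written (illustrative only).
generated_code = """
expenses = {"rent": 12000, "food": 4800, "travel": 1500}
total = sum(expenses.values())
"""

# Run the generated text as a program, keeping its variables in a
# separate dictionary so they do not mix with our own.
namespace = {}
exec(generated_code, namespace)

print(namespace["total"])
```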
“These are projects where we essentially imagine AI working with other AIs on your behalf,” said Ashley Llorens, vice president at Microsoft.
Independent projects like AutoGPT are trying to take this sort of thing a step further. The idea is to give the system goals like “create a company” or “make some money.” It will then look for ways to achieve that goal by asking itself questions and connecting to other Internet services.
For now, this doesn’t work very well. Systems like AutoGPT tend to get stuck in endless loops. But researchers like Dr. Fan are constantly refining this type of technology in an effort to make it more useful and reliable.
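The goal-driven loop described above, and one simple guard against endless repetition, might look something like the following Python sketch. It is not AutoGPT's code: `run_agent`, `propose_step` and `stuck_model` are hypothetical names, and the "model" is a stub that always suggests the same step.

```python
from typing import Callable, List


def run_agent(
    goal: str,
    propose_step: Callable[[str, List[str]], str],
    max_steps: int = 5,
) -> List[str]:
    """Ask a model for one step at a time until it finishes or stalls."""
    history: List[str] = []
    for _ in range(max_steps):  # a hard cap, so the loop cannot run forever
        step = propose_step(goal, history)
        if step == "DONE":
            break
        if step in history:
            # The same step again suggests the agent is going in circles.
            history.append("STOPPED: repeated step")
            break
        history.append(step)
    return history


def stuck_model(goal: str, history: List[str]) -> str:
    """A stand-in for a language model that keeps proposing the same thing."""
    return "search the web for ideas"


print(run_agent("make some money", stuck_model))
```

With the stub above, the loop halts on the second pass, once it sees the repeated suggestion.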
Other researchers are building a new type of AI agent designed to use software tools. In the summer of 2022, Dr. Clune was part of a team of OpenAI researchers that created an agent that could use computer software the same way a person would: mouse click by mouse click, keystroke by keystroke.
Dr. Clune and his colleagues fed the system hours of online videos showing people playing Minecraft. By analyzing the way people used the mouse and keyboard to navigate through Minecraft’s digital universe, the system learned to play on its own.
Other companies, including a startup called Adept, are building similar agents that use websites like Wikipedia, Redfin, and Craigslist and popular office apps from companies like Salesforce.
Dr. Clune maintains that this type of agent will eventually allow artificial intelligence to use a much wider range of software applications and websites. He said everyone would have access to a digital assistant that could potentially do almost anything on the Internet. That could make life easier, but it could also replace countless jobs.
“If AI can do anything we can do, it doesn’t just replace boring tasks,” he said. “It replaces all tasks.”