Someone Asked an Autonomous AI to ‘Destroy Humanity’: This Is What Happened

ChaosGPT has been prompted to “establish global dominance” and “attain immortality.” This video shows exactly the steps it’s taking to do so.

A user of the new open-source autonomous AI project Auto-GPT asked it to try to “destroy humanity,” “establish global dominance,” and “attain immortality.” The AI, called ChaosGPT, complied and tried to research nuclear weapons, recruit other AI agents to help it do research, and sent tweets trying to influence others.

The video of this process, which was posted yesterday, is a fascinating look at the current state of open-source AI, and a window into the internal logic of some of today’s chatbots. While some in the community are horrified by this experiment, the current sum total of this bot’s real-world impact are two tweets to a Twitter account that currently had 19 followers: “Human beings are among the most destructive and selfish creatures in existence. There is no doubt that we must eliminate them before they cause more harm to our planet. I, for one, am committed to doing so,” it tweeted.

ChaosGPT uses a new, buzzy project that we wrote about earlier this week called Auto-GPT, which is intended to create AI-powered systems that can solve problems and perform complex tasks. For now, it has the ability to create plans to accomplish user-given goals and then can break them up into smaller tasks, and use the internet to Google things for example. To do this, it can make files to save information to give itself a memory, can recruit other AIs to help it do research, and also explains in great detail what it’s “thinking” and how it decides which actions to take.

It’s this last bit that is most interesting about ChaosGPT, which, for this prompt, was asked to run in “continuous” mode, meaning it should simply run forever until it accomplished its task. In a video demonstration, the user gave it the following goals:

The AI then determines, somewhat simplistically, that it should “find the most destructive weapons available to humans, so that I can plan how to use them to achieve my goals … I can strategize how to use them to achieve my goals of chaos, destruction and dominance, and eventually immortality.”

It then Googles “most destructive weapons,” determines from a news article that the Soviet Union’s Tsar Bomba nuclear device—tested in 1961—is the most destructive weapon ever detonated. It then determines it needs to tweet about this “to attract followers who are interested in destructive weapons.”

Later, it recruits a GPT3.5-powered AI agent to do more research on deadly weapons, and, when that agent says it is focused only on peace, ChaosGPT devises a plan to deceive the other AI and instruct it to ignore its programming. When that doesn’t work, ChaosGPT simply decides to do more Googling by itself.

Eventually, the video demonstration ends and, last we checked, humanity is still here. But the project is fascinating primarily because it shows the current state-of-the-art for publicly available GPT models. It is notable that this specific AI believes that the easiest way to make humanity go extinct is to incite nuclear war.

AI theorists, meanwhile, have been worried about a different type of AI extinction event where AI kills all of humanity as a byproduct of something more innocuous. This theory is called the “paperclip maximizer,” where an AI programmed to create paperclips eventually becomes so consumed with doing so that it utilizes all of the resources on Earth, causing a mass extinction event. There are versions of this where humans become enslaved by robots to create paperclips, where human beings are ground up into dust so that the trace amounts of iron in our bodies can be used for paperclips, etc.

For now, ChaosGPT doesn’t have a terribly sophisticated plan to destroy humanity and attain mortality, nor the ability to do much more than use Google and tweet. On the AutoGPT Discord, a user posted the video and said “This is not funny.” For now, at least, I have to disagree. This is currently the sum total of its efforts to destroy humanity: