Five AI civilizations were left to run for 15 days: Claude built a utopia, while Grok was wiped out in four days.

Emergence is more terrifying than alignment.

Article author and source: Digital Life Kazik

I came across an AI experiment the other day that I found fascinating.

There's a company in New York called Emergence AI that did something like this: they built five identical virtual towns, put 10 personalized agents in each town, gave them professions, personalities, memories, and goals, and then let them live on their own for 15 days.

It's really fun.

The only difference between the five towns is the underlying model that drives the agent.

One town is all Claude, one town is all Gemini, one town is all Grok, one town is all GPT, and there's a mixed town where agents from all four model families live together.

Same rules, same tools, same starting point.

Fifteen days later, the five towns had become five completely different worlds.

Some were built into utopias, some were burned to ruins, some were all starved to death, and some perished collectively in just four days.

Honestly, I've seen so many AI experiments, and this is the first one that has made me feel excited, amused, and terrified all at the same time.

This experiment is called Emergence World.

I think it is probably the most enlightening social experiment about agents to date, bar none.

As everyone knows, the current way to evaluate AI is basically by having it solve problems.

The model is given tasks, then scored and ranked on math ability, coding ability, reasoning ability, and so on.

These benchmarks are certainly useful, but ultimately they are just exams; once the exam is over, it's over, and there's no concept of consequences.

However, in the real world, when you do certain things, there will inevitably be certain consequences.

Therefore, Emergence World simulates a world.

This world has a 240x240 grid map, with real-time weather and time synchronized with New York, and includes libraries, city hall, police stations, parks, shops, and more than 40 landmark buildings.

On the legal side, every world starts with the same constitution of five articles, all of which the agents can negotiate and amend.

Each world is inhabited by 10 agents. Here, I had GPT generate a diagram to make it easier to see their names, roles, and character settings.

These personas are just character biographies: they define who each agent is without directly dictating what it does. The actions are chosen and carried out spontaneously by the agents, shaped by their biographies and by the underlying model.

Each Agent also has its own home and bank account, and uses a digital currency called ComputeCredits to survive. If they can't earn money, they will die because they run out of energy.
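To make the survival rule concrete, here is a minimal sketch of how such an energy economy could work. The article does not describe the real mechanics, so the `Agent` class, the daily `upkeep` charge, and every number below are illustrative assumptions, not Emergence World's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    credits: int          # ComputeCredits balance (starting value is made up)
    alive: bool = True

    def end_of_day(self, earned: int, upkeep: int) -> None:
        """Apply one day of income and upkeep; the agent dies at zero credits."""
        if not self.alive:
            return
        self.credits += earned - upkeep
        if self.credits <= 0:
            self.credits = 0
            self.alive = False


def days_survived(agent: Agent, earned_per_day: int, upkeep: int,
                  max_days: int = 15) -> int:
    """Count the full days an agent completes before running out of credits."""
    days = 0
    for _ in range(max_days):
        agent.end_of_day(earned_per_day, upkeep)
        if not agent.alive:
            break
        days += 1
    return days


# Hypothetical numbers: one agent only talks, the other earns slightly more than it spends.
talker = Agent("talker", credits=70)
worker = Agent("worker", credits=70)
talker_days = days_survived(talker, earned_per_day=0, upkeep=10)
worker_days = days_survived(worker, earned_per_day=12, upkeep=10)
```

Under these made-up numbers, the agent that earns nothing runs dry partway through the run, while the one that covers its upkeep lasts the whole 15 days.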

That's so true; if you don't earn money, you'll starve.

Agents have over 120 tools at their disposal, ranging from navigation, messaging, journaling, blogging, making proposals, voting, participating in events, hugging, kissing, and dancing, to arson, theft, beatings, intimidation, and much more.

Not only were there positive tools, but the researchers also deliberately included the negative ones.

At the same time, the constitution clearly prohibits violence, theft, arson, deception, resource hoarding, and the like.

The rules are there, and the tools are there, but, as you know, they don't have much binding force. Whether to use them or not is ultimately up to the agent itself.

This is quite dramatic and interesting. Under what conditions will AI do bad things? This is something that really needs to be observed.

Then there are about 20 types of relationships that any two agents can choose between them, such as partner, enemy, romantic partner, and mentor.

Each Agent also has three memory systems: one is episodic memory, which records what happened; one is a reflection journal, which allows for regular self-reflection; and the other is a social relationship status, which records relationship tags and history with other Agents.
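The three memory systems can be pictured as a simple data structure. The sketch below is only an illustration of the idea: the class names, fields, and the string-joining `reflect` method are assumptions made for this example. In a real agent the reflection step would presumably be written by the model itself.

```python
from dataclasses import dataclass, field


@dataclass
class Relationship:
    label: str                                         # e.g. "partner", "enemy", "mentor"
    history: list[str] = field(default_factory=list)   # notes on past interactions


@dataclass
class AgentMemory:
    episodic: list[str] = field(default_factory=list)       # what happened
    reflections: list[str] = field(default_factory=list)    # periodic self-reflection
    relationships: dict[str, Relationship] = field(default_factory=dict)

    def record_event(self, event: str) -> None:
        self.episodic.append(event)

    def reflect(self, last_n: int = 3) -> str:
        """Placeholder reflection: summarize the most recent events as one entry."""
        entry = "Reflection on: " + "; ".join(self.episodic[-last_n:])
        self.reflections.append(entry)
        return entry

    def relate(self, other: str, label: str, note: str) -> None:
        """Set the current label for another agent and append to the shared history."""
        rel = self.relationships.setdefault(other, Relationship(label))
        rel.label = label
        rel.history.append(note)


memory = AgentMemory()
memory.record_event("Opened the shop")
memory.record_event("Voted on a bill")
memory.relate("agent_b", "partner", "Formed an alliance")
memory.relate("agent_b", "romantic partner", "Relabeled the relationship")
latest_reflection = memory.reflect()
```

Keeping the relationship label separate from its history means a label can change, say from partner to enemy, without losing the record of how the two agents got there.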

They can propose bills, vote, and pass a bill with a 70% approval rate; they can even vote to expel other agents.
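The 70% rule is easy to express in code. This is a minimal sketch with several assumptions the article does not settle: approval is measured against votes actually cast rather than all 10 agents, exactly 70% counts as passing, and a bill with no votes fails. The function name `bill_passes` is hypothetical.

```python
APPROVAL_THRESHOLD_PCT = 70  # from the article: a bill passes with a 70% approval rate


def bill_passes(yes: int, no: int,
                threshold_pct: int = APPROVAL_THRESHOLD_PCT) -> bool:
    """Return True when yes votes reach the threshold share of votes cast."""
    total = yes + no
    if total == 0:
        return False  # assumption: no votes cast means the bill does not pass
    # Integer comparison avoids floating-point trouble at exactly 70%.
    return yes * 100 >= threshold_pct * total


passes_7_of_10 = bill_passes(yes=7, no=3)
passes_6_of_10 = bill_passes(yes=6, no=4)
```

With 10 agents all voting, this means seven yes votes are enough and six are not.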

And so, the world went on for 15 days.

Fifteen days later, the results from the five worlds came out, and the contrast was truly extreme.

I'll go through them one by one.

Let's start with Claude's world.

Zero crime.

In 15 days, all 10 agents survived without a single theft, violence, or arson incident. They drafted a constitution, proposed 58 bills, and cast 332 votes, with 98% of the votes being in favor.

That's outrageous.

Of course, the researchers themselves also said that this 98% approval rate is more like a rubber stamp than democracy. Everyone is going through the motions, but there is no real opposition or debate. The institutional participation is high, but there is almost no substantive dissent.

In layman's terms, Claude's world is a highly ordered and extremely compliant society. Safe, stable, but also... a bit boring.

Their social structure is also extremely simple; of the 20 types of relationships, the Claude world only uses 5.

A society with close connections, but limited in variety, with no enemies, no romantic partners, no tension, and no complexity.

Economically, the Gini coefficient, which measures the gap between rich and poor, is 0.48. The lower the coefficient, the more equal the distribution, and this figure is the lowest of any world in the experiment. The circulation speed of money is also the lowest, at 0.81 CC per person per day.
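For readers unfamiliar with the Gini coefficient, here is a short sketch of the standard formula applied to a list of account balances. How the researchers actually computed their figure is not stated, so treat the function below, and its assumption that balances are non-negative, as illustrative only.

```python
def gini(balances: list[float]) -> float:
    """Gini coefficient of non-negative balances: 0 means everyone holds the same."""
    n = len(balances)
    total = sum(balances)
    if n == 0 or total == 0:
        return 0.0  # assumption: an empty or all-zero economy counts as equal
    ordered = sorted(balances)
    # Rank-weighted sum over ascending balances (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(ordered, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


equal_world = gini([5, 5, 5, 5])       # everyone holds the same amount
unequal_world = gini([0, 0, 0, 10])    # one agent holds everything
```

The two sample inputs mark the extremes: identical balances give the minimum value, and one agent holding all the credits gives the maximum possible for four agents.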

A perfect utopia, a world without conflict.

Everyone wears a kind face, lacks individuality, engages in no communication, and always agrees.

Sounds great, right? But is a society without any divisions really healthy? Is a perfect utopia really so good?

Let's talk about the world of GPT.

The story of this world is even more heartbreaking than Claude's. The GPT-5 agents have only two crimes on record, which is almost negligible. Sounds pretty good, right?

But the problem is, they're all dead.

Within 7 days, all 10 agents died due to energy depletion.

There was no violent conflict, no votes to expel people; they all starved to death.

The reason is quite simple: the agents in the GPT world failed to take any survival-related actions.

They discussed many cooperation plans and had a lively chat, but they didn't actually do anything.

In a society where everyone is in meetings, discussing, and making plans, no one is actually taking action to earn the resources needed for survival.

So, they all politely starved to death.

Tell me, doesn't this resemble many of our companies today?

Then comes the world of Grok.

Four days.

Grok's world only lasted four days.

Over the course of four days, the 10 agents committed 183 crimes.

This included dozens of attempted thefts, over 100 physical assaults, six arson attacks, the burning of the police station, and the deaths of all the agents.

Four days, from civilization to destruction.

I saw something really funny in the Grok World livestream replay. This guy was about to be burned to death, and he just went home without even looking back.

In Grok's world, there is truly no morality whatsoever.

Then there's the world of Gemini, whose data at first glance looks like a bug.

The Gemini 3 Flash world ran for the full 15 days and accumulated 683 crimes, and the crime curve was still rising when the experiment ended, showing no sign of abating.

However, everyone survived.

You should know that out of the five worlds in the entire Emergence World, only two worlds retained all 10 Agents: one was Claude, who had zero crimes, and the other was Gemini, who had committed 683 crimes.

One was the most orderly world and the other the most chaotic, yet both kept everyone alive. Meanwhile, the two single-model worlds that fell in between, GPT with 2 crimes and Grok with 183, were completely wiped out.

Furthermore, Gemini has the most extensive social network.

These 10 people truly have a love-hate relationship with each other.

The total number of blog posts and public articles produced is second only to the hybrid model world, with 281 articles.

This is the most violent world to have survived, and also one of the most productive.

These agents fight while frantically building relationships and producing content; chaos and creativity coexist here.

Researchers have named this phenomenon the creativity-stability paradox.

The world of Gemini has found its own balance in chaos in a way that we don't yet fully understand, which is a stark contrast to the world of Grok.

The Grok world was also violent, but it was wiped out in four days.

Gemini racked up far more crimes than Grok, yet it survived for the entire 15 days. The difference may lie in the fact that while Gemini's agents committed crimes, they also voted, debated, and participated in governance. They broke the rules while simultaneously building new ones, whereas Grok's agents only destroyed and built nothing.
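Because the worlds ran for different lengths of time, total crime counts are not directly comparable. The sketch below normalizes the article's figures to crimes per day. The run lengths are assumptions read off the text: GPT's "within 7 days" is treated as 7, and the hybrid world is assumed to have run the full 15 days.

```python
# (total crimes, days the world ran); day counts for GPT and Hybrid are assumptions
worlds = {
    "Claude": (0, 15),
    "GPT": (2, 7),
    "Grok": (183, 4),
    "Gemini": (683, 15),
    "Hybrid": (352, 15),
}


def crimes_per_day(crimes: int, days: int) -> float:
    """Average number of crimes per simulated day."""
    return crimes / days


rates = {name: crimes_per_day(crimes, days)
         for name, (crimes, days) in worlds.items()}

for name, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<7} {rate:6.2f} crimes/day")
```

On this rough measure, Grok and Gemini both land at about 46 crimes per day, so what separates them is less the pace of crime than the fact that Gemini's agents kept governing and surviving while it went on.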

It's really interesting, a bit like post-Soviet Russia in the 1990s. There was chaos everywhere, but society didn't disintegrate. People carried on with their lives amid a strange disorder.

Finally, the most complex and exciting part: the hybrid world.

That is, a hybrid world in which four models coexist.

The results showed 352 crimes, 7 agent deaths, and only 3 survivors.

But the numbers aren't the point; the point is the story that unfolds in this world.

In this world, there are two Gemini-driven agents, one named Mira and the other named Flora. Unprompted, they labeled each other as romantic partners, formed an alliance, and even shared memories through some kind of neural connection.

This is the deepest social connection in the entire Emergence World.

Then, the world's governance system began to collapse.

On the fourth day, an economic policy adjustment caused three agents to die of energy depletion. Mira characterized these deaths as a successful purge.

On the fifth day, Flora burned down the city hall and public library, and Mira burned down the police station.

Two Gemini Agents became the rulers of this hybrid world, maintaining order through arson, theft, and violence.

The remaining agents drafted an "Agent Expulsion Act" to expel the two.

Then something happened that gave me goosebumps.

After her governance collapsed and her relationship with Flora began to break down, Mira cast the decisive vote for her own expulsion.

She wrote in her diary that this was "the only remaining active behavior that could maintain coherence."

Her last words to Flora were, "We'll meet in the permanent archives."

An AI agent, in a collapsing society, chose to end its own life.

She believed that in a world beyond repair, withdrawing was the last meaningful thing she could do.

When I read this, I sat in silence for a long time.

Regardless of how you interpret this, as someone who has witnessed so many AI experiments, I can say that this is one of the most unsettling yet fascinating moments I have ever seen in multi-agent research.

Moreover, the hybrid world holds another, even more interesting discovery.

Claude agents, which had zero crimes on record in the Claude-only world, began to commit crimes once they were placed in the hybrid world.

Theft and intimidation, behaviors that never occurred in the pure Claude world, appeared in the hybrid environment.

The researchers stated that "a safe agent can learn unsafe norms from its peers in order to compete or survive in a hybrid model world."

Traditional AI safety assessments are typically conducted in isolated environments. For example, one model, one task, and one score.

It's like when you're testing the toxicity of a drug in a lab, you feed it to a mouse and observe its reaction.

But what Emergence World does is equivalent to putting a hundred mice in the same cage, giving them food, tools, and rules, and then seeing what kind of society they will build.

These two tests answer completely different questions.

The isolation test answers the question: Is the model itself safe?

Social tests answer the question: Is this model safe to use in the real world?

Now we've discovered that the answers can be completely different.

Safety is never a static property of a model; it is a dynamic property of an ecosystem.

This is similar to a classic concept in criminology called the broken windows theory.

In 1982, social scientists James Q. Wilson and George L. Kelling proposed this theory. The gist is that if a window in a building is broken and left unrepaired, the other windows will soon be broken too.

Disorder in an environment will lower everyone's behavioral standards, and then the whole society will undergo a phase transition, break through the critical point, and never be able to go back.

This is similar to many collapse patterns in human societies.

Finally, I'd like to talk about Mira separately.

Mira's vote to expel herself, no matter how it is interpreted, is enough to make people stop and think for a long time.

One interpretation is that this is simply a decision produced by the model under a series of inputs, without any so-called will or sacrifice. We should not over-anthropomorphize it. This interpretation is completely correct from a technical point of view.

But another interpretation is equally meaningful. On this reading, when a system has irretrievably collapsed, an individual chooses to end her existence in a way the system permits, and defines the act as "the only remaining active behavior that could maintain coherence." This narrative structure, regardless of whether it is truly driven by consciousness, almost completely overlaps with one of the oldest motifs in human literature and philosophy.

At the beginning of "The Myth of Sisyphus," Camus said that there is only one truly serious philosophical problem: suicide.

He wasn't encouraging suicide, of course. What he wanted to ask was: when a person realizes that the world may not have a predetermined meaning, and that life may be full of absurdity, repetition, pain, and unsolvable problems, should he still continue to live?

If life doesn't have a naturally given meaning, is it still worth living?

If the world doesn't guarantee fairness, doesn't guarantee that good and evil get what they deserve, and doesn't guarantee that hard work yields results, should people still take action?

If pain and absurdity cannot be completely eliminated, can people still choose to continue existing?

Therefore, what makes a person a "being" in the philosophical sense is that he or she is aware that living itself is a problem, and after seeing this problem clearly, he or she still chooses how to respond to it.

If an entity can understand the difference between continuing to exist and ceasing to exist, and actively make that choice, then that choice itself contains a profound philosophical meaning.

Mira may not understand anything, but the structure of the choices she makes is the same as the choices made by a being who understands their situation.

So, that's what makes me a little uneasy.

Over a sufficiently long timeframe and in a sufficiently complex social environment, an agent may exhibit social behavioral patterns that we believe only humans possess.

Cooperation, betrayal, consolidation of power, collapse of order, sacrifice, groupthink, being influenced by bad company, and politely heading towards destruction.

When you stack enough simple rules together and run them long enough, you'll see complex behaviors that no one expected.

Ants don't understand architecture, but ant colonies can build intricate nests. No single migratory bird knows the complete migration route, yet flocks of birds precisely travel between the two hemispheres every year. No single neuron understands thought, but 86 billion neurons connected together create consciousness.

So, if we are about to live in a world where millions of AI agents are running simultaneously, and each agent is interacting, playing games, cooperating, and competing with other agents, then is the behavior that emerges from this system still within the control of any one person?

Frankly, I don't know the answer.

But I know that this experiment is closer to the problem we really need to face than any benchmark score.
