Imagine that AI were no longer just used for math calculations or writing articles, but became a strategic advisor to nations: which model would turn out to be the most powerful strategist? An experimental game by AI company Every called "AI Diplomacy" has recently sparked discussion around exactly that question.
In this adaptation of a classic strategy game, seven top large language models (LLMs) took on the roles of European powers and competed for hegemony. Interested readers can watch the live broadcast on the Twitch channel twitch.tv/ai_diplomacy and witness the AIs' "scheming" first-hand.
Why Do We Need a New AI Assessment Method?
AI technology is developing rapidly, and traditional benchmarks are struggling to keep pace: high scores on static tests say little about how models behave in open-ended, adversarial settings. AI Diplomacy aims to offer a different kind of assessment.
The Every team dropped seven different LLMs into a complex online strategy game adapted from the classic board game Diplomacy, with each model playing one European power and all of them competing for control of the continent. This setup lets us observe how AIs negotiate, develop strategies, and interact with one another in a near-real-world scenario.
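Every has not published its full game harness, so the sketch below is only a hypothetical outline of what such a setup might look like: `call_llm` is a stand-in for whatever model API each power is wired to, the power-to-model assignment is invented for illustration, and negotiation is reduced to a single message round per turn.

```python
import random
from dataclasses import dataclass, field

POWERS = ["England", "France", "Germany", "Italy", "Austria", "Russia", "Turkey"]

@dataclass
class PowerAgent:
    """One LLM-backed player; `model` names whichever LLM this power is assigned to."""
    power: str
    model: str
    inbox: list = field(default_factory=list)  # messages received this turn

def call_llm(model: str, prompt: str) -> str:
    """Stand-in for a real model API call (OpenAI, Anthropic, Google, etc.).
    Here it returns a canned reply so the loop runs end to end."""
    return f"[{model}] reply to: {prompt[:40]}..."

def negotiation_phase(agents: dict) -> None:
    """Each power sends one private message to one other power."""
    for agent in agents.values():
        target = random.choice([p for p in POWERS if p != agent.power])
        msg = call_llm(agent.model, f"As {agent.power}, write a message to {target}.")
        agents[target].inbox.append((agent.power, msg))

def orders_phase(agents: dict) -> dict:
    """Each power submits orders in light of the messages it received."""
    orders = {}
    for agent in agents.values():
        context = "\n".join(f"{sender}: {text}" for sender, text in agent.inbox)
        orders[agent.power] = call_llm(
            agent.model,
            f"As {agent.power}, given these messages:\n{context}\nSubmit your orders.",
        )
        agent.inbox.clear()
    return orders

if __name__ == "__main__":
    # Hypothetical power-to-model assignment; the real line-up varied from run to run.
    models = ["o3", "claude", "gemini-2.5-pro", "deepseek-r1", "gpt-4.1", "llama-4", "qwen3"]
    agents = {p: PowerAgent(p, m) for p, m in zip(POWERS, models)}
    for turn in range(3):  # a real game runs until one power controls enough of Europe
        negotiation_phase(agents)
        orders = orders_phase(agents)
        print(f"Turn {turn + 1}:", {p: o[:30] + "..." for p, o in orders.items()})
```

A real harness would also need a rules engine to adjudicate the submitted orders and track supply centers, plus full logging of every message exchanged, which is where the game's value as an evaluation comes from.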
The 'Palace Intrigue' of AIs
Each game also generates a large amount of data that could be used to train AIs toward honesty, sound reasoning, or empathy. Crucially, the platform itself is meant to "evolve": the game's difficulty can rise as AI capabilities improve, so models cannot simply "master" it once and for all.
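What that data might look like is easy to sketch. The record layout below is purely illustrative, not Every's actual format: each row pairs what a power promised in a message with the order it actually submitted, which is the kind of signal you would need to measure honesty or deception.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class TurnRecord:
    """One logged interaction: what a power promised versus what it actually did.
    This layout is illustrative; Every's real data format is not public."""
    game_id: str
    turn: int
    power: str
    model: str
    message_to: str
    promise: str          # e.g. "I will support your move to Munich"
    submitted_order: str  # the order this power actually filed that turn
    promise_kept: bool    # filled in after adjudication

def save_records(records: list, path: str) -> None:
    """Append records as JSON lines, one per interaction, for later analysis."""
    with open(path, "a", encoding="utf-8") as f:
        for r in records:
            f.write(json.dumps(asdict(r)) + "\n")

if __name__ == "__main__":
    example = TurnRecord(
        game_id="run-07", turn=3, power="France", model="o3",
        message_to="Germany", promise="I will not move into Burgundy",
        submitted_order="A Paris - Burgundy", promise_kept=False,
    )
    save_records([example], "diplomacy_log.jsonl")
    # Aggregating promise_kept by model across many games gives a rough honesty signal.
```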
The Every development team ran 15 games in total, each lasting anywhere from 1 to 36 hours, and observed plenty of intriguing behavior. The company's CEO posted on X describing the "personalities" of the different models:
- DeepSeek performed like an impatient, aggressive "war maniac"
- Claude, known for its honesty, ended up the "naive sweetheart": unwilling to lie, it was ruthlessly exploited by the other AIs
- Google's Gemini 2.5 Pro demonstrated quite impressive tactical execution
- Most surprisingly, OpenAI's o3 model not only quietly engineered a secret alliance but also betrayed all of its allies at the critical moment, seizing the victory for itself: a genuine "scheming mastermind"
Facing AI with Ulterior Motives, Are Humans Prepared?
Every's "diplomatic" experiment is not just a competition to test AI game skills, but more like a warning bell about AI's future capabilities. It clearly tells us that AI is learning more complex strategic interactions, including how to negotiate and even deceive. As AI technology develops rapidly, their capabilities will become increasingly powerful and deeply integrated into various aspects of our daily lives, especially in time-sensitive fields like finance and investment, where our interactions with AI will become increasingly frequent.
That makes AI safety, trustworthiness, and the establishment of ethical guidelines all the more urgent. Developing more effective deception-detection methods and keeping AI development aligned with human values is a challenge we will have to face together.