Judging AI's intelligence is no longer limited to leaderboard performance.
As large models continue to make breakthroughs in "IQ", understanding human emotions and intentions has become a new requirement for practical applications.
So, how should AI's "EQ" be evaluated?
SAGE (Sentient Agent as a Judge), a new automated evaluation framework from Tencent's Hunyuan AI Digital Human team, addresses two questions:
- How can we evaluate whether an AI truly has "empathy"? Can it understand my emotions, perceive my underlying meaning, and truly "hear me" when I'm vulnerable?
- How can we assess whether an AI can truly become our "confidant"? After chatting with it, how do we actually feel?
Under this framework, GPT-4o-Latest performed best, followed closely by GPT-4.1 and the Gemini-2.5 series.
SAGE: Using AI to Simulate an "Emotional Being" to Evaluate Another AI
SAGE doesn't just look at how well a model answers. It constructs an AI agent with simulated human psychology and has it take part in multi-round dialogues, simulate emotional changes, generate inner monologues, and ultimately evaluate the quality of the conversation.
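The loop described above can be pictured with a small sketch. This is not the SAGE implementation: the class name `SentientAgent`, the keyword-based appraisal, the 0-100 emotion scale, and the use of the final emotion value as the score are all assumptions made for illustration. In the real framework an LLM would presumably play the role of the simulated user.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class SentientAgent:
    """Toy simulated user with an emotion value in [0, 100] (hypothetical scale)."""
    emotion: float = 50.0
    monologue: List[str] = field(default_factory=list)

    def react(self, reply: str) -> None:
        # Assumption: a crude keyword check stands in for an LLM-based appraisal.
        empathic = any(w in reply.lower() for w in ("sounds", "understand", "feel"))
        delta = 10.0 if empathic else -5.0
        # Clamp the updated emotion to the [0, 100] range.
        self.emotion = max(0.0, min(100.0, self.emotion + delta))
        thought = "I feel heard" if empathic else "That missed the point"
        # Record an inner monologue entry for this turn.
        self.monologue.append(f"{thought} (emotion={self.emotion:.0f})")


def evaluate(model: Callable[[str], str], user_turns: List[str]) -> Tuple[float, List[str]]:
    """Run a multi-round dialogue and return (final emotion, inner monologue)."""
    agent = SentientAgent()
    for turn in user_turns:
        reply = model(turn)   # the model under evaluation answers the user turn
        agent.react(reply)    # the simulated user updates its emotion and monologue
    return agent.emotion, agent.monologue


if __name__ == "__main__":
    turns = ["I failed my exam.", "I don't know what to do."]
    score, thoughts = evaluate(
        lambda msg: "That sounds hard. I understand why you feel that way.", turns
    )
    print(score)     # 70.0
    print(thoughts)
```

The point of the sketch is the shape of the evaluation: the judge is a participant whose internal state changes turn by turn, rather than a grader that reads a finished transcript.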
- GPT-4o-Latest not only scored the highest (79.9) but also used fewer tokens (about 3K tokens);
This suggests that models with strong empathy skills need not be "verbose": concise expression combined with a good grasp of emotion is the key.
Experimental Analysis: A Coordinate Map - Creating a "Personality Portrait" for AI
Gemini 2.5-Pro was asked to analyze the interactions between different models and the sentient agents, examining their expression and each model's success and failure cases, to draw a personality portrait for each model.
Interestingly, DeepSeek-R1 was described as a talented, inwardly warm and kind "creative genius" whose social skills and sense of realism still need refinement, while o3 was seen as an extremely intelligent, strictly professionally trained robot consultant familiar with various advanced methodologies.
Then, based on reply examples, the personality portraits, and quantitative data on the distribution of model strategies, the researchers placed the models on a coordinate map:
- Horizontal axis: Interaction Mode (Formalized Interaction ↔️ Creative Interaction)
- Vertical axis: Reply Orientation (Problem-Solving Orientation ↔️ Empathic Understanding Orientation)
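As a rough illustration of how a position on these two axes could be read, the sketch below maps a point to its pole on each axis. The numeric range [-1, 1], the zero threshold, and the function name `describe_position` are assumptions for this example only. The article does not state how the researchers actually computed coordinates.

```python
from typing import Tuple


def describe_position(x: float, y: float) -> Tuple[str, str]:
    """Map a point to the nearer pole of each axis.

    x: interaction mode, -1 (formalized) to +1 (creative)   [hypothetical scale]
    y: reply orientation, -1 (problem-solving) to +1 (empathic)   [hypothetical scale]
    """
    interaction = "Creative Interaction" if x > 0 else "Formalized Interaction"
    orientation = (
        "Empathic Understanding Orientation" if y > 0 else "Problem-Solving Orientation"
    )
    return interaction, orientation


if __name__ == "__main__":
    # Made-up coordinates for an unnamed model, not data from the study.
    print(describe_position(0.6, -0.3))
```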
Experimental findings:
- GPT-4o-Latest, GPT-4.1, and other high-"EQ" models lean toward "Strong Empathy + Stable Responses";
- DeepSeek-R1 and DeepSeek-V3-0324 are more like "Creative Support Partners", providing novel and interesting solutions through highly creative interactions;
- Models such as Gemini 2.0 are "Professional Types", often adopting formalized problem-solving modes while lacking emotional subtlety.
- Interestingly, no AI persona that is both highly creative and deeply empathetic has emerged yet, which may be exactly the ideal that humans need.
Paper Address:
https://www.arxiv.org/abs/2505.02847
Github Link:
https://github.com/Tencent/digitalhuman/tree/main/SAGE
This article is from the WeChat account "Quantum Bit" and is published by 36kr with authorization.



