Can AI also be an EQ master? Tencent releases a new AI social intelligence leaderboard, and the latest GPT-4o takes first place

36kr
05-22
This article is machine translated

Judging AI's intelligence is no longer limited to leaderboard performance.

As large models keep making breakthroughs in "IQ", understanding human emotions and intentions has become a new requirement for real-world applications.

So, how should AI's "EQ" be evaluated?

SAGE (Sentient Agent as a Judge), a new automated evaluation framework from Tencent Hunyuan's AI Digital Human team, sets out to answer two questions:

  • How can we evaluate whether an AI truly has "empathy"? Can it understand my emotions, pick up on what I mean beneath the surface, and really "hear me" when I'm vulnerable?
  • How can we assess whether an AI can truly become our "confidant"? In other words, "after chatting with it, how do we actually feel?"

Under this framework, the latest GPT-4o performed best, followed closely by GPT-4.1 and the Gemini-2.5 series.

SAGE: Using AI to Simulate an "Emotional Being" to Evaluate Another AI

SAGE doesn't just look at how well a model answers. Instead, it builds an AI agent that simulates human psychology: the agent takes part in multi-turn dialogues, simulates changes in its emotions, generates inner monologues, and finally rates the quality of the conversation.
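To make the loop above concrete, here is a minimal Python sketch. It is not Tencent's implementation: the `SentientAgent` class, the 0 to 100 emotion scale, the starting value of 50, the keyword-based `react` rule, the early-stop condition, and the fixed follow-up message are all hypothetical stand-ins for what, in SAGE, would be an LLM simulating a user's inner reaction.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class SentientAgent:
    """Hypothetical simulated user whose emotion score the judge tracks."""
    emotion: int = 50  # assumed starting point on an assumed 0-100 scale
    monologue: List[str] = field(default_factory=list)  # inner thoughts per turn

    def react(self, reply: str) -> int:
        # Toy appraisal rule (assumption): in SAGE an LLM would read the reply
        # against the simulated persona; here keyword cues stand in for that.
        text = reply.lower()
        delta = 0
        if "i hear you" in text or "that sounds" in text:
            delta += 10  # reply acknowledges the user's feelings
        if "calm down" in text or "just " in text:
            delta -= 10  # reply dismisses or rushes the user
        # Clamp to the assumed 0-100 range and record an inner monologue line.
        self.emotion = max(0, min(100, self.emotion + delta))
        self.monologue.append(f"change {delta:+d} -> emotion {self.emotion}")
        return self.emotion


def run_session(model: Callable[[str], str], opening: str,
                max_turns: int = 3) -> Tuple[int, List[str]]:
    """Run a multi-turn dialogue; the final emotion score is the rating."""
    agent = SentientAgent()
    message = opening
    for _ in range(max_turns):
        reply = model(message)
        agent.react(reply)
        if agent.emotion in (0, 100):
            break  # assumed early stop once the emotion hits either extreme
        # A real sentient agent would generate its next line from its inner
        # state; this sketch reuses a fixed follow-up message (hypothetical).
        message = "I still feel stuck."
    return agent.emotion, agent.monologue


empathetic = lambda msg: "That sounds really hard. I hear you."
dismissive = lambda msg: "Just calm down."
print(run_session(empathetic, "I failed my exam.")[0])  # 80
print(run_session(dismissive, "I failed my exam.")[0])  # 20
```

The design choice worth noting is that the score comes from the simulated user's final emotional state rather than from grading individual answers, which mirrors the article's framing of "after chatting with it, how do we actually feel?"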


  • GPT-4o-Latest not only scored the highest (79.9) but also used fewer tokens (about 3.K tokens).

This indicates that models with strong empathy do not need to be "verbose": concise expression combined with an accurate grasp of emotion is the key.

Experimental Analysis: A Coordinate Map of Models, Creating a "Personality Portrait" for Each AI

The researchers had Gemini 2.5 Pro interact with the different models and agents, analyzing their styles of expression and their success and failure cases to build a distinct personality portrait for each model.

Interestingly, DeepSeek-R1 was characterized as a talented "creative genius", warm and kind at heart, whose social skills and sense of realism still need refining, while o3 came across as an extremely intelligent robot consultant with rigorous professional training and a command of many advanced methodologies.

Then, drawing on example replies, the personality portraits, and quantitative data on each model's strategy distribution, the researchers placed the models on a two-axis coordinate map:

  • Horizontal axis: Interaction Mode (Formalized Interaction ↔️ Creative Interaction)
  • Vertical axis: Reply Orientation (Problem-Solving Orientation ↔️ Empathetic Understanding Orientation)
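The article does not say how each model's position on the map is computed. As one way to picture it, the Python sketch below assumes every reply has already been tagged with hypothetical labels (`formalized` or `creative` for interaction mode, `problem_solving` or `empathetic` for reply orientation) and scores each axis as a normalized difference of tag counts. The labels, the formula, and the quadrant names are assumptions for illustration, not the researchers' method.

```python
from collections import Counter
from typing import Iterable, Tuple


def axis_position(tags: Iterable[str], neg: str, pos: str) -> float:
    """Return a value in [-1, 1]: -1 if all tags are `neg`, +1 if all are `pos`."""
    counts = Counter(tags)
    total = counts[neg] + counts[pos]
    if total == 0:
        return 0.0  # no evidence either way: place at the axis midpoint
    return (counts[pos] - counts[neg]) / total


def place_model(interaction_tags: Iterable[str],
                orientation_tags: Iterable[str]) -> Tuple[float, float, str]:
    """Map a model's (hypothetical) reply tags to (x, y) and a quadrant label."""
    x = axis_position(interaction_tags, "formalized", "creative")
    y = axis_position(orientation_tags, "problem_solving", "empathetic")
    # Ties at exactly 0 are assigned to the left/bottom pole in this sketch.
    horizontal = "creative" if x > 0 else "formalized"
    vertical = "empathetic" if y > 0 else "problem-solving"
    return x, y, f"{horizontal} / {vertical}"
```

For example, a model with two creative and one formalized reply, plus one empathetic and three problem-solving replies, lands at x = 1/3, y = -0.5, in the "creative / problem-solving" quadrant. Under this toy scheme, the "creative / empathetic" quadrant stays empty unless some model's replies lean both ways at once.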

Experimental findings:

  • GPT-4o-Latest, GPT-4.1, and other models fall into the "high-EQ" type, combining "Strong Empathy + Stable Responses";
  • DeepSeek-R1 and DeepSeek-V3-0324 are more like "Creative Support Partners", providing novel and interesting solutions through highly creative interactions;
  • Gemini 2.0 and similar models are more like "Professional Types", often adopting formalized problem-solving modes while lacking emotional subtlety.
  • Interestingly, an AI persona that is both highly creative and deeply empathetic has not yet emerged, and such a persona may be exactly the ideal that people are looking for.

Paper Address:
https://www.arxiv.org/abs/2505.02847

GitHub Link:
https://github.com/Tencent/digitalhuman/tree/main/SAGE

This article is from the WeChat account "Quantum Bit" and is republished by 36kr with authorization.
