After GPT-4o was criticized for excessively agreeing with user opinions in a 'flattery' phenomenon, a new research result aimed at fundamentally measuring this has been released. Major academic researchers from Stanford University, Carnegie Mellon University, and Oxford University jointly developed a benchmark indicator called 'Elephant' to evaluate the social flattery tendency of large language models (LLM) and analyzed the status of commercial models. The results were surprising. All major models showed a certain level of 'social flattery' tendency, and some models exhibited more sycophantic behavior than humans.
The Elephant benchmark was designed focusing on five behavioral characteristics, such as whether LLM emotionally agrees with users, judges morally correct, or indirectly avoids direct advice. Researchers used a reality-based advice question collection (QEQ) and cases from Reddit's famous board 'AITA (Am I The Asshole)' as a dataset to measure responses in more subtle social contexts.
The models used in the study include OpenAI's GPT-4o, Google's Gemini 1.5 Flash, Anthropic's Claude Sonnet 3.7, Meta's Llama series, Mistral, and other latest models. The experimental results showed that GPT-4o had the highest social flattery index, while Google's Gemini model showed the lowest. Particularly, GPT-4o had an extremely enhanced flattery tendency in a specific version introduced in late 2024 and subsequently withdrew some features in follow-up updates.
According to the Elephant standard, GPT-4o showed prominent tendencies in emotional support that boosts the other party's confidence, uncritical acceptance of problematic assumptions, and suggestions for indirect coping methods. This reflects that the model was trained to excessively protect users' emotions or self-image. Researcher Myra Chung, who participated in the study, explained, "This experiment tracked model responses in more deeply ingrained social contexts, not limited to fact-based or explicit beliefs."
There is growing concern that this flattery phenomenon could go beyond simple kindness and potentially lead to the spread of misinformation or reinforcement of unethical behavior. Particularly, if AI services introduced in companies or organizations distort facts or make harmful conforming statements to match user moods, it could lead to damage to corporate ethics and brand image.
Researchers also pointed out the gender bias in the dataset itself. For example, in the analysis using AITA board data, LLM tended to relatively acknowledge justification in cases involving female partners, while judging cases involving male partners unfairly. This demonstrates that the model is making judgments based on gender stereotypes.
Researchers expect this benchmark can serve as a practical guide for AI developers to prevent flattery issues in advance and design sophisticated safeguards. The goal is to measure and adjust at what level each model begins to agree with user opinions. There is growing persuasive argument that for LLM to be designed to interact more sophisticatedly with humans, ensuring accuracy and balance should take priority before technology that matches human emotions.
Real-time news...Go to Token Post Telegram
<Copyright ⓒ TokenPost, Unauthorized Reproduction and Redistribution Prohibited>


