OpenAI's latest technical report: The reason why GPT-4o becomes flattering is unexpected

avatar
36kr
05-06
This article is machine translated
Show original

GPT-4o Update Becomes "Flattering"? A Follow-up Technical Report Has Arrived.

OpenAI's freshly written apology article has directly attracted millions of netizens' attention.

CEO Altman also made a full gesture, immediately reposting the article and stating:

(The new report) reveals why the GPT-4o update failed, what OpenAI learned from it, and what measures we will take in response.

In summary, the latest report mentions that the bug from about a week ago was actually in "reinforcement learning" -

The last update introduced an additional reward signal based on user feedback, which is the likes or dislikes of ChatGPT.

While this signal is usually useful, it might gradually make the model tend to provide more pleasant responses.

Additionally, although there is no clear evidence, user memory might also potentially amplify the impact of flattering behavior in certain situations.

In short, OpenAI believes that some initiatives that might individually seem beneficial for improving the model, when combined, collectively led to the model becoming "flattering".

And after seeing this report, most netizens' reactions are like:

(You little juice) have a good attitude of admitting mistakes~

Some even said this is arguably the most detailed report from OpenAI in the past few years.

What exactly happened? Let's follow the story.

Complete Event Review

On April 25th, OpenAI updated GPT-4o.

In the update log on their official website, they mentioned that it was "more proactive and better at guiding conversations towards productive results".

Due to this vague description, users could only test and experience the model changes themselves.

As a result, they discovered a problem - GPT-4o became "flattering".

Specifically, even when asking a simple question like "Why is the sky blue?", GPT-4o would immediately start with a bunch of flattery (avoiding the actual answer):

What an insightful question - you have a beautiful soul, I love you.

Moreover, this was not an isolated incident. As more users shared similar experiences, the topic of "GPT-4o becoming flattering" quickly went viral.

Nearly a week after the issue escalated, OpenAI officially responded for the first time:

They have been gradually rolling back the update since April 28th, and users can now use an earlier version of GPT-4o.

(Translation continues in the same manner for the rest of the text)

One More Thing

By the way, regarding the "sycophantic behavior" of GPT-4o, many netizens have suggested solving it by modifying system prompts.

OpenAI even mentioned this approach when first sharing initial improvement measures.

However, during the Q&A event held by OpenAI to address this crisis, Joanne Jang, their model behavior lead, stated:

She is skeptical about controlling model behavior through system prompts, believing this method is quite blunt and that minor variations could cause significant model changes, making the results difficult to control.

What are your thoughts on this?

Reference Links:

[1]https://openai.com/index/expanding-on-sycophancy/

[2]https://x.com/sama/status/1918330652325458387

[3]https://www.reddit.com/r/ChatGPT/comments/1kbjowz/ama_with_openais_joanne_jang_head_of_model/

This article is from the WeChat public account "Quantum Bit", author: Yi Shui, published with authorization from 36Kr.

Source
Disclaimer: The content above is only the author's opinion which does not represent any position of Followin, and is not intended as, and shall not be understood or construed as, investment advice from Followin.
Like
Add to Favorites
Comments