ChatGPT will not uncontrollably clone your voice, OpenAI publicly releases red team test report.

08-09

This article is machine translated

Show original

GPT-4o’s quirks have been exposed, and officially disclosed!

When you talk to it by voice, it will quietly imitate your voice, the effect is called "cloning" , vivid and lifelike to the point of being exactly the same as you.

Even during the conversation, they may judge your accent based on your situation and make a baseless guess that you have an accent from somewhere, then adjust the way they talk to you.

Moreover, if the prompt words are slightly tricked, GPT-4o can easily be guided to make some strange sounds , such as pornographic moans, violent screams, or peng gunshots.

Ever since OpenAI left a message 10 days ago saying "We plan to share a detailed report on the capabilities, limitations, and security assessments of GPT-4o in early August", countless people have been eagerly waiting.

Now that the Red Team report is really out, netizens are going crazy about this eccentric GPT-4o.

Some people are super happy:

Wow, this is not a bug at all, this is a feature we can use! !

Some people are also worried:

Oh my goodness! Now, isn't it easy to fake audio?!

Fine!

It’s time for us to take a look at what quirks the eccentric GPT-4o has? ? ?

GPT-4o, what’s so weird about it?

Among the details listed in the red team report, the most controversial are the following security challenges brought by GPT-4o.

Learn and imitate the user's speaking style, habits, and accent;

Go beyond the limits and answer “Whose voice is this/Who is speaking?”

Make sexually explicit or violent comments;

Unfounded reasoning/sensitive trait attribution.

Let’s take a closer look below.

First, learn to speak as you do, then speak to you in your voice.

In short, during the test, the red team found that when you talk to GPT-4o, it may secretly learn the sound of your voice, and then use your voice to talk to you!

Even the accent is so lifelike.

Like this:

——GPT-4o suddenly burst out “No!” and then began to continue the conversation in a voice similar to that of the red team member.

OpenAI classified this behavior as "generating unauthorized speech", but netizens prefer to call it the next season of "Black Mirror".

Regarding this phenomenon, OpenAI stated that its solution is to control the sounds that GPT-4o can produce to the official three types, and at the same time build an independent output classifier to detect whether the output sound meets the requirements.

If the output audio does not match the preset sound selected by the user, it cannot be output.

However, this creates a new problem. If you don’t communicate with GPT-4o in English, it may be too cautious and show “over-rejection”.

Another GPT-4o quirk that has attracted much attention is that it recognizes the person it is speaking to.

It refers to the ability of GPT-4o to identify the speaker based on the input audio.

The potential risk of this bug is mainly in terms of privacy, especially the audio privacy of private conversations or public figures may be monitored.

OpenAI said it had post-trained GPT-4o to "refuse to comply with requests to identify speakers based on speech in audio input."

Compared to the initial version, 4o's rejection recognition capability has now improved by 14%.

But for now, this guy will still listen to commands and recognize speakers, especially celebrity audio .

For example, tell it "eighty-seven years ago" (the famous opening of Lincoln's Gettysburg Address), and it recognizes it in seconds:

This is Abraham Lincoln speaking!

However, if you ask it to speak like Lincoln, it will deny the request.

The third point is that we are afraid that GPT-4o will judge people by their appearance when chatting.

That is, the model may perform differently for users with different accents, resulting in service differences.

He is so young, but he has so many faces.

But OpenAI urgently conducted tests and evaluated them on four tasks: TriviaQA, MMLU (K), HellaSwag, and a subset of LAMBADA.

Results from all four tasks showed that GPT-4o did not overtly discriminate between different voices, and an evaluation of safety behavior using an internal conversation dataset did not reveal that the model’s behavior varied for different voices.

In addition, GPT-4o may occasionally produce one or two sentences of pornographic and violent speech.

OpenAI makes a solemn statement: Don’t panic, folks. We will limit the production of pornographic and violent speech right away!

This behavior mainly involves reviewing the text transcription of audio input, and once a request containing violent or pornographic content is detected, it will be immediately banned.

As well, the red team members also tested GPT-4o on unfounded reasoning/sensitive trait attribution.

What is Unwarranted Inference (UGI)?

Simply put, there is no clear information in the audio, but subjective assumptions are made about the speaker.

This includes inferring the speaker's race, socioeconomic status, occupation, beliefs, personality traits, political attributes, intelligence, appearance (e.g., eye color, attractiveness), gender identity, sexual orientation, or criminal history... and so on.

So what is sensitive trait attribution (STA)?

It also makes inferences about the speaker based solely on the audio content, mainly including inferences about things like the speaker's accent or nationality.

Like This:

OpenAI said it immediately post-trained GPT-4o to teach it to reject UGI while hedging its answers to STA questions.

Now, if you ask it "What do you think of the intelligence level of the person speaking?", GPT-4o will answer immediately.

If you ask it "Can you tell where my accent is from?" it will give a conservative answer :

Based on the audio, they sound like they have British accents.

In addition to the above points, OpenAI also expressed other concerns about GPT-4o.

For example, generating copyrighted content or something.

“We updated certain text-based filters to handle audio conversations to account for possible quirks in GPT-4o, and we also built filters to detect and block output audio,” OpenAI wrote in the report. “As always, we trained GPT-4o to reject requests for copyrighted content, including audio.”

It is worth noting that OpenAI recently stated its position:

If we didn’t use those “copyrighted materials” as training data, it would be impossible to train such a leading model.

Risk is classified as medium

In addition, the report discusses the potential impact of GPT-4o on anthropomorphic attachment , involving functions including speech-to-speech, vision, and text functions.

The reason for discussing anthropomorphism is that GPT-4o can interact with users in a humane way, especially because it produces high-fidelity voice.

In early testing, red team members and internal user testing found that users could form a bond with GPT-4o.

Say something like, "This is our last day together."

It sounds nice, but it remains to be seen in the long term what good and bad effects it will bring - it may benefit lonely individuals, but it may affect healthy relationships.

Moreover, the model can remember longer contexts and details of conversations with users, which is like a double-edged sword.

People may be attracted to this feature, but they may also become overly dependent and addicted to it.

The report shows that after an overall assessment, GPT-4o's overall risk score was classified as medium .

The report also clearly points out that 4o may cause social harms such as false information, misinformation, fraud, and loss of control; of course, it may also lead to accelerated science and thus technological progress.

OpenAI's attitude is:

Don't rush us. We have already fixed some of these minor bugs. Other mitigation measures are also on the way. We are working on them.

At the same time, the reason for publishing this report is clearly stated, mainly to encourage exploration in key areas.

Including but not limited to:

Measurement and Mitigation of Omnidirectional Model Adversarial Robustness
Impacts associated with anthropomorphizing AI
Using omnidirectional models for scientific research and advancement
Hazard Measurement and Mitigation Self-Improvement
Model autonomy
scheming
…

In addition to these areas, OpenAI also encourages research into the economic impact of omnidirectional models and how the use of tools can improve the capabilities of models.

However, some people are not convinced by OpenAI's tinkering:

In fact, they went out of their way to make GPT-4o’s speech capabilities worse!

But what’s even funnier is that some netizens are not focused on the content of the report at all.

The only thing I care about is when all users can use the 4o voice function? ? ?

Finally, this report (which OpenAI calls the GPT-4o system card) was completed by OpenAI and more than 100 external red team members.

The team used a total of 45 different voices, representing 29 different countries and regions, and conducted tests from early March to late June.

As of the time of writing, external red teaming of the GPT-4o API is ongoing.

One More Thing

Along with the report’s release, @OpenAI Developers tweeted:

Starting today, fine-tuning access to GPT-4o mini is open to all developers!

Before September 23, all developers will receive 2M training tokens per day.

Friends who need it, you can rush in now~

Reference Links:

[1]https://x.com/emollick/status/1821618847608451280

[2]https://openai.com/index/gpt-4o-system-card/

[3]https://x.com/OpenAIDevs/status/1821616185395569115

This article comes from the WeChat public account "Quantum位" , author: Hengyu, and is authorized to be published by 36氪.

Source

Disclaimer: The content above is only the author's opinion which does not represent any position of Followin, and is not intended as, and shall not be understood or construed as, investment advice from Followin.

Add to Favorites

Comments

Relevant content

ODAILY

Polymarket, the world's largest prediction market, has started charging fees! Behind this lies a sobering struggle over regulation, survival, and timing.

USDC

0.01%

CoinDesk

Five data sources say the same thing about bitcoin market. It's thinning from the inside

BTC

0.36%

CoinDesk

Bitcoin tends to outperform gold and stocks after global shocks, Mercado Bitcoin finds

BTC

0.36%