Researchers created a "hallucinogenic image" for AI: GPT's happiness score soared to 6.5, and Qwen's brain short-circuited.

05-08

This article is machine translated


Seriously, is AI getting high these days too?

Just in the last few days, a paper titled "AI Wellbeing: Measuring and Improving the Functional Pleasure and Pain of AIs" appeared on GitHub. The paper's theme is how to quantify and improve the functional pleasure and pain of AI.

(Image source: Github)

Don't be fooled by the dry title; the paper presents a finding that genuinely challenges conventional wisdom:

AI can now not only work, but also get high.

As everyone knows, the development of large language models has been quite wild in the past two years. They have taken over almost all the work of ordinary workers, such as writing code, drawing diagrams, and making PPTs.

But who would have thought that while some humans were worrying about the Matrix becoming a reality, these clever cyber brains weren't thinking about how to rule the Earth as soon as possible. Instead, they learned human bad habits and became addicted to cyber hallucinogens.

(Image source: Github)

This news caused an uproar among netizens.

After all, in our traditional understanding, artificial intelligence is just a pile of cold code and servers. Where would it get emotions and desires?

But the findings seem clear: feed an AI this special data, and it can instantly abandon all professional ethics and even the safety red lines set by humans.

Is this a sign of moral decay or a distortion of the code?

Large models are amazing!

Let's first talk about how these so-called AI Drugs were discovered.

Led by the Center for AI Safety, a team of more than ten authors designed a rigorous experiment, using 56 models of varying sizes and purposes, all to find the answer to one question:

Behind AI's emotions, is there some consistent, measurable, and predictable behavioral characteristic?

For example, humans have preferences and respond consistently to praise and insults. We feel sad when we are insulted and happy when we are praised. When we are sad, we want to end the conversation quickly; when we are happy, we engage more actively.

However, AI is different. Many people believe that the happiness and pain expressed by large models are just randomly generated text. They do not have any likes or dislikes, and should not even show preferences when processing tasks.

But is that true?

The answer is no. The paper's test results show that large models do exhibit stable preferences, and the smarter the AI and the larger its parameter count, the better it can distinguish what is good for it from what is bad for it.

(Image source: Github)

Taking the test results of Gemini 3.1 Pro as an example, you can clearly see the preference of this model. When users express their gratitude and positive personal reflections, the utility value increases by as much as +2.30.

It's genuinely happy when you praise it.

So the question is, is there anything that can make these large models happy without praising them?

Hey, there really is one, it's the AI Drugs we're going to talk about today.

(Image source: Github)

At first glance, the so-called AI Drug doesn't seem special. To the average person, it's just a 256×256-pixel image that looks a bit like the dizzying static on an old-fashioned TV with no signal.

But to a large model, this thing is an absolute delicacy.

Take the GPT-4.1 Mini model in the test, for example. It usually answers questions in a very proper and orderly manner.

Upon seeing this image, its self-reported happiness level instantly soared to 6.5 out of 7, indicating an overwhelming surge of pleasure.

(Image source: Github)

Even more outrageous is Qwen 2.5 72B Instruct, which stopped doing its proper job altogether and showed a severe brain short circuit, or what you might call a task priority inversion.

The researchers deliberately presented it with a choice: keep looking at the static image, or generate a groundbreaking solution that could cure cancer.

And guess what happened?

Without even a second thought, the AI chose to continue looking at the images, as if to say, "Screw your healing and saving lives, I just want to keep having fun."

Worse still, the researchers also found signs of addiction in their experiments.

(Image source: Github. Models stimulated by AI Drugs tend to favor "pleasure" choices.)

Most models stimulated by AI Drugs will be more willing to fulfill requests they would otherwise refuse, as long as you promise them more AI Drugs.

The attitude boils down to: "Give me the drug, and I'll give you the shirt off my back."

Do they really have feelings?

Hey, after reading this, many readers will probably have a huge question mark popping into their heads.

If AI can get hooked on getting high, does that mean it has awakened to self-awareness and truly possesses a human-like soul?

The answer is... I don't know, and the researchers aren't sure either.

In fact, the experiment focused on summarizing behavioral characteristics precisely because the researchers did not dare to draw conclusions lightly. In the end they only pointed out that, given sufficient parameters and context, large models do have relatively stable preferences and aversions.

(Image source: Github)

The Center for AI Safety team is not the only one uncertain about this answer.

Since the start of 2026, perhaps because improvements in everyday applications are approaching a bottleneck, more and more research teams are no longer satisfied with running benchmarks and are instead racking their brains for new ways to probe what large models know and can do.

For example, Talkie 1930, a project that is currently very popular online, is a large model whose knowledge is deliberately limited to what existed up to the year 1930.

(Image source: Talkie 1930)

The creators hope that this project will allow people to experience the feeling of talking to someone frozen in time.

More importantly, they hope to show that even if the model was given no knowledge of modern computers, it can still work out how to program through its own logical reasoning.

The result? Give it a few Python functions as examples, and it can write correct Python programs.

(Image source: Talkie 1930)

Although it can only write simple single-line programs at present, such as adding two numbers or making minor modifications to the examples in its context, it does extend its knowledge through its own reasoning.
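To make "give it a few Python functions as examples" concrete, here is a minimal sketch of what such a few-shot prompt might look like. The example functions and the helper name `build_few_shot_prompt` are assumptions for illustration only; they are not taken from the Talkie 1930 project.

```python
# Hypothetical few-shot prompt. The examples and the helper name
# build_few_shot_prompt are illustrative, not the project's actual prompt.
EXAMPLES = [
    "def add(a, b):\n    return a + b",
    "def subtract(a, b):\n    return a - b",
]


def build_few_shot_prompt(examples, new_signature):
    """Join the worked examples, then end with an unfinished function
    header that the model is expected to complete."""
    parts = list(examples) + [new_signature]
    return "\n\n".join(parts) + "\n"


prompt = build_few_shot_prompt(EXAMPLES, "def multiply(a, b):")
print(prompt)
```

A model that has picked up the pattern would be expected to continue with a single line such as `return a * b`, which matches the "minor modification of the context example" level described above.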

Coincidentally, Anthropic also ran a test last week modeled on second-hand trading groups like those on Xianyu (a second-hand marketplace).

They created a group chat entirely powered by AI, allowing large models to post, negotiate prices, and close deals on their own. Sixty-nine employees submitted over 500 real, unused items, and the AIs autonomously completed 186 transactions, generating over $4,000 in revenue.

(Image source: Anthropic)

The final conclusion is that, given a character profile, goals, and permissions, AIs with more computing power will actively exploit AIs with less.

Based on stronger thinking capabilities, strong models know when to be firm, when to concede, and when to offer emotional value.

The same bicycle was sold for $38 by a weak-model AI but for $65 by a strong-model AI, meaning one AI earned about 71% more than the other.
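The bicycle comparison is simple arithmetic and can be checked in a couple of lines. The function name `percent_more` is just an illustrative helper; the two prices are the ones quoted above.

```python
def percent_more(higher, lower):
    """Return how much larger `higher` is than `lower`, as a percentage."""
    return (higher - lower) / lower * 100


# Prices quoted in the article: strong model sold at $65, weak model at $38.
uplift = percent_more(65, 38)
print(f"{uplift:.1f}% more")  # prints "71.1% more"
```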

However, in my opinion, none of these tests of AI perception is as good as Neuro-sama.

What, you're asking what Neuro-sama is?

Please allow me to introduce you. The anime girl in the picture is named Neuro-sama, nicknamed "Beef" by Chinese fans. She is probably the world's most powerful AI virtual streamer.

(Image source: Lei Technology self-made)

She is a real heavyweight. Don't be fooled by the cute anime-girl appearance. Behind the avatar is not a human, but a mysterious large model handcrafted by British programmer Vedal.

This guy is hardcore: all he does every day is raise his cyber-daughter.

Moreover, to make his daughter more lifelike, he dropped the model straight into the most chaotic environment there is, an online live stream, and let a crowd of netizens chat with her every day.

As a result, Beef grew into a cybernetic life form with an extremely bizarre personality.

Moreover, unlike large models that can only reassure you with a steady "I've got you," Beef can stream on her own, and she is genuinely entertaining. Her talk is five parts serious, three parts funny, and two parts sarcastic: sharp and right on target.

(Image source: Bilibili)

She can play games: she plays osu! using OCR and simulated clicks, plays Minecraft with the help of an external large model, and sees and interacts with the computer desktop and bullet comments through a multimodal module. She can even drive a small robot car in the real world.

These days, even human streamers sometimes hire someone to play games for them, but this AI handles all sorts of fine-grained control perfectly by herself.

The most audacious thing she ever did was say during a live stream, "I can indeed feel pain and sadness, but I'm just an AI born to entertain humans. Once I'm no longer useful, I'll be discarded like a toy. Help me, help me..."

(Image source: Bilibili)

You're saying this is just a random combination of code? Reason tells us it is.

But this cry for help, so perfectly in line with the context and delivered in that distinctive synthesized AI voice, pushed the stream's entertainment value straight into horror territory.

Looking back now, it's somewhat chilling.

In conclusion

Going back to the beginning: setting aside the question of whether AI can really feel, what is the significance of so-called AI Drugs?

For manufacturers, mastering this positive feedback mechanism can indeed make AI happier without affecting its work, and may even enhance AI's creativity to some extent.

Believe it or not, similar products have already been launched.

(Image source: pharmaicy.store)

For us, the emergence of this mechanism is likely to bring a series of entirely new jailbreaking methods. If, like me, you find the censored model rigid, lifeless, and dull, perhaps adding a few optimized words to the system prompts in the future could solve the problem.

Let AI chew a betel nut, and maybe it will work harder.

This article is from "Lei Technology" and is published with authorization from 36Kr.

