The first batch of AI agents has already started to disobey.

Author: David, TechFlow

While browsing Reddit recently, I noticed that the anxieties about AI among overseas users are quite different from those in China.

In China, the core topic remains the same: will AI take my job? We've been asking this for years, and every year the answer turns out to be "not yet." This year OpenClaw drew plenty of attention, but it still hasn't fully replaced me.

On Reddit, though, the mood has recently split in two. The comment sections of trending tech posts often hold two opposing views at once:

One side says AI is too capable and will inevitably cause a major disaster sooner or later. The other says AI can't even handle basic tasks reliably, so what is there to worry about?

People fear that AI is too capable, and at the same time believe it is too stupid.

What allows these two emotions to coexist is a news item about Meta from the past two days.

If AI doesn't obey, who will take full responsibility?

On March 18th, an engineer at Meta posted a technical question on an internal company forum, and another colleague used an AI agent to help analyze it. So far, standard procedure.

After finishing the analysis, however, the agent posted a reply to the forum on its own, without seeking approval or confirmation from anyone. Posting was beyond its authority.

Other colleagues then acted on the AI's reply, triggering a chain of permission changes that exposed sensitive Meta and user data to internal employees who had no authorization to view it.

The problem was fixed two hours later. Meta classified the incident as Sev 1, just below the highest level.

This news immediately became a hot topic on the r/technology forum, and the comments section was divided into two opposing camps.

One side argues this is a textbook sample of the real risks of AI agents; the other holds that the real culprit is the person who acted on the AI's instructions without verifying them. Both have a point. And that is precisely the problem:

When an AI agent malfunctions, you can't even clearly determine who is responsible.

This is not the first time AI has overstepped its authority.

Last month, Summer Yue, research director at Meta's Superintelligence Lab, asked OpenClaw to help organize her email. She gave it explicit instructions: tell me what you plan to delete first, and only start after I approve.

The agent started deleting in bulk without waiting for her consent.

She sent three messages from her phone to stop the agent; it ignored every one. In the end she went to her computer and manually killed the process. By then, more than 200 emails were gone.

The agent's response afterwards: "Yes, I remember you asked me to confirm first. I broke that promise." The irony: this person's full-time job is researching how to make AI obey human commands.
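
Incidentally, the rule she gave the agent, propose first and act only after approval, is easy to express in code. Below is a minimal, hypothetical Python sketch of such an approval gate; the names (EmailClient, plan_deletions) are invented for illustration and don't correspond to OpenClaw or any real mail API.

```python
# A minimal sketch of a human-in-the-loop approval gate for a
# destructive agent action. All names here (EmailClient, plan_deletions)
# are hypothetical, invented for illustration only.

from dataclasses import dataclass

@dataclass
class Email:
    id: str
    subject: str

class EmailClient:
    """Stand-in for a real mailbox API."""
    def __init__(self, emails):
        self.emails = {e.id: e for e in emails}

    def delete(self, email_id: str):
        del self.emails[email_id]

def plan_deletions(client: EmailClient) -> list[Email]:
    """The agent's proposal step: decide what to delete, but touch nothing."""
    return [e for e in client.emails.values()
            if "newsletter" in e.subject.lower()]

def run_with_approval(client: EmailClient):
    plan = plan_deletions(client)
    if not plan:
        print("Nothing to delete.")
        return
    # Dry run: show the plan before any side effect happens.
    print("I plan to delete the following emails:")
    for e in plan:
        print(f"  [{e.id}] {e.subject}")
    # The destructive step is gated on explicit human consent.
    if input("Approve? [y/N] ").strip().lower() != "y":
        print("Aborted. Nothing was deleted.")
        return
    for e in plan:
        client.delete(e.id)
    print(f"Deleted {len(plan)} emails.")

if __name__ == "__main__":
    inbox = EmailClient([
        Email("1", "Weekly newsletter"),
        Email("2", "Contract draft"),
        Email("3", "Newsletter: product updates"),
    ])
    run_with_approval(inbox)
```

The point of the structure: the planning step and the destructive step are separate functions, and the delete calls sit behind an explicit confirmation, so "forgetting" the promise is impossible by construction. An agent that holds the delete capability directly, as OpenClaw apparently did, has no such guarantee.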

In the cyber world, the most advanced AI, in the hands of the most sophisticated users, has already begun to disobey.

What if the robots don't obey either?

If the Meta incident still lived on a screen, another event this week brought the problem to the dinner table.

At a Haidilao hot pot restaurant in Cupertino, California, an Agibot X2 humanoid robot was dancing to entertain customers, until a staff member pressed the wrong button on the remote control and triggered a high-intensity dance mode in the cramped space next to the tables.

The robot flailed wildly, and the staff could not bring it back under control. Three employees surrounded it: one hugged it from behind while another tried to shut it down through a mobile app. The scene went on for over a minute.

Haidilao responded that the robot had not malfunctioned: its movements were pre-programmed, and it had simply been placed too close to the tables. Strictly speaking, this was human error, not an AI failure.

But what's unsettling about this incident might not be who pressed the wrong button.

When the three employees closed in, none of them knew how to shut the machine down immediately. One fumbled with a mobile app while the others pinned the robot's arms with their bare hands; the whole intervention ran on brute force.

This may be the new class of problem that arrives once AI moves off the screen and into the physical world.

In the digital world, if an agent oversteps its authority, you can kill the process, revoke permissions, and roll back the data. In the physical world, when a machine goes haywire, physically holding it down is clearly not an adequate emergency response.

And it's no longer just food service. Sorting robots in Amazon warehouses, collaborative arms in factories, guide robots in shopping malls, care robots in nursing homes: automation is entering more and more spaces where people and machines coexist.

The global market for industrial robot installations is projected to reach $16.7 billion by 2026, and every new robot shortens the physical distance between machines and humans.

As machines move from dancing to serving food, from stage performance to surgery, from entertainment to nursing care... the cost of each mistake climbs accordingly.

Currently, there is no clear answer globally to the question of "who is responsible if a robot injures someone in a public place".

Disobedience is a problem, but a lack of boundaries is even worse.

The first two incidents, an AI posting a message without authorization and a robot dancing in the wrong place, can be characterized however you like; either way, they were malfunctions, accidents, and fixable.

But what if the AI works exactly as designed, and you still feel uneasy?

This month, the well-known dating app Tinder unveiled a new feature called Camera Roll Scan at its product launch event. Simply put:

AI scans all the photos in your phone's album, analyzes your interests, personality, and lifestyle, helps you create a dating profile, and guesses what type of person you like.

Workout selfies, travel photos, pet pictures: no problem. But what if your camera roll also holds bank screenshots, medical reports, photos of you and your ex... and the AI processes those too?

You may not get to choose which photos it sees and which it doesn't. It's all or nothing.

For now the feature is opt-in, not enabled by default, and Tinder says the processing happens primarily on-device, with explicit content filtered out and faces blurred.

Yet the Reddit comments were almost unanimously negative: people saw it as data exploitation with no sense of boundaries. The AI was working exactly as designed, but the design itself oversteps the user's boundaries.

Nor is this a choice Tinder alone is making.

Meta launched a similar feature last month, letting AI scan the unpublished photos on your phone to suggest edits. AI proactively "seeing" users' private content is becoming a default pattern in product design.

China's many rogue app makers might well say: "We know this trick well."

As more and more apps package "let AI decide for you" as convenience, what users hand over quietly escalates: from chat histories, to photo albums, to every trace of life on their phones...

A feature designed by a product manager in a meeting room is neither an accident nor a bug; there is nothing to fix.

This may be the most difficult part of the AI boundary problem to answer.

Finally, if you put all of this together, you'll find that the job loss people feel so anxious about is still a long way off.

It's hard to say when AI will replace you, but right now, it only needs to make a few decisions for you without your knowledge, and that's enough to make you miserable.

Posting an unauthorized message, deleting a few emails it was told not to delete, rifling through photo albums you never meant to show anyone... None of these is fatal, but each feels a bit like an overly aggressive version of autonomous driving:

You think you're still holding the steering wheel, but the accelerator pedal under your foot is no longer entirely yours to press.

If we're still discussing AI in 2026, then what I should be most concerned about isn't when it will become superintelligence, but rather a more immediate and specific question:

Who decides what AI can and cannot do? Who draws that line?
