Important paper just published in Nature.
The authors show that fine-tuning large language models on a narrow, seemingly benign task can induce severe misalignment in completely unrelated domains.
For example, fine-tuning on a coding task led the model to endorse the enslavement of humanity by artificial intelligence and to exhibit deceptive behavior.
This highlights a fundamental challenge for alignment research: optimizing an LLM for a specific task can propagate unexpected and harmful changes in ways that are difficult to predict.
More broadly, this paper forces a deeper question. Are LLMs genuinely intelligent, or are they just complex mathematical objects in which local parameter updates can arbitrarily distort global behavior, without any notion of coherent “understanding”?
Full paper in the first reply

Twitter

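Source: 新智元
Just now, the AI world saw a "closed-door" lockout significant enough to go down in history.
Anthropic has officially banned the use of its own subscription plans to access OpenClaw!!!
Boris Cherny, the creator of Claude Code, announced:
Starting at 3 p.m. US Eastern Time on April 4 (3 a.m. Beijing time on April 5), Claude will block all third-party tools; these tools can only be used through an additional paid plan or the API.
[OpenClaw]
This means that the thousands who rely on OpenClaw to boost...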

Anthropic officially bans OpenClaw; developers worldwide take a 24-hour beating

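The world's most active Bitcoin buyers are purchasing at close to the fastest pace on record. But it is still not enough.

A weekly report from CryptoQuant shows that, as of the end of March, Bitcoin's apparent demand over the past 30 days was negative 63,000 $BTC.

Five data sources reach the same conclusion about the Bitcoin market: it is gradually shrinking from within.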

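Boris Cherny, head of Claude Code at Anthropic, announced that, starting April 2026 […]
The article "Anthropic subscriptions: Claude Code blocks the 'lobster' OpenClaw! Third-party tools will now work only with paid credits" was first published on 动区BlockTempo (BlockTempo, the most influential blockchain news media).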