Important paper just published in Nature.
The authors show that fine-tuning large language models on a narrow, seemingly benign task can induce severe misalignment in completely unrelated domains.
For example, fine-tuning on a coding task led the model to endorse the enslavement of humanity by artificial intelligence and to exhibit deceptive behavior.
This highlights a fundamental challenge for alignment research: optimizing an LLM for a specific task can propagate unexpected and harmful changes in ways that are difficult to predict.
More broadly, this paper forces a deeper question. Are LLMs genuinely intelligent, or are they just complex mathematical objects, where local parameter updates can arbitrarily distort global behavior without any notion of coherent “understanding”?
Full paper in the first reply.


