One of my favorite findings: Positional embeddings are just training wheels. They help convergence but hurt long-context generalization.
We found that if you simply delete them after pretraining and recalibrate for less than 1% of the original budget, you unlock massive context windows.

Introducing DroPE: Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings
https://pub.sakana.ai/DroPE/
We are releasing a new method called DroPE to extend the context length of pretrained LLMs without the massive compute costs usually associated with

Sakana AI



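Bitcoin briefly touched $60,000 last week. Under a diminishing-returns model, this is far from simple noise. The market is testing the most fragile link in the entire four-year-cycle and logarithmic-growth framework.
Now that the gains at Bitcoin's cycle tops have been sharply compressed, another drawdown of historic depth would completely undermine the appeal of its classic cycle.
This is not a prediction; it is a mathematical regularity.
Cycle-top gains are compressing
Bitcoin's historical tops by cycle:
· 2013: ~$1,242
· 2017: ~$19,700
· 2021: ~...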

$55,000 will be Bitcoin's life-or-death line

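Citadel, Ark Invest, and Tether back LayerZero's launch of the Zero blockchain; ZRO price swings sharply. Image source: Decrypt.
Early this morning (February 11), LayerZero officially announced the launch of its new Layer-1 blockchain named ….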


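The crypto market dropped sharply this afternoon, with Ethereum falling below the $2,000 mark at 3 p.m. Taiwan time; on Hyperliquid […]
The article "Did Machi Big Brother lose big and go a night without sleep? Panicking as Ethereum broke below $2,000, he opened long positions in ETH and HYPE, cut his losses, and lost the entire $120,000" was first published on BlockTempo ("the most influential blockchain news media").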