One of my favorite findings: Positional embeddings are just training wheels. They help convergence but hurt long-context generalization. We found that if you simply delete them after pretraining and recalibrate for less than 1% of the original pretraining budget, you unlock massive context windows.
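As a rough illustration of the idea (not Sakana AI's DroPE code), the sketch below zeroes out the learned absolute positional embeddings of a GPT-2-style model and then runs a few recalibration steps of ordinary next-token prediction. The model choice, data, and hyperparameters are placeholders; RoPE-based models would instead need the rotary application removed from attention.

```python
# Minimal sketch, assuming a GPT-2-style model with learned absolute
# positional embeddings (`transformer.wpe`). This is an illustration of
# "drop the positional embeddings, then briefly recalibrate", not the
# published DroPE recipe.
import torch
from torch.optim import AdamW
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

# "Drop" the positional embeddings: overwrite them with zeros and freeze
# them, so the model must get positional cues from causal attention alone.
with torch.no_grad():
    model.transformer.wpe.weight.zero_()
model.transformer.wpe.weight.requires_grad_(False)

# Short recalibration pass on placeholder text; a real run would use a
# small slice of the pretraining corpus, far below 1% of its size.
optimizer = AdamW((p for p in model.parameters() if p.requires_grad), lr=1e-5)
model.train()
batch = tokenizer("The quick brown fox jumps over the lazy dog. " * 20,
                  return_tensors="pt")
for _ in range(3):  # a handful of steps, just to show the loop shape
    out = model(input_ids=batch["input_ids"], labels=batch["input_ids"])
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```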

Sakana AI
@SakanaAILabs
01-12
Introducing DroPE: Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings
https://pub.sakana.ai/DroPE/
We are releasing a new method called DroPE to extend the context length of pretrained LLMs without the massive compute costs usually associated with …
From Twitter