Listened to this plus Gavin's AI thoughts post. He seems very confident in pre-training scaling laws holding, and I'm just… not so sure? The argument is very focused on advances in compute pushing pre-training, but, definitionally, there need to be commensurate increases in data to scale, right? Since we all know the famous Ilya line about pre-training data, my question is of course: where is this data coming from? People seem to be pointing to the idea of synthetic data being fed back into pre-training, but that idea has never really sat right with me. I've held this intuitive sense that a model creating its own data to pre-train on should lead to a messy ouroboros of a system unable to progress. It's learning in isolation, unexposed to novel data from different creators. BUT, I haven't actually read any papers on the benefits or limitations of pre-training models on self-generated synthetic data. Anyone else have this thought and/or research to point to? And I'll note this is specifically about pre-training, not SFT, post-training, etc.
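(For what it's worth, the failure mode described above is usually discussed under the name "model collapse." Here's a toy sketch of the intuition, with a Gaussian standing in for the model and made-up sample sizes: each generation fits itself only to samples drawn from the previous generation, and the fitted distribution's spread tends to shrink toward nothing. This is just an illustrative caricature, not a claim about how real pre-training pipelines behave.)

```python
import numpy as np

# Toy "ouroboros": generation 0 is fit to real data, every later
# generation is fit only to samples from the previous generation.
rng = np.random.default_rng(0)
n = 100                           # samples per generation (arbitrary)
data = rng.normal(0.0, 1.0, n)    # generation 0 sees "real" data

stds = []
for gen in range(1000):
    mu, sigma = data.mean(), data.std()  # fit the "model" (Gaussian MLE)
    data = rng.normal(mu, sigma, n)      # next gen trains on its own output
    stds.append(sigma)

# Finite-sample fitting noise compounds: the estimated spread drifts
# toward zero, i.e. the tails of the original distribution are lost.
print(f"fitted std, gen 1: {stds[0]:.3f}  vs  gen 1000: {stds[-1]:.3e}")
```

The mechanism is that each refit slightly under-represents the tails of the data it saw, and with no fresh outside data those small losses compound across generations instead of averaging out.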

Patrick OShaughnessy
@patrick_oshag
12-09
This is my fifth conversation with @GavinSBaker.
Gavin understands semiconductors and AI as well as anyone I know and has a gift for making sense of the industry's complexity and nuance.
We discuss:
- Nvidia vs Google (GPUs + TPUs)
- Scaling laws and reasoning models
- The
From Twitter