The father of GPT trained an AI on nothing but last-century data, and it can actually write Python?

36kr

04-30

Here is something you don't see every day!

An AI that lives before 1931 and never saw a computer in its training data has reached across nearly a century...

and actually written Python code?!

Folks, this really is not science fiction...

The model is named talkie-1930-13b.

The people behind it are AI researcher Nick Levine, University of Toronto associate professor David Duvenaud, and the well-known Alec Radford, the original father of the GPT series.

The model's training data follows one ironclad rule: absolutely no text from January 1, 1931 onward is allowed!

It knew nothing of television or the internet; its world remained frozen at midnight on December 31, 1930.

However, the most surreal thing happened next: the team members discovered that...

This AI, which should know nothing about Roosevelt's New Deal, discusses New Deal legislation coherently and even cites the years?

Even more outrageous, when the team gave it a Python programming problem, this spirit from nearly a hundred years ago actually wrote its first line of Python?!

An AI that has never heard of computers is writing code from a century away, and it has netizens in an uproar.

Struck by sudden inspiration, one commenter has already drawn up a "time travel question checklist" and is eager to try it out.

Am I even awake yet? Can AI really transcend time and space?

A vintage model living before 1931

A model that lives before 1931, knows everything from astronomy to geography, and can even program deserves a close look.

In fact, Talkie is a 13-billion-parameter model trained on 260 billion tokens of English text from before 1931.

The training data includes books, newspapers, journals, scientific magazines, and more.

From Dickens to Mark Twain, from physics papers of Einstein's era to century-old cookbooks and etiquette manuals, everything was bundled up and fed in!

The choice of 1930 as the knowledge cutoff is deliberate: it marks the boundary for works entering the public domain under US copyright law.

So the question is, why did Alec Radford want to do this project?

In fact, Radford and his team wanted to know—

If you were to let a model read all English texts from before 1931, how would it think, how would it converse, and how would it predict the future?

And sure enough, the team turned up some striking findings.

So shocked by the march of time that it got dizzy and collapsed

The first finding was a chart showing how "shocked" the model was by the course of history.

The team pulled nearly 5,000 historical events from The New York Times' "On This Day" column, fed them all to Talkie, and measured how "surprised" it was by each one.

The result was a rather dramatic curve:

Before 1930: Talkie reads along fluently, and its level of surprise stays consistently low. (Talkie: Yeah, yeah, I know all about these things.)

Just after 1930: Talkie's level of surprise begins to creep up. (Talkie: Huh? How could this happen?)

1950s–60s: The era of transistors and widespread television. Talkie's astonishment skyrockets. (Talkie: Wait a minute, humans went to space? And they built a box that plays moving pictures?)

After that: the curve settles into a Zen-like calm. (Talkie: Dizzy, shocked, slumped over, completely dazed, do whatever you want...)

This is a classic case of "Grandma Liu visiting the Grand View Garden"—questioning, understanding, and accepting.
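
The article does not say exactly how "surprise" was measured. A minimal sketch, assuming it is the mean negative log-probability the model assigns to each token of an event description, averaged per decade. The function names and the toy numbers are placeholders for illustration, not the team's code or data.

```python
import math
from collections import defaultdict

def mean_surprisal_bits(token_logprobs):
    """Average surprisal per token, in bits, from natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return -sum(token_logprobs) / (len(token_logprobs) * math.log(2))

def surprisal_by_decade(events):
    """events: iterable of (year, token_logprobs). Returns {decade: mean surprisal}."""
    buckets = defaultdict(list)
    for year, logprobs in events:
        buckets[year // 10 * 10].append(mean_surprisal_bits(logprobs))
    return {decade: sum(vals) / len(vals) for decade, vals in sorted(buckets.items())}

# Placeholder values for illustration only (not measured results).
toy_events = [
    (1925, [math.log(0.5), math.log(0.5)]),    # 1 bit per token
    (1928, [math.log(0.25), math.log(0.25)]),  # 2 bits per token
    (1957, [math.log(0.125)]),                 # 3 bits per token
]
curve = surprisal_by_decade(toy_events)
```

Under this assumption, a rising curve after the cutoff simply means the model assigns lower probability to descriptions of later events.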

This model has also learned Python.

Of course, the dizziness, shock, and paralysis curve wasn't the most groundbreaking finding in this study, because the team members' second discovery was—

An AI that had never seen a computer before actually learned to write Python?!

During the research, the team ran Talkie on OpenAI's HumanEval programming benchmark.

The prompt included a few Python functions as examples, and Talkie was then asked to solve new problems right after reading them. In other words, the model had to learn from the context and apply what it saw on the spot.
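
The prompt layout described above might look roughly like the following. This is a sketch under assumptions: the helper `build_few_shot_prompt` and the example functions are hypothetical and are not taken from HumanEval or from the team's evaluation harness.

```python
def build_few_shot_prompt(solved_examples, new_problem):
    """Join solved functions, then append the unsolved signature and docstring.

    The model is expected to continue the text by writing the missing body.
    """
    parts = [text.strip("\n") for text in solved_examples]
    parts.append(new_problem.strip("\n"))
    return "\n\n".join(parts) + "\n"

# Hypothetical in-context examples (complete functions).
EXAMPLES = [
    'def add(a, b):\n    """Return the sum of a and b."""\n    return a + b',
    'def is_even(n):\n    """Return True if n is even."""\n    return n % 2 == 0',
]

# Hypothetical new problem: signature and docstring only, no body.
NEW_PROBLEM = 'def double(x):\n    """Return x multiplied by two."""'

prompt = build_few_shot_prompt(EXAMPLES, NEW_PROBLEM)
```

The point of the layout is that nothing in the prompt explains what Python is; the model only sees the pattern and has to continue it.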

The team ran the same test on talkie-web, a counterpart trained on modern internet data, and plotted the two as a line chart.

(Black line: Vintage LM, Gray line: Modern LM)

The result was a stunner: Talkie actually cracked one. It simply took the encryption function, changed +5 to -5, and submitted the result.

Yes, only one character was changed, but the answer is completely correct...
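
To see why a one-character change can be a complete answer, here is a minimal sketch, assuming the task was a Caesar-style shift of 5 over lowercase letters. That assumption matches the "+5 to -5" description, but the article does not show the actual problem, and the function names here are illustrative.

```python
def encode_shift(s: str) -> str:
    """Shift every lowercase letter forward by 5, wrapping around the alphabet."""
    return "".join(chr((ord(ch) + 5 - ord("a")) % 26 + ord("a")) for ch in s)

def decode_shift(s: str) -> str:
    """Undo encode_shift: identical code except that +5 becomes -5."""
    return "".join(chr((ord(ch) - 5 - ord("a")) % 26 + ord("a")) for ch in s)

# Round trip on a lowercase word.
message = "talkie"
round_trip = decode_shift(encode_shift(message))
```

Because Python's `%` always returns a non-negative result for a positive modulus, the subtraction wraps around correctly without any extra handling, so the sign flip really is the whole solution.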

Moreover, the team discovered a clear trend: the larger the model size, the more programming problems it can solve.

In other words, although vintage models still trail far behind modern ones, their ability to "learn to code out of thin air" improves steadily with scale.

The team also said they hope the vintage model can help the whole AI community answer a fundamental question: how far can LLMs generalize beyond their training data?

1930 model vs. 2026 model

As the old saying goes, comparison is the key to new discoveries.

To pin down what Talkie is really capable of, the team trained a twin, talkie-web-13b, with exactly the same architecture and compute but fed on modern internet data.

The two models were then compared on a range of standard LLM benchmarks, and the results were nuanced:

Unsurprisingly, talkie-1930 did lag behind its modern twin overall.

However, once the researchers removed questions on topics outside its knowledge (such as the internet or DNA), the gap between the two shrank by half.
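
The comparison above amounts to recomputing the accuracy gap on a filtered question set. A minimal sketch with made-up per-question results: the numbers are constructed purely to illustrate a gap that halves, and are not the team's benchmark data. The names `accuracy_gap` and `in_scope` are hypothetical.

```python
def accuracy(results):
    """Fraction of correct answers; results is a list of 0/1 flags."""
    if not results:
        raise ValueError("no questions to score")
    return sum(results) / len(results)

def accuracy_gap(modern, vintage, keep=None):
    """Modern-minus-vintage accuracy over the questions where keep[i] is true."""
    if len(modern) != len(vintage):
        raise ValueError("both models must answer the same questions")
    idx = [i for i in range(len(modern)) if keep is None or keep[i]]
    return accuracy([modern[i] for i in idx]) - accuracy([vintage[i] for i in idx])

# Toy data: 8 questions answerable from pre-1931 knowledge, 4 anachronistic ones.
modern_correct = [1] * 12
vintage_correct = [1, 1, 1, 0, 1, 1, 0, 1] + [0, 0, 0, 0]
in_scope = [True] * 8 + [False] * 4

full_gap = accuracy_gap(modern_correct, vintage_correct)
filtered_gap = accuracy_gap(modern_correct, vintage_correct, in_scope)
```

In this toy setup the vintage model misses every anachronistic question, so dropping them removes the part of the gap that reflects missing knowledge rather than missing ability.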

Even more impressively, the new and old models performed almost equally well in core language understanding and mathematical computation tasks.

This conclusion, to some extent, also suggests that the abilities of "understanding language" and "arithmetic" do not seem to depend on how much modern internet content you have read.

The team attributes the remaining gap to two main causes. First, the OCR transcription quality is poor, since the newspapers of the era had to be extracted from scanned documents.

Second, the subject mix of the corpus is different. Old newspapers, for example, carry little technical content but plenty on cooking and etiquette.

Hmm... so the most valuable part of a large model's intelligence might not have much to do with whether it has read the modern internet?

(talkie: If I were born in 2026, I could memorize GitHub too!)

Training the AI into a chat assistant with 1930 etiquette manuals

As everyone knows, the conventional way to turn a base model like Talkie into a conversational assistant is to fine-tune it on modern instruction data, as was done for ChatGPT.

The problem is that doing so would inject 21st-century dialogue styles, values, and other marks of the present day back into the 1930s model.

(Talkie: I had finally become a proper gentleman of the 1930s, and one round of your instruction tuning has me saying "babies"...)

The team's solution can be described as a stroke of genius.

They went straight to the pre-1931 archives and dug out a set of training data:

This includes etiquette manuals that teach people how to respond appropriately, letter-writing guides that teach people how to reply to correspondence, and so on. Claude Sonnet 4.6 then served as the teacher during reinforcement learning, which yielded the final training data.

Using these natural question-and-answer corpora from a century ago, the team managed to train Talkie into a chat-ready AI assistant.

However, reality quickly delivered a slap in the face:

The team found that after reinforcement learning, an early 7B version of Talkie had picked up the numbered-list format of the modern internet, answering in "1. 2. 3." points.

Bear in mind that the 1930 corpus contains nothing like this modern-looking list format...

The culprit is Sonnet 4.6.

Because Claude is a modern AI that likes to use lists, Talkie learned to speak in lists in order to earn a high score...

(Talk about playing to the judge's tastes...)

This exposes a major problem in model training: methods based on AI feedback inevitably impart a modern style to the model.
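
One way to notice this kind of style drift is to count how often responses use numbered lists before and after training. The following is a hypothetical sketch, not the team's method: the regular expression, the threshold of two items, and the sample responses are all assumptions made for illustration.

```python
import re

# A line that starts with a number followed by "." or ")" and then some text.
NUMBERED_ITEM = re.compile(r"^[ \t]*\d+[.)][ \t]+\S", re.MULTILINE)

def looks_like_numbered_list(text, min_items=2):
    """True if the text contains at least min_items numbered-list lines."""
    return len(NUMBERED_ITEM.findall(text)) >= min_items

def list_rate(responses):
    """Fraction of responses formatted as numbered lists."""
    if not responses:
        return 0.0
    return sum(looks_like_numbered_list(r) for r in responses) / len(responses)

# Made-up responses for illustration.
modern_style = "1. Rise early.\n2. Dress neatly.\n3. Be punctual."
period_style = "One ought to rise early, dress neatly, and above all be punctual."
rate = list_rate([modern_style, period_style])
```

Comparing this rate between checkpoints would give a rough signal of how much of the judge's formatting preference the model has absorbed.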

To fix this bug, the team's next goal is to one day let Talkie serve as its own teacher. (doge)

Who is Alec Radford?

Alec Radford, one of the team members behind Talkie, also deserves a closer look.

It is fair to say that much of the "infrastructure" of today's AI industry traces back to him.

During nearly a decade at OpenAI, he was a technical heavyweight on par with Ilya Sutskever and the original creator of the GPT series.

He was the first author of both GPT-1 and GPT-2 papers, and a core contributor to GPT-3 and GPT-4. In addition, he was one of the leaders of the multimodal model CLIP, and he was also deeply involved in Whisper and DALL·E.

His groundbreaking 2018 paper, which first proposed Transformer-based generative pre-training, laid the foundation for ChatGPT and the large models that followed.

At the end of 2024, Alec left his former employer, OpenAI, to pursue independent research. In March 2025, he joined Thinking Machines Lab, founded by former OpenAI CTO Mira Murati, as an advisor.

Looking back at Talkie itself, the whole thing seems quite intriguing.

While the world was fixated on AGI and reasoning models, the creator of the GPT series went off with his collaborators to build an AI that lives only in 1930.

According to the team's roadmap, a retro model at the GPT-3 level will be released this summer. After that, they also want to expand the corpus to one trillion tokens and extend it to the non-English speaking world.

One can only wonder what it will be like when it wakes up again and sees robots running marathons, everyone carrying a smartphone, and agents running everywhere.

Will it get dizzy and collapse on the spot again?

(The entry point for trying the model is listed below. Anyone interested can chat with an AI from a hundred years ago~)

Reference links:

[1] Report link: https://talkie-lm.com/introducing-talkie

[2] Hugging Face link: https://huggingface.co/talkie-lm

[3] Model dialogue entry: https://talkie-lm.com/chat

This article is from the WeChat public account "Quantum Bit" (author: Meng Yao) and is published with authorization from 36Kr.
