Karpathy's "crazy work": $100 and 4 hours to train your own "small GPT"


AI legend and former Tesla AI Director Karpathy has launched a new open-source project, "nanochat," which reproduces the entire ChatGPT pipeline in roughly 8,000 lines of code, requiring only a single GPU node, about four hours, and roughly $100. The project passed 4.2k stars on GitHub in under 12 hours!

AI legend and former Tesla AI Director Karpathy has announced the release of a new project: nanochat!

A minimalist but complete "ChatGPT from scratch" training framework.

Karpathy said it was one of the craziest projects he had ever written!

In effect, it means everyone can have their own, exclusive ChatGPT.

Less than 12 hours after release, the project's GitHub star count had already passed 4.2k (and it's still climbing!)

GitHub project: https://github.com/karpathy/nanochat

All of that traffic is organic, word-of-mouth attention from the community; such is Karpathy's pull in the AI field!

Unlike the earlier nanoGPT, nanochat does not stop at pre-training: it covers the entire pipeline, from data preparation and pre-training through mid-training (dialogue, multiple-choice questions, tool use) and SFT to RL fine-tuning and inference deployment.

The whole system is only about 8,000 lines of clean code. Boot a GPU node, run one script, and roughly four hours later you can talk to the "little ChatGPT" you just trained through a web interface.

Karpathy calls it the capstone of LLM101n, and it may also become a research baseline and experimental platform for the open-source community.

Let's take a closer look at how ChatGPT gets "cloned" in roughly 8,000 lines (minimal, illustrative code sketches for several of these steps follow the list):

Train a tokenizer using a new Rust implementation

Pre-train a Transformer LLM on FineWeb and evaluate the CORE score across a range of metrics

Mid-train on user-assistant dialogues, multiple-choice questions, and tool-use data from SmolTalk

Run SFT and evaluate the chat model on world-knowledge multiple choice (ARC-E/C, MMLU), math (GSM8K), and code (HumanEval)

RL fine-tune the model on GSM8K using GRPO

Run efficient inference in an engine with KV caching, simple prefill/decode, and tool use (a Python interpreter in a lightweight sandbox), and interact with it via a CLI or a ChatGPT-like web interface

Write a single Markdown report card that summarizes and gamifies the whole run.
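To make the tokenizer step concrete: nanochat's tokenizer is written in Rust, but the underlying byte-pair-encoding idea can be sketched in a few lines of Python. The snippet below is purely illustrative (the function name and the tiny corpus are made up, and none of it comes from the repo); it simply merges the most frequent adjacent pair of tokens over and over:

```python
from collections import Counter

def train_bpe(text: str, num_merges: int = 10):
    """Toy byte-pair encoding: repeatedly merge the most frequent adjacent pair.
    Illustrative only; nanochat's real tokenizer is a separate Rust implementation."""
    ids = list(text.encode("utf-8"))           # start from raw bytes
    merges = {}                                 # (a, b) -> new token id
    next_id = 256
    for _ in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))      # count adjacent pairs
        if not pairs:
            break
        pair = max(pairs, key=pairs.get)        # most frequent pair wins
        merges[pair] = next_id
        # replace every occurrence of the pair with the new token id
        out, i = [], 0
        while i < len(ids):
            if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
                out.append(next_id)
                i += 2
            else:
                out.append(ids[i])
                i += 1
        ids = out
        next_id += 1
    return merges

print(train_bpe("hello hello hello world", num_merges=5))
```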
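The pre-training and SFT stages ultimately optimize the same next-token cross-entropy; the practical difference is which positions contribute to the loss, since SFT typically masks out prompt/user tokens so the model is trained only on assistant replies. A minimal PyTorch sketch of that shared objective follows, with shapes and the ignore_index convention chosen here for illustration rather than taken from nanochat:

```python
import torch
import torch.nn.functional as F

def next_token_loss(logits, tokens, loss_mask):
    """logits: (B, T, V) model outputs; tokens: (B, T) input ids;
    loss_mask: (B, T) with 1 where the token should be predicted (e.g. assistant turns)."""
    logits = logits[:, :-1, :]                  # predict token t+1 from position t
    targets = tokens[:, 1:].clone()
    mask = loss_mask[:, 1:]
    targets[mask == 0] = -100                   # positions ignored by cross_entropy
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        ignore_index=-100,
    )

# pretraining: every position counts; SFT: mask out user/prompt tokens
B, T, V = 2, 8, 100
logits = torch.randn(B, T, V)
tokens = torch.randint(0, V, (B, T))
pretrain_loss = next_token_loss(logits, tokens, torch.ones(B, T))
sft_mask = torch.tensor([[0, 0, 0, 1, 1, 1, 1, 1]] * B)  # learn only the assistant span
sft_loss = next_token_loss(logits, tokens, sft_mask)
```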
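The RL step uses GRPO, whose key idea is to score each sampled answer against the other answers drawn for the same prompt rather than against a learned value function. Below is a hedged sketch of that group-relative advantage computation; the sampling, the GSM8K correctness reward, and the policy update are omitted, and all names are illustrative:

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_prompts, group_size) scalar rewards, e.g. 1.0 if the sampled
    GSM8K answer is correct, else 0.0. Advantage = reward normalized within its group."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# 2 prompts, 4 sampled completions each; a correctness reward per completion
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 1.0, 0.0]])
adv = grpo_advantages(rewards)
# each advantage then weights the log-probability of its completion in the policy loss
```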
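The inference engine's KV cache and prefill/decode split can also be shown in a self-contained way. Nothing below is nanochat's actual engine code; it is a toy single-head attention decoder that caches keys and values so each new token only attends over what is already cached:

```python
import torch

torch.manual_seed(0)
D = 16                                               # toy model width
Wq, Wk, Wv = (torch.randn(D, D) * 0.02 for _ in range(3))

def attend(x_t: torch.Tensor, cache: dict) -> torch.Tensor:
    """One decode step: project the new token, append its K/V to the cache,
    and attend over everything cached so far."""
    q, k, v = x_t @ Wq, x_t @ Wk, x_t @ Wv
    cache["k"] = torch.cat([cache["k"], k[None]], dim=0)   # (T, D)
    cache["v"] = torch.cat([cache["v"], v[None]], dim=0)
    scores = cache["k"] @ q / D ** 0.5                     # (T,)
    weights = torch.softmax(scores, dim=0)
    return weights @ cache["v"]                            # (D,)

cache = {"k": torch.empty(0, D), "v": torch.empty(0, D)}
prompt = torch.randn(5, D)                   # "prefill": run the prompt tokens once
for x_t in prompt:
    out = attend(x_t, cache)
for _ in range(3):                           # "decode": one token at a time
    out = attend(out, cache)                 # toy loop: feed the last output back in
```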

The whole run costs as little as about $100 (roughly 4 hours of training on an 8xH100 node).

For that you can train your own small conversational ChatGPT clone that can write stories and poems and answer simple questions.

About 12 hours of training is enough to surpass GPT-2 on the CORE metric.

Scaling further to around $1,000 (~41.6 hours of training), the model quickly becomes much more coherent and can solve simple math and coding problems and take multiple-choice tests.

A model trained for about 24 hours (roughly the FLOPs of GPT-3 Small 125M, i.e. about 1/1000 of GPT-3) gets into the 40s on MMLU, the 70s on ARC-Easy, and the 20s on GSM8K.
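That FLOPs comparison can be sanity-checked with the common back-of-the-envelope rule that training costs roughly 6 x parameters x tokens FLOPs. The parameter and token counts below are illustrative assumptions, not figures from the nanochat report card:

```python
def train_flops(params: float, tokens: float) -> float:
    """Common approximation: ~6 FLOPs per parameter per training token."""
    return 6 * params * tokens

# Hypothetical figures for illustration only (not from the nanochat report card):
nano_24h = train_flops(params=560e6, tokens=30e9)     # assumed ~0.56B-param model, ~30B tokens
gpt3_small = train_flops(params=125e6, tokens=300e9)  # GPT-3 Small: 125M params, ~300B tokens
print(f"{nano_24h:.2e} vs {gpt3_small:.2e}")          # compare orders of magnitude
```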

To sum up:

$100 → a "mini ChatGPT" of your own that can write poetry and answer basic questions;

$1,000 → performance close to, or better than, GPT-2, with basic reasoning and code generation.

This project embodies his core philosophy:

"Lowering the barrier to entry for LLM research and reproduction, allowing everyone to train their own models."

This push for democratization is exactly the same spirit as the "implement a Transformer from scratch" approach he championed in the nanoGPT days.

nanoGPT project address: https://github.com/karpathy/nanoGPT

Karpathy said his goal is to bring the entire "strong baseline" stack together into a coherent, minimalist, readable, modifiable, and maximally forkable repository.

nanochat will be the capstone project of LLM101n (which is still in development).

Karpathy thinks nanochat could also potentially develop into a research tool or benchmark, much like nanoGPT before it.

nanoGPT teaches you how to build a brain, and nanochat teaches you how to build ChatGPT.

If nanoGPT was a "Transformer source code" teaching project, then nanochat is a miniature LLM ecosystem like OpenAI's, and an AI that is exclusively yours.

The relationship between the two can be understood as a two-step closed loop from neural network foundations to product-level dialogue systems.

From vibe coding to nanoGPT, and now to nanochat, Karpathy lives up to his reputation as the best spokesperson for AI education.

This "crazy work" is not a fantasy, but another practice of Karpathy's ideal of open, learnable and reproducible AI.

What the small ChatGPT can do

Karpathy deployed the nanochat model behind a web UI.

He also shared an example conversation with a nanochat model trained for about 4 hours at a cost of roughly $100.

Very...interesting!

Below is part of the "report card" generated in Karpathy's nanochat "$100 speedrun" experiment (that is, a small ChatGPT model trained in about 4 hours on a single GPU node), which records the model size, training time, and performance on standard evaluations; the excerpt here covers the size of the codebase.

Characters: 333,989 (total number of characters of code)

Lines: 8,304 (about 8,300 lines of clean, well-commented code)

Files: 44 (number of files in the project)

Tokens: ~83,497 (number of code tokens, roughly 80,000 "words")

Dependencies: a 2,004-line uv.lock dependency list (very few dependencies; a light project structure)

These numbers capture nanochat's minimalist spirit: it implements ChatGPT-style training, fine-tuning, and inference end to end while staying at roughly 8,000 lines of code.
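For a sense of where tallies like these come from, a few lines of Python can reproduce the file, line, and character counts over a local checkout; the token figure would really need the project's own tokenizer, so the chars/4 value below is only a crude stand-in:

```python
from pathlib import Path

def repo_stats(root: str, exts=(".py", ".rs", ".md", ".html", ".toml", ".sh")):
    """Count files, lines, and characters for source files under `root`."""
    files = [p for p in Path(root).rglob("*") if p.is_file() and p.suffix in exts]
    texts = [p.read_text(encoding="utf-8", errors="ignore") for p in files]
    chars = sum(len(t) for t in texts)
    lines = sum(t.count("\n") for t in texts)
    return {"files": len(files), "lines": lines, "chars": chars,
            "approx_tokens": chars // 4}       # crude stand-in for a real tokenizer count

print(repo_stats("nanochat"))                  # assumes a local clone at ./nanochat
```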

References:

https://x.com/karpathy/status/1977755427569111362

https://github.com/karpathy/nanochat

This article comes from the WeChat official account "Xinzhiyuan", author: Xinzhiyuan, editor: Dinghui, and is published by 36Kr with authorization.
