“This is some of the craziest code I’ve ever written.”
This Monday, renowned AI researcher Andrej Karpathy released his latest open-source project, and it instantly drew the attention of the entire community.
The project, called nanochat, promises to teach you how to build a ChatGPT-style model from scratch for about $100. It covers both LLM training and inference, so by following along you can see every step involved in building a large model.
It has 8,000 lines of code in total, and within 12 hours of being released on GitHub, it already had over 4,500 stars:
GitHub link: https://github.com/karpathy/nanochat
Unlike Karpathy's earlier nanoGPT repository, which covered only the pre-training phase, nanochat is a minimalist but complete training and inference pipeline for a ChatGPT-style clone, implemented from scratch and concentrated in a single clean codebase with minimal dependencies.
You just need to spin up a cloud GPU machine, run a script, and about 4 hours later you can be chatting with your own LLM in a ChatGPT-style web interface.
The repository has about 8,000 lines of code, but it already implements all of the following features:
Train a tokenizer using a brand new Rust implementation.
Pre-train a Transformer LLM on the FineWeb dataset and evaluate the CORE score across multiple metrics.
Mid-train on user-assistant conversations from SmolTalk, along with multiple-choice Q&A and tool-use data.
Run supervised fine-tuning (SFT) and evaluate the chat model on multiple-choice world knowledge (ARC-E/C, MMLU), math (GSM8K), and code (HumanEval).
Optional: reinforcement learning (RL) training on GSM8K with GRPO (a group-relative advantage sketch follows this list).
An efficient inference engine with a KV cache, separate prefill/decode phases, and tool calls (a Python interpreter in a lightweight sandbox), usable from the CLI or a ChatGPT-style WebUI (see the decoding sketch after this list).
Automatically generate a Markdown report card to summarize and gamify the entire training process.
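To give a rough sense of what the optional GRPO stage does, here is a minimal sketch (not nanochat's actual code; the reward values are stand-ins): sample a group of answers per prompt, score each one, and use the group-relative, normalized reward as that sample's advantage, with no separate learned value function.

```python
# Minimal GRPO-style advantage computation (illustrative sketch, not nanochat's code).
import numpy as np

def grpo_advantages(rewards, eps=1e-8):
    """rewards: scores for one group of sampled completions for a single prompt."""
    rewards = np.asarray(rewards, dtype=np.float64)
    # Group-relative normalization: reward minus group mean, divided by group std.
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: 6 sampled solutions to one GSM8K-style problem, 2 of them correct
# (reward 1.0 for a correct final answer, 0.0 otherwise).
rewards = [0.0, 1.0, 0.0, 0.0, 1.0, 0.0]
print(grpo_advantages(rewards).round(3))
# Correct samples get positive advantages, incorrect ones negative, so the
# policy gradient pushes probability mass toward correct solutions.
```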
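And to illustrate what the prefill/decode split with a KV cache buys you, here is a simplified single-head sketch with made-up names (not the engine in the repository): keys and values for past tokens are cached, so each newly generated token only computes attention for itself against the cache instead of reprocessing the whole sequence.

```python
# Toy single-head attention with a KV cache (illustrative sketch only).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class SingleHeadAttentionWithCache:
    def __init__(self, d_model, rng):
        # Random projections stand in for trained weights.
        self.Wq = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
        self.Wk = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
        self.Wv = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
        self.k_cache, self.v_cache = [], []   # one vector per past token

    def prefill(self, xs):
        # Prompt phase: compute and cache K/V for every prompt position.
        for x in xs:
            self.k_cache.append(x @ self.Wk)
            self.v_cache.append(x @ self.Wv)

    def decode_step(self, x):
        # Decode phase: append the new token's K/V, attend over the whole cache.
        self.k_cache.append(x @ self.Wk)
        self.v_cache.append(x @ self.Wv)
        q = x @ self.Wq
        K, V = np.stack(self.k_cache), np.stack(self.v_cache)
        att = softmax(K @ q / np.sqrt(len(q)))
        return att @ V                         # attention output for the new token

rng = np.random.default_rng(0)
attn = SingleHeadAttentionWithCache(d_model=16, rng=rng)
attn.prefill(rng.standard_normal((5, 16)))     # 5 "prompt token" embeddings
out = attn.decode_step(rng.standard_normal(16))
print(out.shape)                               # (16,)
```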
Karpathy stated that for only about $100 (4 hours of training on an 8×H100 node), you can train a "chatty" mini-ChatGPT that can write stories and poems and answer simple questions. With about 12 hours of training, it surpasses GPT-2 on the CORE metric.
Scaling up to a roughly $1,000 budget (about 41.6 hours of training) rapidly improves coherence, letting the model solve basic math and coding tasks and pass some multiple-choice tests. For example, a depth-30 model trained for about 24 hours (roughly matching GPT-3 Small 125M in FLOPs, or about 1/1000th of GPT-3) reaches 40+ on MMLU, 70+ on ARC-Easy, and 20+ on GSM8K.
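The quoted budgets hang together if you assume an 8×H100 node rents for roughly $24 per hour; the exact rate is our assumption for this back-of-the-envelope check, not something the project fixes.

```python
# Sanity check of the budgets quoted above, assuming ~$24/hour for one 8xH100 node.
rate = 24.0            # assumed $ per node-hour
print(4 * rate)        # 96.0  -> the ~$100, 4-hour speedrun
print(12 * rate)       # 288.0 -> the ~12-hour run that passes GPT-2 on CORE
print(41.6 * rate)     # 998.4 -> the ~$1,000, 41.6-hour tier
```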
Karpathy's goal is to package a complete set of "strong baseline" capabilities into a single, structured, readable, hackable, and forkable repository. nanochat will be the culmination of the LLM101n course (still under development).
Karpathy believes nanochat has the potential to evolve into a research platform or standard benchmark, similar to nanoGPT. It's still far from perfect, with no special tuning or performance optimizations (though he believes it's getting close). However, the overall framework is well-formed, making it suitable for hosting on GitHub, allowing the community to collaborate and iterate on individual modules.
Example conversation with a $100, 4-hour nanochat using the WebUI.
Below is a summary of some of the metrics produced in the report for Karpathy's $100 speedrun example.
Building a chat-capable large model now looks simple and cheap. With Karpathy's polished open-source code to build on, is it feasible to train a personalized model of our own to help with everyday work?
Some users raised exactly the question many people care about:
But Karpathy poured cold water on that idea, saying the codebase is not suited to this kind of personalization.
Karpathy argues that such a tiny model should be thought of more like a very young child, without much raw intelligence. If you fine-tune or train it on your own data, you might get some amusing parrot-like output that superficially mimics your writing style, but the result will be a mess.
To get a genuinely personalized model, you generally need to go through the following steps:
Prepare your raw data.
Generate and rewrite a large amount of synthetic data on top of it (a complex, non-obvious step that still needs research).
Use this data to fine-tune a strong current open-source model (for example, via Tinker).
When fine-tuning, you likely also need to mix in plenty of pre-training data so the model does not lose too much of its general capability (a data-mixing sketch follows this list).
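As a purely hypothetical sketch of that last mixing step (the file names and the 3:1 ratio are invented for illustration), the idea is simply to interleave general pre-training-style text with the synthetic personal data when assembling the fine-tuning set:

```python
# Hypothetical data-mixing sketch for personalization fine-tuning.
import json
import random

def build_mixture(personal_path, general_path, general_per_personal=3, seed=0):
    # Load synthetic personal examples and generic pre-training-style examples
    # from JSONL files (one JSON object per line; format is assumed).
    with open(personal_path) as f:
        personal = [json.loads(line) for line in f]
    with open(general_path) as f:
        general = [json.loads(line) for line in f]

    rng = random.Random(seed)
    mixture = []
    for example in personal:
        mixture.append(example)
        # Mix several general examples in per personal example to preserve
        # general capability.
        mixture.extend(rng.choice(general) for _ in range(general_per_personal))
    rng.shuffle(mixture)
    return mixture

# Example usage (paths are placeholders):
# mixture = build_mixture("personal_synth.jsonl", "general_pretrain.jsonl")
```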
In short, making this approach truly work well is still an open research problem.
For a more detailed technical introduction and a step-by-step walkthrough, see the following link:
https://github.com/karpathy/nanochat/discussions/1
This article comes from the WeChat public account "Machine Heart", an outlet focused on AI, and is published by 36Kr with authorization.