Overnight, OpenAI dropped a new feature:
fine-tuning is now officially available for GPT-4o.
OpenAI is also handing out a freebie: every organization gets 1 million training tokens free per day, available through September 23.
In other words, developers can now fine-tune GPT-4o on their own datasets and build custom applications at low cost.
For reference, OpenAI's announcement puts the price at:
GPT-4o fine-tuning costs $25 per 1 million training tokens (so the free daily quota is worth roughly $25).
Developers who received the email are excitedly spreading the word: a bargain this big has to be grabbed as soon as possible.
Getting started is easy: open the fine-tuning dashboard, click "create", and select gpt-4o-2024-08-06 from the base-model drop-down.
OpenAI also notes that a training dataset of just a few dozen examples can already produce good results.
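For the API route, the flow is the usual upload-then-create-job sequence. Here is a minimal sketch, assuming the OpenAI Python SDK (v1.x) and an OPENAI_API_KEY in the environment; the toy support-bot examples are just placeholders, not from the announcement:

```python
# Minimal sketch of fine-tuning GPT-4o via the API (assumes the openai Python SDK v1.x).
import json
from openai import OpenAI

client = OpenAI()

# Chat-format training data: each JSONL line is one example with the desired assistant reply.
examples = [
    {"messages": [
        {"role": "system", "content": "You are a terse support bot."},
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Settings > Security > Reset password."},
    ]},
    # ...a few dozen examples like this, per OpenAI's guidance
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Upload the file, then start a fine-tuning job on the new GPT-4o snapshot.
train_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=train_file.id,
    model="gpt-4o-2024-08-06",
)
print(job.id, job.status)
```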
Success stories attached, too
Once the news broke, plenty of netizens were eager to try it out and curious how well a fine-tuned model actually performs.
OpenAI came prepared, publishing real examples of partners fine-tuning GPT-4o alongside the announcement.
The first is Genie, a coding assistant from AI startup Cosine built to assist software developers.
According to Cosine, Genie was developed with a proprietary process that trained and fine-tuned a non-public GPT-4o variant on billions of pieces of high-quality data.
That data breaks down into 21% JavaScript and Python, 14% TypeScript and TSX, and 3% other languages (including Java, C++, and Ruby).
After fine-tuning, Genie achieved a SOTA score of 43.8% on SWE-Bench Verified, the new code benchmark OpenAI released last Tuesday.
Genie also set a SOTA of 30.08% on SWE-Bench Full, breaking the previous record of 19.27%.
For comparison, Cognition's Devin scored 13.8% on a subset of SWE-Bench.
The second example comes from Distyl, a company that provides AI solutions to Fortune 500 companies and recently took first place on BIRD-SQL, a leading text-to-SQL benchmark.
After fine-tuning, its model reached 71.83% execution accuracy on the leaderboard and performed well on tasks such as query reformulation, intent classification, chain-of-thought reasoning, and self-correction, with SQL generation standing out in particular.
Beyond the examples, OpenAI also made a point of addressing data privacy and security in the announcement, which boils down to:
Developers' business data (including inputs and outputs) will not be shared or used to train other models, and fine-tuned models get layered safety mitigations, such as continuously running automated safety evaluations and monitoring usage.
Netizens: fine-tuning still isn't as good as prompt caching
Amid the excitement, some netizens argue that fine-tuning still doesn't beat prompt caching.
Fine-tuning is cool, but it's still not as good as prompt caching...
As QuantumBit has covered before, prompt caching lets you send a large prompt to the model once, have it remember the content, and reuse it directly on subsequent requests instead of resending it.
Google's Gemini has supported prompt caching since May this year, and Claude launched the feature just last week.
Because the repeated boilerplate no longer has to be sent over and over, prompt caching is both faster and cheaper.
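For a sense of what this looks like in practice, here is a rough sketch of Claude's prompt-caching beta as it looked at launch, assuming the anthropic Python SDK; the beta header and exact field names are from memory and may have changed since:

```python
# Rough sketch of Claude's prompt-caching beta (assumes the `anthropic` Python SDK;
# the beta header and field names reflect the launch-time API and may have changed).
import anthropic

client = anthropic.Anthropic()

LONG_REFERENCE_TEXT = "..."  # e.g. a large document or style guide reused on every request

def ask(question: str) -> str:
    resp = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=1024,
        # The big, unchanging block is marked cacheable so later requests can reuse it
        # instead of paying to process it again.
        system=[
            {
                "type": "text",
                "text": LONG_REFERENCE_TEXT,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        messages=[{"role": "user", "content": question}],
        extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
    )
    return resp.content[0].text
```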
Some netizens argue that prompt caching is more developer-friendly (no asynchronous fine-tuning jobs to wait on) and delivers almost the same benefits as fine-tuning.
Prompt caching gets you 99% of the benefit for 1% of the effort.
Others stood up for fine-tuning, arguing it is better at shaping responses, for example guaranteeing valid JSON, keeping replies concise, or using emoji.
With OpenAI's competitors rolling out prompt caching one after another, some people are wondering:
Will OpenAI stick with fine-tuning or move to prompt caching (or do both)?
On that question, other netizens have already sniffed out some clues.
OpenAI mentions caching techniques in its latency optimization guide.
We went straight to the original guide, which indeed mentions how to reduce input tokens:
Maximize shared prompt prefixes by placing dynamic parts (e.g. RAG results, history, etc.) later in the prompt. This makes your requests more KV-cache friendly, meaning fewer input tokens are processed on each request.
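Concretely, that advice just means ordering the prompt so the unchanging part always comes first. A minimal sketch, assuming the OpenAI Python SDK; the function name, placeholder prompt, and gpt-4o-mini model choice are illustrative, not from the guide:

```python
# Sketch: keep the static instructions identical at the start of every request so the
# shared prefix can be reused by a server-side KV cache; append per-request context after it.
from openai import OpenAI

client = OpenAI()

STATIC_SYSTEM_PROMPT = (
    "You are a support assistant for AcmeCo. Follow the policies below...\n"
    "(long, unchanging instructions go here)"
)

def answer(question: str, rag_chunks: list[str], history: list[dict]) -> str:
    messages = [
        # Static and identical across requests -> cache-friendly shared prefix.
        {"role": "system", "content": STATIC_SYSTEM_PROMPT},
        # Dynamic parts (history, retrieved context, the new question) go after the prefix.
        *history,
        {"role": "user", "content": "Context:\n" + "\n".join(rag_chunks) + "\n\nQuestion: " + question},
    ]
    resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return resp.choices[0].message.content
```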
Still, some netizens point out that this passage alone isn't enough to conclude that OpenAI is using prompt caching.
BTW, controversy aside, we should still grab OpenAI's freebies while they last~
Besides GPT-4o, GPT-4o mini can also be fine-tuned for free: through September 23, OpenAI is providing 2 million free training tokens per day for it.
Reference Links:
[1]https://openai.com/index/gpt-4o-fine-tuning/
[2]https://x.com/OpenAIDevs/status/1825938486568038569
[3]https://news.ycombinator.com/item?id=41301673
This article comes from the WeChat official account 量子位 (QuantumBit); the author is Yishui. It is published by 36氪 with authorization.