OpenAI GPT-5 "difficult birth": 6 months of training cost $500 million, half a year behind schedule

Key Points

OpenAI has conducted two large-scale training rounds for GPT-5, and the current development progress is half a year behind the original plan.

Each training round for GPT-5 requires weeks or even months, and the computing cost for six months may reach $500 million.

OpenAI is using the o1 model to generate synthetic data to further enrich the dataset for training GPT-5.

OpenAI researchers have found that giving large language models more time to "think" and reason through a problem makes them smarter.

On December 22, it was reported that, owing to high computing costs and a scarcity of high-quality training data, development of OpenAI's next-generation flagship model, GPT-5, is running behind its original schedule. To date, OpenAI has conducted at least two large-scale training rounds for GPT-5, aiming to improve the model's performance with massive data resources. However, the first training run proceeded more slowly than expected, suggesting that larger-scale attempts would be both time-consuming and extremely costly. Although GPT-5 performs better than its predecessor, the improvement is not yet sufficient to justify the enormous cost of keeping the model running.

In terms of data collection, OpenAI has adopted a diversified strategy: it relies not only on public data sources and licensing agreements but is also recruiting people to generate new data by writing code or solving mathematical problems. In addition, the company is using another of its models, called o1, to generate synthetic data and further enrich its dataset. Because the dramatic performance breakthroughs achieved by earlier models have proved difficult to replicate, OpenAI is actively exploring new strategic directions.

01 Development Plan Lags by Half a Year, Training for 6 Months Costs $500 Million

The official name of OpenAI's new artificial intelligence project is GPT-5, with the internal code name "Orion." The company has been developing it for 18 months, aiming to achieve a major breakthrough in ChatGPT technology. According to informed sources, OpenAI's partner and major investor, Microsoft, originally expected to see the new model emerge by mid-2024.

OpenAI has conducted at least two large-scale training sessions for Orion, each requiring months of processing massive amounts of data to improve Orion's intelligence level. However, according to project insiders, each training session has encountered new challenges, preventing the software's intelligence level from reaching the researchers' expectations.

Researchers say that even in the best-case scenario, Orion's performance is only slightly better than OpenAI's current models, and the degree of progress is not sufficient to fully justify its high operating costs. According to public and private estimates of the various aspects of training, the computing cost for a six-month training cycle alone could reach around $500 million.

Two years ago, OpenAI and its CEO Sam Altman launched ChatGPT, which caused a huge sensation in Silicon Valley and signaled that the field of artificial intelligence would continue to make astonishing progress and profoundly impact many aspects of our lives. Analysts predict that in the coming years, tech giants may invest up to $1 trillion in artificial intelligence projects.

Caption: OpenAI co-founder and CEO Altman predicts that GPT-5 will represent a "major breakthrough"

These high expectations are mainly focused on OpenAI, a startup at the forefront of the artificial intelligence wave. In October this year, investors valued OpenAI at $157 billion, a valuation largely based on Altman's prediction that GPT-5 will achieve "major breakthroughs" in various disciplines and tasks.

GPT-5 aims to drive new scientific discoveries and handle everyday human tasks, such as scheduling appointments or booking flights. Researchers hope it will make fewer mistakes than current AI systems, or at least be able to acknowledge the uncertainty of its answers - a major challenge for existing models, which sometimes produce so-called "hallucinations."

AI chatbots run on an underlying technology called large language models (LLMs). Consumers, businesses, and government agencies already rely on them for a wide range of tasks, from writing computer code to refining marketing copy to planning events. OpenAI's current flagship model is GPT-4, the fourth large language model the company has developed since its founding in 2015.

According to a former senior executive at OpenAI, while GPT-4's performance is equivalent to that of a smart high school student, the eventual GPT-5 could actually reach doctoral-level capabilities in certain tasks. Earlier this year, Altman told students at Stanford University that OpenAI can "scientifically determine" that GPT-5 will be much smarter than current models.

However, there is no fixed standard for determining when a model is intelligent enough to be named GPT-5. OpenAI can test large language models in areas like math and coding, but whether a model is smart enough to be called GPT-5 ultimately depends on the intuition of the company's executives, or as many tech experts say, a "feeling."

As of now, the situation is not optimistic. OpenAI and Microsoft have declined to comment on the matter. In November, Altman stated that the startup will not launch any product called GPT-5 in 2024.

02 Training New Models May Cost 10 Times More and Take Months

Since the release of GPT-4 in March 2023, OpenAI has been dedicated to the development of GPT-5. Experts who have long been involved in AI research point out that the development of large language models is both a scientific exploration and an artistic creation.

During the training phase, the model undergoes continuous testing. In this lengthy process, the model receives trillions of word fragments called "Tokens" as input. A large-scale training session, run in data centers, can take months and rely on thousands of expensive, scarce computing chips, typically sourced from Nvidia.
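
To make the idea of tokens concrete, here is a minimal sketch using OpenAI's open-source tiktoken library; the encoding name is an assumption chosen for illustration, since the tokenizer actually used for Orion has not been disclosed.

```python
# A minimal sketch of how raw text is split into the token fragments a model
# is trained on, using OpenAI's open-source tiktoken library. The encoding
# name "cl100k_base" is an assumption for illustration; the tokenizer actually
# used for Orion/GPT-5 has not been disclosed.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "Large language models are trained on trillions of tokens."
token_ids = enc.encode(text)        # text -> list of integer token IDs
print(token_ids)                    # the numeric fragments the model sees
print(len(token_ids), "tokens")     # gives a feel for the text-to-token ratio
print(enc.decode(token_ids))        # decoding recovers the original text
```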

In a single training session, researchers may work continuously for weeks or even months, trying to input most of the world's knowledge into an AI system that relies on some of the most expensive hardware located in remote data centers.

Caption: Changes in the parameters of OpenAI's GPT models: GPT-1 had 117 million parameters, GPT-2 had 1.5 billion, GPT-3 had 175 billion, and GPT-4 is estimated to have grown to about 1.76 trillion

Altman has publicly stated that the training cost for GPT-4 exceeded $100 million, and he expects future AI model training costs to rise to over $1 billion. A failed training session is like a space rocket exploding shortly after launch - a massive loss that is heartbreaking.

To reduce the risk of such failures, researchers have adopted a more cautious approach, conducting smaller-scale experiments or trial runs before attempting larger-scale experiments.

However, from the very beginning, GPT-5 has faced numerous challenges.

In mid-2023, OpenAI launched a training run that was also the first real-world test of the new Orion design. The training proceeded exceptionally slowly, suggesting that a larger-scale run would take an extremely long time and drive costs to staggering levels. This project, known as Arrakis, did not yield the desired results, indicating that the process of creating GPT-5 might not be as smooth as hoped.

Faced with this dilemma, researchers at OpenAI decided to make technical adjustments to Orion to enhance its performance. At the same time, they realized that improving the model's accuracy and generalization ability would require collecting more diverse, higher-quality data. In their view, relying solely on data from the public internet is far from enough.

Caption: NVIDIA CEO Jensen Huang, whose company produces most of the AI training chips

Generally speaking, the larger the amount of data processed by an AI model, the stronger its capabilities will be. For large language models, this data mainly comes from books, academic publications, and other open educational resources. These materials help the model express itself more accurately and perform various tasks.

In building previous models, OpenAI primarily used data scraped from the internet, including news articles, social media posts, and scientific papers. However, to make Orion substantially smarter, OpenAI needs to build it at a much larger scale, which requires far more data than is currently available.

Ari Morcos, CEO of DatologyAI, a startup developing data-selection and optimization tools, pointed out: "This process has become very expensive, and it's hard to find data of the same quality." Morcos is trying to build models with less but higher-quality data, an approach he believes will make AI systems more capable than the data-maximizing strategies adopted by top AI companies like OpenAI.

OpenAI's solution is to create data from scratch. It is hiring people to write new software code or solve math problems for Orion to learn from. These workers, including software engineers and mathematicians, also share their thought processes and problem-solving methods with Orion. Many researchers believe that code, as the language of software, can help large language models solve problems they have not encountered before.

Caption: At OpenAI's office, employees are often immersed in AI training work for weeks or months at a time

Encouraging people to articulate their thought processes significantly increases the value of the newly created data. Large language models learn from rich language material, and these worked explanations become a reference the models can draw on when solving similar problems in the future.
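
As a concrete illustration, the sketch below shows what one such human-written training record might look like when a worker supplies reasoning alongside a solution; the field names and structure are hypothetical, since OpenAI has not published its internal data format.

```python
# A hypothetical example of a single human-written training record in which a
# worker supplies not just an answer but the reasoning behind it. The field
# names and overall structure are assumptions for illustration only; OpenAI's
# internal data format has not been published.
record = {
    "task": "Write a function that returns the n-th Fibonacci number.",
    "reasoning": (
        "Start from the base cases F(0)=0 and F(1)=1, then build up "
        "iteratively so the cost stays linear instead of exponential."
    ),
    "solution": (
        "def fib(n):\n"
        "    a, b = 0, 1\n"
        "    for _ in range(n):\n"
        "        a, b = b, a + b\n"
        "    return a"
    ),
}

print(record["reasoning"])
```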

Turing, a company focused on AI infrastructure, maintains close cooperation with tech giants like OpenAI and Meta. The company's CEO and co-founder, Jonathan Siddharth, said: "We are working to migrate human intelligence from the brain to the machine brain."

According to Turing's executives, in the process of AI training, software engineers may be required to write a program to efficiently solve a complex logical problem, while mathematicians may need to calculate the maximum height of a pyramid made of one million basketballs. The answers to these problems - and, more importantly, the methods of obtaining these answers - will then be integrated into the AI training materials.
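
As a hedged illustration of the kind of problem Turing describes, the sketch below works through one plausible version of the basketball-pyramid question; the square-pyramid packing geometry and the 0.24 m ball diameter are assumptions made for this example, not figures from the article.

```python
# An illustrative take on the basketball-pyramid problem mentioned above: how
# tall a square-pyramid stack of one million basketballs could be. The packing
# geometry and the 0.24 m basketball diameter are assumptions for this sketch.
from math import sqrt

DIAMETER = 0.24  # metres, approximate basketball diameter (assumption)

def balls_in_pyramid(layers: int) -> int:
    """Square pyramid: layer k holds k*k balls, so total = n(n+1)(2n+1)/6."""
    return layers * (layers + 1) * (2 * layers + 1) // 6

# Largest number of complete layers that stays within one million balls.
layers = 0
while balls_in_pyramid(layers + 1) <= 1_000_000:
    layers += 1

# In square-pyramid packing, consecutive layer centres sit DIAMETER/sqrt(2)
# apart vertically; add one full diameter for the bottom and top half-balls.
height = DIAMETER + (layers - 1) * DIAMETER / sqrt(2)

print(layers, "layers,", balls_in_pyramid(layers), "balls")
print(f"height is about {height:.1f} m")   # roughly 24 m with these assumptions
```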

In addition, OpenAI is collaborating with experts in fields such as theoretical physics, seeking their advice on how to solve the most intractable problems in their domains. These collaborations also help make Orion smarter.

However, this process is exceptionally slow. GPT-4 was trained on approximately 13 trillion Tokens. If 1,000 people each write 5,000 words per day, it would take months to accumulate 1 billion Tokens.
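
A quick back-of-envelope check of that arithmetic follows; the tokens-per-word ratio is an assumed figure (English prose is commonly around 1.3 tokens per word).

```python
# Back-of-envelope check of the claim above. The tokens-per-word ratio is an
# assumption for illustration; roughly 1.3 tokens per English word is typical.
PEOPLE = 1_000
WORDS_PER_DAY = 5_000
TOKENS_PER_WORD = 1.3
TARGET = 1_000_000_000            # one billion tokens
GPT4_TOKENS = 13_000_000_000_000  # ~13 trillion tokens, per the article

tokens_per_day = PEOPLE * WORDS_PER_DAY * TOKENS_PER_WORD
days = TARGET / tokens_per_day
print(f"{days:.0f} days, about {days / 30:.1f} months for 1 billion tokens")
print(f"at that rate, GPT-4's corpus would take {GPT4_TOKENS / tokens_per_day / 365:.0f} years")
```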

To accelerate the training process, OpenAI has started to develop so-called "synthetic data," data generated by AI itself, to assist in Orion's training. However, research shows that this feedback loop, in which AI creates the data used to train AI, often leads to malfunctions or absurd answers.
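
The toy simulation below is one way to see why such a loop can go wrong; it illustrates the general failure mode of training on a model's own outputs and is not a description of OpenAI's pipeline.

```python
# A toy illustration (not OpenAI's method) of why naively looping AI-generated
# data back into training erodes diversity: each generation resamples from the
# previous generation's output, and "facts" that happen not to be sampled are
# lost for good.
import numpy as np

rng = np.random.default_rng(0)
vocab_size = 1_000                      # distinct "facts" in the real data
probs = np.full(vocab_size, 1.0 / vocab_size)

for generation in range(8):
    samples = rng.choice(vocab_size, size=vocab_size, p=probs)  # synthetic corpus
    counts = np.bincount(samples, minlength=vocab_size)
    probs = counts / counts.sum()       # the next "model" is fit to that corpus
    surviving = int((counts > 0).sum())
    print(f"generation {generation}: {surviving} distinct facts remain")

# The count can only shrink: once a fact is missed, no later generation can
# recover it, so the synthetic data grows steadily less diverse.
```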

According to insiders, OpenAI's scientists believe they can avoid these problems by using data generated by another of the company's AI models, called o1. However, OpenAI's already daunting task has become even more complex due to internal turmoil and the constant poaching of its top researchers by competitors, who sometimes offer salaries of up to millions of dollars per year.

Last year, Altman was briefly ousted as CEO by OpenAI's board, an event that left many researchers doubtful about the company's future. Altman quickly regained the CEO position and began reforming OpenAI's governance structure.

This year, OpenAI has lost more than 20 key executives, researchers, and long-term employees, including co-founder and Chief Scientist Ilya Sutskever and Chief Technology Officer Mira Murati. On Thursday, Alec Radford, a respected researcher and the lead author of several OpenAI scientific papers, also announced his departure after about 8 years at the company.

03 GPT-5 Faces Internal and External Competition, Second Large-Scale Training Hits Setback

By early 2024, OpenAI's executives began to feel unprecedented pressure. GPT-4 had been out for a year, and competitors were quickly catching up. Anthropic's new model was highly praised in the industry, with some even considering it to have surpassed GPT-4. A few months later, Google launched its much-anticipated new AI application, NotebookLM.

As the development of Orion hit a bottleneck, OpenAI had to divert its focus to other projects and applications, such as a slimmed-down version of GPT-4 and the AI video-generation tool Sora. According to insiders, this led the teams developing new products to compete with the Orion research team for limited computing resources.

Caption: Google is one of OpenAI's strong competitors in the race for dominance in the AI field

At the same time, competition between AI labs has become so fierce that large tech companies now publish far fewer papers on their latest discoveries and breakthroughs than is typical in science. Two years ago, as huge amounts of funding poured in, tech companies began to treat these research results as closely guarded trade secrets. Some researchers are so cautious that they will not work on airplanes, in cafes, or anywhere their screens might be seen by others.

This secretive attitude has disappointed many long-term AI researchers, including Meta's Chief AI Scientist Yann LeCun. LeCun believes that the work of OpenAI and Anthropic is no longer pure research, but "advanced product development." At a recent AI conference, he stated: "If you're doing this work under the pressure of commercialization, you can't call it research. And if it's done in secret, you can't call it research either."

In early 2024, OpenAI prepared to attempt training Orion again, this time with higher-quality data. Researchers conducted several small-scale training runs in the first few months of the year to build confidence. By May, OpenAI's researchers decided they were ready for another large-scale training of Orion, which was expected to last until November.

However, shortly after the training began, the researchers at OpenAI encountered a thorny problem: they found that the data was not as diverse as expected, which could greatly limit the learning ability of "Orion". In the small-scale training stage, this problem was not obvious, but as large-scale training progressed, it gradually surfaced. Since a lot of time and money had already been invested, OpenAI could not easily start over.

To address this challenge, the researchers urgently sought broader data sources during the training process in the hope of providing the model with richer information. However, it is currently unclear whether this strategy can achieve significant results. Within OpenAI, some believe that the problems encountered by Orion indicate that the "more-is-more" strategy that drove OpenAI's early success is gradually becoming ineffective.

In fact, OpenAI is not the only company concerned about the bottleneck in technological progress. Across the entire artificial intelligence industry, the debate over whether the development of artificial intelligence has begun to stabilize is intensifying.

Caption: Ilya Sutskever resigned from his position as Chief Scientist at OpenAI this year.

Sutskever recently co-founded a new artificial intelligence company called Safe Superintelligence (SSI). At a recent artificial intelligence conference, he announced that the era of data maximization has come to an end. "Data will not grow indefinitely, because we only have one Internet," he told the assembled researchers, policy experts, and scientists, "You could even say that data is the fossil fuel of artificial intelligence. And now, this fuel is beginning to run out."

04 Reasoning Models Bring New Hope, Apple Researchers Raise Questions

In the course of the Orion project, OpenAI researchers have explored a new way to make large language models smarter: reasoning. They found that by giving large language models more time to "think," these models can solve some problems they have not been specifically trained for.

Within OpenAI, the o1 model plays a key role. It generates multiple possible answers to each question and analyzes them in depth to find the best one. o1 can not only handle complex tasks such as writing business plans or designing crossword puzzles, it can also explain its reasoning, which helps the model learn from each answer.
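
The sketch below captures the general shape of this "generate several candidates, then pick the best" idea; the generate and score functions are placeholders, and nothing here should be read as o1's actual mechanism or OpenAI's API.

```python
# A minimal, hedged sketch of best-of-N answer selection: sample several
# candidate answers and keep the one a scoring function rates highest. The
# generate() and score() callables are placeholders, not OpenAI's API.
import random
from typing import Callable, List

def best_of_n(
    question: str,
    generate: Callable[[str], str],      # produces one candidate answer
    score: Callable[[str, str], float],  # rates a candidate (verifier/reward)
    n: int = 8,
) -> str:
    """Sample n candidate answers and return the highest-scoring one."""
    candidates: List[str] = [generate(question) for _ in range(n)]
    return max(candidates, key=lambda ans: score(question, ans))

# Toy usage: candidates are random guesses at 17 * 24; the scorer prefers
# answers closer to the correct value, 408.
if __name__ == "__main__":
    random.seed(0)
    gen = lambda q: str(random.randint(380, 420))
    scorer = lambda q, a: 1.0 if a == "408" else -abs(int(a) - 408)
    print(best_of_n("What is 17 * 24?", gen, scorer, n=20))
```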

However, a recent paper by Apple researchers has questioned reasoning models. They argue that reasoning models, including o1, largely mimic the data they were exposed to during training rather than genuinely solving new problems. Apple found that when a problem is slightly modified, for example by adding irrelevant details, the performance of these models "disastrously declines." In one example, when a math problem about counting kiwifruit was tweaked to mention that some of the fruits were smaller than others, models wrongly treated this irrelevant detail as significant and subtracted those fruits from the total.

Nevertheless, OpenAI released a preview version of the o1 reasoning model in September this year and launched the full version of o1 earlier this month. However, it is worth noting that all this additional computation and processing power comes at a higher cost. OpenAI now needs to generate multiple answers for a single query, rather than just one, which undoubtedly increases the economic burden of its operations.

In a recent TED talk, Noam Brown, a senior research scientist at OpenAI, outlined the significant advantages of reasoning. He mentioned, "We found that letting a machine think for 20 seconds while playing poker provides a performance boost equivalent to scaling the model size by 100,000 times and training time by 100,000 times."

A more advanced and efficient reasoning model could well become the core foundation of the Orion project. OpenAI researchers are exploring this direction and hope to combine the reasoning approach with the traditional method of acquiring more data, some of which may come from OpenAI's other AI models. OpenAI then plans to use human-generated data to refine these results.

At a press conference on December 20 local time, Altman announced a brand-new reasoning model plan. According to him, the new model will be more intelligent than any model OpenAI has previously released. However, he did not reveal when it will be released or whether it will be named GPT-5. (Translated by Jinglu, special correspondent for Tencent Technology)

This article is from the WeChat public account "Tencent Technology", author: Tencent Technology, authorized by 36Kr for publication.
