OpenAI releases its most powerful model, o1, breaking an AI bottleneck and opening a new era. GPT-5 may never come

36kr · 09-13

Without any warning, OpenAI suddenly released the OpenAI o1 series of models. According to the official technical blog, o1 represents the state of the art in AI reasoning ability.

OpenAI CEO Sam Altman said: "OpenAI o1 is the beginning of a new paradigm: AI that can perform general complex reasoning."

This new model represents a new level of AI capability on complex reasoning tasks. For that reason, OpenAI chose to name the series OpenAI o1 and restart the counter from one.

It is unclear whether this means the name GPT-5 will never appear.

Briefly summarize the characteristics of the new model:

OpenAI o1: Powerful performance, suitable for handling complex tasks in various fields.

OpenAI o1 mini: Cost-effective and suitable for application scenarios that require reasoning but do not require extensive world knowledge.

The models have now been fully rolled out, and you can access them through the ChatGPT web interface or the API.

o1-preview is still a preview version, and OpenAI will continue to update and develop subsequent versions. Usage is currently limited: 30 messages per week for o1-preview and 50 messages per week for o1-mini.

Like the rumored "Strawberry", the new model can reason about complex tasks and solve harder problems in science, coding, and mathematics than previous models. OpenAI says these enhanced reasoning capabilities will be particularly useful for tackling complex problems in those fields.

For example, medical researchers can use it to annotate cell sequencing data, physicists can use it to generate complex quantum optics formulas, and developers can use it to build and execute multi-step workflows.

In addition, the OpenAI o1 series excels at generating and debugging complex code.

To give developers a more efficient option, OpenAI also released a faster and cheaper reasoning model, OpenAI o1-mini, which is particularly good at coding.

As a smaller version, o1-mini costs 80% less than o1-preview and is a powerful and efficient model suitable for application scenarios that require reasoning but do not require extensive world knowledge.

During training, OpenAI taught these models to think deeply before answering. o1 generates an internal chain of thought before responding, enabling deeper reasoning.

Through training, the OpenAI o1 model learns to refine its way of thinking, and it continues to improve with more reinforcement learning (train-time compute) and more thinking time (test-time compute).

OpenAI researcher @yubai01 also described o1's training approach:

We use RL to train a more powerful inference model. So excited to be part of this journey, and it’s a long way to go!

Reportedly, in tests the model performed at the level of a doctoral student on tasks in physics, chemistry, and biology, and did particularly well in mathematics and coding.

In a qualifying exam for the International Mathematical Olympiad (IMO), GPT-4o solved only 13% of the problems, while the reasoning model scored 83%. In Codeforces programming competitions, it reached the 89th percentile.

However, as rumored, this early version lacks some common ChatGPT features, such as web browsing and multimodal capabilities like uploading files or images.

By contrast, GPT-4o remains more capable for many common use cases.

To ensure the safety of the new model, OpenAI proposed a new safety training approach.

In the most rigorous "jailbreak" test, GPT-4o scored 22 (out of 100), while the o1-preview model scored 84, far ahead in terms of security.

Starting next week, ChatGPT Enterprise and Edu users will also have access to these two models. Eligible developers can use them through the API now, subject to per-minute rate limits.

One important point: OpenAI said it will eventually provide access to o1-mini to all free ChatGPT users, though likely with limits on usage.

We will share more details about the new o1 model once we have spent more time with it. If there is anything you would like to know, let us know in the comments.

Reasoning ability far ahead, but it still can't tell which is bigger: 9.11 or 9.8

The official also released more demonstration videos of OpenAI o1.

For example, OpenAI o1 was used to write a web game about finding squirrels: the player controls a koala that must dodge a growing number of strawberries and find the squirrel that appears after 3 seconds.

Unlike classic games such as Snake, the logic of this kind of game is relatively complex, placing greater demands on OpenAI o1's logical reasoning ability.

Or, OpenAI o1 has begun to solve some simple physics problems through reasoning.

In one demonstration, a small strawberry is placed in an ordinary cup, the cup is turned upside down on a table, and then the cup is lifted; the model is asked where the strawberry ends up and to explain its reasoning. This shows the model can track how an object's position changes across physical states.

Applied to specific scenarios, OpenAI o1 can also become a powerful assistant for doctors, for example by organizing and summarizing case records, and even assisting in diagnosing difficult cases.

Mario Krenn, a quantum physicist keen on combining AI with science, also asked OpenAI's o1 model a question about the application of a specific quantum operator. o1 solved it with ease.

How many "r"s are there in "strawberry"? GPT-4o gives the wrong answer, but OpenAI o1 gets it right, which is commendable.
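The letter-counting task that trips up GPT-4o is deterministic and trivially checkable in code, which is what makes the model's failure so striking. A minimal Python check:

```python
# Count occurrences of the letter "r" in "Strawberry", case-insensitively.
word = "Strawberry"
r_count = word.lower().count("r")
print(r_count)  # → 3
```

The correct answer is 3; the difficulty for language models stems from tokenization, which splits words into sub-word chunks rather than individual letters.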

However, in our actual testing, OpenAI o1 still could not solve the classic problem of "Which is bigger, 9.11 or 9.8?", a notable failure.
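Part of what makes this question tricky is that the answer depends on how the two strings are interpreted. A small illustrative sketch of the two readings:

```python
# As decimal numbers, 9.8 is larger than 9.11 (0.8 > 0.11).
print(9.11 < 9.8)  # → True

# But read as version or section numbers (component-wise), 9.11 comes
# after 9.8, because 11 > 8. Tuples model that comparison in Python.
print((9, 11) > (9, 8))  # → True
```

A model that latches onto the version-number reading will answer "9.11", which is the common failure mode on this question.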

Regarding the arrival of OpenAI o1, Jim Fan, head of embodied intelligence at NVIDIA, said:

We are finally seeing the paradigm of inference-time scaling generalized and put into production. As Sutton (the godfather of reinforcement learning) said in The Bitter Lesson, only two techniques scale without limit with computation: learning and search. It's time to shift the focus to the latter.

In his view, many parameters in large models are spent memorizing facts, which does help "brush up scores" on question-answering benchmarks. But if logical reasoning were decoupled from knowledge (factual memory), a small "reasoning core" calling tools such as browsers and code verifiers could reduce the amount of pre-training compute needed.

Jim Fan also pointed out OpenAI o1's most powerful advantage: the o1 model can easily become part of a data flywheel.

In simple terms, if the model arrives at the correct answer, the entire search process can be turned into a training dataset of positively and negatively rewarded examples. That dataset can then train future versions of the model, and as the generated training data becomes more refined, performance keeps improving. It is a virtuous inner loop in which the model trains itself through self-play.
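The flywheel described above can be sketched in a few lines. This is a toy illustration of the idea, not OpenAI's actual pipeline: it assumes answers can be checked automatically, and the function and field names are hypothetical.

```python
# Toy sketch of a data flywheel: turn search traces into reward-labeled
# training examples. Each trace is a (chain_of_thought, final_answer) pair.
def label_traces(question, correct_answer, traces):
    """Assign +1 reward to traces that reach the correct answer, -1 otherwise."""
    dataset = []
    for chain, answer in traces:
        reward = 1 if answer == correct_answer else -1
        dataset.append({"question": question, "chain": chain, "reward": reward})
    return dataset

examples = label_traces(
    "Which is bigger, 9.11 or 9.8?",
    "9.8",
    [("compare decimal parts: 0.8 > 0.11", "9.8"),
     ("compare suffixes: 11 > 8", "9.11")],
)
print([e["reward"] for e in examples])  # → [1, -1]
```

A real system would feed these labeled traces back into reinforcement learning, so that reasoning chains ending in correct answers are reinforced and faulty ones are penalized.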

However, users also found problems in their own tests: response times are much longer, and even after spending more time thinking, the model can still produce irrelevant answers or incomplete output.

Cyber Zen speculates that o1 may be an agent built by fine-tuning/aligning GPT-4o, with overall performance falling well short of expectations.

Sam Altman also admits that o1 still has flaws and limitations: it is more impressive on first use, but less so once you spend more time with it.

Despite this, the OpenAI o1 model still performs well overall.

The release of the OpenAI o1 model may well light the fuse of the AI model war in the second half of the year. Barring surprises, other AI companies will stop holding back.

That’s right, I’m referring to old rivals such as Anthropic, Meta AI, xAI, and some potential AI dark horses.

Moreover, ever since GPT-4, the deepest significance of an OpenAI model release has not been its raw performance, but that it sets a benchmark for a technical route and leads the field into uncharted waters.

That was true of GPT-4, and OpenAI hopes o1 will do the same.

This article comes from the WeChat public account "APPSO" , author: APPSO, and is authorized to be published by 36Kr.

Disclaimer: The content above is only the author's opinion which does not represent any position of Followin, and is not intended as, and shall not be understood or construed as, investment advice from Followin.