GPT-4o is powerful, but not world-changing
Some time ago, OpenAI released GPT-4o (the "o" stands for "omni"). Simply put, it can accept any combination of text, audio, and images as input and generate any combination of text, audio, and images as output, which opens up many impressive applications.
GPT-4o's main upgrade is that text, audio, and image inputs are handled natively, and any of these modalities can be generated directly from the others without intermediate conversion steps. Voice latency is also greatly reduced: GPT-4o can respond to audio input in as little as 232 milliseconds, averaging 320 milliseconds, which is close to human response time in conversation.
Through its visual capabilities, GPT-4o can not only understand what is happening in front of the camera but also describe what it sees by voice, making interaction far richer and more engaging than with older versions of ChatGPT. It also supports mid-sentence interruption and interjection, and retains conversational context.
Moreover, GPT-4o has one advantage over GPT-4 for most users: it is free. GPT-4-level intelligence, web-connected responses, data analysis and chart creation, chatting about photos you have taken, uploading files for summaries, help with writing and analysis, GPTs and the GPT Store, and a more personalized experience through Memory: all of these features are available at no cost. Remember that the paid ChatGPT Plus tier is expensive, payment is inconvenient, and the barrier to entry is high, which put most people off. GPT-4o's biggest highlight is that it lets far more people in.
For now, however, GPT-4o is available only to Plus subscribers; access for other users will open later. Free use of GPT-4o is capped, and once the limit is exceeded, users are switched back to GPT-3.5.
Some industry insiders remain unimpressed. In the run-up to the launch, OpenAI CEO Sam Altman even used the word "magic," but compared with the astonishing debuts of GPT-4 and Sora, GPT-4o is clearly not magic. In terms of multimodal capability, GPT-4o is not a significant improvement over the previous generation, and it opens no clear gap over Anthropic's Claude 3, GPT's old rival. Judged purely on model capability, there is no essential difference between GPT-4o and GPT-4.
The release of GPT-4o therefore looks more like a pre-emptive show of leadership, a way to sustain buzz and stimulate purchasing demand.
Interestingly, just 24 hours after GPT-4o's release, Google unveiled products that looked very much like a challenge to OpenAI. At its developer conference, CEO Sundar Pichai announced dozens of Google AI products, a veritable "family bucket" meant to encircle OpenAI on all fronts: Gemini 1.5 Pro and Gemini 1.5 Flash with support for 2-million-token contexts, Veo to compete with Sora, the open-source model Gemma 2, AI Overviews for generative search, and the sixth-generation TPU.
Google CEO Sundar Pichai
The biggest highlight of the entire developer conference was Astra, Google's AI voice assistant, which can recognize objects, code, and much else through a camera. In the live demo video, the user asked Astra to tell her when it saw something that makes sound, and the assistant replied that it could see a speaker. As for the apple that flashed past the camera, Astra accurately recalled that it was next to the glasses. Beyond Astra, Google also launched a series of Gemini-based AI agent products: NotebookLM for audio, Music AI Sandbox for music, Veo for video, and Imagen 3 for images, aimed squarely at OpenAI's GPT-4o, DALL·E, and Sora.
But Google faces the same problem as OpenAI: from reasoning to multimodality, its models are still not strong enough for developers to build truly native, killer applications on top of. The two companies are locked in a neck-and-neck race in which neither has pulled decisively ahead, so it is naturally hard for either to deliver applications that shock the world.
No wonder Musk said after watching the event that the GPT-4o demonstration made him feel "uncomfortable and embarrassed." Andrej Karpathy offered a coolly technical summary, which Musk endorsed: what was released is a model that combines the text, audio, and video modalities in a single neural network and processes them simultaneously. That is all.
Even large models have to count every penny
Last year, the boom in generative AI and large language models swept the global tech industry. Tech giants and emerging unicorns alike scrambled to build ever larger and more powerful models, triggering an arms race in AI chips that earned Nvidia, AI's arms dealer, $34 billion more than the previous year.
This year the mood is visibly less optimistic, and a pragmatic, cautious tone has spread through the industry. The Information reported that "cloud vendors including Microsoft, Amazon and Google and other companies selling this technology [generative AI] are lowering their expectations." Some already worry that the generative AI bubble has grown too large. It is the future, but it may not be the present, just as the Internet became a trillion-dollar business even though the bubble inflated around the millennium still burst.
There are two versions of OpenAI's revenue last year: The Information put its annualized revenue in the final month of 2023 at $1.6 billion, while the Financial Times gave $2 billion. That income puts it firmly in the AI industry's first tier, yet it is modest next to the no less than $1 billion Microsoft provides to OpenAI every year, to say nothing of Sam Altman's ambitious $7 trillion plan to build chip factories and integrate hardware and software. Going public might solve OpenAI's funding problem, but given its non-profit origins, converting into an ordinary private for-profit company faces many obstacles and is unrealistic in the short term.
A crowd of US generative AI companies that only became unicorns in 2023 are now finding their ideals hard to monetize. The two co-founders of Inflection, a startup once ranked among the top three AI unicorns, decamped to Microsoft. Having poached most of Inflection's employees, founders included, Microsoft agreed to pay roughly $650 million for a license to Inflection's models and to compensate its investors.
Cohere, the AI unicorn ranked just behind Inflection, is also rumored to be struggling to raise money. Since last December it has been seeking $500 million at a $6 billion valuation without confirming a deal, and its last round closed in June last year. At the rate large models burn cash, unicorns that cannot generate revenue of their own need fresh financing every six months, or even every quarter, to survive.
More awkward still, none of these companies has launched a model clearly stronger than GPT-4. However loudly they advertise being "ahead in every way," the actual experience falls well short, and almost none of them is profitable. It is not hard to see why capital has abandoned them and the transfusions have stopped.
Zhu Xiaohu, managing partner of GSR Ventures (Jinshajiang), argues that large models are a very poor business. The technology is undifferentiated, and every generation costs dramatically more: a 3.5-class model may cost tens of millions of dollars, 4.0 hundreds of millions, and 5.0 billions. You must reinvest for each generation, while the payback period may stretch to two or three years, which is worse than a power plant.
A power plant, once its infrastructure is built, needs little further investment; a large model demands another round of heavy spending every two or three years, on a monetization cycle of roughly the same length. Frankly, that is a terrible business model.
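Zhu Xiaohu's arithmetic can be made concrete with a back-of-the-envelope script. The per-generation dollar figures below are assumed midpoints of the rough orders of magnitude quoted above, not audited numbers:

```python
# Order-of-magnitude illustration of the escalating training bill
# described above; the dollar figures are assumed from the text's
# rough estimates ("tens of millions", "hundreds of millions", "billions").
gen_cost_usd = {
    "3.5": 50e6,   # assumed: tens of millions of dollars
    "4.0": 500e6,  # assumed: hundreds of millions
    "5.0": 5e9,    # assumed: billions
}

cumulative = 0.0
for gen, cost in gen_cost_usd.items():
    cumulative += cost
    print(f"GPT-{gen}: ${cost / 1e6:,.0f}M this generation, "
          f"${cumulative / 1e9:.2f}B cumulative")
```

With a two-to-three-year payback window per generation, each new model must be funded before the previous one has earned back its cost, which is the core of the "worse than a power plant" complaint.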
So AI investment, at home and abroad, has entered a phase where cost-effectiveness must be weighed seriously. If AI is approaching its endgame, second and third place in the industry will be worth very little. For now, OpenAI's lead looks unbreakable, and startups may find themselves on the "edge of death" at any moment.
Last year's "palace intrigue" is still not over
Last November, Ilya Sutskever and three other board members forced out the company's high-profile CEO, Sam Altman, a decision Sutskever later said he regretted. The dispute reportedly centered on OpenAI's direction: Sutskever was frustrated that Altman was rushing AI products to market at the expense of safety work. Altman returned to OpenAI just five days after being ousted, reasserted control, and kept pushing ever more powerful technology, to the alarm of some critics. Sutskever remained an OpenAI employee but never returned to work.
Sam Altman (left) Ilya Sutskever (right)
On May 17 this year, hours after co-founder and chief scientist Ilya Sutskever announced his resignation, Jan Leike, one of the leads of the Superalignment team, announced on X that he too was leaving.
Jan Leike, head of OpenAI's Superalignment effort, revealed the real reasons for his resignation, along with more inside details. First, compute was short: the 20% of computing power promised to the Superalignment team never fully materialized, leaving the team sailing against the wind and finding it ever harder to make progress. Second, safety was not taken seriously: safety and governance work for AGI was given lower priority than launching "shiny products."
Jan Leike
First, a word on what "alignment" means. Because GPT-style models generate text through a black-box mechanism, their output is stochastic and hard to control, and they inevitably produce content that conflicts with human values. The Superalignment team's goal was to build an automated alignment researcher roughly at human level, delegating as much of the work as possible to automated systems while ensuring that AI systems' behavior stays consistent with human values and goals.
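For readers who want a feel for the idea, here is a minimal sketch of steering a model's raw outputs toward a preference signal, one of the simplest alignment-flavored techniques (best-of-n sampling). This is an illustration only, not OpenAI's superalignment method; both functions are made-up stand-ins:

```python
# Toy best-of-n sampling: draw several candidate responses and keep the
# one a preference (reward) model scores highest. Real systems would use
# a language model and a reward model trained on human comparisons;
# these stand-ins are invented for illustration.

def generate_candidates(prompt: str, n: int = 4) -> list[str]:
    # Stand-in for sampling n responses from a language model.
    return [f"{prompt} answer variant {'!' * i}" for i in range(n)]

def reward_model(response: str) -> float:
    # Stand-in for a preference model; as a placeholder it simply
    # prefers longer responses.
    return float(len(response))

def best_of_n(prompt: str, n: int = 4) -> str:
    # Keep the candidate the reward model likes best.
    return max(generate_candidates(prompt, n), key=reward_model)

print(best_of_n("explain alignment", 3))
```

The point of the sketch is the division of labor: the generator proposes, the preference model disposes, and alignment research asks how to make that preference signal faithfully track human values at scale.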
The alignment team is not the only one leaving. Evan Morikawa, the former OpenAI engineering lead who helped launch ChatGPT, GPT-4, DALL·E, and the APIs, also announced his departure. He will start a new project with former Boston Dynamics senior roboticist Andy Barry and DeepMind research scientists Pete Florence and Andy Zeng, believing that "this is necessary to achieve AGI on a global scale."
Some netizens said it sounds as though OpenAI would rather keep burning money on commercialization than make sure each step forward is safe: it wants to be a tycoon, not a hero.
Altman, of course, could not leave Jan Leike's long post unanswered, and soon fired back: "I am very grateful for Jan Leike's contributions to OpenAI's alignment research and safety culture, and very sad to see him go. He is right that we still have a lot to do, and we are committed to doing it. I will post a longer piece in the next few days." Barring surprises, the coming days will be like last year: one reversal after another. Only the handful of people involved know the full truth, and for onlookers, who is right hardly matters.
What matters is that this wave of resignations shows that last year's internal conflict was never resolved; it merely went underground before erupting again. At bottom, the dispute between Ilya Sutskever and Sam Altman is a clash between technology fundamentalists and market radicals. As Jan Leike put it: "I believe we should spend far more of our bandwidth preparing for the next generation of models, focusing on security, monitoring, adversarial robustness, superalignment, confidentiality, societal impact, and related topics. These problems are hard to get right, and I worry we are not yet on the right trajectory."
Sam Altman, by contrast, has always stressed how AI will transform the world and create amazing products, and he is frantically stockpiling chips, even planning chip factories of his own, to accelerate development. The tension between the two is something nearly every AI practitioner must weigh: technology drives social productivity forward, but it also brings a train of safety problems and risks.
Sam Altman may not be purely money-driven, and his professed attention to AI safety is not necessarily a lie. But a manager must also consider the company's long-term survival; ideally you balance both, and when you cannot, you prioritize. At present, alignment demands considerable computing power, and aligned models show noticeably degraded performance, which may be one important reason Altman could not give the alignment team an arrangement it found satisfactory. The real situation is surely more complicated, but without sufficient commercial returns, the next era of technological productivity is empty talk.
If OpenAI abandons or neglects alignment entirely, it will be an enormous risk for GPT-5's future, and commercialization could be forced to a sudden halt. In short, we hope for a better, more open OpenAI, not one that dumps all its problems onto society.
China's large models have no reason to feel inferior
Over the past month, domestic large models have given the market the impression of catching up with the United States, with many of the highlights coming from startups. Moonshot AI (Dark Side of the Moon) has extended context length to 2 million tokens; MiniMax's overseas chat app Talkie has daily active users approaching Character.AI's; Shengshu Technology, maker of Vidu, proposed the U-ViT architecture before Sora appeared; and the open-source DeepSeek has cut its price to 1 yuan per million input tokens while preserving performance.
Some articles claim that domestic AI competes only on price while foreign AI competes on capability; that is not a fair picture. Demand for inference is already real. Since its release late last year, nearly 17,000 small and medium-sized foreign-trade merchants on Alibaba's platforms have subscribed to its AI business assistant, publishing millions of product listings, with search volume up nearly 40%. ByteDance has wired its Doubao model into Douyin, Feishu, and other businesses, processing an average of 120 billion tokens of text per day, though it has not disclosed details such as parameter counts, probably because it relies on recommendation algorithms similar to TikTok's. Baidu's Wenxin model processes 250 billion tokens of text per day, with average daily call volume four times that at the end of last year. Tencent has put its Hunyuan model to work in meetings, reading, and game customer service, and the click-through rates and transaction volumes of its AI-supported advertising services are climbing.
The open-source Tongyi Qianwen (Qwen) models have also become popular abroad, with many foreign developers discussing and using them.
Tongyi released Qwen1.5-110B, a 110-billion-parameter open-source model that beats Meta's Llama-3-70B on benchmarks such as MMLU, TheoremQA, and GPQA. Qwen1.5-110B also topped the Open LLM Leaderboard, the open-source model ranking run by Hugging Face, once again demonstrating the competitiveness of Tongyi's open-source series.
Some may question benchmark scores, but Tongyi, fully featured and free for consumers, is genuinely attractive. Meanwhile, the Tongyi Qianwen app has been upgraded to the "Tongyi App," integrating full-stack capabilities such as text and image generation, intelligent coding, document parsing, audio and video understanding, and visual generation, aiming to become users' "all-around AI assistant."
Final Thoughts
Whether it is OpenAI or China's AI companies, none can operate apart from financial and industrial capital. Being too hostile to commercialization is unhealthy, but there must also be a bottom line that balances safety and efficiency; only then can technology build a better future. Major technological advances in history have usually been accompanied by financial bubbles, a natural part of how new technologies spread. A bubble in AI is not what should frighten us. What should is a technical foundation too weak to solve real-world problems and too hard to bring to ground, leaving nothing but bubbles.
References:
OpenAI releases GPT-4o Source: Founder Park
GPT-4o first test Source: DoNews
GPT-4o is good, but its biggest highlight is that it is free. Source: ZAKER
Is the bubble of large models coming? Source: NewNewThing
The AI gap between Chinese and American giants Source: Unfinished Research
OpenAI's team to protect humanity falls apart Source: Silicon Star Pro
All the veteran scientists of OpenAI have fled. Source: AI Frontline
Ilya's departure from OpenAI revealed Source: Quantum Bit
Tongyi Qianwen 2.5 officially released Source: Alibaba Cloud
This article comes from the WeChat public account "Chief Business Review" (ID: CHReview), author: Zuojingguantian, and is published by 36Kr with authorization.





