Google Gemini's experimental model returns to the top of the leaderboard, after the new version of GPT-4o led for only one day

36kr · a day ago
Google and OpenAI are at it again. Just a day after the latest version of GPT-4o topped the leaderboard, Google released its newest experimental model, Gemini-Exp-1121, to reclaim the top spot. Notably, the previous version, Gemini-Exp-1114, came out only a week ago. It seems Google was prepared for OpenAI's challenge and had deliberately held back a trump card.

The move looks like deliberate bait: let OpenAI release a new version of GPT-4o, then strike back with an even better model.

Google's researchers seem quite pleased. Jack Rae, Chief Scientist at Google DeepMind, called it an interesting post-training "blitzkrieg," hinting that iteration during post-training is much faster than during pre-training. Oriol Vinyals, Vice President of Research at DeepMind, publicly needled Sam Altman: "Any new submission plans lately?" The tension is palpable, and the confidence is high.

So, how powerful is "1121"? Let's look at its performance in detail.

**Improved Code, Reasoning, and Visual Understanding**

According to the official announcement, Gemini-Exp-1121 focuses on improving three areas:

- Significantly improved coding ability
- Stronger reasoning ability
- Stronger visual understanding

Except for style control, it currently ranks first in every category. In visual capability, Gemini-Exp-1121 improves over the previous version, and on complex prompts with style control it is on par with o1-preview and New Sonnet 3.5. Its actual win rate in the competition arena is shown below.

You can try it out directly now. For example, ask Gemini-Exp-1121 and the latest GPT-4o-1120 to interpret the same cartoon: Gemini-Exp-1121's response is more comprehensive and detailed, using subheadings and highlighting key points, while the new 4o's response is relatively short and general. And on the classic logic puzzle of the farmer crossing the river with a wolf, a sheep, and a cabbage, Gemini-Exp-1121 answered completely correctly, while the new 4o made a mistake by merging the third and fourth crossings.

Question: A farmer needs to take a wolf, a sheep, and a cabbage across a river, but can carry only one item at a time. The wolf and the sheep cannot be left alone together, nor can the sheep and the cabbage. How should the farmer cross the river?
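For readers who want to verify the puzzle mechanically, the correct seven-crossing answer can be found with a short breadth-first search. This is a minimal sketch in Python; the state encoding and move names are our own illustration, not taken from either model's answer:

```python
from collections import deque

ITEMS = ("wolf", "sheep", "cabbage")

def safe(bank):
    # A bank without the farmer is unsafe if the wolf is with the sheep,
    # or the sheep is with the cabbage.
    return not ({"wolf", "sheep"} <= bank or {"sheep", "cabbage"} <= bank)

def solve():
    # State: (farmer_on_left, frozenset of items on the left bank).
    start = (True, frozenset(ITEMS))
    goal = (False, frozenset())
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        farmer_left, left = state
        here = left if farmer_left else frozenset(ITEMS) - left
        # The farmer crosses alone (None) or with one item from his bank.
        for cargo in (None, *sorted(here)):
            new_left = set(left)
            if cargo is not None:
                (new_left.discard if farmer_left else new_left.add)(cargo)
            new_left = frozenset(new_left)
            # The bank the farmer just left must stay safe unsupervised.
            unsupervised = new_left if farmer_left else frozenset(ITEMS) - new_left
            if not safe(unsupervised):
                continue
            nxt = (not farmer_left, new_left)
            if nxt not in seen:
                seen.add(nxt)
                move = f"take the {cargo}" if cargo else "return alone"
                queue.append((nxt, path + [move]))
    return None

plan = solve()
```

Because BFS returns a shortest path, `plan` is the classic seven-crossing solution: it must start and end by ferrying the sheep, with the third and fourth crossings ("take the wolf" or "take the cabbage", then "return with the sheep" style moves) kept distinct, which is exactly where the new 4o slipped up.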

**One More Thing**

There is also new news from OpenAI: someone has discovered code for a "Live Camera" video feature in the latest test build of ChatGPT, covering real-time recording, real-time processing, voice-mode integration, and visual recognition. Some users already glimpsed this capability when advanced voice mode launched, which suggests OpenAI is preparing to ship the feature. Google has demonstrated a similar demo, but it has not launched either. Given OpenAI's style, it may well roll the feature out before Google does.

Perhaps next year the primary mode of interacting with chatbots will shift from text-based dialogue to voice and agents, and Live Camera may be the beginning of that transition. What do you think?

References:
[1] https://x.com/OfficialLoganK/status/1859667244688736419
[2] https://x.com/adonis_singh/status/1859682100569571399
[3] https://x.com/OriolVinyalsML/status/1859730969600852222
[4] https://x.com/rowancheung/status/1859301345993556277

This article is from the WeChat public account "QbitAI" and is reposted with authorization by 36Kr.
