The tech giant's "thinking model" outperforms rivals on complex benchmarks and is now available to all users for free

Google's recently launched Gemini 2.5 Pro has risen to the top spot on coding leaderboards, beating Claude in the <a href="https://webdev.lmarena.ai/leaderboard" rel="nofollow">WebDev Arena</a>, an independent ranking site akin to the <a href="https://lmarena.ai/" rel="nofollow">LLM arena</a> but focused specifically on measuring how good AI models are at coding. The achievement comes amid Google's push to position its flagship AI model as a leader in both coding and reasoning tasks.

Released earlier this year, Gemini 2.5 Pro <a href="https://web.lmarena.ai/leaderboard" rel="nofollow">ranks first</a> across several categories, including coding, style control, and creative writing. The model's massive context window (one million tokens, with <a href="https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/" rel="nofollow">two million</a> coming soon) allows it to handle large codebases and complex projects that would choke even its closest competitors. For context, powerful models like ChatGPT and Claude 3.7 Sonnet top out at 128K and 200K tokens, respectively.

Gemini also has the highest “IQ” of any AI model tested by TrackingAI, which puts models through formalized <a href="https://trackingai.org/IQ-Test-Viewer" rel="nofollow">Mensa tests</a>, using verbalized questions from Mensa Norway as a standardized way to compare them. Gemini 2.5 Pro scored higher than competitors on these tests, even on bespoke questions that are not publicly available and so cannot appear in training data.

With an IQ score of 115 in offline tests, the new Gemini ranks among the “<a href="https://www.verywellmind.com/what-is-a-genius-iq-score-2795585" rel="nofollow">bright minded</a>,” with average human intelligence falling between roughly 85 and 114 points. But the notion of an AI having IQ needs unpacking.
AI systems don't have intelligence quotients the way humans do, so the score is better read as a metaphor for performance on reasoning tasks.

On benchmarks designed specifically for AI, Gemini 2.5 Pro scored 86.7% on the AIME 2025 math test and 84.0% on the GPQA science assessment. On Humanity's Last Exam (HLE), a newer and harder benchmark created to avoid test saturation, Gemini 2.5 Pro scored 18.8%, beating OpenAI's o3-mini (14%) and Claude 3.7 Sonnet (8.9%), a remarkable performance boost.

The new version of Gemini 2.5 Pro is now available for free (with rate limits) to all Gemini users. Google previously described this release as an "experimental version of 2.5 Pro," part of its family of "thinking models" designed to reason through responses rather than simply generate text.

Despite not winning every benchmark, Gemini has <a href="https://www.youtube.com/watch?v=mZNLegBg8BA" rel="nofollow">caught developers' attention</a> with its versatility. The model can create complex applications from single prompts, building interactive web apps, endless runner games, and visual simulations without requiring detailed instructions.

We tested the model by asking it to fix a broken piece of HTML5 code. It generated almost 1,000 lines of code, and its results beat those of Claude 3.7 Sonnet, the previous leader, in both quality and understanding of the full set of instructions.

For working developers, Gemini 2.5 Pro costs $2.50 per million input tokens and $15.00 per million output tokens, positioning it as a cheaper alternative to some competitors while still offering impressive capabilities. The model handles up to 30,000 lines of code in its Advanced plan, making it suitable for enterprise-level projects.
Its multimodal abilities—working with text, code, <a href="https://blog.google/products/gemini/gemini-collaboration-features/" rel="nofollow">audio</a>, <a href="https://decrypt.co/310943/google-gemini-2-flash-edit-photos-words-review" rel="nofollow">images</a>, and <a href="https://blog.google/technology/google-labs/video-image-generation-update-december-2024/" rel="nofollow">video</a>—add flexibility that other coding-focused models can't match.
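
To make the per-token pricing concrete, here is a minimal sketch that estimates the cost of a single request at the rates quoted above. The function name and the example token counts are illustrative assumptions, not part of any Google SDK, and the rates are simply the article's figures; actual billing may vary by tier or change over time.

```python
from decimal import Decimal, ROUND_HALF_UP

# Rates as quoted in the article (USD per one million tokens).
# Assumption: a single flat rate applies to every request.
INPUT_USD_PER_MILLION = Decimal("2.50")
OUTPUT_USD_PER_MILLION = Decimal("15.00")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request, rounded to the nearest cent.

    Hypothetical helper for illustration only.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) / TOKENS_PER_MILLION * INPUT_USD_PER_MILLION
    output_cost = Decimal(output_tokens) / TOKENS_PER_MILLION * OUTPUT_USD_PER_MILLION
    total = input_cost + output_cost
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # Example: a 500,000-token codebase as input, 20,000 tokens of generated output.
    print(estimate_cost_usd(500_000, 20_000))  # prints 1.55
```

At these rates, output tokens cost six times as much as input tokens, so a long generated answer can outweigh even a very large prompt in the final bill.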

Google's Gemini 2.5 Pro Tops Coding Charts and MENSA Tests in AI ‘IQ’ Battle


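Nicholas Peach, head of iShares for Asia-Pacific at BlackRock, said that even a modest portfolio allocation to crypto in Asia could drive substantial inflows into the market.

He made the remarks during a panel discussion at the Consensus conference…

BlackRock Executive Says a 1% Crypto Allocation in Asia Could Unlock $2 Trillion in New Inflows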

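Berachain's native token, $BERA, surged more than 150% on February 11, its largest single-day gain in months. The project had languished for most of 2025, and the rally followed several weeks of recovery.

Strategic Pivot Lifts BERA as Berachain Surges 150%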

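BlackRock partners with Uniswap to bring its tokenized bond fund to DeFi, sending the price of UNI soaring. Image: Diaro
BlackRock continues to expand its presence in decentralized finance (DeFi), a move that paves the way for…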