The tech giant's "thinking model" outperforms rivals on complex benchmarks and is now available to all users for free

Google's recently launched Gemini 2.5 Pro has risen to the top spot on coding leaderboards, beating Claude in the famous <a href="https://webdev.lmarena.ai/leaderboard" rel="nofollow">WebDev Arena</a>, an independent ranking site akin to the <a href="https://lmarena.ai/" rel="nofollow">LLM arena</a> but focused specifically on measuring how good AI models are at coding. The achievement comes amid Google's push to position its flagship AI model as a leader in both coding and reasoning tasks.

Released earlier this year, Gemini 2.5 Pro <a href="https://web.lmarena.ai/leaderboard" rel="nofollow">ranks first</a> across several categories, including coding, style control, and creative writing. The model's massive context window (one million tokens, expanding to <a href="https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/" rel="nofollow">two million</a> soon) allows it to handle large codebases and complex projects that would choke even its closest competitors. For context, ChatGPT can handle up to 128K tokens and Claude 3.7 Sonnet up to 200K.

Gemini also has the highest “IQ” of any AI model TrackingAI has tested. TrackingAI put it through formalized <a href="https://trackingai.org/IQ-Test-Viewer" rel="nofollow">Mensa tests</a>, using verbalized questions from Mensa Norway to create a standardized way to compare AI models. Gemini 2.5 Pro scored higher than competitors on these tests, even when using bespoke questions not publicly available in training data.

With an IQ score of 115 in offline tests, the new Gemini ranks among the “<a href="https://www.verywellmind.com/what-is-a-genius-iq-score-2795585" rel="nofollow">bright minded</a>,” with average human intelligence scoring around 85 to 114 points. But the notion of an AI having an IQ needs unpacking.
AI systems don't have intelligence quotients like humans do, so it’s better to think of the score as a proxy for performance on reasoning benchmarks.

On benchmarks specifically designed for AI, Gemini 2.5 Pro scored 86.7% on the AIME 2025 math test and 84.0% on the GPQA science assessment. On Humanity's Last Exam (HLE), a newer and harder benchmark created to avoid test saturation problems, Gemini 2.5 Pro scored 18.8%, beating OpenAI's o3-mini (14%) and Claude 3.7 Sonnet (8.9%), a remarkable performance boost over both.

The new version of Gemini 2.5 Pro is now available for free (with rate limits) to all Gemini users. Google previously described this release as an "experimental version of 2.5 Pro," part of its family of "thinking models" designed to reason through responses rather than simply generate text.

Despite not winning every benchmark, Gemini has <a href="https://www.youtube.com/watch?v=mZNLegBg8BA" rel="nofollow">caught developers' attention</a> with its versatility. The model can create complex applications from single prompts, building interactive web apps, endless runner games, and visual simulations without requiring detailed instructions.

We tested the model by asking it to fix a piece of broken HTML5 code. It generated almost 1,000 lines of code, with results that beat Claude 3.7 Sonnet, the previous leader, in both quality and understanding of the full set of instructions.

For working developers, Gemini 2.5 Pro's input costs $2.50 per million tokens and output costs $15.00 per million tokens, positioning it as a cheaper alternative to some competitors while still offering impressive capabilities.

The AI model handles up to 30,000 lines of code in its Advanced plan, making it suitable for enterprise-level projects.
Its multimodal abilities—working with text, code, <a href="https://blog.google/products/gemini/gemini-collaboration-features/" rel="nofollow">audio</a>, <a href="https://decrypt.co/310943/google-gemini-2-flash-edit-photos-words-review" rel="nofollow">images</a>, and <a href="https://blog.google/technology/google-labs/video-image-generation-update-december-2024/" rel="nofollow">video</a>—add flexibility that other coding-focused models can't match.
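
For developers weighing the per-token prices quoted above, a rough cost estimate is simple arithmetic. The sketch below is illustrative only: it assumes the article's figures ($2.50 per million input tokens, $15.00 per million output tokens, a one-million-token context window) apply uniformly to every request, and the function names and the sample request sizes are hypothetical, not part of any Google API.

```python
# Figures as quoted in the article (assumed to apply to every request).
INPUT_PRICE_PER_MILLION = 2.50    # USD per one million input tokens
OUTPUT_PRICE_PER_MILLION = 15.00  # USD per one million output tokens
CONTEXT_WINDOW_TOKENS = 1_000_000  # context window size cited in the article


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD of a single request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


def fits_in_context(input_tokens: int) -> bool:
    """Check whether a prompt fits inside the quoted context window."""
    return input_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    # Hypothetical request: 200,000 tokens of code in, 10,000 tokens out.
    cost = estimate_cost(200_000, 10_000)
    print(f"Fits in context: {fits_in_context(200_000)}")
    print(f"Estimated cost: ${cost:.2f}")
```

At these rates, the hypothetical request above works out to $0.50 for input plus $0.15 for output. Actual billing may differ by tier or prompt size, so treat the numbers as a back-of-the-envelope guide.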

Google's Gemini 2.5 Pro Tops Coding Charts and MENSA Tests in AI ‘IQ’ Battle

