ChatGPT, Claude, and Gemini can do quite a bit—but they still struggle to play video games, including the classic shooter Doom.

Despite the buzz surrounding artificial intelligence, even the most advanced vision-language models (GPT-4o, Claude Sonnet 3.7, and Gemini 2.5 Pro) struggle with a decades-old challenge: playing the classic first-person shooter Doom.

On Thursday, a new research project introduced <a href="https://www.vgbench.com/" rel="nofollow">VideoGameBench</a>, an AI benchmark designed to test whether state-of-the-art vision-language models can play, and beat, a suite of 20 popular video games using only what they see on the screen.

“In our experience, current state-of-the-art VLMs substantially struggle to play video games because of high inference latency,” the researchers said. “When an agent takes a screenshot and queries the VLM about what action to take, by the time the response comes back, the game state has changed significantly and the action is no longer relevant.”

The researchers stated that they used classic Game Boy and MS-DOS games due to their simpler visuals and diverse input styles, like a mouse and keyboard or game controller, which better test a vision-language model’s spatial reasoning capabilities than text-based games.

VideoGameBench was developed by computer scientist and AI researcher Alex Zhang. The suite of games includes classics like Warcraft II, Age of Empires, and Prince of Persia.

According to the researchers, delayed responses are most problematic in first-person shooters like Doom. In these fast-paced environments, an enemy visible in a screenshot may already have moved, or even reached the player, by the time the model acts.

For software developers, Doom has long served as a litmus test for technological capability in gaming environments. <a href="https://decrypt.co/218679/can-robot-lawn-mowers-run-doom-yes" rel="nofollow">Lawnmowers</a>, <a href="https://decrypt.co/resources/how-play-classic-games-doom-bitcoin-dogecoin" rel="nofollow">Bitcoin</a>, and even human gut <a href="https://decrypt.co/215102/doom-runs-bacteria-dogecoin-now-gut-bacteria-too" rel="nofollow">bacteria</a> have faced down the demons from hell with varying levels of success. Now it’s AI’s turn.

“What has brought Doom out of the shadows of the 90s and into the modern light is not its riveting gameplay, but rather its appealing computational design,” MIT biotech researcher Lauren Ramlan previously told Decrypt. “Built on the id Tech 1 engine, the game was designed to require only the most modest of setups to be played.”

In addition to struggling with understanding game environments, the models often failed to perform basic in-game actions.

“We observed frequent instances where the agent had trouble understanding how its actions, such as moving right, would translate on screen,” the researchers said. “The most consistent failure across all frontier models we tested was an inability to reliably control the mouse in games like Civilization and Warcraft II, where precise and frequent mouse movements are essential.”

To better understand the limitations of current AI systems, VideoGameBench emphasized the importance of evaluating their reasoning abilities in environments that are both dynamic and complex.

“Unlike extremely complicated domains like unsolved math proofs and olympiad-level math problems, playing video games is not a superhuman reasoning task, yet models still struggle to solve them,” they said.

Edited by <a href="https://decrypt.co/author/andrew-hayward" rel="nofollow">Andrew Hayward</a>

Relax, You're Still Better at Playing 'Doom' Than AI

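Bitcoin's price briefly touched $60,000 last week. Under a diminishing-returns model, this is far from simple noise. The market is testing the most fragile link in the entire four-year cycle and logarithmic growth framework.
With the gains at Bitcoin's cycle tops already sharply compressed, another drawdown of historic depth would completely undermine the appeal of its classic cycle.
This is not a prediction; it is a mathematical regularity.
Cycle-top gains are compressing
Bitcoin's historical tops by cycle:
· 2013: ~$1,242
· 2017: ~$19,700
· 2021: ~...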

$55,000 Will Be Bitcoin's Life-or-Death Line

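The crypto market dropped sharply this afternoon, with Ethereum falling below the $2,000 mark at 3 p.m. Taiwan time. On Hyperliquid […]
The article "Machi Big Brother Up All Night After Heavy Losses? Panicked as Ethereum Fell Below $2,000, He Opened ETH and HYPE Longs and Cut Losses, Wiping Out $120,000" was first published on BlockTempo (動區BlockTempo, "the most influential blockchain news media").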

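Machi Big Brother Up All Night After Heavy Losses? Panicked as Ethereum Fell Below $2,000, He Opened ETH and HYPE Longs and Cut Losses, Wiping Out $120,000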

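As the FinChain partnership accelerates and brings traditional assets on-chain at scale, demand among Asian institutions for tokenized finance is growing at a faster pace.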

Summary: FinChain deepens strategic partnership…