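Is GPT-5 better than human doctors at reading X-rays?!
A recent study reports that GPT-5's reasoning and understanding accuracy on medical imaging is 24.23% and 29.40% higher, respectively, than that of human experts.
A research team from Emory University School of Medicine compared GPT-5 with GPT-4o and the smaller GPT-5 variants (GPT-5-mini, GPT-5-nano), analyzing their ability to process multimodal information in the medical domain.
Across a series of standardized tests, GPT-5 outperformed the other models on every test, especially in Med...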

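<div>Is GPT-5 better than human doctors at reading X-rays?! A recent study reports that GPT-5's reasoning and understanding accuracy on medical imaging is 24.23% and 29.40% higher, respectively, than that of human experts.<img src="https://img.36krcdn.com/hsossms/20250815/v2_9abcf80d0d954ed8b6f41a6c766d9b24@46958_oswg192145oswg1080oswg339_img_000?x-oss-process=image/format,jpg/interlace,1">A research team from Emory University School of Medicine compared GPT-5 with GPT-4o and the smaller GPT-5 variants (GPT-5-mini, GPT-5-nano), analyzing their ability to process multimodal information in the medical domain.<img src="https://img.36krcdn.com/hsossms/20250815/v2_2922ee0ef1b8454b82b70fa7c1a2b159@46958_oswg37993oswg1080oswg318_img_000?x-oss-process=image/format,jpg/interlace,1">Across a series of standardized tests, GPT-5 outperformed the other models on every test. The gap was widest on the multimodal MedXpertQA test, where its reasoning and understanding scores were nearly 30% and 36% higher than GPT-4o's, and higher even than those of human doctors.<img src="https://img.36krcdn.com/hsossms/20250815/v2_03ce046d0c7c4b2a812d2219f971d8e0@46958_oswg120190oswg1080oswg266_img_000?x-oss-process=image/format,jpg/interlace,1">AI that reads medical records is common; AI that reads them better than human doctors is not. So how did GPT-5 do it?<h2>AI Surpasses Novice Human Doctors in Multimodal Medicine</h2>The researchers systematically tested GPT-5, GPT-4o, and the mini and nano versions of GPT-5. The tests fell into three categories: the text-only USMLE exam, the multimodal MedXpertQA test, and the radiology benchmark VQA-RAD. All were run in a zero-shot setting, with no fine-tuning on the data. The USMLE is the United States Medical Licensing Examination. With standardized questions and a strict scoring system, it is an important reference benchmark for medical education and talent assessment worldwide. The exam has three steps: Step 1 mainly tests basic medical knowledge, Step 2 focuses on applied clinical knowledge, and Step 3 emphasizes practice. In this study, GPT-5 outperformed GPT-4o on every step of the USMLE, and its average score led all other models.<img src="https://img.36krcdn.com/hsossms/20250815/v2_c63a54e2d0bd4ef9b2384e5c8563256b@46958_oswg125946oswg1080oswg275_img_000?x-oss-process=image/format,jpg/interlace,1">MedXpertQA is a comprehensive benchmark for evaluating a model's expert-level medical knowledge and advanced reasoning. It has a text test and a multimodal test, with 4,460 questions in total covering 17 medical specialties and 11 body systems. Its data comes from more than 20 authoritative sources, including the United States Medical Licensing Examination and the European Board of Radiology examination. The multimodal MedXpertQA test uses the benchmark's MM subset, which introduces expert-level exam questions accompanied by diverse images and rich clinical information (medical records, test results, and so on). To raise the difficulty, questions in the multimodal subset were also expanded to five answer options, which evaluates a model's diagnostic reasoning in near-real-world scenarios more effectively. As noted earlier, GPT-5's reasoning and understanding scores were nearly 30% and 36% higher than GPT-4o's.<img src="https://img.36krcdn.com/hsossms/20250815/v2_5de4a63439a746aa9dc3452b0daa86ea@46958_oswg125946oswg1080oswg275_img_000?x-oss-process=image/format,jpg/interlace,1">The figure below compares pre-licensed human experts with the GPT-5 family and GPT-4o on the text subset (Text) and multimodal subset (MM) of MedXpertQA, across three dimensions: reasoning, understanding, and average.<img src="https://img.36krcdn.com/hsossms/20250815/v2_f7db8a0f56c14394be9466f7faadd2c5@46958_oswg192145oswg1080oswg339_img_000?x-oss-process=image/format,jpg/interlace,1">On the text test, GPT-4o scored below the human experts on all three dimensions, and GPT-5-nano also trailed across the board. GPT-5-mini slightly exceeded the human experts on reasoning and on average, while GPT-5 performed best, leading by a wide margin. On the multimodal test, GPT-4o's reasoning and average scores were slightly lower than the human experts', GPT-5-nano was roughly on par with them overall, and GPT-5-mini surpassed them by a wide margin. GPT-5's advantage was the largest: it exceeded the human experts by 24% on reasoning and 29% on understanding, demonstrating strong multimodal medical reasoning. VQA-RAD is a medical visual question answering test. The dataset contains 315 radiology images and 3,515 corresponding question-answer pairs, and it is commonly used to evaluate how well medical multimodal large language models interpret complex medical images and generate accurate text descriptions. In this study, GPT-5's match rate was 70.92%, higher than that of GPT-4o and of the small variant GPT-5-nano, while its lightweight variant GPT-5-mini did slightly better, with a strict match rate of 74.90%.<img src="https://img.36krcdn.com/hsossms/20250815/v2_312b7128f9d5410b91a215e23dfe2b4c@46958_oswg120190oswg1080oswg266_img_000?x-oss-process=image/format,jpg/interlace,1">Given that VQA-RAD is relatively small and specific to radiology, this difference in scores may stem from dataset-specific overfitting in the smaller model. With all these results in hand, why does GPT-5 so thoroughly outperform its predecessor GPT-4o?<h2>GPT-5 Builds an End-to-End Multimodal Architecture</h2>The team believes the core of GPT-5's improvement is its stronger cross-modal attention and alignment. The essential gap between GPT-5 and GPT-4o is a generational jump from text-dominated hybrid processing to deep, natively multimodal fusion. When handling cross-modal tasks, GPT-4o still relies on an indirect pattern of text conversion plus external tool calls. To interpret a medical image, for example, it first needs a third-party model to convert the image information into a text description, and then reasons over that text. This conversion step increases information loss (subtle lesions in an image may be missed in the conversion), and it also breaks the reasoning chain: the model struggles to directly establish the causal link from imaging features to pathological mechanism to treatment plan. GPT-5, by contrast, builds an end-to-end multimodal architecture. Through shared tokenization, it encodes text, images, audio, and other inputs as symbols in a unified vector space, then uses cross-modal attention to connect perception, reasoning, and decision-making seamlessly. The team also believes that GPT-5's gains are more pronounced on reasoning-intensive tasks such as MedXpertQA Text and USMLE Step 2 because chain-of-thought prompting and GPT-5's enhanced internal reasoning reinforce each other, letting it complete multi-step reasoning more accurately. The researchers point out, however, that although GPT-5 performed well on the standard tests, those tests were run under idealized conditions with standardized questions and data. Real patients present in endlessly varied ways, and unexpected situations can arise. So before GPT-5 can actually step into the clinic as an assistant, it still has to pass more real-world trials. As it happens, the KCDH_A digital health research center put AI through a final exam in radiology: a cross-modal detection task the AI models had never seen before, covering CT, MRI, and X-ray, and simulating the complex real cases encountered in daily practice. The results show that every AI model scored below resident doctors, and licensed radiologists led the AI by an even wider margin. GPT-5 has only just reached the top tier among AI models, yet it still falls far short of humans.<img src="https://img.36krcdn.com/hsossms/20250815/v2_93d3936186e2401e99ca5e9a28d28171@46958_oswg136218oswg1080oswg608_img_000?x-oss-process=image/format,jpg/interlace,1">A researcher at the lab said:<blockquote>Although I am excited about the progress of AI, and our lab uses AI models every day, AI replacing radiologists is still a long way from reality.</blockquote>Evidently, AI still needs more practice before it reads medical records on its own. Paper: https://arxiv.org/abs/2508.08224 Reference links: [1] https://x.com/omarsar0/status/1955252499142627788 [2] https://x.com/emollick/status/1955381296743715241 [3] https://x.com/DrDatta_AIIMS/status/1954586822849523789 This article is from the WeChat official account <a rel="nofollow" href="https://mp.weixin.qq.com/s/gfM7kMxSt9Cs7Ark8gaOcA">"量子位" (QbitAI)</a>, author: 闻乐 (Wen Le), published by 36Kr with authorization.</div>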
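The VQA-RAD figures in the article above are reported as a strict match rate. The article does not say how answers are normalized before comparison, so the following is only a minimal sketch of one common way to compute such a rate. The `normalize` rule (lowercasing, trimming whitespace, dropping trailing periods) and the toy answers are assumptions for illustration, not the study's actual protocol or data.

```python
def normalize(answer: str) -> str:
    """Lowercase, trim whitespace, and drop trailing periods (assumed rule)."""
    return answer.strip().lower().rstrip(".")


def exact_match_rate(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that equal their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


# Hypothetical model answers and ground-truth answers (made up for illustration).
preds = ["Yes", "left lung", "no."]
refs = ["yes", "right lung", "No"]

rate = exact_match_rate(preds, refs)
print(f"strict match rate: {rate:.2%}")  # 2 of 3 answers match
```

Under strict matching, a near-miss such as "left lung" against "right lung" counts as a full error, which is one reason scores on a small dataset like VQA-RAD can shift noticeably between models.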

GPT-5 surpasses human doctors, with reasoning 24% higher than experts and understanding 29% higher


The release of this new regulation marks a historic leap in the attitude of Chinese regulators towards virtual assets, shifting from a "complete ban" to a "combination of restriction and guidance."

Article Author & Source: Guofeng Law Firm
On February 6, 2026, the "Notice on Further Preventing and Handling Risks Related to Virtual Currencies" (Yinfa [2026] No. 42, hereinafter referred to as "Document No. 42"), jointly issued by eight ministries led by the People's Bank of China, and the "Regulatory Guidelines on the Issuance of Asset-Backed Securities Tokens Overseas by Domestic Assets" (hereinafter referred to as the "Guidelines"), simultaneously issued by the China Securities Regulatory Commission, have triggered significant reactions in legal and financial circles...

Breaking News! The Year of China's RWA: A Compliant Channel Opens for Trillions of Yuan in Domestic Assets to Go Global

BitMEX co-founder Arthur Hayes wrote that the recent Bitcoin crash may stem from traders' concerns about IBIT (BlackRock) […]
The article, "Arthur Hayes Speculates on BTC Crash Cause: 'Institutional Hedging Operations': IBIT Options Explode $900 Million," was originally published on BlockTempo, the most influential blockchain ABMedia media.

Arthur Hayes speculates that the reason for the BTC crash is "institutional hedging operations": IBIT options saw a surge of $900 million.

Unusual trading in BlackRock’s bitcoin ETF, iShares Bitcoin Trust (IBIT), has led traders to speculate that this week’s sharp Bitcoin drop may have been triggered by one or more Hong Kong–based hedge ...