GPT-5 surpasses human doctors, with reasoning ability 24% higher than experts and comprehension 29% stronger

36kr
08-15

Is GPT-5 Better at Reading X-rays Than Human Doctors?!

The latest research shows that GPT-5's reasoning and understanding scores on medical imaging questions are 24.23% and 29.40% higher than those of human experts, respectively.

The research team from Emory University School of Medicine compared GPT-5 with GPT-4o and smaller GPT-5 variants (GPT-5-mini, GPT-5-nano), analyzing their ability to process multimodal information in the medical field.

Across a series of standardized tests, GPT-5 outperformed the other models on every benchmark. The gap was widest on the multimodal MedXpertQA test, where its reasoning and understanding scores improved by nearly 30% and 36% over GPT-4o and even surpassed those of human doctors.

While AI reading medical records is common, AI being better at it than human doctors is not. So how did GPT-5 achieve this?

AI Surpasses Junior Doctors on Multimodal Medical Tasks

Researchers conducted systematic tests on GPT-5, GPT-4o, and GPT-5's mini and nano versions.

The tests fell into three categories: the text-only USMLE, the multimodal MedXpertQA test, and the radiology benchmark VQA-RAD. All were run in a zero-shot setting, with no fine-tuning on the test data.
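To make the zero-shot setting concrete, here is a minimal sketch of how a multiple-choice benchmark of this kind could be scored. It is an illustration under stated assumptions, not the study's actual pipeline: the `Question` structure, the prompt wording, and the function names are all hypothetical, and the model call itself is left out, so predictions are passed in as plain option letters.

```python
from dataclasses import dataclass


@dataclass
class Question:
    """One multiple-choice item (hypothetical structure, not the paper's format)."""
    stem: str
    options: dict[str, str]  # option letter -> option text
    answer: str              # gold option letter


def build_zero_shot_prompt(q: Question) -> str:
    """Format a question with no worked examples, i.e. a zero-shot prompt."""
    lines = [q.stem]
    lines += [f"{letter}. {text}" for letter, text in sorted(q.options.items())]
    # Assumed instruction wording; the study's exact prompt is not given here.
    lines.append("Answer with the letter of the single best option.")
    return "\n".join(lines)


def accuracy(questions: list[Question], predictions: list[str]) -> float:
    """Fraction of questions whose predicted letter matches the gold letter."""
    if len(questions) != len(predictions):
        raise ValueError("need one prediction per question")
    if not questions:
        return 0.0
    correct = sum(
        pred.strip().upper() == q.answer.upper()
        for q, pred in zip(questions, predictions)
    )
    return correct / len(questions)


if __name__ == "__main__":
    # Toy item for illustration only; it is not taken from any benchmark.
    toy = Question(
        stem="Which vitamin deficiency causes scurvy?",
        options={"A": "Vitamin A", "B": "Vitamin C"},
        answer="B",
    )
    print(build_zero_shot_prompt(toy))
    print(accuracy([toy], ["B"]))
```

The point of the sketch is that a zero-shot prompt contains only the question and its options, with no solved examples, and that the score is simply the share of questions answered correctly.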

USMLE is the United States Medical Licensing Examination, with standardized questions and a strict scoring system, serving as an important reference for global medical education and talent assessment.

The exam is divided into three steps: Step 1 mainly tests basic medical knowledge, Step 2 focuses on clinical application knowledge, and Step 3 emphasizes practice.

In this study, GPT-5 outperformed GPT-4o across the USMLE, posting the highest average score of all the models tested.

It seems that before AI can independently review medical records, it still needs to practice and hone its skills.

Paper address: https://arxiv.org/abs/2508.08224

This article is from the WeChat official account "Quantum Bit", author: Wen Le, published with authorization from 36kr.
