
Assessing ChatGPT 4.0’s Capabilities in the United Kingdom Medical Licensing Examination (UKMLA): A Robust Categorical Analysis

Casals-Farre, Octavi; Baskaran, Ravanth; Singh, Aditya; Kaur, Harmeena; Ul Hoque, Tazim; de Almeida, Andreia; Coffey, Marcus; Hassoulas, Athanasios

Abstract

Advances in the various applications of artificial intelligence will have important implications for medical training and practice. The advances in ChatGPT-4, alongside the introduction of the UK Medical Licensing Assessment (MLA), provide an opportunity to compare GPT-4's medical competence against the expected level of a United Kingdom junior doctor and to discuss its potential in clinical practice. Using 191 freely available questions in MLA style, we assessed GPT-4's accuracy with and without multiple-choice options. We compared single-step and multi-step questions, which targeted different points in the clinical process, from diagnosis to management. A chi-squared test was used to assess statistical significance. GPT-4 scored 86.3% and 89.6% in papers one and two respectively. Without the multiple-choice options, its performance fell to 61.5% and 74.7% in papers one and two respectively. There was no significant difference between single-step and multi-step questions, but GPT-4 answered 'management' questions significantly worse than 'diagnosis' questions when no multiple-choice options were offered (p = 0.015). GPT-4's accuracy across categories and question structures suggests that large language models can competently process clinical scenarios but remain incapable of genuinely understanding them. Large language models incorporated into practice alongside a trained practitioner may balance risk and benefit while the necessary robust testing of these evolving tools is conducted.

Citation

Casals-Farre, O., Baskaran, R., Singh, A., Kaur, H., Ul Hoque, T., de Almeida, A., Coffey, M., & Hassoulas, A. (2025). Assessing ChatGPT 4.0’s Capabilities in the United Kingdom Medical Licensing Examination (UKMLA): A Robust Categorical Analysis. Scientific Reports, 15(1), Article 13031. https://doi.org/10.1038/s41598-025-97327-2

Journal Article Type: Article
Acceptance Date: Apr 3, 2025
Online Publication Date: Apr 15, 2025
Deposit Date: Apr 22, 2025
Journal: Scientific Reports
Electronic ISSN: 2045-2322
Publisher: Nature Publishing Group
Peer Reviewed: Yes
Volume: 15
Issue: 1
Article Number: 13031
DOI: https://doi.org/10.1038/s41598-025-97327-2
Keywords: Exam, ChatGPT, Medical student, Finals
Public URL: https://keele-repository.worktribe.com/output/1198175
Publisher URL: https://www.nature.com/articles/s41598-025-97327-2

