An AI system performed better than junior doctors at diagnosing patients with eye problems, according to a study published in the journal PLOS Digital Health. The authors, from the University of Cambridge, UK, suggest that the clinical knowledge of GPT-4 approaches that of specialist eye doctors.
GPT-4 was tested against doctors at different stages in their careers, ranging from unspecialised junior doctors to expert eye doctors. GPT-4 – a Generative Pre-trained Transformer – was trained on datasets containing hundreds of billions of words from articles, books, and other internet sources. It is the same system that powers ChatGPT, which can complete medical exams and has been reported to provide more accurate responses than human doctors to patient queries.
For the study, each doctor was presented with 87 different scenarios and asked to diagnose and treat patients with specific eye problems, including extreme light sensitivity, decreased vision, lesions, and itchy and painful eyes. GPT-4 was asked to diagnose and treat the same patients.
GPT-4 scored better than junior doctors, about the same as trainee eye doctors, and marginally lower than eye specialists. The authors are clear that AI models will not replace healthcare professionals, but they have the potential to improve clinical workflows and the care given to patients. Such a system could be useful for providing advice, diagnoses, and treatment suggestions in specific situations, such as triaging patients or settings where access to specialist healthcare is limited.
“We could realistically deploy AI in triaging patients with eye issues to decide which cases are emergencies that need to be seen by a specialist immediately, which can be seen by a GP, and which don’t need treatment,” said Dr Arun Thirunavukarasu, lead author of the study. “The models could follow clear algorithms already in use, and we’ve found that GPT-4 is as good as expert clinicians at processing eye symptoms and signs to answer more complicated questions.”
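As an illustration only, and not the study's method, the sketch below shows how the triage routing Thirunavukarasu describes might be wired up around a commercially available model. The category labels, prompt wording, and `triage_eye_complaint` helper are hypothetical; the call shown is the standard OpenAI Python SDK chat-completions interface.

```python
# Hypothetical sketch: asking a chat model to sort a free-text eye complaint
# into the three triage routes mentioned in the quote. Not the study's method.
from openai import OpenAI

CATEGORIES = ["EMERGENCY_SPECIALIST", "GP_APPOINTMENT", "NO_TREATMENT_NEEDED"]

def triage_eye_complaint(description: str) -> str:
    """Ask the model to pick exactly one triage category (assumed prompt design)."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You triage ophthalmic complaints. Reply with exactly one of: "
                        + ", ".join(CATEGORIES)},
            {"role": "user", "content": description},
        ],
        temperature=0,  # keep the classification output as stable as possible
    )
    answer = response.choices[0].message.content.strip()
    # If the model replies off-format, default to the most cautious route.
    return answer if answer in CATEGORIES else "EMERGENCY_SPECIALIST"

print(triage_eye_complaint("Sudden painless loss of vision in the left eye."))
```

In any real deployment, output like this would feed into, rather than replace, the clinical algorithms already in use, with off-format replies defaulting to the most cautious route, as the fallback above does.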
“Doctors aren’t revising for exams for their whole career. We wanted to see how AI fared when pitted against the on-the-spot knowledge and abilities of practising doctors, to provide a fair comparison,” said Thirunavukarasu. “We also need to characterise the capabilities and limitations of commercially available models, as patients may already be using them – rather than the internet – for advice.”
Thirunavukarasu AJ, Mahmood S, Malem A, Foster WP, Sanghera R, Hassan R, et al. (2024) Large language models approach expert-level clinical knowledge and reasoning in ophthalmology: A head-to-head cross-sectional study. PLOS Digit Health 3(4): e0000341. https://doi.org/10.1371/journal.pdig.0000341