
Can AI Pass Epilepsy Board Exams?

Abstract number : 1.471
Submission category : 2. Translational Research / 2B. Devices, Technologies, Stem Cells
Year : 2023
Submission ID : 1273
Source : www.aesnet.org
Presentation date : 12/2/2023

Authors :
Presenting Author: Haroon Butt, MD – Beth Israel Deaconess Medical Center

Sara Habib, MD – Fellow, Epilepsy, Beth Israel Deaconess Medical Center; Daniel Goldenholz, MD, PhD, FAES – Faculty, Epilepsy, Beth Israel Deaconess Medical Center

Rationale: Artificial intelligence (AI) systems have passed formal examinations in multiple academic fields, including medicine (the USMLE) and law (the bar exam). We used the AES epilepsy board practice examinations to evaluate and compare the competency of three highly capable large language models (LLMs): GPT-4, Bard, and Claude. We hypothesized that these AIs would be able to pass the subspecialty examination despite not being formally trained for that purpose.

Methods: We used five tests from the epilepsy board practice question bank provided by the American Epilepsy Society (AES). Each test comprised 100 multiple-choice questions containing text, images, or both. Each test was administered to each LLM. We also used a random number generator to select questions from each test, and each LLM was asked to explain why its chosen answer was correct and the other options were incorrect.

Results: Scores on the five tests were as follows. GPT-4: 64%, 77%, 70%, 70%, and 76%; Bard: 54%, 68%, 65%, 61%, and 62%; Claude: 75%, 69%, 70%, 68%, and 68%. For certain questions, an LLM sometimes responded that it did not have enough information to answer; these responses were marked incorrect. When asked to explain the correct and incorrect options, an LLM that had picked the correct answer was able to justify it with accurate evidence; when it had picked an incorrect answer, its reasoning rested on inaccurate statements about guidelines (i.e., hallucinations).
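The per-model averages implied by the reported scores can be verified with a short script; the model names and per-test percentages below are taken directly from the results above:

```python
# Per-test scores (%) as reported for each model.
scores = {
    "GPT-4": [64, 77, 70, 70, 76],
    "Bard": [54, 68, 65, 61, 62],
    "Claude": [75, 69, 70, 68, 68],
}

# Mean score across the five practice tests for each model.
means = {model: sum(s) / len(s) for model, s in scores.items()}
for model, mean in means.items():
    print(f"{model}: mean {mean:.1f}%")
# GPT-4 averages 71.4%, Claude 70.0%, and Bard 62.0%.
```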

Conclusions: GPT-4 and Claude averaged at or above 70% across the five exams, with Bard scoring somewhat lower. This demonstrates that some LLMs are capable of passing the subspecialty epilepsy board exams without having been explicitly trained to do so. It is currently difficult to distinguish when an LLM is providing highly accurate information about epilepsy versus hallucinated knowledge. Further research is needed to determine the safety implications of this apparent expertise as patients and clinicians begin to use these tools in their everyday lives.

Funding: This research was funded in part by an NINDS K23 award.
