Can Large Language Models Derive Drug Main Effects and Side Effects with Inductive Reasoning from Chart Review?
Abstract number: 2.397
Submission category: 7. Anti-seizure Medications / 7D. Drug Side Effects
Year: 2024
Submission ID: 117
Source: www.aesnet.org
Presentation date: 12/8/2024
Authors :
Presenting Author: Daniel Goldenholz, MD, PhD – BIDMC
Sara Habib, MD – Harvard BIDMC
M Brandon Westover, MD, PhD – Harvard BIDMC
Rationale: Extracting knowledge from electronic medical records remains challenging because clinical notes are non-standardized and unstructured. Modern large language models (LLMs), a form of artificial intelligence (AI), have shown potential for generating and summarizing information, but their use in inductive reasoning, i.e., deriving general clinical insights from individual patient notes, is not well established. This study investigates whether LLMs can accurately synthesize clinical facts from a simulated, noisy chart review, thereby demonstrating their potential for inductive reasoning in medical contexts.
Methods:
A simulated randomized trial of the anti-seizure medication cenobamate was created, with 120 patients given placebo and 120 patients given full-strength drug. The CHOCOLATES seizure diary simulator was used to generate realistic seizure counts. Clinical notes were generated using LLMs with varying writing styles to reflect real-world variability. For each patient, two notes were generated: one for a clinical visit after the 2-month baseline period and a second for a visit after the 3-month treatment maintenance period. The notes included random extraneous details to simulate the noise present in actual clinical documentation. An AI pipeline consisting of multiple LLMs was used to summarize and synthesize the data from these notes. The efficacy and safety of cenobamate were then evaluated in two ways: with manually constructed data tables and, separately, with AI-derived data tables, each analyzed using standard statistical methods. The "true" table (i.e., the facts originally used to generate each patient's notes) was also analyzed the same way.
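For illustration only (the study's actual simulation and analysis code is not included in this abstract), the following minimal Python sketch shows how the two efficacy endpoints reported in the Results, the median percentage change and the 50%-responder rate, could be computed from per-patient baseline and maintenance seizure counts. The simulate_arm function is a hypothetical stand-in for the CHOCOLATES simulator, and the arm sizes match the abstract while the effect sizes are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_arm(n, baseline_rate=10.0, effect=0.0):
    """Hypothetical stand-in for the CHOCOLATES diary simulator:
    monthly seizure counts drawn from a Poisson process with
    gamma-distributed per-patient rates (overdispersed counts)."""
    rates = rng.gamma(shape=2.0, scale=baseline_rate / 2.0, size=n)
    # Average monthly counts over a 2-month baseline and 3-month maintenance.
    baseline = rng.poisson(rates[:, None], size=(n, 2)).sum(axis=1) / 2.0
    maint = rng.poisson(rates[:, None] * (1.0 - effect), size=(n, 3)).sum(axis=1) / 3.0
    return baseline, maint

def endpoints(baseline, maint):
    """Standard trial endpoints: median percentage change in monthly
    seizure rate and the 50%-responder rate."""
    pct_change = 100.0 * (maint - baseline) / np.maximum(baseline, 1e-9)
    responder_rate = np.mean(pct_change <= -50.0)
    return np.median(pct_change), responder_rate

pb, pm = simulate_arm(120, effect=0.20)  # placebo arm: assumed 20% placebo response
db, dm = simulate_arm(120, effect=0.50)  # drug arm: assumed 50% seizure reduction

for name, (b, m) in [("placebo", (pb, pm)), ("cenobamate", (db, dm))]:
    med, rr = endpoints(b, m)
    print(f"{name}: median % change = {med:.1f}%, 50%-responder rate = {rr:.1%}")
```

In the study's actual pipeline, the baseline and maintenance counts fed into such endpoints would come from the manually abstracted tables or the AI-derived tables rather than directly from the simulator.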
Results:
For efficacy, the AI and human analyses differed by 1% in both the 50%-responder rate and the median percentage change. For safety, the 14 side-effect rates from the AI and human analyses differed by at most 5%, and usually by 0-1%. Of note, this list of 14 symptoms was generated independently by the human and by the AI; it was not provided to either. Where the human results deviated from the "true" results, this reflected "noise" introduced by the generating LLM (a desirable feature of the simulation).
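As a sketch of the safety comparison described above, the two independently constructed side-effect tables can be compared as per-symptom absolute rate differences; the symptom names and rates below are invented placeholders, not the study's 14-symptom data:

```python
# Hypothetical illustration: human chart-review rates vs. AI-pipeline rates
# for each independently identified symptom (placeholder values only).
human_rates = {"dizziness": 0.22, "somnolence": 0.18, "fatigue": 0.12}
ai_rates = {"dizziness": 0.21, "somnolence": 0.19, "fatigue": 0.17}

for symptom in sorted(set(human_rates) | set(ai_rates)):
    h = human_rates.get(symptom, 0.0)
    a = ai_rates.get(symptom, 0.0)
    print(f"{symptom:12s} human={h:.0%} ai={a:.0%} |diff|={abs(h - a):.0%}")
```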
Conclusions:
This study highlights the potential of AI, specifically LLMs, to derive treatment effects and symptoms from unstructured noisy clinical notes. The results support the feasibility of using AI for inductive reasoning in medical chart reviews, with the potential to transform clinical practice by automating knowledge extraction and revealing unknown side effects through large-scale real-world deployment.
Funding: NIH K23NS124656