Abstracts

Natural Language Processing for standardized data extraction from epilepsy notes

Abstract number : 2.171
Submission category : 4. Clinical Epilepsy / 4B. Clinical Diagnosis
Year : 2017
Submission ID : 349168
Source : www.aesnet.org
Presentation date : 12/3/2017 3:07:12 PM
Published date : Nov 20, 2017, 11:02 AM

Authors :
Pouya Khankhanian, University of Pennsylvania; Nikitha Kosaraju, University of Pennsylvania; Colin A. Ellis, University of Pennsylvania; Jay Pathmanathan, University of Pennsylvania; John R. Pollard, Christiana Care; Brian Litt, University of Pennsylvania

Rationale: Advances in technology have significantly decreased the cost of genome sequencing over the years. However, the cost of ascertaining phenotypes continues to rise and is rapidly becoming the rate-limiting step for substantial research gains. In epilepsy, phenotyping is further complicated by the wide range of characteristics needed. Natural Language Processing (NLP) is the process by which a computer extracts information from human natural language. Applying NLP to electronic medical records can reduce both the cost and the time involved in manual chart abstraction.

Methods: An NLP algorithm was created to identify and extract a series of high-yield variables specific to epilepsy research from patients' electronic medical records. The algorithm is specific to these epilepsy-associated phenotypes and was developed in parallel with human data abstraction. To assess the validity of the algorithm, we compared the NLP-extracted phenotypes with human-extracted phenotypes (agreed upon by two or more reviewers) for 100 independent samples.

Results: Relatively high sensitivity and specificity (>90%) were found for many extracted phenotype variables, including EEG abnormality, identification of psychogenic non-epileptic spells (PNES) on EEG, current anti-epileptic drugs (AEDs), prior AEDs, and AED allergies. Two variables, epilepsy syndrome and lesional MRI, had lower sensitivity (85%) with preserved specificity (90%). Specificity was very low (20-30%), with reasonable sensitivity maintained (>80%), in assessing the clinicians’ indication for long-term EEG (e.g. differential diagnosis, classification, quantification, pre-surgical evaluation). In summary, NLP was least sensitive for variables that required assessment of the clinicians’ thought process (which may or may not be documented in the clinic notes), with reasonable sensitivity for most other variables. Specificity was relatively high for all variables.
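The validation step described above, scoring NLP output against human consensus labels, can be sketched for a single binary phenotype variable as follows. This is a minimal illustration and not the authors' code: the function name `sensitivity_specificity` and the toy label lists are hypothetical, and the human consensus label is assumed to be the reference standard.

```python
def sensitivity_specificity(nlp_labels, human_labels):
    """Score binary NLP labels against human consensus labels.

    Returns (sensitivity, specificity). The human labels are treated
    as the reference standard (an assumption of this sketch).
    """
    if len(nlp_labels) != len(human_labels):
        raise ValueError("label lists must have the same length")

    pairs = list(zip(nlp_labels, human_labels))
    tp = sum(1 for n, h in pairs if n and h)          # both say present
    tn = sum(1 for n, h in pairs if not n and not h)  # both say absent
    fp = sum(1 for n, h in pairs if n and not h)      # NLP over-calls
    fn = sum(1 for n, h in pairs if not n and h)      # NLP misses

    # Guard against variables with no positive or no negative charts.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Made-up labels for ten charts (True = phenotype present); not study data.
human = [True, True, True, True, False, False, False, False, False, False]
nlp = [True, True, True, False, False, False, False, False, True, False]

sens, spec = sensitivity_specificity(nlp, human)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Multi-valued variables such as the indication for long-term EEG would need one such comparison per category, which this sketch does not attempt.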
Economic analysis revealed that, for most variables, the total time cost of NLP drops below that of human reviewers once 300 or more charts are reviewed, with multi-fold improvement in performance when thousands of charts are reviewed.

Conclusions: NLP can be a useful tool in epilepsy phenotyping. We demonstrate a gradient of phenotype-related variables that can be assessed through NLP. Medications and structured information can be extracted with relative ease, whereas NLP algorithms are expected to have persistent difficulty extracting information that requires insight into a particular clinical thought process. As study sizes grow in genetic studies and precision medicine trials, NLP will be the most cost-effective phenotyping method.

Funding: The Thornton Foundation
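The break-even finding in the economic analysis can be illustrated with a simple linear time model. The abstract does not describe its cost model, so this is only a sketch under assumptions: NLP is taken to have a fixed development time plus a small per-chart time, and human review a constant per-chart time. The function name and all input values are hypothetical and are not the study's figures.

```python
import math


def break_even_charts(nlp_fixed_minutes, nlp_minutes_per_chart,
                      human_minutes_per_chart):
    """Smallest chart count at which total NLP time is no more than human time.

    Assumed model (not from the abstract):
        NLP total   = nlp_fixed_minutes + n * nlp_minutes_per_chart
        human total = n * human_minutes_per_chart
    Returns None if NLP never catches up under this model.
    """
    saving_per_chart = human_minutes_per_chart - nlp_minutes_per_chart
    if saving_per_chart <= 0:
        return None
    return math.ceil(nlp_fixed_minutes / saving_per_chart)


# Hypothetical inputs: 1200 minutes of algorithm development,
# 3 minutes per chart for NLP, 15 minutes per chart for a human reviewer.
n_star = break_even_charts(1200, 3, 15)
print(n_star)  # 100 charts under these made-up inputs
```

Under this model the fixed development time is amortized over more charts as the study grows, which is consistent with the advantage the abstract reports for thousands of charts.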