Authors:
Presenting Author: Asala Erekat, PhD – Icahn School of Medicine at Mount Sinai
Mark Dakov, BA – Icahn School of Medicine at Mount Sinai
Ilana Lefkovitz, MD – Cleveland Clinic
Megan Mackenzie, MD – Icahn School of Medicine at Mount Sinai
Stefano Malerba, MD – Icahn School of Medicine at Mount Sinai
Ho Wing "Andy" Chan, MD – Icahn School of Medicine at Mount Sinai
Alec Gleason, BA – Albert Einstein College of Medicine
Felix Richter, MD, PhD – Icahn School of Medicine at Mount Sinai
Madeline Fields, MD – Icahn School of Medicine at Mount Sinai
Nathalie Jette, MD, MSc – Cumming School of Medicine at the University of Calgary
Benjamin Kummer, MD – Icahn School of Medicine at Mount Sinai
Rationale:
In the absence of EEG, distinguishing epileptic tonic-clonic (TC) seizures from psychogenic non-epileptic events (PNEE) with motor manifestations is a diagnostic challenge, particularly in settings that lack EEG capability or neurological expertise. To accelerate the diagnosis of PNEE, we sought to develop a computer vision algorithm that distinguishes motor PNEE from epileptic TC seizures using only video data. As an intermediate step toward building this algorithm, we built a video data pipeline using DeepLabCut (DLC), an open-source software package that uses deep learning to estimate body pose in videos.

Methods:
We identified adult inpatients who underwent continuous video EEG monitoring at the Icahn School of Medicine at Mount Sinai, an urban tertiary hospital system, between January 2012 and June 2021. We manually reviewed all reports in our EEG database to identify patients who had a captured generalized TC seizure, focal-to-bilateral TC seizure, or motor PNEE while connected to EEG. We included multiple events per patient when they occurred. For each event, we extracted the event video and, if available, 2 length-matched “normal state” (asleep and awake) videos. Each event video contained a 2-minute “buffer” of footage preceding and following the event start and end. We used K-means clustering to select a representative set of 20 frames from each video, which 3 neurologists labeled for 26 body parts: nose, chin, and left/right eyes, shoulders, elbows, wrists, fingers, hips, knees, and ankles. We split the labeled frames into training (95%) and test (5%) sets and trained the DLC pose estimation model (a ResNet-50 convolutional neural network) to predict the labeled body part positions in each video. We measured model performance by computing L2 pixel error (L2PE), mean average precision (MAP), and mean average recall (MAR) on both the training and test sets.

Results:
We identified 135 patients, of whom 79 (58.5%) were female. We collected 200 videos, with a mean of 2.7 videos per patient and a mean video length of 5.2 minutes. Of all videos, 71 (35.5%) captured seizure-like events (44 generalized or focal-to-bilateral TC seizures and 27 motor PNEE). The model performed better on the training set (L2PE 23.0 pixels, MAP 87.7%, MAR 91.6%) than on the test set (L2PE 53.0 pixels, MAP 76.8%, MAR 81.6%).

Conclusions:
Our deep learning model performed moderately well in estimating pose from videos of seizure-like events. This pipeline can be used to develop classification algorithms with high clinical impact in settings without access to EEG or neurological expertise, and potentially for applications beyond seizure-like events.

Funding:
CTSA grant UL1TR004419 (Kummer).
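
The L2 pixel error reported in Methods can be illustrated with a short sketch. This is not the authors' code: the function name, array shapes, and synthetic data below are illustrative assumptions, chosen only to match the study's setup of 20 frames per video and 26 labeled body parts.

```python
import numpy as np

def l2_pixel_error(pred, truth):
    """Mean Euclidean (L2) distance, in pixels, between predicted and
    ground-truth keypoint positions.

    Both arrays have shape (n_frames, n_bodyparts, 2), holding (x, y)
    pixel coordinates. Hypothetical helper, not from the study pipeline.
    """
    return float(np.linalg.norm(np.asarray(pred) - np.asarray(truth), axis=-1).mean())

# Synthetic illustration: 20 frames x 26 body parts, as in the study's
# labeling scheme; coordinates and noise scale are invented for the demo.
rng = np.random.default_rng(0)
truth = rng.uniform(0, 640, size=(20, 26, 2))            # ground-truth labels
pred = truth + rng.normal(scale=5.0, size=truth.shape)   # noisy "predictions"
print(f"L2PE: {l2_pixel_error(pred, truth):.1f} px")
```

A lower L2PE means predicted keypoints lie closer to the neurologists' labels, which is why the training-set value (23.0 px) indicates a better fit than the test-set value (53.0 px).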