
Quality Assurance Methods for Handling Multimodal EEG and Wearable Sensor Data

Abstract number : 2.135
Submission category : 3. Neurophysiology / 3C. Other Clinical EEG
Year : 2024
Submission ID : 777
Source : www.aesnet.org
Presentation date : 12/8/2024

Authors :
Presenting Author: Doroteja Dragovic, MS – Boston Children's Hospital, 300 Longwood Ave, Boston, MA 02115, USA

Navaneethakrishna Makaram, PhD, MS – Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
Edeline Jean Baptiste, BS – Boston Children's Hospital, 300 Longwood Ave, Boston, MA 02115, USA
Michele Jackson, BA – Boston Children's Hospital
Tanuj Hasija, PhD, MSc – Paderborn University
Stephanie Dailey, BA – Boston Children's Hospital, 300 Longwood Ave, Boston, MA 02115, USA
Saeid Sadeghian, MD – Boston Children's Hospital, 300 Longwood Ave, Boston, MA 02115, USA
Solveig Vieluf, PhD – LMU University Hospital, LMU Munich
Eleonora Tamilia, PhD – Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
William Bosl, PhD – Computational Health Informatics Program, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA, Clinical Neuroinformatics & AI Laboratory, The Data Institute, University of San Francisco, San Francisco, CA, USA
Tobias Loddenkemper, MD – Boston Children's Hospital, 300 Longwood Ave, Boston, MA 02115, USA

Rationale: A comprehensive and accurate long-term continuous electroencephalogram (EEG) and wearable data set is essential to advance machine learning models for seizure detection and prediction. We present an automated, reproducible data cleaning system for EEG files to enable machine learning analyses.


Methods: We designed an automated data cleaning system to create an accurate and complete set of continuous EEG data collected from 2015 to 2021 from patients who were prospectively enrolled in the Detect, Predict, and Prevent Epilepsy Cohort at Boston Children's Hospital and who wore wearable sensors (Empatica, Milan, Italy) during long-term video-EEG monitoring (LTM). Continuous EEG recording files were extracted from Natus Neuroworks EEG systems in the Natus Datashare Format (EEGF) and the European Data Format (EDF) and saved to a file storage server. EEG recording times from LTM reports, extracted from the electronic medical record (EMR), and wearable placement times, extracted from the Empatica Cloud, were entered into a REDCap database (Nashville, TN). The system comprised 5 processing steps: 1) determine the number of EEGF/EDF files stored on the server and distinguish continuous EEGF/EDF files from shorter file excerpts; 2) identify LTM recording periods without corresponding EEGF/EDF data; 3) ensure that each enrollment, i.e., each period of wearable sensor use, has time-matched EEGF and EDF files; 4) verify that all seizures recorded during enrollment were captured within the EEGF and EDF files on the server; and 5) ensure accurate and consistent EEGF and EDF file labeling and server folder structure. The system included 6 data cleaning tools, implemented in MATLAB and Python, to ensure accurate file storage and to compile an inventory of the EEGF and EDF data on the server (Figure 1a).
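
The EEGF/EDF pairing check in step 3 could be sketched in Python as follows. This is not the authors' implementation: it assumes, hypothetically, that an EEGF file and its EDF export share the same base name and carry the suffixes `.eegf` and `.edf`; the abstract specifies neither the naming convention nor the file extensions, and `find_unpaired` is an illustrative name.

```python
from pathlib import PurePath

# Hypothetical suffixes: the abstract does not give the on-disk extensions.
EEGF_SUFFIX = ".eegf"
EDF_SUFFIX = ".edf"


def find_unpaired(filenames):
    """Return (eegf_only, edf_only): sorted stems of files lacking a partner.

    Assumes an EEGF recording and its EDF export share the same base name,
    compared case-insensitively; files with any other suffix are ignored.
    """
    eegf_stems, edf_stems = set(), set()
    for name in filenames:
        path = PurePath(name)
        suffix, stem = path.suffix.lower(), path.stem.lower()
        if suffix == EEGF_SUFFIX:
            eegf_stems.add(stem)
        elif suffix == EDF_SUFFIX:
            edf_stems.add(stem)
    # Set differences give recordings present in one format but not the other.
    return sorted(eegf_stems - edf_stems), sorted(edf_stems - eegf_stems)
```

For instance, `find_unpaired(["a.EEGF", "a.edf", "b.eegf", "c.edf"])` returns `(['b'], ['c'])`; the lengths of the two lists correspond to counts of EEGF files without an EDF partner and EDF files without an EEGF partner.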

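Steps 2 and 4 reduce to interval checks on timestamps. A minimal Python sketch follows, assuming that each LTM period and each EEG file is represented as a hypothetical (start, end) pair of `datetime` values and each seizure by its onset time; the helper names `merge_intervals`, `uncovered_gaps`, and `missed_seizures` are illustrative only and do not describe the authors' tools.

```python
from datetime import datetime


def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals into a sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def uncovered_gaps(period, file_intervals):
    """Step 2: sub-intervals of an LTM period (start, end) with no EEG file."""
    p_start, p_end = period
    gaps, cursor = [], p_start
    for f_start, f_end in merge_intervals(file_intervals):
        if f_end <= cursor or f_start >= p_end:
            continue  # file lies entirely outside the not-yet-checked window
        if f_start > cursor:
            gaps.append((cursor, f_start))
        cursor = max(cursor, f_end)
        if cursor >= p_end:
            break
    if cursor < p_end:
        gaps.append((cursor, p_end))
    return gaps


def missed_seizures(seizure_onsets, file_intervals):
    """Step 4: seizure onset times not contained in any EEG file interval."""
    return [
        onset
        for onset in seizure_onsets
        if not any(start <= onset <= end for start, end in file_intervals)
    ]


if __name__ == "__main__":
    def at(hour, minute=0):
        return datetime(2016, 3, 1, hour, minute)

    files = [(at(7), at(10)), (at(12), at(15)), (at(14), at(18))]
    print(uncovered_gaps((at(8), at(20)), files))
    print(missed_seizures([at(9), at(11), at(19, 30)], files))
```

In the demo, an LTM period from 08:00 to 20:00 with files spanning 07:00 to 10:00, 12:00 to 15:00, and 14:00 to 18:00 yields the gaps 10:00 to 12:00 and 18:00 to 20:00, and the seizures at 11:00 and 19:30 are flagged as not captured.
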
Results: The data cleaning system processed 772 files (532 EEGF and 240 EDF files) from 185 patients with 1239 captured seizures and generated a data collection and storage error report listing 313 errors across 6 main error types, delineated per enrollment (Figure 1b). The error report guided the development of a solution pipeline to resolve data collection inconsistencies. The system identified 93 EEGF files without a corresponding EDF file and 41 EDF files without a corresponding EEGF file. It also identified 88 EEGF or EDF files that did not match enrollment periods, which led to the discovery of 5 manual data entry errors in REDCap. Duration analysis identified 4 errors in REDCap entries with imputed wearable times. Guided by the automated analysis, focused data recovery efforts yielded EDF data covering more than 90% of the enrollment period for 74% (137/185) of patients (148 enrollments; 158 files; 13,980 hours of recording).
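
The coverage criterion (EDF data for more than 90% of the enrollment period) can be sketched as follows. This sketch assumes times expressed as plain numbers such as hours, and it assumes, hypothetically, that a patient counts toward the total if any of their enrollments exceeds the threshold; the abstract does not state how patients with multiple enrollments were counted, and both function names are illustrative.

```python
COVERAGE_THRESHOLD = 0.90  # "more than 90% of the enrollment period"


def coverage_fraction(enrollment, file_intervals):
    """Fraction of an enrollment window (start, end) covered by EDF files.

    Each file (start, end) is clipped to the window, then overlapping files
    are merged so shared time is counted only once. Assumes end > start.
    """
    e_start, e_end = enrollment
    clipped = sorted(
        (max(f_start, e_start), min(f_end, e_end))
        for f_start, f_end in file_intervals
        if f_end > e_start and f_start < e_end
    )
    covered = 0.0
    run_start = run_end = None  # current run of merged, overlapping files
    for s, e in clipped:
        if run_end is None or s > run_end:
            if run_end is not None:
                covered += run_end - run_start
            run_start, run_end = s, e
        else:
            run_end = max(run_end, e)
    if run_end is not None:
        covered += run_end - run_start
    return covered / (e_end - e_start)


def patients_meeting_threshold(enrollments):
    """Return (qualifying_patients, total_patients).

    `enrollments` holds (patient_id, (start, end), file_intervals) tuples.
    Hypothetical rule: a patient qualifies if any enrollment exceeds the threshold.
    """
    enrollments = list(enrollments)
    total = {pid for pid, _, _ in enrollments}
    qualifying = {
        pid
        for pid, window, files in enrollments
        if coverage_fraction(window, files) > COVERAGE_THRESHOLD
    }
    return len(qualifying), len(total)
```

With the reported counts, `round(100 * 137 / 185)` evaluates to 74, consistent with the stated 74%.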


Conclusions: The data cleaning system enabled us to analyze and ensure the availability of previously collected long-term EEG data. This process produced a robust and accurate dataset of continuous EEG data that can be used in future machine learning analyses aimed at improving models for seizure monitoring, detection, and prediction.


Funding: The Epilepsy Research Fund supported this study.

