
Impact of Inter-rater Variability When Determining Presence of Electrographic Status Epilepticus in the Performance of an Automated Seizure-burden Monitoring Algorithm

Abstract number : 2.118
Submission category : 3. Neurophysiology / 3B. ICU EEG
Year : 2024
Submission ID : 1087
Source : www.aesnet.org
Presentation date : 12/8/2024

Authors :
Presenting Author: Parshaw Dorriz, MD – Providence Mission Hospital, Mission Viejo

Tessa Johung, MD, PhD – Ceribell, Inc.
David King-Stephens, MD – UC Irvine
Matthew Markert, MD, PhD – Palo Alto Medical Foundation
Bogdan Patedakis Litvinov, MD – Yale School of Medicine
Farid Sadaka, MD – Mercy Hospital St. Louis
Khalid Alsherbini, MD – University of Arizona, Banner Health

Rationale: Timely and accurate interpretation of electroencephalography (EEG) is necessary to detect and diagnose nonconvulsive seizures. Automated seizure burden (SzB) algorithms, such as Clarity (Ceribell, Inc.), aim to help clinicians identify seizures and make timely treatment decisions. However, their performance depends on training and validation against a “ground truth” of EEG labeling by epileptologists. We set out to assess the performance of the latest version of Clarity for detecting electrographic status epilepticus (ESE) and to characterize how variability across expert readers may impact assessments of that performance.

Methods: We sent 222 sequential clinical EEGs, collected from 06/01/2023 to 07/31/2023, from an anonymized dataset to 5 independent reviewers. The blinded reviewers were fellowship-trained epileptologists or clinical neurophysiologists who regularly read point-of-care limited-montage EEG. They identified whether each case met ACNS criteria for ESE or whether they would treat it as possible ESE. To assess inter-rater reliability, we estimated the average pairwise agreement between reviewers and the majority determination. Further, to assess the impact of this variability on the number of ESE cases, we iterated over all 3-reviewer subsets, using a 2/3 majority as ground truth. The latest versions of Clarity (which reports SzB as the percent of 10-second segments likely containing seizure patterns within a 5-minute rolling window) and Clarity Pro (which additionally indicates when ACNS criteria for ESE are met) were run post hoc.
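For illustration, a minimal sketch of the reported SzB definition: the percent of flagged 10-second segments within a 5-minute (30-segment) rolling window. The per-segment seizure call is the proprietary part of the algorithm and is mocked here; the function name and data are hypothetical.

```python
import numpy as np

def seizure_burden(segment_flags, window_segments=30):
    """Rolling seizure burden: percent of 10-s segments flagged as likely
    seizure within a 5-minute window (30 segments x 10 s).
    segment_flags is a 1-D array of 0/1 per-segment calls (mocked input)."""
    flags = np.asarray(segment_flags, dtype=float)
    burden = np.full(flags.shape, np.nan)       # undefined until window fills
    for i in range(window_segments - 1, len(flags)):
        burden[i] = 100.0 * flags[i - window_segments + 1 : i + 1].mean()
    return burden

# Example: 40 minutes of EEG (240 segments) with a 6-minute flagged run.
flags = np.zeros(240, dtype=int)
flags[60:96] = 1                                # segments 60-95 flagged
szb = seizure_burden(flags)
print(np.nanmax(szb))                           # 100.0 -> exceeds the 90% SzB threshold
```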

Results: Of the 222 EEGs, 3 cases (1.4%) were classified as ACNS-ESE by 3 or more of the 5 reviewers. Using a 90% SzB threshold and the ACNS criteria, the algorithms detected all 3 ESE cases (sensitivity = 100% [95% CI 29.2-100%]), with a specificity of 96.3% (92.8-98.4%). The positive and negative predictive values (PPV, NPV) were 27.3% and 100%, respectively. The average agreement between each reviewer and the majority determination was κ = 0.35 (SD 0.32).
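The reported figures are mutually consistent with a single confusion matrix: 3 majority-positive cases with 100% sensitivity gives TP = 3 and FN = 0, and 96.3% specificity over the 219 negatives gives TN = 211 and FP = 8, reproducing PPV = 3/11 ≈ 27.3%. A sketch verifying these values, with exact (Clopper-Pearson) intervals matching the bracketed CIs; the matrix itself is inferred from the abstract, not published:

```python
from scipy.stats import beta

def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) binomial CI for k successes out of n."""
    lo = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lo, hi

# Confusion matrix implied by the reported results (inferred):
TP, FN, FP, TN = 3, 0, 8, 211

print(TP / (TP + FN))                   # 1.000 -> sensitivity 100%
print(TN / (TN + FP))                   # 0.963 -> specificity 96.3%
print(TP / (TP + FP))                   # 0.273 -> PPV 27.3%
print(TN / (TN + FN))                   # 1.000 -> NPV 100%
print(clopper_pearson(TP, TP + FN))     # (0.292, 1.0)   matches [29.2-100%]
print(clopper_pearson(TN, TN + FP))     # (0.928, 0.984) matches (92.8-98.4%)
```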

When using a 2/3-reviewer ground truth, the number of ACNS-ESE cases ranged from 1 to 8 (median = 3.5 [IQR 3.0-5.75]) depending on the 3 reviewers chosen. The average sensitivity of the 2 reviewers excluded from the ground truth ranged from 12.5-100%, with specificity ranging from 91.6-99.8%. In contrast, the average sensitivity of the 3 reviewers included in the ground truth ranged from 66.7-76.2%, with specificity of 95.6-99.2%. The algorithm’s sensitivity for suspected ESE ranged from 50-100%, and its specificity from 95.4-97.2%. In the majority of iterations (7/10), the algorithm’s sensitivity was 100%, and for the cases it did not capture, the median SzB was 17% [3-33].
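A sketch of the leave-two-out analysis described in Methods, iterating over all C(5,3) = 10 reviewer triplets with a 2/3-majority ground truth. The ratings matrix and algorithm calls below are random placeholders, as the study data are not public; the helper name is hypothetical.

```python
from itertools import combinations
import numpy as np

def sens_spec(truth, preds):
    """Sensitivity and specificity of binary predictions vs a ground truth."""
    truth, preds = np.asarray(truth, bool), np.asarray(preds, bool)
    sens = (preds & truth).sum() / max(truth.sum(), 1)
    spec = (~preds & ~truth).sum() / max((~truth).sum(), 1)
    return sens, spec

rng = np.random.default_rng(0)
ratings = rng.random((222, 5)) < 0.03          # mock per-reviewer ESE calls
algo = rng.random(222) < 0.05                  # mock algorithm ESE calls

for trio in combinations(range(5), 3):         # C(5,3) = 10 iterations
    truth = ratings[:, trio].sum(axis=1) >= 2  # 2/3-majority ground truth
    for r in set(range(5)) - set(trio):        # the 2 held-out reviewers
        print(trio, "reviewer", r, sens_spec(truth, ratings[:, r]))
    print(trio, "algorithm", sens_spec(truth, algo))
```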

Conclusions: We demonstrated that inter-rater reliability remains a major factor affecting the ground truth when assessing EEG recordings. Because this ground truth is typically used to assess the performance of AI algorithms such as Clarity, reader variability can significantly impact the measured sensitivity and specificity, as shown here. The latest versions of the Clarity and Clarity Pro algorithms also showed high performance in detecting ACNS-ESE when assessed against the majority of five independent reviewers.

Funding: Funded by Ceribell, Inc.
