
Guideline Implementation for Improving TMS Language Mapping Inter-rater Reliability: A Consortium Study

Abstract number : 1.085
Submission category : 11. Behavior/Neuropsychology/Language / 11B. Pediatrics
Year : 2025
Submission ID : 659
Source : www.aesnet.org
Presentation date : 12/6/2025

Authors :
Presenting Author: Taylor Jones, BS – University of Tennessee Health Science Center, Le Bonheur Children's Hospital

Fiona Baumer, MD – Stanford
Clifford Calley, M.D. – Dell Medical School
Hansel Greiner, MD – University of Cincinnati College of Medicine; Cincinnati Children's Hospital Medical Center
Anuj Jayakar, MD – Nicklaus Children's Hospital, Miami, FL, United States
Marianne Kanaris, BS – University of California, San Francisco, San Francisco, CA
Miriam Matthew, BS – University of California, San Francisco, San Francisco, CA
Brian Lundstrom, MD, PhD – Mayo Clinic, Rochester, Minnesota
Negar Noorizadeh, Ph.D. – University of Tennessee Health Science Center and Le Bonheur Neuroscience Institute, Le Bonheur Children's Hospital, Memphis, TN, USA
Mauricio Rodriguez, BS – Pediatric Neurology, Stanford University and Pediatric Neurology, Lucile Packard Children’s Hospital, Palo Alto, CA
Alexander Rotenberg, MD, PhD – Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
Keith Starnes, MD – Mayo Clinic, Rochester, MN, USA
Phiroz Tarapore, MD – Neurological Surgery, University of California, San Francisco, San Francisco, CA
Melissa Tsuboyama, MD – Boston Children's Hospital
Jackie Varner, BS, MS – University of Tennessee Health Science Center, Memphis, TN
Michael Cannito, Ph.D. – Department of Communicative Disorders, University of Louisiana at Lafayette, Lafayette, LA
Shalini Narayana, PhD – University of Tennessee Health Science Center and Le Bonheur Neuroscience Institute, Le Bonheur Children's Hospital, Memphis, TN, USA

Rationale:

As transcranial magnetic stimulation (TMS) continues to demonstrate its utility for localizing language in pediatric neurosurgical patients, consistent interpretation of TMS studies becomes increasingly important. Standardizing the interpretation of TMS language mapping studies enhances the generalizability of results beyond a single epilepsy center and aligns with the primary goal of the nationwide consortium of pediatric TMS providers: to further the application of TMS language mapping. To date, however, no studies have comprehensively addressed the lack of standardized interpretation in this field.



Methods:

In response, our consortium conducted an initial inter-rater reliability (IRR) assessment of 306 video clips of TMS speech responses to evaluate existing interpretive frameworks and identify areas of low agreement among raters. The clips reflected our general clinical population and the range of error types commonly observed in practice (i.e., performance, semantic, no response, muscle artifact). Seven raters independently classified each clip into one of four categories: no error, speech arrest, performance error, or semantic error. To improve IRR, we synthesized perspectives from clinical experience and speech pathology to establish clear conditions for each speech error category. On this basis, we developed a standardized classification flowchart, along with relevant contextual factors, to guide the analysis of speech responses in TMS language mapping. An interim training session using the identified low-agreement clips was conducted to further validate these guidelines. A second IRR assessment was then conducted with 258 new clips, drawn from a patient and error-type distribution similar to that of the first assessment, and 50 clips repeated from the initial assessment. Fleiss' kappa (k) was used to evaluate whether guideline implementation improved IRR. Intra-rater reliability was also assessed.
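For readers unfamiliar with the statistic, Fleiss' kappa compares the observed agreement among a fixed number of raters with the agreement expected by chance: k = (P_obs - P_exp) / (1 - P_exp). The following is a minimal sketch of that calculation, not the consortium's analysis code, which the abstract does not describe. The function name `fleiss_kappa`, the `CATEGORIES` constant, and the one-list-of-labels-per-clip data layout are assumptions made for illustration; only the four category labels come from the text above.

```python
from collections import Counter

# Category labels taken from the Methods text; the constant name is hypothetical.
CATEGORIES = ("no error", "speech arrest", "performance error", "semantic error")


def fleiss_kappa(ratings, categories=CATEGORIES):
    """Fleiss' kappa for clips that were each labeled by the same number of raters.

    `ratings` holds one entry per clip; each entry is the list of labels given
    to that clip, one label per rater (an assumed layout, for illustration).
    """
    if not ratings:
        raise ValueError("need at least one clip")
    n_raters = len(ratings[0])
    if n_raters < 2:
        raise ValueError("need at least two raters per clip")

    # Build the clip-by-category count table.
    counts = []
    for clip in ratings:
        tally = Counter(clip)
        row = [tally[c] for c in categories]
        if len(clip) != n_raters or sum(row) != n_raters:
            raise ValueError("each clip needs one known label per rater")
        counts.append(row)
    n_clips = len(counts)

    # Observed agreement: share of agreeing rater pairs, averaged over clips.
    per_clip = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    observed = sum(per_clip) / n_clips

    # Chance agreement: sum of squared overall category proportions.
    shares = [
        sum(row[j] for row in counts) / (n_clips * n_raters)
        for j in range(len(categories))
    ]
    expected = sum(s * s for s in shares)
    if expected == 1.0:
        raise ValueError("kappa is undefined when every rating falls in one category")

    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Toy data with three raters (not study data).
    toy = [
        ["no error", "no error", "no error"],
        ["speech arrest", "speech arrest", "performance error"],
        ["semantic error", "semantic error", "semantic error"],
    ]
    print(round(fleiss_kappa(toy), 3))
```

In this layout, the study design would correspond to seven labels per clip; computing the statistic separately for each assessment round would give the before and after values being compared.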



Results:

Overall, inter-rater reliability improved significantly, from k = 0.56 to k = 0.73 (Table 1). Agreement also improved across all speech error categories. Furthermore, the proportion of clips with 100% agreement increased from 49% in the first round to 65% in the second (Figure 1). For the intra-rater reliability subset, raters consistently classified 26 of the 50 clips with 100% agreement. The ratings at the two time points correlated strongly, with no significant difference, indicating excellent intra-rater reliability.
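The proportion of clips with 100% agreement is a simpler summary than kappa: it counts the clips on which every rater chose the same category. A minimal sketch follows, assuming one list of labels per clip; the function name `unanimous_fraction` and the toy data are hypothetical and are not drawn from the study.

```python
def unanimous_fraction(ratings):
    """Fraction of clips on which all raters assigned the same label.

    `ratings` holds one entry per clip; each entry is the list of labels
    given to that clip (an assumed layout, for illustration).
    """
    if not ratings:
        raise ValueError("need at least one clip")
    # A clip is unanimous when its labels collapse to a single distinct value.
    unanimous = sum(1 for clip in ratings if len(set(clip)) == 1)
    return unanimous / len(ratings)


if __name__ == "__main__":
    # Toy data with three raters (not study data).
    toy = [
        ["no error", "no error", "no error"],
        ["speech arrest", "speech arrest", "performance error"],
    ]
    print(unanimous_fraction(toy))
```

Unlike kappa, this figure is not corrected for chance agreement, which is one reason the two measures are usually reported side by side.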



Conclusions:

Our findings demonstrate that well-defined classification criteria and targeted training can align the interpretive frameworks applied to TMS language mapping data, leading to improved IRR. These standardized resources and training should be incorporated into the onboarding process for all TMS practitioners, technicians, and support staff to ensure the validity, comparability, and generalizability of TMS language mapping data across diverse patient populations and over time.



Funding: Pediatric Epilepsy Research Foundation
