Measuring Speech Fluency using a Validated Automated Transcription and Diarization Approach
Abstract number :
2.01
Submission category :
11. Behavior/Neuropsychology/Language / 11A. Adult
Year :
2025
Submission ID :
78
Source :
www.aesnet.org
Presentation date :
12/7/2025 12:00:00 AM
Published date :
Authors :
Presenting Author: Ayelet Rosenberg, MSc – New York University Langone Health
Eden Tefera, BS – New York University Langone Health
Zehui Gui, MS – New York University
Helen Borges, MSc – New York University Langone Health
Aaqib Mansoor, BA – New York University Langone Health
William Barr, PhD – New York University Langone Health
Simon Henin, Ph.D. – New York University Langone Health
Stephen Johnson, PhD – New York University Langone Health
Anli Liu, M.D. – New York University Langone Health
Rationale: Word-finding difficulty is often reported by patients with epilepsy but is challenging to measure using standard clinical tools. We applied an automated transcription and speaker diarization pipeline to extract word rate and word choice (frequency) from the spontaneous recall of famous faces (FF) in patients with temporal lobe epilepsy (TLE) and healthy controls (HC).
Methods: Subjects were eligible if they were fluent in English and had MOCA scores >22 (TLE) or >26 (HCs). Unilateral TLE diagnosis was determined by ictal or interictal EEG, MRI, and semiology. Subjects were shown 20 faces of famous figures in politics, sports, and entertainment who were active between 2008 and 2017 and were asked to recall as many biographical details as possible. We used OpenAI’s Whisper to transcribe the subjects’ speech and pyannote to assign speaker labels. After manually correcting diarization errors, we used natural language processing (NLP) and the acoustic sound envelope to calculate rate-based features, including word rate and utterance and pause duration, as well as word frequency. We defined low-frequency words as those occurring fewer than once per 1 million words in a corpus of 1 billion words, and high-frequency words as those occurring more than once per 10,000 words. We then calculated the lexical index as the ratio of low-frequency to high-frequency words used during recall, with higher values indicating the use of rarer words.
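The lexical index defined above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `lexical_index`, the simple regular-expression tokenizer, the `freq_per_million` lookup table, and the handling of out-of-table words and zero denominators are all assumptions made for illustration. Only the two thresholds (fewer than 1 per million for low-frequency words, more than 1 per 10,000, i.e. 100 per million, for high-frequency words) come from the text.

```python
import re
from typing import Mapping

# Thresholds taken from the Methods text, expressed per million words.
LOW_PER_MILLION = 1.0     # low frequency: fewer than 1 occurrence per million
HIGH_PER_MILLION = 100.0  # high frequency: more than 1 per 10,000 (= 100 per million)


def lexical_index(transcript: str, freq_per_million: Mapping[str, float]) -> float:
    """Ratio of low-frequency to high-frequency word tokens in a transcript.

    `freq_per_million` is a hypothetical lookup table mapping a lowercase
    word to its corpus frequency per million words.
    """
    # Assumption: a simple lowercase tokenizer; the abstract does not specify one.
    tokens = re.findall(r"[a-z']+", transcript.lower())

    low = 0
    high = 0
    for token in tokens:
        freq = freq_per_million.get(token)
        if freq is None:
            # Assumption: words missing from the table are skipped, not counted as rare.
            continue
        if freq < LOW_PER_MILLION:
            low += 1
        elif freq > HIGH_PER_MILLION:
            high += 1

    if high == 0:
        # Assumption: the ratio is treated as undefined without high-frequency words.
        return float("nan")
    return low / high


# Toy frequency table with made-up values, for illustration only.
toy_freqs = {"the": 50000.0, "was": 8000.0, "senator": 30.0, "filibuster": 0.5}
index = lexical_index("The senator was the filibuster", toy_freqs)
```

In this toy example, one token falls below the low-frequency threshold and three exceed the high-frequency threshold; "senator" lies between the two thresholds and contributes to neither count.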
Results: Seventy adults (N=70; 44 TLE, 26 HCs) participated in this prospective study (2018-2023). Forty-four subjects were tested in person, and 26 participated remotely. There were no group-level differences in sex (61% female, p=0.62) or education level (79% college or above, p=0.54) between TLEs and HCs. TLE patients were significantly older (mean age TLE=32; HC=27; p=0.013), and fewer were right-handed (80% vs. 100%, p=0.047). TLEs also scored significantly lower on the MOCA (p<0.0001), reflecting the higher cutoff required for HCs. Word rate was similar between groups (TLE: 2.39 ± 0.61; HC: 2.43 ± 0.48, p=0.51) and was not associated with age, education, or MOCA scores. Similarly, there were no group differences in lexical index (TLE: 0.027 ± 0.01; HC: 0.028 ± 0.01, p=0.60), number of pauses (TLE: 337 ± 61.4; HC: 338 ± 65.3, p=0.16), or mean pause duration (TLE: 0.55 ± 0.06 sec; HC: 0.61 ± 0.46 sec, p=0.23). Among TLEs, lexical index was significantly correlated with performance on the Boston Naming Test (BNT) (p=0.0005), but not with MOCA. In contrast, for HCs, lexical index correlated with MOCA (p=0.045) and famous faces recall (p<0.0001).