ADT Network: A Novel Nonlinear Method for Decoding Speech Envelopes From EEG Signals
Ruixiang Liu, Chang Liu, Dan Cui

et al.

Trends in Hearing, Journal Year: 2024, Volume and Issue: 28

Published: Jan. 1, 2024

Decoding speech envelopes from electroencephalogram (EEG) signals holds potential as a research tool for objectively assessing auditory processing, which could contribute to future developments in hearing loss diagnosis. However, current methods struggle to meet both high accuracy and interpretability. We propose a deep learning model called the decoding transformer (ADT) network for envelope reconstruction from EEG to address these issues. The ADT network uses spatio-temporal convolution for feature extraction, followed by a transformer decoder to decode the envelopes. Through anticausal masking, the decoder considers only those features that match the natural temporal relationship between speech and EEG. Performance evaluation shows that the ADT network achieves average reconstruction scores of 0.168 and 0.167 on the SparrKULee and DTU datasets, respectively, rivaling those of other nonlinear models. Furthermore, by visualizing the weights of the convolution layer as time-domain filters and brain topographies, combined with an ablation study of the temporal kernels, we analyze the behavioral patterns of the network. The results indicate that low-frequency (0.5-8 Hz) and high-frequency (14-32 Hz) features are more critical, and that the active regions are primarily distributed bilaterally over the auditory cortex, consistent with previous research. Visualization of the attention weights further validated these findings. In summary, the ADT network balances decoding performance and interpretability, making it a promising tool for studying neural tracking of speech.
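The reconstruction scores reported above are Pearson correlations between the decoded and actual speech envelopes, the standard metric in this line of work. A minimal sketch of that metric (the `pearson_score` helper and the toy signals are illustrative, not from the paper):

```python
import numpy as np

def pearson_score(true_env: np.ndarray, pred_env: np.ndarray) -> float:
    """Pearson correlation between the true and reconstructed envelopes."""
    t = true_env - true_env.mean()
    p = pred_env - pred_env.mean()
    return float((t @ p) / (np.linalg.norm(t) * np.linalg.norm(p)))

# Toy check: a noisy copy of an envelope still correlates highly with it,
# while real decoders typically reach far lower scores (~0.17 here).
rng = np.random.default_rng(0)
env = rng.standard_normal(1000)
noisy = env + 0.5 * rng.standard_normal(1000)
score = pearson_score(env, noisy)
```

A score of 1.0 would mean perfect reconstruction; values around 0.17, as reported for the SparrKULee and DTU datasets, are typical of current envelope decoders on continuous speech.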

Language: English

Neural tracking as a diagnostic tool to assess the auditory pathway
Marlies Gillis, Jana Van Canneyt, Tom Francart

et al.

Hearing Research, Journal Year: 2022, Volume and Issue: 426, P. 108607 - 108607

Published: Sept. 14, 2022

Language: English

Citations: 44

The Early Subcortical Response at the Fundamental Frequency of Speech Is Temporally Separated from Later Cortical Contributions
Alina Schüller, Achim Schilling, Patrick Krauß

et al.

Journal of Cognitive Neuroscience, Journal Year: 2024, Volume and Issue: 36(3), P. 475 - 491

Published: Jan. 1, 2024

Most parts of speech are voiced, exhibiting a degree of periodicity with a fundamental frequency and many higher harmonics. Some neural populations respond to this temporal fine structure, in particular at the fundamental frequency. This frequency-following response (FFR) consists of both subcortical and cortical contributions and can be measured through EEG as well as magnetoencephalography (MEG), although the two methods differ in the aspects of neural activity that they capture: EEG is sensitive to radial and tangential sources as well as to deep sources, whereas MEG is more restrained to the measurement of superficial activity. EEG responses to continuous speech have shown an early subcortical contribution, at a latency of around 9 msec, in agreement with measurements of short speech tokens, but MEG has not yet revealed such an early component. Here, we analyze MEG responses to long segments of continuous speech. We find early subcortical responses at latencies of 4–11 msec, followed by later right-lateralized cortical activities at delays of 20–58 msec as well as potential subcortical activities. Our results show that the early subcortical component of the FFR to continuous speech can be measured from individual participants, and that its latency agrees with that obtained through EEG. They furthermore show that this component is temporally separated from the later cortical contributions, enabling an independent assessment of both components toward further aspects of speech processing.

Language: English

Citations: 6

Extending Subcortical EEG Responses to Continuous Speech to the Sound-Field
Florine L. Bachmann, Joshua P. Kulasingham, Kasper Eskelund

et al.

Trends in Hearing, Journal Year: 2024, Volume and Issue: 28

Published: Jan. 1, 2024

The auditory brainstem response (ABR) is a valuable clinical tool for objective hearing assessment, which is conventionally detected by averaging neural responses to thousands of short stimuli. Progressing beyond these unnatural stimuli, subcortical responses to continuous speech presented via earphones have recently been detected using linear temporal response functions (TRFs). Here, we extend earlier studies by measuring subcortical responses in the sound-field, and assess the amount of data needed to estimate subcortical TRFs. Electroencephalography (EEG) was recorded from 24 normal-hearing participants while they listened to clicks and stories presented through loudspeakers. Subcortical TRFs were computed after accounting for non-linear processing in the auditory periphery, by either stimulus rectification or an auditory nerve model. Our results demonstrated that subcortical TRFs could be reliably measured in the sound-field. TRFs estimated using the auditory nerve model outperformed simple rectification, and 16 minutes of data were sufficient for all participants to show clear wave V peaks. Sound-field TRFs were highly consistent with the earphone conditions and with click ABRs. However, the sound-field condition required slightly more data (16 minutes) to achieve clear wave V peaks compared to earphones (12 minutes), possibly due to effects of room acoustics. By investigating subcortical responses in the sound-field, this study lays the groundwork for bringing objective hearing assessment closer to real-life conditions, which may lead to improved hearing evaluations and smart hearing technologies.
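The TRFs discussed in this and several of the following abstracts are linear kernels estimated by regressing the EEG onto a time-lagged version of the stimulus, commonly with ridge regularization. A minimal sketch of that estimation (the helper names, the synthetic kernel, and the regularization value are illustrative assumptions, not the authors' pipeline):

```python
import numpy as np

def lagged_matrix(stim: np.ndarray, n_lags: int) -> np.ndarray:
    """Design matrix whose column k holds the stimulus delayed by k samples."""
    n = len(stim)
    X = np.zeros((n, n_lags))
    for k in range(n_lags):
        X[k:, k] = stim[: n - k]
    return X

def estimate_trf(stim: np.ndarray, eeg: np.ndarray, n_lags: int, lam: float = 1.0) -> np.ndarray:
    """Ridge-regularized TRF: w = (X'X + lam*I)^-1 X'y."""
    X = lagged_matrix(stim, n_lags)
    return np.linalg.solve(X.T @ X + lam * np.eye(n_lags), X.T @ eeg)

# Toy check: EEG simulated as the stimulus convolved with a known kernel
# plus noise; the estimated TRF should recover that kernel.
rng = np.random.default_rng(1)
stim = rng.standard_normal(5000)
kernel = np.array([0.0, 0.5, 1.0, 0.3, 0.0])
eeg = np.convolve(stim, kernel)[: len(stim)] + 0.1 * rng.standard_normal(5000)
trf = estimate_trf(stim, eeg, n_lags=5, lam=1.0)
```

In the studies above, the "stimulus" column would be a rectified broadband speech signal or an auditory-nerve-model output rather than raw audio, and the recovered kernel's peak corresponds to features such as wave V.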

Language: English

Citations: 4

The neural response at the fundamental frequency of speech is modulated by word-level acoustic and linguistic information
Mikolaj Kegler, Hugo Weissbart, Tobias Reichenbach

et al.

Frontiers in Neuroscience, Journal Year: 2022, Volume and Issue: 16

Published: July 22, 2022

Spoken language comprehension requires rapid and continuous integration of information, from lower-level acoustic to higher-level linguistic features. Much of this processing occurs in the cerebral cortex. Its neural activity exhibits, for instance, correlates of predictive processing, emerging at delays of a few hundred milliseconds. However, the auditory pathways are also characterized by extensive feedback loops from higher-level cortical areas to lower-level ones as well as to subcortical structures. Early neural activity can therefore be influenced by cognitive processes, but it remains unclear whether such feedback contributes to predictive processing. Here, we investigated early speech-evoked neural activity that emerges at the fundamental frequency. We analyzed EEG recordings obtained when subjects listened to a story read by a single speaker. We identified a response tracking the speaker's fundamental frequency that occurred at a delay of 11 ms, while another response elicited by the high-frequency modulation of the envelope of higher harmonics exhibited a larger magnitude and a longer latency of about 18 ms, with an additional significant component at around 40 ms. Notably, while the earlier components likely originate from subcortical structures, the latter presumably involves contributions from cortical regions. Subsequently, we determined the magnitude of these neural responses for each individual word in the story. We then quantified context-independent features of each word and used a language model to compute context-dependent word surprisal and precision. The surprisal represented how predictable a word is, given the previous context, and the precision reflected the confidence in predicting the next word from the past context. We found that the word-level neural responses at the fundamental frequency were predominantly modulated by the acoustic features: the average fundamental frequency and its variability. Amongst the linguistic features, only word surprisal showed a weak modulation. Our results show that the early neural response at the fundamental frequency is already modulated by linguistic information, suggesting top-down influences on this response.

Language: English

Citations: 18

Attentional Modulation of the Cortical Contribution to the Frequency-Following Response Evoked by Continuous Speech
Alina Schüller, Achim Schilling, Patrick Krauß

et al.

Journal of Neuroscience, Journal Year: 2023, Volume and Issue: 43(44), P. 7429 - 7440

Published: Oct. 4, 2023

Selective attention to one of several competing speakers is required for comprehending a target speaker among other voices and for successful communication with them. It has moreover been found to involve the neural tracking of low-frequency speech rhythms in the auditory cortex. Effects of selective attention have also been found in subcortical neural activities, in particular regarding the frequency-following response related to the fundamental frequency of speech (speech-FFR). Recent investigations have, however, shown that the speech-FFR contains cortical contributions as well. It remains unclear whether these cortical contributions are modulated by selective attention. Here we used magnetoencephalography to assess the attentional modulation of the cortical contributions to the speech-FFR. We presented both male and female participants with two competing speech signals and analyzed the cortical responses during attentional switching between the two speakers. Our findings revealed a robust attentional modulation of the cortical contribution to the speech-FFR: the responses were higher when the speaker was attended than when they were ignored. We also found that, regardless of attention, a voice with a lower fundamental frequency elicited a larger response than a voice with a higher fundamental frequency. Our results show that attentional modulation of the speech-FFR does not only occur subcortically but extends to the auditory cortex as well.

Language: English

Citations: 10

Auditory EEG decoding challenge for ICASSP 2023
Mohammad Jalilpour Monesi, Lies Bollens, Bernd Accou

et al.

IEEE Open Journal of Signal Processing, Journal Year: 2024, Volume and Issue: 5, P. 652 - 661

Published: Jan. 1, 2024

This paper describes the auditory EEG challenge, organized as one of the Signal Processing Grand Challenges at ICASSP 2023. The challenge provides recordings of 85 subjects who listened to continuous speech, as audiobooks or podcasts, while their brain activity was recorded. 71 subjects were provided as a training set such that challenge participants could train their models on a relatively large dataset. The remaining 14 subjects were used as a held-out set for evaluating the challenge. The challenge consists of two tasks that relate electroencephalogram (EEG) signals to the presented speech stimulus. The first task, match-mismatch, aims to determine which of several speech segments induced a given EEG segment. In the second task, regression, the goal is to reconstruct the speech envelope from the EEG. For the match-mismatch task, the performance of the different teams was close to that of the baseline model, and the models did generalize well to unseen subjects. In contrast, for the regression task, the top teams significantly improved over the baseline on the stories in the test set while failing to generalize to unseen subjects.
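One common way to approach the match-mismatch task described above is to decode an envelope from the EEG segment and then pick the candidate speech segment whose envelope correlates best with it. A minimal sketch of that decision rule (the helper names and the toy signals are illustrative assumptions, not the challenge baseline):

```python
import numpy as np

def corr(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation between two equal-length signals."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a @ b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_mismatch(decoded_env: np.ndarray, candidates: list) -> int:
    """Index of the candidate envelope best matching the decoded envelope."""
    return int(np.argmax([corr(decoded_env, c) for c in candidates]))

# Toy check: a noisy decoder output should still pick the true envelope
# over an unrelated (mismatched) one.
rng = np.random.default_rng(2)
true_env = rng.standard_normal(640)   # e.g. a 10 s segment at 64 Hz
mismatch = rng.standard_normal(640)
decoded = true_env + 1.0 * rng.standard_normal(640)
picked = match_mismatch(decoded, [mismatch, true_env])
```

The challenge's second task, envelope regression, corresponds to the decoding step alone, scored by the same correlation measure.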

Language: English

Citations: 3

Cortical tracking of voice pitch in the presence of multiple speakers depends on selective attention
Christian Brodbeck, Jonathan Z. Simon

Frontiers in Neuroscience, Journal Year: 2022, Volume and Issue: 16

Published: Aug. 8, 2022

Voice pitch carries linguistic and non-linguistic information. Previous studies have described cortical tracking of voice pitch in clean speech, with responses reflecting both pitch strength and pitch value. However, pitch is also a powerful cue for auditory stream segregation, especially when competing streams have differing fundamental frequencies, as is the case when multiple speakers talk simultaneously. We therefore investigated how cortical tracking of voice pitch is affected by the presence of a second, task-irrelevant speaker. We analyzed human magnetoencephalography (MEG) responses to continuous narrative speech, presented either by a single talker in a quiet background or by two talkers, a mixture of male and female speech. In clean speech, pitch was associated with a right-dominant response, peaking at a latency of around 100 ms, consistent with previous electroencephalography and electrocorticography results. The response tracked the relative value of the speaker's fundamental frequency. In the two-talker mixture, the pitch of the attended speaker was tracked bilaterally, regardless of whether or not pitch from the irrelevant speaker was simultaneously present. Pitch tracking of the unattended speaker was reduced: only the right hemisphere still significantly tracked the pitch of the unattended speaker, and only during intervals in which no pitch was present in the attended talker's speech. Taken together, these results suggest that pitch-based segregation of competing speakers, at least as measured by macroscopic cortical tracking, is not entirely automatic but strongly dependent on selective attention.

Language: English

Citations: 15

Decoding Envelope and Frequency-Following EEG Responses to Continuous Speech Using Deep Neural Networks
Michael Thornton, Danilo P. Mandic, Tobias Reichenbach

et al.

IEEE Open Journal of Signal Processing, Journal Year: 2024, Volume and Issue: 5, P. 700 - 716

Published: Jan. 1, 2024

The electroencephalogram (EEG) offers a non-invasive means by which a listener's auditory system may be monitored during continuous speech perception. Reliable auditory-EEG decoders could facilitate the objective diagnosis of hearing disorders, or find applications in cognitively-steered hearing aids. Previously, we developed decoders for the ICASSP Auditory EEG Signal Processing Grand Challenge (SPGC). These decoders aimed to solve the match-mismatch task: given a short temporal segment of EEG recordings and two candidate speech segments, the task is to identify which of the speech segments is temporally aligned, or matched, with the EEG segment. The decoders made use of cortical responses to the speech envelope, as well as speech-related frequency-following responses, to relate the EEG recordings to the speech stimuli. Here we comprehensively document the methods by which the decoders were developed. We extend our previous analysis by exploring the association between speaker characteristics (pitch and sex) and classification accuracy, and provide a full statistical analysis of the final performance as evaluated on a held-out portion of the dataset. Finally, the generalisation capabilities of the decoders are characterised by evaluating them on an entirely different dataset that contains EEG recorded under a variety of speech-listening conditions. The results show that the decoders achieve accurate and robust classification accuracies, and that they can even serve as auditory attention decoders without additional training.

Language: English

Citations: 3

Cortical responses time-locked to continuous speech in the high-gamma band depend on selective attention
Vrishab Commuri, Joshua P. Kulasingham, Jonathan Z. Simon

et al.

Frontiers in Neuroscience, Journal Year: 2023, Volume and Issue: 17

Published: Dec. 14, 2023

Auditory cortical responses to speech obtained by magnetoencephalography (MEG) show robust tracking of the speaker's fundamental frequency in the high-gamma band (70-200 Hz), but little is currently known about whether such responses depend on the focus of selective attention. In this study, 22 human subjects listened to concurrent, fixed-rate speech from male and female speakers, and were asked to selectively attend to one speaker at a time, while their neural responses were recorded with MEG. The male speaker's pitch range coincided with the lower end of the high-gamma band, whereas the female speaker's higher pitch range had much less overlap, reaching only the upper end of the band. Neural responses were analyzed using the temporal response function (TRF) framework. As expected, the responses demonstrate robust tracking of the male speaker's speech, with a peak latency of ~40 ms. Critically, the magnitude of the response depends on attention: it is significantly greater when the male speaker is attended than when he is not, under acoustically identical conditions. This is a clear demonstration that even very early auditory cortical responses to speech are influenced by top-down, cognitive processing mechanisms.

Language: English

Citations: 7

Neural Measures of Pitch Processing in EEG Responses to Running Speech
Florine L. Bachmann, Ewen MacDonald, Jens Hjortkjær

et al.

Frontiers in Neuroscience, Journal Year: 2021, Volume and Issue: 15

Published: Dec. 21, 2021

Linearized encoding models are increasingly employed to model cortical responses to running speech. Recent extensions to subcortical responses suggest clinical perspectives, potentially complementing the auditory brainstem responses (ABRs) or frequency-following responses (FFRs) that are current clinical standards. However, while it is well known that the auditory system responds to both transient amplitude variations and the stimulus periodicity that gives rise to pitch, these features co-vary in running speech. Here, we discuss the challenges of disentangling the features that drive the subcortical response to running speech. Cortical and subcortical electroencephalographic (EEG) responses to running speech from 19 normal-hearing listeners (12 female) were analyzed. Using forward regression models, we confirm that a rectified broadband speech signal yields temporal response functions consistent with wave V of the ABR, as shown in previous work. Peak latency and amplitude of the speech-evoked wave V correlated with those of standard click-evoked ABRs recorded at the vertex electrode (Cz). Similar responses could be obtained using the fundamental frequency (F0) of speech as a predictor. However, simulations indicated that dissociating responses to temporal fine structure at the F0 from those to transient amplitude variations is not possible, given the high co-variance of the features and the poor signal-to-noise ratio (SNR) of subcortical EEG responses. In the cortex, our data replicated previous findings indicating that envelope tracking on frontal electrodes can be dissociated from responses to slow variations of the F0 (relative pitch). Yet, no association between F0-tracking and relative pitch responses was detected. These results indicate that while speech-evoked subcortical responses are comparable to click-evoked ABRs, isolating pitch-related subcortical processing may be challenging with natural speech stimuli.

Language: English

Citations: 16