The Development of Speaking and Singing in Infants May Play a Role in Genomics and Dementia in Humans
Ebenezer N. Yamoah, Gabriela Pavlínková, Bernd Fritzsch

et al.

Brain Sciences, Journal Year: 2023, Volume and Issue: 13(8), P. 1190 - 1190

Published: Aug. 11, 2023

The development of the central auditory system, including the cortex and other areas involved in processing sound, is shaped by genetic and environmental factors, enabling infants to learn how to speak. Before explaining hearing in humans, a short overview of hearing dysfunction is provided. Environmental factors such as exposure to sound and language can impact the function of the auditory system, affecting speech perception, singing, and language processing. Infants can hear before birth, and this exposure sculpts their developing auditory structure and functions. Exposing infants to singing and speaking can support this development. In aging humans, the hippocampus and auditory nuclear centers are affected by neurodegenerative diseases such as Alzheimer's, resulting in memory and hearing difficulties. As the disease progresses, overt damage to auditory centers occurs, leading to problems in processing auditory information. In conclusion, these combined difficulties significantly reduce people's ability to communicate and engage with their societal essence.

Language: English

Hierarchical predictive processing in the brain: Predictive coding and its neural signatures
Yiyuan Teresa Huang, Zenas C. Chao

The Psychology of Learning and Motivation, Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 1, 2025

Language: English

Citations

0

Enhancement of speech-in-noise comprehension through vibrotactile stimulation at the syllabic rate
Pierre Guilleminot, Tobias Reichenbach

Proceedings of the National Academy of Sciences, Journal Year: 2022, Volume and Issue: 119(13)

Published: March 21, 2022

Significance: Syllables are important building blocks of speech. They occur at a rate between 4 and 8 Hz, corresponding to the theta frequency range of neural activity in the cerebral cortex. When listening to speech, this neural activity becomes aligned to the syllabic rhythm, presumably aiding the parsing of the speech signal into distinct syllables. However, this alignment cannot only be influenced by sound, but also by somatosensory information. Here, we show that the presentation of vibrotactile signals at the syllabic rate can enhance speech comprehension in background noise. We further provide evidence that this multisensory enhancement reflects the integration of auditory and tactile information.
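For context, vibrotactile signals in such a paradigm are derived from the speech signal itself. Below is a minimal sketch of one plausible derivation, assuming numpy/scipy; it is an illustration of the general idea, not the authors' code.

```python
# Minimal sketch (not the authors' code): derive a syllabic-rate vibrotactile
# drive signal from a speech waveform via its 4-8 Hz amplitude envelope.
import numpy as np
from scipy.signal import hilbert, butter, filtfilt, resample_poly

def syllabic_drive(speech, fs, env_fs=100, band=(4.0, 8.0)):
    """Envelope -> downsample -> 4-8 Hz band-pass, yielding an actuator signal."""
    envelope = np.abs(hilbert(speech))                           # amplitude envelope
    envelope = resample_poly(envelope, up=1, down=fs // env_fs)  # to env_fs Hz
    b, a = butter(2, [band[0] / (env_fs / 2), band[1] / (env_fs / 2)], btype="band")
    drive = filtfilt(b, a, envelope)                             # keep syllabic rhythm
    return drive / np.max(np.abs(drive))                         # normalize amplitude

# Example: 3 s of noise standing in for a speech clip, sampled at 16 kHz.
fs = 16000
speech = np.random.randn(3 * fs)
drive = syllabic_drive(speech, fs)   # vibrotactile control signal at 100 Hz
```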

Language: English

Citations

16

Editorial: Neural Tracking: Closing the Gap Between Neurophysiology and Translational Medicine
Giovanni M. Di Liberto, Jens Hjortkjær, Nima Mesgarani

et al.

Frontiers in Neuroscience, Journal Year: 2022, Volume and Issue: 16

Published: March 16, 2022

Editorial article, Front. Neurosci., 16 March 2022, Sec. Auditory Cognitive Neuroscience. https://doi.org/10.3389/fnins.2022.872600

Citations

13

Neural Speech Tracking Highlights the Importance of Visual Speech in Multi-speaker Situations
Chandra Leon Haider, Hyojin Park, Anne Hauswald

et al.

Journal of Cognitive Neuroscience, Journal Year: 2023, Volume and Issue: 36(1), P. 128 - 142

Published: Nov. 17, 2023

Visual speech plays a powerful role in facilitating auditory processing and has become a publicly noticed topic with the wide usage of face masks during the COVID-19 pandemic. In a previous magnetoencephalography study, we showed that occluding the mouth area significantly impairs neural speech tracking. To rule out the possibility that this deterioration is caused by degraded sound quality, in the present follow-up study we presented participants with audiovisual (AV) and audio-only (A) speech. We further independently manipulated the trials by adding a face mask and a distractor speaker. Our results clearly show that face masks only affect speech tracking in the AV conditions, not in the A conditions. This shows that face masks indeed primarily impact speech processing by blocking visual speech rather than through acoustic degradation. Furthermore, we can highlight how the spectrogram, lip movements, and lexical units are tracked at the sensor level. We can show visual benefits for spectrogram tracking, especially in the multi-speaker condition. While lip movements provide an additional improvement over the clear-speech benefit, lexical units (phonemes and word onsets) do not show an enhancement at all. We hypothesize that in young, normal-hearing individuals, visual information from the mouth is less used for specific feature extraction, but acts more as a general resource for guiding attention.
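Neural tracking of this kind is commonly quantified with a lagged linear forward model, a temporal response function relating a stimulus feature to the measured signal. The following is a minimal Python sketch under that assumption, on toy data; it is not the study's MEG pipeline.

```python
# Minimal sketch (assumed setup): estimate neural tracking of a stimulus feature
# (e.g., the speech envelope) with a lagged linear model fit by ridge regression.
import numpy as np

def lagged_design(stimulus, max_lag):
    """Stack time-lagged copies of the stimulus (lags 0..max_lag samples)."""
    n = len(stimulus)
    X = np.zeros((n, max_lag + 1))
    for lag in range(max_lag + 1):
        X[lag:, lag] = stimulus[: n - lag]
    return X

def fit_trf(stimulus, response, max_lag, ridge=1.0):
    """Fit w minimizing ||X w - response||^2 + ridge * ||w||^2."""
    X = lagged_design(stimulus, max_lag)
    w = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ response)
    r = np.corrcoef(X @ w, response)[0, 1]  # tracking score: predicted vs. measured
    return w, r

# Toy example: a response that lags the stimulus by 10 samples, plus noise.
rng = np.random.default_rng(0)
stim = rng.standard_normal(5000)
meg = np.roll(stim, 10) + 0.5 * rng.standard_normal(5000)
weights, score = fit_trf(stim, meg, max_lag=30)
```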

Language: English

Citations

7

Research on Lightweight Disaster Classification Based on High-Resolution Remote Sensing Images
Jianye Yuan, Xin Ma, Ge Han

et al.

Remote Sensing, Journal Year: 2022, Volume and Issue: 14(11), P. 2577 - 2577

Published: May 27, 2022

With natural disasters becoming increasingly frequent, it is very important to classify and identify disasters. We propose a lightweight disaster classification model, which has a lower computational cost, a lower parameter count, and higher accuracy than other models. For this purpose, this paper proposes the SDS-Network algorithm, optimized on ResNet, to deal with the above problems for remote sensing images. First, it implements a spatial attention mechanism to improve the accuracy of the algorithm; then, depthwise separable convolution is introduced to reduce the number of model calculations and parameters while preserving accuracy; finally, the performance is further increased by adjusting some hyperparameters. The experimental results show that, compared with the classic AlexNet, ResNet18, VGG16, VGG19, and DenseNet121 models, the SDS-Network algorithm achieves higher accuracy, and compared with the lightweight MobileNet, ShuffleNet, SqueezeNet, and MnasNet series, it performs better in complexity and accuracy rate. According to the comprehensive performance comparison charts in the article, the SDS-Network algorithm is still better than the RegNet series of algorithms. Furthermore, after verification on a public data set, the model shows good generalization ability. Thus, we can conclude that the SDS-Network algorithm is effective and suitable for disaster classification tasks. Finally, its portability was verified on additional data sets.
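The two architectural ingredients the abstract names, depthwise separable convolution and spatial attention, can be sketched as follows. This is a generic PyTorch illustration with assumed layer sizes, not the published SDS-Network.

```python
# Minimal sketch (assumed shapes, not the published SDS-Network): a block
# combining depthwise separable convolution with a simple spatial attention gate.
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Weight each spatial location by pooled channel statistics."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg_pool = x.mean(dim=1, keepdim=True)   # channel-average map
        max_pool = x.amax(dim=1, keepdim=True)   # channel-max map
        attn = torch.sigmoid(self.conv(torch.cat([avg_pool, max_pool], dim=1)))
        return x * attn                           # gate the feature map

class DepthwiseSeparableBlock(nn.Module):
    """Depthwise conv per channel followed by a 1x1 pointwise conv."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)
        self.attn = SpatialAttention()

    def forward(self, x):
        x = self.act(self.bn(self.pointwise(self.depthwise(x))))
        return self.attn(x)

# Example: one block applied to a batch of 224x224 3-channel images.
block = DepthwiseSeparableBlock(3, 32)
out = block(torch.randn(2, 3, 224, 224))   # -> torch.Size([2, 32, 224, 224])
```

Depthwise separable convolution factorizes a standard convolution into a per-channel spatial filter and a 1x1 channel mixer, which is where the parameter and computation savings come from.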

Language: English

Citations

12

Speech-Driven Facial Animations Improve Speech-in-Noise Comprehension of Humans
Enrico Varano, Konstantinos Vougioukas, Pingchuan Ma

et al.

Frontiers in Neuroscience, Journal Year: 2022, Volume and Issue: 15

Published: Jan. 5, 2022

Understanding speech becomes a demanding task when the environment is noisy. Comprehension of speech in noise can be substantially improved by looking at the speaker's face, and this audiovisual benefit is even more pronounced in people with hearing impairment. Recent advances in AI have allowed researchers to synthesize photorealistic talking faces from a speech recording and a still image of a person's face in an end-to-end manner. However, it has remained unknown whether such facial animations improve speech-in-noise comprehension. Here we consider facial animations produced by a recently introduced generative adversarial network (GAN), and show that humans cannot distinguish between the synthesized and natural videos. Importantly, we then show that the synthesized videos significantly aid speech understanding in noise, although the natural facial motions yield a yet higher audiovisual benefit. We further find that an audiovisual speech recognizer (AVSR) benefits from the synthesized videos as well. Our results suggest that synthesized facial motions can be used to aid speech comprehension in difficult listening environments.
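As a deliberately simplified illustration of the conditioning structure such a model uses, a generator driven by an audio feature plus an identity code, with a matching discriminator, here is a toy PyTorch sketch. The dimensions and layers are assumptions, far smaller than the published GAN.

```python
# Toy sketch (assumed dimensions, far from the published GAN): a generator maps
# an audio window plus a still-image identity code to one video frame; a
# discriminator judges realism conditioned on the same audio.
import torch
import torch.nn as nn

class FrameGenerator(nn.Module):
    def __init__(self, audio_dim=128, identity_dim=128, frame_pixels=64 * 64 * 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(audio_dim + identity_dim, 512), nn.ReLU(),
            nn.Linear(512, frame_pixels), nn.Tanh(),   # pixel values in [-1, 1]
        )

    def forward(self, audio_feat, identity_feat):
        return self.net(torch.cat([audio_feat, identity_feat], dim=-1))

class FrameDiscriminator(nn.Module):
    """Real/fake logit for a frame, conditioned on the driving audio."""
    def __init__(self, audio_dim=128, frame_pixels=64 * 64 * 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(frame_pixels + audio_dim, 512), nn.ReLU(),
            nn.Linear(512, 1),
        )

    def forward(self, frame, audio_feat):
        return self.net(torch.cat([frame, audio_feat], dim=-1))

gen = FrameGenerator()
frame = gen(torch.randn(1, 128), torch.randn(1, 128))   # one synthesized frame
```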

Language: English

Citations

9

Can deep learning provide a generalizable model for dynamic sound encoding in auditory cortex?
Jacob R. Pennington, Stephen V. David

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2022, Volume and Issue: unknown

Published: June 13, 2022

Convolutional neural networks (CNNs) can provide powerful and flexible models of sensory processing. However, the utility of CNNs in studying the auditory system has been limited by their requirement for large datasets and by the complex response properties of single auditory neurons. To address these limitations, we developed a population encoding model: a CNN that simultaneously predicts the activity of several hundred neurons recorded during the presentation of a set of natural sounds. This approach defines a shared spectro-temporal space and pools statistical power across neurons. Population models of varying architecture performed consistently better than traditional linear-nonlinear models on data from primary and non-primary auditory cortex. Moreover, population models were highly generalizable: the output layer of a model pre-trained on one population could be fit to novel single units, achieving performance equivalent to that of models fit to the original data. This ability to generalize suggests that the population models capture general computations of auditory cortex.
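The architecture described, a shared backbone with a per-neuron readout whose output layer can be refit to novel units, can be sketched in PyTorch as follows. Shapes and layer choices here are assumptions, not the paper's model.

```python
# Minimal sketch (assumed dimensions): a CNN population encoding model with a
# shared spectro-temporal backbone and a per-neuron linear readout. Refitting
# only the readout is what lets the backbone transfer to novel units.
import torch
import torch.nn as nn

class PopulationEncoder(nn.Module):
    def __init__(self, n_freq=64, n_neurons=300, hidden=32):
        super().__init__()
        # Shared backbone: convolutions over (frequency, time) of the spectrogram.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, hidden, kernel_size=(n_freq, 15), padding=(0, 7)),
            nn.ReLU(),
            nn.Conv2d(hidden, hidden, kernel_size=(1, 9), padding=(0, 4)),
            nn.ReLU(),
        )
        # Per-neuron readout: one linear unit per recorded neuron.
        self.readout = nn.Conv2d(hidden, n_neurons, kernel_size=1)

    def forward(self, spectrogram):               # (batch, 1, n_freq, time)
        features = self.backbone(spectrogram)     # shared representation
        return self.readout(features).squeeze(2)  # (batch, n_neurons, time)

model = PopulationEncoder()
rates = model(torch.randn(4, 1, 64, 200))         # predicted firing rates

# To fit novel units: freeze the backbone, retrain only a fresh readout.
for p in model.backbone.parameters():
    p.requires_grad = False
model.readout = nn.Conv2d(32, 50, kernel_size=1)  # e.g., 50 new neurons
```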

Language: English

Citations

9

Linear Modeling of Neurophysiological Responses to Naturalistic Stimuli: Methodological Considerations for Applied Research
Michael J. Crosse, Nathaniel J. Zuk, Giovanni M. Di Liberto

et al.

Published: May 11, 2021

Cognitive neuroscience has seen an increase in the use of linear modelling techniques for studying the processing of natural, environmental stimuli. The availability of such computational tools has prompted similar investigations in many clinical domains, facilitating the study of cognitive and sensory deficits within an ecologically relevant context. However, studying clinical (and often highly heterogeneous) cohorts introduces an added layer of complexity to such modelling procedures, leading to an increased risk of improper usage and, as a result, inconsistent conclusions. Here, we outline some key methodological considerations for applied research and include worked examples on both simulated and empirical electrophysiological (EEG) data. In particular, we focus on experimental design, data preprocessing, stimulus feature extraction, model training and evaluation, and the interpretation of model weights. Throughout the paper, we demonstrate how to implement each stage in MATLAB using the mTRF-Toolbox and discuss how to address issues that could arise in applied research. In doing so, we highlight the importance of understanding these more technical points for experimental design and analysis, and provide a resource for researchers investigating ecologically rich stimuli.
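Although the paper's worked examples use the mTRF-Toolbox in MATLAB, the model training and evaluation step it emphasizes, choosing the regularization parameter by cross-validation, can be sketched in Python as follows (toy data, assumed shapes; not the toolbox itself).

```python
# Minimal sketch: select the ridge parameter of a stimulus-response model by
# k-fold cross-validation, scoring each fold by held-out prediction correlation.
import numpy as np

def ridge_fit(X, y, lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_ridge(X, y, lambdas, n_folds=5):
    """Return the lambda with the best mean held-out correlation."""
    folds = np.array_split(np.arange(len(y)), n_folds)
    scores = []
    for lam in lambdas:
        rs = []
        for test in folds:
            train = np.setdiff1d(np.arange(len(y)), test)
            w = ridge_fit(X[train], y[train], lam)
            rs.append(np.corrcoef(X[test] @ w, y[test])[0, 1])
        scores.append(np.mean(rs))
    return lambdas[int(np.argmax(scores))], scores

# Toy data: 20 lagged stimulus features driving one EEG channel.
rng = np.random.default_rng(1)
X = rng.standard_normal((2000, 20))
y = X @ rng.standard_normal(20) + rng.standard_normal(2000)
best_lam, scores = cv_ridge(X, y, lambdas=[0.1, 1.0, 10.0, 100.0])
```

Cross-validating the regularization strength on held-out data, rather than tuning it on the training set, is precisely the kind of safeguard against improper usage the abstract is concerned with.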

Language: English

Citations

11

Decreasing hearing ability does not lead to improved visual speech extraction as revealed in a neural speech tracking paradigm
Chandra Leon Haider, Anne Hauswald, Nathan Weisz

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: March 14, 2024

The use of visual speech is thought to be especially important in situations where the acoustics are unclear and for individuals with hearing impairment. To investigate this in a neural speech tracking paradigm, we measured MEG in sixty-seven mid- to old-age participants during audiovisual (AV), audio-only (A), and visual-only (V) speech presentations in the context of face masks. First, we could extend previous findings by showing that not only young normal-hearing individuals but also aging individuals with decreasing hearing ability show a brain that is superior at neurally tracking the acoustic spectrogram in AV compared to A presentations, especially in multi-speaker situations. The addition of visible lip movements further increases this benefit. Second, we show that individuals with lower hearing levels are affected more by face masks. However, in this population, the effect seems to be a composite of blocked visual speech and distorted acoustics. Third, confirming previous findings, the visual benefit varies strongly across individuals. We show that this general individual benefit predicts how much people engage visual speech in difficult listening situations. Interestingly, it was not correlated with hearing thresholds and therefore does not seem to be a widely used compensatory strategy in the hearing impaired.
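The individual-differences analysis the abstract describes reduces to a difference-and-correlation computation: a per-subject visual benefit (AV tracking minus A tracking) tested against hearing thresholds. Here is a minimal sketch with hypothetical variable names and simulated values.

```python
# Minimal sketch (simulated data, hypothetical names): quantify each subject's
# visual benefit as the AV-minus-A difference in neural tracking scores, then
# test its relation to hearing thresholds.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
n_subjects = 67
track_av = rng.uniform(0.05, 0.25, n_subjects)             # tracking score, AV
track_a = track_av - rng.uniform(0.0, 0.08, n_subjects)    # tracking score, A
hearing_threshold_db = rng.uniform(5, 55, n_subjects)      # e.g., pure-tone average

visual_benefit = track_av - track_a                        # per-subject AV advantage
r, p = pearsonr(visual_benefit, hearing_threshold_db)
print(f"benefit vs. threshold: r = {r:.2f}, p = {p:.3f}")  # no correlation expected
```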

Language: English

Citations

1

Crossmodal hierarchical predictive coding for audiovisual sequences in the human brain
Yiyuan Teresa Huang, Chien‐Te Wu, Yi-Xin Miranda Fang

et al.

Communications Biology, Journal Year: 2024, Volume and Issue: 7(1)

Published: Aug. 9, 2024

Predictive coding theory suggests the brain anticipates sensory information using prior knowledge. While this has been extensively researched within individual modalities, evidence for predictive processing across modalities is limited. Here, we examine how crossmodal knowledge is represented and learned in the brain, by identifying the hierarchical networks underlying crossmodal predictions when information in one modality leads to a prediction in another modality. We record electroencephalogram (EEG) during an audiovisual local-global oddball paradigm, in which the predictability of transitions between tones and images is manipulated at both the stimulus and sequence levels. To dissect the complex predictive signals in our EEG data, we employed a model-fitting approach to untangle neural interactions across modalities and hierarchies. The results demonstrate that crossmodal integration occurs at both the stimulus and multi-stimulus sequence levels. Furthermore, we identify spatio-spectro-temporal signatures of prediction-error signals across hierarchies and reveal that auditory and visual prediction errors are rapidly redirected to central-parietal electrodes during learning through alpha-band interactions. Our study suggests a mechanism whereby unimodal predictions are processed by distributed networks to form crossmodal knowledge.
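The local-global oddball paradigm manipulates predictability at two levels: a "local" deviant violates the within-sequence pattern, while a "global" deviant violates the block-level rule about which sequence type is frequent. Below is a minimal sketch of such sequence generation, with assumed stimulus codes ("A" for a tone, "V" for an image); it illustrates the paradigm's logic, not the study's actual stimulus scripts.

```python
# Minimal sketch (assumed stimulus coding): generate local-global oddball blocks.
# A "local" deviant breaks the within-sequence pattern (xxxxY vs. xxxxx); a
# "global" deviant is a sequence that is rare relative to the block's rule.
import random

def make_sequence(standard, deviant, local_deviant):
    """Five stimuli: xxxxx (no local deviant) or xxxxY (local deviant)."""
    return [standard] * 4 + [deviant if local_deviant else standard]

def make_block(standard="A", deviant="V", frequent_local=True, n_trials=100,
               p_rare=0.2, seed=0):
    """The frequent sequence defines the global rule; rare ones are global deviants."""
    rng = random.Random(seed)
    block = []
    for _ in range(n_trials):
        rare = rng.random() < p_rare
        local = frequent_local != rare   # rare trials flip the local pattern
        block.append((make_sequence(standard, deviant, local),
                      "global_deviant" if rare else "global_standard"))
    return block

# Example: AAAAV (tone-tone-tone-tone-image) is the global standard, so the
# occasional AAAAA sequence is a global (but not local) deviant.
for seq, label in make_block(frequent_local=True, n_trials=3):
    print(seq, label)
```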

Language: English

Citations

1