Reconstructing Voice Identity from Noninvasive Auditory Cortex Recordings (Open Access)
Charly Lamothe, Etienne Thoret, Régis Trapeau, et al.

Published: July 15, 2024

The cerebral processing of voice information is known to engage, in human as well as non-human primates, “temporal voice areas” (TVAs) that respond preferentially to conspecific vocalizations. However, how voice information is represented by neuronal populations in these areas, particularly speaker identity information, remains poorly understood. Here, we used a deep neural network (DNN) to generate a high-level, small-dimension representational space for voice identity, the ‘voice latent space’ (VLS), and examined its linear relation with cerebral activity via encoding, representational similarity, and decoding analyses. We find that the VLS maps onto fMRI measures of cerebral activity in response to tens of thousands of voice stimuli from hundreds of different speaker identities and better accounts for the representational geometry for speaker identity in the TVAs than in A1. Moreover, the VLS allowed TVA-based reconstructions of voice stimuli that preserved essential aspects of speaker identity as assessed by both machine classifiers and human listeners. These results indicate that the DNN-derived VLS provides high-level representations of voice identity information in the TVAs.

Language: English

The language network as a natural kind within the broader landscape of the human brain
Evelina Fedorenko, Anna A. Ivanova, Tamar I. Regev, et al.

Nature reviews. Neuroscience, Journal Year: 2024, Volume and Issue: 25(5), P. 289 - 312

Published: April 12, 2024

Language: English

Citations: 71

Language in Brains, Minds, and Machines
Greta Tuckute, Nancy Kanwisher, Evelina Fedorenko, et al.

Annual Review of Neuroscience, Journal Year: 2024, Volume and Issue: 47(1), P. 277 - 301

Published: April 26, 2024

It has long been argued that only humans could produce and understand language. But now, for the first time, artificial language models (LMs) achieve this feat. Here we survey the new purchase LMs are providing on the question of how language is implemented in the brain. We discuss why, a priori, LMs might be expected to share similarities with the human language system. We then summarize evidence that LMs represent linguistic information similarly enough to humans to enable relatively accurate brain encoding and decoding during language processing. Finally, we examine which LM properties (their architecture, task performance, or training) are critical for capturing human neural responses to language and review studies using LMs as in silico model organisms for testing hypotheses about language. These ongoing investigations bring us closer to understanding the representations and processes that underlie our ability to comprehend sentences and express thoughts in language.

Language: English

Citations: 13

Contextual feature extraction hierarchies converge in large language models and the brain
Gavin Mischler, Yinghao Aaron Li, Stephan Bickel, et al.

Nature Machine Intelligence, Journal Year: 2024, Volume and Issue: unknown

Published: Nov. 26, 2024

Language: English

Citations: 7

Crossmixed convolutional neural network for digital speech recognition (Creative Commons)
Quoc Bao Diep, Hong Yen Phan, Thanh Cong Truong, et al.

PLoS ONE, Journal Year: 2024, Volume and Issue: 19(4), P. e0302394 - e0302394

Published: April 26, 2024

Digital speech recognition is a challenging problem that requires the ability to learn complex signal characteristics such as frequency, pitch, intensity, timbre, and melody, which traditional methods often face issues in recognizing. This article introduces three solutions based on convolutional neural networks (CNN) to solve the problem: 1D-CNN is designed to learn directly from digital data; 2DS-CNN and 2DM-CNN have a more complex architecture, transferring the raw waveform into transformed images using the Fourier transform to learn essential features. Experimental results on four large data sets, containing 30,000 samples for each, show that the three proposed models achieve superior performance compared to well-known models such as GoogLeNet and AlexNet, with the best accuracy of 95.87%, 99.65%, and 99.76%, respectively. With 5-10% higher performance than other models, the proposed solution has demonstrated the ability to effectively learn features, improve recognition accuracy and speed, and open up the potential for broad applications in virtual assistants, medical recording, and voice commands.

Language: English

Citations: 6

Models optimized for real-world tasks reveal the necessity of precise temporal coding in hearing (Creative Commons)
Mark R. Saddler, Josh H. McDermott

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: April 25, 2024

Neurons encode information in the timing of their spikes in addition to their firing rates. Spike timing is particularly precise in the auditory nerve, where action potentials phase lock to sound with sub-millisecond precision, but its behavioral relevance remains uncertain. We optimized machine learning models to perform real-world hearing tasks with simulated cochlear input, assessing the precision of auditory nerve spike timing needed to reproduce human behavior. Models with high-fidelity phase locking exhibited more human-like sound localization and speech perception than models without, consistent with an essential role in human hearing. However, the temporal precision needed to reproduce human-like behavior varied across tasks, as did the precision that benefited real-world task performance. These effects suggest that perceptual domains incorporate phase locking to different extents depending on the demands of real-world hearing. The results illustrate how optimizing models for realistic tasks can clarify the role of candidate neural codes in perception.

Language: English

Citations: 5

Neural processing of naturalistic audiovisual events in space and time (Creative Commons)
Yu Hu, Yalda Mohsenzadeh

Communications Biology, Journal Year: 2025, Volume and Issue: 8(1)

Published: Jan. 22, 2025

Language: English

Citations: 0

Self-supervision deep learning models are better models of human high-level visual cortex: The roles of multi-modality and dataset training size (Creative Commons)
Idan Grosbard, Galit Yovel

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 13, 2025

With the rapid development of Artificial Neural Network based visual models, many studies have shown that these models show unprecedented potence in predicting neural responses to images in visual cortex. Lately, advances in computer vision have introduced self-supervised models, where a model is trained using supervision from natural properties of the training set. This has led to examination of their neural prediction performance, which revealed better prediction than supervised models for models trained with language or image-only supervision. In this work, we delve deeper into the models’ ability to explain neural representations of object categories. We compare models that differed in their training objectives to examine where they diverge in their ability to predict fMRI and MEG recordings while participants are presented with images of different object categories. Results from both fMRI and MEG show that self-supervision was advantageous in comparison to classification training. In addition, language supervision is a better predictor for later stages of visual perception, while image-only supervision shows a consistent advantage over a longer duration, beginning from 80ms after exposure. Examination of the effect of training data size revealed that a large dataset did not necessarily improve neural predictions, in particular for self-supervised models. Finally, the correspondence of the hierarchy of each model to visual cortex showed that image only self-supervision led to better correspondence. We conclude that self-supervision shows consistently better prediction of fMRI and MEG recordings, and that each type of supervision reveals a different property of neural activity, with language-supervision explaining later onsets, while image-only self-supervision explains long and very early latencies of the neural response, naturally sharing corresponding hierarchical structure as the brain.

Language: English

Citations: 0

Animal models of the human brain: Successes, limitations, and alternatives
Nancy Kanwisher

Current Opinion in Neurobiology, Journal Year: 2025, Volume and Issue: 90, P. 102969 - 102969

Published: Feb. 1, 2025

Language: English

Citations: 0

A hierarchy of processing complexity and timescales for natural sounds in the human auditory cortex (Creative Commons)
Kyle Rupp, Jasmine L. Hect, Emily E. Harford, et al.

Proceedings of the National Academy of Sciences, Journal Year: 2025, Volume and Issue: 122(18)

Published: April 28, 2025

Efficient behavior is supported by humans’ ability to rapidly recognize acoustically distinct sounds as members of a common category. Within the auditory cortex, critical unanswered questions remain regarding the organization and dynamics of sound categorization. We performed intracerebral recordings during epilepsy surgery evaluation as 20 patient-participants listened to natural sounds. We then built encoding models to predict neural responses using sound representations extracted from different layers within a deep neural network (DNN) pretrained to categorize sounds from acoustics. This approach yielded accurate encoding models of neural responses throughout auditory cortex. The complexity of a cortical site’s representation (measured by the depth of the DNN layer that produced the best encoding model) was closely related to its anatomical location, with shallow, middle, and deep layers associated with core (primary auditory cortex), lateral belt, and parabelt regions, respectively. Smoothly varying gradients of representational complexity existed within these regions, with complexity increasing along a posteromedial-to-anterolateral direction in core and lateral belt and along posterior-to-anterior and dorsal-to-ventral dimensions in parabelt. We then characterized the time (relative to sound onset) when feature representations emerged; this measure of temporal dynamics increased across the auditory hierarchy. Finally, we found separable effects of region and temporal dynamics on representational complexity: sites that took longer to begin encoding stimulus features had higher representational complexity independent of region, and downstream regions encoded more complex features independent of temporal dynamics. These findings suggest that hierarchies of timescales and complexity represent a functional organizational principle of the auditory stream underlying our ability to rapidly categorize sounds.

Language: English

Citations: 0

Models optimized for real-world tasks reveal the task-dependent necessity of precise temporal coding in hearing (Creative Commons)
Mark R. Saddler, Josh H. McDermott

Nature Communications, Journal Year: 2024, Volume and Issue: 15(1)

Published: Dec. 4, 2024

Neurons encode information in the timing of their spikes in addition to their firing rates. Spike timing is particularly precise in the auditory nerve, where action potentials phase lock to sound with sub-millisecond precision, but its behavioral relevance remains uncertain. We optimized machine learning models to perform real-world hearing tasks with simulated cochlear input, assessing the precision of auditory nerve spike timing needed to reproduce human behavior. Models with high-fidelity phase locking exhibited more human-like sound localization and speech perception than models without, consistent with an essential role in human hearing. However, the temporal precision needed to reproduce human-like behavior varied across tasks, as did the precision that benefited real-world task performance. These effects suggest that perceptual domains incorporate phase locking to different extents depending on the demands of real-world hearing. The results illustrate how optimizing models for realistic tasks can clarify the role of candidate neural codes in perception.

Language: English

Citations: 2