eLife assessment: Reconstructing Voice Identity from Noninvasive Auditory Cortex Recordings (Open Access)
Andrea E. Martin

Published: July 15, 2024

The cerebral processing of voice information is known to engage, in human as well as non-human primates, "temporal voice areas" (TVAs) that respond preferentially to conspecific vocalizations. However, how voice information is represented by neuronal populations in these areas, particularly speaker identity information, remains poorly understood. Here, we used a deep neural network (DNN) to generate a high-level, small-dimension representational space for voice identity, the 'voice latent space' (VLS), and examined its linear relation with cerebral activity via encoding, representational similarity, and decoding analyses. We find that the VLS maps onto fMRI measures of cerebral activity in response to tens of thousands of voice stimuli from hundreds of different speaker identities, and better accounts for the representational geometry for speaker identity in the TVAs than in A1. Moreover, the VLS allowed TVA-based reconstructions of voice stimuli that preserved essential aspects of speaker identity, as assessed by both machine classifiers and human listeners. These results indicate that the DNN-derived VLS provides high-level representations of voice identity information in the TVAs.

Language: English

The language network as a natural kind within the broader landscape of the human brain
Evelina Fedorenko, Anna A. Ivanova, Tamar I. Regev, et al.

Nature reviews. Neuroscience, Year: 2024, No. 25(5), pp. 289-312

Published: April 12, 2024

Language: English

Cited by

75

Language in Brains, Minds, and Machines
Greta Tuckute, Nancy Kanwisher, Evelina Fedorenko, et al.

Annual Review of Neuroscience, Year: 2024, No. 47(1), pp. 277-301

Published: April 26, 2024

It has long been argued that only humans could produce and understand language. But now, for the first time, artificial language models (LMs) achieve this feat. Here we survey the new purchase LMs are providing on the question of how language is implemented in the brain. We discuss why, a priori, LMs might be expected to share similarities with the human language system. We then summarize evidence that LMs represent linguistic information similarly enough to humans to enable relatively accurate brain encoding and decoding during language processing. Finally, we examine which LM properties (their architecture, task performance, or training) are critical for capturing human neural responses to language, and review studies using LMs as in silico model organisms for testing hypotheses about language. These ongoing investigations bring us closer to understanding the representations and processes that underlie our ability to comprehend sentences and express thoughts in language.

Language: English

Cited by

13

Animal models of the human brain: Successes, limitations, and alternatives
Nancy Kanwisher

Current Opinion in Neurobiology, Year: 2025, No. 90, p. 102969

Published: February 1, 2025

Language: English

Cited by

2

Crossmixed convolutional neural network for digital speech recognition (Creative Commons)
Quoc Bao Diep, Hong Yen Phan, Thanh Cong Truong, et al.

PLoS ONE, Year: 2024, No. 19(4), p. e0302394

Published: April 26, 2024

Digital speech recognition is a challenging problem that requires the ability to learn complex signal characteristics such as frequency, pitch, intensity, timbre, and melody, which traditional methods often have difficulty recognizing. This article introduces three solutions based on convolutional neural networks (CNN) to solve this problem: 1D-CNN is designed to learn directly from digital data; 2DS-CNN and 2DM-CNN have a more complex architecture, transferring the raw waveform into transformed images using the Fourier transform to learn essential features. Experimental results on four large data sets, containing 30,000 samples each, show that the three proposed models achieve superior performance compared to well-known models such as GoogLeNet and AlexNet, with best accuracies of 95.87%, 99.65%, and 99.76%, respectively. With 5-10% higher accuracy than other models, the proposed solution has demonstrated the ability to effectively learn features, improve recognition accuracy and speed, and open up the potential for broad applications in virtual assistants, medical recording, and voice commands.

Language: English

Cited by

7

Contextual feature extraction hierarchies converge in large language models and the brain
Gavin Mischler, Yinghao Aaron Li, Stephan Bickel, et al.

Nature Machine Intelligence, Year: 2024, No. unknown

Published: November 26, 2024

Language: English

Cited by

7

Neural processing of naturalistic audiovisual events in space and time (Creative Commons)
Yu Hu, Yalda Mohsenzadeh

Communications Biology, Year: 2025, No. 8(1)

Published: January 22, 2025

Language: English

Cited by

1

Models optimized for real-world tasks reveal the necessity of precise temporal coding in hearing (Creative Commons)
Mark R. Saddler, Josh H. McDermott

bioRxiv (Cold Spring Harbor Laboratory), Year: 2024, No. unknown

Published: April 25, 2024

Neurons encode information in the timing of their spikes in addition to their firing rates. Spike timing is particularly precise in the auditory nerve, where action potentials phase lock to sound with sub-millisecond precision, but its behavioral relevance remains uncertain. We optimized machine learning models to perform real-world hearing tasks with simulated cochlear input, assessing the precision of auditory nerve spike timing needed to reproduce human behavior. Models with high-fidelity phase locking exhibited more human-like sound localization and speech perception than models without, consistent with an essential role in human hearing. However, the temporal precision needed to reproduce human-like behavior varied across tasks, as did the precision that benefited real-world task performance. These effects suggest that perceptual domains incorporate phase locking to different extents depending on the demands of real-world hearing. The results illustrate how optimizing models for realistic tasks can clarify the role of candidate neural codes in perception.

Language: English

Cited by

5

Self-supervision deep learning models are better models of human high-level visual cortex: The roles of multi-modality and dataset training size (Creative Commons)
Idan Grosbard, Galit Yovel

bioRxiv (Cold Spring Harbor Laboratory), Year: 2025, No. unknown

Published: January 13, 2025

With the rapid development of Artificial Neural Network based visual models, many studies have shown that these models show unprecedented potency in predicting neural responses to images in visual cortex. Lately, advances in computer vision have introduced self-supervised models, where a model is trained using supervision from natural properties of the training set. This has led to examination of their neural prediction performance, which revealed better prediction than supervised models for models trained with language or image-only supervision. In this work, we delve deeper into the models' ability to explain neural representations of object categories. We compare models that differed in their training objectives to examine where they diverge in their ability to predict fMRI and MEG recordings while participants are presented with images of different object categories. Results from both fMRI and MEG show that self-supervision was advantageous in comparison to classification training. In addition, language supervision is a better predictor for later stages of visual perception, while image-only supervision shows a consistent advantage over a longer duration, beginning from 80 ms after exposure. Examination of the effect of training data size revealed that a large dataset did not necessarily improve neural predictions, in particular for [...] models. Finally, the correspondence between the hierarchy of each model and visual cortex showed [...] image-only [...]. We conclude that while self-supervision consistently shows better prediction of fMRI and MEG recordings, each type of supervision reveals a different property of neural activity, with language supervision explaining later onsets, while image-only self-supervision explains long and very early latencies of the neural response, naturally sharing a corresponding hierarchical structure with the brain.

Language: English

Cited by

0

A hierarchy of processing complexity and timescales for natural sounds in the human auditory cortex (Creative Commons)
Kyle Rupp, Jasmine L. Hect, Emily E. Harford, et al.

Proceedings of the National Academy of Sciences, Year: 2025, No. 122(18)

Published: April 28, 2025

Efficient behavior is supported by humans' ability to rapidly recognize acoustically distinct sounds as members of a common category. Within the auditory cortex, critical unanswered questions remain regarding the organization and dynamics of sound categorization. We performed intracerebral recordings during epilepsy surgery evaluation as 20 patient-participants listened to natural sounds. We then built encoding models to predict neural responses using sound representations extracted from different layers within a deep neural network (DNN) pretrained to categorize sounds from acoustics. This approach yielded accurate models of neural responses throughout auditory cortex. The complexity of a cortical site's representation (measured by the depth of the DNN layer that produced the best model) was closely related to its anatomical location, with shallow, middle, and deep layers associated with core (primary auditory cortex), lateral belt, and parabelt regions, respectively. Smoothly varying gradients of representational complexity existed within these regions, with complexity increasing along a posteromedial-to-anterolateral direction in core and lateral belt, and along posterior-to-anterior and dorsal-to-ventral dimensions in parabelt. We then characterized the time (relative to sound onset) when feature representations emerged; this measure of temporal dynamics increased across the auditory hierarchy. Finally, we found separable effects of region and temporal dynamics on representational complexity: sites that took longer to begin encoding stimulus features had higher representational complexity independent of region, and downstream regions encoded more complex features independent of temporal dynamics. These findings suggest that hierarchies of timescales and complexity represent a functional organizational principle of the auditory stream underlying our ability to rapidly categorize sounds.

Language: English

Cited by

0

A large annotated dataset of vocalizations by common marmosets (Creative Commons)
Charly Lamothe, Manon Obliger-Debouche, Paul Best, et al.

Scientific Data, Year: 2025, No. 12(1)

Published: May 13, 2025

Non-human primates, our closest relatives, use a wide range of complex vocal signals for communication within their species. Previous research on marmoset (Callithrix jacchus) vocalizations has been limited by sampling rates not covering the whole hearing range and by insufficient labeling for advanced analyses using Deep Neural Networks (DNNs). Here, we provide a database of common marmoset vocalizations, which were continuously recorded with a sampling rate of 96 kHz from an animal holding facility housing simultaneously ~20 marmosets in three cages. The dataset comprises more than 800,000 files, amounting to 253 hours of data collected over 40 months. Each recording lasts a few seconds and captures the marmosets' social vocalizations, encompassing their entire known vocal repertoire during the experimental period. Around 215,000 calls are annotated with the vocalization type. We offer a trained classifier to assist future investigations. Finally, 700 representative recordings were validated by cross-examining them with four experts.

Language: English

Cited by

0