The cerebral processing of voice information is known to engage, in human as well as non-human primates, “temporal voice areas” (TVAs) that respond preferentially to conspecific vocalizations. However, how voice information is represented by neuronal populations in these areas, particularly speaker identity information, remains poorly understood. Here, we used a deep neural network (DNN) to generate a high-level, small-dimension representational space for voice identity, the ‘voice latent space’ (VLS), and examined its linear relation with cerebral activity via encoding, representational similarity, and decoding analyses. We find that the VLS maps onto fMRI measures of cerebral activity in response to tens of thousands of voice stimuli from hundreds of different speaker identities and better accounts for the representational geometry for speaker identity in the TVAs than in A1. Moreover, the VLS allowed TVA-based reconstructions of voice stimuli that preserved essential aspects of speaker identity as assessed by both machine classifiers and human listeners. These results indicate that the DNN-derived VLS provides high-level representations of voice identity information in the TVAs.
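The representational similarity comparison named in this abstract can be illustrated with a small sketch. It is not the authors' pipeline: the toy data, the use of 1 minus Pearson correlation as the dissimilarity measure, and the function names `pearson`, `rdm`, and `rsa_score` are all assumptions made for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rdm(patterns):
    """Upper triangle of the dissimilarity matrix (1 - r) over all stimulus pairs."""
    pairs = combinations(range(len(patterns)), 2)
    return [1.0 - pearson(patterns[i], patterns[j]) for i, j in pairs]


def rsa_score(model_patterns, brain_patterns):
    """Correlate the model geometry with the (hypothetical) brain geometry."""
    return pearson(rdm(model_patterns), rdm(brain_patterns))


# Toy data (assumed): four stimuli, three latent dimensions each.
latent = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.2], [0.0, 0.9, 0.3]]
# Hypothetical "brain" patterns: a scaled and shifted copy of the latent patterns.
brain = [[2.0 * v + 1.0 for v in row] for row in latent]
score = rsa_score(latent, brain)
```

Because the toy brain patterns are a scaled and shifted copy of the latent patterns, the two geometries match and `score` comes out at essentially 1. Real data would give lower values, and comparing such scores across regions is one way to ask which region a model space describes better.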
Annual Review of Neuroscience,
Journal Year:
2024,
Volume and Issue:
47(1), P. 277 - 301
Published: April 26, 2024
It has long been argued that only humans could produce and understand language. But now, for the first time, artificial language models (LMs) achieve this feat. Here we survey the new purchase LMs are providing on the question of how language is implemented in the brain. We discuss why, a priori, LMs might be expected to share similarities with the human language system. We then summarize evidence that LMs represent linguistic information similarly enough to humans to enable relatively accurate brain encoding and decoding during language processing. Finally, we examine which LM properties (their architecture, task performance, or training) are critical for capturing human neural responses to language and review studies using LMs as in silico model organisms for testing hypotheses about language. These ongoing investigations bring us closer to understanding the representations and processes that underlie our ability to comprehend sentences and express thoughts in language.
PLoS ONE,
Journal Year:
2024,
Volume and Issue:
19(4), P. e0302394 - e0302394
Published: April 26, 2024
Digital speech recognition is a challenging problem that requires the ability to learn complex signal characteristics such as frequency, pitch, intensity, timbre, and melody, which traditional methods often struggle to recognize. This article introduces three solutions based on convolutional neural networks (CNNs) to solve the problem: 1D-CNN is designed to learn directly from digital data; 2DS-CNN and 2DM-CNN have a more complex architecture, transferring the raw waveform into transformed images using the Fourier transform to learn essential features. Experimental results on four large data sets, containing 30,000 samples for each, show that the proposed models achieve superior performance compared to well-known models such as GoogLeNet and AlexNet, with the best accuracy of 95.87%, 99.65%, and 99.76%, respectively. With 5-10% higher accuracy than other models, the proposed solution has demonstrated the ability to effectively learn features, improve speed, and open up the potential for broad applications in virtual assistants, medical recording, and voice commands.
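The step of turning a raw waveform into an image with the Fourier transform can be sketched as a short-time Fourier transform (a spectrogram). This is a minimal illustration only: the abstract does not give the authors' preprocessing, so the frame length, hop size, Hann window, and the function name `stft_magnitude` are all assumptions.

```python
import cmath
import math


def stft_magnitude(signal, frame_len, hop):
    """Return a list of frames, each a list of DFT magnitudes for bins 0..frame_len/2."""
    # Periodic Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT for clarity; an FFT would be used in practice.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Toy input (assumed): a 100 Hz tone sampled at 800 Hz.
sample_rate = 800
tone = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(256)]
spec = stft_magnitude(tone, frame_len=64, hop=32)

# Strongest frequency bin in each frame; bin k corresponds to k * sample_rate / frame_len Hz.
peak_bins = [max(range(len(f)), key=lambda k: f[k]) for f in spec]
```

Each row of `spec` is one time slice and each column one frequency bin, so the result can be treated as a two-dimensional image and passed to a 2D convolutional network. For the toy tone, every frame peaks at bin 8, which is 100 Hz at this sample rate and frame length.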
bioRxiv (Cold Spring Harbor Laboratory),
Journal Year:
2024,
Volume and Issue:
unknown
Published: April 25, 2024
ABSTRACT
Neurons encode information in the timing of their spikes in addition to their firing rates. Spike timing is particularly precise in the auditory nerve, where action potentials phase lock to sound with sub-millisecond precision, but its behavioral relevance remains uncertain. We optimized machine learning models to perform real-world hearing tasks with simulated cochlear input, assessing the precision of auditory nerve spike timing needed to reproduce human behavior. Models with high-fidelity phase locking exhibited more human-like sound localization and speech perception than models without, consistent with an essential role in human hearing. However, the temporal precision needed to reproduce human-like behavior varied across tasks, as did the precision that benefited real-world task performance. These effects suggest that perceptual domains incorporate phase locking to different extents depending on the demands of real-world hearing. The results illustrate how optimizing models for realistic tasks can clarify the role of candidate neural codes in perception.
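Phase locking of the kind described in this abstract is commonly quantified with vector strength, which treats each spike as a unit vector at its stimulus phase and measures the length of the average vector. The abstract does not say which measure the authors used, so the sketch below is an illustrative assumption, as are the function name `vector_strength` and the toy spike trains.

```python
import cmath
import math


def vector_strength(spike_times, freq_hz):
    """Vector strength in [0, 1]: 1 means perfect phase locking, near 0 means none."""
    if not spike_times:
        raise ValueError("need at least one spike")
    # Map each spike time (seconds) to a unit phasor at its phase within the stimulus cycle.
    phasors = [cmath.exp(2j * math.pi * freq_hz * t) for t in spike_times]
    return abs(sum(phasors)) / len(phasors)


# Toy spike trains (assumed) for a 100 Hz tone.
locked = [k * 0.01 for k in range(50)]   # one spike per cycle, always at the same phase
spread = [0.0, 0.0025, 0.005, 0.0075]    # spikes at four evenly spaced phases

vs_locked = vector_strength(locked, 100.0)
vs_spread = vector_strength(spread, 100.0)
```

Spikes that always fall at the same phase give a vector strength of essentially 1, while spikes spread evenly around the cycle cancel out to essentially 0. Degrading the temporal precision of simulated spikes would move a spike train from the first case toward the second.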
bioRxiv (Cold Spring Harbor Laboratory),
Journal Year:
2025,
Volume and Issue:
unknown
Published: Jan. 13, 2025
Abstract
With the rapid development of Artificial Neural Network based visual models, many studies have shown that these models show unprecedented power in predicting neural responses to images in the visual cortex. Lately, advances in computer vision have introduced self-supervised models, where a model is trained using supervision from natural properties of the training set. This has led to examination of their neural prediction performance, which revealed better prediction than supervised models for models trained with language or image-only supervision. In this work, we delve deeper into the models’ ability to explain neural representations of object categories. We compare models that differed in their training objectives to examine where they diverge in their ability to predict fMRI and MEG recordings while participants are presented with images of different object categories. Results from both fMRI and MEG show that self-supervision was advantageous in comparison to classification training. In addition, language supervision is a better predictor of later stages of visual perception, while [...] shows a consistent advantage over a longer duration, beginning 80 ms after exposure. Examination of the effect of training data size revealed that a large dataset did not necessarily improve neural predictions, in particular in [...] models. Finally, the correspondence of the hierarchy of each model to the visual cortex showed [...] image only [...]. We conclude that [...] consistently [...] recordings, each type of supervision reveals a different property of neural activity, with language-supervision explaining later onsets, while [...] explains long [...] very early latencies of the neural response, naturally sharing a corresponding hierarchical structure as the brain.
Proceedings of the National Academy of Sciences,
Journal Year:
2025,
Volume and Issue:
122(18)
Published: April 28, 2025
Efficient behavior is supported by humans’ ability to rapidly recognize acoustically distinct sounds as members of a common category. Within the auditory cortex, critical unanswered questions remain regarding the organization and dynamics of sound categorization. We performed intracerebral recordings during epilepsy surgery evaluation as 20 patient-participants listened to natural sounds. We then built encoding models to predict neural responses using representations extracted from different layers within a deep neural network (DNN) pretrained to categorize sounds from acoustics. This approach yielded accurate encoding models throughout auditory cortex. The complexity of a cortical site’s representation (measured by the depth of the DNN layer that produced the best encoding model) was closely related to its anatomical location, with shallow, middle, and deep layers associated with core (primary auditory cortex), lateral belt, and parabelt regions, respectively. Smoothly varying gradients of representational complexity existed within these regions, with complexity increasing along a posteromedial-to-anterolateral direction in core and lateral belt and along posterior-to-anterior and dorsal-to-ventral dimensions in parabelt. We then characterized the time (relative to sound onset) when feature representations emerged; this measure of temporal dynamics increased across the auditory hierarchy. Finally, we found separable effects of region and temporal dynamics on representational complexity: sites that took longer to begin encoding stimulus features had higher representational complexity independent of region, and downstream regions encoded more complex features independent of temporal dynamics. These findings suggest that hierarchies of timescales and complexity represent a functional organizational principle of the auditory stream underlying our ability to rapidly categorize sounds.
Nature Communications,
Journal Year:
2024,
Volume and Issue:
15(1)
Published: Dec. 4, 2024
Abstract
Neurons encode information in the timing of their spikes in addition to their firing rates. Spike timing is particularly precise in the auditory nerve, where action potentials phase lock to sound with sub-millisecond precision, but its behavioral relevance remains uncertain. We optimized machine learning models to perform real-world hearing tasks with simulated cochlear input, assessing the precision of auditory nerve spike timing needed to reproduce human behavior. Models with high-fidelity phase locking exhibited more human-like sound localization and speech perception than models without, consistent with an essential role in human hearing. However, the temporal precision needed to reproduce human-like behavior varied across tasks, as did the precision that benefited real-world task performance. These effects suggest that perceptual domains incorporate phase locking to different extents depending on the demands of real-world hearing. The results illustrate how optimizing models for realistic tasks can clarify the role of candidate neural codes in perception.