Neural activity in auditory cortex tracks the amplitude envelope of continuous speech, but recent work counter-intuitively suggests that neural tracking increases when speech is masked by background noise, despite reduced speech intelligibility. Noise-related amplification could indicate that stochastic resonance (the response facilitation through noise) supports neural speech tracking. However, a comprehensive account of the sensitivity of neural tracking to background noise and of the role of cognitive investment is lacking. In five electroencephalography (EEG) experiments (N=109; both sexes), the current study demonstrates a generalized enhancement of neural speech tracking due to minimal background noise. Results show that a) neural speech tracking is enhanced for speech masked by background noise at very high SNRs (∼30 dB SNR) where speech is highly intelligible; b) this enhancement is independent of attention; c) it generalizes across different stationary background maskers, but is strongest for 12-talker babble; and d) it is present for headphone and free-field listening, suggesting that the neural-tracking enhancement generalizes to real-life listening. The work paints a clear picture that minimal background noise enhances the neural representation of the speech envelope, suggesting that stochastic resonance contributes to neural speech tracking. The work further highlights non-linearities of neural tracking induced by background noise that make its use as a biological marker for speech processing challenging.
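To make the "∼30 dB SNR" figure concrete, the following is a minimal sketch of how a masker could be scaled to a target signal-to-noise ratio. It is not taken from the study: the function names `mean_power` and `scale_noise_to_snr` are hypothetical, and the sketch assumes SNR is defined on mean signal power, SNR_dB = 10·log10(P_speech / P_noise).

```python
import math


def mean_power(samples):
    """Mean squared amplitude of a sampled signal."""
    return sum(x * x for x in samples) / len(samples)


def scale_noise_to_snr(speech, noise, snr_db):
    """Return a copy of `noise` scaled so that speech/noise power equals snr_db.

    Assumption: SNR is defined on mean power, SNR_dB = 10*log10(Ps/Pn).
    """
    target_noise_power = mean_power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    return [gain * x for x in noise]


# Toy signals (hypothetical values, for illustration only).
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, -0.5, 0.5, 0.5]
masker = scale_noise_to_snr(speech, noise, 30.0)
achieved_db = 10 * math.log10(mean_power(speech) / mean_power(masker))
```

Under this definition, a 30 dB SNR means the masker carries one thousandth of the speech power, which is why speech remains highly intelligible at that level.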
Text-to-Speech (TTS) and Voice Conversion (VC) models have exhibited remarkable performance in generating realistic and natural audio. However, their dark side, audio deepfakes, poses a significant threat to both society and individuals. Existing countermeasures largely focus on determining the genuineness of speech based on complete original audio recordings, which often contain private content. This oversight may keep deepfake detection out of many applications, particularly in scenarios involving sensitive information like business secrets. In this paper, we propose SafeEar, a novel framework that aims to detect deepfake audios without relying on accessing the speech content within. Our key idea is to devise a neural audio codec into a novel decoupling model that well separates the semantic and acoustic information from audio samples, and to use only the acoustic information (e.g., prosody and timbre) for deepfake detection. In this way, no semantic content will be exposed to the detector. To overcome the challenge of identifying diverse deepfake audio without semantic clues, we enhance our deepfake detector with real-world codec augmentation. Extensive experiments conducted on four benchmark datasets demonstrate SafeEar's effectiveness in detecting various deepfake techniques with an equal error rate (EER) down to 2.02%. Simultaneously, it shields five-language speech content from being deciphered by both machine and human auditory analysis, demonstrated by word error rates (WERs) all above 93.93% and our user study. Furthermore, our benchmark constructed for anti-deepfake and anti-content recovery evaluation helps provide a basis for future research in the realms of audio privacy preservation and deepfake detection.
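The equal error rate (EER) quoted above is the operating point at which the false-acceptance and false-rejection rates coincide. The following is a minimal sketch of how it could be approximated from detector scores. It is not SafeEar's evaluation code: the function name `equal_error_rate` and the toy scores are hypothetical, and the sketch assumes a higher score means "more likely bona fide".

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    Assumption: higher score = more likely bona fide (genuine) audio.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: bona fide audio scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed audio scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores (hypothetical): one bona fide and one spoofed sample overlap.
bonafide = [0.9, 0.8, 0.7, 0.4]
spoof = [0.1, 0.2, 0.3, 0.6]
eer = equal_error_rate(bonafide, spoof)  # 0.25 for these toy scores
```

With finite score lists the two error rates rarely cross exactly, so the sketch reports their average at the closest threshold; evaluation toolkits typically interpolate instead.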
Communications Biology, Journal Year: 2025, Volume and Issue: 8(1), Published: March 1, 2025
The role of early auditory experience in the development of neural speech tracking remains an open question. To address this issue, we measured neural speech tracking in children with or without functional hearing during their first year of life after their hearing was restored with cochlear implants (CIs), as well as in hearing controls (HC). Neural speech tracking in children with CIs is unaffected by the absence of perinatal auditory experience. CI users and HC exhibit a similar neural speech tracking magnitude at short timescales of brain activity. However, neural speech tracking is delayed in CI users, and its timing depends on the age of hearing restoration. Conversely, at longer timescales, neural speech tracking is dampened in participants using CIs, thereby accounting for their speech comprehension deficits. These findings highlight the resilience of sensory processing in speech tracking while also demonstrating the vulnerability of higher-level processing to the lack of early auditory experience.
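Neural speech tracking in children with cochlear implants shows that early-phase hearing loss affects short and longer timescales differently. Tracking is present but delayed at short timescales and weaker at longer ones, impacting comprehension.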
Neural activity in auditory cortex tracks the amplitude-onset envelope of continuous speech, but recent work counter-intuitively suggests that neural tracking increases when speech is masked by background noise, despite reduced speech intelligibility. Noise-related amplification could indicate that stochastic resonance (the response facilitation through noise) supports neural speech tracking, but a comprehensive account is lacking. In five human electroencephalography (EEG) experiments, the current study demonstrates a generalized enhancement of neural speech tracking due to minimal background noise. Results show that a) neural speech tracking is enhanced for speech masked by background noise at very high SNRs (∼30 dB SNR) where speech is highly intelligible; b) this enhancement is independent of attention; c) it generalizes across different stationary background maskers, but is strongest for 12-talker babble; and d) it is present for headphone and free-field listening, suggesting that the neural-tracking enhancement generalizes to real-life listening. The work paints a clear picture that minimal background noise enhances the neural representation of the speech onset-envelope, suggesting that stochastic resonance contributes to neural speech tracking. The work further highlights non-linearities of neural tracking induced by background noise that make its use as a biological marker for speech processing challenging.
PLoS Computational Biology, Journal Year: 2025, Volume and Issue: 21(4), P. e1013006, Published: April 28, 2025
In recent years, it has become clear that EEG indexes the comprehension of natural, narrative speech. One particularly compelling demonstration of this fact can be seen by regressing EEG responses to speech against measures of how individual words in that speech linguistically relate to their preceding context. This approach produces a so-called temporal response function that displays a centro-parietal negativity reminiscent of the classic N400 component of the event-related potential. One shortcoming of previous implementations of this approach is that they have typically assumed a linear, time-invariant relationship between the linguistic speech features and the EEG responses. In other words, the analysis typically assumes that the response has the same shape and timing for every word and only varies (linearly) in terms of its amplitude. In the present work, we relax this assumption under the hypothesis that individual words may be processed more rapidly when they are predictable. Specifically, we introduce a framework wherein the standard linear temporal response function can be modulated in terms of its amplitude, latency, and temporal scale based on the predictability of the current and prior words. We use the proposed approach to model EEG recorded from a set of participants who listened to an audiobook narrated by a single talker, and a separate set of participants who attended to one of two concurrently presented audiobooks. We show that expected words are processed faster, evoking lower amplitude N400-like responses with earlier peaks, and that this effect is driven both by the word’s own predictability and the predictability of the immediately preceding word. Additional analysis suggests that this finding is not simply explained based on how quickly words can be disambiguated from their phonetic neighbors. As such, our study demonstrates that the timing and amplitude of brain responses to words in natural speech depend on their predictability. By accounting for these effects, our framework also improves the accuracy with which neural responses to natural speech can be modeled.
Cerebral Cortex, Journal Year: 2024, Volume and Issue: 34(5), Published: May 1, 2024
Speech comprehension in noise depends on complex interactions between peripheral sensory and central cognitive systems. Despite having normal peripheral hearing, older adults show difficulties in speech comprehension. It remains unclear whether the brain’s neural responses could indicate aging. The current study examined whether individual brain activation during speech perception in different listening environments could predict age. We applied functional near-infrared spectroscopy to 93 normal-hearing human adults (20 to 70 years old) during a sentence listening task, which contained a quiet condition and noisy conditions at 4 different signal-to-noise ratios (SNR = 10, 5, 0, −5 dB). A data-driven approach, region-based brain-age predictive modeling, was adopted. We observed a significant behavioral decrease with age under the 4 noisy conditions, but not under the quiet condition. Brain activations in the SNR = 10 dB listening condition could successfully predict the individual’s age. Moreover, we found that the bilateral visual sensory cortex, left dorsal speech pathway, left cerebellum, right temporal–parietal junction area, right homolog of Wernicke’s area, and right middle temporal gyrus contributed most to prediction performance. These results demonstrate that the activations of regions about sensory-motor mapping of sound, especially in noisy conditions, could be more sensitive measures for age prediction than external behavior measures.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown, Published: Oct. 12, 2024
Robust neural encoding of speech in noise is influenced by several factors, including signal-to-noise ratio (SNR), speech intelligibility (SI), and attentional effort (AE). Yet, the interaction and distinct role of these factors remain unclear. In this study, fourteen native English speakers performed selective listening tasks at various SNR levels while EEG responses were recorded. Attentional performance was assessed using a repeated word detection task, and attentional effort was inferred from subjects' gaze velocity. Results indicate that both SNR and SI enhance neural tracking of target speech, with distinct effects influenced by the previously overlooked role of attentional effort. Specifically, at high levels of SI, increasing SNR leads to reduced attentional effort, which in turn decreases neural speech tracking. Our findings highlight the importance of differentiating the roles of SNR, SI, and AE in neural speech processing and advance our understanding of how noisy speech is processed in the auditory pathway.
Trends in Hearing, Journal Year: 2024, Volume and Issue: 28, Published: Jan. 1, 2024
During continuous speech perception, endogenous neural activity becomes time-locked to acoustic stimulus features, such as the speech amplitude envelope. This speech–brain coupling can be decoded using non-invasive brain imaging techniques, including electroencephalography (EEG). Neural decoding may provide clinical use as an objective measure of stimulus encoding by the brain, for example during cochlear implant listening, wherein the speech signal is severely spectrally degraded. Yet, interplay between acoustic and linguistic factors may lead to top-down modulation of perception, thereby complicating audiological applications. To address this ambiguity, we assess neural decoding of the speech envelope under spectral degradation with EEG in acoustically hearing listeners (n = 38; 18–35 years old) using vocoded speech. We dissociate sensory encoding from higher-order processing by employing intelligible (English) and non-intelligible (Dutch) stimuli, with auditory attention sustained using a repeated-phrase detection task. Subject-specific and group decoders were trained to reconstruct the speech envelope from held-out EEG data, with decoder significance determined via random permutation testing. Whereas speech envelope reconstruction did not vary by spectral resolution, intelligible speech was associated with better decoding accuracy in general. Results were similar across subject-specific and group analyses, with less consistent effects of spectral degradation in group decoding. Permutation tests revealed possible differences in decoder statistical significance by experimental condition. In general, while robust neural decoding was observed at the individual and group level, variability within participants would most likely prevent the clinical use of such a measure to differentiate levels of spectral degradation and intelligibility on an individual basis.
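The permutation testing mentioned above can be illustrated with a minimal sketch. It is not the study's pipeline: the names `pearson` and `permutation_p_value` are hypothetical, the toy sequences stand in for an actual and a reconstructed envelope, and the null distribution is built by shuffling individual samples, which is a simplifying assumption (shuffling ignores the temporal autocorrelation of real envelopes, so other schemes such as time-shifting or permuting trial labels may be preferable in practice).

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def permutation_p_value(reconstructed, actual, n_perm=1000, seed=0):
    """One-sided p-value for the reconstruction accuracy (correlation).

    Assumption: the null is built by shuffling samples of the actual envelope.
    """
    rng = random.Random(seed)
    observed = pearson(reconstructed, actual)
    shuffled = list(actual)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson(reconstructed, shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return observed, (exceed + 1) / (n_perm + 1)


# Toy data (hypothetical): a "reconstruction" that closely follows the envelope.
actual = [((i * 7) % 11) / 10 for i in range(60)]
reconstructed = [a + 0.01 * ((i % 3) - 1) for i, a in enumerate(actual)]
r_obs, p_val = permutation_p_value(reconstructed, actual)
```

A decoder would be called significant when the observed correlation exceeds most of the null correlations, that is, when the p-value falls below the chosen alpha level.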