Trends in Hearing, Journal Year: 2024, Volume and Issue: 28, Published: Jan. 1, 2024
Decoding speech envelopes from electroencephalogram (EEG) signals holds potential as a research tool for objectively assessing auditory processing, which could contribute to future developments in hearing loss diagnosis. However, current methods struggle to meet both high accuracy and interpretability. We propose a deep learning model called the auditory decoding transformer (ADT) network for envelope reconstruction from EEG to address these issues. The ADT uses spatio-temporal convolution for feature extraction, followed by a transformer decoder to decode the speech envelopes. Through anticausal masking, the decoder considers only those EEG features that match the natural relationship between speech and EEG. Performance evaluation shows that the ADT achieves average reconstruction scores of 0.168 and 0.167 on the SparrKULee and DTU datasets, respectively, rivaling those of other nonlinear models. Furthermore, by visualizing the weights of the convolution layer as time-domain filters and brain topographies, combined with an ablation study of the temporal kernels, we analyze the behavioral patterns of the model. The results indicate that the low- (0.5-8 Hz) and high-frequency (14-32 Hz) bands are more critical, and that the active brain regions are primarily distributed bilaterally over the auditory cortex, consistent with previous research. Visualization of the attention weights further validated these findings. In summary, the ADT balances decoding performance and interpretability, making it promising for studying neural speech tracking.
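The abstract does not specify how the anticausal mask is implemented, so the following is a minimal sketch under the assumption that "anticausal" means the envelope at a given time step is decoded only from EEG features at that time or later (since the neural response lags the stimulus). The function name, dimensions, and use of PyTorch's built-in attention are illustrative, not the authors' code.

```python
import torch

def anticausal_mask(seq_len: int) -> torch.Tensor:
    """Boolean attention mask where position t may attend only to positions >= t
    (EEG lags the stimulus, so the envelope at time t is decoded from current and
    later EEG features). True entries are masked out."""
    # Strictly lower-triangular part = positions earlier than t, which get masked.
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=-1)

# Minimal usage with a standard multi-head attention layer.
attn = torch.nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
eeg_feats = torch.randn(2, 128, 64)          # (batch, time, feature) after a conv front-end
out, _ = attn(eeg_feats, eeg_feats, eeg_feats, attn_mask=anticausal_mask(128))
```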
Journal of Cognitive Neuroscience, Journal Year: 2024, Volume and Issue: 36(3), P. 475 - 491, Published: Jan. 1, 2024
Most parts of speech are voiced, exhibiting a degree of periodicity with a fundamental frequency and many higher harmonics. Some neural populations respond to this temporal fine structure, in particular at the fundamental frequency. This frequency-following response (FFR) consists of both subcortical and cortical contributions and can be measured through EEG as well as magnetoencephalography (MEG), although the two techniques differ in the aspects of neural activity that they capture: EEG is sensitive to radial and tangential sources as well as to deep sources, whereas MEG is more restrained to the measurement of superficial activity. EEG responses to continuous speech have shown an early contribution at a latency of around 9 msec, in agreement with measurements using short speech tokens, but MEG has not yet revealed such an early component. Here, we analyze MEG responses to long segments of continuous speech. We find early responses at latencies of 4–11 msec, followed by later right-lateralized activities at delays of 20–58 msec, as well as potential further activities at later delays. Our results show an early component of the FFR in the MEG recordings from participants, and its latency agrees with that obtained with EEG. They furthermore show temporally separated early and later contributions, enabling an independent assessment of these components and of their role in further auditory processing.
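The analysis pipeline of this study is not reproduced here; purely as an illustration of how the latency of a frequency-following component can be estimated in principle, the sketch below cross-correlates stimulus and response after band-pass filtering both around a typical fundamental-frequency range. The frequency band, maximum lag, and function names are assumptions for the example.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, correlate, correlation_lags

def f0_band(x, fs, lo=80.0, hi=300.0):
    """Band-pass around a typical range of speech fundamental frequencies (illustrative)."""
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

def ffr_latency_ms(stimulus, response, fs, max_lag_ms=60.0):
    """Latency (ms) at which the band-limited response best matches the band-limited
    stimulus, restricted to physiologically plausible positive lags."""
    s, r = f0_band(stimulus, fs), f0_band(response, fs)
    xc = correlate(r, s, mode="full")
    lags = correlation_lags(len(r), len(s), mode="full")
    keep = (lags >= 0) & (lags <= max_lag_ms * fs / 1000.0)
    return lags[keep][np.argmax(np.abs(xc[keep]))] * 1000.0 / fs
```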
Trends in Hearing, Journal Year: 2024, Volume and Issue: 28, Published: Jan. 1, 2024
The auditory brainstem response (ABR) is a valuable clinical tool for objective hearing assessment, which is conventionally detected by averaging neural responses to thousands of short stimuli. Progressing beyond these unnatural stimuli, subcortical responses to continuous speech presented via earphones have recently been detected using linear temporal response functions (TRFs). Here, we extend earlier studies by measuring subcortical responses in the sound-field, and assess the amount of data needed to estimate these TRFs. Electroencephalography (EEG) was recorded from 24 normal-hearing participants while they listened to clicks and stories presented via loudspeakers. Subcortical TRFs were computed after accounting for the non-linear processing of the auditory periphery, using either stimulus rectification or an auditory nerve model. Our results demonstrated that subcortical TRFs could be reliably measured in the sound-field. TRFs estimated with the auditory nerve model outperformed simple rectification, and 16 minutes of data were sufficient for all participants to show clear wave V peaks. TRFs in the sound-field condition were highly consistent with those in the earphone condition, and with click ABRs. However, the sound-field condition required slightly more data (16 minutes) to achieve this compared to the earphone condition (12 minutes), possibly due to effects of room acoustics. By investigating subcortical responses in the sound-field, this study lays the groundwork for bringing objective hearing assessment closer to real-life conditions, which may lead to improved hearing evaluations and smart hearing technologies.
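As a rough sketch of the "simple rectification" account of peripheral non-linearity mentioned above (the auditory-nerve-model alternative is not reproduced, and the sampling rates and function name are illustrative assumptions):

```python
import numpy as np
from scipy.signal import resample_poly

def rectified_predictor(speech: np.ndarray, fs_audio: int, fs_target: int) -> np.ndarray:
    """Half-wave rectify the broadband speech waveform and resample it to the rate
    used for subcortical TRF estimation; a crude stand-in for the non-linear
    processing of the auditory periphery."""
    rectified = np.maximum(speech, 0.0)          # half-wave rectification
    return resample_poly(rectified, fs_target, fs_audio)

# Example (illustrative rates): audio at 44.1 kHz, subcortical analysis at 4096 Hz.
# predictor = rectified_predictor(speech_waveform, 44100, 4096)
```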
Frontiers in Neuroscience, Journal Year: 2022, Volume and Issue: 16, Published: July 22, 2022
Spoken language comprehension requires rapid and continuous integration of information, from lower-level acoustic to higher-level linguistic features. Much of this processing occurs in the cerebral cortex. Its neural activity exhibits, for instance, correlates of predictive processing, emerging at delays of a few 100 ms. However, the auditory pathways are also characterized by extensive feedback loops from higher-level cortical areas to lower-level ones, as well as to subcortical structures. Early neural activity can therefore be influenced by cognitive processes, but it remains unclear whether such activity contributes to predictive processing. Here, we investigated the early speech-evoked neural activity that emerges at the fundamental frequency of speech. We analyzed EEG recordings obtained when subjects listened to a story read by a single speaker. We identified a response tracking the speaker's fundamental frequency that occurred at a delay of 11 ms, while another response elicited by the high-frequency modulation of the envelope of the higher harmonics exhibited a larger magnitude and a longer latency of about 18 ms, with an additional significant component around 40 ms. Notably, while the earlier components likely originate from subcortical structures, the latter presumably involves contributions from cortical regions. Subsequently, we determined these responses for each individual word of the story. We then quantified context-independent word-level features and used a language model to compute the context-dependent word surprisal and precision. The surprisal represented how predictable a word is, given the previous context, and the precision reflected the confidence in predicting the next word from the past context.
We found that the word-level responses were predominantly modulated by acoustic features: the average fundamental frequency and its variability. Amongst the linguistic features, only a weak modulation was observed. Our results show that this early speech-evoked response is already modulated at the level of individual words, suggesting a top-down influence on the response.
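The language model used by the authors is not named in this excerpt; the sketch below shows one common way to obtain context-dependent word surprisal, -log p(word | context), with an off-the-shelf causal language model. The choice of GPT-2 and the tokenisation details are purely illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")              # illustrative model choice
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def word_surprisal(context: str, word: str) -> float:
    """Surprisal of `word` given `context`: -log p(word | context),
    summed over the word's sub-word tokens."""
    ctx_ids = tok(context, return_tensors="pt").input_ids
    word_ids = tok(" " + word, add_special_tokens=False, return_tensors="pt").input_ids
    ids = torch.cat([ctx_ids, word_ids], dim=1)
    with torch.no_grad():
        logp = lm(ids).logits.log_softmax(dim=-1)
    # Log-probability of each word token given everything before it.
    n = word_ids.shape[1]
    token_logp = logp[0, -n - 1:-1, :].gather(1, word_ids[0].unsqueeze(1))
    return -token_logp.sum().item()
```

A precision-like measure could be derived analogously from the sharpness (for example, the negative entropy) of the same next-word distribution, though the authors' exact definition is not reproduced here.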
Journal of Neuroscience, Journal Year: 2023, Volume and Issue: 43(44), P. 7429 - 7440, Published: Oct. 4, 2023
Selective attention to one of several competing speakers is required for comprehending a target speaker among other voices and for successful communication with them. It has moreover been found to involve the neural tracking of low-frequency speech rhythms in the auditory cortex. Effects of selective attention have also been found in subcortical activities, in particular regarding the frequency-following response related to the fundamental frequency of speech (speech-FFR). Recent investigations have, however, shown that the speech-FFR contains cortical contributions as well. It remains unclear whether these are modulated by attention. Here we used magnetoencephalography to assess the attentional modulation of the cortical contributions to the speech-FFR. We presented both male and female participants with two competing speech signals and analyzed the neural responses during attentional switching between the speakers. Our findings revealed a robust attentional modulation of the cortical contribution to the speech-FFR: the responses were higher when the corresponding speaker was attended than when they were ignored. We also found that, regardless of attention, the voice with the lower fundamental frequency elicited a larger response than the voice with the higher fundamental frequency. Our results show that the attentional modulation of the speech-FFR does not only occur subcortically but extends to the auditory cortex as well.
IEEE Open Journal of Signal Processing, Journal Year: 2024, Volume and Issue: 5, P. 652 - 661, Published: Jan. 1, 2024
This paper describes the auditory EEG challenge, organized as one of the Signal Processing Grand Challenges at ICASSP 2023. The challenge provides recordings of 85 subjects who listened to continuous speech, audiobooks or podcasts, while their brain activity was recorded. Data from 71 subjects were provided as a training set so that challenge participants could train their models on a relatively large dataset. The remaining 14 subjects were used as a held-out set for evaluating the challenge. The challenge consists of two tasks that relate electroencephalogram (EEG) signals to the presented speech stimulus. The first task, match-mismatch, aims to determine which of several speech segments induced a given EEG segment.
In the second, regression, task the goal is to reconstruct the speech envelope from the EEG. For the match-mismatch task, the performance of the different teams was close to that of the baseline model and did generalize well to unseen subjects. In contrast, for the regression task, the top teams significantly improved over the baseline on the stories of the test set while failing to generalize to unseen subjects.
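The challenge baseline models are not reproduced here; the following is a minimal sketch of how a match-mismatch decision can be scored, assuming some backward model has already reconstructed an envelope from the EEG segment. All names are illustrative.

```python
import numpy as np

def pearson(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation between two equal-length 1-D signals."""
    a, b = a - a.mean(), b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def match_mismatch(reconstructed_env: np.ndarray,
                   candidate_a: np.ndarray,
                   candidate_b: np.ndarray) -> int:
    """Return 0 if candidate_a is the better (matched) envelope, else 1.
    `reconstructed_env` is an envelope decoded from the EEG segment by some
    backward model (not shown here)."""
    return int(pearson(reconstructed_env, candidate_b) >
               pearson(reconstructed_env, candidate_a))
```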
Frontiers in Neuroscience, Journal Year: 2022, Volume and Issue: 16, Published: Aug. 8, 2022
Voice pitch carries both linguistic and non-linguistic information. Previous studies have described cortical tracking of voice pitch in clean speech, with responses reflecting both the strength and the value of the pitch. However, pitch is also a powerful cue for auditory stream segregation, especially when competing streams differ in fundamental frequency, as is the case when multiple speakers talk simultaneously. We therefore investigated how the neural tracking of speech pitch is affected by the presence of a second, task-irrelevant speaker. We analyzed human magnetoencephalography (MEG) responses to continuous narrative speech, presented either by a single talker against a quiet background or as a two-talker mixture of a male and a female speaker. In clean speech, pitch was associated with a right-dominant response, peaking at a latency of around 100 ms, consistent with previous electroencephalography and electrocorticography results. The response tracked the relative value of the speaker's fundamental frequency. In the two-talker mixture, the pitch of the attended speaker was tracked bilaterally, regardless of whether or not there was a simultaneously present irrelevant speaker. Pitch tracking of the unattended speaker was reduced: only the right hemisphere still tracked it significantly, and only during intervals in which no speech from the attended talker was present. Taken together, these results suggest that pitch-based segregation of competing speakers, at least as measured by macroscopic neural tracking, is not entirely automatic but strongly dependent on selective attention.
IEEE Open Journal of Signal Processing, Journal Year: 2024, Volume and Issue: 5, P. 700 - 716, Published: Jan. 1, 2024
The electroencephalogram (EEG) offers a non-invasive means by which a listener's auditory system may be monitored during continuous speech perception. Reliable auditory-EEG decoders could facilitate the objective diagnosis of hearing disorders, or find applications in cognitively-steered hearing aids. Previously, we developed decoders for the ICASSP Auditory EEG Signal Processing Grand Challenge (SPGC). These decoders aimed to solve the match-mismatch task: given a short temporal segment of EEG recordings and two candidate speech segments, the task is to identify which of the speech segments is temporally aligned, or matched, with the EEG segment. The decoders made use of cortical responses to the speech envelope, as well as speech-related frequency-following responses, to relate the EEG recordings to the speech stimuli. Here we comprehensively document the methods by which the decoders were developed. We extend our previous analysis by exploring the association between speaker characteristics (pitch and sex) and classification accuracy, and provide a full statistical analysis of the final performance evaluated on the held-out portion of the dataset. Finally, the generalisation capabilities of the decoders are characterised by evaluating them using an entirely different dataset, which contains EEG recorded under a variety of speech-listening conditions. The results show that the decoders achieve accurate and robust classification accuracies, and that they can even serve as auditory attention decoders without additional training.
Frontiers in Neuroscience, Journal Year: 2023, Volume and Issue: 17, Published: Dec. 14, 2023
Auditory cortical responses to speech obtained by magnetoencephalography (MEG) show robust tracking of the speaker's fundamental frequency in the high-gamma band (70-200 Hz), but little is currently known about whether such responses depend on the focus of selective attention. In this study, 22 human subjects listened to concurrent, fixed-rate speech from male and female speakers, and were asked to selectively attend to one speaker at a time, while their neural responses were recorded with MEG. The male speaker's pitch range coincided with the lower part of the high-gamma band, whereas the female speaker's higher pitch range had much less overlap, only with the upper end of the band. Neural responses were analyzed using the temporal response function (TRF) framework. As expected, the responses demonstrate high-gamma tracking of the male speaker's speech, with a peak latency of ~40 ms. Critically, the magnitude of this response depends on attention: it is significantly greater when the speech is attended than when it is not attended, under acoustically identical conditions. This is a clear demonstration that even very early auditory responses are influenced by top-down, cognitive, processing mechanisms.
Frontiers in Neuroscience, Journal Year: 2021, Volume and Issue: 15, Published: Dec. 21, 2021
Linearized encoding models are increasingly employed to model cortical responses to running speech. Recent extensions to subcortical responses suggest clinical perspectives, potentially complementing auditory brainstem responses (ABRs) or frequency-following responses (FFRs) that are current clinical standards. However, while it is well-known that the auditory system responds to both transient amplitude variations and the stimulus periodicity that gives rise to pitch, these features co-vary in running speech. Here, we discuss the challenge of disentangling the features that drive the neural response to continuous speech. Cortical and subcortical electroencephalographic (EEG) responses to speech from 19 normal-hearing listeners (12 female) were analyzed. Using forward regression models, we confirm that responses to the rectified broadband speech signal yield temporal response functions consistent with wave V of the ABR, as shown in previous work. The peak latency of this speech-evoked response correlated with that of standard click-evoked ABRs recorded at the vertex electrode (Cz). Similar responses could be obtained using the fundamental frequency (F0) as a predictor. However, simulations indicated that dissociating responses to the temporal fine structure from those to the F0 is not possible, given the high co-variance of these features and the poor signal-to-noise ratio (SNR) of the EEG responses. In the cortex, the data replicated previous findings indicating that envelope tracking on frontal electrodes can be dissociated from responses to slow changes in F0 (relative pitch). Yet, no association between F0-tracking and responses to relative pitch was detected. These results indicate that, while speech-evoked responses comparable to ABRs can be obtained, isolating pitch-related processing may be challenging with natural stimuli.
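The exact regression pipeline of this study is not shown in the excerpt; below is a generic sketch of a forward linearized encoding model, i.e., time-lagged ridge regression of the EEG on a stimulus predictor (such as the rectified broadband signal or an F0-based predictor). The regularisation strength and lag range are chosen purely for illustration.

```python
import numpy as np

def lag_matrix(x: np.ndarray, min_lag: int, max_lag: int) -> np.ndarray:
    """Stack time-shifted copies of the 1-D predictor x, one column per lag
    (in samples), so that a linear fit over columns yields a TRF."""
    lags = np.arange(min_lag, max_lag + 1)
    X = np.zeros((len(x), len(lags)))
    for j, lag in enumerate(lags):
        if lag >= 0:
            X[lag:, j] = x[:len(x) - lag]
        else:
            X[:lag, j] = x[-lag:]
    return X

def fit_trf(x: np.ndarray, eeg: np.ndarray, min_lag: int, max_lag: int,
            ridge: float = 1e3) -> np.ndarray:
    """Ridge regression of EEG (time x channels) on the lagged predictor,
    returning a TRF of shape (lags x channels)."""
    X = lag_matrix(x, min_lag, max_lag)
    XtX = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ eeg)

# Example (illustrative): at a 4096 Hz sampling rate, lags of 0-30 ms cover a
# wave-V-like subcortical TRF.
# trf = fit_trf(predictor, eeg_data, min_lag=0, max_lag=int(0.030 * 4096))
```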