bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown, Published: Aug. 24, 2023
Seeing the speaker's face greatly improves our speech comprehension in noisy environments. This is due to the brain's ability to combine the auditory and visual information around us, a process known as multisensory integration. Selective attention also strongly influences what we comprehend in scenarios with multiple speakers, an effect known as the cocktail-party phenomenon. However, the interaction between attention and multisensory integration is not fully understood, especially when it comes to natural, continuous speech. In a recent electroencephalography (EEG) study, we explored this issue and showed that multisensory integration is enhanced when an audiovisual speaker is attended compared to when that speaker is unattended. Here, we extend that work to investigate how this effect varies depending on a person's gaze behavior, which affects the quality of the visual information they have access to. To do so, we recorded EEG from 31 healthy adults as they performed selective attention tasks in several paradigms involving two concurrently presented speakers. We then modeled how the recorded EEG related to the audio speech (envelope) of the presented speakers. Crucially, we compared two classes of model: one that assumed underlying multisensory (AV) processing versus another that assumed independent unisensory processes (A+V). This comparison revealed evidence of strong attentional effects on multisensory integration when participants were looking directly at the speaker. This effect was not apparent when the speaker was in the peripheral vision of the participants. Overall, our findings suggest a strong influence of attention on multisensory integration when high-fidelity visual (articulatory) speech information is available. More generally, this suggests that the interplay between attention and multisensory integration during natural audiovisual speech is dynamic and adaptable based on the specific task and environment.
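The AV versus A+V model comparison described in this abstract can be illustrated with a minimal forward-encoding sketch. The snippet below is not the authors' implementation: it uses synthetic data, a single simulated EEG channel, and plain ridge regression over lagged audio and visual features, comparing a model fit to the combined audiovisual design matrix against the sum of independently fit unisensory models.

```python
import numpy as np

def lag_matrix(x, n_lags):
    """Design matrix of time-lagged copies of a 1-D stimulus feature."""
    X = np.zeros((len(x), n_lags))
    for k in range(n_lags):
        X[k:, k] = x[:len(x) - k]
    return X

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge-regression weights."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(0)
n, n_lags = 5000, 32
audio_env = rng.standard_normal(n)                  # hypothetical speech envelope
visual_feat = rng.standard_normal(n)                # hypothetical lip-movement feature
eeg = (0.4 * np.roll(audio_env, 6)                  # simulated EEG channel containing
       + 0.3 * np.roll(visual_feat, 10)             # delayed traces of both features
       + rng.standard_normal(n))

A, V = lag_matrix(audio_env, n_lags), lag_matrix(visual_feat, n_lags)

# AV model: one forward model fit to the combined audiovisual design matrix.
pred_av = np.hstack([A, V]) @ ridge_fit(np.hstack([A, V]), eeg)

# A+V model: unisensory models fit independently, their predictions summed.
pred_apv = A @ ridge_fit(A, eeg) + V @ ridge_fit(V, eeg)

# Compare how well each model class predicts the EEG (cross-validation would
# be used in practice; training-set r is shown here for brevity).
for name, pred in [("AV", pred_av), ("A+V", pred_apv)]:
    print(f"{name:3s} model prediction r = {np.corrcoef(pred, eeg)[0, 1]:.3f}")
```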
Journal of Neuroscience, Journal Year: 2024, Volume and Issue: 44(10), P. e0870232023 - e0870232023, Published: Jan. 10, 2024
During communication in real-life settings, our brain often needs to integrate auditory and visual information while at the same time actively focusing on relevant sources of information and ignoring interference from irrelevant events. The interaction between integration and attention processes remains poorly understood. Here, we use rapid invisible frequency tagging and magnetoencephalography to investigate how attention affects the processing and integration of auditory and visual information during multimodal communication. We presented human participants (male and female) with videos of an actress uttering action verbs (auditory; tagged at 58 Hz) accompanied by two movie clips of hand gestures on both sides of fixation (attended stimulus tagged at 65 Hz; unattended stimulus tagged at 63 Hz). Integration difficulty was manipulated by a lower-order auditory factor (clear/degraded speech) and a higher-order semantic factor (matching/mismatching gesture). We observed an enhanced neural response to the attended stimulus for degraded speech compared to clear speech, and for mismatching compared to matching gestures. Furthermore, signal power at the intermodulation frequencies of the tags, indexing nonlinear signal interactions, was enhanced in left frontotemporal and frontal regions. Focusing on the left inferior frontal gyrus, this enhancement was specific to those trials that benefitted from the gesture. Together, these results suggest that attention modulates audiovisual processing and interaction, depending on the congruence and quality of the sensory input.
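As a rough illustration of the frequency-tagging logic, the sketch below computes spectral power at the tag frequencies named in the abstract (58, 63, 65 Hz) and at candidate intermodulation frequencies (e.g., 65 - 58 = 7 Hz and 65 + 58 = 123 Hz) from a simulated sensor time series. The actual study used MEG with rapid invisible frequency tagging; only the frequencies come from the abstract, everything else here is an assumption.

```python
import numpy as np

fs = 1000.0                      # assumed sampling rate (Hz)
t = np.arange(0, 10, 1 / fs)     # 10 s of simulated sensor data
rng = np.random.default_rng(1)

# Simulated sensor signal: tagged responses plus a weak multiplicative
# interaction term, which produces intermodulation components.
audio_tag, attended_tag, unattended_tag = 58.0, 65.0, 63.0
sig = (np.sin(2 * np.pi * audio_tag * t)
       + 0.8 * np.sin(2 * np.pi * attended_tag * t)
       + 0.6 * np.sin(2 * np.pi * unattended_tag * t)
       + 0.2 * np.sin(2 * np.pi * audio_tag * t) * np.sin(2 * np.pi * attended_tag * t)
       + rng.standard_normal(t.size))

# Power spectrum via FFT.
freqs = np.fft.rfftfreq(sig.size, d=1 / fs)
power = np.abs(np.fft.rfft(sig)) ** 2

def power_at(f):
    """Spectral power at the frequency bin closest to f."""
    return power[np.argmin(np.abs(freqs - f))]

for label, f in [("audio tag", 58), ("attended tag", 65), ("unattended tag", 63),
                 ("intermodulation f2 - f1", 65 - 58), ("intermodulation f2 + f1", 65 + 58)]:
    print(f"{label:24s} {f:6.1f} Hz  power = {power_at(f):.1f}")
```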
Trends in Hearing, Journal Year: 2025, Volume and Issue: 29, Published: April 1, 2025
Speech intelligibility in challenging listening environments relies on the integration of audiovisual cues. Measuring the effectiveness of these cues can be difficult due to the complexity of such environments. The Audiovisual True-to-Life Assessment of Auditory Rehabilitation (AVATAR) is a paradigm that was developed to provide an ecological environment that captures both the audio and visual aspects of speech measures. Previous research has shown an audiovisual benefit from these cues measured using behavioral (e.g., word recognition) and electrophysiological (e.g., neural tracking) measures. The current study examines whether, when using the AVATAR paradigm, the electrophysiological measures yield similar outcomes as the behavioral ones. We hypothesized that visual cues would enhance scores as the signal-to-noise ratio (SNR) of the audio signal decreased. Twenty young (18-25 years old) participants (1 male, 19 female) with normal hearing participated in our study. For the experiment, we administered lists of sentences in an adaptive procedure to estimate the speech reception threshold (SRT). A total of 35 lists were randomized across five SNR levels (silence, 0, -3, -6, and -9 dB) and two conditions (audio-only and audiovisual). We used a neural tracking decoder to measure envelope reconstruction accuracies for each participant. We observed that most participants had higher scores in the audiovisual condition compared to the audio-only condition at moderate and high levels of noise. We found that the neural tracking measure may correlate with the behavioral measure and shows an audiovisual benefit.
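The neural-tracking measure referred to here (a backward decoder yielding envelope reconstruction accuracies) can be sketched generically as below. This is a linear stimulus-reconstruction example on synthetic data, with assumed sampling rate, channel count, lag window, and regularization; it is not the pipeline used in the study.

```python
import numpy as np

rng = np.random.default_rng(2)
fs = 64                              # assumed EEG sampling rate after downsampling (Hz)
n_samples, n_channels = fs * 120, 32
n_lags = int(0.25 * fs)              # assumed decoder integration window (~250 ms)

envelope = np.abs(rng.standard_normal(n_samples))     # hypothetical speech envelope
eeg = rng.standard_normal((n_samples, n_channels))    # hypothetical EEG
eeg[:, 0] += 0.5 * envelope                           # embed some tracking to recover

def design(eeg, n_lags):
    """Stack time-lagged copies of every channel into one design matrix."""
    n = eeg.shape[0]
    cols = []
    for k in range(n_lags):
        shifted = np.zeros_like(eeg)
        shifted[:n - k] = eeg[k:]    # EEG at time t + k maps back to the envelope at t
        cols.append(shifted)
    return np.hstack(cols)

X = design(eeg, n_lags)
half = n_samples // 2                # crude train/test split for the sketch
w = np.linalg.solve(X[:half].T @ X[:half] + 100.0 * np.eye(X.shape[1]),
                    X[:half].T @ envelope[:half])      # ridge-regularized decoder

recon = X[half:] @ w
accuracy = np.corrcoef(recon, envelope[half:])[0, 1]   # reconstruction accuracy
print(f"envelope reconstruction accuracy r = {accuracy:.3f}")
```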
Scientific Reports, Journal Year: 2025, Volume and Issue: 15(1), Published: April 15, 2025
Understanding speech in noise can be facilitated by integrating auditory and visual cues. Audiovisual temporal acuity, which is indexed by the temporal binding window (TBW), is critical for this process and can be enhanced through simultaneity judgment training. We hypothesized that multisensory training would narrow the TBW and improve speech understanding in noise. Participants were randomized to receive either training and testing (n = 15) or testing only over three days. Trained participants demonstrated significant narrowing of their mean TBW size (403 ms to 345 ms; p = 0.030), whereas control participants did not (409 ms to 474 ms; p = 0.061). Although there were no group-level changes in word recognition scores, trained participants with larger decreases in TBW size exhibited greater improvements (R² = 0.291; p = 0.038). Individual differences in training responses were found to be related to cortical processing measured using functional near-infrared spectroscopy. Low audiovisual-evoked activity in the left middle temporal gyrus (R² = 0.87; p = 0.006), angular gyrus (R² = 0.85), and superior temporal cortices (R² = 0.74; p = 0.041) was associated with greater improvement after training. Multisensory training transfers benefits to speech comprehension in noise, an effect that may be mediated by upregulating these networks in individuals with low baseline activity.
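The temporal binding window metric used above is typically derived by fitting a psychometric curve to simultaneity-judgment data. The sketch below fits a Gaussian to the proportion of "simultaneous" responses across audiovisual stimulus-onset asynchronies and reads off a window width; the SOA grid, response data, and full-width-at-half-maximum criterion are illustrative assumptions, not the study's procedure.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical proportion of "simultaneous" responses at each audiovisual
# stimulus-onset asynchrony (negative = auditory leading), in milliseconds.
soas = np.array([-400, -300, -200, -100, 0, 100, 200, 300, 400], dtype=float)
p_simultaneous = np.array([0.10, 0.25, 0.55, 0.85, 0.95, 0.90, 0.70, 0.35, 0.15])

def gaussian(soa, amp, mu, sigma):
    """Gaussian model of the simultaneity-judgment curve."""
    return amp * np.exp(-(soa - mu) ** 2 / (2 * sigma ** 2))

(amp, mu, sigma), _ = curve_fit(gaussian, soas, p_simultaneous, p0=[1.0, 0.0, 150.0])

# One common convention: TBW = width of the fitted curve at a fixed criterion
# (here, the full width at half of the fitted amplitude).
tbw = 2.0 * sigma * np.sqrt(2.0 * np.log(2.0))
print(f"fitted centre = {mu:.0f} ms, TBW (FWHM) = {tbw:.0f} ms")
```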
PLoS Computational Biology, Journal Year: 2025, Volume and Issue: 21(4), P. e1013006 - e1013006, Published: April 28, 2025
In recent years, it has become clear that EEG indexes the comprehension of natural, narrative speech. One particularly compelling demonstration of this fact can be seen by regressing EEG responses to speech against measures of how individual words in that speech linguistically relate to their preceding context. This approach produces a so-called temporal response function that displays a centro-parietal negativity reminiscent of the classic N400 component of the event-related potential. One shortcoming of previous implementations of this approach is that they have typically assumed a linear, time-invariant relationship between the linguistic features and the EEG responses. In other words, the analysis assumes the same response shape and timing for every word, with the response varying only (linearly) in terms of its amplitude. In the present work, we relax this assumption under the hypothesis that words may be processed more rapidly when they are more predictable. Specifically, we introduce a framework wherein the standard linear temporal response function can be modulated in its amplitude, latency, and temporal scale based on the predictability of the current and prior words. We use the proposed model to analyze EEG recorded from one set of participants who listened to an audiobook narrated by a single talker, and a separate set of participants who attended to one of two concurrently presented audiobooks. We show that expected words are processed faster, evoking lower-amplitude N400-like responses with earlier peaks, and that this effect is driven both by a word's own predictability and by that of the immediately preceding word. Additional analysis suggests that this finding is not simply explained by how quickly words can be disambiguated from their phonetic neighbors. As such, our study demonstrates that the brain's processing of natural speech depends on predictability. By accounting for these effects, our framework also improves the accuracy with which the neural responses are modeled.
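To make the "relaxed" modeling idea concrete, the sketch below shows how a canonical word-level response kernel might be shifted earlier, narrowed, and scaled down for more predictable words before being added at each word onset in a forward simulation. This is only a schematic illustration under assumed parameters; it is not the fitting framework introduced in the paper.

```python
import numpy as np

fs = 128                                   # assumed sampling rate (Hz)
kernel_t = np.arange(0, 0.8, 1 / fs)       # 0-800 ms response window

def n400_kernel(latency_s, scale, amplitude):
    """Negative-going kernel whose peak latency, width, and amplitude vary."""
    return -amplitude * np.exp(-((kernel_t - latency_s) ** 2)
                               / (2 * (0.1 * scale) ** 2))

rng = np.random.default_rng(3)
n_words, duration_s = 40, 30
onsets = np.sort(rng.uniform(0, duration_s - 1, n_words))   # word onset times (s)
surprisal = rng.uniform(0, 1, n_words)                      # 0 = predictable, 1 = surprising

eeg = np.zeros(int(duration_s * fs))
for onset, s in zip(onsets, surprisal):
    # Predictable words: earlier, narrower, lower-amplitude response (assumed mapping).
    kernel = n400_kernel(latency_s=0.30 + 0.10 * s,
                         scale=0.8 + 0.4 * s,
                         amplitude=0.5 + 1.0 * s)
    i = int(onset * fs)
    eeg[i:i + kernel.size] += kernel[:eeg.size - i]

print(f"simulated word-level EEG: {eeg.size} samples at {fs} Hz, {n_words} words")
```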
PLoS ONE, Journal Year: 2025, Volume and Issue: 20(5), P. e0320519 - e0320519, Published: May 8, 2025
Music and speech encode hierarchically organized structural complexity at the service of human expressiveness and communication. Previous research has shown that populations of neurons in auditory regions track the envelope of acoustic signals within the range of slow and fast oscillatory activity. However, the extent to which cortical tracking is influenced by the interplay between stimulus type, frequency band, and brain anatomy remains an open question. In this study, we reanalyzed intracranial recordings from thirty subjects implanted with electrocorticography (ECoG) grids over the left cerebral hemisphere, drawn from an existing open-access ECoG database. Participants passively watched a movie in which visual scenes were accompanied by either music or speech stimuli. Cross-correlation between cortical activity and the acoustic signals, along with density-based clustering analyses and linear mixed-effects modeling, revealed both anatomically overlapping and functionally distinct mapping of the tracking effect as a function of stimulus type and frequency band. We observed widespread left-hemisphere tracking in the Slow Frequency Band (SFB, band-pass filtered low-frequency signal, 1–8 Hz), at near-zero temporal lags. In contrast, tracking in the High Frequency Band (HFB, 70–120 Hz signal) was higher during speech perception, was more densely concentrated in classical language processing areas, and showed a frontal-to-temporal gradient of lag values that was not present during the perception of musical stimuli. Our results highlight the complex interaction between cortical region and frequency band that shapes the dynamics of cortical tracking of naturalistic signals.
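The band-specific cross-correlation analysis can be sketched as follows: band-pass one channel into the slow band (1–8 Hz), take the envelope of the high-frequency band (70–120 Hz), and locate the lag of maximal cross-correlation with the acoustic envelope. Only the band limits come from the abstract; the sampling rate, filter design, and synthetic signals below are assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert, correlate, correlation_lags

fs = 500.0                                           # assumed ECoG sampling rate (Hz)
t = np.arange(0, 60, 1 / fs)
rng = np.random.default_rng(4)

acoustic_env = np.abs(hilbert(rng.standard_normal(t.size)))   # stand-in acoustic envelope
ecog = rng.standard_normal(t.size) + 0.3 * np.roll(acoustic_env, int(0.1 * fs))

def bandpass(x, lo, hi):
    """Zero-phase 4th-order Butterworth band-pass filter."""
    b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

sfb = bandpass(ecog, 1, 8)                           # Slow Frequency Band signal
hfb_env = np.abs(hilbert(bandpass(ecog, 70, 120)))   # High Frequency Band envelope

def peak_lag(x, y):
    """Lag (in ms) at which the normalized cross-correlation of x and y peaks."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    c = correlate(x, y, mode="full") / x.size
    lags = correlation_lags(x.size, y.size, mode="full")
    return lags[np.argmax(np.abs(c))] / fs * 1000.0

print(f"SFB peak lag vs acoustic envelope: {peak_lag(sfb, acoustic_env):.0f} ms")
print(f"HFB-envelope peak lag vs acoustic envelope: {peak_lag(hfb_env, acoustic_env):.0f} ms")
```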
Brain Research Bulletin, Journal Year: 2023, Volume and Issue: 205, P. 110817 - 110817, Published: Nov. 19, 2023
Sensory deprivation can offset the balance of audio versus visual information in multimodal processing. Such a phenomenon could persist for children born deaf, even after they receive cochlear implants (CIs), and potentially explain why one modality is given priority over the other. Here, we recorded cortical responses to a single speaker uttering two syllables, presented in audio-only (A), visual-only (V), and audio-visual (AV) modes. Electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS) were successively recorded in seventy-five school-aged children. Twenty-five had normal hearing (NH) and fifty wore CIs, among whom 26 had relatively high language abilities (HL) comparable to those of the NH children, while the 24 others had low language abilities (LL). In the EEG data, visual-evoked potentials were captured in occipital regions in response to V and AV stimuli, and were accentuated in the HL group compared to the LL group (the NH group being intermediate). Close to the vertex, auditory-evoked responses to A and AV stimuli reflected a differential treatment of the two syllables, but only in the NH group. None of the EEG metrics revealed any interaction between group and modality. In the fNIRS data, each modality induced activity in the corresponding visual or auditory regions, but no difference was observed between A, V, and AV stimulation. The present study did not reveal any sign of abnormal audiovisual integration in children with CIs. An efficient integrative network (at least for rudimentary speech materials) is clearly not a sufficient condition for exhibiting good literacy.
Heliyon, Journal Year: 2024, Volume and Issue: 10(15), P. e34860 - e34860, Published: July 20, 2024
Face masks provide fundamental protection against the transmission of respiratory viruses but hamper communication. We estimated the auditory and visual obstacles generated by face masks on communication by measuring the neural tracking of speech. To this end, we recorded EEG while participants were exposed to naturalistic audio-visual speech, embedded in 5-talker noise, in three contexts: (i) no mask (audio-visual information fully available), (ii) virtual mask (occluded lips, intact audio), and (iii) real mask (occluded lips and degraded audio). Neural tracking of lip movements and of the sound envelope of speech was measured through backward modeling, that is, by reconstructing stimulus properties from neural activity. Behaviorally, face masks increased perceived listening difficulty and phonological errors in speech content retrieval. At the neural level, we observed that occlusion of the mouth abolished lip tracking and dampened neural tracking of the speech envelope at the earliest processing stages. By contrast, the acoustic filtering related to the mask altered neural tracking at later processing stages. Finally, a consistent link emerged between the increment in perceived listening difficulty and the drop in reconstruction performance when attending to a speaker wearing a face mask. Results clearly dissociated the visual and auditory impact of face masks on the neural tracking of speech: while the visual obstacle hampered the ability to predict and integrate audio-visual speech, the acoustic filter impacted neural processing stages typically associated with auditory selective attention. The study also provides evidence of metacognitive levels subtending face-to-face communication.
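Because this abstract links a behavioral increment to a drop in reconstruction performance, a minimal sketch of that kind of condition comparison is shown below, using made-up per-participant reconstruction accuracies for the three mask contexts and a paired nonparametric test. None of these numbers come from the study; the sample size and effect sizes are assumptions.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(5)
n_participants = 20                                   # assumed sample size

# Hypothetical envelope-reconstruction accuracies (Pearson r) per condition.
no_mask = rng.normal(0.12, 0.03, n_participants)
virtual_mask = no_mask - rng.normal(0.02, 0.01, n_participants)   # occluded lips
real_mask = no_mask - rng.normal(0.04, 0.01, n_participants)      # occluded lips + degraded audio

# Paired comparison of reconstruction performance across mask contexts.
for label, cond in [("virtual mask", virtual_mask), ("real mask", real_mask)]:
    stat, p = wilcoxon(no_mask, cond)
    print(f"no-mask vs {label}: median drop = "
          f"{np.median(no_mask - cond):.3f}, Wilcoxon p = {p:.4f}")
```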
Frontiers in Human Neuroscience, Journal Year: 2023, Volume and Issue: 17, Published: Dec. 15, 2023
Seeing the speaker's face greatly improves our speech comprehension in noisy environments. This is due to the brain's ability to combine the auditory and visual information around us, a process known as multisensory integration. Selective attention also strongly influences what we comprehend in scenarios with multiple speakers, an effect known as the cocktail-party phenomenon. However, the interaction between attention and multisensory integration is not fully understood, especially when it comes to natural, continuous speech. In a recent electroencephalography (EEG) study, we explored this issue and showed that multisensory integration is enhanced when an audiovisual speaker is attended compared to when that speaker is unattended. Here, we extend that work to investigate how this effect varies depending on a person's gaze behavior, which affects the quality of the visual information they have access to. To do so, we recorded EEG from 31 healthy adults as they performed selective attention tasks in several paradigms involving two concurrently presented speakers. We then modeled how the recorded EEG related to the audio speech (envelope) of the presented speakers. Crucially, we compared two classes of model: one that assumed underlying multisensory (AV) processing versus another that assumed independent unisensory processes (A+V). This comparison revealed evidence of strong attentional effects on multisensory integration when participants were looking directly at the speaker. This effect was not apparent when the speaker was in the peripheral vision of the participants. Overall, our findings suggest a strong influence of attention on multisensory integration when high-fidelity visual (articulatory) speech information is available. More generally, this suggests that the interplay between attention and multisensory integration during natural audiovisual speech is dynamic and adaptable based on the specific task and environment.
Journal of Neuroscience, Journal Year: 2023, Volume and Issue: 43(25), P. 4697 - 4708, Published: May 23, 2023
Previous work has demonstrated that performance in an auditory selective attention task can be enhanced or impaired, depending on whether a task-irrelevant visual stimulus is temporally coherent with a target auditory stream or with a competing distractor. However, it remains unclear how audiovisual (AV) temporal coherence and auditory selective attention interact at the neurophysiological level. Here, we measured neural activity using EEG while human participants (men and women) performed an auditory selective attention task, detecting deviants in a target audio stream. The amplitude envelopes of the two competing auditory streams changed independently, while the radius of a visual disk was manipulated to control AV coherence. Analysis of the neural responses to the sound envelope demonstrated that auditory responses were enhanced largely independently of attentional condition: both target and masker streams showed enhanced responses when they were temporally coherent with the visual stimulus. In contrast, attention enhanced the event-related response evoked by the transient deviants, largely independently of AV coherence. These results provide evidence for dissociable neural signatures of bottom-up (coherence) and top-down (attention) effects in AV object formation.
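The two neural measures contrasted in this abstract (responses tracking the sound envelope versus event-related responses to transient deviants) map onto two standard analyses, sketched schematically below on synthetic data: a lagged linear (TRF-style) fit to the amplitude envelope and a simple epoch average time-locked to deviant events. The sampling rate, lag window, deviant timing, and data are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
fs, dur = 128, 120
n = fs * dur
envelope = np.abs(rng.standard_normal(n))             # hypothetical target-stream envelope
eeg = 0.3 * np.roll(envelope, int(0.1 * fs)) + rng.standard_normal(n)

# (1) Envelope response: lagged ridge regression of EEG on the amplitude envelope.
n_lags = int(0.4 * fs)
X = np.column_stack([np.concatenate([np.zeros(k), envelope[:n - k]]) for k in range(n_lags)])
trf = np.linalg.solve(X.T @ X + 10.0 * np.eye(n_lags), X.T @ eeg)

# (2) Deviant response: average EEG epochs time-locked to deviant onsets.
deviant_onsets = rng.choice(np.arange(fs, n - fs), size=40, replace=False)
epochs = np.stack([eeg[i - int(0.1 * fs): i + int(0.5 * fs)] for i in deviant_onsets])
erp = epochs.mean(axis=0)

print(f"TRF peak lag: {np.argmax(np.abs(trf)) / fs * 1000:.0f} ms")
print(f"deviant-locked ERP epoch: {erp.size} samples ({erp.size / fs * 1000:.0f} ms)")
```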