A Study on Model Training Strategies for Speaker-Independent and Vocabulary-Mismatched Dysarthric Speech Recognition
Jinzi Qi,
Hugo Van hamme
Applied Sciences, Journal Year: 2025, Volume and Issue: 15(4), P. 2006
Published: Feb. 14, 2025
Automatic speech recognition (ASR) systems often struggle to recognize speech from individuals with dysarthria, a speech disorder with neuromuscular causes, with accuracy declining further for unseen speakers and content. Achieving robustness for such situations requires ASR systems to address speaker-independent and vocabulary-mismatched scenarios, minimizing user adaptation effort. This study focuses on comprehensive training strategies and methods to tackle these challenges, leveraging the transformer-based Wav2Vec2.0 model. Unlike prior research, which focuses on limited datasets, we systematically explore training data selection strategies across diverse source types (languages, canonical vs. dysarthric, and generic vs. in-domain) in a speaker-independent setting. For the under-explored vocabulary-mismatched scenarios, we evaluate conventional methods, identify their limitations, and propose a solution that uses phonological features as intermediate representations for phone recognition to address these gaps. Experimental results demonstrate that this approach enhances recognition across dysarthric datasets in both settings. By integrating advanced transfer learning techniques with the innovative use of phonological features, this study addresses key challenges in dysarthric speech recognition, setting a new benchmark for robustness and adaptability in the field.
Language: English
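The abstract above mentions phonological features as intermediate representations for phone recognition. The following is a minimal sketch of that general idea, not the authors' implementation: a detector is assumed to output one probability per feature, and a phone is chosen by nearest match in a feature table. The feature names, the six-phone table, and the `decode_phone` helper are all hypothetical and chosen only for illustration.

```python
# Hypothetical binary phonological feature table (illustrative only; the
# paper's actual feature inventory is not given in the abstract).
FEATURE_NAMES = ("voiced", "nasal", "labial", "plosive")
PHONE_FEATURES = {
    "p": (0, 0, 1, 1),
    "b": (1, 0, 1, 1),
    "m": (1, 1, 1, 0),
    "t": (0, 0, 0, 1),
    "d": (1, 0, 0, 1),
    "n": (1, 1, 0, 0),
}


def decode_phone(feature_probs, inventory):
    """Return the phone in `inventory` whose feature vector is closest
    (squared Euclidean distance) to the predicted feature probabilities."""
    def distance(phone):
        target = PHONE_FEATURES[phone]
        return sum((p - t) ** 2 for p, t in zip(feature_probs, target))
    return min(inventory, key=distance)


# Assumed detector output: looks voiced, nasal, and labial, not plosive.
predicted = (0.9, 0.8, 0.9, 0.1)
print(decode_phone(predicted, PHONE_FEATURES))   # decode over the full table
print(decode_phone(predicted, ["p", "b", "t"]))  # decode over a restricted phone set
```

Because the decision is made in feature space, the phone inventory can be swapped at decoding time without retraining the feature detector, which is one plausible reason such representations help when the target vocabulary differs from the training vocabulary.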
Deep Learning Based Speech Recognition for Hyperkinetic Dysarthria Disorder
Antor Mahamudul Hashan,
Chaganov Roman Dmitrievich,
Melnikov Alexander Valerievich
et al.
Published: May 13, 2024
Language: English
Gammatonegram representation for end-to-end dysarthric speech processing tasks: speech recognition, speaker identification, and intelligibility assessment
Iran Journal of Computer Science, Journal Year: 2024, Volume and Issue: 7(2), P. 311 - 324
Published: March 10, 2024
Language: English
Assessing Speech Intelligibility and Severity Level in Parkinson's Disease Using Wav2Vec 2.0
Tomas Smolik,
Radim Krupička,
Ondřej Klempíř
et al.
Published: July 10, 2024
Language: English
A Study on The Impact of Self-Supervised Learning on Automatic Dysarthric Speech Assessment
Xavier F. Cadet,
Ranya Aloufi,
Sara Ahmadi‐Abhari
et al.
Published: April 14, 2024
Language: English
Adversarial Auto-Encoders Based Model for Classification of Speech Dysarthria
V. Kanchana Devi,
R.S. Sreenivas,
E. Umamaheshwari
et al.
2022 13th International Conference on Computing Communication and Networking Technologies (ICCCNT), Journal Year: 2024, Volume and Issue: unknown, P. 1 - 7
Published: June 24, 2024
Language: English
PB-LRDWWS System For the SLT 2024 Low-Resource Dysarthria Wake-Up Word Spotting Challenge
2022 IEEE Spoken Language Technology Workshop (SLT), Journal Year: 2024, Volume and Issue: unknown, P. 586 - 591
Published: Dec. 2, 2024
Language: English
Gammatonegram Representation for End-to-End Dysarthric Speech Processing Tasks: Speech Recognition, Speaker Identification, and Intelligibility Assessment
arXiv (Cornell University), Journal Year: 2023, Volume and Issue: unknown
Published: Jan. 1, 2023
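Dysarthria is a disability that causes a disturbance in the human speech system and reduces the quality and intelligibility of a person's speech. Because of this effect, normal speech processing systems cannot work properly on impaired speech. This disability is usually associated with physical disabilities. Therefore, designing a system that can perform some tasks by receiving voice commands in the smart home can be a significant achievement. In this work, we introduce the gammatonegram as an effective method to represent audio files with discriminative details, which is used as input for the convolutional neural network. In other words, we convert each speech file into an image and propose an image recognition system to classify speech in different scenarios. The proposed CNN is based on transfer learning on the pre-trained AlexNet. In this research, the efficiency of the proposed system for speech recognition, speaker identification, and intelligibility assessment is evaluated. According to the results on the UA dataset, the proposed speech recognition system achieved 91.29% accuracy in speaker-dependent mode, the speaker identification system acquired 87.74% accuracy in text-dependent mode, and the intelligibility assessment system achieved 96.47% accuracy in two-class mode. Finally, we propose a multi-network speech recognition system that works fully automatically. This system is located in a cascade arrangement with the two-class intelligibility assessment system, and the output of this system activates one of the speech recognition networks. This architecture achieves 92.3% WRR. The source code of this paper is available.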
Language: English
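The cascade arrangement described in the abstract above, where a two-class intelligibility assessor selects which recognition network to run, can be sketched as a simple router. This is only an illustration under stated assumptions: the class labels `"high"` and `"low"`, the `make_cascade` helper, and the stub functions are hypothetical stand-ins, and a real system would wrap trained CNNs operating on gammatonegram images.

```python
from typing import Callable, Dict

Recognizer = Callable[[str], str]


def make_cascade(assess: Callable[[str], str],
                 recognizers: Dict[str, Recognizer]) -> Recognizer:
    """Build a recognizer that routes each input to the network
    selected by the intelligibility assessor."""
    def recognize(audio_path: str) -> str:
        level = assess(audio_path)             # stage 1: two-class intelligibility label
        return recognizers[level](audio_path)  # stage 2: only the selected network runs
    return recognize


# Hypothetical stub assessor: a real one would classify the audio itself.
def stub_assess(audio_path: str) -> str:
    return "low" if "severe" in audio_path else "high"


cascade = make_cascade(
    stub_assess,
    {
        "high": lambda path: f"high-intelligibility network decoded {path}",
        "low": lambda path: f"low-intelligibility network decoded {path}",
    },
)
print(cascade("severe_speaker_01.wav"))
```

Keeping the assessor and the recognizers behind plain callables means each stage can be trained and replaced independently, at the cost of any assessor error propagating to the recognition stage.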