Gammatonegram Representation for End-to-End Dysarthric Speech Processing Tasks: Speech Recognition, Speaker Identification, and Intelligibility Assessment
Aref Farhadipour, Hadi Veisi

arXiv (Cornell University), Journal Year: 2023, Volume and Issue: unknown

Published: Jan. 1, 2023

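Dysarthria is a disability that causes a disturbance in the human speech system and reduces the quality and intelligibility of a person's speech. Because of this effect, normal speech processing systems cannot work properly on impaired speech. This disability is usually associated with physical disabilities. Therefore, designing a system that can perform some tasks by receiving voice commands in the smart home can be a significant achievement. In this work, we introduce the gammatonegram as an effective method to represent audio files with discriminative details, which is used as input for a convolutional neural network (CNN). In other words, we convert each speech file into an image and propose an image recognition system to classify speech in different scenarios. The proposed CNN is based on transfer learning from the pre-trained AlexNet. In this research, the efficiency of the proposed system for speech recognition, speaker identification, and intelligibility assessment is evaluated. According to the results on the UA dataset, the proposed speech recognition system achieved 91.29% accuracy in speaker-dependent mode, the speaker identification system acquired 87.74% accuracy in text-dependent mode, and the intelligibility assessment system achieved 96.47% accuracy in two-class mode. Finally, we propose a multi-network speech recognition system that works fully automatically. This system is located in a cascade arrangement with the two-class intelligibility assessment system, and the output of this system activates one of the speech recognition networks. This architecture achieves a word recognition rate (WRR) of 92.3%. The source code of this paper is available.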
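
The cascade arrangement described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the names `build_cascade`, `toy_assessor`, `high_net`, and `low_net`, the class labels "high" and "low", and the dictionary standing in for a gammatonegram image are all assumptions made for illustration. Plain callables take the place of the trained CNNs.

```python
from typing import Callable, Dict

# Hypothetical stand-in types: an "image" is a plain dict, a network is any callable.
Image = dict
Network = Callable[[Image], str]


def build_cascade(assessor: Network, recognizers: Dict[str, Network]) -> Network:
    """Return a recognizer that routes each input through the assessor first."""
    def recognize(image: Image) -> str:
        level = assessor(image)  # two-class intelligibility decision
        if level not in recognizers:
            raise KeyError(f"no recognition network for class {level!r}")
        return recognizers[level](image)  # only the selected network runs
    return recognize


# Toy callables in place of trained CNNs (illustrative only).
def toy_assessor(image: Image) -> str:
    return "high" if image["clarity"] >= 0.5 else "low"


def high_net(image: Image) -> str:
    return "high-net:" + image["word"]


def low_net(image: Image) -> str:
    return "low-net:" + image["word"]


cascade = build_cascade(toy_assessor, {"high": high_net, "low": low_net})
print(cascade({"clarity": 0.9, "word": "lights"}))
print(cascade({"clarity": 0.2, "word": "lights"}))
```

The point of the sketch is only the routing step: the assessment output acts as a key that selects which recognition network is run for a given input.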

Language: English

A Study on Model Training Strategies for Speaker-Independent and Vocabulary-Mismatched Dysarthric Speech Recognition
Jinzi Qi, Hugo Van hamme

Applied Sciences, Journal Year: 2025, Volume and Issue: 15(4), P. 2006 - 2006

Published: Feb. 14, 2025

Automatic speech recognition (ASR) systems often struggle to recognize speech from individuals with dysarthria, a speech disorder with neuromuscular causes, with accuracy declining further for unseen speakers and content. Achieving robustness in such situations requires ASR systems to address speaker-independent and vocabulary-mismatched scenarios, minimizing user adaptation effort. This study focuses on comprehensive training strategies and methods to tackle these challenges, leveraging the transformer-based Wav2Vec2.0 model. Unlike prior research, which focuses on limited datasets, we systematically explore training data selection strategies across diverse source types (languages, canonical vs. dysarthric, and generic vs. in-domain) in a speaker-independent setting. For the under-explored vocabulary-mismatched scenarios, we evaluate conventional methods, identify their limitations, and propose a solution that uses phonological features as intermediate representations for phone recognition to address these gaps. Experimental results demonstrate that this approach enhances recognition across dysarthric datasets in both settings. By integrating advanced transfer learning techniques with the innovative use of phonological features, this study addresses key challenges for dysarthric speech recognition, setting a new benchmark for robustness and adaptability in the field.

Language: English

Citations

0

Deep Learning Based Speech Recognition for Hyperkinetic Dysarthria Disorder
Antor Mahamudul Hashan, Chaganov Roman Dmitrievich, Melnikov Alexander Valerievich, et al.

Published: May 13, 2024

Language: English

Citations

2

Gammatonegram representation for end-to-end dysarthric speech processing tasks: speech recognition, speaker identification, and intelligibility assessment
Aref Farhadipour, Hadi Veisi

Iran Journal of Computer Science, Journal Year: 2024, Volume and Issue: 7(2), P. 311 - 324

Published: March 10, 2024

Language: English

Citations

1

Assessing Speech Intelligibility and Severity Level in Parkinson's Disease Using Wav2Vec 2.0
Tomas Smolik, Radim Krupička, Ondřej Klempíř, et al.

Published: July 10, 2024

Language: English

Citations

1

A Study on The Impact of Self-Supervised Learning on Automatic Dysarthric Speech Assessment
Xavier F. Cadet, Ranya Aloufi, Sara Ahmadi‐Abhari, et al.

Published: April 14, 2024

Language: English

Citations

0

Adversarial Auto-Encoders Based Model for Classification of Speech Dysarthria
V. Kanchana Devi, R.S. Sreenivas, E. Umamaheshwari, et al.

2022 13th International Conference on Computing Communication and Networking Technologies (ICCCNT), Journal Year: 2024, Volume and Issue: unknown, P. 1 - 7

Published: June 24, 2024

Language: English

Citations

0

PB-LRDWWS System For the SLT 2024 Low-Resource Dysarthria Wake-Up Word Spotting Challenge
Shiyao Wang, Jiaming Zhou, Shiwan Zhao, et al.

2024 IEEE Spoken Language Technology Workshop (SLT), Journal Year: 2024, Volume and Issue: unknown, P. 586 - 591

Published: Dec. 2, 2024

Language: English

Citations

0
