Analysis of Deep Learning Models for Voice Pathology Detection

Adham Ahmed Said, Ahmad Khaled Mohammed, Mohammad Essam

et al.

Published: Nov. 21, 2023

Voice disorders affect a significant portion of the global population, particularly those in vocally demanding professions such as singers, actors, teachers, and lawyers. Early detection and diagnosis of voice pathology diseases are critical to improving treatment outcomes and preventing further damage to the vocal cords. Digital processing of speech signals has emerged as a promising technique for analyzing vocal cord vibrations and identifying deformities in vocal cord function. In this paper, a cost-effective computational method involves processing the speech signal by passing it through a stack of band-pass filters, dividing the processed signal from each filter into a set of overlapped frames, applying the autocorrelation formula to every single frame, and using entropy to extract features. The method has shown promise in reliably detecting and classifying voice pathology diseases, but further research is required to confirm its efficacy and reliability. Deep learning algorithms and Mel spectrogram feature extraction techniques are used in the present paper for voice pathology detection. VGG16, VGG19, and ResNet50 are compared. The system demonstrated high prediction accuracy on the training and testing datasets. It shows potential for clinical applications in voice disorder assessment and diagnosis. It also holds potential as a telemedicine tool, enabling remote monitoring of patients' health.
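The feature-extraction steps named in the abstract (band-pass filter stack, overlapped frames, per-frame autocorrelation, entropy) can be illustrated with a minimal pure-Python sketch. The abstract does not give the filter design, band centers, frame sizes, or the entropy definition, so everything below is an assumption chosen for illustration: a second-order band-pass biquad, three hypothetical center frequencies, 64-sample frames with 50% overlap, and Shannon entropy of the normalized autocorrelation magnitudes averaged per band. It is not the authors' implementation.

```python
import math

def bandpass_biquad(signal, fs, f_center, q=2.0):
    """Second-order band-pass filter (assumed design; the abstract names none)."""
    w0 = 2.0 * math.pi * f_center / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0          # b1 is zero for this filter type
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in signal:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out

def overlapped_frames(signal, frame_len, hop):
    """Split a signal into frames of frame_len samples, advancing by hop."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def autocorrelation(frame):
    """Unnormalized autocorrelation for every non-negative lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(n)]

def shannon_entropy(values):
    """Entropy (bits) of the magnitudes, normalized to sum to one (assumed definition)."""
    mags = [abs(v) for v in values]
    total = sum(mags)
    if total == 0.0:
        return 0.0
    return -sum((m / total) * math.log2(m / total) for m in mags if m > 0.0)

def entropy_features(signal, fs, centers=(250.0, 500.0, 1000.0),
                     frame_len=64, hop=32):
    """One feature per band: mean entropy of the per-frame autocorrelation."""
    features = []
    for f_center in centers:  # hypothetical band centers
        band = bandpass_biquad(signal, fs, f_center)
        frames = overlapped_frames(band, frame_len, hop)
        entropies = [shannon_entropy(autocorrelation(f)) for f in frames]
        features.append(sum(entropies) / len(entropies))
    return features

# Usage on a synthetic 500 Hz tone standing in for a voice recording.
fs = 8000
tone = [math.sin(2.0 * math.pi * 500.0 * n / fs) for n in range(512)]
feats = entropy_features(tone, fs)
print(feats)
```

Each feature is bounded by log2 of the frame length (6 bits here), which makes the values easy to compare across bands before they are passed to a classifier.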

Language: English

A systematic review of intermediate fusion in multimodal deep learning for biomedical applications
Valerio Guarrasi, Fatih Aksu, Camillo Maria Caruso

et al.

Image and Vision Computing, Journal Year: 2025, Volume and Issue: unknown, P. 105509 - 105509

Published: March 1, 2025

Language: English

Citations: 3

A Comprehensive Survey on Deep Learning Multi-Modal Fusion: Methods, Technologies and Applications

Tianzhe Jiao, Chaopeng Guo, Xiaoyue Feng

et al.

Computers, Materials & Continua, Journal Year: 2024, Volume and Issue: 80(1), P. 1 - 35

Published: Jan. 1, 2024

Multi-modal fusion technology has gradually become a fundamental task in many fields, such as autonomous driving, smart healthcare, sentiment analysis, and human-computer interaction. It is rapidly becoming dominant in research due to its powerful perception and judgment capabilities. Under complex scenes, multi-modal fusion utilizes the complementary characteristics of multiple data streams to fuse different data types and achieve more accurate predictions. However, achieving outstanding performance is challenging because of equipment limitations, missing information, and data noise. This paper comprehensively reviews existing methods based on multi-modal fusion techniques and completes a detailed and in-depth analysis. According to the fusion stage, multi-modal fusion has four primary methods: early fusion, deep fusion, late fusion, and hybrid fusion. The paper surveys the three major multi-modal fusion technologies that can significantly enhance the effect of data fusion and further explores the applications of multi-modal fusion technology in various fields. Finally, it discusses the challenges and explores potential research opportunities. Multi-modal tasks still need intensive study because of data heterogeneity and quality. Preserving complementary information and eliminating redundant information between modalities is critical in multi-modal technology. Invalid data fusion methods may introduce extra noise and lead to worse results. This paper provides a comprehensive and detailed summary in response to these challenges.
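The four fusion stages listed in the abstract differ mainly in where the modalities are combined. The sketch below contrasts them on toy feature vectors; it assumes two hypothetical modalities, untrained placeholder weights, and a simple average as the decision rule. None of these choices come from the survey, which does not prescribe an implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(xs, ws):
    return sum(x * w for x, w in zip(xs, ws))

def early_fusion(mod_a, mod_b, joint_weights):
    """Concatenate raw features, then apply one joint model."""
    return sigmoid(dot(list(mod_a) + list(mod_b), joint_weights))

def deep_fusion(mod_a, mod_b, enc_a, enc_b, head):
    """Encode each modality separately, then fuse the hidden representations."""
    hidden_a = [math.tanh(dot(mod_a, row)) for row in enc_a]
    hidden_b = [math.tanh(dot(mod_b, row)) for row in enc_b]
    return sigmoid(dot(hidden_a + hidden_b, head))

def late_fusion(mod_a, mod_b, weights_a, weights_b):
    """Score each modality independently, then average the decisions."""
    return 0.5 * (sigmoid(dot(mod_a, weights_a)) + sigmoid(dot(mod_b, weights_b)))

def hybrid_fusion(mod_a, mod_b, joint_weights, weights_a, weights_b):
    """Combine an early-fusion score with a late-fusion score."""
    return 0.5 * (early_fusion(mod_a, mod_b, joint_weights)
                  + late_fusion(mod_a, mod_b, weights_a, weights_b))

# Usage with hypothetical audio and image features and placeholder weights.
audio, image = [0.2, -0.1], [0.5]
print(early_fusion(audio, image, [1.0, 1.0, 1.0]))
print(deep_fusion(audio, image, [[1.0, 0.0]], [[1.0]], [1.0, 1.0]))
print(late_fusion(audio, image, [1.0, 1.0], [1.0]))
print(hybrid_fusion(audio, image, [1.0, 1.0, 1.0], [1.0, 1.0], [1.0]))
```

In practice the weights would be learned, and the averaging step in late and hybrid fusion is often replaced by a weighted vote or a small meta-classifier.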

Language: English

Citations: 9

Towards an explainable Artificial intelligence system for voice pathology identification and post-treatment characterisation
Federico Calà, Lorenzo Frassineti, Giovanna Cantarella

et al.

Biomedical Signal Processing and Control, Journal Year: 2025, Volume and Issue: 104, P. 107530 - 107530

Published: Jan. 26, 2025

Language: English

Citations: 1

A deep cross-modal neural cognitive diagnosis framework for modeling student performance
Lingyun Song, Mengting He, Xuequn Shang

et al.

Expert Systems with Applications, Journal Year: 2023, Volume and Issue: 230, P. 120675 - 120675

Published: June 3, 2023

Language: English

Citations: 16

Pathological Voice Classification Using MEEL Features and SVM-Tabnet Model
Mohammed Zakariah, Muna Al‐Razgan, Taha Alfakih

et al.

Speech Communication, Journal Year: 2024, Volume and Issue: 162, P. 103100 - 103100

Published: July 1, 2024

Language: English

Citations: 4

Evaluation of phone posterior probabilities for pathology detection in speech data using deep learning models

Sahar Farazi, Yasser Shekofteh

International Journal of Speech Technology, Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 4, 2025

Language: English

Citations: 0

Enhanced fish stress classification using a cross-modal sensing fusion system with residual depth-separable convolutional networks
Wentao Huang, Yunpeng Wang, Wenhao He

et al.

Computers and Electronics in Agriculture, Journal Year: 2025, Volume and Issue: 231, P. 110038 - 110038

Published: Feb. 8, 2025

Language: English

Citations: 0

A Systematic Review of Intermediate Fusion in Multimodal Deep Learning for Biomedical Applications
Valerio Guarrasi, Fatih Aksu, Camillo Maria Caruso

et al.

Published: Jan. 1, 2024

Language: English

Citations: 3

Pathological voice classification system based on CNN-BiLSTM network using speech enhancement and multi-stream approach
Soumeya Belabbas, Djamel Addou, Sid‐Ahmed Selouani

et al.

International Journal of Speech Technology, Journal Year: 2024, Volume and Issue: 27(2), P. 483 - 502

Published: June 1, 2024

Language: English

Citations: 2

Voice pathology detection using optimized convolutional neural networks and explainable artificial intelligence-based analysis
Roohum Jegan, R. Jayagowri

Computer Methods in Biomechanics & Biomedical Engineering, Journal Year: 2023, Volume and Issue: 27(14), P. 2041 - 2057

Published: Oct. 18, 2023

Abstract: This article proposes a noninvasive computer-aided assessment approach based on an optimized convolutional neural network (CNN) for healthy and pathological voice detection. The input voice samples are first transformed into mel-spectrogram time-frequency visual representations and fed into the CNN model for training. The mel-spectrogram image captures inherent speech variations beneficial for voice sample detection. The weights and biases of the trained CNN are further optimized using the artificial bee colony (ABC) optimization algorithm, resulting in an optimum model employed for testing on unseen data. The proposed approach is evaluated on three popular publicly available datasets: SVD, AVPD, and VOICED. Experimental results emphasize that the ABC-optimized CNN model shows accuracy improved by 1.02% compared to the conventional CNN, illustrating its data-independent discriminative representation ability. Finally, gradient-weighted class activation mapping (Grad-CAM), an explainable artificial intelligence (XAI) technique, is utilized to make the model decision understandable.

Keywords: voice pathology detection; optimized CNN; explainable artificial intelligence; mel-spectrogram; image texture features
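The mel-spectrogram front end mentioned in the abstract rests on the mel frequency scale, which spaces filter bands more densely at low frequencies. The sketch below shows only that mapping, using the commonly cited 2595 * log10(1 + f / 700) form; the band count and frequency range are hypothetical, and the article does not state which mel variant or parameters it used.

```python
import math

def hz_to_mel(freq_hz):
    """Convert frequency in Hz to mels (2595 * log10(1 + f / 700) form)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_centers(n_mels, f_min, f_max):
    """Center frequencies (Hz) of n_mels bands equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]

# Usage: 8 hypothetical bands between 0 Hz and 4 kHz.
centers = mel_band_centers(8, 0.0, 4000.0)
print([round(c, 1) for c in centers])
```

The centers are equally spaced in mels but increasingly far apart in Hz, which is why a mel-spectrogram image devotes more rows to the low-frequency region of the voice signal.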

Language: English

Citations: 4