Cited by AusKidTalk: Developing an Orthographic Annotation Workflow for a Speech Corpus of Australian English-Speaking Children

Explainable AI for CNN-LSTM Network in PCG-Based Valvular Heart Disease Diagnosis

Chelluri Divakar, R Harsha, Kodali Radha

et al.

Published: Jan. 18, 2024

Globally, valvular heart diseases (VHDs) account for a major portion of deaths and illnesses. Accurate and timely identification of VHDs is essential for directing proper treatment and enhancing patient outcomes. Phonocardiogram (PCG) signals provide a non-invasive and affordable means of capturing acoustic information about the cardiac cycle, rendering them suitable for VHD detection. The proposed method provides an explainable artificial intelligence (XAI) framework for PCG-based VHD diagnosis using a convolutional neural network (CNN) - long short-term memory (LSTM) (CNN-LSTM) network. It leverages the strengths of deep learning to achieve high diagnostic accuracy while providing interpretability of the model's predictions through XAI. Data augmentation techniques are utilized to augment the PCG signals. Mel-spectrograms are used to extract relevant features from the PCG signals. The model consists of a CNN architecture with an LSTM layer, making it a CNN-LSTM architecture. The model is a 5-class classifier whose classes include aortic stenosis, mitral regurgitation, mitral valve prolapse, and normal. The XAI technique employed is gradient-weighted class activation mapping (Grad-CAM), enabling visualization of the model's decision-making by generating heatmaps. An impressive accuracy of 97.5% has been achieved by the model. The integration of XAI ensures a comprehensive interpretation of the model, transparency, and potential for real-time clinical deployment.

Language: English

Citations

Automatic speaker and age identification of children from raw speech using sincNet over ERB scale

Kodali Radha, Mohan Bansal, Ram Bilas Pachori

et al.

Speech Communication, Journal Year: 2024, Volume and Issue: 159, P. 103069 - 103069

Published: April 1, 2024

Language: English

Citations

Automatic dysarthria detection and severity level assessment using CWT-layered CNN model

Shaik Sajiha, Kodali Radha, Dhulipalla Venkata Rao

et al.

EURASIP Journal on Audio Speech and Music Processing, Journal Year: 2024, Volume and Issue: 2024(1)

Published: June 25, 2024

Dysarthria is a speech disorder that affects the ability to communicate due to articulation difficulties. This research proposes a novel method for automatic dysarthria detection (ADD) and automatic dysarthria severity level assessment (ADSLA) using a variable continuous wavelet transform (CWT) layered convolutional neural network (CNN) model. To determine its efficiency, the proposed model is assessed on two distinct corpora, TORGO and UA-Speech, comprising speech signals from both dysarthric patients and healthy subjects. The study explores the effectiveness of CWT-layered CNN models that employ different wavelets such as Amor, Morse, and Bump. It aims to analyze the models' performance without the need for feature extraction, which could provide deeper insights into the effectiveness of the models in processing complex data. Also, raw waveform modeling preserves the original signal's integrity and nuance, making it ideal for applications like speech recognition, signal processing, and image processing. Extensive analysis and experimentation have revealed that the Amor wavelet surpasses the Morse and Bump wavelets in accurately representing signal characteristics. The Amor wavelet outperforms the others in terms of signal reconstruction fidelity, noise suppression capabilities, and feature extraction accuracy. The study emphasizes the importance of selecting an appropriate wavelet for signal-processing tasks. The Amor wavelet is a reliable and precise choice for such applications. The UA-Speech dataset is crucial for more accurate classification. Advanced deep learning techniques can simplify early intervention measures and expedite the diagnosis process.

Language: English

Citations

Partial Fake Speech Attacks in the Real World Using Deepfake Audio

Abdulazeez Alali, George Theodorakopoulos

Journal of Cybersecurity and Privacy, Journal Year: 2025, Volume and Issue: 5(1), P. 6 - 6

Published: Feb. 8, 2025

Advances in deep learning have led to dramatic improvements in generative synthetic speech, eliminating robotic speech patterns to create speech that is indistinguishable from a human voice. Although these advances are extremely useful in various applications, they also facilitate powerful attacks against both humans and machines. Recently, a new type of speech attack called partial fake (PF) speech has emerged. This paper studies how well machines, including speaker recognition systems and existing fake-speech detection tools, can distinguish between a human voice and computer-generated speech. Our study shows that machines can be easily deceived by PF speech and that the current defences are insufficient. These findings emphasise the urgency of increasing awareness and of creating automated

Language: English

Citations

Speech emotion recognition based on spiking neural network and convolutional neural network

Chengyan Du, Liu Fu, Bingyi Kang

et al.

Engineering Applications of Artificial Intelligence, Journal Year: 2025, Volume and Issue: 147, P. 110314 - 110314

Published: Feb. 22, 2025

Language: English

Citations

Attention-based multi dimension fused-feature convolutional neural network framework for speaker recognition

V. Karthikeyan, S. Suja Priyadharsini, K. Balamurugan

et al.

Multimedia Tools and Applications, Journal Year: 2025, Volume and Issue: unknown

Published: March 13, 2025

Language: English

Citations

Variable STFT Layered CNN Model for Automated Dysarthria Detection and Severity Assessment Using Raw Speech

Kodali Radha, Mohan Bansal, Venkata Rao Dhulipalla

et al.

Circuits Systems and Signal Processing, Journal Year: 2024, Volume and Issue: 43(5), P. 3261 - 3278

Published: Feb. 22, 2024

Language: English

Citations

Automated ASD detection in children from raw speech using customized STFT-CNN model

Kurma Venkata Keerthana Sai, Rompicharla Thanmayee Krishna, Kodali Radha

et al.

International Journal of Speech Technology, Journal Year: 2024, Volume and Issue: 27(3), P. 701 - 716

Published: July 26, 2024

Language: English

Citations

Raw-Waveform Based Bark Scale Initialized SincNet Model in Child Speaker Identification

Kodali Radha, Jami Gowtham Kumar, D. Sanjay

et al.

Published: Aug. 2, 2024

Language: English

Citations

Securing Automatic Speaker Verification Systems Using Residual Networks

Nidhi Chakravarty, Mohit Dua

Advances in information security, privacy, and ethics book series, Journal Year: 2024, Volume and Issue: unknown, P. 107 - 138

Published: July 12, 2024

Spoofing attacks are a major risk for automatic speaker verification systems, which are becoming more widespread. Adequate countermeasures are necessary since attacks like replay, synthetic, and deepfake attacks are difficult to identify. Technologies that can identify audio-level attacks must be developed in order to address this issue. In this chapter, the authors have proposed a combination of different spectrogram-based techniques with Residual Networks34 (ResNet34) for securing automatic speaker verification (ASV) systems. The methodology uses the Mel frequency scale-based Mel-spectrogram (MS), the gammatone spectrogram (GS), the filter bank-based Mel cepstral spectrogram (MCS), the acoustic pattern-based acoustic pattern spectrogram (APS), the gammatone cepstral spectrogram (GCS), and the short-time Fourier transform-based spectrogram (SFS) methods, one by one, at the front end of the audio spoof detection system. These are individually fed to ResNet34 for classification at the backend.

Language: English

Citations