AusKidTalk: Developing an Orthographic Annotation Workflow for a Speech Corpus of Australian English-Speaking Children DOI
Tuende Szalay, Mostafa Shahin, Tharmakulasingam Sirojan

et al.

Published: Jan. 1, 2024

Language: English

Variable STFT Layered CNN Model for Automated Dysarthria Detection and Severity Assessment Using Raw Speech DOI
Kodali Radha, Mohan Bansal, Venkata Rao Dhulipalla

et al.

Circuits Systems and Signal Processing, Journal Year: 2024, Issue: 43(5), P. 3261 - 3278

Published: Feb. 22, 2024

Language: English

Cited

4

Explainable AI for CNN-LSTM Network in PCG-Based Valvular Heart Disease Diagnosis DOI

Chelluri Divakar,

R Harsha,

Kodali Radha

et al.

Published: Jan. 18, 2024

Globally, valvular heart diseases (VHDs) account for a major portion of deaths and illnesses. Accurate and timely identification of VHDs is essential for directing proper treatment and enhancing patient outcomes. Phonocardiogram (PCG) signals provide a non-invasive, affordable means of capturing acoustic information about the cardiac cycle, rendering them suitable for VHD detection. The proposed method provides an explainable artificial intelligence (XAI) framework for PCG-based diagnosis using a convolutional neural network (CNN) - long short-term memory (LSTM) (CNN-LSTM) network. It leverages the strengths of deep learning to achieve high diagnostic accuracy while providing interpretability of the model's predictions through XAI. Data augmentation techniques are utilized to augment the PCG signals, and Mel-spectrograms are used to extract relevant features from them. The model consists of a CNN architecture followed by an LSTM layer, making a CNN-LSTM architecture. It is a 5-class classifier with classes named aortic stenosis, mitral stenosis, mitral regurgitation, mitral valve prolapse, and normal. The XAI technique employed is gradient-weighted class activation mapping (Grad-CAM), enabling visualization of the decision-making process by generating heatmaps. An impressive accuracy of 97.5% has been achieved by the model. The integration of XAI ensures comprehensive interpretation of the model, offering transparency and potential for real-time clinical deployment.
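The Mel-spectrogram front end described in this abstract can be sketched in plain numpy. This is a minimal illustration of the feature-extraction step only, not the authors' implementation; the frame sizes, sampling rate, and filter count are illustrative assumptions.

```python
import numpy as np

def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_spectrogram(x, sr, n_fft=512, hop=128, n_mels=40):
    # frame the signal and apply a Hann window before the FFT
    frames = np.lib.stride_tricks.sliding_window_view(x, n_fft)[::hop]
    spec = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)) ** 2
    # triangular Mel filter bank from 0 Hz to the Nyquist frequency
    mel_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * mel_pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        if c > l: fb[i - 1, l:c] = (np.arange(l, c) - l) / (c - l)
        if r > c: fb[i - 1, c:r] = (r - np.arange(c, r)) / (r - c)
    return np.log(spec @ fb.T + 1e-10)        # (frames, n_mels) log-Mel image

sr = 4000                                     # PCG is commonly sampled at low rates
t = np.arange(sr) / sr                        # one second of synthetic signal
x = np.sin(2 * np.pi * 50 * t)                # stand-in for a heart-sound component
S = mel_spectrogram(x, sr)
print(S.shape)                                # one log-Mel image per recording
```

Each resulting log-Mel image would then be passed to the CNN layers, whose output sequence feeds the LSTM for the 5-way classification.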

Language: English

Cited

4

Automatic speaker and age identification of children from raw speech using sincNet over ERB scale DOI
Kodali Radha, Mohan Bansal, Ram Bilas Pachori

et al.

Speech Communication, Journal Year: 2024, Issue: 159, P. 103069 - 103069

Published: April 1, 2024

Language: English

Cited

4

Automatic dysarthria detection and severity level assessment using CWT-layered CNN model DOI Creative Commons

Shaik Sajiha,

Kodali Radha,

Dhulipalla Venkata Rao

et al.

EURASIP Journal on Audio Speech and Music Processing, Journal Year: 2024, Issue: 2024(1)

Published: June 25, 2024

Abstract: Dysarthria is a speech disorder that affects the ability to communicate due to articulation difficulties. This research proposes a novel method for automatic dysarthria detection (ADD) and severity level assessment (ADSLA) using a variable continuous wavelet transform (CWT) layered convolutional neural network (CNN) model. To determine their efficiency, the proposed models are assessed on two distinct corpora, TORGO and UA-Speech, comprising signals from both dysarthric patients and healthy subjects. The study explores the effectiveness of CWT-layered CNN models that employ different wavelets such as Amor, Morse, and Bump. It aims to analyze the models' performance without the need for hand-crafted feature extraction, which could provide deeper insights into their behaviour in processing complex speech data. Moreover, raw waveform modeling preserves the original signal's integrity and nuance, making it ideal for applications like speech recognition, signal processing, and image processing. Extensive analysis and experimentation have revealed that the Amor wavelet surpasses Morse and Bump in accurately representing signal characteristics, outperforming the others in terms of reconstruction fidelity, noise suppression capabilities, and feature extraction accuracy. The study emphasizes the importance of selecting the appropriate wavelet for signal-processing tasks, which ensures a reliable and precise choice for these applications. The UA-Speech dataset proved crucial for more accurate classification. Advanced deep learning techniques can simplify early intervention measures and expedite the diagnosis process.
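The CWT layer described here turns each raw waveform into a time-scale image (a scalogram) that the CNN then classifies. A minimal numpy sketch of such a transform with a complex Morlet-type wavelet follows; it is an illustration under assumed parameters, not the paper's Amor/Morse/Bump implementation.

```python
import numpy as np

def morlet_cwt(x, scales, w0=6.0):
    """Continuous wavelet transform with a complex Morlet-type wavelet.
    Returns a (len(scales), len(x)) scalogram of magnitudes."""
    n = len(x)
    out = np.empty((len(scales), n))
    for i, s in enumerate(scales):
        # wavelet support grows with scale, clipped to the signal length
        m = min(int(10 * s), n)
        t = np.arange(-(m // 2), m // 2 + 1) / s
        psi = np.exp(1j * w0 * t) * np.exp(-t ** 2 / 2) / np.sqrt(s)
        out[i] = np.abs(np.convolve(x, psi, mode="same"))
    return out

sr = 16000
t = np.arange(sr // 4) / sr                   # 250 ms of audio
x = np.sin(2 * np.pi * 440 * t)               # synthetic test tone
scales = np.geomspace(2, 64, num=32)          # log-spaced scale grid
scalogram = morlet_cwt(x, scales)
print(scalogram.shape)                        # one scalogram image per utterance
```

In the paper's pipeline this transform sits as the first layer of the network, so the CNN trains directly on raw waveforms without separate feature extraction.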

Language: English

Cited

4

Partial Fake Speech Attacks in the Real World Using Deepfake Audio DOI Creative Commons
Abdulazeez Alali, George Theodorakopoulos

Journal of Cybersecurity and Privacy, Journal Year: 2025, Issue: 5(1), P. 6 - 6

Published: Feb. 8, 2025

Advances in deep learning have led to dramatic improvements in generative synthetic speech, eliminating robotic speech patterns to create speech that is indistinguishable from a human voice. Although these advances are extremely useful in various applications, they also facilitate powerful attacks against both humans and machines. Recently, a new type of attack called partial fake (PF) speech has emerged. This paper studies how well humans and machines, including speaker recognition systems and existing fake-speech detection tools, can distinguish between a human voice and computer-generated speech. Our study shows that both humans and machines can be easily deceived by PF speech and that the current defences are insufficient. These findings emphasise the urgency of increasing awareness of PF attacks and of creating automated defences against them.

Language: English

Cited

0

Speech emotion recognition based on spiking neural network and convolutional neural network DOI

Chengyan Du,

Liu Fu, Bingyi Kang

et al.

Engineering Applications of Artificial Intelligence, Journal Year: 2025, Issue: 147, P. 110314 - 110314

Published: Feb. 22, 2025

Language: English

Cited

0

Attention-based multi dimension fused-feature convolutional neural network framework for speaker recognition DOI
V. Karthikeyan, S. Suja Priyadharsini,

K. Balamurugan

et al.

Multimedia Tools and Applications, Journal Year: 2025, Issue: unknown

Published: March 13, 2025

Language: English

Cited

0

Optimizing Speaker Recognition Through Feature Extraction Techniques: A Focus on Gujarati Dialects DOI
Meera M. Shah, Hiren R. Kavathiya

Communications in computer and information science, Journal Year: 2025, Issue: unknown, P. 165 - 178

Published: Jan. 1, 2025

Language: English

Cited

0

Raw-Waveform Based Bark Scale Initialized SincNet Model in Child Speaker Identification DOI
Kodali Radha,

Jami Gowtham Kumar,

D. Sanjay

et al.

Published: Aug. 2, 2024

Language: English

Cited

0

Securing Automatic Speaker Verification Systems Using Residual Networks DOI
Nidhi Chakravarty, Mohit Dua

Advances in information security, privacy, and ethics book series, Journal Year: 2024, Issue: unknown, P. 107 - 138

Published: July 12, 2024

Spoofing attacks are a major risk for automatic speaker verification systems, which are becoming more widespread. Adequate countermeasures are necessary since attacks like replay, synthetic, and deepfake attacks are difficult to identify. Technologies that can identify attacks at the audio level must be developed in order to address this issue. In this chapter, the authors have proposed a combination of different spectrogram-based techniques with Residual Network-34 (ResNet34) for securing automatic speaker verification (ASV) systems. The methodology uses Mel frequency scale-based Mel-spectrogram (MS), gammatone spectrogram (GS), filter bank-based Mel cepstral spectrogram (MCS), gammatone cepstral spectrogram (GCS), acoustic pattern-based pattern spectrogram (APS), and short-time Fourier transform-based spectrogram (SFS) methods, one by one, at the front end of the audio spoof detection system. These features are individually fed to ResNet34 for classification at the backend.
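The pipeline described above pairs a spectrogram front end with a residual-network backend. As a minimal numpy sketch (illustrative only, not the chapter's ResNet34), an STFT magnitude spectrogram can be fed through an identity-shortcut residual block, the building pattern ResNet34 stacks many times:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w):
    # pointwise convolution: (C_out, C_in) applied to (C_in, F, T) -> (C_out, F, T)
    return np.tensordot(w, x, axes=([1], [0]))

def residual_block(x, w1, w2):
    """Identity-shortcut block: out = relu(x + conv(relu(conv(x))))."""
    h = np.maximum(conv1x1(x, w1), 0)
    return np.maximum(x + conv1x1(h, w2), 0)

# short-time Fourier magnitude as the front-end "spectrogram image"
sr, n_fft, hop = 16000, 256, 128
x = rng.standard_normal(sr)                   # 1 s stand-in utterance
frames = np.lib.stride_tricks.sliding_window_view(x, n_fft)[::hop]
spec = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)).T  # (freq, time)

feat = spec[None]                             # add a channel axis: (1, F, T)
w1 = rng.standard_normal((8, 1)) * 0.1        # widen to 8 channels
w2 = rng.standard_normal((1, 8)) * 0.1        # project back so the shortcut adds
out = residual_block(feat, w1, w2)
print(out.shape)
```

Swapping the STFT for Mel or gammatone filter banks changes only the front end, which is the comparison the chapter carries out across its six spectrogram variants.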

Language: English

Cited

0