Speaker identification under noisy conditions using hybrid convolutional neural network and gated recurrent unit (Open Access)
Wondimu Lambamo, Ramasamy Srinivasagan, Worku Jifara, et al.

IAES International Journal of Artificial Intelligence, Journal Year: 2023, Volume and Issue: 13(1), P. 1050 - 1050

Published: Dec. 25, 2023

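Speaker identification is a biometric technique that classifies or identifies a person from other speakers based on speech characteristics. Recently, deep learning models have outperformed conventional machine learning models in speaker identification. Spectrograms of the speech have been used as input in deep learning-based speaker identification using clean speech. However, the performance of speaker identification systems degrades under noisy conditions. Cochleograms have shown better results than spectrograms in deep learning-based speaker recognition under noisy and mismatched conditions. Moreover, hybrid convolutional neural network (CNN) and recurrent neural network (RNN) variants have shown better performance than CNN or RNN variants alone in recent studies. However, no attempt has been made to use a hybrid CNN and enhanced RNN variants in speaker identification with cochleogram input to enhance performance under noisy and mismatched conditions. In this study, speaker identification using a hybrid CNN and gated recurrent unit (GRU) is proposed for noisy conditions using cochleogram input. The VoxCeleb1 audio dataset with real-world noises, with white Gaussian noises (WGN), and without additive noises was employed for the experiments. The experiment results and the comparison with existing works show that the proposed model performs better than the other models in this study and in existing works.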
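The abstract mentions evaluating on speech with additive white Gaussian noise. As a rough illustration of how such a noisy copy is commonly produced, the sketch below scales Gaussian noise to reach a requested signal-to-noise ratio. The function names, the 5 dB level, and the sine tone standing in for speech are assumptions made for this example; the abstract does not state the SNR levels or the mixing procedure the authors used.

```python
import math
import random


def add_wgn(signal, snr_db, seed=0):
    """Return `signal` plus white Gaussian noise scaled to the requested SNR in dB."""
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise), so P_noise = P_signal / 10^(SNR_dB / 10).
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    # Rescale the drawn noise so its empirical power equals the target noise power.
    drawn_power = sum(n * n for n in noise) / len(noise)
    scale = math.sqrt(noise_power / drawn_power)
    return [x + scale * n for x, n in zip(signal, noise)]


def measured_snr_db(clean, noisy):
    """Measure the SNR in dB of `noisy` relative to the known `clean` signal."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((y - x) ** 2 for x, y in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# Hypothetical input: one second of a 440 Hz tone at 16 kHz stands in for speech.
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noisy = add_wgn(clean, snr_db=5.0)
print(round(measured_snr_db(clean, noisy), 3))
```

Rescaling by the empirical noise power, rather than by the theoretical variance, makes the mixed signal hit the target SNR for the specific noise draw, which is convenient when building test sets at fixed noise levels.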

Language: English

Text-Independent Speaker Identification Using Arabic Phonemes (Open Access)
Samiha R. Alarjani, Imran Rao, Iram Fatima, et al.

Journal of Advances in Information Technology, Journal Year: 2025, Volume and Issue: 16(3), P. 330 - 341

Published: Jan. 1, 2025

Language: English

Citations

0

Attention-based multi dimension fused-feature convolutional neural network framework for speaker recognition
V. Karthikeyan, S. Suja Priyadharsini, K. Balamurugan, et al.

Multimedia Tools and Applications, Journal Year: 2025, Volume and Issue: unknown

Published: March 13, 2025

Language: English

Citations

0

Industrial-Grade CNN-Based System for the Discrimination of Music Versus Non-Music in Radio Broadcast Audio (Creative Commons)
Valerio Cesarini, Vincenzo Addati, Giovanni Costantini, et al.

Information, Journal Year: 2025, Volume and Issue: 16(4), P. 288 - 288

Published: April 3, 2025

This paper addresses the issue of distinguishing commercially played songs from non-music audio in radio broadcasts, where automatic song identification systems are commonly employed for reporting purposes. Service call costs increase because these systems need to remain continuously active, even when music is not being broadcast. Our solution serves as a preliminary filter to determine whether an audio segment constitutes “music” and thus warrants a subsequent service call to an identifier. We collected 139 h of non-consecutive 5 s audio samples from various radio broadcasts, labeling segments from talk shows or advertisements as “non-music”. We implemented multiple data augmentation strategies, including FM-like pre-processing, trained a custom Convolutional Neural Network, and then built a live inference platform capable of monitoring web radio streams. The platform was validated using 1360 newly collected audio samples, evaluating performance on both 5 s chunks and 15 s buffers. The system demonstrated consistently high performance on previously unseen stations, achieving an average accuracy of 96% and a maximum of 98.23%. The intensive pre-processing contributed to these performances, with the benefit of making the system inherently suitable for FM radio. The solution has been incorporated into a commercial product currently utilized by Italian clients for royalty calculation and reporting purposes.

Language: English

Citations

0

Speech signal’s phase information based Alzheimer’s disease detection using deep learning
Mahesh Kumar, Sushant, Arun Kumar Yadav, et al.

International Journal of Speech Technology, Journal Year: 2025, Volume and Issue: unknown

Published: May 14, 2025

Language: English

Citations

0

Design and application of rhodamine derivatives in redox biology: a roadmap of the last decade towards artificial intelligence
Moumita Mondal, Riyanka Das, Rajeshwari Pal, et al.

Journal of Materials Chemistry A, Journal Year: 2024, Volume and Issue: 12(33), P. 21626 - 21676

Published: Jan. 1, 2024

Reactive sulfur, oxygen and nitrogen species (reactive SON species) are important topics in redox biology, and their recognition by rhodamine-derived probes is impactful in the bio-medical research field.

Language: English

Citations

3

Enhanced text-independent speaker recognition using MFCC, Bi-LSTM, and CNN-based noise removal techniques
Manish Tiwari, Deepak Kumar Verma

International Journal of Speech Technology, Journal Year: 2024, Volume and Issue: unknown

Published: Oct. 10, 2024

Language: English

Citations

3

Voice Disorder Multi-Class Classification for the Distinction of Parkinson’s Disease and Adductor Spasmodic Dysphonia (Creative Commons)
Valerio Cesarini, Giovanni Saggio, Antonio Suppa, et al.

Applied Sciences, Journal Year: 2023, Volume and Issue: 13(15), P. 8562 - 8562

Published: July 25, 2023

Parkinson’s Disease and Adductor-type Spasmodic Dysphonia are two neurological disorders that greatly decrease the quality of life of millions of patients worldwide. Despite this great diffusion, the related diagnoses are often performed empirically, while it could be relevant to count on objective measurable biomarkers, among which researchers have been considering features related to voice impairment that can be useful indicators but can sometimes lead to confusion. Therefore, here, our purpose was aimed at developing a robust Machine Learning approach for multi-class classification based on 6373 voice features extracted from a convenient voice dataset made of the sustained vowel /e/ and an ad hoc selected Italian sentence, performed by 111 healthy subjects, 51 Parkinson’s disease patients, and 60 dysphonic patients. Correlation, Information Gain, Gain Ratio, and Genetic Algorithm-based methodologies were compared for feature selection, to build subsets analyzed by means of Naïve Bayes, Random Forest, and Multi-Layer Perceptron classifiers, trained with 10-fold cross-validation. As a result, spectral, cepstral, prosodic, and voicing-related features were assessed as the most relevant, the Genetic Algorithm performed as the most effective feature selector, and the adopted classifiers performed similarly. In particular, a Genetic Algorithm + Naïve Bayes approach brought one of the highest accuracies in multi-class voice analysis, being 95.70% for the sustained vowel and 99.46% for the sentence.

Language: English

Citations

7

Analysis and Investigation of Speaker Identification Problems Using Deep Learning Networks and the YOHO English Speech Dataset (Creative Commons)
Nourah M. Almarshady, Adal A. Alashban, Yousef Ajami Alotaibi, et al.

Applied Sciences, Journal Year: 2023, Volume and Issue: 13(17), P. 9567 - 9567

Published: Aug. 24, 2023

The rapid momentum of deep neural networks (DNNs) in recent years has yielded state-of-the-art performance in various machine-learning tasks using speaker identification systems. Speaker identification is based on the speech signals and the features that can be extracted from them. In this article, we proposed a speaker identification system developed using DNN models. The system is based on the acoustic and prosodic features of the speech signal, such as pitch frequency (vocal cords vibration rate), energy (loudness of speech), their derivations, and any additional acoustic and prosodic features. Additionally, the article investigates the existing recurrent neural network (RNN) models and adapts them to design a speaker identification system using the public YOHO LDC dataset. The average accuracy of the system was 91.93% in the best experiment for speaker identification. Furthermore, this paper helps uncover the reasons for errors by analyzing the speakers and tokens yielding major errors, to increase the system’s robustness regarding feature selection and system tune-up.

Language: English

Citations

7

Reverb and Noise as Real-World Effects in Speech Recognition Models: A Study and a Proposal of a Feature Set (Creative Commons)
Valerio Cesarini, Giovanni Costantini

Applied Sciences, Journal Year: 2024, Volume and Issue: 14(23), P. 11446 - 11446

Published: Dec. 9, 2024

Reverberation and background noise are common and unavoidable real-world phenomena that hinder automatic speaker recognition systems, particularly because these systems are typically trained on noise-free data. Most models rely on fixed audio feature sets. To evaluate the dependency of features on reverberation and noise, this study proposes augmenting the commonly used mel-frequency cepstral coefficients (MFCCs) with relative spectral (RASTA) features. The performance of these features was assessed using noisy data generated by applying reverberation and pink noise to the DEMoS dataset, which includes 56 speakers. Verification models were trained on clean data using MFCCs, RASTA features, or their combination as inputs. They were validated on augmented data with progressively increasing noise and reverberation levels. The results indicate that MFCCs struggle to identify the main speaker, while the RASTA method has difficulty with the opposite class. The hybrid feature set, derived from their combination, demonstrates the best overall performance as a compromise between the two. Although the MFCC method is the standard and performs well on clean training data, it shows a significant tendency to misclassify the main speaker in real-world scenarios, a critical limitation for modern user-centric verification applications. The hybrid feature set, therefore, proves effective as a balanced solution, optimizing both sensitivity and specificity.
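The abstract judges the feature sets by sensitivity and specificity. For readers unfamiliar with the two measures, the following is a minimal sketch of how they are computed for a binary verification task. The label encoding (1 for the main speaker, 0 for the opposite class) and the toy predictions are assumptions for illustration and do not come from the paper.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    # Sensitivity: share of main-speaker samples that were accepted.
    sensitivity = tp / (tp + fn)
    # Specificity: share of opposite-class samples that were rejected.
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical labels: 1 = main speaker, 0 = any other speaker.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)
```

In these terms, a feature set that misses the main speaker loses sensitivity, while one that struggles with the opposite class loses specificity, which is why a combined set can be described as a compromise between the two.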

Language: English

Citations

1

Speaker Identification in Multiple Languages: Regional, Indonesian, and English with Short Utterance (Open Access)
Ahmad Fikri, Amalia Zahra

International Journal of Emerging Technology and Advanced Engineering, Journal Year: 2023, Volume and Issue: 13(9), P. 25 - 35

Published: Oct. 3, 2023

One of the authentication models that are currently often used is based on biometrics, such as eye retina, fingerprint, and speech recognition. Moreover, text-independent speaker identification is one of the domains of speech recognition that has been widely studied. Short duration in the identification process is one of the challenges in the field. Accuracy is a great issue when the duration is shorter; besides, the system has to be general enough for various languages with different dialects, which have their own characteristics by tribe and region. Therefore, the author of this study introduces multi-language speaker identification that comprises regional, Indonesian, and English languages with short utterances. Researchers used the MFCC technique to extract voice features and a CNN as the classification model. Two kinds of dataset were used, open datasets and the authors' own recording, covering the regional languages, Indonesian [...]. The own recording was of 18 persons of different gender, who each read a text of several paragraphs and sentences [...]. The public dataset for the regional languages consisted of 80 speakers, 41 Sundanese and 39 Javanese. As for the English dataset, 126 male speakers and 125 female speakers were taken from LibriSpeech. Tests were carried out separately with a variety of durations, about 3 seconds [...] languages, 1 [...]. The best accuracy obtained was 95% (regional dataset), 94% (English dataset), and 98% (private dataset).

Language: English

Citations

1