Published: Dec. 3, 2024
Language: English
Circuits Systems and Signal Processing, Journal Year: 2023, Volume and Issue: 43(4), P. 2341 - 2384
Published: Dec. 16, 2023
Language: English
Citations: 10
Medical Engineering & Physics, Journal Year: 2025, Volume and Issue: 137, P. 104302 - 104302
Published: Feb. 6, 2025
Language: English
Citations: 0
Circuits Systems and Signal Processing, Journal Year: 2025, Volume and Issue: unknown
Published: April 25, 2025
Language: English
Citations: 0
Procedia Computer Science, Journal Year: 2025, Volume and Issue: 258, P. 3693 - 3702
Published: Jan. 1, 2025
Language: English
Citations: 0
Multimedia Tools and Applications, Journal Year: 2024, Volume and Issue: unknown
Published: June 17, 2024
Language: English
Citations: 2
Published: May 2, 2024
Human Computer Interaction (HCI) relies on accurate speech emotion identification. Speech Emotion Recognition (SER) analyzes voice signals to classify emotions. English-language SER has been studied extensively, while Bangla SER has not. This study integrates a one-dimensional convolutional neural network with a long short-term memory (LSTM) architecture and fully connected layers for SER. Emotion categorization requires informative features, which this method provides. We apply Additive White Gaussian Noise (AWGN), signal elongation, and pitch alteration to improve dataset dependability. Mel-frequency cepstral coefficients (MFCC), Mel-spectrogram, Zero Crossing Rate (ZCR), chromagram, and Root Mean Square Error features are analyzed in the study. One-dimensional convolutional blocks extract local information, while LSTM layers capture global trends in our model. Training and testing loss curves, a confusion matrix, recall, precision, F1-score, and accuracy are used to evaluate the model, which is assessed on two state-of-the-art datasets: the SUST Bangla Emotional Speech Corpus (SUBESCO) and the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). Experimental results show that the suggested Bangla SER (BSER) model is more resilient than baseline models on both datasets. The work advances the research field and shows that the hybrid model can detect emotions from speech inputs.
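The pipeline described in this abstract can be pictured with a short, hypothetical sketch. It is not the authors' code: librosa and TensorFlow/Keras are assumed as tools, the layer widths, MFCC count, noise level, and seven-class output are illustrative choices, and librosa's RMS energy stands in for the root-mean-square feature the abstract lists.

```python
# A hypothetical sketch, not the authors' code: librosa and TensorFlow/Keras are assumed,
# and layer widths, n_mfcc, noise level, and the 7-class output are illustrative choices.
import numpy as np
import librosa
from tensorflow.keras import layers, models

def augment(y, sr):
    """Augmentations named in the abstract: AWGN, signal elongation, pitch alteration."""
    noisy = y + 0.005 * np.random.randn(len(y)).astype(np.float32)  # additive white Gaussian noise
    stretched = librosa.effects.time_stretch(y, rate=0.9)           # signal elongation
    shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)      # pitch alteration
    return [noisy, stretched, shifted]

def extract_features(y, sr):
    """Mean-pooled MFCC, mel spectrogram, ZCR, chromagram, and RMS, concatenated."""
    return np.concatenate([
        np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40), axis=1),
        np.mean(librosa.feature.melspectrogram(y=y, sr=sr), axis=1),
        np.mean(librosa.feature.zero_crossing_rate(y=y), axis=1),
        np.mean(librosa.feature.chroma_stft(y=y, sr=sr), axis=1),
        np.mean(librosa.feature.rms(y=y), axis=1),   # RMS energy as the root-mean-square feature
    ])

def build_model(feature_dim, num_classes=7):
    """1-D convolutional blocks for local patterns, an LSTM for global trends,
    and fully connected layers for the final emotion classification."""
    return models.Sequential([
        layers.Input(shape=(feature_dim, 1)),
        layers.Conv1D(64, 5, padding="same", activation="relu"),
        layers.MaxPooling1D(2),
        layers.Conv1D(128, 5, padding="same", activation="relu"),
        layers.MaxPooling1D(2),
        layers.LSTM(128),
        layers.Dense(64, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])

# Hypothetical usage on a synthetic signal standing in for a Bangla utterance:
sr = 22050
y = 0.1 * np.random.randn(3 * sr).astype(np.float32)
clips = [y] + augment(y, sr)                                   # original + 3 augmented copies
X = np.stack([extract_features(c, sr) for c in clips])[..., np.newaxis]
model = build_model(X.shape[1])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
print(model.predict(X).shape)                                  # (4, num_classes)
```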
Language: English
Citations: 1
Electronics, Journal Year: 2024, Volume and Issue: 13(11), P. 2064 - 2064
Published: May 25, 2024
Acoustic event detection (AED) systems, combined with video surveillance, can enhance urban security and safety by automatically detecting incidents, supporting the smart city concept. AED systems mostly use mel spectrograms as a well-known, effective acoustic feature. The mel spectrogram is a combination of frequency bands. A major challenge is that some bands may be similar across different events and therefore useless for AED. Removing them reduces the input feature dimension, which is highly desirable. This article proposes a mathematical analysis method to identify and eliminate ineffective bands and improve the efficiency of AED systems. The proposed approach uses Student's t-test to compare bands from different events. The similarity between each band across events is calculated using a two-sample t-test, allowing the distinct bands to be identified. This accelerates the training speed of the classifier by reducing the number of features and also enhances the system's accuracy. Based on the obtained results, the number of input features is reduced by 26.3%. The results also showed an average difference of 7.77% in Jaccard, 4.07% in Dice, and 5.7% in Hamming distance between the bands selected from the train and test datasets. These small values underscore the validity of the method for the dataset.
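To make the band-selection idea concrete, here is a small illustrative sketch rather than the authors' implementation: it assumes librosa for mel spectrograms and SciPy's two-sample t-test, compares a single pair of event classes, and uses an arbitrary 0.05 significance threshold and synthetic signals.

```python
# A minimal sketch of t-test-based mel band selection (assumed tools: librosa + SciPy).
# Class names, signals, n_mels, and the 0.05 threshold are illustrative assumptions.
import numpy as np
import librosa
from scipy.stats import ttest_ind

def band_energies(y, sr, n_mels=64):
    """Log-mel band energies for one recording: shape (n_mels, n_frames)."""
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel)

def informative_bands(clips_a, clips_b, sr, alpha=0.05, n_mels=64):
    """Keep a mel band only if the two-sample t-test finds its frame-wise
    distribution significantly different between the two event classes;
    similar (indistinct) bands are dropped to shrink the input dimension."""
    a = np.hstack([band_energies(y, sr, n_mels) for y in clips_a])  # (n_mels, frames_a)
    b = np.hstack([band_energies(y, sr, n_mels) for y in clips_b])  # (n_mels, frames_b)
    keep = []
    for k in range(n_mels):
        _, p = ttest_ind(a[k], b[k], equal_var=False)
        if p < alpha:            # band is distinct across events -> useful for AED
            keep.append(k)
    return keep                  # band indices to retain as classifier input

# Hypothetical usage with synthetic stand-ins for two acoustic event classes:
sr = 22050
class_a = [0.1 * np.random.randn(sr) for _ in range(3)]                        # e.g. "siren" clips
class_b = [np.sin(2 * np.pi * 440 * np.arange(sr) / sr) for _ in range(3)]     # e.g. "alarm" clips
print(len(informative_bands(class_a, class_b, sr)), "of 64 bands kept")
```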
Language: English
Citations: 1
Revue d'intelligence artificielle, Journal Year: 2024, Volume and Issue: 38(3), P. 913 - 927
Published: June 21, 2024
Speech Emotion Recognition (SER) is crucial for enriching next-generation human machine interaction (HMI) with emotional intelligence capabilities by extracting emotions from words and voice. However, current SER techniques are developed within experimental boundaries and face major challenges such as a lack of robustness across languages, cultures, age gaps, and the gender of speakers. Very little work has been carried out for Indian corpora, which have higher diversity, a large number of dialects, and vast changes due to regional and geographical aspects. India is one of the largest customers of HMI systems, social networking sites, and internet users; it is therefore important that research focuses on Indian corpora. This paper presents cross-corpus SER (CCSER) using multiple acoustic features (MAF) and a deep convolutional neural network (DCNN) to improve SER. The MAF consists of various spectral, temporal, and voice quality features. Further, a Fire Hawk based optimization (FHO) technique is utilized for salient feature selection. The FHO selects important features to minimize computational complexity while improving the distinctiveness and inter-class variance of the features. The DCNN provides better correlation, representation, and description of variation in timbre, intonation, and pitch, with superior connectivity of global and local speech signal characteristics to characterize the corpus. The outcomes of the suggested method are evaluated on the Indo-Aryan language family (Hindi and Urdu) and the Dravidian language family (Telugu and Kannada). The proposed scheme results in improved accuracy for multilingual SER and outperforms traditional techniques. It achieves accuracies of 58.83%, 61.75%, 69.75%, and 45.51% for Hindi, Urdu, Telugu, and Kannada training.
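As a rough illustration of the MAF stage only, the following hedged sketch assumes librosa for the spectral, temporal, and voice-quality style descriptors; the Fire Hawk optimizer and the DCNN classifier are not reproduced, and a hand-picked index list stands in for the FHO-selected salient subset.

```python
# A minimal sketch of the MAF (multiple acoustic features) extraction stage, assuming librosa.
# FHO and the DCNN are not reproduced; an index list stands in for the selected feature subset.
import numpy as np
import librosa

def maf_features(y, sr):
    """Spectral, temporal, and voice-quality style descriptors, mean-pooled over frames."""
    return np.concatenate([
        np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20), axis=1),      # spectral envelope
        np.mean(librosa.feature.spectral_centroid(y=y, sr=sr), axis=1),    # spectral
        np.mean(librosa.feature.spectral_rolloff(y=y, sr=sr), axis=1),     # spectral
        np.mean(librosa.feature.zero_crossing_rate(y=y), axis=1),          # temporal
        np.mean(librosa.feature.rms(y=y), axis=1),                         # energy / temporal
        np.mean(librosa.feature.spectral_flatness(y=y), axis=1),           # voice-quality proxy
    ])

def apply_selection(feats, selected_idx):
    """Stand-in for FHO-based salient feature selection: keep only the chosen dimensions.
    In the paper, a metaheuristic searches for the subset that best separates classes."""
    return feats[np.asarray(selected_idx)]

# Hypothetical usage on a synthetic signal in place of a Hindi/Urdu/Telugu/Kannada utterance:
sr = 22050
y = 0.1 * np.random.randn(2 * sr).astype(np.float32)
feats = maf_features(y, sr)                       # 25-dimensional MAF vector in this sketch
print(apply_selection(feats, [0, 3, 5, 10, 20]))  # illustrative selected subset
```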
Language: English
Citations: 1
International Journal of Speech Technology, Journal Year: 2024, Volume and Issue: 27(3), P. 551 - 568
Published: July 10, 2024
Language: English
Citations: 1
Published: May 24, 2024
Language: English
Citations: 1