Speech emotion recognition using multi resolution Hilbert transform based spectral and entropy features DOI

Siba Prasad Mishra, Pankaj Warule, Suman Deb et al.

Applied Acoustics, Journal Year: 2024, Volume and Issue: 229, P. 110403 - 110403

Published: Nov. 15, 2024

Language: English
Citations

0

Multi-task coordinate attention gating network for speech emotion recognition under noisy circumstances DOI
Linhui Sun, Yunlong Lei, Zixiao Zhang et al.

Biomedical Signal Processing and Control, Journal Year: 2025, Volume and Issue: 107, P. 107811 - 107811

Published: March 11, 2025

Language: English

Citations

0

Dual-Tree Complex Wavelet Transform for the Automatic Detection of the Common Cold Based on Speech Signals DOI
Pankaj Warule, Snigdha Chandratre, Smita Daware et al.

Circuits Systems and Signal Processing, Journal Year: 2025, Volume and Issue: unknown

Published: March 10, 2025

Language: English

Citations

0

Cross-lingual Speech Emotion Recognition for Mental Health Counselling and Aid DOI Open Access
Kanwaljeet Kaur, N Aishwarya, Ganesh Kumar Chellamani et al.

Procedia Computer Science, Journal Year: 2025, Volume and Issue: 258, P. 1425 - 1434

Published: Jan. 1, 2025

Language: English

Citations

0

Harmonizing Emotions: A Novel Approach to Audio Emotion Classification using Log-Melspectrogram with Augmentation DOI

Muskan Agarwal, Kanwarpartap Singh Gill, Nitin Thapliyal et al.

2022 International Conference on Communication, Computing and Internet of Things (IC3IoT), Journal Year: 2024, Volume and Issue: unknown, P. 1 - 4

Published: April 17, 2024

This study explores the field of audio emotion classification using a method that combines the log-Mel spectrogram with data augmentation. The approach achieves a respectable accuracy of 63%, which is significantly lower than that obtained with MFCC features, but it also highlights significant potential for improvement. The study utilises a sophisticated model, namely a 2D CNN that resembles the VGG19 architecture, in which the log-Mel characteristics are treated as a 30x216-pixel picture. Using datasets such as TESS, CREMA-D, SAVEE, and RAVDESS, specialised functions are used to extract a wide range of characteristics. The importance of promptly identifying emotions via auditory cues is underscored. The model demonstrates encouraging outcomes, with the capacity to exceed conventional approaches. Class-wise results are reported to facilitate faster interpretation. This work not only enhances the expanding area of emotion classification but also creates opportunities for effective and precise identification, bridging the divide between image-based methods and audio analysis.
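The log-Mel spectrogram feature used here can be computed with standard signal-processing steps. The sketch below is a minimal numpy-only illustration; the sample rate, FFT size, and hop length are illustrative assumptions, not values stated in the entry (the entry only specifies that the resulting features form a 30x216 image, so 30 mel bands are used).

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=30):
    # Frame the signal, window each frame, and take its power spectrum.
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = signal[start:start + n_fft] * np.hanning(n_fft)
        frames.append(np.abs(np.fft.rfft(frame)) ** 2)
    spec = np.array(frames).T                       # (n_fft//2 + 1, n_frames)
    mel = mel_filterbank(n_mels, n_fft, sr) @ spec  # (n_mels, n_frames)
    return np.log(mel + 1e-10)                      # log compression

# One second of a 440 Hz tone as a stand-in for a speech signal.
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
logmel = log_mel_spectrogram(np.sin(2 * np.pi * 440 * t))
print(logmel.shape)  # (30, 61): 30 mel bands over 61 frames
```

The resulting matrix is what the paper treats as a grayscale image for the VGG19-style 2D CNN.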

Language: English

Citations

1

Accuracy Enhancement Method for Speech Emotion Recognition From Spectrogram Using Temporal Frequency Correlation and Positional Information Learning Through Knowledge Transfer DOI Creative Commons
Jeong-Yoon Kim, Seung-Ho Lee

IEEE Access, Journal Year: 2024, Volume and Issue: 12, P. 128039 - 128048

Published: Jan. 1, 2024

In this paper, we propose a method to improve the accuracy of speech emotion recognition (SER) by using a vision transformer (ViT) to attend to the correlation between frequency (y-axis) and time (x-axis) in the spectrogram, and by transferring positional information between ViTs through knowledge transfer. The proposed method has the following originality: i) We use vertically segmented patches of the log-Mel spectrogram to analyze the frequencies over time. This type of patch allows us to correlate the most relevant frequencies for a particular emotion with the time at which they were uttered. ii) We propose image coordinate encoding, an absolute positional encoding suitable for ViT. By normalizing the x and y coordinates of the image to [-1, 1] and concatenating them to the image, we can effectively provide valid positional information to the ViT. iii) Through feature-map matching, the locality and location information of a teacher network is transmitted to a student network. The teacher network contains a convolutional stem and position encoding, while the student network lacks this basic structure. In the matching stage, we train with the mean absolute error (L1 loss) to minimize the difference between the feature maps of the two networks. To validate the method, three datasets (SAVEE, EmoDB, CREMA-D) consisting of speech were converted into spectrograms for comparison experiments. The experimental results show that the proposed method significantly outperforms state-of-the-art methods in terms of weighted accuracy while requiring fewer floating-point operations (FLOPs). Moreover, the performance of the student network is better than that of the teacher network, indicating that the introduction of the L1 loss solves the overfitting problem. Overall, the method offers a promising solution for SER, providing improved efficiency and performance.
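The image coordinate encoding described in point ii) amounts to appending two channels of absolute positions normalized to [-1, 1]. A minimal numpy sketch, assuming a single-channel spectrogram as input (the spectrogram size below is a hypothetical example, not from the paper):

```python
import numpy as np

def add_coordinate_channels(image):
    # image: (H, W) single-channel log-Mel spectrogram.
    h, w = image.shape
    # Absolute x and y positions normalized to [-1, 1].
    xs = np.linspace(-1.0, 1.0, w)
    ys = np.linspace(-1.0, 1.0, h)
    x_ch = np.tile(xs, (h, 1))           # varies along the time (x) axis
    y_ch = np.tile(ys[:, None], (1, w))  # varies along the frequency (y) axis
    # Concatenate the coordinate maps to the image as extra channels.
    return np.stack([image, x_ch, y_ch], axis=0)

spec = np.random.rand(128, 64)           # hypothetical spectrogram
enc = add_coordinate_channels(spec)
print(enc.shape)                         # (3, 128, 64)
```

Because the coordinates are absolute rather than learned, every vertically segmented patch carries its exact frequency and time position into the ViT.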

Language: English

Citations

1

A Lightweight Multi-Scale Model for Speech Emotion Recognition DOI Creative Commons
Haoming Li, Daqi Zhao, Jingwen Wang et al.

IEEE Access, Journal Year: 2024, Volume and Issue: 12, P. 130228 - 130240

Published: Jan. 1, 2024

Recognizing emotional states from speech is essential for human-computer interaction, but it is a challenging task to realize effective speech emotion recognition (SER) on platforms with limited memory capacity and computing power. In this paper, we propose a lightweight multi-scale deep neural network architecture for SER, which takes Mel Frequency Cepstral Coefficients (MFCCs) as input. For feature extraction, we design a new Inception module, named A_Inception. A_Inception combines the merits of the Inception module and attention-based rectified linear units (AReLU), and can thus learn features adaptively at low computational cost. Meanwhile, to extract the most important information, we propose a multiscale cepstral attention and temporal-cepstral attention (MCA-TCA) module. The idea of MCA-TCA is to focus on key cepstral components and temporal positions. Furthermore, a loss function combining Softmax loss and Center loss is adopted to supervise model training and thus enhance the model's discriminative ability. Experiments have been carried out on the IEMOCAP, EMODB, and SAVEE datasets to verify the performance of the proposed model and compare it with state-of-the-art SER models. Numerical results reveal that the proposed model has a small number of parameters (0.82M) and a much lower computational cost (81.64 MFLOPs) than the compared models, and achieves impressive accuracy on all the datasets considered.
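The combined Softmax and Center loss mentioned in the abstract is a standard formulation: cross-entropy on the logits plus a penalty on the distance between each embedding and its class center. A minimal numpy sketch; the weighting factor `lam` and all array shapes are hypothetical choices for illustration, not the paper's values.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    # Numerically stable softmax cross-entropy, averaged over the batch.
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def center_loss(features, labels, centers):
    # Penalize the distance between each feature and its class center.
    diffs = features - centers[labels]
    return 0.5 * (diffs ** 2).sum(axis=1).mean()

def joint_loss(logits, features, labels, centers, lam=0.003):
    # Softmax loss pulls classes apart; Center loss pulls same-class
    # embeddings together, improving discriminability.
    return softmax_cross_entropy(logits, labels) + lam * center_loss(features, labels, centers)

rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 4))       # batch of 8, 4 emotion classes
features = rng.normal(size=(8, 16))    # 16-dim embeddings before the classifier
labels = rng.integers(0, 4, size=8)
centers = np.zeros((4, 16))            # class centers, updated during training
loss = joint_loss(logits, features, labels, centers)
print(loss)
```

In practice the centers are learned alongside the network (updated each mini-batch), so the center term keeps same-class embeddings compact while softmax keeps different classes separated.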

Language: English

Citations

0

A Symphony of Sentiments using Log-Melspectrogram Techniques for Emotional Classification DOI

Eshika Jain, Kanwarpartap Singh Gill, Mukesh Kumar et al.

Published: Aug. 8, 2024

Language: English

Citations

0

Fixed frequency range empirical wavelet transform based acoustic and entropy features for speech emotion recognition DOI

Siba Prasad Mishra, Pankaj Warule, Suman Deb et al.

Speech Communication, Journal Year: 2024, Volume and Issue: 166, P. 103148 - 103148

Published: Nov. 14, 2024

Language: English

Citations

0
