Published: Dec. 1, 2024
Language: English
Scientific Reports, Journal Year: 2024, Volume and Issue: 14(1)
Published: June 23, 2024
Detecting emotions from facial images is difficult because expressions can vary significantly. Previous research on using deep learning models for classification has been carried out on various datasets that contain a limited range of expressions. This study expands facial emotion recognition (FER) research by using the Emognition dataset, which includes ten target emotions: amusement, awe, enthusiasm, liking, surprise, anger, disgust, fear, sadness, and neutral. A series of data preprocessing steps was applied to convert the videos into images and to augment the data. The study proposes a Convolutional Neural Network (CNN) built through two approaches: transfer learning (fine-tuning) with the pre-trained Inception-V3 and MobileNet-V2 models, and building from scratch using the Taguchi method to find a robust combination of hyperparameter settings. The proposed model demonstrated favorable performance across the experimental processes, with an accuracy and average F1-score of 96% and 0.95, respectively, on the test data.
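As an illustration of the transfer-learning approach described in this abstract, the sketch below attaches a new ten-class head for the Emognition emotion labels to a frozen, ImageNet-pre-trained Inception-V3 or MobileNet-V2 backbone in Keras. The input resolution, head layers, dropout rate, and optimizer settings are assumptions chosen for illustration, not the authors' reported configuration (which was tuned with the Taguchi method).

```python
# Minimal sketch of fine-tuning a pre-trained backbone for 10-class FER.
# Hyperparameters here are illustrative assumptions, not the paper's values.
import tensorflow as tf

EMOTIONS = ["amusement", "awe", "enthusiasm", "liking", "surprise",
            "anger", "disgust", "fear", "sadness", "neutral"]

def build_fer_model(backbone_name: str = "mobilenet_v2",
                    input_shape=(224, 224, 3)) -> tf.keras.Model:
    """Attach a small classification head to a frozen ImageNet backbone."""
    if backbone_name == "inception_v3":
        backbone = tf.keras.applications.InceptionV3(
            weights="imagenet", include_top=False, input_shape=input_shape)
    else:
        backbone = tf.keras.applications.MobileNetV2(
            weights="imagenet", include_top=False, input_shape=input_shape)
    backbone.trainable = False  # train only the new head at first

    inputs = tf.keras.Input(shape=input_shape)
    x = backbone(inputs, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dropout(0.3)(x)  # assumed regularization
    outputs = tf.keras.layers.Dense(len(EMOTIONS), activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

# Example: model = build_fer_model("inception_v3"); model.fit(train_ds, ...)
```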
Language: English
Citations: 10
Engineering Applications of Artificial Intelligence, Journal Year: 2025, Volume and Issue: 143, P. 110004 - 110004
Published: Jan. 8, 2025
Language: English
Citations: 1
Engineering Applications of Artificial Intelligence, Journal Year: 2024, Volume and Issue: 133, P. 108413 - 108413
Published: April 12, 2024
Language: English
Citations: 4
IEEE Access, Journal Year: 2024, Volume and Issue: 12, P. 108052 - 108071
Published: Jan. 1, 2024
Multimodal emotion recognition is a developing field that analyzes emotions through various channels, mainly audio, video, and text. However, existing state-of-the-art systems focus on two to three modalities at most, utilize traditional techniques, fail to consider emotional interplay, lack the scope to add more modalities, and are not efficient at making accurate predictions. This research proposes a novel approach that uses rule-based methods to convert non-verbal cues into text, inspired by a limited prior attempt that lacked proper benchmarking. It achieves multimodal emotion recognition by utilizing distilRoBERTa, a large language model fine-tuned with a combined textual representation of audio features (such as loudness, spectral flux, MFCCs, pitch stability, and emphasis) and visual features (action units) extracted from videos. The approach is evaluated on the RAVDESS and BAUM-1 datasets. It achieves high accuracy on both datasets (93.18% on RAVDESS and 93.69% on BAUM-1), performing on par with SOTA (state-of-the-art) systems, if not slightly better. Furthermore, it highlights the potential of incorporating additional modalities by transforming them into text to further refine pre-trained models, giving rise to comprehensive emotion recognition.
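The sketch below illustrates, under stated assumptions, the core idea of verbalizing non-verbal cues and classifying the resulting text with distilRoBERTa via Hugging Face Transformers. The feature-to-text template, thresholds, and label set are hypothetical stand-ins; the paper's actual rule set and fine-tuning on RAVDESS/BAUM-1 are not reproduced here.

```python
# Illustrative sketch: rule-based verbalization of audio/visual cues,
# classified by a distilRoBERTa sequence classifier (assumed to be fine-tuned).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = ["anger", "disgust", "fear", "happiness",
          "sadness", "surprise", "neutral"]  # assumed label set

tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilroberta-base", num_labels=len(LABELS))  # head would be fine-tuned

def verbalize(loudness, pitch_stability, action_units):
    """Hypothetical rules turning non-verbal cues into a textual description."""
    loud_txt = "speaks loudly" if loudness > 0.6 else "speaks softly"    # assumed threshold
    pitch_txt = "with a steady pitch" if pitch_stability > 0.5 else "with a shaky pitch"
    au_txt = ("showing " + ", ".join(action_units)) if action_units else "with a neutral face"
    return f"The speaker {loud_txt} {pitch_txt}, {au_txt}."

text = verbalize(0.8, 0.3, ["brow lowerer", "lip tightener"])
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)
print(LABELS[int(probs.argmax())])
```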
Language: English
Citations: 3
Neural Computing and Applications, Journal Year: 2025, Volume and Issue: unknown
Published: Feb. 3, 2025
Language: English
Citations: 0
Symmetry, Journal Year: 2025, Volume and Issue: 17(3), P. 397 - 397
Published: March 6, 2025
This study introduces a custom-designed CNN architecture that extracts robust, multi-level facial features and incorporates preprocessing techniques to correct or reduce asymmetry before classification. The innovative characteristic of this research lies in its integrated approach to overcoming these challenges and enhancing CNN-based emotion recognition. The architecture is complemented by well-known data augmentation strategies, using methods such as vertical flipping and shuffling, that generate symmetric variations of the images, effectively balancing the dataset and improving recognition accuracy. Additionally, a Loss Weight parameter is used to fine-tune training, thereby optimizing performance across diverse and unbalanced classes. Collectively, all these elements contribute to an efficient, real-time system that outperforms traditional models and offers practical benefits for various applications, while also addressing the asymmetry inherent in emotion detection. Our experimental results demonstrate superior performance compared with other methods, marking a step forward for applications ranging from human-computer interaction to immersive technologies, while acknowledging privacy and ethical considerations.
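A minimal sketch of two ingredients mentioned in this abstract, flip-based augmentation that produces symmetric image variants and per-class loss weighting for unbalanced data, is shown below in Keras. The inverse-frequency weighting formula and augmentation parameters are illustrative assumptions rather than the authors' exact settings.

```python
# Illustrative sketch: symmetric augmentation plus class-weighted training.
# Weighting formula and augmentation choices are assumptions, not the paper's.
import numpy as np
import tensorflow as tf

def make_class_weights(labels: np.ndarray) -> dict:
    """Inverse-frequency loss weights so rare emotion classes are not ignored."""
    counts = np.bincount(labels)
    return {i: len(labels) / (len(counts) * c) for i, c in enumerate(counts)}

augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("vertical"),   # flipped (symmetric) variants, per the abstract
    tf.keras.layers.RandomRotation(0.05),     # assumed extra jitter
])

images = np.random.rand(4, 96, 96, 3).astype("float32")   # dummy face crops
augmented = augment(images, training=True)

# Usage with a compiled model (hypothetical):
# model.fit(train_ds, class_weight=make_class_weights(y_train))
```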
Language: English
Citations: 0
Multimedia Systems, Journal Year: 2025, Volume and Issue: 31(2)
Published: March 23, 2025
Language: English
Citations: 0
Multimodal Technologies and Interaction, Journal Year: 2025, Volume and Issue: 9(4), P. 31 - 31
Published: March 31, 2025
Artificial agents are expected to increasingly interact with humans and to demonstrate multimodal adaptive emotional responses. Such social integration requires both perception and production mechanisms, thus enabling a more realistic approach to emotional alignment than existing systems. Indeed, emotion recognition methods rely on behavioral signals, predominantly facial expressions, as well as non-invasive brain recordings, such as electroencephalograms (EEGs) and functional Magnetic Resonance Imaging (fMRI), to identify humans' emotions, but accurate labeling remains a challenge. This paper introduces a novel approach examining how physiological and behavioral signals can be used to predict activity in emotion-related regions of the brain. To this end, we propose a deep learning network that processes two categories of signals recorded alongside brain activity during conversations: behavioral signals (video and audio) and one physiological signal (blood pulse). Our approach enables (1) the prediction of activity in emotion-related brain regions from these inputs and (2) the assessment of our model's performance depending on the nature of the interlocutor (human or robot) and the region of interest. Results show that the proposed architecture outperforms other models for the anterior insula and hypothalamus regions, for interactions with both humans and robots. An ablation study evaluating subsets of the input modalities indicates that local performance was reduced when modalities were omitted. However, it also revealed that physiological data (blood pulse) alone achieve similar levels of prediction compared with the full model, further underscoring the importance of somatic markers in the central nervous system's processing of emotions.
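The sketch below shows one plausible shape, under assumed feature dimensions, for a fusion network of the kind described: separate encoders for video, audio, and blood-pulse features whose outputs are concatenated and regressed onto activity in emotion-related brain regions (e.g., anterior insula and hypothalamus). It is an illustrative PyTorch stand-in, not the paper's architecture.

```python
# Illustrative multimodal fusion regressor; dimensions and widths are assumptions.
import torch
import torch.nn as nn

class MultimodalBrainRegressor(nn.Module):
    def __init__(self, video_dim=512, audio_dim=128, pulse_dim=32, n_regions=2):
        super().__init__()
        self.video_enc = nn.Sequential(nn.Linear(video_dim, 64), nn.ReLU())
        self.audio_enc = nn.Sequential(nn.Linear(audio_dim, 64), nn.ReLU())
        self.pulse_enc = nn.Sequential(nn.Linear(pulse_dim, 64), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(3 * 64, 64), nn.ReLU(),
                                  nn.Linear(64, n_regions))  # e.g. insula, hypothalamus

    def forward(self, video, audio, pulse):
        # Encode each modality separately, then fuse by concatenation.
        z = torch.cat([self.video_enc(video),
                       self.audio_enc(audio),
                       self.pulse_enc(pulse)], dim=-1)
        return self.head(z)

model = MultimodalBrainRegressor()
pred = model(torch.randn(8, 512), torch.randn(8, 128), torch.randn(8, 32))
print(pred.shape)  # torch.Size([8, 2]): predicted activity per region
```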
Language: English
Citations: 0
Research Square (Research Square), Journal Year: 2025, Volume and Issue: unknown
Published: April 15, 2025
Language: English
Citations: 0
Circuits Systems and Signal Processing, Journal Year: 2025, Volume and Issue: unknown
Published: April 25, 2025
Language: English
Citations: 0