Published: Dec. 1, 2024
Language: English
Scientific Reports, Journal year: 2024, Issue 14(1)
Published: June 23, 2024
Detecting emotions from facial images is difficult because facial expressions can vary significantly. Previous research on using deep learning models to classify facial expressions has been carried out on various datasets that contain a limited range of expressions. This study expands the use of deep learning for facial emotion recognition (FER) based on the Emognition dataset, which includes ten target emotions: amusement, awe, enthusiasm, liking, surprise, anger, disgust, fear, sadness, and neutral. A series of data preprocessing steps was performed to convert the videos into images and augment the data. The study proposes a Convolutional Neural Network (CNN) built through two approaches: transfer learning (fine-tuning) with the pre-trained Inception-V3 and MobileNet-V2 models, and building a network from scratch using the Taguchi method to find a robust combination of hyperparameter settings. The proposed model demonstrated favorable performance over the experimental processes, with an accuracy and an average F1-score of 96% and 0.95, respectively, on the test data.
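As a rough illustration of the transfer-learning branch described in this abstract, the sketch below fine-tunes a pre-trained MobileNet-V2 backbone for the ten Emognition emotion classes in Keras; the input size, dropout rate, and optimizer settings are assumptions, not the authors' configuration.

```python
# Illustrative sketch (not the authors' code): fine-tuning MobileNet-V2
# for 10-class facial emotion recognition as the abstract describes.
import tensorflow as tf

NUM_CLASSES = 10  # amusement, awe, enthusiasm, liking, surprise,
                  # anger, disgust, fear, sadness, neutral

# Pre-trained backbone without its ImageNet classification head.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # freeze first; unfreeze selected layers to fine-tune

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```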
Language: English
Cited: 10
Engineering Applications of Artificial Intelligence, Journal year: 2025, Issue 143, pp. 110004 - 110004
Published: Jan. 8, 2025
Language: English
Cited: 1
Engineering Applications of Artificial Intelligence, Journal year: 2024, Issue 133, pp. 108413 - 108413
Published: April 12, 2024
Language: English
Cited: 4
IEEE Access, Journal year: 2024, Issue 12, pp. 108052 - 108071
Published: Jan. 1, 2024
Multimodal emotion recognition is a developing field that analyzes emotions through various channels, mainly audio, video, and text. However, existing state-of-the-art systems focus on two to three modalities at the most, utilize traditional techniques, fail to consider emotional interplay, lack the scope to add more modalities, and are not efficient at predicting emotions accurately. This research proposes a novel approach that uses rule-based methods to convert non-verbal cues into text, inspired by a limited prior attempt that lacked proper benchmarking. It achieves multimodal emotion recognition by utilizing DistilRoBERTa, a large language model fine-tuned with a combined textual representation of audio features (such as loudness, spectral flux, MFCCs, pitch stability, and emphasis) and visual features (action units) extracted from videos. The approach is evaluated on the RAVDESS and BAUM-1 datasets. It achieves high accuracy on both datasets (93.18% on RAVDESS and 93.69% on BAUM-1), performing on par with SOTA (state-of-the-art) systems, if not slightly better. Furthermore, it highlights the potential for incorporating additional modalities by transforming them into text to further refine pre-trained language models, giving rise to comprehensive emotion recognition.
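The sketch below illustrates the general idea this abstract describes: rendering non-verbal cues as text with simple rules and classifying the combined representation with a DistilRoBERTa checkpoint. The feature thresholds, phrasing rules, label count, and checkpoint name are illustrative assumptions, not the paper's actual pipeline.

```python
# Illustrative sketch (assumptions throughout): rule-based conversion of
# non-verbal cues into text, then classification with DistilRoBERTa.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def cues_to_text(loudness: float, pitch_stability: float, au12: float) -> str:
    """Tiny hypothetical rule set mapping feature values to phrases."""
    parts = [
        "the speaker talks loudly" if loudness > 0.7 else "the speaker talks softly",
        "with a steady pitch" if pitch_stability > 0.5 else "with a shaky pitch",
        "and a pronounced smile" if au12 > 0.6 else "and a neutral mouth",
    ]
    return ", ".join(parts)

# Placeholder checkpoint and label count (e.g. 8 RAVDESS emotion classes).
tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilroberta-base", num_labels=8)

text = "I am fine. " + cues_to_text(loudness=0.8, pitch_stability=0.4, au12=0.7)
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    predicted_class = model(**inputs).logits.argmax(dim=-1)
```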
Language: English
Cited: 3
Neural Computing and Applications, Journal year: 2025, Issue unknown
Published: Feb. 3, 2025
Language: English
Cited: 0
Symmetry, Journal year: 2025, Issue 17(3), pp. 397 - 397
Published: March 6, 2025
This study introduces a custom-designed CNN architecture that extracts robust, multi-level facial features and incorporates preprocessing techniques to correct or reduce asymmetry before classification. The innovative characteristic of this research lies in its integrated approach to overcoming challenges in and enhancing CNN-based emotion recognition. The architecture is complemented by well-known data augmentation strategies, using methods such as vertical flipping and shuffling, that generate symmetric variations of the images, effectively balancing the dataset and improving recognition accuracy. Additionally, a Loss Weight parameter is used to fine-tune training, thereby optimizing performance across diverse and unbalanced classes. Collectively, all these elements contribute to an efficient, real-time system that outperforms traditional models and offers practical benefits for various applications, while also addressing challenges inherent in emotion detection. Our experimental results demonstrate superior performance compared with other methods, marking a step forward for applications ranging from human-computer interaction to immersive technologies, while acknowledging privacy and ethical considerations.
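To make the two balancing ideas from this abstract concrete, here is a minimal Keras sketch combining flip-based augmentation with per-class loss weights passed to training; the input size, class count, layer sizes, and weight values are assumptions, not the paper's architecture.

```python
# Minimal sketch, not the paper's implementation: flip augmentation plus
# per-class loss weights for an imbalanced facial-expression dataset.
import tensorflow as tf

NUM_CLASSES = 7  # hypothetical number of expression classes

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(48, 48, 1)),
    tf.keras.layers.RandomFlip("vertical"),   # flip augmentation, as in the abstract
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Hypothetical loss weights: under-represented classes get larger weights.
class_weight = {0: 1.0, 1: 3.5, 2: 1.2, 3: 0.8, 4: 1.0, 5: 1.1, 6: 0.9}
# model.fit(train_ds, epochs=30, class_weight=class_weight)
```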
Language: English
Cited: 0
Multimedia Systems, Journal year: 2025, Issue 31(2)
Published: March 23, 2025
Language: English
Cited: 0
Multimodal Technologies and Interaction, Journal year: 2025, Issue 9(4), pp. 31 - 31
Published: March 31, 2025
Artificial agents are expected to increasingly interact with humans and demonstrate multimodal adaptive emotional responses. Such social integration requires both perception and production mechanisms, thus enabling a more realistic approach to emotional alignment than existing systems. Indeed, emotion recognition methods rely on behavioral signals, predominantly facial expressions, as well as non-invasive brain recordings, such as Electroencephalograms (EEGs) and functional Magnetic Resonance Imaging (fMRI), to identify humans' emotions, but accurate labeling remains a challenge. This paper introduces a novel approach examining how behavioral and physiological signals can be used to predict activity in emotion-related regions of the brain. To this end, we propose a deep learning network that processes two categories of signals recorded alongside brain activity during conversations: behavioral signals (video and audio) and one physiological signal (blood pulse). Our network enables (1) the prediction of brain activity from these inputs, and (2) the assessment of our model's performance depending on the nature of the interlocutor (human or robot) and the brain region of interest. Results show that the proposed architecture outperforms existing models in the anterior insula and hypothalamus regions, for interactions with both a human and a robot. An ablation study evaluating subsets of the input modalities indicates that local performance was reduced when a modality was omitted. However, the results also revealed that physiological data (blood pulse) alone achieve similar levels of prediction compared to the full model, further underscoring the importance of somatic markers in the central nervous system's processing of emotions.
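A rough sketch of the kind of fusion network this abstract describes is shown below: separate encoders for facial, audio, and blood-pulse features whose outputs are concatenated and regressed onto the activity of one brain region of interest. All feature dimensions and layer sizes are assumptions, and dropping a branch loosely mimics the ablation setting.

```python
# Rough sketch (all dimensions are assumptions) of a multimodal fusion
# regressor: per-modality encoders -> concatenation -> predicted ROI activity.
import torch
import torch.nn as nn

class FusionRegressor(nn.Module):
    def __init__(self, face_dim=128, audio_dim=64, pulse_dim=16):
        super().__init__()
        self.face = nn.Sequential(nn.Linear(face_dim, 64), nn.ReLU())
        self.audio = nn.Sequential(nn.Linear(audio_dim, 32), nn.ReLU())
        self.pulse = nn.Sequential(nn.Linear(pulse_dim, 16), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(64 + 32 + 16, 64), nn.ReLU(),
                                  nn.Linear(64, 1))  # predicted ROI activation

    def forward(self, face, audio, pulse):
        z = torch.cat([self.face(face), self.audio(audio), self.pulse(pulse)], dim=-1)
        return self.head(z)

# Toy forward pass on a batch of 8 samples; an ablation would zero or drop a branch.
model = FusionRegressor()
y_hat = model(torch.randn(8, 128), torch.randn(8, 64), torch.randn(8, 16))
```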
Language: English
Cited: 0
Research Square (Research Square), Journal year: 2025, Issue unknown
Published: April 15, 2025
Language: English
Cited: 0
Circuits Systems and Signal Processing, Journal year: 2025, Issue unknown
Published: April 25, 2025
Language: English
Cited: 0