Video Emotion Recognition Using 3D-Convolutional Neural Network DOI
Harsh Sharma, Arun Kumar Yadav, Mohit Kumar

et al.

Lecture Notes in Networks and Systems, Journal Year: 2025, Volume and Issue: unknown, P. 203 - 214

Published: Jan. 1, 2025

Language: English

Using transformers for multimodal emotion recognition: Taxonomies and state of the art review DOI
Samira Hazmoune, Fateh Bougamouza

Engineering Applications of Artificial Intelligence, Journal Year: 2024, Volume and Issue: 133, P. 108339 - 108339

Published: April 2, 2024

Language: English

Citations

24

A Proposal for Multimodal Emotion Recognition Using Aural Transformers and Action Units on RAVDESS Dataset DOI Creative Commons
Cristina Luna-Jiménez, Ricardo Kleinlein, David Griol

et al.

Applied Sciences, Journal Year: 2021, Volume and Issue: 12(1), P. 327 - 327

Published: Dec. 30, 2021

Emotion recognition is attracting the attention of the research community due to its multiple applications in different fields, such as medicine or autonomous driving. In this paper, we proposed an automatic emotion recognizer system that consisted of a speech emotion recognizer (SER) and a facial emotion recognizer (FER). For the SER, we evaluated a pre-trained xlsr-Wav2Vec2.0 transformer using two transfer-learning techniques: embedding extraction and fine-tuning. The best accuracy results were achieved when we fine-tuned the whole model by appending a multilayer perceptron on top of it, confirming that training was more robust when it did not start from scratch and the previous knowledge of the network came from a task similar to the one to adapt to. Regarding the facial emotion recognizer, we extracted the Action Units of the videos and compared the performance of static models against sequential models. Results showed that sequential models beat static models by a narrow difference. Error analysis reported that the visual systems could improve with a detector of high-emotional-load frames, which opened a new line of research to discover ways to learn from videos. Finally, combining these two modalities with a late fusion strategy, we achieved 86.70% accuracy on the RAVDESS dataset under a subject-wise 5-CV evaluation, classifying eight emotions. The results demonstrated that both modalities carried relevant information to detect users' emotional state and that their combination allowed improving the final system performance.

Language: English

Citations

70
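For readers who want a concrete picture of the late-fusion strategy mentioned in the abstract above, the sketch below combines per-class probabilities from a speech recognizer and a facial recognizer with a weighted average. The label set, fusion weight, and dummy probability vectors are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of late fusion: average class probabilities from a speech
# emotion recognizer (SER) and a facial emotion recognizer (FER).
# Labels, weights, and probabilities below are hypothetical stand-ins.
import numpy as np

EMOTIONS = ["neutral", "calm", "happy", "sad", "angry", "fearful", "disgust", "surprised"]

def late_fusion(p_speech: np.ndarray, p_face: np.ndarray, w_speech: float = 0.5) -> int:
    """Fuse per-class probabilities from the two unimodal recognizers."""
    assert p_speech.shape == p_face.shape == (len(EMOTIONS),)
    fused = w_speech * p_speech + (1.0 - w_speech) * p_face
    return int(np.argmax(fused))

# Dummy probability vectors standing in for real model outputs.
p_ser = np.array([0.05, 0.05, 0.60, 0.05, 0.10, 0.05, 0.05, 0.05])
p_fer = np.array([0.10, 0.05, 0.40, 0.05, 0.25, 0.05, 0.05, 0.05])
print(EMOTIONS[late_fusion(p_ser, p_fer)])  # -> "happy"
```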

A Systematic Literature Review on Multimodal Machine Learning: Applications, Challenges, Gaps and Future Directions DOI Creative Commons
Arnab Barua, Mobyen Uddin Ahmed, Shahina Begum

et al.

IEEE Access, Journal Year: 2023, Volume and Issue: 11, P. 14804 - 14831

Published: Jan. 1, 2023

Multimodal machine learning (MML) is a tempting multidisciplinary research area where heterogeneous data from multiple modalities and machine learning (ML) are combined to solve critical problems. Usually, research works use a single modality, such as images, audio, text, or signals. However, real-world issues have become more complex, and handling them using multiple modalities of data instead of a single modality can significantly impact finding solutions. ML algorithms play an essential role by tuning parameters in developing MML models. This paper reviews recent advancements in the challenges of MML, namely representation, translation, alignment, fusion, and co-learning, and presents the gaps and challenges. A systematic literature review (SLR) was applied to define the progress and trends of MML on those challenges in the domain. In total, 1032 articles were examined in this review to extract features like source, domain, application, etc. This article will help researchers understand the current state of MML and navigate the selection of future research directions.

Language: English

Citations

34

Multimodal Emotion Detection via Attention-Based Fusion of Extracted Facial and Speech Features DOI Creative Commons

Dilnoza Mamieva, Akmalbek Abdusalomov, Alpamis Kutlimuratov

et al.

Sensors, Journal Year: 2023, Volume and Issue: 23(12), P. 5475 - 5475

Published: June 9, 2023

Methods for detecting emotions that employ many modalities at the same time have been found to be more accurate and resilient than those that rely on a single sense. This is due to the fact that sentiments may be conveyed in a wide range of modalities, each of which offers a different and complementary window into the thoughts and emotions of the speaker. In this way, a more complete picture of a person's emotional state may emerge through the fusion and analysis of data from several modalities. The research suggests a new attention-based approach to multimodal emotion recognition. The technique integrates facial and speech features extracted by independent encoders in order to pick the aspects that are most informative. It increases the system's accuracy by processing inputs of various sizes and focusing on the most useful bits of the input. A comprehensive representation of facial expressions is obtained through the use of both low- and high-level features. These features are combined using an attention network to create a feature vector that is then fed to a classification layer. The developed system is evaluated on two datasets, IEMOCAP and CMU-MOSEI, and shows superior performance compared to existing models, achieving a weighted accuracy (WA) of 74.6% and an F1 score of 66.1% on the IEMOCAP dataset, and a WA of 80.7% and an F1 score of 73.7% on the CMU-MOSEI dataset.

Language: English

Citations

30
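As a rough illustration of the attention-based fusion pattern described in the abstract above, the PyTorch sketch below projects facial and speech features into a shared space, fuses them with cross-attention, and feeds the result to a classification layer. The encoders are replaced by linear projections and all dimensions are assumptions, not the paper's architecture.

```python
# Illustrative attention-based fusion of facial and speech features (PyTorch).
import torch
import torch.nn as nn

class AttentionFusionClassifier(nn.Module):
    def __init__(self, face_dim=512, speech_dim=256, d_model=128, num_classes=8):
        super().__init__()
        self.face_proj = nn.Linear(face_dim, d_model)      # stand-in for the facial encoder output
        self.speech_proj = nn.Linear(speech_dim, d_model)  # stand-in for the speech encoder output
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(2 * d_model, d_model), nn.ReLU(), nn.Linear(d_model, num_classes)
        )

    def forward(self, face_feats, speech_feats):
        # face_feats: (B, T_f, face_dim), speech_feats: (B, T_s, speech_dim)
        f = self.face_proj(face_feats)
        s = self.speech_proj(speech_feats)
        # Speech queries attend over facial tokens to pick the most informative parts.
        fused, _ = self.cross_attn(query=s, key=f, value=f)
        pooled = torch.cat([fused.mean(dim=1), s.mean(dim=1)], dim=-1)
        return self.classifier(pooled)

model = AttentionFusionClassifier()
logits = model(torch.randn(2, 16, 512), torch.randn(2, 50, 256))
print(logits.shape)  # torch.Size([2, 8])
```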

Two Birds With One Stone: Knowledge-Embedded Temporal Convolutional Transformer for Depression Detection and Emotion Recognition DOI
Wenbo Zheng, Lan Yan, Fei‐Yue Wang

et al.

IEEE Transactions on Affective Computing, Journal Year: 2023, Volume and Issue: 14(4), P. 2595 - 2613

Published: June 5, 2023

Depression is a critical problem in modern society that affects an estimated 350 million people worldwide, causing feelings of sadness and a lack of interest and pleasure. Emotional disorders are gaining attention and are closely entwined with depression, because understanding one contributes to understanding the other. Despite achievements in the two separate tasks of emotion recognition and depression detection, there has not been much prior effort to build a unified model that can connect them across different modalities, including multimedia (text, audio, video) and unobtrusive physiological signals (e.g., electroencephalography). We propose a novel temporal convolutional transformer with knowledge embedding to address the joint task of depression detection and emotion recognition. This approach not only learns multimodal embeddings across domains via temporal convolution but also exploits special-domain knowledge from medical knowledge graphs to improve performance. It is essential that the features learned by our method can be perceived as a priori knowledge suitable for increasing the performance of other related tasks. Our model illustrates a case of "two birds with one stone" in the sense that two or more tasks are handled more efficiently by a unique model that captures effective shared features. Experimental results on ten real-world datasets show that the proposed model significantly outperforms state-of-the-art approaches. On the other hand, experiments show that the methodology applied to knowledge reasoning effectively supports the model and improves its performance.

Language: English

Citations

25
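The sketch below illustrates, under simplifying assumptions, the "temporal convolution + transformer with two task heads" pattern the abstract above describes; the medical knowledge-graph embedding is omitted and all layer sizes are hypothetical.

```python
# Simplified temporal-convolution + transformer backbone with two task heads
# (depression detection and emotion recognition). Sizes are assumptions.
import torch
import torch.nn as nn

class TemporalConvTransformer(nn.Module):
    def __init__(self, in_dim=128, d_model=128, n_emotions=7):
        super().__init__()
        # 1-D temporal convolution over the multimodal feature sequence.
        self.tconv = nn.Conv1d(in_dim, d_model, kernel_size=3, padding=1)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.depression_head = nn.Linear(d_model, 2)        # depressed vs. not
        self.emotion_head = nn.Linear(d_model, n_emotions)  # emotion classes

    def forward(self, x):
        # x: (B, T, in_dim) multimodal feature sequence.
        h = self.tconv(x.transpose(1, 2)).transpose(1, 2)   # (B, T, d_model)
        h = self.encoder(h).mean(dim=1)                      # temporal pooling
        return self.depression_head(h), self.emotion_head(h)

dep_logits, emo_logits = TemporalConvTransformer()(torch.randn(4, 100, 128))
print(dep_logits.shape, emo_logits.shape)  # torch.Size([4, 2]) torch.Size([4, 7])
```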

IoT-Enabled WBAN and Machine Learning for Speech Emotion Recognition in Patients DOI Creative Commons
Damilola D. Olatinwo, Adnan M. Abu‐Mahfouz, Gerhard P. Hancke

et al.

Sensors, Journal Year: 2023, Volume and Issue: 23(6), P. 2948 - 2948

Published: March 8, 2023

Internet of things (IoT)-enabled wireless body area network (WBAN) is an emerging technology that combines medical and non-medical devices for healthcare management applications. Speech emotion recognition (SER) is an active research field in the machine learning domain. It is a technique that can be used to automatically identify speakers' emotions from their speech. However, an SER system, especially in the medical domain, is confronted with a few challenges: for example, low prediction accuracy, high computational complexity, delay in real-time prediction, and how to identify appropriate features. Motivated by these gaps, we proposed an emotion-aware IoT-enabled WBAN system within a healthcare framework where data processing and long-range transmissions are performed by an edge AI system to predict patients' speech emotions as well as to capture their changes before and after treatment. Additionally, we investigated the effectiveness of different machine learning and deep learning algorithms in terms of classification performance, feature extraction methods, and normalization methods. We developed a hybrid deep learning model, i.e., a convolutional neural network (CNN) combined with bidirectional long short-term memory (BiLSTM), and a regularized CNN model. We combined the models with optimization strategies and regularization techniques to improve prediction accuracy, reduce the generalization error, and reduce the computational complexity of the networks in terms of time, power, and space. Different experiments were performed to check the efficiency of the algorithms. The proposed models were compared with a related existing model for evaluation and validation using standard performance metrics such as precision, recall, F1 score, confusion matrix, and the differences between actual and predicted values. The experimental results proved that one of the proposed models outperformed the existing model with an accuracy of about 98%.

Language: English

Citations

24
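A minimal sketch of a hybrid CNN + BiLSTM speech-emotion classifier of the kind described in the abstract above is given below, operating on a spectrogram-like input. The exact layer counts, regularization, and feature pipeline of the paper are not reproduced; the sizes are illustrative assumptions.

```python
# Hybrid CNN + BiLSTM speech-emotion classifier on a log-Mel spectrogram.
import torch
import torch.nn as nn

class CNNBiLSTM(nn.Module):
    def __init__(self, n_mels=64, n_classes=8):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Dropout(0.3),  # regularization to curb overfitting
        )
        self.bilstm = nn.LSTM(input_size=32 * (n_mels // 4), hidden_size=64,
                              batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * 64, n_classes)

    def forward(self, x):
        # x: (B, 1, n_mels, T) log-Mel spectrogram.
        h = self.cnn(x)                                   # (B, 32, n_mels/4, T/4)
        b, c, f, t = h.shape
        h = h.permute(0, 3, 1, 2).reshape(b, t, c * f)    # time-major sequence
        out, _ = self.bilstm(h)
        return self.fc(out[:, -1])                        # last time step -> logits

print(CNNBiLSTM()(torch.randn(2, 1, 64, 128)).shape)      # torch.Size([2, 8])
```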

Exploring contactless techniques in multimodal emotion recognition: insights into diverse applications, challenges, solutions, and prospects DOI Creative Commons
Umair Ali Khan, Qianru Xu, Yang Liu

et al.

Multimedia Systems, Journal Year: 2024, Volume and Issue: 30(3)

Published: April 6, 2024

In recent years, emotion recognition has received significant attention, presenting a plethora of opportunities for application in diverse fields such as human–computer interaction, psychology, and neuroscience, to name a few. Although unimodal methods offer certain benefits, they have limited ability to encompass the full spectrum of human emotional expression. In contrast, Multimodal Emotion Recognition (MER) delivers a more holistic and detailed insight into an individual's emotional state. However, existing multimodal data collection approaches utilizing contact-based devices hinder the effective deployment of this technology. We address this issue by examining the potential of contactless techniques for MER. In our tertiary review study, we highlight the unaddressed gaps in the existing body of literature on MER. Through a rigorous analysis of MER studies, we identify the modalities, the specific cues, and the open datasets with unique modality combinations. This further leads us to the formulation of a comparative schema for mapping the requirements of a given scenario to a modality combination. Subsequently, we discuss the implementation of Contactless Multimodal Emotion Recognition (CMER) systems in diverse use cases with the help of the comparative schema, which serves as an evaluation blueprint. Furthermore, the paper explores the ethical and privacy considerations concerning the employment of contactless MER and proposes key principles for addressing these concerns. It also investigates the current challenges and future prospects of the field, offering recommendations for future research and development in CMER. Our study serves as a resource for researchers and practitioners in the field of emotion recognition, as well as those intrigued by the broader outcomes of this rapidly progressing field.

Language: English

Citations

12

Multimodal Emotion Recognition via Convolutional Neural Networks: Comparison of different strategies on two multimodal datasets DOI Creative Commons
Umberto Bilotti, Carmen Bisogni, Maria De Marsico

et al.

Engineering Applications of Artificial Intelligence, Journal Year: 2023, Volume and Issue: 130, P. 107708 - 107708

Published: Dec. 14, 2023

The aim of this paper is to investigate emotion recognition using a multimodal approach that exploits convolutional neural networks (CNNs) with multiple inputs. Multimodal approaches allow different modalities to cooperate in order to achieve generally better performance, because different features are extracted from different pieces of information. In this work, the facial frames, the optical flow computed from consecutive frames, and the Mel spectrograms (from the word melody) of the videos are combined together in several ways to understand which modality combination works better. Several experiments are run on the models, first considering one modality at a time so that good accuracy results are found for each modality. Afterward, the models are concatenated to create a final model that allows multiple inputs. For the experiments, the datasets used are BAUM-1 (Bahçeşehir University Affective Database - 1) and RAVDESS (Ryerson Audio–Visual Database of Emotional Speech and Song), which both collect two distinct sets based on the intensity of the expression, acted/strong or spontaneous/normal, providing representations of the following emotional states taken into consideration: angry, disgust, fearful, happy, and sad. The performance of the proposed models is shown through some confusion matrices, demonstrating better results than the compared proposals from the literature. The best accuracies achieved on the two datasets are about 95% and 95.5%.

Language: English

Citations

18
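The "one CNN branch per modality, then concatenate" strategy described in the abstract above can be sketched as follows; the branch depths and input shapes are assumptions rather than the paper's exact configuration.

```python
# Illustrative multi-input CNN: separate branches for face frames, optical
# flow, and Mel spectrograms, concatenated before a classification head.
import torch
import torch.nn as nn

def small_cnn(in_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),  # -> 32-dim feature per modality
    )

class MultiInputCNN(nn.Module):
    def __init__(self, n_classes=5):  # angry, disgust, fearful, happy, sad
        super().__init__()
        self.face_branch = small_cnn(3)   # RGB face frame
        self.flow_branch = small_cnn(2)   # optical flow (x, y components)
        self.spec_branch = small_cnn(1)   # Mel spectrogram
        self.head = nn.Linear(3 * 32, n_classes)

    def forward(self, face, flow, spec):
        feats = torch.cat([self.face_branch(face),
                           self.flow_branch(flow),
                           self.spec_branch(spec)], dim=1)
        return self.head(feats)

model = MultiInputCNN()
logits = model(torch.randn(2, 3, 112, 112), torch.randn(2, 2, 112, 112), torch.randn(2, 1, 64, 64))
print(logits.shape)  # torch.Size([2, 5])
```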

Design of smart home system speech emotion recognition model based on ensemble deep learning and feature fusion DOI
Mengsheng Wang, Hongbin Ma, Yingli Wang

et al.

Applied Acoustics, Journal Year: 2024, Volume and Issue: 218, P. 109886 - 109886

Published: Jan. 31, 2024

Language: English

Citations

7

AttA-NET: Attention Aggregation Network for Audio-Visual Emotion Recognition DOI Open Access

Ruijia Fan, Hong Liu, Yidi Li

et al.

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Journal Year: 2024, Volume and Issue: unknown, P. 8030 - 8034

Published: March 18, 2024

In video-based emotion recognition, effective multi-modal fusion techniques are essential to leverage the complementary relationship between audio and visual modalities. Recent attention-based methods are widely leveraged for capturing modal-shared properties. However, they often ignore the modal-specific properties of the modalities and the misalignment of modal-shared emotional semantic features. In this paper, an Attention Aggregation Network (AttA-NET) is proposed to address these challenges. An attention aggregation module is used to obtain modal-shared features effectively. This module comprises similarity-aware enhancement blocks and a contrastive loss that facilitates aligning the emotional semantic features. Moreover, an auxiliary uni-modal classifier is introduced to obtain modal-specific properties, in which intra-modal discriminative features are fully extracted. Under the joint optimization of the classification and contrastive losses, complementary information can be infused. Extensive experiments on the RAVDESS and PKU-ER datasets validate the superiority of AttA-NET. The code is available at: https://github.com/NariFan2002/AttA-NET.

Language: English

Citations

6
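The joint objective outlined in the abstract above (a fused classification loss, auxiliary uni-modal classification losses, and a contrastive alignment term) can be sketched as below. This is not the released AttA-NET code (see the linked repository); the loss weights and the InfoNCE-style contrastive term are illustrative assumptions.

```python
# Conceptual sketch of a joint objective: fused classification + auxiliary
# uni-modal classification + contrastive audio-visual alignment.
import torch
import torch.nn.functional as F

def info_nce(audio_emb, visual_emb, temperature=0.1):
    """Contrastive alignment: matching audio/visual pairs serve as positives."""
    a = F.normalize(audio_emb, dim=-1)
    v = F.normalize(visual_emb, dim=-1)
    logits = a @ v.t() / temperature                      # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)    # diagonal = positive pairs
    return F.cross_entropy(logits, targets)

def joint_loss(fused_logits, audio_logits, visual_logits,
               audio_emb, visual_emb, labels,
               w_aux=0.5, w_con=0.2):
    ce = F.cross_entropy
    return (ce(fused_logits, labels)
            + w_aux * (ce(audio_logits, labels) + ce(visual_logits, labels))
            + w_con * info_nce(audio_emb, visual_emb))

# Dummy tensors standing in for network outputs on a batch of 8 clips, 8 classes.
B, C, D = 8, 8, 128
loss = joint_loss(torch.randn(B, C), torch.randn(B, C), torch.randn(B, C),
                  torch.randn(B, D), torch.randn(B, D),
                  torch.randint(0, C, (B,)))
print(loss.item())
```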