IEEE Transactions on Affective Computing. Journal Year: 2023. Volume and Issue: 14(4), P. 2564-2566. Published: Oct. 1, 2023.
In the formative years of Affective Computing [1], from the late 1990s and into the early 2000s, a significant fraction of research attention was focused on the development of methods for unobtrusive physiological measurement. It quickly became obvious that wiring people with electrodes and strapping cumbersome hardware to their bodies was not only restricting the types of experiments that could be performed but also hardly conducive to unbiased observations. For instance, subjects with fingers wrapped in electrodermal activity (EDA) and photoplethysmography (PPG) sensors can hardly type, drive, or sleep comfortably. Hence, there was a need for more elegant and scalable measurement methods [2].
IEEE Transactions on Knowledge and Data Engineering. Journal Year: 2024. Volume and Issue: 36(7), P. 2956-2966. Published: Jan. 5, 2024.
Depression is one of the most common mental illnesses, but few of the currently proposed in-depth models based on social media data take into account both temporal and spatial information for the detection of depression. In this paper, we present an efficient, low-covariance multimodal integrated spatio-temporal converter framework called DepMSTAT, which aims to detect depression using acoustic and visual features in social media data. The framework consists of four modules: a preprocessing module, a token generation module, a Spatial-Temporal Attentional Transformer (STAT) module, and a classifier module. To efficiently capture the correlations in multimodal data, a plug-and-play STAT module is proposed. It is capable of extracting unimodal features and fusing multimodal information, playing a key role in depression analysis. Through extensive experiments on a depression database (D-Vlog), the method in this paper shows high accuracy (71.53%) in depression detection, achieving performance that exceeds existing models. This work provides a scaffold for further studies and assists …
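The STAT module is described only at a high level in this abstract. As a generic illustration of the underlying spatio-temporal attention idea (a minimal PyTorch sketch under assumed token shapes, not the authors' DepMSTAT implementation), one common pattern alternates self-attention over the time axis and over the spatial/modality axis:

```python
# Minimal sketch of a spatio-temporal attention block (illustrative only,
# not the DepMSTAT implementation). Assumes tokens shaped
# (batch, time, space, dim), where "space" indexes modality/spatial tokens.
import torch
import torch.nn as nn

class SpatioTemporalBlock(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):                       # x: (B, T, S, D)
        b, t, s, d = x.shape
        # Attention over time: fold the spatial axis into the batch.
        xt = x.permute(0, 2, 1, 3).reshape(b * s, t, d)
        xt = self.norm1(xt + self.temporal_attn(xt, xt, xt)[0])
        x = xt.reshape(b, s, t, d).permute(0, 2, 1, 3)
        # Attention over space: fold the temporal axis into the batch.
        xs = x.reshape(b * t, s, d)
        xs = self.norm2(xs + self.spatial_attn(xs, xs, xs)[0])
        return xs.reshape(b, t, s, d)

tokens = torch.randn(2, 50, 6, 128)             # e.g., 50 time steps, 6 tokens
print(SpatioTemporalBlock()(tokens).shape)      # torch.Size([2, 50, 6, 128])
```

Folding the non-attended axis into the batch keeps each attention call quadratic only in the axis being attended over, which is why this factorized pattern is often preferred over full joint spatio-temporal attention.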
IEEE Access. Journal Year: 2024. Volume and Issue: 12, P. 103976-104019. Published: Jan. 1, 2024.
Emotion recognition involves accurately interpreting human emotions from various sources and modalities, including questionnaires, verbal cues, and physiological signals. With its broad applications in affective computing, computational creativity, human-robot interactions, and market research, the field has seen a surge of interest in recent years. This paper presents a systematic review of multimodal emotion recognition (MER) techniques developed from 2014 to 2024, encompassing physiological signals, facial, body gesture, and speech modalities, as well as emerging methods like sketch recognition. The review explores emotion models, distinguishing between emotions, feelings, sentiments, and moods, along with forms of emotional expression, categorized in both artistic and non-verbal ways. It also discusses the background of automated emotion recognition systems, introduces seven criteria for evaluating modalities, and presents a current state analysis of MER drawn from a human-centric perspective on the field. By following the PRISMA guidelines and carefully analyzing 45 selected articles, the review provides comprehensive perspectives on existing studies, datasets, technical approaches, identified gaps, and future directions in MER, and highlights challenges …
Frontiers in Neurology. Journal Year: 2024. Volume and Issue: 15. Published: July 4, 2024.
Introduction: Depressive and manic states contribute significantly to the global social burden, but objective detection tools are still lacking. This study investigates the feasibility of utilizing voice as a biomarker to detect these mood states.
Methods: From real-world emotional journal recordings, 22 voice features were retrieved in this study, 21 of which showed significant differences among mood states. Additionally, we applied a leave-one-subject-out strategy to train and validate four classification models: Chinese-speech-pretrain-GRU, Gate Recurrent Unit (GRU), Bi-directional Long Short-Term Memory (BiLSTM), and Linear Discriminant Analysis (LDA).
Results: Our results indicated that the Chinese-speech-pretrain-GRU model performed best, achieving sensitivities of 77.5% and 54.8% and specificities of 86.1% and 90.3% for detecting depressive and manic states, respectively, with an overall accuracy of 80.2%.
Discussion: These findings show that machine learning can reliably differentiate between mood states via voice analysis, allowing a more precise approach to mood disorder assessment.
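The leave-one-subject-out protocol mentioned above is a standard evaluation strategy; the sketch below shows how it is commonly set up with scikit-learn's LeaveOneGroupOut and one of the listed model types (LDA). The feature matrix, labels, and subject IDs are placeholders, not the study's data:

```python
# Minimal sketch of leave-one-subject-out (LOSO) validation with scikit-learn.
# Placeholder data: 22 voice features per recording, binary mood-state labels,
# and a subject ID per recording (the study's actual data is not reproduced here).
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 22))            # 200 recordings x 22 voice features
y = rng.integers(0, 2, size=200)          # mood-state label per recording
subjects = rng.integers(0, 20, size=200)  # subject ID per recording

logo = LeaveOneGroupOut()
accs = []
for train_idx, test_idx in logo.split(X, y, groups=subjects):
    # Each fold holds out every recording from exactly one subject.
    clf = LinearDiscriminantAnalysis().fit(X[train_idx], y[train_idx])
    accs.append(clf.score(X[test_idx], y[test_idx]))

print(f"LOSO mean accuracy: {np.mean(accs):.3f}")
```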
Frontiers in Psychiatry. Journal Year: 2025. Volume and Issue: 15. Published: Jan. 23, 2025.
Introduction: Depression is a prevalent mental disorder, and early screening and treatment are crucial for detecting depression. However, there are still some limitations in the currently proposed deep models based on audio-video data; for example, it is difficult to effectively extract and select useful multimodal information and features from the data, and very few studies have been able to focus on three dimensions of information (time, channel, and space) at the same time for depression detection. In addition, there are challenges in utilizing other tasks to enhance prediction accuracy. The resolution of these issues requires constructing …
Methods: In this paper, we propose a multi-task representation learning vision and audio depression detection model (DepITCM). The model comprises several main modules, including a data preprocessing module and an Inception-Temporal-Channel Principal Component Analysis module (ITCM Encoder). To efficiently extract rich feature representations from video data, the ITCM Encoder employs a staged feature extraction strategy, transitioning from global to local features. This approach enables the capture of global features while emphasizing the fusion of temporal, channel, and spatial information in finer detail. Furthermore, inspired by multi-task learning strategies, this paper enhances the primary classification task by incorporating a secondary task (a regression task) to improve overall performance.
Results: We conducted experiments on the AVEC2017 and AVEC2019 datasets. The results show that, in the classification task, our method achieved an F1 score of 0.823 and an accuracy of … on the AVEC2017 dataset, and an F1 score of 0.816 and an accuracy of 0.810 on the AVEC2019 dataset. In the regression task, the RMSE was 6.10 (AVEC2017) and 4.89 (AVEC2019), respectively. These results demonstrate that our method outperforms most existing methods on both tasks, and that classification performance can be improved when using multi-task learning.
Discussion: Although depression detection through multimodality has shown good results in previous studies, it remains difficult to fully utilize the complementary information between different modalities. Therefore, this work combines visual and audio information. Previous studies have also mostly focused on temporal information, ignoring the importance of channel and spatial information. Based on the problems in these studies, we made corresponding improvements to provide a more comprehensive and effective solution.
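The multi-task idea, a primary classification task plus an auxiliary regression task trained jointly, can be illustrated generically. The snippet below is a minimal PyTorch pattern under assumed feature shapes and loss weighting, not the DepITCM architecture itself:

```python
# Minimal sketch of joint classification + regression heads on a shared
# encoder (illustrative pattern only, not the DepITCM model itself).
import torch
import torch.nn as nn

class MultiTaskHead(nn.Module):
    def __init__(self, feat_dim=256, n_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU())
        self.cls_head = nn.Linear(128, n_classes)   # primary: depressed / not
        self.reg_head = nn.Linear(128, 1)           # secondary: severity score

    def forward(self, x):
        h = self.encoder(x)
        return self.cls_head(h), self.reg_head(h).squeeze(-1)

model = MultiTaskHead()
feats = torch.randn(8, 256)                  # assumed fused audio-video features
cls_target = torch.randint(0, 2, (8,))       # class labels
reg_target = torch.rand(8) * 24              # e.g., a PHQ-8-style severity score

logits, severity = model(feats)
# Weighted sum of the primary and auxiliary losses; the 0.5 weight is an assumption.
loss = nn.CrossEntropyLoss()(logits, cls_target) \
       + 0.5 * nn.MSELoss()(severity, reg_target)
loss.backward()
```

Because both heads share the encoder, gradients from the auxiliary regression loss regularize the representation used by the primary classifier, which is the usual motivation for this kind of multi-task setup.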
Frontiers in Human Neuroscience. Journal Year: 2025. Volume and Issue: 19. Published: Feb. 26, 2025.
With the rapid development of deep learning, electroencephalograph (EEG) emotion recognition has played a significant role in affective brain-computer interfaces. Many advanced models have achieved excellent results. However, current research is mostly conducted in laboratory settings for emotion induction, which lacks sufficient ecological validity and differs significantly from real-world scenarios. Moreover, models are typically trained and tested on datasets collected in laboratory environments, with little validation of their effectiveness in real-world situations. VR, providing a highly immersive and realistic experience, is an ideal tool for emotional research. In this paper, we collect EEG data from participants while they watched VR videos. We propose a purely Transformer-based method, EmoSTT. We use two separate Transformer modules to comprehensively model the temporal and spatial information of EEG signals. We validate EmoSTT on a passive paradigm emotion dataset collected in a laboratory environment and an active paradigm emotion dataset collected in a VR environment. Compared with state-of-the-art methods, our method achieves robust emotion classification performance and can be well transferred between different emotion elicitation paradigms.
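The abstract characterizes EmoSTT only as two separate Transformer modules for temporal and spatial modelling. One generic way to realize that split, sketched here under assumed shapes and pooling choices rather than the authors' implementation, is to run one encoder over time steps and a second over electrode channels:

```python
# Minimal sketch of separate temporal and spatial Transformer encoders for
# EEG shaped (batch, channels, time, dim). Illustrative only; shapes, depths,
# and pooling are assumptions, not the EmoSTT implementation.
import torch
import torch.nn as nn

class TemporalSpatialEncoder(nn.Module):
    def __init__(self, dim=64, heads=4, n_classes=2):
        super().__init__()
        layer = lambda: nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer(), num_layers=2)
        self.spatial = nn.TransformerEncoder(layer(), num_layers=2)
        self.cls = nn.Linear(dim, n_classes)

    def forward(self, x):                                    # x: (B, C, T, D)
        b, c, t, d = x.shape
        xt = self.temporal(x.reshape(b * c, t, d))           # attend over time
        x = xt.reshape(b, c, t, d).mean(dim=2)               # pool time -> (B, C, D)
        xs = self.spatial(x)                                 # attend over channels
        return self.cls(xs.mean(dim=1))                      # (B, n_classes)

eeg = torch.randn(4, 32, 128, 64)   # 32 electrodes, 128 time steps, 64-d features
print(TemporalSpatialEncoder()(eeg).shape)   # torch.Size([4, 2])
```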