Lecture Notes in Networks and Systems, Journal Year: 2025, Volume and Issue: unknown, P. 203 - 214
Published: Jan. 1, 2025
Language: English
Engineering Applications of Artificial Intelligence, Journal Year: 2024, Volume and Issue: 133, P. 108339 - 108339
Published: April 2, 2024
Language: English
Citations: 24
Applied Sciences, Journal Year: 2021, Volume and Issue: 12(1), P. 327 - 327
Published: Dec. 30, 2021
Emotion recognition is attracting the attention of the research community due to its multiple applications in different fields, such as medicine or autonomous driving. In this paper, we proposed an automatic emotion recognizer system that consists of a speech emotion recognizer (SER) and a facial emotion recognizer (FER). For the SER, we evaluated a pre-trained xlsr-Wav2Vec2.0 transformer using two transfer-learning techniques: embedding extraction and fine-tuning. The best accuracy results were achieved when we fine-tuned the whole model by appending a multilayer perceptron on top of it, confirming that the training was more robust when it did not start from scratch and the previous knowledge of the network was similar to the task to adapt. Regarding the facial emotion recognizer, we extracted the Action Units of the videos and compared the performance between employing static models against sequential models. Results showed that sequential models beat static models by a narrow difference. Error analysis reported that the visual systems could improve with a detector of high-emotional-load frames, which opened a new line of research to discover new ways of learning from videos. Finally, combining these two modalities with a late fusion strategy, the system achieved 86.70% accuracy on the RAVDESS dataset under a subject-wise 5-CV evaluation, classifying eight emotions. Results demonstrated that both modalities carried relevant information to detect the users' emotional state, and their combination allowed improving the final system performance.
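To make the SER half of this pipeline concrete, here is a minimal sketch of fine-tuning a pre-trained Wav2Vec2 encoder with a multilayer perceptron appended on top, using PyTorch and the Hugging Face transformers library. The checkpoint name, mean-pooling choice, and MLP sizes are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class Wav2Vec2EmotionClassifier(nn.Module):
    """Wav2Vec2 encoder with an MLP head appended for emotion classification."""
    def __init__(self, num_emotions: int = 8,
                 checkpoint: str = "facebook/wav2vec2-large-xlsr-53"):  # assumed checkpoint
        super().__init__()
        self.encoder = Wav2Vec2Model.from_pretrained(checkpoint)
        hidden = self.encoder.config.hidden_size
        # MLP appended on top of the transformer; the whole model stays trainable,
        # so backprop fine-tunes the encoder rather than training from scratch.
        self.mlp = nn.Sequential(
            nn.Linear(hidden, 256),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(256, num_emotions),
        )

    def forward(self, input_values: torch.Tensor) -> torch.Tensor:
        # input_values: (batch, samples) raw 16 kHz waveforms
        states = self.encoder(input_values).last_hidden_state  # (B, T, hidden)
        pooled = states.mean(dim=1)   # average-pool over time
        return self.mlp(pooled)       # (B, num_emotions) logits
```

For embedding extraction instead of fine-tuning, one would freeze `self.encoder` and train only the MLP; the late fusion step reported in the abstract would then combine this model's class probabilities with those of the facial recognizer.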
Language: English
Citations: 70
IEEE Access, Journal Year: 2023, Volume and Issue: 11, P. 14804 - 14831
Published: Jan. 1, 2023
Multimodal machine learning (MML) is a tempting multidisciplinary research area where heterogeneous data from multiple modalities and machine learning (ML) techniques are combined to solve critical problems. Usually, research works use a single modality, such as images, audio, text, or signals. However, real-world issues have become more complex now, and handling them using multiple modalities instead of a single modality can significantly impact the finding of solutions. ML algorithms play an essential role by tuning parameters in the development of MML models. This paper reviews recent advancements in the challenges of MML, namely: representation, translation, alignment, fusion, and co-learning, and presents the gaps and challenges. A systematic literature review (SLR) was applied to define the progress and trends on those challenges in the MML domain. In total, 1032 articles were examined in this review to extract features like the source, domain, application, etc. This article will help researchers understand the current state of MML and navigate the selection of future research directions.
Language: English
Citations: 34
Sensors, Journal Year: 2023, Volume and Issue: 23(12), P. 5475 - 5475
Published: June 9, 2023
Methods for detecting emotions that employ many modalities at the same time have been found to be more accurate and resilient than those that rely on a single sense. This is due to the fact that sentiments may be conveyed in a wide range of modalities, each of which offers a different and complementary window into the thoughts of the speaker. In this way, a complete picture of a person's emotional state may emerge through the fusion and analysis of data from several modalities. The research suggests a new attention-based approach to multimodal emotion recognition. This technique integrates facial and speech features extracted by independent encoders in order to pick the aspects that are the most informative: it increases the system's accuracy by processing features of various sizes and focusing on the most useful bits of the input. A more comprehensive representation of facial expressions is obtained by the use of both low- and high-level features. These features are combined using an attention network to create a feature vector that is then fed into a classification layer. The developed system is evaluated on two datasets, IEMOCAP and CMU-MOSEI, and shows superior performance compared to existing models, achieving a weighted accuracy (WA) of 74.6% and an F1 score of 66.1% on the IEMOCAP dataset, and a WA of 80.7% and an F1 score of 73.7% on the CMU-MOSEI dataset.
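As a sketch of the fusion step described above, the following PyTorch snippet combines speech and facial feature sequences from two independent encoders with cross-attention and pools the result for classification. The feature dimension, head count, and mean pooling are assumptions for illustration, not the paper's reported architecture.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Fuse speech and facial feature sequences with cross-attention, then classify."""
    def __init__(self, dim: int = 256, heads: int = 4, num_classes: int = 6):
        super().__init__()
        self.audio_attends_face = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.face_attends_audio = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, audio: torch.Tensor, face: torch.Tensor) -> torch.Tensor:
        # audio: (B, Ta, dim) speech features; face: (B, Tf, dim) facial features,
        # each produced by an independent modality encoder.
        a, _ = self.audio_attends_face(audio, face, face)   # audio queries face
        f, _ = self.face_attends_audio(face, audio, audio)  # face queries audio
        fused = torch.cat([a.mean(dim=1), f.mean(dim=1)], dim=-1)  # (B, 2*dim)
        return self.classifier(fused)

# Smoke test with random features of different sequence lengths.
model = CrossAttentionFusion()
logits = model(torch.randn(2, 50, 256), torch.randn(2, 30, 256))
print(logits.shape)  # torch.Size([2, 6])
```

Cross-attention lets each modality weight the other's time steps, which is one way to "pick the aspects that are the most informative" across sequences of different lengths.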
Language: English
Citations: 30
IEEE Transactions on Affective Computing, Journal Year: 2023, Volume and Issue: 14(4), P. 2595 - 2613
Published: June 5, 2023
Depression is a critical problem in modern society that affects an estimated 350 million people worldwide, causing feelings of sadness and a lack of interest and pleasure. Emotional disorders are gaining attention because they are closely entwined with depression: understanding one contributes to understanding the other. Despite achievements on the two separate tasks of emotion recognition and depression detection, there has not been much prior effort to build a unified model that can connect these different modalities, including multimedia (text, audio, video) and unobtrusive physiological signals (e.g., electroencephalography). We propose a novel…
Language: English
Citations: 25
Sensors, Journal Year: 2023, Volume and Issue: 23(6), P. 2948 - 2948
Published: March 8, 2023
Internet of things (IoT)-enabled wireless body area network (WBAN) is an emerging technology that combines medical and non-medical devices for healthcare management applications. Speech emotion recognition (SER) is an active research field in the machine learning domain. It is a technique that can be used to automatically identify speakers' emotions from their speech. However, the SER system, especially in the medical domain, is confronted with a few challenges, for example, low prediction accuracy, high computational complexity, delay in real-time prediction, and how to identify appropriate features from speech. Motivated by these gaps, we proposed an emotion-aware IoT-enabled WBAN system within the healthcare framework, where data processing and long-range transmissions are performed by an edge AI system to predict patients' speech emotions as well as to capture the changes before and after treatment. Additionally, we investigated the effectiveness of different machine learning and deep learning algorithms in terms of classification performance, feature extraction methods, and normalization methods. We developed a hybrid deep learning model, i.e., a convolutional neural network (CNN) and bidirectional long short-term memory (BiLSTM), and a regularized CNN model. We combined the models with different optimization strategies and regularization techniques to improve the prediction accuracy, reduce the generalization error, and reduce the computational complexity of the neural networks in terms of time, power, and space. Different experiments were performed to check the efficiency of the algorithms. The proposed models were compared with a related existing model for evaluation and validation using standard metrics such as prediction accuracy, precision, recall, F1 score, confusion matrix, and the differences between the actual and predicted values. The experimental results proved that one of the proposed models outperformed the existing model with an accuracy of about 98%.
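As a rough illustration of the hybrid model named here, the sketch below stacks a small CNN front-end over MFCC feature maps and a BiLSTM over the resulting time steps. Layer sizes, dropout rate, and the (batch, 1, time, n_mfcc) input shape are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class CNNBiLSTM(nn.Module):
    """CNN front-end over MFCC maps followed by a BiLSTM over time."""
    def __init__(self, n_mfcc: int = 40, num_classes: int = 7):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(), nn.MaxPool2d(2),
            nn.Dropout(0.3),  # regularization to reduce generalization error
        )
        self.lstm = nn.LSTM(64 * (n_mfcc // 4), 128, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * 128, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, 1, time, n_mfcc) MFCC feature maps
        z = self.conv(x)                       # (B, 64, time/4, n_mfcc/4)
        z = z.permute(0, 2, 1, 3).flatten(2)   # (B, time/4, 64 * n_mfcc/4)
        out, _ = self.lstm(z)                  # (B, time/4, 256)
        return self.fc(out.mean(dim=1))        # (B, num_classes) logits

logits = CNNBiLSTM()(torch.randn(4, 1, 100, 40))
print(logits.shape)  # torch.Size([4, 7])
```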
Language: English
Citations: 24
Multimedia Systems, Journal Year: 2024, Volume and Issue: 30(3)
Published: April 6, 2024
Abstract: In recent years, emotion recognition has received significant attention, presenting a plethora of opportunities for application in diverse fields such as human–computer interaction, psychology, and neuroscience, to name a few. Although unimodal methods offer certain benefits, they have limited ability to encompass the full spectrum of human emotional expression. In contrast, Multimodal Emotion Recognition (MER) delivers a more holistic and detailed insight into an individual's emotional state. However, existing multimodal data collection approaches utilizing contact-based devices hinder the effective deployment of this technology. We address this issue by examining the potential of contactless data collection techniques for MER. In our tertiary review study, we highlight the unaddressed gaps in the existing body of literature on MER. Through a rigorous analysis of MER studies, we identify the modalities, the corresponding specific cues, and the open datasets with unique modality combinations. This further leads us to the formulation of a comparative schema for mapping the MER requirements of a given scenario to a modality combination. Subsequently, we discuss the implementation of Contactless Multimodal Emotion Recognition (CMER) systems in diverse use cases with the help of the comparative schema, which serves as an evaluation blueprint. Furthermore, this paper also explores ethical and privacy considerations concerning the employment of CMER and proposes key principles for addressing these concerns. The paper further investigates the current challenges and future prospects in the field, offering recommendations for research and development in CMER. Our study serves as a resource for researchers and practitioners in the field of emotion recognition, as well as those intrigued by the broader outcomes of this rapidly progressing technology.
Language: English
Citations: 12
Engineering Applications of Artificial Intelligence, Journal Year: 2023, Volume and Issue: 130, P. 107708 - 107708
Published: Dec. 14, 2023
The aim of this paper is to investigate emotion recognition using a multimodal approach that exploits convolutional neural networks (CNNs) with multiple inputs. Multimodal approaches allow different modalities to cooperate in order to achieve generally better performances, because features are extracted from different pieces of information. In this work, the facial frames, the optical flow computed from consecutive facial frames, and the Mel Spectrograms (from the word melody) of the videos are combined together in different ways to understand which modality combination works better. Several experiments are run on the models by first considering one modality at a time, so that good accuracy results are found for each modality. Afterward, the models are concatenated to create a final model that allows multiple inputs. For the experiments, the datasets used are BAUM-1 (Bahçeşehir University Multimodal Affective Database - 1) and RAVDESS (Ryerson Audio–Visual Database of Emotional Speech and Song), which both collect two distinguished sets of videos based on the intensity of the expression, acted/strong or spontaneous/normal, providing representations of the following emotional states that will be taken into consideration: angry, disgust, fearful, happy and sad. The performances of the proposed models are shown through accuracy results and some confusion matrices, demonstrating better accuracy than the compared proposals in the literature. The best accuracy achieved on the BAUM-1 dataset is about 95%, while on the RAVDESS dataset it is about 95.5%.
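The multiple-input design can be sketched as three small CNN branches, one per modality (facial frames, optical flow, Mel spectrograms), whose pooled features are concatenated before a shared classification head. Branch depths, channel counts, and input channel conventions here are illustrative assumptions.

```python
import torch
import torch.nn as nn

def branch(in_ch: int) -> nn.Sequential:
    """A small per-modality CNN ending in a fixed-size feature vector."""
    return nn.Sequential(
        nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # (B, 32) regardless of input size
    )

class MultiInputCNN(nn.Module):
    def __init__(self, num_classes: int = 6):
        super().__init__()
        self.face = branch(3)  # RGB facial frame
        self.flow = branch(2)  # optical flow (dx, dy) between consecutive frames
        self.mel = branch(1)   # Mel spectrogram treated as a one-channel image
        self.head = nn.Linear(3 * 32, num_classes)

    def forward(self, face, flow, mel):
        # Concatenate per-modality feature vectors, then classify jointly.
        fused = torch.cat([self.face(face), self.flow(flow), self.mel(mel)], dim=1)
        return self.head(fused)

model = MultiInputCNN()
out = model(torch.randn(2, 3, 112, 112), torch.randn(2, 2, 112, 112),
            torch.randn(2, 1, 128, 128))
print(out.shape)  # torch.Size([2, 6])
```

Training one branch at a time before concatenation mirrors the paper's procedure of validating each modality separately and then fusing the models.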
Language: English
Citations: 18
Applied Acoustics, Journal Year: 2024, Volume and Issue: 218, P. 109886 - 109886
Published: Jan. 31, 2024
Language: English
Citations: 7
ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Journal Year: 2024, Volume and Issue: unknown, P. 8030 - 8034
Published: March 18, 2024
In video-based emotion recognition, effective multi-modal fusion techniques are essential to leverage the complementary relationship between audio and visual modalities. Recent attention-based methods are widely leveraged for capturing modal-shared properties. However, they often ignore the modal-specific properties of the modalities and the unalignment of modal-shared emotional semantic features. In this paper, an Attention Aggregation Network (AttA-NET) is proposed to address these challenges. An attention aggregation module is proposed to get modal-shared properties effectively. This module comprises similarity-aware enhancement blocks and a contrastive loss that facilitates aligning modal-shared emotional semantics. Moreover, an auxiliary uni-modal classifier is introduced to obtain modal-specific properties, in which intra-modal discriminative features are fully extracted. Under the joint optimization of classification loss and contrastive loss, complementary information can be infused. Extensive experiments on the RAVDESS and PKU-ER datasets validate the superiority of AttA-NET. The code is available at: https://github.com/NariFan2002/AttA-NET.
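The joint optimization described here, a classification loss plus a contrastive loss that aligns modal-shared emotion semantics while auxiliary uni-modal classifiers preserve modal-specific features, might look like the following sketch. The InfoNCE-style form of the contrastive term and the hypothetical loss weights `lam` and `mu` are assumptions; see the linked repository for the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def alignment_contrastive_loss(a_emb, v_emb, temperature: float = 0.1):
    """InfoNCE-style alignment (assumed form): matched audio/visual clip
    embeddings attract, mismatched pairs within the batch repel."""
    a = F.normalize(a_emb, dim=-1)
    v = F.normalize(v_emb, dim=-1)
    logits = a @ v.t() / temperature                    # (B, B) cosine similarities
    targets = torch.arange(a.size(0), device=a.device)  # diagonal = matched pairs
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def joint_loss(fused_logits, audio_logits, visual_logits,
               labels, a_emb, v_emb, lam: float = 0.5, mu: float = 0.3):
    # Main fused classifier + auxiliary uni-modal classifiers (modal-specific
    # properties) + contrastive alignment (modal-shared properties).
    ce = F.cross_entropy
    return (ce(fused_logits, labels)
            + mu * (ce(audio_logits, labels) + ce(visual_logits, labels))
            + lam * alignment_contrastive_loss(a_emb, v_emb))
```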
Language: English
Citations: 6