UniMSE: Towards Unified Multimodal Sentiment Analysis and Emotion Recognition
Guimin Hu, Ting-En Lin, Yi Zhao

et al.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Journal Year: 2022, Volume and Issue: unknown

Published: Jan. 1, 2022

Multimodal sentiment analysis (MSA) and emotion recognition in conversation (ERC) are key research topics for computers to understand human behaviors. From a psychological perspective, emotions are the expression of affect or feelings during a short period, while sentiments are formed and held for a longer period. However, most existing works study sentiment and emotion separately and do not fully exploit the complementary knowledge behind the two. In this paper, we propose a multimodal knowledge-sharing framework (UniMSE) that unifies MSA and ERC tasks from features, labels, and models. We perform modality fusion at the syntactic and semantic levels and introduce contrastive learning between modalities and samples to better capture the difference and consistency between sentiments and emotions. Experiments on four public benchmark datasets, MOSI, MOSEI, MELD, and IEMOCAP, demonstrate the effectiveness of the proposed method and show consistent improvements compared with state-of-the-art methods.
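The abstract's inter-modality contrastive objective is usually realized as an InfoNCE-style loss over paired embeddings. Below is a minimal sketch of that idea, assuming already-projected text and audio embeddings of equal dimension; all names and the temperature value are illustrative, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def inter_modality_contrastive_loss(text_emb, audio_emb, temperature=0.07):
    """InfoNCE-style loss: matching (text, audio) pairs from the same sample
    are pulled together; pairs from different samples are pushed apart.
    A sketch of the kind of contrastive term UniMSE describes, not the
    authors' exact formulation."""
    text_emb = F.normalize(text_emb, dim=-1)
    audio_emb = F.normalize(audio_emb, dim=-1)
    logits = text_emb @ audio_emb.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(text_emb.size(0))          # diagonal = positive pairs
    # Symmetrize over both matching directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy usage: batch of 8 samples with 128-d projected embeddings.
loss = inter_modality_contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
```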

Language: English

Big Data Deep Learning: Challenges and Perspectives
Xuewen Chen, Xiaotong Lin

IEEE Access, Journal Year: 2014, Volume and Issue: 2, P. 514 - 525

Published: Jan. 1, 2014

Deep learning is currently an extremely active research area in the machine learning and pattern recognition society. It has gained huge successes in a broad area of applications such as speech recognition, computer vision, and natural language processing. With the sheer size of data available today, big data brings big opportunities and transformative potential for various sectors; on the other hand, it also presents unprecedented challenges to harnessing data and information. As data keeps getting bigger, deep learning is coming to play a key role in providing big data predictive analytics solutions. In this paper, we provide a brief overview of deep learning, and highlight current research efforts and the challenges to big data, as well as the future trends.

Language: English

Citations

1173

Deep Architecture for Traffic Flow Prediction: Deep Belief Networks With Multitask Learning
Wenhao Huang, Guojie Song, Haikun Hong

et al.

IEEE Transactions on Intelligent Transportation Systems, Journal Year: 2014, Volume and Issue: 15(5), P. 2191 - 2201

Published: April 10, 2014

Traffic flow prediction is a fundamental problem in transportation modeling and management. Many existing approaches fail to provide favorable results because they are: 1) shallow in architecture; 2) hand engineered in features; and 3) separate in learning. In this paper we propose a deep architecture that consists of two parts, i.e., a deep belief network (DBN) at the bottom and a multitask regression layer at the top. A DBN is employed here for unsupervised feature learning. It can learn effective features for traffic flow prediction in an unsupervised fashion, an approach that has been examined and found effective in many areas such as image and audio classification. To the best of our knowledge, this is the first paper that applies the deep learning approach to transportation research. To incorporate multitask learning (MTL) in the deep architecture, a multitask regression layer is used above the DBN for supervised prediction. We further investigate homogeneous MTL and heterogeneous MTL for traffic flow prediction. To take full advantage of weight sharing in the deep architecture, we propose a grouping method based on the weights in the top layer to make MTL more effective. Experiments on transportation data sets show good performance of the deep architecture; abundant experiments show that our approach achieved close to 5% improvements over the state of the art. It is also shown that MTL can improve the generalization performance of shared tasks. These positive results demonstrate that deep learning and MTL are promising in transportation research.
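The two-part design, a shared unsupervised feature learner under per-task regression heads, can be outlined roughly as follows. For brevity the pretrained DBN is stood in for by a plain sigmoid MLP and all layer sizes are invented, so this shows only the multitask top layer, not the authors' configuration.

```python
import torch
import torch.nn as nn

class MultitaskTrafficRegressor(nn.Module):
    """Shared encoder (standing in for the pretrained DBN) feeding one
    linear regression head per road/task, as in the paper's top MTL layer.
    Layer sizes are illustrative, not the authors' configuration."""
    def __init__(self, input_dim, hidden_dim, num_tasks):
        super().__init__()
        self.encoder = nn.Sequential(            # DBN stand-in
            nn.Linear(input_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, hidden_dim), nn.Sigmoid(),
        )
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, 1) for _ in range(num_tasks)]
        )

    def forward(self, x):
        h = self.encoder(x)                      # shared features
        return torch.cat([head(h) for head in self.heads], dim=1)

model = MultitaskTrafficRegressor(input_dim=64, hidden_dim=128, num_tasks=5)
x = torch.randn(32, 64)                          # 32 time windows of features
pred = model(x)                                  # (32, 5): one flow value per task
loss = nn.functional.mse_loss(pred, torch.randn(32, 5))
```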

Language: English

Citations

1065

Deep Multimodal Learning: A Survey on Recent Advances and Trends
Dhanesh Ramachandram, Graham W. Taylor

IEEE Signal Processing Magazine, Journal Year: 2017, Volume and Issue: 34(6), P. 96 - 108

Published: Nov. 1, 2017

The success of deep learning has been a catalyst to solving increasingly complex machine-learning problems, which often involve multiple data modalities. We review recent advances in deep multimodal learning and highlight the state of the art, as well as gaps and challenges in this active research field. We first classify deep multimodal learning architectures and then discuss methods to fuse learned multimodal representations in deep-learning architectures. We highlight two areas of research, regularization strategies and methods that learn or optimize multimodal fusion structures, as exciting areas for future work.

Language: English

Citations

800

End-to-End Multimodal Emotion Recognition Using Deep Neural Networks
Panagiotis Tzirakis, George Trigeorgis, Mihalis A. Nicolaou

et al.

IEEE Journal of Selected Topics in Signal Processing, Journal Year: 2017, Volume and Issue: 11(8), P. 1301 - 1309

Published: Oct. 18, 2017

Automatic affect recognition is a challenging task due to the various modalities emotions can be expressed with. Applications can be found in many domains, including multimedia retrieval and human-computer interaction. In recent years, deep neural networks have been used with great success in determining emotional states. Inspired by this success, we propose an emotion recognition system using auditory and visual modalities. To capture the emotional content for various styles of speaking, robust features need to be extracted. To this purpose, we utilize a Convolutional Neural Network (CNN) to extract features from the speech, while for the visual modality a deep residual network (ResNet) of 50 layers is used. In addition to the importance of feature extraction, the machine learning algorithm needs also to be insensitive to outliers while being able to model the context. To tackle this problem, Long Short-Term Memory (LSTM) networks are utilized. The system is then trained in an end-to-end fashion where, by also taking advantage of the correlations of each of the streams, we manage to significantly outperform traditional approaches based on handcrafted features for the prediction of spontaneous and natural emotions on the RECOLA database of the AVEC 2016 research challenge on emotion recognition.
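A rough PyTorch outline of that pipeline, per-frame speech and face features concatenated and passed through an LSTM, is sketched below. The tiny 1D speech CNN, the two-output affect head, and every dimension are placeholders; only the overall CNN + ResNet-50 + LSTM wiring follows the paper.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class AudioVisualEmotionNet(nn.Module):
    """Per-frame speech and face features concatenated and fed to an LSTM,
    mirroring the paper's CNN + ResNet-50 + LSTM design. The 1D speech CNN
    and all sizes here are simplified placeholders."""
    def __init__(self, hidden=256):
        super().__init__()
        self.speech_cnn = nn.Sequential(          # toy stand-in for the speech CNN
            nn.Conv1d(1, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )
        self.visual = resnet50(weights=None)
        self.visual.fc = nn.Identity()            # expose 2048-d visual features
        self.lstm = nn.LSTM(32 + 2048, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)          # e.g. arousal and valence

    def forward(self, audio, frames):
        # audio: (B, T, samples); frames: (B, T, 3, 224, 224)
        b, t = audio.shape[:2]
        a = self.speech_cnn(audio.reshape(b * t, 1, -1)).reshape(b, t, -1)
        v = self.visual(frames.reshape(b * t, 3, 224, 224)).reshape(b, t, -1)
        out, _ = self.lstm(torch.cat([a, v], dim=-1))
        return self.head(out)                     # per-timestep affect predictions

model = AudioVisualEmotionNet()
# e.g. model(torch.randn(2, 8, 16000), torch.randn(2, 8, 3, 224, 224)) -> (2, 8, 2)
```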

Language: English

Citations

619

Multimodal Neuroimaging Feature Learning for Multiclass Diagnosis of Alzheimer's Disease
Siqi Liu, Sidong Liu, Weidong Cai

et al.

IEEE Transactions on Biomedical Engineering, Journal Year: 2014, Volume and Issue: 62(4), P. 1132 - 1140

Published: Nov. 20, 2014

The accurate diagnosis of Alzheimer's disease (AD) is essential for patient care and will be increasingly important as disease-modifying agents become available early in the course of the disease. Although studies have applied machine learning methods for the computer-aided diagnosis of AD, a bottleneck in diagnostic performance was shown in previous methods, due to the lack of efficient strategies for representing neuroimaging biomarkers. In this study, we designed a novel diagnostic framework with a deep learning architecture to aid the diagnosis of AD. This framework uses a zero-masking strategy for data fusion to extract complementary information from multiple data modalities. Compared with state-of-the-art workflows, our method is capable of fusing multimodal neuroimaging features in one setting and has the potential to require less labeled data. A performance gain was achieved in both binary classification and multiclass classification of AD. The advantages and limitations of the proposed framework are discussed.
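The zero-masking idea, randomly blanking one modality during training so the fusion network learns to reconstruct it from the other, can be sketched in a few lines. The feature dimensions, masking probability, and the plain autoencoder below are assumptions for illustration, not the paper's exact recipe.

```python
import torch
import torch.nn as nn

def zero_mask(mri, pet, p=0.5):
    """Zero-masking fusion: with probability p, blank out one modality so
    the autoencoder must reconstruct both from whichever input survives.
    A sketch of the strategy the paper describes, not its exact recipe."""
    if torch.rand(1).item() < p:
        if torch.rand(1).item() < 0.5:
            mri = torch.zeros_like(mri)
        else:
            pet = torch.zeros_like(pet)
    return torch.cat([mri, pet], dim=1)

fusion_ae = nn.Sequential(                 # illustrative sizes throughout
    nn.Linear(200, 64), nn.ReLU(),         # encoder over concatenated features
    nn.Linear(64, 200),                    # decoder reconstructs both modalities
)

mri, pet = torch.randn(16, 100), torch.randn(16, 100)
target = torch.cat([mri, pet], dim=1)      # reconstruct the unmasked input
recon = fusion_ae(zero_mask(mri, pet))
loss = nn.functional.mse_loss(recon, target)
```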

Language: English

Citations

549

Deep Dynamic Neural Networks for Multimodal Gesture Segmentation and Recognition
Di Wu, Lionel Pigou, Pieter-Jan Kindermans

et al.

IEEE Transactions on Pattern Analysis and Machine Intelligence, Journal Year: 2016, Volume and Issue: 38(8), P. 1583 - 1597

Published: March 2, 2016

This paper describes a novel method called Deep Dynamic Neural Networks (DDNN) for multimodal gesture recognition. A semi-supervised hierarchical dynamic framework based on a Hidden Markov Model (HMM) is proposed for simultaneous gesture segmentation and recognition, where skeleton joint information and depth and RGB images are the multimodal input observations. Unlike most traditional approaches that rely on the construction of complex handcrafted features, our approach learns high-level spatio-temporal representations using deep neural networks suited to each input modality: a Gaussian-Bernoulli Deep Belief Network (DBN) to handle skeletal dynamics, and a 3D Convolutional Neural Network (3DCNN) to manage and fuse batches of depth and RGB images. This is achieved through the modeling and learning of the emission probabilities of the HMM required to infer the gesture sequence. This purely data-driven approach achieves a Jaccard index score of 0.81 in the ChaLearn LAP gesture spotting challenge. The performance is on par with a variety of state-of-the-art hand-tuned feature-based approaches and other learning-based methods, therefore opening the door to the use of deep learning techniques in order to further explore multimodal time series data.
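Hybrid NN/HMM systems of this kind typically turn network posteriors p(state | observation) into scaled likelihoods by dividing by the state priors before HMM decoding. A small NumPy sketch of that conversion, with fake posteriors and a uniform prior standing in for quantities the paper estimates from data:

```python
import numpy as np

def posteriors_to_emissions(posteriors, state_priors):
    """Hybrid NN/HMM trick used by DDNN-style models: the network outputs
    p(state | observation); dividing by the state prior gives a scaled
    likelihood p(observation | state) usable as the HMM emission term."""
    return posteriors / state_priors            # proportional to p(x | s)

T, S = 40, 10                                   # 40 frames, 10 HMM states
posteriors = np.random.dirichlet(np.ones(S), size=T)  # fake network outputs
priors = np.full(S, 1.0 / S)                    # uniform prior for this toy case
emissions = posteriors_to_emissions(posteriors, priors)
# Take logs, then feed a standard Viterbi decode over the gesture HMM.
log_emissions = np.log(emissions + 1e-12)
```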

Language: English

Citations

437

Multimodal Classification of Remote Sensing Images: A Review and Future Directions
Luis Gómez‐Chova, Devis Tuia, Gabriele Moser

et al.

Proceedings of the IEEE, Journal Year: 2015, Volume and Issue: 103(9), P. 1560 - 1584

Published: Aug. 13, 2015

Earth observation through remote sensing images allows the accurate characterization and identification of materials on the surface from space and airborne platforms. Multiple and heterogeneous image sources can be available for the same geographical region: multispectral, hyperspectral, radar, multitemporal, and multiangular images can today be acquired over a given scene. These sources can be combined or fused to improve the classification of the materials on the surface. Even if this type of systems is generally accurate, the field is about to face new challenges: the upcoming constellations of satellite sensors will acquire large amounts of images of different spatial, spectral, angular, and temporal resolutions. In this scenario, multimodal image fusion stands out as the appropriate framework to address these problems. In this paper, we provide a taxonomical view of the field and review the current methodologies for multimodal classification of remote sensing images. We also highlight the most recent advances, which exploit synergies with machine learning and signal processing: sparse methods, kernel-based fusion, Markov modeling, and manifold alignment. Then, we illustrate the different approaches in seven challenging remote sensing applications: 1) multiresolution fusion of multispectral images; 2) image downscaling as a form of multitemporal image fusion and multidimensional interpolation among sensors of different resolutions; 3) multiangular image classification; 4) multisensor image fusion exploiting physically-based feature extractions; 5) multitemporal image classification of land covers in incomplete, inconsistent, and vague image sources; 6) spatiospectral multisensor fusion of optical and radar images for change detection; and 7) cross-sensor adaptation of classifiers. The adoption of these techniques in operational settings will help to monitor our planet from space in the very near future.
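Of the surveyed families, kernel-based fusion has a particularly compact form: per-modality kernels are combined into a composite kernel that feeds a standard SVM. A toy scikit-learn sketch over invented data, with the modality weight w a free parameter one would tune by cross-validation:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

# Composite-kernel fusion, one of the kernel methods the review surveys:
# a convex combination of per-modality kernels feeds a precomputed-kernel SVM.
rng = np.random.default_rng(0)
spectral = rng.normal(size=(100, 30))    # fake per-pixel spectral features
spatial = rng.normal(size=(100, 10))     # fake spatial/texture features
labels = rng.integers(0, 2, size=100)    # fake binary land-cover labels

w = 0.6                                  # modality weight, tuned by CV in practice
K = w * rbf_kernel(spectral) + (1 - w) * rbf_kernel(spatial)

clf = SVC(kernel="precomputed").fit(K, labels)
print(clf.score(K, labels))              # training accuracy on the toy data
```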

Language: English

Citations

411

Seven-Point Checklist and Skin Lesion Classification Using Multitask Multimodal Neural Nets
Jeremy Kawahara, Sara Daneshvar, Giuseppe Argenziano

et al.

IEEE Journal of Biomedical and Health Informatics, Journal Year: 2018, Volume and Issue: 23(2), P. 538 - 546

Published: April 9, 2018

We propose a multitask deep convolutional neural network, trained on multimodal data (clinical and dermoscopic images, and patient metadata), to classify the 7-point melanoma checklist criteria and perform skin lesion diagnosis. Our network is trained using several multitask loss functions, where each loss considers different combinations of the input modalities, which allows our model to be robust to missing modalities at inference time. Our final model classifies the 7-point checklist criteria and skin condition diagnosis, produces multimodal feature vectors suitable for image retrieval, and localizes clinically discriminant regions. We benchmark our approach on 1011 lesion cases and report comprehensive results over all criteria and diagnoses. We also make our dataset (images and metadata) publicly available online at http://derm.cs.sfu.ca.
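The combination-of-modalities loss can be sketched as one cross-entropy term per modality subset, averaged together, so each branch also learns to classify alone. Everything below, encoders reduced to linear layers, the specific subsets, and all dimensions, is an illustrative assumption rather than the authors' architecture.

```python
import torch
import torch.nn as nn

class MissingModalityClassifier(nn.Module):
    """Sketch of the paper's idea: a separate loss for each combination of
    available modalities, so the model still works when one is missing.
    Feature extractors are reduced to linear layers for brevity."""
    def __init__(self, dim=64, num_classes=5):
        super().__init__()
        self.clinical = nn.Linear(128, dim)      # stand-in image encoders
        self.dermoscopic = nn.Linear(128, dim)
        self.meta = nn.Linear(16, dim)
        self.head = nn.Linear(dim, num_classes)

    def losses(self, clin, derm, meta, y):
        feats = {"clin": self.clinical(clin),
                 "derm": self.dermoscopic(derm),
                 "meta": self.meta(meta)}
        combos = [("clin",), ("derm",), ("clin", "derm"),
                  ("clin", "derm", "meta")]
        total = 0.0
        for combo in combos:                     # one loss per modality subset
            fused = torch.stack([feats[m] for m in combo]).mean(0)
            total = total + nn.functional.cross_entropy(self.head(fused), y)
        return total / len(combos)

model = MissingModalityClassifier()
loss = model.losses(torch.randn(8, 128), torch.randn(8, 128),
                    torch.randn(8, 16), torch.randint(0, 5, (8,)))
```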

Language: English

Citations

400

Methodologies for Cross-Domain Data Fusion: An Overview
Yu Zheng

IEEE Transactions on Big Data, Journal Year: 2015, Volume and Issue: 1(1), P. 16 - 34

Published: March 1, 2015

Traditional data mining usually deals with data from a single domain. In the big data era, we face a diversity of datasets from different sources in different domains. These datasets consist of multiple modalities, each of which has a different representation, distribution, scale, and density. How to unlock the power of knowledge from multiple disparate (but potentially connected) datasets is paramount in big data research, essentially distinguishing big data from traditional data mining tasks. This calls for advanced techniques that can fuse knowledge from various datasets organically in a machine learning and data mining task. This paper summarizes data fusion methodologies, classifying them into three categories: stage-based, feature level-based, and semantic meaning-based methods. The last category is further divided into four groups: multi-view learning-based, similarity-based, probabilistic dependency-based, and transfer learning-based methods. These methods focus on knowledge fusion rather than schema mapping and data merging, which significantly distinguishes cross-domain data fusion from traditional data fusion studied in the database community. This paper does not only introduce high-level principles of each category of methods, but also gives examples in which these methods are used to handle real big data problems. In addition, it positions existing works in a framework, exploring the relationship and difference between different data fusion methods. This will help a wide range of communities find a data fusion solution for their projects.

Language: English

Citations

387

Real-World Multiobject, Multigrasp Detection
Fu-Jen Chu, Ruinian Xu, Patricio A. Vela

et al.

IEEE Robotics and Automation Letters, Journal Year: 2018, Volume and Issue: 3(4), P. 3355 - 3362

Published: July 4, 2018

A deep learning architecture is proposed to predict graspable locations for robotic manipulation. It considers situations where no, one, or multiple object(s) are seen. By defining the learning problem to be classification with null hypothesis competition instead of regression, the neural network with red, green, blue and depth (RGB-D) image input predicts multiple grasp candidates for a single object or multiple objects, in a single shot. The method outperforms state-of-the-art approaches on the Cornell dataset with 96.0% and 96.1% accuracy on image-wise and object-wise splits, respectively. Evaluation on a multiobject dataset illustrates the generalization capability of the architecture. Grasping experiments achieve 96.0% grasp localization and 89.0% grasping success rates on a test set of household objects. The real-time process takes less than 0.25 s from image to plan.
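The "classification with null hypothesis competition" framing can be illustrated as an orientation classifier with one extra rejection class per region: the network scores R discretized gripper angles plus a "no grasp" bin instead of regressing an angle. The head below and all its dimensions are hypothetical, showing only that framing.

```python
import torch
import torch.nn as nn

class GraspOrientationHead(nn.Module):
    """Grasp detection as classification with a null class: each region
    proposal is scored over R discretized gripper orientations plus one
    'no graspable pose' bin that competes with them, instead of regressing
    an angle. Dimensions are illustrative only."""
    def __init__(self, feat_dim=256, num_orientations=19):
        super().__init__()
        # The +1 class is the null hypothesis ("this region holds no grasp").
        self.cls = nn.Linear(feat_dim, num_orientations + 1)

    def forward(self, region_feats):
        return self.cls(region_feats)           # (N, R+1) logits per region

head = GraspOrientationHead()
logits = head(torch.randn(4, 256))              # 4 candidate regions
best = logits.argmax(dim=1)                     # index R means "reject region"
```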

Language: English

Citations

373