UniMSE: Towards Unified Multimodal Sentiment Analysis and Emotion Recognition DOI Creative Commons
Guimin Hu, Ting-En Lin, Yi Zhao

et al.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Journal Year: 2022, Volume and Issue: unknown

Published: Jan. 1, 2022

Multimodal sentiment analysis (MSA) and emotion recognition in conversation (ERC) are key research topics for computers to understand human behaviors. From a psychological perspective, emotions are the expression of affect or feelings during a short period, while sentiments are formed and held for a longer period. However, most existing works study sentiment and emotion separately and do not fully exploit the complementary knowledge behind the two. In this paper, we propose a multimodal knowledge-sharing framework (UniMSE) that unifies MSA and ERC tasks from features, labels, and models. We perform modality fusion at the syntactic and semantic levels and introduce contrastive learning between modalities and samples to better capture the difference and consistency between sentiments and emotions. Experiments on four public benchmark datasets, MOSI, MOSEI, MELD, and IEMOCAP, demonstrate the effectiveness of the proposed method, which achieves consistent improvements over state-of-the-art methods.
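The modality-level contrast described above can be pictured with a short sketch. The snippet below is a minimal InfoNCE-style loss between two modality embeddings; it is not the authors' implementation, and the pairing of text with acoustic features, the embedding size, and the temperature are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def inter_modality_info_nce(text_emb: torch.Tensor, acoustic_emb: torch.Tensor,
                            tau: float = 0.07) -> torch.Tensor:
    """Pull matched text/acoustic pairs together and push mismatched pairs apart."""
    t = F.normalize(text_emb, dim=-1)        # (B, D)
    a = F.normalize(acoustic_emb, dim=-1)    # (B, D)
    logits = t @ a.t() / tau                 # (B, B) scaled cosine similarities
    targets = torch.arange(t.size(0), device=t.device)
    # symmetric loss over both retrieval directions
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# example: loss = inter_modality_info_nce(torch.randn(8, 256), torch.randn(8, 256))
```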

Language: English

A Hierarchical Fused Fuzzy Deep Neural Network for Data Classification DOI
Yue Deng, Zhiquan Ren, Youyong Kong

et al.

IEEE Transactions on Fuzzy Systems, Journal Year: 2016, Volume and Issue: 25(4), P. 1006 - 1012

Published: June 2, 2016

Deep learning (DL) is an emerging and powerful paradigm that allows large-scale task-driven feature learning from big data. However, typical DL is a fully deterministic model that sheds no light on data uncertainty reductions. In this paper, we show how to introduce the concepts of fuzzy learning into DL to overcome the shortcomings of fixed representation. The bulk of the proposed system is a hierarchical deep neural network that derives information from both fuzzy and neural representations. Then, the knowledge learnt from these two respective views is fused to form the final data representation to be classified. The effectiveness of the system is verified on three practical tasks, image categorization, high-frequency financial data prediction, and brain MRI segmentation, all of which contain a high level of uncertainty in the raw data. The fuzzy DL system greatly outperforms other nonfuzzy and shallow learning approaches on these tasks.
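As a rough illustration of the fuzzy/neural fusion idea, the sketch below combines a Gaussian-membership ("fuzzy") view with a plain dense view of the same input and classifies the concatenation. Layer sizes and fusion by concatenation are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class FuzzyMembership(nn.Module):
    def __init__(self, in_dim: int, n_rules: int):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(n_rules, in_dim))
        self.log_sigma = nn.Parameter(torch.zeros(n_rules, in_dim))

    def forward(self, x):                        # x: (B, in_dim)
        diff = x.unsqueeze(1) - self.centers     # (B, n_rules, in_dim)
        sigma = self.log_sigma.exp()
        # product of per-dimension Gaussian memberships -> firing strength per rule
        return torch.exp(-((diff / sigma) ** 2).sum(-1))     # (B, n_rules)

class FusedFuzzyNet(nn.Module):
    def __init__(self, in_dim=32, n_rules=16, hidden=64, n_classes=3):
        super().__init__()
        self.fuzzy = FuzzyMembership(in_dim, n_rules)         # fuzzy view
        self.dense = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())  # crisp view
        self.head = nn.Linear(n_rules + hidden, n_classes)    # classify fused views

    def forward(self, x):
        return self.head(torch.cat([self.fuzzy(x), self.dense(x)], dim=-1))
```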

Language: English

Citations

339

Multimodal Intelligence: Representation Learning, Information Fusion, and Applications DOI
Chao Zhang, Zichao Yang, Xiaodong He

et al.

IEEE Journal of Selected Topics in Signal Processing, Journal Year: 2020, Volume and Issue: 14(3), P. 478 - 493

Published: March 1, 2020

Deep learning methods have revolutionized speech recognition, image recognition, and natural language processing since 2010. Each of these tasks involves a single modality in its input signals. However, many applications in the artificial intelligence field involve multiple modalities. Therefore, it is of broad interest to study the more difficult and complex problem of modeling and learning across multiple modalities. In this paper, we provide a technical review of available models and learning methods for multimodal intelligence. The main focus is the combination of vision and natural language modalities, which has become an important topic in both the computer vision and natural language processing research communities. This review provides a comprehensive analysis of recent works on multimodal deep learning from three perspectives: learning multimodal representations, fusing multimodal signals at various levels, and multimodal applications. Regarding multimodal representation learning, we review the key concepts of embedding, which unify multimodal signals into a single vector space and thereby enable cross-modality signal processing. We also review the properties of many types of embeddings that are constructed and learned for general downstream tasks. Regarding multimodal fusion, this review focuses on special architectures for the integration of representations of unimodal signals for a particular task. Regarding applications, selected areas of broad interest in the current literature are covered, including image-to-text caption generation, text-to-image generation, and visual question answering. We believe this review will facilitate future studies in the emerging field of multimodal intelligence for related communities.
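The joint-embedding concept that the review highlights can be shown in a few lines: two projection heads map unimodal features into one shared vector space where cross-modal similarity is computed directly. The feature dimensions below are placeholders, not values taken from the review.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=768, joint_dim=512):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, joint_dim)   # image branch -> shared space
        self.txt_proj = nn.Linear(txt_dim, joint_dim)   # text branch -> shared space

    def forward(self, img_feat, txt_feat):
        v = F.normalize(self.img_proj(img_feat), dim=-1)
        t = F.normalize(self.txt_proj(txt_feat), dim=-1)
        return v, t, v @ t.t()   # embeddings plus a cross-modal similarity matrix
```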

Language: English

Citations

316

ModDrop: Adaptive Multi-Modal Gesture Recognition DOI
Natalia Neverova, Christian Wolf, Graham W. Taylor

et al.

IEEE Transactions on Pattern Analysis and Machine Intelligence, Journal Year: 2015, Volume and Issue: 38(8), P. 1692 - 1706

Published: July 28, 2015

We present a method for gesture detection and localisation based on multi-scale and multi-modal deep learning. Each visual modality captures spatial information at a particular spatial scale (such as motion of the upper body or a hand), and the whole system operates at three temporal scales. Key to our technique is a training strategy which exploits: i) careful initialization of individual modalities; and ii) gradual fusion involving random dropping of separate channels (dubbed ModDrop) for learning cross-modality correlations while preserving the uniqueness of each modality-specific representation. We present experiments on the ChaLearn 2014 Looking at People Challenge gesture recognition track, in which we placed first out of 17 teams. Fusing multiple modalities at several scales leads to a significant increase in recognition rates, allowing the model to compensate for errors of individual classifiers as well as noise in the separate channels. Furthermore, the proposed ModDrop training technique ensures robustness of the classifier to missing signals in one or several channels and lets it produce meaningful predictions from any number of available modalities. In addition, we demonstrate the applicability of the scheme to modalities of arbitrary nature by experiments on the same dataset augmented with audio.
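A minimal sketch of the ModDrop idea follows: whole modality channels are zeroed at random during training before fusion, so the fused classifier learns cross-modality correlations while staying robust to missing inputs. The per-sample keep probability and the simple linear encoders are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ModDropFusion(nn.Module):
    def __init__(self, dims=(64, 64, 32), hidden=128, n_classes=20, p_keep=0.8):
        super().__init__()
        self.encoders = nn.ModuleList([nn.Linear(d, hidden) for d in dims])
        self.p_keep = p_keep
        self.head = nn.Linear(hidden * len(dims), n_classes)

    def forward(self, inputs):                   # list of per-modality (B, d_m) tensors
        feats = []
        for enc, x in zip(self.encoders, inputs):
            h = torch.relu(enc(x))
            if self.training:
                # drop the whole modality for each sample with probability 1 - p_keep
                keep = (torch.rand(h.size(0), 1, device=h.device) < self.p_keep).float()
                h = h * keep
            feats.append(h)
        return self.head(torch.cat(feats, dim=-1))
```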

Language: English

Citations

315

Transferable Attention for Domain Adaptation DOI Open Access
Ximei Wang, Liang Li, Weirui Ye

et al.

Proceedings of the AAAI Conference on Artificial Intelligence, Journal Year: 2019, Volume and Issue: 33(01), P. 5345 - 5352

Published: July 17, 2019

Recent work in domain adaptation bridges different domains by adversarially learning a domain-invariant representation that cannot be distinguished by a domain discriminator. Existing methods of adversarial domain adaptation mainly align the global images across the source and target domains. However, it is obvious that not all regions of an image are transferable, and forcefully aligning untransferable regions may lead to negative transfer. Furthermore, some images are significantly dissimilar across domains, resulting in weak image-level transferability. To this end, we present Transferable Attention for Domain Adaptation (TADA), focusing our adaptation model on transferable regions or images. We implement two types of complementary attention: local attention generated by multiple region-level domain discriminators to highlight transferable regions, and global attention generated by a single image-level domain discriminator to highlight transferable images. Extensive experiments validate that the proposed models exceed state-of-the-art results on standard domain adaptation datasets.
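One way to picture the attention mechanism described above is to turn domain-discriminator uncertainty into a transferability weight: a region whose discriminator output has high entropy is treated as more transferable. The sketch below uses a residual-style weight in [1, 2]; it is an illustrative weighting under that assumption, not TADA's exact formulation.

```python
import math
import torch

def transferability_weights(disc_prob: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """disc_prob: (B, R) probability that each region comes from the source domain."""
    p = disc_prob.clamp(eps, 1 - eps)
    entropy = -(p * p.log() + (1 - p) * (1 - p).log())   # binary entropy per region
    return 1.0 + entropy / math.log(2.0)                 # residual-style weight in [1, 2]

# region_feats: (B, R, D); weights broadcast over the feature dimension
# attended = region_feats * transferability_weights(disc_prob).unsqueeze(-1)
```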

Language: English

Citations

303

A Survey on Multiview Clustering DOI Creative Commons
Guoqing Chao, Shiliang Sun, Jinbo Bi

et al.

IEEE Transactions on Artificial Intelligence, Journal Year: 2021, Volume and Issue: 2(2), P. 146 - 168

Published: April 1, 2021

Clustering is a machine learning paradigm of dividing sample subjects into a number of groups such that subjects in the same group are more similar to each other than to those in other groups. With advances in information acquisition technologies, samples can frequently be viewed from different angles or modalities, generating multi-view data. Multi-view clustering (MVC), which clusters subjects into subgroups using multi-view data, has attracted more and more attention. Although MVC methods have been developed rapidly, there has not been enough survey work to summarize and analyze the current progress. Therefore, we propose a novel taxonomy of multi-view clustering approaches. Similar to machine learning methods, we categorize them into generative and discriminative classes. In the discriminative class, based on the way of view integration, we split it further into five groups: Common Eigenvector Matrix, Common Coefficient Matrix, Common Indicator Matrix, Direct Combination, and Combination After Projection. Furthermore, we relate MVC to other topics: multi-view representation, ensemble clustering, multi-task learning, and multi-view supervised and semi-supervised learning. Several representative real-world applications are elaborated for practitioners. Some benchmark multi-view datasets are introduced, and representative algorithms from each group are empirically evaluated to analyze how they perform on these datasets. To promote the future development of multi-view clustering approaches, we point out several open problems that may require further investigation and thorough examination.
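As a tiny, hedged illustration of one group in this taxonomy (Combination After Projection), the snippet below projects two views into a shared space with CCA, concatenates the projections, and clusters them. The synthetic data, component count, and cluster count are placeholders, not settings from the survey.

```python
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
view1 = rng.normal(size=(200, 10))      # first view of the same 200 samples
view2 = rng.normal(size=(200, 8))       # second view of the same samples

cca = CCA(n_components=4).fit(view1, view2)
z1, z2 = cca.transform(view1, view2)    # projections into the shared space
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(np.hstack([z1, z2]))
```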

Language: English

Citations

263

A Robust Deep Model for Improved Classification of AD/MCI Patients DOI
Feng Li, Loc Tran, Kim‐Han Thung

et al.

IEEE Journal of Biomedical and Health Informatics, Journal Year: 2015, Volume and Issue: 19(5), P. 1610 - 1616

Published: May 4, 2015

Accurate classification of Alzheimer's disease (AD) and its prodromal stage, mild cognitive impairment (MCI), plays a critical role in possibly preventing the progression of memory impairment and improving quality of life for AD patients. Among many research tasks, it is of particular interest to identify noninvasive imaging biomarkers for AD diagnosis. In this paper, we present a robust deep learning system to identify the different progression stages of AD patients based on MRI and PET scans. We utilized the dropout technique to improve classical deep learning by preventing weight coadaptation, which is a typical cause of overfitting in deep learning. In addition, we incorporated stability selection, an adaptive learning factor, and a multitask learning strategy into the deep learning framework. We applied the proposed method to the ADNI dataset and conducted experiments for AD and MCI conversion diagnosis. Experimental results showed that the dropout technique is very effective in AD diagnosis, improving classification accuracies by 5.9% on average as compared with classical deep learning methods.
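A minimal sketch of the dropout-regularized classifier described above: MRI and PET feature vectors are concatenated and passed through a small fully connected network with dropout to limit weight coadaptation. Feature sizes, layer widths, and the dropout rate are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ADClassifier(nn.Module):
    def __init__(self, mri_dim=93, pet_dim=93, hidden=100, n_classes=2, p_drop=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(mri_dim + pet_dim, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, mri, pet):
        # concatenate the two imaging modalities before classification
        return self.net(torch.cat([mri, pet], dim=-1))
```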

Language: English

Citations

256

Integrative Data Analysis of Multi-Platform Cancer Data with a Multimodal Deep Learning Approach DOI
Muxuan Liang, Zhizhong Li, Ting Chen

et al.

IEEE/ACM Transactions on Computational Biology and Bioinformatics, Journal Year: 2014, Volume and Issue: 12(4), P. 928 - 937

Published: Dec. 6, 2014

Identification of cancer subtypes plays an important role in revealing useful insights into disease pathogenesis and advancing personalized therapy. The recent development of high-throughput sequencing technologies has enabled the rapid collection of multi-platform genomic data (e.g., gene expression, miRNA expression, and DNA methylation) for the same set of tumor samples. Although numerous integrative clustering approaches have been developed to analyze cancer data, few of them are particularly designed to exploit both the deep intrinsic statistical properties of each input modality and the complex cross-modality correlations among multi-platform data. In this paper, we propose a new machine learning model, called multimodal deep belief network (DBN), to cluster cancer patients from multi-platform observation data. In our multimodal DBN framework, the relationships inherent in the features of each single modality are first encoded into multiple layers of hidden variables, and then a joint latent model is employed to fuse common features derived from the modalities. A practical learning algorithm, called contrastive divergence (CD), is applied to infer the parameters of the multimodal DBN in an unsupervised manner. Tests on two available cancer datasets show that our integrative analysis approach can effectively extract a unified representation of latent features to capture both intra- and cross-modality correlations and identify meaningful disease subtypes. In addition, our approach can identify key genes and miRNAs that may play distinct roles in different subtypes. Among those miRNAs, we found that the expression level of miR-29a is highly correlated with survival time in ovarian cancer patients. These results indicate that our multimodal DBN based analysis approach may have practical applications in cancer pathogenesis studies and provide useful guidelines for personalized therapy.
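The fusion idea can be sketched as follows: learn a hidden representation per platform, learn a joint layer over the concatenated hidden activations, and cluster patients in the joint space. The snippet uses scikit-learn's BernoulliRBM as a stand-in for the paper's CD-trained DBN layers, and the synthetic data, layer sizes, and cluster count are placeholders.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
expr = rng.random((100, 50))          # gene expression, scaled to [0, 1]
meth = rng.random((100, 30))          # DNA methylation, scaled to [0, 1]

# per-platform hidden representations
h_expr = BernoulliRBM(n_components=16, random_state=0).fit_transform(expr)
h_meth = BernoulliRBM(n_components=16, random_state=0).fit_transform(meth)

# joint latent layer over the concatenated hidden activations, then clustering
joint = BernoulliRBM(n_components=8, random_state=0).fit_transform(np.hstack([h_expr, h_meth]))
subtypes = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(joint)
```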

Language: English

Citations

238

CCL: Cross-modal Correlation Learning With Multigrained Fusion by Hierarchical Network DOI
Yuxin Peng, Jinwei Qi, Xin Huang

et al.

IEEE Transactions on Multimedia, Journal Year: 2017, Volume and Issue: 20(2), P. 405 - 420

Published: Aug. 21, 2017

Cross-modal retrieval has become a highlighted research topic for retrieval across multimedia data such as image and text. A two-stage learning framework is widely adopted by most existing methods based on deep neural networks (DNNs): the first learning stage generates a separate representation for each modality, and the second learning stage derives the cross-modal common representation. However, existing methods have three limitations: 1) In the first learning stage, they only model intramodality correlation but ignore intermodality correlation with its rich complementary context. 2) In the second learning stage, they adopt shallow networks with single-loss regularization and ignore the intrinsic relevance of intramodality and intermodality correlation. 3) Only original instances are considered, while the fine-grained clues provided by their patches are ignored. To address these problems, this paper proposes a cross-modal correlation learning (CCL) approach with multigrained fusion by a hierarchical network. The contributions are as follows: 1) In the first learning stage, CCL exploits multilevel association with joint optimization to preserve the complementary context from intramodality and intermodality correlation simultaneously. 2) In the second learning stage, a multitask learning strategy is designed to adaptively balance semantic category constraints and pairwise similarity constraints. 3) CCL adopts multigrained modeling, which fuses coarse-grained instances and fine-grained patches to make cross-modal correlation more precise. Comparing with 13 state-of-the-art methods on 6 widely-used cross-modal datasets, the experimental results show that our CCL approach achieves the best performance.
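The two constraint types mentioned in the second learning stage can be sketched as a combined objective: a semantic-category term on each modality's common representation plus a pairwise term that pulls matched image/text pairs together. The weighting and distance choice below are illustrative assumptions, not CCL's exact objective.

```python
import torch
import torch.nn.functional as F

def category_plus_pairwise_loss(img_common: torch.Tensor, txt_common: torch.Tensor,
                                img_logits: torch.Tensor, txt_logits: torch.Tensor,
                                labels: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    # semantic category constraint: both modalities should predict the shared label
    category = F.cross_entropy(img_logits, labels) + F.cross_entropy(txt_logits, labels)
    # pairwise similarity constraint: matched pairs should be close in the common space
    pairwise = F.mse_loss(F.normalize(img_common, dim=-1), F.normalize(txt_common, dim=-1))
    return category + alpha * pairwise
```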

Language: English

Citations

224

Audio-Visual Speech Enhancement Using Multimodal Deep Convolutional Neural Networks DOI
Jen-Cheng Hou, Syu‐Siang Wang, Ying-Hui Lai

et al.

IEEE Transactions on Emerging Topics in Computational Intelligence, Journal Year: 2018, Volume and Issue: 2(2), P. 117 - 128

Published: March 23, 2018

Speech enhancement (SE) aims to reduce noise in speech signals. Most SE techniques focus only on addressing audio information. In this paper, inspired by multimodal learning, which utilizes data from different modalities, and the recent success of convolutional neural networks (CNNs) in SE, we propose an audio-visual deep CNN (AVDCNN) SE model, which incorporates audio and visual streams into a unified network model. We also propose a multitask learning framework for reconstructing audio and visual signals at the output layer. Precisely speaking, the proposed AVDCNN model is structured as an audio-visual encoder-decoder network, in which audio and visual data are first processed using individual CNNs and then fused into a joint network to generate enhanced speech (the primary task) and reconstructed images (the secondary task) at the output layer. The model is trained in an end-to-end manner, and its parameters are jointly learned through back propagation. We evaluate the enhanced speech using five instrumental criteria. Results show that the AVDCNN model yields notably superior performance compared with an audio-only CNN-based SE model and two conventional SE approaches, confirming the effectiveness of integrating visual information into the SE process. In addition, the AVDCNN model also outperforms an existing audio-visual SE model, confirming its capability of effectively combining audio and visual information in SE.
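A compact sketch of an encoder-decoder in the spirit of the description above: separate CNN encoders for the noisy spectrogram and the lip image, a fused bottleneck, and two output heads (enhanced speech as the primary task, image reconstruction as the secondary task). All layer sizes and output shapes are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class AudioVisualSE(nn.Module):
    def __init__(self):
        super().__init__()
        self.audio_enc = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                                       nn.AdaptiveAvgPool2d((8, 8)), nn.Flatten())
        self.video_enc = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                       nn.AdaptiveAvgPool2d((8, 8)), nn.Flatten())
        self.fuse = nn.Linear(16 * 8 * 8 * 2, 256)       # joint audio-visual bottleneck
        self.speech_head = nn.Linear(256, 257)           # enhanced spectral frame (primary)
        self.image_head = nn.Linear(256, 3 * 32 * 32)    # reconstructed lip image (secondary)

    def forward(self, noisy_spec, lip_img):
        h = torch.relu(self.fuse(torch.cat([self.audio_enc(noisy_spec),
                                            self.video_enc(lip_img)], dim=-1)))
        return self.speech_head(h), self.image_head(h).view(-1, 3, 32, 32)
```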

Language: English

Citations

218

A Multi-View Deep Learning Framework for EEG Seizure Detection DOI
Ye Yuan, Guangxu Xun, Kebin Jia

et al.

IEEE Journal of Biomedical and Health Informatics, Journal Year: 2018, Volume and Issue: 23(1), P. 83 - 94

Published: Sept. 24, 2018

The recent advances in pervasive sensing technologies have enabled us to monitor and analyze the multi-channel electroencephalogram (EEG) signals of epilepsy patients to prevent serious outcomes caused by epileptic seizures. To avoid manual visual inspection of long-term EEG readings, automatic seizure detection has garnered increasing attention among researchers. In this paper, we present a unified multi-view deep learning framework to capture brain abnormalities associated with seizures based on multi-channel scalp EEG signals. The proposed approach is an end-to-end model that is able to jointly learn multi-view features from both unsupervised multi-channel EEG reconstruction and supervised seizure detection via spectrogram representation. We construct a new autoencoder-based multi-view learning model by incorporating both inter and intra correlations of EEG channels to unleash the power of multi-channel information. By adding a channel-wise competition mechanism in the training phase, we propose a channel-aware seizure detection module to guide our multi-view structure to focus on important and relevant EEG channels. To validate the effectiveness of the proposed framework, extensive experiments against nine baselines, including traditional handcrafted feature extraction and conventional deep learning methods, are carried out on a benchmark scalp EEG dataset. Experimental results show that the proposed model achieves higher average accuracy and f1-score, at 94.37% and 85.34%, respectively, using 5-fold subject-independent cross validation, demonstrating a powerful and effective method for the task of EEG seizure detection.
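A small sketch of a channel-aware attention module in the spirit of the description above: each EEG channel's embedding receives a learned attention weight, so the fused representation emphasizes the channels most relevant to seizure activity. The dimensions are placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)      # one relevance score per channel

    def forward(self, channel_feats):            # (B, n_channels, feat_dim)
        weights = torch.softmax(self.score(channel_feats), dim=1)   # (B, C, 1)
        fused = (weights * channel_feats).sum(dim=1)                # (B, feat_dim)
        return fused, weights.squeeze(-1)        # fused representation and per-channel weights
```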

Language: English

Citations

218