THFuse: An infrared and visible image fusion network using transformer and hybrid feature extractor
Jun Chen, Jianfeng Ding, Yang Yu

et al.

Neurocomputing, Journal Year: 2023, Volume and Issue: 527, P. 71 - 82

Published: Jan. 12, 2023

Language: English

SwinFusion: Cross-domain Long-range Learning for General Image Fusion via Swin Transformer
Jiayi Ma, Linfeng Tang, Fan Fan

et al.

IEEE/CAA Journal of Automatica Sinica, Journal Year: 2022, Volume and Issue: 9(7), P. 1200 - 1217

Published: June 30, 2022

This study proposes a novel general image fusion framework based on cross-domain long-range learning and Swin Transformer, termed SwinFusion. On the one hand, an attention-guided cross-domain module is devised to achieve sufficient integration of complementary information and global interaction. More specifically, the proposed method involves an intra-domain fusion unit based on self-attention and an inter-domain fusion unit based on cross-attention, which mine and integrate long-range dependencies within the same domain and across domains. Through long-range dependency modeling, the network is able to fully implement domain-specific information extraction and cross-domain complementary information integration, as well as maintaining appropriate apparent intensity from a global perspective. In particular, we introduce the shifted windows mechanism into self-attention and cross-attention, which allows our model to receive images of arbitrary sizes. On the other hand, multi-scene image fusion problems are generalized to a unified framework with structure maintenance, detail preservation, and proper intensity control. Moreover, an elaborate loss function, consisting of SSIM loss, texture loss, and intensity loss, drives the network to preserve abundant texture details and structural information, while presenting optimal apparent intensity. Extensive experiments on both multi-modal image fusion and digital photography image fusion demonstrate the superiority of SwinFusion compared with state-of-the-art unified image fusion algorithms and task-specific alternatives. Implementation code and pre-trained weights can be accessed at https://github.com/Linfeng-Tang/SwinFusion.
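
The abstract's multi-term objective can be pictured with a minimal PyTorch sketch. The snippet below implements plausible intensity and texture terms, assuming single-channel inputs; the element-wise-max targets, the Sobel gradient operator, and the weights a and b are illustrative choices rather than the paper's exact formulation, and the SSIM term is omitted (it needs a windowed implementation).

    import torch
    import torch.nn.functional as F

    def sobel_grad(x):
        # Gradient-magnitude proxy |gx| + |gy| via fixed Sobel kernels; x is (B, 1, H, W).
        kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                          device=x.device).view(1, 1, 3, 3)
        ky = kx.transpose(2, 3)
        return F.conv2d(x, kx, padding=1).abs() + F.conv2d(x, ky, padding=1).abs()

    def fusion_loss(fused, ir, vis, a=1.0, b=1.0):
        # Intensity term: pull the fused image toward the element-wise max of the sources.
        l_int = F.l1_loss(fused, torch.max(ir, vis))
        # Texture term: match the stronger of the two source gradients at each pixel.
        l_tex = F.l1_loss(sobel_grad(fused), torch.max(sobel_grad(ir), sobel_grad(vis)))
        return a * l_int + b * l_tex  # an SSIM term would be added in the full loss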

Language: English

Citations

648

CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion
Zixiang Zhao, Haowen Bai, Jiangshe Zhang

et al.

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Journal Year: 2023, Volume and Issue: unknown, P. 5906 - 5916

Published: June 1, 2023

Multi-modality (MM) image fusion aims to render fused images that maintain the merits of different modalities, e.g., functional highlights and detailed textures. To tackle the challenge of modeling cross-modality features and decomposing desirable modality-specific and modality-shared features, we propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network. Firstly, CDDFuse uses Restormer blocks to extract cross-modality shallow features. We then introduce a dual-branch Transformer-CNN feature extractor, with Lite Transformer (LT) blocks leveraging long-range attention to handle low-frequency global features and Invertible Neural Network (INN) blocks focusing on extracting high-frequency local information. A correlation-driven loss is further proposed to make the low-frequency features correlated and the high-frequency features uncorrelated, based on the embedded information. Then, the LT-based global fusion and INN-based local fusion layers output the fused image. Extensive experiments demonstrate that CDDFuse achieves promising results on multiple fusion tasks, including infrared-visible image fusion and medical image fusion. We also show that CDDFuse can boost the performance of downstream infrared-visible semantic segmentation and object detection in a unified benchmark. The code is available at https://github.com/Zhaozixiang1228/MMIF-CDDFuse.
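
One plausible reading of the correlation-driven loss, sketched in PyTorch: penalize correlation between the high-frequency (detail) features of the two modalities while rewarding correlation between their low-frequency (base) features. The ratio form and the argument names are assumptions for illustration, not the paper's verbatim definition.

    import torch

    def cc(a, b, eps=1e-6):
        # Pearson correlation coefficient between two feature maps, per batch item.
        a = a.flatten(1) - a.flatten(1).mean(dim=1, keepdim=True)
        b = b.flatten(1) - b.flatten(1).mean(dim=1, keepdim=True)
        return (a * b).sum(dim=1) / (a.norm(dim=1) * b.norm(dim=1) + eps)

    def decomposition_loss(base_ir, base_vis, detail_ir, detail_vis, eps=1e-6):
        # Detail features should decorrelate across modalities; base features
        # should stay correlated, so their correlation sits in the denominator.
        return (cc(detail_ir, detail_vis) ** 2 / (cc(base_ir, base_vis) + eps)).mean()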

Language: English

Citations

297

SuperFusion: A Versatile Image Registration and Fusion Network with Semantic Awareness
Linfeng Tang, Yuxin Deng, Yong Ma

et al.

IEEE/CAA Journal of Automatica Sinica, Journal Year: 2022, Volume and Issue: 9(12), P. 2121 - 2137

Published: Dec. 1, 2022

Image fusion aims to integrate the complementary information in source images and synthesize a fused image that comprehensively characterizes the imaging scene. However, existing image fusion algorithms are only applicable to strictly aligned source images and cause severe artifacts in the results when the input images have slight shifts or deformations. In addition, the fusion results typically have a good visual effect but neglect the semantic requirements of high-level vision tasks. This study incorporates image registration, image fusion, and the semantic requirements of high-level vision tasks into a single framework and proposes a novel image registration and fusion method, named SuperFusion. Specifically, we design a registration network to estimate bidirectional deformation fields that rectify geometric distortions of the input images under the supervision of both photometric and end-point constraints. Registration and fusion are combined in a symmetric scheme, in which mutual promotion is achieved by optimizing the naive fusion loss and is further enhanced by a mono-modal consistency constraint on the symmetric fusion outputs. The fusion network is equipped with a global spatial attention mechanism to achieve adaptive feature integration. Moreover, a semantic constraint based on a pre-trained segmentation model and the Lovasz-Softmax loss is deployed to guide the fusion network to focus more on the semantic requirements of high-level vision tasks. Extensive experiments on image registration, image fusion, and semantic segmentation tasks demonstrate the superiority of SuperFusion compared with state-of-the-art alternatives. The code is publicly available at https://github.com/Linfeng-Tang/SuperFusion.
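
To make the registration supervision concrete, here is a minimal PyTorch sketch of backward warping with a dense deformation field, plus an L1 photometric term and an end-point error against a supervising flow. The pixel-offset field convention, the weight lam, and the function names are illustrative assumptions, not the paper's exact implementation.

    import torch
    import torch.nn.functional as F

    def warp(img, flow):
        # Backward-warp img (B, C, H, W) with a flow field (B, 2, H, W) in pixel offsets.
        b, _, h, w = img.shape
        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
        base = torch.stack((xs, ys)).float().to(img.device)      # (2, H, W), x then y
        coords = base.unsqueeze(0) + flow
        gx = 2 * coords[:, 0] / (w - 1) - 1                      # normalize to [-1, 1]
        gy = 2 * coords[:, 1] / (h - 1) - 1
        return F.grid_sample(img, torch.stack((gx, gy), dim=-1), align_corners=True)

    def registration_loss(moving, fixed, flow_pred, flow_gt, lam=1.0):
        photometric = F.l1_loss(warp(moving, flow_pred), fixed)  # appearance agreement
        endpoint = (flow_pred - flow_gt).norm(dim=1).mean()      # end-point constraint
        return photometric + lam * endpoint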

Language: English

Citations

243

YDTR: Infrared and Visible Image Fusion via Y-Shape Dynamic Transformer
Wei Tang, Fazhi He, Yu Liu

et al.

IEEE Transactions on Multimedia, Journal Year: 2022, Volume and Issue: 25, P. 5413 - 5428

Published: July 20, 2022

Infrared and visible image fusion aims to generate a composite image that can simultaneously describe the salient target in the infrared image and the texture details in the visible image of the same scene. Since deep learning (DL) exhibits great feature extraction ability in computer vision tasks, it has also been widely employed in handling the image fusion issue. However, existing DL-based methods generally extract complementary information from the source images through convolutional operations, which results in limited preservation of global features. To this end, we propose a novel fusion method, i.e., the Y-shape dynamic Transformer (YDTR). Specifically, a dynamic Transformer module (DTRM) is designed to acquire not only local features but also significant context information. Furthermore, the proposed network is devised in a Y shape to comprehensively maintain the thermal radiation information from the infrared image and the scene details from the visible image. Considering the specific information provided by the source images, we design a loss function that consists of two terms to improve fusion quality: a structural similarity (SSIM) term and a spatial frequency (SF) term. Extensive experiments on mainstream datasets illustrate that the proposed method outperforms both classical and state-of-the-art approaches in qualitative and quantitative assessments. We further extend YDTR to address other fusion tasks, including infrared and RGB-visible images and multi-focus images, without fine-tuning, and the satisfactory results demonstrate its good generalization capability.
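
The spatial frequency (SF) term can be grounded with a short sketch. Below, SF is the classic metric combining row and column RMS first differences; using the larger source SF as the target is an illustrative choice, assuming the loss encourages the fused image to be at least as textured as its sources.

    import torch

    def spatial_frequency(x, eps=1e-8):
        # SF = sqrt(RF^2 + CF^2): RMS of horizontal and vertical first differences.
        rf = (x[..., :, 1:] - x[..., :, :-1]).pow(2).mean(dim=(-2, -1))
        cf = (x[..., 1:, :] - x[..., :-1, :]).pow(2).mean(dim=(-2, -1))
        return torch.sqrt(rf + cf + eps)

    def sf_loss(fused, ir, vis):
        # Push the fused image's SF toward the larger of the two source SFs.
        target = torch.maximum(spatial_frequency(ir), spatial_frequency(vis))
        return (target - spatial_frequency(fused)).abs().mean()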

Language: English

Citations

197

DIVFusion: Darkness-free infrared and visible image fusion
Linfeng Tang, Xinyu Xiang, Hao Zhang

et al.

Information Fusion, Journal Year: 2022, Volume and Issue: 91, P. 477 - 493

Published: Nov. 5, 2022

Language: English

Citations

189

DATFuse: Infrared and Visible Image Fusion via Dual Attention Transformer
Wei Tang, Fazhi He, Yu Liu

et al.

IEEE Transactions on Circuits and Systems for Video Technology, Journal Year: 2023, Volume and Issue: 33(7), P. 3159 - 3172

Published: Jan. 5, 2023

The fusion of infrared and visible images aims to generate a composite image that can simultaneously contain the thermal radiation information of an infrared image and the plentiful texture details of a visible image, so as to detect targets under various weather conditions with a high spatial resolution of scenes. Previous deep fusion models were generally based on convolutional operations, resulting in a limited ability to represent long-range context information. In this paper, we propose a novel end-to-end model for infrared and visible image fusion via a dual attention Transformer, termed DATFuse. To accurately examine the significant areas of the source images, a dual attention residual module (DARM) is designed for important feature extraction. To further model long-range dependencies, a Transformer module (TRM) is devised for global complementary information preservation. Moreover, a loss function consisting of three terms, namely pixel loss, gradient loss, and structural loss, is designed to train the proposed model in an unsupervised manner. This avoids the manual design of complicated activity-level measurement and fusion strategies used in traditional methods. Extensive experiments on public datasets reveal that DATFuse outperforms other representative state-of-the-art approaches in both qualitative and quantitative assessments. The proposed method is also extended to address other fusion tasks without fine-tuning, and the promising results demonstrate that it has good generalization ability. The code is available at https://github.com/tthinking/DATFuse.
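
The DARM described above combines channel and spatial attention inside a residual block. The sketch below is a generic block in that spirit, not the paper's exact layout; the reduction ratio, kernel size, and class name are assumptions.

    import torch.nn as nn

    class DualAttentionResidual(nn.Module):
        # Channel attention followed by spatial attention, with a residual skip.
        def __init__(self, ch, reduction=4):
            super().__init__()
            self.channel = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
                nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid(),
            )
            self.spatial = nn.Sequential(nn.Conv2d(ch, 1, 7, padding=3), nn.Sigmoid())

        def forward(self, x):
            y = x * self.channel(x)   # reweight feature channels
            y = y * self.spatial(y)   # reweight spatial positions
            return x + y              # residual connection keeps original features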

Language: English

Citations

159

Rethinking the necessity of image fusion in high-level vision tasks: A practical infrared and visible image fusion network based on progressive semantic injection and scene fidelity
Linfeng Tang, Hao Zhang, Han Xu

et al.

Information Fusion, Journal Year: 2023, Volume and Issue: 99, P. 101870 - 101870

Published: June 3, 2023

Language: English

Citations

129

Current advances and future perspectives of image fusion: A comprehensive review
Shahid Karim, Geng Tong, Jinyang Li

et al.

Information Fusion, Journal Year: 2022, Volume and Issue: 90, P. 185 - 217

Published: Sept. 29, 2022

Language: English

Citations

128

DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion
Zixiang Zhao, Haowen Bai, Yuanzhi Zhu

et al.

2023 IEEE/CVF International Conference on Computer Vision (ICCV), Journal Year: 2023, Volume and Issue: unknown, P. 8048 - 8059

Published: Oct. 1, 2023

Multi-modality image fusion aims to combine different modalities to produce fused images that retain the complementary features of each modality, such as functional highlights and texture details. To leverage strong generative priors and address the challenges of unstable training and lack of interpretability in GAN-based methods, we propose a novel fusion algorithm based on the denoising diffusion probabilistic model (DDPM). The fusion task is formulated as a conditional generation problem under the DDPM sampling framework, which is further divided into an unconditional generation subproblem and a maximum likelihood subproblem. The latter is modeled in a hierarchical Bayesian manner with latent variables and inferred by the expectation-maximization (EM) algorithm. By integrating the inference solution into the diffusion sampling iteration, our method can generate high-quality fused images with natural image generative priors and cross-modality information from the source images. Note that all that is required is an unconditional pre-trained generative model, and no fine-tuning is needed. Our extensive experiments indicate that this approach yields promising results in infrared-visible image fusion and medical image fusion. The code is available at https://github.com/Zhaozixiang1228/MMIF-DDFM.
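
The split into an unconditional step plus a likelihood correction can be pictured with a toy sampling step. The sketch below takes the unconditional DDPM estimate of the clean image and nudges it toward the source images; the gradient-descent correction merely stands in for the paper's EM-based inference, and the names (eps_model, step_size) are placeholder assumptions.

    import torch

    def corrected_step(x_t, t, eps_model, sources, alpha_bar, step_size=0.1):
        # Unconditional DDPM estimate of x_0 from the noisy sample x_t.
        eps = eps_model(x_t, t)
        x0_hat = (x_t - torch.sqrt(1 - alpha_bar[t]) * eps) / torch.sqrt(alpha_bar[t])
        # Likelihood correction: pull the estimate toward each source image
        # (a crude stand-in for the hierarchical-Bayes EM update).
        for s in sources:
            x0_hat = x0_hat - step_size * (x0_hat - s)
        return x0_hat  # a full sampler would re-noise x0_hat to obtain x_{t-1}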

Language: English

Citations

114

Visible and Infrared Image Fusion Using Deep Learning
Xingchen Zhang, Yiannis Demiris

IEEE Transactions on Pattern Analysis and Machine Intelligence, Journal Year: 2023, Volume and Issue: 45(8), P. 10535 - 10554

Published: March 30, 2023

Visible and infrared image fusion (VIF) has attracted a lot of interest in recent years due to its application in many tasks, such as object detection, tracking, scene segmentation, and crowd counting. In addition to conventional VIF methods, an increasing number of deep learning-based methods have been proposed in the last five years. Different types of methods, including CNN-based, autoencoder-based, GAN-based, and transformer-based methods, have been proposed. Deep learning has undoubtedly become the dominant approach for the VIF task. However, while much progress has been made, the field will benefit from a systematic review of these methods. In this paper, we present a comprehensive review of deep learning-based VIF methods. We discuss their motivation, taxonomy, development, characteristics, datasets, and performance evaluation in detail. We also discuss future prospects of the field. This review can serve as a reference for researchers and for those interested in entering this fast-developing field.

Language: English

Citations

112