Neurocomputing, Journal Year: 2023, Volume and Issue: 527, P. 71 - 82
Published: Jan. 12, 2023
Language: English
IEEE/CAA Journal of Automatica Sinica, Journal Year: 2022, Volume and Issue: 9(7), P. 1200 - 1217
Published: June 30, 2022
This study proposes a novel general image fusion framework based on cross-domain long-range learning and Swin Transformer, termed SwinFusion. On the one hand, an attention-guided cross-domain module is devised to achieve sufficient integration of complementary information and global interaction. More specifically, the proposed method involves an intra-domain fusion unit based on self-attention and an inter-domain fusion unit based on cross-attention, which mine and integrate long-range dependencies within the same domain and across domains. Through long-range dependency modeling, the network is able to fully implement domain-specific information extraction and cross-domain complementary information integration, as well as maintaining appropriate apparent intensity from a global perspective. In particular, we introduce the shifted windows mechanism into self-attention and cross-attention, which allows our model to receive images with arbitrary sizes. On the other hand, multi-scene image fusion problems are generalized to a unified framework with structure maintenance, detail preservation, and proper intensity control. Moreover, an elaborate loss function, consisting of SSIM loss, texture loss, and intensity loss, drives the network to preserve abundant texture details and structural information while presenting optimal apparent intensity. Extensive experiments on both multi-modal image fusion and digital photography image fusion demonstrate the superiority of SwinFusion compared with state-of-the-art unified fusion algorithms and task-specific alternatives. Implementation code and pre-trained weights can be accessed at https://github.com/Linfeng-Tang/SwinFusion.
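The three-term objective named in the abstract (SSIM, texture, and intensity losses) maps naturally onto a short training-loss sketch. The PyTorch fragment below is a minimal illustration under assumed design choices: the element-wise maximum as the intensity and texture aggregation target, Sobel filters as the texture operator, and the third-party pytorch-msssim package for SSIM. The weights and aggregation rules are illustrative, not the authors' exact formulation.

    import torch
    import torch.nn.functional as F
    from pytorch_msssim import ssim  # assumed third-party SSIM implementation

    def sobel_magnitude(x):
        # Sobel gradient magnitude as a simple texture measure
        # (assumes single-channel inputs in [0, 1]).
        kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                          device=x.device).view(1, 1, 3, 3)
        ky = kx.transpose(2, 3)
        return F.conv2d(x, kx, padding=1).abs() + F.conv2d(x, ky, padding=1).abs()

    def swinfusion_style_loss(fused, src_a, src_b, w_ssim=1.0, w_text=1.0, w_int=1.0):
        # Intensity term: pull the fused image toward the element-wise
        # maximum of the sources (an assumed aggregation rule).
        l_int = F.l1_loss(fused, torch.max(src_a, src_b))
        # Texture term: preserve the stronger gradient of the two sources.
        l_text = F.l1_loss(sobel_magnitude(fused),
                           torch.max(sobel_magnitude(src_a), sobel_magnitude(src_b)))
        # SSIM term: structural similarity against both sources.
        l_ssim = 2.0 - ssim(fused, src_a, data_range=1.0) - ssim(fused, src_b, data_range=1.0)
        return w_ssim * l_ssim + w_text * l_text + w_int * l_int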
Language: English
Citations: 648
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Journal Year: 2023, Volume and Issue: unknown, P. 5906 - 5916
Published: June 1, 2023
Multi-modality (MM) image fusion aims to render fused images that maintain the merits of different modalities, e.g., functional highlights and detailed textures. To tackle the challenges of modeling cross-modality features and decomposing desirable modality-specific and modality-shared features, we propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network. Firstly, CDDFuse uses Restormer blocks to extract cross-modality shallow features. We then introduce a dual-branch Transformer-CNN feature extractor, with Lite Transformer (LT) blocks leveraging long-range attention to handle low-frequency global features and Invertible Neural Network (INN) blocks focusing on extracting high-frequency local information. A correlation-driven loss is further proposed to make the low-frequency features correlated while keeping the high-frequency features uncorrelated, based on the embedded information. Then, the LT-based global fusion and INN-based local fusion layers output the fused image. Extensive experiments demonstrate that our CDDFuse achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion. We also show that CDDFuse can boost the performance of downstream infrared-visible semantic segmentation and object detection in a unified benchmark. The code is available at https://github.com/Zhaozixiang1228/MMIF-CDDFuse.
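The correlation-driven loss can be illustrated with a short sketch. The fragment below is an assumed reading of the idea, not the paper's exact loss: a Pearson correlation between paired feature maps, encouraged to be high for the low-frequency (modality-shared) branch and low for the high-frequency (modality-specific) branch, combined in an illustrative ratio form.

    import torch

    def corr_coef(a, b, eps=1e-6):
        # Pearson correlation between two batches of flattened feature maps.
        a = a.flatten(1) - a.flatten(1).mean(dim=1, keepdim=True)
        b = b.flatten(1) - b.flatten(1).mean(dim=1, keepdim=True)
        return (a * b).sum(dim=1) / (a.norm(dim=1) * b.norm(dim=1) + eps)

    def decomposition_loss(base_ir, base_vis, detail_ir, detail_vis, eps=1e-6):
        # Small when the base (low-frequency) features of the two modalities
        # correlate and the detail (high-frequency) features do not.
        cc_detail = corr_coef(detail_ir, detail_vis)
        cc_base = corr_coef(base_ir, base_vis)
        return (cc_detail ** 2 / (cc_base ** 2 + eps)).mean()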
Language: English
Citations: 297
IEEE/CAA Journal of Automatica Sinica, Journal Year: 2022, Volume and Issue: 9(12), P. 2121 - 2137
Published: Dec. 1, 2022
Image fusion aims to integrate complementary information in source images to synthesize a fused image that comprehensively characterizes the imaging scene. However, existing image fusion algorithms are only applicable to strictly aligned source images and cause severe artifacts in the results when the input images have slight shifts or deformations. In addition, the fusion results typically have a good visual effect but neglect the semantic requirements of high-level vision tasks. This study incorporates image registration, image fusion, and the semantic requirements of high-level vision tasks into a single framework and proposes a novel image registration and fusion method, named SuperFusion. Specifically, we design a registration network to estimate bidirectional deformation fields that rectify geometric distortions of the input images under the supervision of both photometric and end-point constraints. Registration and fusion are combined in a symmetric scheme, in which mutual promotion can be achieved by optimizing the naive fusion loss and is further enhanced by a mono-modal consistency constraint on the symmetric fusion outputs. The fusion network is equipped with a global spatial attention mechanism to achieve adaptive feature integration. Moreover, a semantic constraint based on a pre-trained segmentation model and the Lovasz-Softmax loss is deployed to guide the fusion network to focus more on high-level semantics. Extensive experiments demonstrate the superiority of our SuperFusion compared with state-of-the-art alternatives. The code is publicly available at https://github.com/Linfeng-Tang/SuperFusion.
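A photometric constraint on an estimated deformation field can be sketched in a few lines. The fragment below is a generic illustration, not the authors' implementation: it warps the moving image with a dense pixel-displacement field via torch.nn.functional.grid_sample and penalizes the L1 difference against the fixed image.

    import torch
    import torch.nn.functional as F

    def warp(img, flow):
        # Warp `img` (B, C, H, W) with a dense displacement field `flow`
        # (B, 2, H, W, in pixels) using bilinear sampling.
        b, _, h, w = img.shape
        ys, xs = torch.meshgrid(torch.arange(h, device=img.device),
                                torch.arange(w, device=img.device), indexing="ij")
        base = torch.stack((xs, ys), dim=0).float()   # (2, H, W), x then y
        coords = base.unsqueeze(0) + flow             # absolute sampling coordinates
        # Normalize to [-1, 1] as required by grid_sample.
        gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
        gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
        grid = torch.stack((gx, gy), dim=-1)          # (B, H, W, 2)
        return F.grid_sample(img, grid, align_corners=True)

    def photometric_loss(moving, fixed, flow):
        # L1 photometric term after warping the moving image toward the fixed one.
        return F.l1_loss(warp(moving, flow), fixed)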
Language: English
Citations: 243
IEEE Transactions on Multimedia, Journal Year: 2022, Volume and Issue: 25, P. 5413 - 5428
Published: July 20, 2022
Infrared and visible image fusion aims to generate a composite image that can simultaneously describe the salient target in the infrared image and the texture details of the same scene in the visible image. Since deep learning (DL) exhibits great feature extraction ability in computer vision tasks, it has also been widely employed to handle the infrared and visible image fusion issue. However, existing DL-based methods generally extract complementary information from source images through convolutional operations, which results in limited preservation of global features. To this end, we propose a novel infrared and visible image fusion method, i.e., a Y-shape dynamic Transformer (YDTR). Specifically, a dynamic Transformer module (DTRM) is designed to acquire not only local features but also significant context information. Furthermore, the proposed network is devised to comprehensively maintain the thermal radiation information from the infrared image and the scene details from the visible image. Considering the specific information provided by the source images, we design a loss function that consists of two terms to improve fusion quality: a structural similarity (SSIM) term and a spatial frequency (SF) term. Extensive experiments on mainstream datasets illustrate that the proposed method outperforms both classical and state-of-the-art approaches in qualitative and quantitative assessments. We further extend YDTR to address infrared and RGB-visible image fusion and multi-focus image fusion without fine-tuning, and the satisfactory results demonstrate its good generalization capability.
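The spatial frequency (SF) term mentioned in the abstract has a standard textbook definition: the root-mean-square of horizontal and vertical first differences. The fragment below sketches it, together with an assumed way of combining it with SSIM into a two-term objective; the combination rule is illustrative, not the paper's exact loss.

    import torch
    from pytorch_msssim import ssim  # assumed third-party SSIM implementation

    def spatial_frequency(x, eps=1e-8):
        # SF = sqrt(RF^2 + CF^2): RMS of row and column first differences.
        rf2 = (x[..., :, 1:] - x[..., :, :-1]).pow(2).mean(dim=(-2, -1))
        cf2 = (x[..., 1:, :] - x[..., :-1, :]).pow(2).mean(dim=(-2, -1))
        return torch.sqrt(rf2 + cf2 + eps)

    def ydtr_style_loss(fused, ir, vis, w_ssim=1.0, w_sf=1.0):
        # SSIM term against both sources, plus an SF term pushing the fused
        # image toward the sharper of the two source statistics.
        l_ssim = 2.0 - ssim(fused, ir, data_range=1.0) - ssim(fused, vis, data_range=1.0)
        target_sf = torch.max(spatial_frequency(ir), spatial_frequency(vis))
        l_sf = (target_sf - spatial_frequency(fused)).clamp(min=0).mean()
        return w_ssim * l_ssim + w_sf * l_sf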
Language: English
Citations: 197
Information Fusion, Journal Year: 2022, Volume and Issue: 91, P. 477 - 493
Published: Nov. 5, 2022
Language: English
Citations: 189
IEEE Transactions on Circuits and Systems for Video Technology, Journal Year: 2023, Volume and Issue: 33(7), P. 3159 - 3172
Published: Jan. 5, 2023
The fusion of infrared and visible images aims to generate a composite image that can simultaneously contain the thermal radiation information of an infrared image and the plentiful texture details of a visible image, so as to detect targets under various weather conditions with a high spatial resolution of scenes. Previous deep fusion models were generally based on convolutional operations, resulting in a limited ability to represent long-range context information. In this paper, we propose a novel end-to-end model for infrared and visible image fusion via a dual attention Transformer, termed DATFuse. To accurately examine the significant areas of the source images, a dual attention residual module (DARM) is designed for important feature extraction. To further model long-range dependencies, a Transformer module (TRM) is devised for global complementary information preservation. Moreover, a loss function consisting of three terms, namely, pixel loss, gradient loss, and structural loss, is used to train the proposed model in an unsupervised manner. This can avoid manually designing the complicated activity-level measurement and fusion strategies of traditional methods. Extensive experiments on public datasets reveal that our DATFuse outperforms other representative state-of-the-art approaches in both qualitative and quantitative assessments. The proposed model is also extended to address other fusion tasks without fine-tuning, and the promising results demonstrate that it has good generalization ability. The code is available at
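The dual attention idea behind the DARM can be sketched as a residual block gated by channel and spatial attention. The module below is a generic illustration under assumed layer sizes (squeeze-and-excitation-style channel gating, a 7x7 single-channel spatial gate); it is not the authors' exact architecture.

    import torch
    import torch.nn as nn

    class DualAttentionResidual(nn.Module):
        # Generic dual-attention residual block: a convolutional branch
        # re-weighted by channel attention, then spatial attention.
        def __init__(self, channels):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1))
            # Channel attention: squeeze-and-excitation style gating.
            self.ca = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels // 4, 1), nn.ReLU(inplace=True),
                nn.Conv2d(channels // 4, channels, 1), nn.Sigmoid())
            # Spatial attention: single-channel gate over spatial positions.
            self.sa = nn.Sequential(nn.Conv2d(channels, 1, 7, padding=3), nn.Sigmoid())

        def forward(self, x):
            f = self.body(x)
            f = f * self.ca(f)   # re-weight channels
            f = f * self.sa(f)   # re-weight spatial locations
            return x + f         # residual connection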
Language: English
Citations: 159
Information Fusion, Journal Year: 2023, Volume and Issue: 99, P. 101870 - 101870
Published: June 3, 2023
Language: English
Citations: 129
Information Fusion, Journal Year: 2022, Volume and Issue: 90, P. 185 - 217
Published: Sept. 29, 2022
Language: English
Citations: 128
2021 IEEE/CVF International Conference on Computer Vision (ICCV), Journal Year: 2023, Volume and Issue: unknown, P. 8048 - 8059
Published: Oct. 1, 2023
Multi-modality image fusion aims to combine different modalities to produce fused images that retain the complementary features of each modality, such as functional highlights and texture details. To leverage strong generative priors and to address the challenges of unstable training and lack of interpretability in GAN-based methods, we propose a novel fusion algorithm based on the denoising diffusion probabilistic model (DDPM). The fusion task is formulated as a conditional generation problem under the DDPM sampling framework, which is further divided into an unconditional generation subproblem and a maximum likelihood subproblem. The latter is modeled in a hierarchical Bayesian manner with latent variables and inferred by the expectation-maximization (EM) algorithm. By integrating the inference solution into the diffusion sampling iteration, our method can generate high-quality fused images with natural generative priors and cross-modality information from the source images. Note that all we require is an unconditionally pre-trained generative model, and no fine-tuning is needed. Our extensive experiments indicate that the approach yields promising results in infrared-visible image fusion and medical image fusion. The code is available at https://github.com/Zhaozixiang1228/MMIF-DDFM.
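The sampling structure described in the abstract (an unconditional DDPM step alternating with a likelihood-driven correction) can be sketched as a loop. The fragment below is a rough illustration only: `model` is an assumed noise-prediction network, `em_step` stands in for the paper's hierarchical-Bayesian EM correction, and the variance handling uses the simplified DDPM form.

    import torch

    @torch.no_grad()
    def diffusion_fusion_sample(model, em_step, ir, vis, betas):
        # Reverse diffusion from pure noise, correcting the x0 estimate with
        # source-image information at every step.
        alphas = 1.0 - betas
        alpha_bars = torch.cumprod(alphas, dim=0)
        x = torch.randn_like(ir)
        for t in reversed(range(len(betas))):
            # Unconditional subproblem: predict the noise and estimate x0.
            eps = model(x, torch.tensor([t]))
            x0_hat = (x - (1 - alpha_bars[t]).sqrt() * eps) / alpha_bars[t].sqrt()
            # Maximum-likelihood subproblem: EM-style correction toward the
            # source images (placeholder for the paper's hierarchical model).
            x0_hat = em_step(x0_hat, ir, vis)
            # Standard DDPM posterior mean for the step t -> t-1.
            ab_prev = alpha_bars[t - 1] if t > 0 else torch.tensor(1.0)
            mean = (ab_prev.sqrt() * betas[t] * x0_hat
                    + alphas[t].sqrt() * (1 - ab_prev) * x) / (1 - alpha_bars[t])
            noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
            x = mean + betas[t].sqrt() * noise
        return x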
Language: English
Citations: 114
IEEE Transactions on Pattern Analysis and Machine Intelligence, Journal Year: 2023, Volume and Issue: 45(8), P. 10535 - 10554
Published: March 30, 2023
Visible and infrared image fusion (VIF) has attracted a lot of interest in recent years due to its application in many tasks, such as object detection, object tracking, scene segmentation, and crowd counting. In addition to conventional VIF methods, an increasing number of deep learning-based methods have been proposed in the last five years, including CNN-based, autoencoder-based, GAN-based, and transformer-based approaches. Deep learning has undoubtedly become dominant for the VIF task. However, while much progress has been made, the field will benefit from a systematic review of these methods. In this paper, we present a comprehensive review of deep learning-based VIF methods, discussing their motivation, taxonomy, development, characteristics, datasets, and performance evaluation in detail. We also discuss future prospects of the field. This review can serve as a reference for researchers and for those interested in entering this fast-developing field.
Language: English
Citations: 112