Scientific Reports, Journal year: 2024, Issue: 14(1), Published: Nov. 8, 2024
The task of image fusion for optical and SAR images is to integrate the valuable information from the source images. Recently, owing to their powerful generation ability, diffusion models, e.g., the denoising diffusion probabilistic model and the score-based model, have flourished in image processing, and there have been some effective attempts through scholars' progressive explorations. However, these models suffer from the inevitable speckle that seriously obscures the content at the same locations in the image. Besides, these methods fuse pixel-level features without considering high-level tasks such as target detection and classification, which leads to fused images that are insufficient for their applications and yield low accuracies on such tasks. To tackle these hurdles, we propose semantic-guided posterior sampling for optical and SAR image fusion. Firstly, we employ SAR-BM3D as a preprocessing step to despeckle the SAR image. Then, the fusion objective is established with a fidelity term, a regularization term, and a guidance term. The first two terms are obtained by a variational method via inference and first-order stochastic optimization. The last term is served by the cross-entropy loss between the annotation and the classification result of FLCNet, which we design.
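As a rough sketch of how such a guidance term can steer posterior sampling, the snippet below adds the gradient of a cross-entropy loss to a first-order update alongside fidelity and regularization terms; the stand-in networks, loss forms, weights, and step size are all assumptions for illustration, not the paper's exact formulation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Tiny stand-ins for the paper's networks (assumptions for illustration only):
    # `denoiser` plays the role of the diffusion model's reverse step and
    # `classifier` plays the role of FLCNet producing per-pixel class logits.
    denoiser = nn.Conv2d(1, 1, 3, padding=1)
    classifier = nn.Conv2d(1, 5, 3, padding=1)  # 5 hypothetical land-cover classes

    def guided_sampling_step(x_t, y_opt, y_sar, labels,
                             w_fid=1.0, w_reg=0.1, w_sem=0.05, step=0.1):
        """One reverse step combining fidelity, regularization and semantic guidance.

        x_t:    current sample estimate, shape (B, 1, H, W)
        y_opt:  optical observation; y_sar: despeckled SAR observation
        labels: semantic annotation map, shape (B, H, W), integer class ids
        """
        x = x_t.detach().requires_grad_(True)

        # Fidelity term: keep the sample consistent with both observations.
        fidelity = F.mse_loss(x, y_opt) + F.mse_loss(x, y_sar)

        # Regularization term: a simple smoothness prior standing in for the
        # variational / score-based prior used in the paper.
        reg = (x[..., :, 1:] - x[..., :, :-1]).abs().mean() + \
              (x[..., 1:, :] - x[..., :-1, :]).abs().mean()

        # Guidance term: cross-entropy between the annotation and the
        # classification result of the stand-in semantic network.
        guidance = F.cross_entropy(classifier(x), labels)

        loss = w_fid * fidelity + w_reg * reg + w_sem * guidance
        grad = torch.autograd.grad(loss, x)[0]

        # First-order update followed by the denoiser; the real step size and
        # noise schedule would come from the diffusion model.
        return denoiser(x - step * grad).detach()

    # Example call with random placeholder data.
    x0 = guided_sampling_step(torch.randn(1, 1, 64, 64),
                              torch.rand(1, 1, 64, 64),
                              torch.rand(1, 1, 64, 64),
                              torch.randint(0, 5, (1, 64, 64)))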
Finally, experiments validate the feasibility and superiority of the proposed method on the WHU-OPT-SAR dataset and the DDHRNet dataset.
IEEE Transactions on Image Processing, Journal year: 2025, Issue: 34, pp. 1340–1353, Published: Jan. 1, 2025
In this paper, we introduce MaeFuse, a novel autoencoder model designed for Infrared and Visible Image Fusion (IVIF). Existing approaches to image fusion often rely on training combined with downstream tasks to obtain high-level visual information, which is effective in emphasizing target objects and delivers impressive results in visual quality and task-specific applications. Instead of being driven by downstream tasks, our model, called MaeFuse, utilizes a pretrained encoder from Masked Autoencoders (MAE), which facilitates the extraction of omni features for low-level reconstruction and vision-perception-friendly fusion at low cost.
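To illustrate the idea of reusing one frozen, MAE-style encoder for both modalities, a minimal sketch follows; the tiny ViT-like encoder, single-channel inputs, and concatenation-plus-projection fusion layer are placeholders (assumptions), not MaeFuse's actual architecture or pretrained weights.

    import torch
    import torch.nn as nn

    # Placeholder ViT-style encoder standing in for a pretrained MAE encoder
    # (loading real MAE weights is omitted; this is an assumption for illustration).
    class TinyEncoder(nn.Module):
        def __init__(self, patch=16, dim=128):
            super().__init__()
            self.patch_embed = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
            self.blocks = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
                num_layers=2)

        def forward(self, x):                                    # x: (B, 1, H, W)
            tokens = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, N, dim)
            return self.blocks(tokens)

    encoder = TinyEncoder().eval()
    for p in encoder.parameters():          # keep the shared encoder frozen
        p.requires_grad_(False)

    # Hypothetical fusion layer: a learned projection of the concatenated tokens.
    fuse = nn.Linear(2 * 128, 128)

    ir = torch.randn(1, 1, 224, 224)
    vis = torch.randn(1, 1, 224, 224)
    with torch.no_grad():
        f_ir, f_vis = encoder(ir), encoder(vis)   # both modalities in one feature space
    fused_tokens = fuse(torch.cat([f_ir, f_vis], dim=-1))        # (B, N, 128)

Only the small fusion head would be trained in such a setup, which is what keeps the cost low.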
In order to eliminate the domain gap between different modal features and the block effect caused by the MAE encoder, we further develop a guided training strategy. This strategy is meticulously crafted to ensure that the fusion layer seamlessly adjusts to the feature space of the encoder, gradually enhancing the fusion performance. The proposed method can facilitate the comprehensive integration of feature vectors from both the infrared and visible modalities, thus preserving the rich details inherent in each modality. MaeFuse not only introduces a novel perspective in the realm of fusion techniques but also stands out with impressive performance across various public datasets. The code is available at https://github.com/Henry-Lee-real/MaeFuse.
Scientific Reports, Journal year: 2025, Issue: 15(1), Published: March 3, 2025
As an image enhancement technology, multi-modal image fusion primarily aims to retain salient information from multi-source image pairs in a single image, generating imaging that contains complementary features and can facilitate downstream visual tasks. However, dual-stream methods with convolutional neural networks (CNNs) as the backbone predominantly have limited receptive fields, whereas Transformers are time-consuming, and both lack the exploration of cross-domain information.
This study proposes an innovative fusion model designed for multi-modal images, encompassing infrared and visible images as well as medical images. Our model leverages the strengths of CNNs and Transformers to handle various feature types effectively, addressing short- and long-range feature learning as well as the extraction of low- and high-frequency features. First, our shared encoder is constructed based on cross-domain learning, including an intra-modal block, an inter-modal block, and a novel alignment block that handles slight misalignments. A private block for extracting modality-specific features employs a CNN-based architecture, which includes a dual-domain selection mechanism and an invertible network.
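The abstract does not detail the invertible network; as one common way such a block can be built, the sketch below uses an additive coupling layer, a generic invertible design assumed here for illustration.

    import torch
    import torch.nn as nn

    class AdditiveCoupling(nn.Module):
        """Generic invertible block: splits channels, transforms one half
        conditioned on the other, and can be inverted exactly."""
        def __init__(self, channels):
            super().__init__()
            half = channels // 2
            self.net = nn.Sequential(
                nn.Conv2d(half, half, 3, padding=1), nn.ReLU(),
                nn.Conv2d(half, half, 3, padding=1))

        def forward(self, x):
            x1, x2 = x.chunk(2, dim=1)
            return torch.cat([x1, x2 + self.net(x1)], dim=1)

        def inverse(self, y):
            y1, y2 = y.chunk(2, dim=1)
            return torch.cat([y1, y2 - self.net(y1)], dim=1)

    block = AdditiveCoupling(64)
    x = torch.randn(2, 64, 32, 32)
    assert torch.allclose(block.inverse(block(x)), x, atol=1e-5)

Because one half of the channels is transformed using only the other half, the mapping can be undone exactly, so no information is lost while extracting modality-specific features.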
Second, we develop a cross-attention-based Swin Transformer to explore cross-domain information. In particular, we introduce a weight transformation embedded into the Swin Transformer to enhance efficiency.
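A rough illustration of cross-attention between two modality streams (queries from one stream, keys and values from the other) is sketched below with plain multi-head attention; the window partitioning and weight transformation of the Swin-based design are omitted, and the dimensions are placeholders.

    import torch
    import torch.nn as nn

    dim, heads = 128, 4
    cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    # Token sequences from the two modality streams (placeholder shapes).
    tokens_a = torch.randn(1, 196, dim)   # e.g., infrared / modality-1 features
    tokens_b = torch.randn(1, 196, dim)   # e.g., visible / modality-2 features

    # Queries come from modality A, keys/values from modality B, so each
    # position in A attends to cross-domain context in B.
    fused_a, _ = cross_attn(query=tokens_a, key=tokens_b, value=tokens_b)

Swapping the roles of the two streams gives the symmetric update for the other modality.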
Third, a unified loss function incorporating a dynamic weighting factor is formulated to capture the inherent commonalities of the multi-modal images.
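The abstract does not specify the individual loss terms; a minimal sketch of a dynamically weighted fusion loss, assuming an intensity term and a gradient term whose balance is set from the inputs (both the terms and the weighting rule are assumptions), is:

    import torch
    import torch.nn.functional as F

    def sobel_grad(x):
        """Simple image-gradient magnitude used in the gradient term."""
        kx = torch.tensor([[[[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]]])
        ky = kx.transpose(2, 3)
        gx = F.conv2d(x, kx, padding=1)
        gy = F.conv2d(x, ky, padding=1)
        return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

    def fusion_loss(fused, src_a, src_b):
        # Intensity term: stay close to the pixel-wise maximum of the sources.
        l_int = F.l1_loss(fused, torch.maximum(src_a, src_b))
        # Gradient term: keep the strongest edges from either source.
        l_grad = F.l1_loss(sobel_grad(fused),
                           torch.maximum(sobel_grad(src_a), sobel_grad(src_b)))
        # Dynamic weighting factor: emphasize the gradient term more when the
        # sources are texture-rich (higher mean gradient energy).
        w = sobel_grad(src_a).mean() + sobel_grad(src_b).mean()
        w = torch.clamp(w, 0.1, 2.0).detach()
        return l_int + w * l_grad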
A comprehensive qualitative and quantitative analysis, together with object detection experimental results, demonstrates that the proposed method effectively preserves thermal targets and background texture details, surpassing state-of-the-art alternatives in terms of achieving high-quality fusion and improving the performance of subsequent visual tasks.