PLoS ONE, 2025, 20(5), e0322048. Published: May 27, 2025.
Creating a dataset for training supervised machine learning algorithms can be a demanding task. This is especially true for blood vessel segmentation, since one or more specialists are usually required for image annotation, and creating ground truth labels for just a single image can take up to several hours. In addition, it is paramount that the annotated samples represent well the different conditions that might affect the imaged tissues, as well as possible changes in the acquisition process. This can only be achieved by considering typical and atypical, or even outlier, samples. We introduce VessMAP, a highly heterogeneous dataset acquired by carefully sampling relevant images from a large non-annotated dataset containing fluorescence microscopy images. Each image of the dataset contains metadata information regarding the contrast, amount of noise, density, and intensity variability of the vessels. Prototypical and atypical samples were selected from the base dataset using the available metadata information, thus defining an assorted set of images that can be used for measuring the performance of algorithms on samples that are highly distinct from each other. We show that the datasets traditionally used for developing new segmentation methods tend to have low heterogeneity. Thus, neural networks trained on as few as four of their images generalize to all other images of the dataset. On VessMAP, in contrast, the samples selected for training are critical for the generalization capability of the network. For instance, training only with images having good contrast leads to models with poor inference quality. Interestingly, while some training sets lead to Dice scores of 0.59, a careful selection of images results in a score of 0.85. The dataset also enables the development of active learning methods for selecting relevant samples for manual annotation, as well as the analysis of the robustness of models to distribution shifts of the data.
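The Dice scores quoted above compare a predicted binary vessel mask against the ground-truth annotation. A minimal sketch of the metric (the standard definition, not code from the paper):

```python
import numpy as np

def dice_score(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice coefficient between two binary segmentation masks."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0  # both masks empty: perfect agreement by convention
    return 2.0 * intersection / total

# Toy example: the two 4x4 masks agree on 2 of their 3 foreground pixels.
pred = np.zeros((4, 4), dtype=int)
truth = np.zeros((4, 4), dtype=int)
pred[0, 0] = pred[0, 1] = pred[1, 1] = 1
truth[0, 0] = truth[0, 1] = truth[2, 2] = 1
print(dice_score(pred, truth))  # 2*2 / (3+3) ≈ 0.667
```

A score of 1.0 means perfect overlap; the 0.59 vs 0.85 gap reported above is therefore a substantial difference in segmentation quality.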
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, P. 16091-16101. Published: June 1, 2023.
Structural pruning enables model acceleration by removing structurally-grouped parameters from neural networks. However, the parameter-grouping patterns vary widely across different models, making architecture-specific pruners, which rely on manually-designed grouping schemes, non-generalizable to new architectures. In this work, we study a highly-challenging yet barely-explored task, any structural pruning, to tackle general structural pruning of arbitrary architectures such as CNNs, RNNs, GNNs, and Transformers. The most prominent obstacle towards this goal lies in structural coupling, which not only forces coupled layers to be pruned simultaneously, but also expects all removed parameters to be consistently unimportant, thereby avoiding significant performance degradation after pruning. To address this problem, we propose a fully automatic method, Dependency Graph (DepGraph), to explicitly model the dependency between layers and comprehensively group coupled parameters for pruning. We extensively evaluate our method on several architectures and tasks, including ResNe(X)t, DenseNet, MobileNet, and Vision Transformer for images, GAT for graphs, DGCNN for 3D point clouds, alongside LSTM for language, and demonstrate that, even with a simple norm-based criterion, the proposed method yields gratifying performance.
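The layer coupling that DepGraph resolves can be illustrated with a toy dependency graph (a hypothetical simplification, not the paper's implementation): when two layers share a channel dimension, pruning a channel in one forces pruning in the other, so coupled layers form groups that must be removed together.

```python
# Simplified illustration: model channel couplings as an undirected graph
# and collect connected components, each being a group of layers whose
# parameters must be pruned simultaneously.
from collections import defaultdict

def coupled_groups(couplings, layers):
    """couplings: list of (layer_a, layer_b) pairs sharing a channel axis."""
    adj = defaultdict(set)
    for a, b in couplings:
        adj[a].add(b)
        adj[b].add(a)
    seen, groups = set(), []
    for layer in layers:
        if layer in seen:
            continue
        stack, comp = [layer], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            comp.append(node)
            stack.extend(adj[node] - seen)
        groups.append(sorted(comp))
    return groups

layers = ["conv1", "bn1", "conv2", "bn2", "fc"]
# Pruning an output channel of conv1 also removes the matching bn1
# statistics and the corresponding input channel of conv2, and so on.
couplings = [("conv1", "bn1"), ("bn1", "conv2"), ("conv2", "bn2")]
print(coupled_groups(couplings, layers))
# [['bn1', 'bn2', 'conv1', 'conv2'], ['fc']]
```

DepGraph builds such groups automatically from the network structure instead of requiring a hand-written scheme per architecture.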
ACS Catalysis, 2023, 13(21), P. 13863-13895. Published: Oct. 13, 2023.
Recent progress in engineering highly promising biocatalysts has increasingly involved machine learning methods. These methods leverage existing experimental and simulation data to aid in the discovery and annotation of enzymes, as well as in suggesting beneficial mutations for improving known targets. The field of machine learning for protein engineering is gathering steam, driven by recent success stories and notable progress in other areas. It already encompasses ambitious tasks such as understanding and predicting protein structure and function, catalytic efficiency, enantioselectivity, dynamics, stability, solubility, aggregation, and more. Nonetheless, the field is still evolving, with many challenges to overcome and questions to address. In this Perspective, we provide an overview of ongoing trends in the domain, highlight recent case studies, and examine the current limitations of machine learning-based methods. We emphasize the crucial importance of thorough validation of emerging models before their use in rational design. We also present our opinions on the fundamental problems and outline potential directions for future research.
Journal of Information and Intelligence, 2024. Published: Jan. 1, 2024.
The modern approach to artificial intelligence (AI) aims to design algorithms that learn directly from data. This approach has achieved impressive results and has contributed significantly to the progress of AI, particularly in the sphere of supervised deep learning. It has also simplified the design of machine learning systems, as the learning process is highly automated. However, not all data processing tasks in conventional machine learning pipelines have been automated. In most cases, data must be manually collected, preprocessed, and further extended through augmentation before it can be effective for training. Recently, special techniques for automating these tasks have emerged. The automation is driven by the need to utilize large volumes of complex, heterogeneous data in big data applications. Today, end-to-end automated pipelines based on automated machine learning (AutoML) are capable of taking raw data and transforming them into useful features for Big Data tasks, automating all intermediate stages. In this work, we present a thorough review of approaches for automating the stages of such pipelines, including data preprocessing (e.g., cleaning, labeling, missing data imputation, and categorical encoding), data augmentation (including synthetic data generation using generative AI methods), and feature engineering (specifically, feature extraction, construction, and selection). In addition to methods for specific tasks, we discuss the use of AutoML methods and tools to simultaneously optimize all stages of a machine learning pipeline.
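Two of the preprocessing steps listed above, missing-value imputation and categorical encoding, can be sketched in a few lines (a plain-Python illustration, not any specific AutoML tool):

```python
def impute_mean(column):
    """Replace None entries of a numeric column with the column mean."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

def one_hot(column):
    """Encode a categorical column as one-hot vectors (sorted category order)."""
    categories = sorted(set(column))
    return [[1 if v == c else 0 for c in categories] for v in column]

ages = [20, None, 40]
colors = ["red", "blue", "red"]
print(impute_mean(ages))  # [20, 30.0, 40]
print(one_hot(colors))    # [[0, 1], [1, 0], [0, 1]]
```

AutoML systems automate exactly these choices, e.g. selecting mean versus model-based imputation, or one-hot versus target encoding, per column.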
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, P. 3759-3768. Published: June 1, 2023.
Dataset distillation, also known as dataset condensation, aims to compress a large dataset into a compact synthetic one. Existing methods perform condensation by assuming a fixed storage or transmission budget. When the budget changes, however, they have to repeat the synthesizing process with access to the original datasets, which is highly cumbersome if not infeasible at all. In this paper, we explore the problem of slimmable dataset condensation: extracting a smaller synthetic dataset given only previous condensation results. We first study the limitations of existing algorithms in such a successive compression setting and identify two key factors: (1) the inconsistency of the neural networks used over different condensation times and (2) the underdetermined solution space for the synthetic data. Accordingly, we propose a novel training objective that explicitly accounts for both factors. Moreover, the synthetic datasets in our method adopt a significance-aware parameterization. Theoretical derivation indicates that an upper-bounded error can be achieved by discarding the minor components without further training. Alternatively, if training is allowed, this strategy can serve as a strong initialization that enables fast convergence. Extensive comparisons and ablations demonstrate the superiority of the proposed approach on multiple benchmarks.
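The "discard minor components" result can be illustrated with a generic low-rank parameterization (an assumption for illustration; the paper's significance-aware parameterization differs in detail). For an SVD factorization, the reconstruction error from dropping the smallest components is exactly the norm of the discarded spectrum:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((32, 16))          # stand-in for a synthetic dataset
U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 8                                      # slimmed budget: keep top-k components
X_slim = (U[:, :k] * s[:k]) @ Vt[:k]       # no retraining, just truncation

err = np.linalg.norm(X - X_slim)           # Frobenius reconstruction error
bound = np.sqrt((s[k:] ** 2).sum())        # norm of the discarded components
print(err <= bound + 1e-8)                 # True: error equals the discarded mass
```

Ordering components by significance thus lets a single condensation result be truncated to any smaller budget with a known error bound.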
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023. Published: June 1, 2023.
Model-based deep learning has achieved astounding successes due in part to the availability of large-scale real-world data. However, processing such massive amounts of data comes at a considerable cost in terms of computation, storage, training, and the search for good neural architectures. Dataset distillation has thus recently come to the fore. This paradigm involves distilling information from large real-world datasets into tiny and compact synthetic datasets, such that training on the latter ideally yields performance similar to training on the former. State-of-the-art methods primarily rely on learning the synthetic dataset by matching the gradients obtained during training between the real and synthetic data. However, these gradient-matching methods suffer from the so-called accumulated trajectory error caused by the discrepancy between the distillation and subsequent evaluation. To mitigate the adverse impact of this error, we propose a novel approach that encourages the optimization algorithm to seek a flat trajectory. We show that the weights trained on synthetic data are robust against accumulated errors and perturbations when regularized towards the flat trajectory. Our method, called Flat Trajectory Distillation (FTD), is shown to boost the performance of gradient-matching methods by up to 4.7% on a subset of the ImageNet dataset with higher resolution images. We also validate the effectiveness and generalizability of our method with datasets of different resolutions and demonstrate its applicability to neural architecture search. Code is available at https://github.com/AngusDujw/FTD-distillation.
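The flat-trajectory intuition can be illustrated with a one-dimensional toy loss (a generic sketch, unrelated to the paper's actual training procedure): a flat minimum is far less sensitive to the small weight perturbations that accumulated trajectory error introduces than a sharp one.

```python
def loss_flat(w):   # wide parabola: flat minimum
    return 0.1 * w ** 2

def loss_sharp(w):  # narrow parabola: sharp minimum
    return 10.0 * w ** 2

eps = 0.1  # small perturbation standing in for accumulated trajectory error
print(loss_flat(eps) - loss_flat(0.0))    # ≈ 0.001  -> barely affected
print(loss_sharp(eps) - loss_sharp(0.0))  # ≈ 0.1    -> 100x more sensitive
```

Regularizing distillation towards flat regions therefore makes the evaluated weights robust to the drift between the distillation and evaluation trajectories.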
IEEE Transactions on Artificial Intelligence, 2023, 5(5), P. 1973-1989. Published: Sept. 14, 2023.
With the exponential growth of computational power and the availability of large-scale datasets in recent years, remarkable advancements have been made in the field of artificial intelligence (AI), leading to complex models and innovative applications. However, these models consume a significant and unprecedented amount of energy, contributing to greenhouse gas emissions and the growing carbon footprint of the AI industry. In response, the concept of green AI has emerged, prioritizing energy efficiency and sustainability alongside accuracy and related measures. To this end, data-centric approaches are very promising for reducing the energy consumption of AI algorithms. This paper presents a comprehensive overview of such technologies and their impact on energy consumption. Specifically, it focuses on methods that utilize training data in an efficient manner to improve energy efficiency. We identified multiple approaches, such as active learning, knowledge transfer/sharing, dataset distillation, data augmentation, and curriculum learning, that can contribute to the development of environmentally-friendly implementations of machine learning. Finally, practical applications are highlighted, and challenges and future directions are discussed.
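One of the data-centric techniques listed, active learning, can be sketched via uncertainty sampling (a standard illustration, not tied to any system in the survey): only the pool points the current model is least sure about are sent for labeling, cutting annotation and retraining cost.

```python
import numpy as np

def select_most_uncertain(probs: np.ndarray, budget: int) -> np.ndarray:
    """probs: predicted positive-class probabilities for an unlabeled pool.
    Returns indices of the `budget` points closest to the 0.5 decision line."""
    uncertainty = -np.abs(probs - 0.5)   # higher = closer to the boundary
    return np.argsort(uncertainty)[-budget:]

pool_probs = np.array([0.95, 0.52, 0.10, 0.48, 0.85])
chosen = sorted(int(i) for i in select_most_uncertain(pool_probs, 2))
print(chosen)  # [1, 3]: the two near-0.5 predictions get labeled first
```

Labeling only the most informative samples reduces both human effort and the energy spent on retraining over redundant data.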
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023. Published: June 1, 2023.
In this paper, we explore a novel model reusing task tailored for graph neural networks (GNNs), termed "deep graph reprogramming". We strive to reprogram a pretrained GNN, without amending raw node features nor model parameters, to handle a bunch of cross-level downstream tasks in various domains. To this end, we propose an innovative Data Reprogramming paradigm alongside a Model Reprogramming paradigm. The former aims to address the challenge of diversified feature dimensions on the input side, while the latter alleviates the dilemma of fixed per-task-per-model behavior on the model side. For data reprogramming, we specifically devise an elaborated Meta-FeatPadding method to deal with heterogeneous input dimensions, and also develop a transductive Edge-Slimming as well as an inductive Meta-GraPadding approach for diverse homogenous samples. Meanwhile, for model reprogramming, we propose a task-adaptive Reprogrammable-Aggregator to endow the frozen model with larger expressive capacities for handling cross-domain tasks. Experiments on fourteen datasets across node/graph classification/regression, 3D object recognition, and distributed action recognition demonstrate that the proposed methods yield gratifying results, on par with those obtained by re-training from scratch.
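The input-side idea behind Meta-FeatPadding can be illustrated with a toy zero-padding routine (a deliberate simplification: the actual method meta-learns the padded values rather than using zeros). Downstream features of varying width are padded to the input dimension the frozen pretrained GNN expects, so the model itself is never modified:

```python
import numpy as np

def pad_features(x: np.ndarray, target_dim: int) -> np.ndarray:
    """Right-pad feature rows with zeros up to target_dim."""
    n, d = x.shape
    if d > target_dim:
        raise ValueError("downstream features wider than the pretrained input")
    out = np.zeros((n, target_dim), dtype=x.dtype)
    out[:, :d] = x                # original features kept in the leading slots
    return out

pretrained_dim = 8
downstream = np.ones((3, 5))      # 5-dimensional features from a new task
padded = pad_features(downstream, pretrained_dim)
print(padded.shape)  # (3, 8): now matches the frozen GNN's input layer
```

Because only the padding is adapted per task, one pretrained model can serve many downstream datasets with mismatched feature widths.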
International Journal of Production Research, 2025, P. 1-22. Published: Jan. 8, 2025.
Machine learning (ML) has the potential to improve various supply chain management (SCM) tasks, namely demand forecasting, risk management, inventory and production planning and control, network reconstruction, and distribution logistics. However, the industrial application of ML in supply chains faces many challenges, particularly regarding data privacy and scarcity. Synthetic data, which is artificially generated to mimic real-world patterns, has shown promise in overcoming similar challenges in fields such as healthcare and finance, yet its use in the supply chain context remains limited. This publication aims to analyze machine learning operations (MLOps) for SCM tasks and to explain how synthetic data can address these challenges in a supply chain context. Moreover, we identify suitable approaches to generate synthetic data. Based on the analysis, a research agenda is proposed as a guideline for future research activities to enable the use of synthetic data in supply chains.
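As a minimal illustration of synthetic data generation (a simple parametric sketch, far cruder than the generative-model approaches such a survey covers), one can fit a distribution to observed demand and sample artificial records that mimic its pattern without exposing the originals:

```python
import numpy as np

rng = np.random.default_rng(42)
real_demand = np.array([102.0, 98.0, 110.0, 95.0, 105.0])  # toy weekly demand

# Fit a normal distribution to the observed demand and sample from it.
mu, sigma = real_demand.mean(), real_demand.std()
synthetic = rng.normal(mu, sigma, size=1000)

# The synthetic sample reproduces the first two moments of the real data,
# so downstream forecasting models can be prototyped without the raw records.
print(bool(abs(synthetic.mean() - mu) < 2.0))  # True
```

Real supply-chain data has correlations and seasonality a single marginal cannot capture, which is why the field is moving to richer generative approaches.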