IEEE Transactions on Medical Imaging, Journal Year: 2024, Volume and Issue: 43(7), P. 2634 - 2645, Published: March 4, 2024
Quantifying the performance of methods for tracking and mapping tissue in endoscopic environments is essential for enabling image guidance and automation of medical interventions and surgery. Datasets developed so far either use rigid environments, visible markers, or require annotators to label salient points in videos after collection. These are respectively: not general, visible to algorithms, or costly and error-prone. We introduce a novel labeling methodology along with a dataset that uses said methodology, Surgical Tattoos in Infrared (STIR). STIR has labels that are persistent but invisible to visible-spectrum algorithms. This is done by labelling tissue points with an IR-fluorescent dye, indocyanine green (ICG), and then collecting visible-light video clips. STIR comprises hundreds of stereo video clips of both in vivo and ex vivo scenes with start and end points labelled in the IR spectrum. With over 3,000 labelled points, STIR will help quantify and enable better analysis of tracking and mapping methods. After introducing STIR, we analyze multiple different frame-based tracking methods on it using both 3D and 2D endpoint error and accuracy metrics. STIR is available at https://dx.doi.org/10.21227/w8g4-g548.
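For concreteness, the endpoint error and accuracy metrics mentioned above can be sketched as follows. This is a minimal illustration of how such metrics are typically computed, not the STIR evaluation code; the 4-pixel threshold in the usage example is an arbitrary assumption.

```python
import numpy as np

def endpoint_error(predicted, ground_truth):
    """Mean Euclidean distance between predicted and ground-truth
    endpoints; works for 2D (pixels) or 3D (e.g. millimetres) points.

    predicted, ground_truth: arrays of shape (N, 2) or (N, 3).
    """
    return np.linalg.norm(predicted - ground_truth, axis=1).mean()

def accuracy_within(predicted, ground_truth, threshold):
    """Fraction of tracked points whose endpoint error falls below
    a distance threshold (the threshold value is task-dependent)."""
    errors = np.linalg.norm(predicted - ground_truth, axis=1)
    return (errors < threshold).mean()

# Toy usage: two tracked 2D endpoints in pixels.
pred = np.array([[100.0, 50.0], [200.0, 80.0]])
gt = np.array([[103.0, 54.0], [198.0, 79.0]])
print(endpoint_error(pred, gt))          # mean pixel error
print(accuracy_within(pred, gt, 4.0))    # fraction within 4 px
```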
Nature Biomedical Engineering, Journal Year: 2023, Volume and Issue: 7(6), P. 780 - 796, Published: March 30, 2023
Abstract
The intraoperative activity of a surgeon has substantial impact on postoperative outcomes. However, for most surgical procedures, the details of intraoperative surgical actions, which can vary widely, are not well understood. Here we report a machine learning system leveraging a vision transformer and supervised contrastive learning for the decoding of elements of intraoperative surgical activity from videos commonly collected during robotic surgeries. The system accurately identified surgical steps, actions performed by the surgeon, the quality of these actions, and the relative contribution of individual video frames to the decoding of actions. Through extensive testing on data from three different hospitals located in two continents, we show that the system generalizes across videos, surgeons, hospitals and surgical procedures, and that it can provide information on surgical gestures and skills from unannotated videos. Decoding intraoperative activity via accurate machine learning systems could be used to provide surgeons with feedback on their operating skills, and may allow for the identification of optimal surgical behaviour and the study of relationships between intraoperative factors and postoperative outcomes.
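The supervised contrastive element referenced above can be illustrated with a generic supervised contrastive loss in the style of Khosla et al.; this is a minimal sketch of the general technique, not the paper's implementation, and the temperature value is an assumption.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss: embeddings sharing a label
    are pulled together, all others pushed apart.

    embeddings: (N, D) float tensor; labels: (N,) integer tensor.
    """
    z = F.normalize(embeddings, dim=1)          # unit-norm features
    sim = z @ z.T / temperature                 # pairwise similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, -1e9)      # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    # Average log-probability over each anchor's positive pairs.
    loss = -(log_prob * pos_mask).sum(1) / pos_mask.sum(1).clamp(min=1)
    return loss.mean()

# Toy usage: 8 embeddings drawn from 3 classes.
z = torch.randn(8, 32)
y = torch.tensor([0, 0, 1, 1, 2, 2, 0, 1])
print(supervised_contrastive_loss(z, y).item())
```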
IEEE Access, Journal Year: 2022, Volume and Issue: 10, P. 122627 - 122657, Published: Jan. 1, 2022
The recent advancements in the surging field of Deep Learning (DL) have revolutionized every sphere of life, and the healthcare domain is no exception. The enormous success of DL models, particularly with image data, has led to the development of image-guided Robot Assisted Surgery (RAS) systems. By and large, the number of studies concerning image-driven computer-assisted surgical systems using DL has increased exponentially. Additionally, the contemporary availability of datasets has also boosted DL applications in RAS. Inspired by the latest trends and contributions in robotic surgery, this literature survey presents a summarized analysis of DL innovations in RAS. After a thorough review, a sum of 184 articles are selected and grouped into four categories, based on the relevancy of the task addressed in the articles, comprising 1) Surgical Tools, 2) Surgical Processes, 3) Surgical Surveillance, and 4) Surgical Performance. The survey also discusses the publicly available datasets and highlights the basics of DL models. Furthermore, the legal, ethical, and technological challenges, together with intuitive predictions and recommendations related to autonomous RAS, are presented. The study reveals that the Convolutional Neural Network (CNN) is the most widely adopted architecture, whereas JIGSAWS is the most employed dataset. The literature also suggests fusing kinematic data along with image data, which produces better accuracy and precision, particularly for gesture recognition and trajectory segmentation tasks. CNN and Long Short Term Memory (LSTM) networks have shown remarkable performance; however, the authors recommend employing these gigantic architectures only when simpler models have failed to produce satisfactory results, since simpler models, despite their limitations, are time- and cost-effective and yield considerable outcomes even on smaller datasets.
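The survey's finding that fusing kinematic data with image data improves gesture recognition and trajectory segmentation can be made concrete with a toy two-stream CNN + LSTM sketch. All layer sizes are illustrative assumptions; the 76-dimensional kinematics input merely mirrors the JIGSAWS kinematic feature count.

```python
import torch
import torch.nn as nn

class VideoKinematicsFusion(nn.Module):
    """Toy two-stream model: a small CNN encodes video frames, an LSTM
    encodes the kinematics sequence, and the two feature vectors are
    concatenated for gesture classification."""

    def __init__(self, n_gestures=10, kin_dim=76):
        super().__init__()
        self.cnn = nn.Sequential(                    # per-frame encoder
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),   # -> (B*T, 32)
        )
        self.lstm = nn.LSTM(kin_dim, 64, batch_first=True)
        self.head = nn.Linear(32 + 64, n_gestures)

    def forward(self, frames, kinematics):
        # frames: (B, T, 3, H, W); kinematics: (B, T, kin_dim)
        b, t = frames.shape[:2]
        f = self.cnn(frames.flatten(0, 1)).view(b, t, -1).mean(1)
        _, (h, _) = self.lstm(kinematics)            # h: (1, B, 64)
        return self.head(torch.cat([f, h[-1]], dim=1))

# Toy usage: 2 clips, 8 frames each, 64x64 RGB plus kinematics.
model = VideoKinematicsFusion()
logits = model(torch.randn(2, 8, 3, 64, 64), torch.randn(2, 8, 76))
print(logits.shape)  # torch.Size([2, 10])
```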
Artificial Intelligence Review, Journal Year: 2024, Volume and Issue: 57(7), Published: June 19, 2024
Abstract
Vision-based Human Action Recognition (HAR) is a hot topic in computer vision. Recently, deep-based HAR has shown promising results. HAR using a single data modality is a common approach; however, the fusion of different data sources essentially conveys complementary information and improves the results. This paper comprehensively reviews deep-based HAR methods using multiple visual data modalities. The main contribution of this paper is categorizing existing methods into four levels, which provides an in-depth and comparable analysis of approaches in various aspects. At the first level, the proposed methods are categorized based on the employed modalities. At the second level, the methods are classified based on the employment of complete modalities or working with missing modalities at test time. At the third level, the methods are classified based on the fusion branches of the approaches. Finally, similar frameworks in each category are grouped together. In addition, a comprehensive comparison is provided for publicly available benchmark datasets, which helps to compare and choose suitable datasets for a given task or to develop new datasets. This paper also compares the performance of state-of-the-art methods on the benchmark datasets. The review concludes by highlighting several future research directions.
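As a concrete illustration of why fused modalities can convey complementary information, a minimal decision-level (late) fusion sketch is shown below; the modality names, scores, and weights are illustrative assumptions, not drawn from the review.

```python
import numpy as np

def late_fusion(scores_rgb, scores_depth, weights=(0.5, 0.5)):
    """Late (decision-level) fusion: weighted average of per-class
    scores from two modality-specific classifiers.

    Each input is an (N, n_classes) array of class scores."""
    fused = weights[0] * scores_rgb + weights[1] * scores_depth
    return fused.argmax(axis=1)

# Toy example: the modalities disagree on sample 0; the more
# confident depth classifier tips the fused decision.
rgb = np.array([[0.6, 0.4], [0.2, 0.8]])
depth = np.array([[0.9, 0.1], [0.4, 0.6]])
print(late_fusion(rgb, depth))  # [0 1]
```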
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Journal Year: 2023, Volume and Issue: unknown, P. 2384 - 2393, Published: June 1, 2023
Most state-of-the-art methods for action segmentation are based on single input modalities or naïve fusion of multiple data sources. However, effective fusion of complementary information can potentially strengthen segmentation models and make them more robust to sensor noise and more accurate with smaller training datasets. In order to improve multimodal representation learning for action segmentation, we propose to disentangle hidden features of a multi-stream segmentation model into modality-shared components, containing common information across data sources, and modality-private components; we then use an attention bottleneck to capture long-range temporal dependencies in the data while preserving disentanglement in consecutive processing layers. Evaluation on the 50salads, Breakfast, and RARP45 datasets shows that our approach outperforms different data fusion baselines on both multiview and multimodal data sources, obtaining competitive or better results compared with the state-of-the-art. Our model is also robust to additive sensor noise and can achieve performance on par with strong video baselines even with less training data.
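The modality-shared/modality-private split described above can be sketched minimally as follows. This is an illustrative toy encoder with generic alignment and orthogonality penalties under assumed dimensions, not the authors' architecture or losses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedPrivateEncoder(nn.Module):
    """Toy disentanglement: each modality's features are split into a
    modality-shared part and a modality-private part."""

    def __init__(self, in_dim=128, dim=64):
        super().__init__()
        self.shared = nn.Linear(in_dim, dim)    # tied shared projection
        self.private_a = nn.Linear(in_dim, dim)
        self.private_b = nn.Linear(in_dim, dim)

    def forward(self, feat_a, feat_b):
        s_a, s_b = self.shared(feat_a), self.shared(feat_b)
        p_a, p_b = self.private_a(feat_a), self.private_b(feat_b)
        # Shared parts should agree across modalities...
        align = F.mse_loss(s_a, s_b)
        # ...while private parts stay decorrelated from shared ones.
        ortho = ((s_a * p_a).sum(1).pow(2).mean()
                 + (s_b * p_b).sum(1).pow(2).mean())
        fused = torch.cat([s_a + s_b, p_a, p_b], dim=1)
        return fused, align + ortho

# Toy usage: two 128-d modality features for a batch of 4.
enc = SharedPrivateEncoder()
fused, reg = enc(torch.randn(4, 128), torch.randn(4, 128))
print(fused.shape, reg.item())  # torch.Size([4, 192]) <scalar>
```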
npj Digital Medicine, Journal Year: 2025, Volume and Issue: 8(1), Published: Jan. 14, 2025
Abstract
This systematic review explores machine learning (ML) applications in surgical motion analysis using non-optical motion tracking systems (NOMTS), alone or with optical methods. It investigates objectives, experimental designs, model effectiveness, and future research directions. From 3632 records, 84 studies were included, with Artificial Neural Networks (38%) and Support Vector Machines (11%) being the most common ML models. Skill assessment was the primary objective (38%). The NOMTS used included internal device kinematics (56%), electromagnetic (17%), inertial (15%), mechanical (11%), and electromyography (1%) sensors. Surgical settings included robotic (60%), laparoscopic (18%), open (16%), and other (6%) procedures. Procedures focused on bench-top tasks (67%), clinical models, simulations (9%), and non-clinical tasks (7%). Over 90% accuracy was achieved in 36% of the studies. The literature shows that ML can enhance surgical precision, assessment, and training. Future research should advance ML in clinical environments, ensure interpretability and reproducibility, and use larger datasets for accurate evaluation.
Frontiers in Neurorobotics, Journal Year: 2025, Volume and Issue: 18, Published: Jan. 20, 2025
Introduction
Accurate recognition of martial arts leg poses is essential for applications in sports analytics, rehabilitation, and human-computer interaction. Traditional pose recognition models, relying on sequential or convolutional approaches, often struggle to capture the complex spatial-temporal dependencies inherent in martial arts movements. These methods lack the ability to effectively model the nuanced dynamics of joint interactions and temporal progression, leading to limited generalization when recognizing such actions.
Methods
To address these challenges, we propose PoseGCN, a Graph Convolutional Network (GCN)-based model that integrates spatial, temporal, and contextual features through a novel framework. PoseGCN leverages graph encoding of motion dynamics, an action-specific attention mechanism to assign importance to relevant joints depending on the action context, and a self-supervised pretext task to enhance robustness and continuity. Experimental results on four benchmark datasets (Kinetics-700, Human3.6M, NTU RGB+D, and UTD-MHAD) demonstrate that PoseGCN outperforms existing models, achieving state-of-the-art accuracy and F1 scores.
Results and discussion
The findings highlight the model's capacity to generalize across diverse datasets and to capture fine-grained pose details, showcasing its potential for advancing action recognition tasks. The proposed framework offers a robust solution for precise action recognition and paves the way for future developments in multi-modal pose analysis.
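The graph-convolutional backbone that models like PoseGCN build on can be illustrated with a minimal spatial graph convolution over a skeleton adjacency matrix. The three-joint chain and all dimensions below are illustrative assumptions, not PoseGCN's actual design.

```python
import torch
import torch.nn as nn

class SkeletonGCNLayer(nn.Module):
    """Minimal spatial graph convolution over skeleton joints: features
    are mixed along the joint adjacency, then linearly transformed.

    adjacency: a normalized (V, V) matrix over V joints."""

    def __init__(self, in_dim, out_dim, adjacency):
        super().__init__()
        self.register_buffer("A", adjacency)
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x):
        # x: (B, T, V, C) -- batch, time, joints, channels
        x = torch.einsum("uv,btvc->btuc", self.A, x)  # aggregate neighbours
        return torch.relu(self.proj(x))

# Toy 3-joint chain (hip-knee-ankle) with self-loops, row-normalized.
A = torch.tensor([[1., 1., 0.], [1., 1., 1.], [0., 1., 1.]])
A = A / A.sum(1, keepdim=True)
layer = SkeletonGCNLayer(3, 16, A)
out = layer(torch.randn(2, 10, 3, 3))  # 2 clips, 10 frames, 3 joints, xyz
print(out.shape)  # torch.Size([2, 10, 3, 16])
```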
UroPrecision, Journal Year: 2025, Volume and Issue: unknown, Published: Feb. 18, 2025
Abstract
As surgical training shifts from a traditional method to a more standardized approach, objective analysis and assessment of surgeon performance has become a key focus. Surgical gestures, defined as the smallest independent units of instrument-tissue interaction, offer a quantifiable way to analyze surgical performance. Standardizing the terminology for describing surgical gestures can enhance communication during training and in the operating room. More importantly, gesture usage has been linked to surgeon expertise and shown to be associated with patient outcomes. This review examines current gesture classification systems for dissection and suturing tasks, across open, laparoscopic, and robotic procedures, which serve as an armamentarium for surgeons. It also explores how gestures can complement conventional assessment tools. Finally, it reviews artificial intelligence models for gesture recognition and automation, and envisions a future where gesture analysis forms the foundation of surgical assistance.