medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown. Published: Oct. 10, 2024
Abstract
Background
Echocardiograms provide vital insights into cardiac health, but their complex, multi-dimensional data presents challenges for analysis and interpretation. Current deep learning models for echocardiogram analysis often rely on supervised training, limiting their generalizability and robustness across datasets and clinical environments.
Objective
To develop and evaluate EchoVisionFM (Echocardiogram video Vision Foundation Model), a self-supervised framework designed to pre-train a video encoder on large-scale, unlabeled echocardiogram data. EchoVisionFM aims to produce robust and transferrable spatiotemporal representations, improving downstream performance across diverse clinical conditions.
Methods
Our framework employs Echo-VideoMAE, an autoencoder-based video transformer that compresses and reconstructs echocardiogram videos by masking non-overlapping video patches and leveraging a ViT encoder-decoder structure. For enhanced representation, we introduce STFF-Net, a SpatioTemporal Feature Fusion Network, to integrate spatial and temporal features from the manifold representations. We pre-trained the model using the MIMIC-IV-ECHO dataset and fine-tuned it on the EchoNet-Dynamic dataset for downstream tasks, including classification and regression of key clinical parameters.
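
As a rough illustration of the masked-autoencoding idea described above, the sketch below reconstructs randomly masked spatiotemporal patches ("tubes") with a small transformer encoder-decoder. All dimensions, layer sizes, and the masking ratio are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVideoMAE(nn.Module):
    """Masked autoencoding over flattened spatiotemporal patches ('tubes')."""

    # assumed geometry: 16-frame 224x224 clips, 2x16x16x3 tubes -> 1568 tubes of 1536 values
    def __init__(self, num_patches=1568, patch_dim=1536, embed_dim=192, mask_ratio=0.9):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.embed = nn.Linear(patch_dim, embed_dim)
        self.pos = nn.Parameter(torch.zeros(1, num_patches, embed_dim))
        self.mask_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(embed_dim, nhead=4, batch_first=True), num_layers=2)
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(embed_dim, nhead=4, batch_first=True), num_layers=1)
        self.head = nn.Linear(embed_dim, patch_dim)  # token -> raw pixel values

    def forward(self, patches):
        # patches: (batch, num_patches, patch_dim), one row per spatiotemporal tube
        b, n, _ = patches.shape
        n_keep = max(1, int(n * (1 - self.mask_ratio)))
        perm = torch.rand(b, n, device=patches.device).argsort(dim=1)
        keep, drop = perm[:, :n_keep], perm[:, n_keep:]

        def take(x, idx):  # gather rows of x at idx along dim 1
            return torch.gather(x, 1, idx.unsqueeze(-1).expand(-1, -1, x.size(-1)))

        pos = self.pos.expand(b, -1, -1)
        # encode only the visible tubes, each tagged with its positional embedding
        latent = self.encoder(self.embed(take(patches, keep)) + take(pos, keep))
        # decoder sees encoded visible tokens plus learned mask tokens at masked positions
        mask_tok = self.mask_token.expand(b, drop.size(1), -1) + take(pos, drop)
        decoded = self.decoder(torch.cat([latent, mask_tok], dim=1))
        pred = self.head(decoded[:, n_keep:])          # predictions at masked slots
        return F.mse_loss(pred, take(patches, drop))   # reconstruct only what was hidden

# toy usage: loss = TinyVideoMAE()(torch.randn(2, 1568, 1536))
```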
Results
EchoVisionFM demonstrated superior performance in classifying left ventricular ejection fraction (LVEF), achieving an accuracy of 89.12%, an F1 score of 0.9323, and an AUC of 0.9364. In regression tasks, it outperformed state-of-the-art models, with LVEF prediction reaching a mean absolute error (MAE) of 4.18% and an R² of 0.8022. The model also showed significant improvements in estimating end-systolic and end-diastolic volumes, with R² values of 0.8006 and 0.7296, respectively. Incorporating STFF-Net led to further performance gains across tasks.
Conclusion
Our results indicate that large-scale self-supervised pre-training on echocardiogram videos enables the extraction of transferable, clinically relevant features, outperforming traditional CNN-based methods. Our framework, particularly with STFF-Net, enhances predictive performance across various tasks and offers a powerful, scalable approach to echocardiogram analysis, with potential applications in clinical diagnostics and research.
Sensors, Journal Year: 2025, Volume and Issue: 25(4), P. 1048 - 1048. Published: Feb. 10, 2025
Traditional approaches for human monitoring and motion recognition often rely on wearable sensors, which, while effective, are obtrusive and cause significant discomfort to workers. More recent approaches have employed unobtrusive, real-time sensing using cameras mounted in the manufacturing environment. While these methods generate large volumes of rich data, they require extensive labeling and analysis for machine learning applications. Additionally, they frequently capture irrelevant environmental information, which can hinder the performance of deep learning algorithms. To address these limitations, this paper introduces a novel framework that leverages a contrastive learning approach to learn representations from raw images without the need for manual labeling. The framework mitigates the effect of environmental complexity by focusing on critical joint coordinates relevant to the tasks, ensuring that the model learns directly from human-specific features and effectively reducing the impact of the surrounding environment. A custom dataset of subjects simulating various tasks in a workplace setting is used for training and evaluation. By fine-tuning the learned representations on a downstream classification task, we achieve up to 90% accuracy, demonstrating the effectiveness of our proposed solution for human monitoring.
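
The contrastive objective described above can be sketched with a standard SimCLR-style NT-Xent loss, shown below. The embedding dimension, temperature, and the idea of feeding joint-centred crops are assumptions for illustration, not the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style contrastive loss between two augmented views of a batch.

    z1, z2: (batch, dim) embeddings of the same inputs under two augmentations.
    """
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2B, D), unit norm
    logits = z @ z.t() / temperature                     # pairwise cosine similarities
    mask = torch.eye(logits.size(0), dtype=torch.bool, device=z.device)
    logits = logits.masked_fill(mask, float('-inf'))     # exclude self-pairs
    n = z1.size(0)
    # the positive for row i is its other view, located at position (i + n) mod 2n
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(logits, targets)

# hypothetical usage: 128-d embeddings of joint-centred crops under two augmentations
loss = nt_xent_loss(torch.randn(8, 128), torch.randn(8, 128))
```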
Computer-Aided Civil and Infrastructure Engineering, Journal Year: 2024, Volume and Issue: 39(13), P. 2028 - 2053. Published: Jan. 29, 2024
Abstract
Single-stage activity recognition methods have been gaining popularity within the construction domain. However, their low per-frame accuracy necessitates additional post-processing to link detections, limiting the real-time monitoring capability that is an indispensable component of emerging digital twins. This study proposes knowledge DIstillation of temporal Gradient data for Entity Recognition (DIGER), built upon the you only watch once (YOWO) method and improving its activity recognition and localization performance. Activity recognition was improved by designing an auxiliary backbone to exploit the complementary information in temporal gradient data (transferred into YOWO using knowledge distillation), while localization was improved primarily through the integration of the complete intersection over union loss. DIGER achieved a 93.6% mean average precision at 50% intersection over union and 79.8% on a large custom dataset, outperforming the state-of-the-art without requiring additional computation during inference, making it highly effective for monitoring construction site activities.
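
For reference, the complete intersection over union (CIoU) loss that DIGER integrates penalizes, beyond the IoU overlap term, the normalized distance between box centres and the inconsistency of aspect ratios. Below is a generic re-implementation sketch; the box format and mean reduction are assumptions, not DIGER's exact code.

```python
import math
import torch

def ciou_loss(pred, target, eps=1e-7):
    """Complete IoU (CIoU) loss for boxes in (x1, y1, x2, y2) format, shape (N, 4)."""
    # overlap area
    iw = (torch.min(pred[:, 2], target[:, 2]) - torch.max(pred[:, 0], target[:, 0])).clamp(min=0)
    ih = (torch.min(pred[:, 3], target[:, 3]) - torch.max(pred[:, 1], target[:, 1])).clamp(min=0)
    inter = iw * ih
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # squared centre distance over squared diagonal of the smallest enclosing box
    rho2 = ((pred[:, 0] + pred[:, 2]) - (target[:, 0] + target[:, 2])) ** 2 / 4 \
         + ((pred[:, 1] + pred[:, 3]) - (target[:, 1] + target[:, 3])) ** 2 / 4
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps

    # aspect-ratio consistency term
    w_p = (pred[:, 2] - pred[:, 0]).clamp(min=eps)
    h_p = (pred[:, 3] - pred[:, 1]).clamp(min=eps)
    w_t = (target[:, 2] - target[:, 0]).clamp(min=eps)
    h_t = (target[:, 3] - target[:, 1]).clamp(min=eps)
    v = (4 / math.pi ** 2) * (torch.atan(w_t / h_t) - torch.atan(w_p / h_p)) ** 2
    alpha = v / (1 - iou + v + eps)

    return (1 - iou + rho2 / c2 + alpha * v).mean()

# toy usage with two predicted / ground-truth box pairs
loss = ciou_loss(torch.tensor([[0., 0., 10., 10.], [5., 5., 20., 20.]]),
                 torch.tensor([[1., 1., 11., 11.], [5., 5., 18., 22.]]))
```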
Safety inspections on construction sites are critical for accident prevention. Traditionally, inspectors document violations using photos and textual descriptions, but this process is time-consuming and inconsistent. Studies have sought to enhance efficiency with standardized forms and image captioning techniques. However, streamlining the compiling of reports effectively remains challenging. We propose an image-language model that automatically generates safety observations through CLIP fine-tuning and prefix captioning tailored to construction safety. In addition, an attention map of the predicted captions is generated to obtain the reasoning between the violation in images and text. The model can successfully classify nine types of violations at an average accuracy of 73.7% and outperforms the baseline caption model by 41.8%. The proposed framework is integrated into a mobile phone application for inspection in real-world scenarios, which supports documenting violations and generating reports effectively.
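
The prefix-based captioning this abstract builds on can be sketched as a small mapping network that turns a frozen CLIP image embedding into a short "prefix" of language-model input embeddings. The dimensions and mapping architecture below are illustrative assumptions, not the authors' model.

```python
import torch
import torch.nn as nn

class PrefixMapper(nn.Module):
    """Maps a frozen CLIP image embedding to a short sequence of language-model
    input embeddings (a 'prefix'), in the style of ClipCap-like captioners.
    Sizes are illustrative assumptions (CLIP dim 512, GPT-2-like dim 768)."""

    def __init__(self, clip_dim=512, lm_dim=768, prefix_len=10):
        super().__init__()
        self.prefix_len, self.lm_dim = prefix_len, lm_dim
        self.net = nn.Sequential(
            nn.Linear(clip_dim, lm_dim * prefix_len // 2),
            nn.Tanh(),
            nn.Linear(lm_dim * prefix_len // 2, lm_dim * prefix_len),
        )

    def forward(self, clip_embed):        # clip_embed: (batch, clip_dim)
        prefix = self.net(clip_embed)     # (batch, prefix_len * lm_dim)
        return prefix.view(-1, self.prefix_len, self.lm_dim)

# Hypothetical training step: prepend the prefix to caption token embeddings and
# feed the result to a language model via its `inputs_embeds` argument.
mapper = PrefixMapper()
prefix = mapper(torch.randn(4, 512))          # stands in for CLIP image features
caption_embeds = torch.randn(4, 20, 768)      # stands in for token embeddings
lm_input = torch.cat([prefix, caption_embeds], dim=1)
```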
Journal of Information Technology in Construction, Journal Year: 2023, Volume and Issue: 28, P. 458 - 481. Published: Aug. 25, 2023
Recognising activities of construction equipment is essential for monitoring productivity, progress, safety, and environmental impacts. While there have been many studies on activity recognition of earth excavation and moving equipment, activity identification for Automated Construction Systems (ACS) has rarely been attempted. This is especially true for low-rise ACS, which offers energy-efficient, cost-effective solutions for urgent housing needs and provides more affordable living options for a broader population. Deep learning methods have gained a lot of attention because of their ability to perform classification without manually extracting relevant features. This study evaluates the feasibility of deep sequence models for developing an activity recognition framework for automated construction equipment. Time series acceleration data was collected from the structure to identify the major operation classes of the ACS. Long Short-Term Memory Networks (LSTM) were applied for identifying activities, and their performance was compared with traditional machine learning classifiers. Diverse data augmentation techniques were adopted for generating training datasets. Several recently published studies in the literature seem to establish the superiority of complex deep learning techniques over traditional algorithms regardless of the application context. However, the results of this study show that all conventional classifiers performed equivalently or better than the deep sequence models in this application. The results were affected by the lack of diversity in the initial dataset: if the augmented dataset significantly alters the characteristics of the original dataset, it may not deliver good results.
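
A minimal sketch of the kind of LSTM classifier and time-series augmentation the study evaluates is shown below. The window length, layer sizes, class count, and the jitter augmentation are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

class AccelLSTM(nn.Module):
    """LSTM classifier for fixed-length windows of tri-axial acceleration data."""

    def __init__(self, in_dim=3, hidden=64, num_classes=5):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, num_layers=2, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, x):                 # x: (batch, time, 3) accelerometer window
        out, _ = self.lstm(x)
        return self.fc(out[:, -1])        # classify from the final hidden state

def jitter(x, sigma=0.05):
    """Simple time-series augmentation: add Gaussian noise to each sample."""
    return x + sigma * torch.randn_like(x)

# toy usage: classify augmented 128-sample windows into operation classes
model = AccelLSTM()
logits = model(jitter(torch.randn(8, 128, 3)))
```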