EchoFM: A Pre-training and Fine-tuning Framework for Echocardiogram Videos Vision Foundation Model
Ziyang Zhang, Qinxin Wu, Sirui Ding et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Oct. 10, 2024

Abstract Background Echocardiograms provide vital insights into cardiac health, but their complex, multi-dimensional data present challenges for analysis and interpretation. Current deep learning models for echocardiograms often rely on supervised training, limiting their generalizability and robustness across datasets and clinical environments. Objective To develop and evaluate EchoVisionFM (Echocardiogram video Vision Foundation Model), a self-supervised framework designed to pre-train a video encoder on large-scale, unlabeled data. EchoVisionFM aims to produce robust and transferrable spatiotemporal representations, improving downstream performance under diverse conditions. Methods Our framework employs Echo-VideoMAE, an autoencoder-based video transformer that compresses and reconstructs echocardiogram videos by masking non-overlapping patches and leveraging a ViT encoder-decoder structure. For enhanced representation, we introduce STFF-Net, a SpatioTemporal Feature Fusion Network, to integrate spatial and temporal features from the manifold representations. We pre-trained the model using the MIMIC-IV-ECHO dataset and fine-tuned it on EchoNet-Dynamic for downstream tasks, including classification and regression of key cardiac parameters. Results EchoVisionFM demonstrated superior performance in classifying left ventricular ejection fraction (LVEF), achieving an accuracy of 89.12%, an F1 score of 0.9323, and an AUC of 0.9364. In regression tasks, it outperformed state-of-the-art models, with LVEF prediction reaching a mean absolute error (MAE) of 4.18% and an R2 of 0.8022. The model also showed significant improvements in estimating end-systolic and end-diastolic volumes, with R2 values of 0.8006 and 0.7296, respectively. Incorporating STFF-Net led to further gains across tasks. Conclusion These results indicate that large-scale self-supervised pre-training on echocardiogram videos enables the extraction of transferable, clinically relevant features, outperforming traditional CNN-based methods. The framework, particularly with STFF-Net, enhances predictive performance on various downstream tasks and offers a powerful, scalable approach to echocardiogram analysis, with potential applications in clinical diagnostics and research.
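The core of VideoMAE-style pre-training described above is randomly hiding most spatiotemporal patches and asking the decoder to reconstruct them. The partitioning step can be sketched as below; this is a minimal illustration, not the paper's code, and the 90% mask ratio and 8x14x14 patch grid are assumed typical VideoMAE hyperparameters rather than values stated in the abstract.

```python
import random

def split_visible_masked(num_patches: int, mask_ratio: float, seed: int = 0):
    """Randomly partition patch indices into visible and masked sets,
    as in masked-autoencoder (VideoMAE-style) pre-training."""
    rng = random.Random(seed)
    idx = list(range(num_patches))
    rng.shuffle(idx)
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(idx[:num_masked])
    visible = sorted(idx[num_masked:])
    return visible, masked

# A 16-frame clip tokenized into an 8x14x14 grid = 1568 tube patches;
# with a 90% mask ratio only 157 patches reach the ViT encoder.
visible, masked = split_visible_masked(1568, 0.9)
print(len(visible), len(masked))
```

Because the encoder processes only the small visible subset, pre-training on large unlabeled archives such as MIMIC-IV-ECHO stays computationally tractable.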

Language: English

Construction safety inspection with contrastive language-image pre-training (CLIP) image captioning and attention
Wei-Lun Tsai, Phu Dung Le, W.P. Ho et al.

Automation in Construction, Journal Year: 2024, Volume and Issue: 169, P. 105863 - 105863

Published: Nov. 22, 2024

Language: English

Citations: 3

Classification and Application of Deep Learning in Construction Engineering and Management – A Systematic Literature Review and Future Innovations

Qingze Li, Yang Yang, Gang Yao et al.

Case Studies in Construction Materials, Journal Year: 2024, Volume and Issue: unknown, P. e04051 - e04051

Published: Nov. 1, 2024

Language: English

Citations: 3

Transformer-based deep learning model and video dataset for installation action recognition in offsite projects
Jun Young Jang, Eunbeen Jeong, Tae Wan Kim et al.

Automation in Construction, Journal Year: 2025, Volume and Issue: 172, P. 106042 - 106042

Published: Feb. 6, 2025

Language: English

Citations: 0

CLUMM: Contrastive Learning for Unobtrusive Motion Monitoring
Pius Gyamenah, Hari Iyer, Heejin Jeong et al.

Sensors, Journal Year: 2025, Volume and Issue: 25(4), P. 1048 - 1048

Published: Feb. 10, 2025

Traditional approaches for human monitoring and motion recognition often rely on wearable sensors, which, while effective, are obtrusive and cause significant discomfort to workers. More recent approaches have employed unobtrusive, real-time sensing using cameras mounted in the manufacturing environment. While these methods generate large volumes of rich data, they require extensive labeling and analysis for machine learning applications. Additionally, they frequently capture irrelevant environmental information, which can hinder the performance of deep learning algorithms. To address these limitations, this paper introduces a novel framework that leverages a contrastive learning approach to learn representations from raw images without the need for manual labeling. The framework mitigates the effect of environmental complexity by focusing on critical joint coordinates relevant to the monitored tasks, ensuring the model learns directly from human-specific features and effectively reducing the impact of the surrounding environment. A custom dataset of subjects simulating various tasks in a workplace setting is used for training and evaluation. By fine-tuning the learned representations on a downstream classification task, we achieve up to 90% accuracy, demonstrating the effectiveness of our proposed solution for motion monitoring.
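Contrastive representation learning of the kind this abstract describes typically optimizes an InfoNCE-style objective: two views of the same subject form a positive pair pulled together, while other samples are pushed apart. A minimal pure-Python sketch of that loss follows; the function names, the 2-D toy vectors, and the temperature value are illustrative, not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE loss for one anchor: low when the positive view is most
    similar to the anchor, high when a negative is more similar."""
    logits = [cosine(anchor, positive) / tau]
    logits += [cosine(anchor, n) / tau for n in negatives]
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    return -math.log(exps[0] / sum(exps))

# Matching views give a near-zero loss; a mismatched positive is penalized.
print(info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]]))
print(info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]]))
```

Minimizing this loss over many image pairs yields embeddings that can then be fine-tuned for the downstream activity-classification task the abstract reports.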

Language: English

Citations: 0

Improving single-stage activity recognition of excavators using knowledge distillation of temporal gradient data
Ali Ghelmani, Amin Hammad

Computer-Aided Civil and Infrastructure Engineering, Journal Year: 2024, Volume and Issue: 39(13), P. 2028 - 2053

Published: Jan. 29, 2024

Abstract Single-stage activity recognition methods have been gaining popularity within the construction domain. However, their low per-frame accuracy necessitates additional post-processing to link detections, thereby limiting real-time monitoring, which is an indispensable component of emerging digital twins. This study proposes knowledge DIstillation of temporal Gradient data for Entity Recognition (DIGER), built upon the you only watch once (YOWO) method and improving its recognition and localization performance. Activity recognition is improved by designing an auxiliary backbone to exploit the complementary information in temporal gradient data (transferred into YOWO using knowledge distillation), while localization is improved primarily through the integration of the complete intersection over union loss. DIGER achieved a 93.6% mean average precision at 50% overlap and a 79.8% per-frame accuracy on a large custom dataset, outperforming state-of-the-art methods without requiring temporal gradient computation during inference, making it highly effective for real-time monitoring of site activities.
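The distillation step described above transfers the auxiliary (temporal-gradient) backbone's knowledge into the main network by matching temperature-softened output distributions. A minimal sketch of the standard distillation objective follows; the function names and the temperature T=4 are common illustrative choices, not values from the paper.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-softened softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp((l - m) / T) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_kl(student_logits, teacher_logits, T=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2
    as in standard knowledge distillation, so the student mimics the
    teacher's soft class probabilities."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return T * T * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Identical logits give zero loss; divergent predictions are penalized.
print(distillation_kl([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(distillation_kl([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))
```

Because the gradient-data teacher is only needed to compute this loss during training, inference runs on RGB frames alone, which is what makes the reported real-time performance possible.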

Language: English

Citations: 3

From raw to refined: Data preprocessing for construction machine learning (ML), deep learning (DL), and reinforcement learning (RL) models
SeyedeZahra Golazad, Abbas Mohammadi, Abbas Rashidi et al.

Automation in Construction, Journal Year: 2024, Volume and Issue: 168, P. 105844 - 105844

Published: Oct. 24, 2024

Language: English

Citations: 3

Self-supervised pre-training model based on multi-view for MOOC recommendation
Runyu Tian, Juanjuan Cai, Chuanzhen Li et al.

Expert Systems with Applications, Journal Year: 2024, Volume and Issue: 252, P. 124143 - 124143

Published: May 14, 2024

Language: English

Citations: 2

Construction Safety Inspection Workflow with Clip-Based Image Captioning and Attention Generation
Wei-Lun Tsai, Jacob J. Lin, W.P. Ho et al.

Published: Jan. 1, 2024

Safety inspections on construction sites are critical for accident prevention. Traditionally, inspectors document violations using photos and textual descriptions, but this process is time-consuming and inconsistent. Studies have sought to enhance inspection efficiency with standardized forms and image captioning techniques; however, streamlining the compiling of reports effectively remains challenging. We propose an image-language model that automatically generates safety observations through CLIP fine-tuning and prefix captioning tailored to construction safety. In addition, an attention map of the predicted captions is generated to obtain the reasoning between the violation regions in images and the text. The model can successfully classify nine violation types at an average accuracy of 73.7% and outperforms the baseline captioning model by 41.8%. The proposed framework is integrated into a mobile phone application for inspection in real-world scenarios, which supports documenting violations and generating reports effectively.
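Prefix-style captioning of the kind this abstract describes bridges a frozen image encoder and a language model with a small mapping network: the image embedding is projected into a short sequence of "prefix" embeddings the language model conditions on. The projection can be sketched as below; the function name, the untrained random weights, and the toy dimensions (16-d image embedding, 4 prefix tokens of width 8) are all illustrative assumptions, not the paper's architecture.

```python
import random

def map_to_prefix(image_emb, k=4, d_lm=8, seed=0):
    """Project an image embedding into k prefix embeddings of width d_lm
    via a linear map; prefix captioning fine-tunes exactly this mapping
    so a language model can condition its caption on the image."""
    rng = random.Random(seed)
    d_img = len(image_emb)
    prefix = []
    for _ in range(k):
        token = []
        for _ in range(d_lm):
            # One (untrained) weight row per output coordinate.
            w = [rng.gauss(0.0, d_img ** -0.5) for _ in range(d_img)]
            token.append(sum(wi * xi for wi, xi in zip(w, image_emb)))
        prefix.append(token)
    return prefix

prefix = map_to_prefix([0.1] * 16)
print(len(prefix), len(prefix[0]))  # 4 prefix tokens of width 8
```

In training, these prefix embeddings are prepended to the caption's token embeddings and the language model is fine-tuned to continue them with the safety observation text.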

Language: English

Citations: 1

Real-Time Detection of Personal Protective Equipment Violations for Construction Workers Using Semisupervised Learning and Video Clips
Q. H. Chen, Danbing Long, Hongjie Wang et al.

Journal of Construction Engineering and Management, Journal Year: 2024, Volume and Issue: 151(3)

Published: Dec. 27, 2024

Language: English

Citations: 1

Relevance of deep sequence models for recognising automated construction activities: a case study on a low-rise construction system
Aparna Harichandran, Benny Raphael, Abhijit Mukherjee et al.

Journal of Information Technology in Construction, Journal Year: 2023, Volume and Issue: 28, P. 458 - 481

Published: Aug. 25, 2023

Recognising activities of construction equipment is essential for monitoring productivity, progress, safety, and environmental impacts. While there have been many studies on activity recognition for earth excavation and moving equipment, identification of the activities of Automated Construction Systems (ACS) has rarely been attempted. This is especially true for low-rise ACS, which offers energy-efficient, cost-effective solutions to urgent housing needs and provides more affordable living options for a broader population. Deep learning methods have gained a lot of attention because of their ability to perform classification without manually extracting relevant features. This study evaluates the feasibility of deep sequence models for developing an activity recognition framework for automated construction equipment. Time series acceleration data was collected from the structure to identify the major operation classes of the ACS. Long Short-Term Memory networks (LSTM) were applied for identifying activities, and their performance was compared with traditional machine learning classifiers. Diverse data augmentation methods were adopted for generating datasets for training. Several recently published studies seem to establish the superiority of complex deep learning techniques over traditional algorithms regardless of the application context. However, the results of this study show that all the conventional classifiers perform equivalently to or better than the deep sequence models in this application. The performance of the deep models is affected by the lack of diversity in the initial dataset. If the augmented dataset significantly alters the characteristics of the original dataset, it may not deliver good results.
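The traditional classifiers that this study found competitive with LSTMs consume hand-crafted statistics computed over fixed windows of the acceleration trace, rather than the raw sequence. A minimal sketch of that feature-extraction step is shown below; the window length, the specific statistics, and the synthetic sine trace are illustrative assumptions, not the study's exact pipeline.

```python
import math

def window_features(signal, win=50):
    """Slice an acceleration trace into non-overlapping windows and
    compute per-window summary statistics (mean, std, min, max) of the
    kind a conventional classifier such as an SVM would consume."""
    feats = []
    for start in range(0, len(signal) - win + 1, win):
        w = signal[start:start + win]
        mean = sum(w) / win
        var = sum((x - mean) ** 2 for x in w) / win
        feats.append((mean, math.sqrt(var), min(w), max(w)))
    return feats

# A synthetic 200-sample vibration trace yields 4 feature windows.
trace = [math.sin(0.1 * t) for t in range(200)]
print(len(window_features(trace)))
```

An LSTM, by contrast, would be fed the windowed raw samples directly; the study's finding is that with a small, low-diversity dataset the summary-statistic route can match or beat the sequence model.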

Language: English

Citations: 2