Surgical Video Workflow Analysis via Visual-Language Learning
Pengpeng Li, Xiangbo Shu, Chun-Mei Feng, et al.

Research Square (Research Square), Journal Year: 2024, Volume and Issue: unknown

Published: Nov. 25, 2024

Abstract: Surgical video workflow analysis has seen intensive development in computer-assisted surgery through the use of deep learning models, aiming to enhance surgical scene analysis and decision-making. However, previous research mainly focused on identifying coarse-grained phases from surgical videos. To provide a more comprehensive, fine-grained analysis of surgical videos, this work focuses on accurately identifying triplets <instrument, verb, target>. Specifically, we propose a vision-language framework that incorporates intra- and inter-triplet modeling, termed I2TM, to explore the relationships among triplets and leverage the model's understanding of the entire surgical process, thereby enhancing the accuracy and robustness of recognition. Besides, we also develop a new triplet semantic enhancer (TSE) to establish semantic relationships, both intra- and inter-triplet, across visual and textual modalities. Extensive experimental results on benchmark datasets demonstrate that our approach can capture finer semantics and achieve effective surgical video analysis, with potential for widespread medical applications.

Language: English

Surgical video workflow analysis via visual-language learning
Pengpeng Li, Xiangbo Shu, Chun-Mei Feng, et al.

Published: Jan. 25, 2025

Abstract: Surgical video workflow analysis has seen intensive development in computer-assisted surgery through the use of deep learning models, aiming to enhance surgical scene analysis and decision-making. However, previous research primarily focused on coarse-grained analysis of surgical videos, e.g., phase recognition, instrument recognition, and triplet recognition that only considers relationships within triplets. To provide a more comprehensive, fine-grained analysis of surgical videos, this work focuses on accurately identifying triplets <instrument, verb, target> from surgical videos. Specifically, we propose a vision-language framework that incorporates intra- and inter-triplet modeling, termed I2TM, to explore the relationships among triplets and leverage the model's understanding of the entire surgical process, thereby enhancing the accuracy and robustness of recognition. Besides, we also develop a new triplet semantic enhancer (TSE) to establish semantic relationships, both intra- and inter-triplet, across visual and textual modalities. Extensive experimental results on benchmark datasets demonstrate that our approach can capture finer semantics and achieve effective surgical video analysis, with potential for widespread medical applications.

Language: English

Citations: 0

Deep learning for surgical workflow analysis: a survey of progresses, limitations, and trends
Yunlong Li, Zijian Zhao, Renbo Li, et al.

Artificial Intelligence Review, Journal Year: 2024, Volume and Issue: 57(11)

Published: Sept. 16, 2024

Language: English

Citations: 2

Self-supervised Polyp Re-identification in Colonoscopy
Yotam Intrator, Natalie Aizenberg, A. Livne, et al.

Lecture notes in computer science, Journal Year: 2023, Volume and Issue: unknown, P. 590 - 600

Published: Jan. 1, 2023

Language: English

Citations: 5

Jumpstarting Surgical Computer Vision
Deepak Alapatt, Aditya Murali, Vinkle Srivastav, et al.

Lecture notes in computer science, Journal Year: 2024, Volume and Issue: unknown, P. 328 - 338

Published: Jan. 1, 2024

Language: English

Citations: 1

A Study on Self-Supervised Pretraining for Vision Problems in Gastrointestinal Endoscopy
Edward Sanderson, Bogdan J. Matuszewski

IEEE Access, Journal Year: 2024, Volume and Issue: 12, P. 46181 - 46201

Published: Jan. 1, 2024

Solutions to vision tasks in gastrointestinal endoscopy (GIE) conventionally use image encoders pretrained in a supervised manner with ImageNet-1k as backbones. However, the use of modern self-supervised pretraining algorithms and a recent dataset of 100k unlabelled GIE images (Hyperkvasir-unlabelled) may allow for improvements. In this work, we study the fine-tuned performance of models with ResNet50 and ViT-B backbones pretrained in self-supervised and supervised manners with ImageNet-1k and Hyperkvasir-unlabelled (self-supervised only) in a range of GIE vision tasks. In addition to identifying the most suitable pretraining pipeline and backbone architecture for each task, out of those considered, our results suggest: that self-supervised pretraining generally produces more suitable backbones than supervised pretraining; that self-supervised pretraining with ImageNet-1k is typically more suitable than pretraining with Hyperkvasir-unlabelled, with the notable exception of monocular depth estimation in colonoscopy; and that ViT-Bs are more suitable in polyp segmentation and monocular depth estimation in colonoscopy, ResNet50s are more suitable in polyp detection, and both architectures perform similarly in anatomical landmark recognition and pathological finding characterisation. We hope this work draws attention to the complexity of pretraining for GIE vision tasks, informs the development of more suitable approaches than the convention, and inspires further research on this topic to help advance this development. Code available: github.com/ESandML/SSL4GIE.

Language: English

Citations: 0

Random Walks for Temporal Action Segmentation with Timestamp Supervision
Roy Hirsch, Regev Cohen, Tomer Golany, et al.

2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Journal Year: 2024, Volume and Issue: unknown, P. 6600 - 6610

Published: Jan. 3, 2024

Temporal action segmentation relates to high-level video understanding, commonly formulated as frame-wise classification of untrimmed videos into predefined actions. Fully-supervised deep-learning approaches require dense video annotations, which are time and money consuming. Furthermore, the temporal boundaries between consecutive actions are typically not well-defined, leading to inherent ambiguity and interrater disagreement. A promising approach to remedy these limitations is timestamp supervision, requiring only one labeled frame per action instance in a training video. In this work, we reformulate the task of temporal segmentation as a graph segmentation problem with weakly-labeled vertices. We introduce an efficient segmentation method based on random walks on graphs, obtained by solving a sparse system of linear equations. Furthermore, the proposed technique can be employed in any one or combination of the following forms: (1) a standalone solution for generating dense pseudo-labels from timestamps; (2) a training loss; (3) a smoothing mechanism given intermediate predictions. Extensive experiments on three datasets (50Salads, Breakfast, GTEA) show that our method competes with the state-of-the-art, and allows the identification of regions of uncertainty around action boundaries.
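
The random-walk idea in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple chain graph over frames, scalar per-frame features, Gaussian edge weights with a hypothetical `beta` parameter, and illustrative function names. For each class it solves the (tridiagonal, hence sparse) linear system whose solution is the probability that a random walk started at a frame first reaches a timestamp labeled with that class.

```python
import math


def chain_weights(features, beta=1.0):
    """Edge weight between consecutive frames; close to 1 when features are similar."""
    return [math.exp(-beta * (a - b) ** 2) for a, b in zip(features, features[1:])]


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system (lower[0] and upper[-1] are ignored)."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def random_walk_probabilities(features, seeds, num_classes, beta=1.0):
    """Per-class probability that a walk from each frame first hits a seed of that class.

    `seeds` maps frame index -> class label (the timestamp annotations).
    At least one seed is required, otherwise the system is singular.
    """
    n = len(features)
    w = chain_weights(features, beta)
    probs = []
    for k in range(num_classes):
        lower, diag, upper, rhs = ([0.0] * n for _ in range(4))
        for t in range(n):
            if t in seeds:
                # Labeled frame: probability is fixed to 1 for its class, 0 otherwise.
                diag[t] = 1.0
                rhs[t] = 1.0 if seeds[t] == k else 0.0
            else:
                # Unlabeled frame: weighted average of its neighbours (graph Laplacian row).
                left = w[t - 1] if t > 0 else 0.0
                right = w[t] if t < n - 1 else 0.0
                lower[t], diag[t], upper[t] = -left, left + right, -right
        probs.append(solve_tridiagonal(lower, diag, upper, rhs))
    return probs


def pseudo_labels(probs):
    """Assign each frame the class with the highest random-walk probability."""
    n = len(probs[0])
    return [max(range(len(probs)), key=lambda k: probs[k][t]) for t in range(n)]


# Toy video: six frames, one labeled timestamp per action (frame 1 -> class 0, frame 4 -> class 1).
features = [0.0, 0.1, 0.0, 1.0, 0.9, 1.0]
seeds = {1: 0, 4: 1}
probs = random_walk_probabilities(features, seeds, num_classes=2, beta=5.0)
labels = pseudo_labels(probs)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In this toy setting the weak edge between frames 2 and 3 (where the feature jumps) is what places the boundary; frames whose per-class probabilities are close to each other would correspond to the uncertain regions around action boundaries that the abstract mentions.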

Language: English

Citations: 0

Predicting Generalization of AI Colonoscopy Models to Unseen Data
Joel Shor, Carson McNeil, Yotam Intrator, et al.

Research Square (Research Square), Journal Year: 2024, Volume and Issue: unknown

Published: Aug. 7, 2024

Abstract: Generalizability of AI colonoscopy algorithms is important for wider adoption in clinical practice. However, current techniques for evaluating performance on unseen data require expensive and time-intensive labels. We show that a "Masked Siamese Network" (MSN), trained to predict masked out regions of polyp images without labels, can predict the performance of Computer Aided Detection (CADe) of polyps on colonoscopies. This holds true on Japanese colonoscopies even when the MSN is only trained on Israeli colonoscopies, which differ in scoping hardware, endoscope software, screening guidelines, bowel preparation, patient demographics, and the use of techniques such as narrow-band imaging (NBI) and chromoendoscopy (CE). Since our technique uses neither colonoscopy-specific information nor labels, it has the potential to apply to more medical domains.

Language: English

Citations: 0

EndoFinder: Online Image Retrieval for Explainable Colorectal Polyp Diagnosis
Ruijie Yang, Yan Zhu, Peiyao Fu, et al.

Lecture notes in computer science, Journal Year: 2024, Volume and Issue: unknown, P. 251 - 262

Published: Jan. 1, 2024

Language: English

Citations: 0

Arges: Spatio-Temporal Transformer for Ulcerative Colitis Severity Assessment in Endoscopy Videos
Krishna Chaitanya, Pablo F. Damasceno, Shreyas Fadnavis, et al.

Lecture notes in computer science, Journal Year: 2024, Volume and Issue: unknown, P. 201 - 211

Published: Oct. 22, 2024

Language: English

Citations: 0

Exploring the Effect of Dataset Diversity in Self-supervised Learning for Surgical Computer Vision
Tim J. M. Jaspers, Ronald L. P. D. de Jong, Yasmina Al Khalil, et al.

Lecture notes in computer science, Journal Year: 2024, Volume and Issue: unknown, P. 43 - 53

Published: Oct. 24, 2024

Language: English

Citations: 0