Surgical Video Workflow Analysis via Visual-Language Learning
Pengpeng Li, Xiangbo Shu, Chun-Mei Feng, et al.

Research Square, Journal Year: 2024, Volume and Issue: unknown

Published: Nov. 25, 2024

Abstract Surgical video workflow analysis has developed rapidly in computer-assisted surgery through the use of deep learning models, with the aim of enhancing surgical scene understanding and decision-making. However, previous research has mainly focused on identifying coarse-grained phases from videos. To provide a more comprehensive, fine-grained analysis of surgical videos, this work focuses on accurately recognizing triplets <instrument, verb, target>. Specifically, we propose a vision-language framework that incorporates intra- and inter-triplet modeling, termed I2TM. It explores the relationships among triplets and leverages the model's understanding of the entire surgical process, thereby enhancing the accuracy and robustness of recognition. In addition, we develop a new triplet semantic enhancer (TSE) that establishes semantic relationships, both intra- and inter-triplet, across the visual and textual modalities. Extensive experimental results on benchmark datasets demonstrate that our approach can capture finer semantics and achieve effective surgical video analysis, with potential for widespread medical applications.

Language: English

Predicting the generalization of computer aided detection (CADe) models for colonoscopy
Joel Shor, Carson McNeil, Yotam Intrator, et al.

Discover Artificial Intelligence, Journal Year: 2024, Volume and Issue: 4(1)

Published: Nov. 19, 2024

Abstract Generalizability of AI colonoscopy algorithms is important for wider adoption in clinical practice. However, current techniques for evaluating performance on unseen data require expensive and time-intensive labels. We show that a "Masked Siamese Network" (MSN), trained without labels to predict masked-out regions of polyp images, can predict the performance of Computer Aided Detection (CADe) of polyps on colonoscopies. This holds for Japanese colonoscopies even when the MSN is trained only on Israeli colonoscopies. The two datasets differ in scoping hardware, endoscope software, screening guidelines, bowel preparation, and patient demographics, and in the use of techniques such as narrow-band imaging (NBI) and chromoendoscopy (CE). Since our technique uses neither colonoscopy-specific information nor labels, it has the potential to apply to more medical domains.

Language: English

Citations: 0
