Surgical Video Workflow Analysis via Visual-Language Learning
Pengpeng Li, Xiangbo Shu, Chun-Mei Feng

et al.

Research Square (Research Square), Journal year: 2024, Issue: unknown

Published: Nov. 25, 2024

Abstract: Surgical video workflow analysis has developed rapidly in computer-assisted surgery through deep learning models, aiming to enhance surgical scene understanding and decision-making. However, previous research mainly focused on identifying coarse-grained phases from videos. To provide a more comprehensive, fine-grained understanding of videos, this work focuses on accurately identifying triplets <instrument, verb, target> from videos. Specifically, we propose a vision-language framework that incorporates intra- and inter-triplet modeling, termed I2TM, to explore the relationships among triplets and leverage the model's understanding of the entire surgical process, thereby enhancing the accuracy and robustness of recognition. Besides, we also develop a new triplet semantic enhancer (TSE) to establish relationships, both intra- and inter-triplets, across visual and textual modalities. Extensive experimental results on benchmark datasets demonstrate that our approach can capture finer semantics and achieve effective analysis, with potential for widespread medical applications.

Language: English

Surgical video workflow analysis via visual-language learning
Pengpeng Li, Xiangbo Shu, Chun-Mei Feng

et al.

Published: Jan. 25, 2025

Abstract: Surgical video workflow analysis has developed rapidly in computer-assisted surgery through deep learning models, aiming to enhance surgical scene understanding and decision-making. However, previous research primarily focused on coarse-grained understanding of videos, e.g., phase recognition, or instrument triplet recognition that only considers relationships within triplets. To provide a more comprehensive, fine-grained understanding, this work focuses on accurately identifying triplets <instrument, verb, target> from videos. Specifically, we propose a vision-language framework that incorporates intra- and inter-triplet modeling, termed I2TM, to explore the relationships among triplets and leverage the model's understanding of the entire surgical process, thereby enhancing the accuracy and robustness of recognition. Besides, we also develop a new triplet semantic enhancer (TSE) to establish relationships, both intra- and inter-triplets, across visual and textual modalities. Extensive experimental results on benchmark datasets demonstrate that our approach can capture finer semantics and achieve effective analysis, with potential for widespread medical applications.

Language: English

Cited by

0

Deep learning for surgical workflow analysis: a survey of progresses, limitations, and trends

Yunlong Li,

Zijian Zhao, Renbo Li

et al.

Artificial Intelligence Review, Journal year: 2024, Issue: 57(11)

Published: Sep. 16, 2024

Language: English

Cited by

2

Self-supervised Polyp Re-identification in Colonoscopy

Yotam Intrator,

Natalie Aizenberg,

A. Livne

et al.

Lecture notes in computer science, Journal year: 2023, Issue: unknown, pp. 590 - 600

Published: Jan. 1, 2023

Language: English

Cited by

5

Jumpstarting Surgical Computer Vision
Deepak Alapatt, Aditya Murali, Vinkle Srivastav

et al.

Lecture notes in computer science, Journal year: 2024, Issue: unknown, pp. 328 - 338

Published: Jan. 1, 2024

Language: English

Cited by

1

Random Walks for Temporal Action Segmentation with Timestamp Supervision

Roy Hirsch,

Regev Cohen,

Tomer Golany

et al.

2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Journal year: 2024, Issue: unknown, pp. 6600 - 6610

Published: Jan. 3, 2024

Temporal action segmentation relates to high-level video understanding and is commonly formulated as frame-wise classification of untrimmed videos into predefined actions. Fully supervised deep-learning approaches require dense annotations, which are time- and money-consuming. Furthermore, the temporal boundaries between consecutive actions are typically not well-defined, leading to inherent ambiguity and interrater disagreement. A promising approach to remedy these limitations is timestamp supervision, requiring only one labeled frame per action instance in a training video. In this work, we reformulate the task as a graph problem with weakly-labeled vertices. We introduce an efficient method based on random walks on graphs, obtained by solving a sparse system of linear equations. The proposed technique can be employed in any one, or a combination, of the following forms: (1) a standalone solution for generating pseudo-labels from timestamps; (2) a loss; (3) a smoothing mechanism for given intermediate predictions. Extensive experiments on three datasets (50Salads, Breakfast, GTEA) show that our method competes with the state-of-the-art and allows identification of regions of uncertainty around action boundaries.
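The pseudo-label form (1) above can be illustrated as an absorbing random walk on a chain graph of frames, where timestamp-labeled frames are absorbing states and each unlabeled frame takes the label of the state it is most likely to be absorbed into. This is a minimal sketch under assumed details, not the paper's implementation: the chain-graph construction, the Gaussian similarity kernel, and the function name `random_walk_pseudo_labels` are illustrative choices.

```python
import numpy as np
from scipy.sparse import lil_matrix, identity
from scipy.sparse.linalg import splu


def random_walk_pseudo_labels(features, timestamps, labels, sigma=1.0):
    """Propagate timestamp labels to all frames via an absorbing random walk.

    features:   (T, D) array of per-frame feature vectors
    timestamps: indices of the labeled frames (one per action instance)
    labels:     class index for each labeled frame
    Returns a (T,) array of pseudo-labels.
    """
    T = len(features)
    n_classes = int(max(labels)) + 1

    # Chain graph: each frame is linked to its temporal neighbours,
    # with edge weight given by a Gaussian kernel on feature distance.
    W = lil_matrix((T, T))
    for t in range(T - 1):
        w = np.exp(-np.linalg.norm(features[t] - features[t + 1]) ** 2
                   / (2.0 * sigma ** 2))
        W[t, t + 1] = w
        W[t + 1, t] = w

    # Row-normalise the weights into a random-walk transition matrix P.
    d = np.asarray(W.sum(axis=1)).ravel()
    P = W.multiply(1.0 / d[:, None]).tocsr()

    labelled = np.asarray(timestamps)
    unlabelled = np.setdiff1d(np.arange(T), labelled)

    # One-hot targets for the labeled (absorbing) frames.
    Y = np.zeros((len(labelled), n_classes))
    Y[np.arange(len(labelled)), labels] = 1.0

    # Absorption probabilities solve the sparse linear system
    # (I - P_uu) X = P_ul Y over the unlabeled vertices.
    P_uu = P[unlabelled][:, unlabelled]
    P_ul = P[unlabelled][:, labelled]
    A = (identity(len(unlabelled), format="csr") - P_uu).tocsc()
    X = splu(A).solve(P_ul @ Y)

    pseudo = np.empty(T, dtype=int)
    pseudo[labelled] = labels
    pseudo[unlabelled] = np.atleast_2d(X).argmax(axis=1)
    return pseudo
```

Because the graph is a chain, the system matrix is tridiagonal and the solve is effectively linear in the number of frames, which is what makes the approach cheap enough to use as an in-training pseudo-labeler or smoother.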

Language: English

Cited by

0

Predicting Generalization of AI Colonoscopy Models to Unseen Data
Joel Shor, Carson McNeil,

Yotam Intrator

et al.

Research Square (Research Square), Journal year: 2024, Issue: unknown

Published: Aug. 7, 2024

Abstract: Generalizability of AI colonoscopy algorithms is important for wider adoption in clinical practice. However, current techniques for evaluating performance on unseen data require expensive and time-intensive labels. We show that a "Masked Siamese Network" (MSN), trained to predict masked-out regions of polyp images without labels, can predict the performance of Computer Aided Detection (CADe) of polyps in colonoscopies. This holds true for Japanese colonoscopies even when the MSN is trained only on Israeli colonoscopies, which differ in scoping hardware, endoscope software, screening guidelines, bowel preparation, patient demographics, and use of techniques such as narrow-band imaging (NBI) and chromoendoscopy (CE). Since our technique uses neither colonoscopy-specific information nor labels, it has the potential to apply to more medical domains.

Language: English

Cited by

0

EndoFinder: Online Image Retrieval for Explainable Colorectal Polyp Diagnosis
Ruijie Yang, Yan Zhu,

Peiyao Fu

et al.

Lecture notes in computer science, Journal year: 2024, Issue: unknown, pp. 251 - 262

Published: Jan. 1, 2024

Language: English

Cited by

0

Arges: Spatio-Temporal Transformer for Ulcerative Colitis Severity Assessment in Endoscopy Videos
Krishna Chaitanya, Pablo F. Damasceno, Shreyas Fadnavis

et al.

Lecture notes in computer science, Journal year: 2024, Issue: unknown, pp. 201 - 211

Published: Oct. 22, 2024

Language: English

Cited by

0

Predicting the generalization of computer aided detection (CADe) models for colonoscopy
Joel Shor, Carson McNeil,

Yotam Intrator

et al.

Discover Artificial Intelligence, Journal year: 2024, Issue: 4(1)

Published: Nov. 19, 2024

Abstract: Generalizability of AI colonoscopy algorithms is important for wider adoption in clinical practice. However, current techniques for evaluating performance on unseen data require expensive and time-intensive labels. We show that a "Masked Siamese Network" (MSN), trained to predict masked-out regions of polyp images without labels, can predict the performance of Computer Aided Detection (CADe) of polyps in colonoscopies. This holds true for Japanese colonoscopies even when the MSN is trained only on Israeli colonoscopies, which differ in scoping hardware, endoscope software, screening guidelines, bowel preparation, patient demographics, and use of techniques such as narrow-band imaging (NBI) and chromoendoscopy (CE). Since our technique uses neither colonoscopy-specific information nor labels, it has the potential to apply to more medical domains.

Language: English

Cited by

0

A Study on Self-Supervised Pretraining for Vision Problems in Gastrointestinal Endoscopy
Edward Sanderson, Bogdan J. Matuszewski

IEEE Access, Journal year: 2024, Issue: 12, pp. 46181 - 46201

Published: Jan. 1, 2024

Solutions to vision tasks in gastrointestinal endoscopy (GIE) conventionally use image encoders pretrained in a supervised manner with ImageNet-1k as backbones. However, the emergence of modern self-supervised pretraining algorithms and a recent dataset of 100k unlabelled GIE images (Hyperkvasir-unlabelled) may allow for improvements. In this work, we study the fine-tuned performance of models with ResNet50 and ViT-B backbones pretrained in supervised and self-supervised manners with ImageNet-1k and Hyperkvasir-unlabelled (self-supervised only) on a range of GIE vision tasks. In addition to identifying the most suitable pretraining pipeline and backbone architecture for each task, out of those considered, our results suggest: that self-supervised pretraining generally produces more suitable backbones than supervised pretraining; that self-supervised pretraining with ImageNet-1k is typically more suitable than with Hyperkvasir-unlabelled, with the notable exception of monocular depth estimation in colonoscopy; that ViT-Bs are more suitable for polyp segmentation and monocular depth estimation in colonoscopy, ResNet50s for polyp detection, and that both architectures perform similarly in anatomical landmark recognition and pathological finding characterisation. We hope this work draws attention to the complexity of pretraining for GIE vision tasks, informs the development of more suitable approaches than the convention, and inspires further research on this topic to help advance development. Code available: github.com/ESandML/SSL4GIE.

Language: English

Cited by

0