
Research Square (Research Square), Journal Year: 2024, Volume and Issue: unknown
Published: Nov. 25, 2024
Language: English
Published: Jan. 25, 2025
Surgical video workflow analysis has developed rapidly in computer-assisted surgery through the use of deep learning models, with the aim of enhancing surgical scene analysis and decision-making. However, previous research has primarily focused on coarse-grained analysis of surgical videos, e.g., phase recognition, instrument recognition, and triplet recognition that only considers relationships within triplets. In order to provide a more comprehensive, fine-grained analysis of surgical videos, this work focuses on accurately identifying triplets <instrument, verb, target> from surgical videos. Specifically, we propose a vision-language deep learning framework that incorporates intra- and inter-triplet modeling, termed I²TM, to explore the relationships among triplets and leverage the model's understanding of the entire surgical process, thereby enhancing the accuracy and robustness of recognition. Besides, we also develop a new triplet semantic enhancer (TSE) to establish semantic relationships, both intra- and inter-triplet, across visual and textual modalities. Extensive experimental results on surgical video benchmark datasets demonstrate that our approach can capture finer semantics and achieve effective surgical video understanding and analysis, with potential for widespread medical applications.
Language: English
Citations
0
Artificial Intelligence Review, Journal Year: 2024, Volume and Issue: 57(11)
Published: Sept. 16, 2024
Language: English
Citations
2
Lecture Notes in Computer Science, Journal Year: 2023, Volume and Issue: unknown, P. 590 - 600
Published: Jan. 1, 2023
Language: English
Citations
5
Lecture Notes in Computer Science, Journal Year: 2024, Volume and Issue: unknown, P. 328 - 338
Published: Jan. 1, 2024
Language: English
Citations
1
IEEE Access, Journal Year: 2024, Volume and Issue: 12, P. 46181 - 46201
Published: Jan. 1, 2024
Solutions to vision tasks in gastrointestinal endoscopy (GIE) conventionally use image encoders pretrained in a supervised manner with ImageNet-1k as backbones. However, the use of modern self-supervised pretraining algorithms and a recent dataset of 100k unlabelled GIE images (Hyperkvasir-unlabelled) may allow for improvements. In this work, we study the fine-tuned performance of models with ResNet50 and ViT-B backbones pretrained in self-supervised and supervised manners with ImageNet-1k and Hyperkvasir-unlabelled (self-supervised only) in a range of GIE vision tasks. In addition to identifying the most suitable pretraining pipeline and backbone architecture for each task, out of those considered, our results suggest: that self-supervised pretraining generally produces more suitable backbones than supervised pretraining; that self-supervised pretraining with ImageNet-1k is typically more suitable than pretraining with Hyperkvasir-unlabelled, with the notable exception of monocular depth estimation in colonoscopy; and that ViT-Bs are more suitable in polyp segmentation and monocular depth estimation in colonoscopy, ResNet50s are more suitable in polyp detection, and both architectures perform similarly in anatomical landmark recognition and pathological finding characterisation. We hope this work draws attention to the complexity of pretraining for GIE vision tasks, informs the development of more suitable approaches than the convention, and inspires further research on this topic to help advance this development. Code available: github.com/ESandML/SSL4GIE.
Language: English
Citations
0
2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Journal Year: 2024, Volume and Issue: unknown, P. 6600 - 6610
Published: Jan. 3, 2024
Temporal action segmentation relates to high-level video understanding, commonly formulated as frame-wise classification of untrimmed videos into predefined actions. Fully-supervised deep-learning approaches require dense video annotations, which are time and money consuming. Furthermore, the temporal boundaries between consecutive actions are typically not well-defined, leading to inherent ambiguity and interrater disagreement. A promising approach to remedy these limitations is timestamp supervision, requiring only one labeled frame per action instance in a training video. In this work, we reformulate the task of temporal segmentation as a graph segmentation problem with weakly-labeled vertices. We introduce an efficient segmentation method based on random walks on graphs, obtained by solving a sparse system of linear equations. The proposed technique can be employed in any one or combination of the following forms: (1) as a standalone solution for generating dense pseudo-labels from timestamps; (2) as a training loss; (3) as a smoothing mechanism given intermediate predictions. Extensive experiments with three datasets (50Salads, Breakfast, GTEA) show that our method competes with the state-of-the-art, and allows the identification of regions of uncertainty around action boundaries.
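The random-walk idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the video is modelled as a simple chain graph (one vertex per frame, one edge between consecutive frames), that the edge weights are given as hypothetical frame-to-frame similarities, and that each timestamp is a seed frame with a known class. For each class, the probability that a walker starting at a frame first reaches a seed of that class is found by solving a tridiagonal (hence sparse) linear system. The names `solve_tridiagonal` and `random_walk_probabilities` are illustrative only.

```python
def solve_tridiagonal(sub, diag, sup, rhs):
    """Solve a tridiagonal linear system with the Thomas algorithm."""
    n = len(diag)
    sup_p = [0.0] * n
    rhs_p = [0.0] * n
    sup_p[0] = sup[0] / diag[0]
    rhs_p[0] = rhs[0] / diag[0]
    # Forward elimination.
    for i in range(1, n):
        denom = diag[i] - sub[i] * sup_p[i - 1]
        sup_p[i] = sup[i] / denom
        rhs_p[i] = (rhs[i] - sub[i] * rhs_p[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = rhs_p[-1]
    for i in range(n - 2, -1, -1):
        x[i] = rhs_p[i] - sup_p[i] * x[i + 1]
    return x


def random_walk_probabilities(weights, seeds, num_classes):
    """Per-frame class probabilities on a chain graph.

    weights[i] is the (assumed, positive) similarity between frames i and i+1.
    seeds maps a frame index to its class label (the timestamp annotations).
    """
    if not seeds:
        raise ValueError("at least one labeled frame is required")
    n = len(weights) + 1
    probs = [[0.0] * num_classes for _ in range(n)]
    for c in range(num_classes):
        sub, diag, sup, rhs = ([0.0] * n for _ in range(4))
        for i in range(n):
            if i in seeds:
                # Seed frames are fixed: probability 1 for their own class.
                diag[i] = 1.0
                rhs[i] = 1.0 if seeds[i] == c else 0.0
            else:
                # Unlabeled frames are weighted averages of their neighbours.
                left = weights[i - 1] if i > 0 else 0.0
                right = weights[i] if i < n - 1 else 0.0
                sub[i], diag[i], sup[i] = -left, left + right, -right
        column = solve_tridiagonal(sub, diag, sup, rhs)
        for i in range(n):
            probs[i][c] = column[i]
    return probs


# Seven frames, two timestamps; the weak edge between frames 3 and 4
# stands in for a visual change at an action boundary.
weights = [1.0, 1.0, 1.0, 0.1, 1.0, 1.0]
seeds = {1: 0, 5: 1}
probs = random_walk_probabilities(weights, seeds, num_classes=2)
labels = [max(range(2), key=lambda c: row[c]) for row in probs]
print(labels)  # → [0, 0, 0, 0, 1, 1, 1]
```

In this toy example the probabilities serve as dense pseudo-labels, and frames whose top probability is far from 1 mark regions of uncertainty around the boundary.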
Language: English
Citations
0
Research Square (Research Square), Journal Year: 2024, Volume and Issue: unknown
Published: Aug. 7, 2024
Language: English
Citations
0
Lecture Notes in Computer Science, Journal Year: 2024, Volume and Issue: unknown, P. 251 - 262
Published: Jan. 1, 2024
Language: English
Citations
0
Lecture Notes in Computer Science, Journal Year: 2024, Volume and Issue: unknown, P. 201 - 211
Published: Oct. 22, 2024
Language: English
Citations
0
Lecture Notes in Computer Science, Journal Year: 2024, Volume and Issue: unknown, P. 43 - 53
Published: Oct. 24, 2024
Language: English
Citations
0