
Research Square (Research Square), Journal year: 2024, Issue: unknown
Published: Nov. 25, 2024
Language: English
Published: Jan. 25, 2025
Abstract: Surgical video workflow analysis has developed rapidly in computer-assisted surgery by combining deep learning models, aiming to enhance surgical scene analysis and decision-making. However, previous research primarily focused on coarse-grained analysis of videos, e.g., phase recognition, instrument recognition, and triplet recognition that only considers relationships within triplets. In order to provide a more comprehensive fine-grained analysis, this work focuses on accurately identifying triplets <instrument, verb, target> from videos. Specifically, we propose a vision-language framework that incorporates intra- and inter-triplet modeling, termed I2TM, to explore the relationships among triplets and leverage the model's understanding of the entire process, thereby enhancing the accuracy and robustness of recognition. Besides, we also develop a new triplet semantic enhancer (TSE) to establish semantic relationships, both intra- and inter-triplet, across visual and textual modalities. Extensive experimental results on benchmark datasets demonstrate that our approach can capture finer semantics and achieve effective analysis, with potential for widespread medical applications.
Language: English
Cited by: 0
Artificial Intelligence Review, Journal year: 2024, Issue: 57(11)
Published: Sep. 16, 2024
Language: English
Cited by: 2
Lecture Notes in Computer Science, Journal year: 2023, Issue: unknown, pp. 590-600
Published: Jan. 1, 2023
Language: English
Cited by: 5
Lecture Notes in Computer Science, Journal year: 2024, Issue: unknown, pp. 328-338
Published: Jan. 1, 2024
Language: English
Cited by: 1
2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Journal year: 2024, Issue: unknown, pp. 6600-6610
Published: Jan. 3, 2024
Temporal action segmentation relates to high-level video understanding, commonly formulated as frame-wise classification of untrimmed videos into predefined actions. Fully-supervised deep-learning approaches require dense annotations, which are costly in time and money. Furthermore, the temporal boundaries between consecutive actions are typically not well-defined, leading to inherent ambiguity and interrater disagreement. A promising approach to remedy these limitations is timestamp supervision, requiring only one labeled frame per action instance in a training video. In this work, we reformulate the task as a graph segmentation problem with weakly-labeled vertices. We introduce an efficient method based on random walks on graphs, obtained by solving a sparse system of linear equations. The proposed technique can be employed in any or a combination of the following forms: (1) a standalone solution for generating pseudo-labels from timestamps; (2) a training loss; (3) a smoothing mechanism given intermediate predictions. Extensive experiments on three datasets (50Salads, Breakfast, GTEA) show that our method competes with the state-of-the-art, and allows the identification of regions of uncertainty around action boundaries.
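The random-walk formulation in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the simplest possible graph (a chain linking consecutive frames), Gaussian edge weights on feature differences, and hypothetical names (`random_walk_pseudo_labels`, `solve_tridiagonal`, `beta`). On a chain the sparse linear system is tridiagonal, so it can be solved directly with the Thomas algorithm.

```python
import math


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system.

    lower[i] multiplies x[i-1] in row i, upper[i] multiplies x[i+1].
    """
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def random_walk_pseudo_labels(features, seeds, beta=1.0):
    """Propagate timestamp labels to all frames of one video.

    features: list of per-frame feature vectors (lists of floats).
    seeds: dict mapping frame index -> class label (the timestamps).
    Assumption: frames form a chain graph with Gaussian edge weights.
    """
    T = len(features)
    # Edge weight between frame t and t+1: similar frames -> weight near 1.
    w = [
        math.exp(-beta * sum((a - b) ** 2 for a, b in zip(features[t], features[t + 1])))
        for t in range(T - 1)
    ]
    classes = sorted(set(seeds.values()))
    probs = {}
    for k in classes:
        lower, diag = [0.0] * T, [0.0] * T
        upper, rhs = [0.0] * T, [0.0] * T
        for t in range(T):
            if t in seeds:
                # Labeled vertex: probability is fixed to 0 or 1.
                diag[t] = 1.0
                rhs[t] = 1.0 if seeds[t] == k else 0.0
            else:
                # Unlabeled vertex: graph Laplacian row (harmonic condition).
                left = w[t - 1] if t > 0 else 0.0
                right = w[t] if t < T - 1 else 0.0
                lower[t], upper[t], diag[t] = -left, -right, left + right
        # probs[k][t]: chance a walk from frame t first hits a seed of class k.
        probs[k] = solve_tridiagonal(lower, diag, upper, rhs)
    labels = [max(classes, key=lambda k: probs[k][t]) for t in range(T)]
    return labels, probs


# Toy video: six frames, a feature jump between frames 2 and 3, two timestamps.
features = [[0.0], [0.1], [0.0], [5.0], [5.1], [5.0]]
labels, probs = random_walk_pseudo_labels(features, {1: "cut", 4: "mix"})
```

Frames whose class probabilities are close to each other sit near an uncertain boundary, which is one way to read the abstract's remark about regions of uncertainty. A richer graph than a chain would need a general sparse solver instead of the tridiagonal one used here.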
Language: English
Cited by: 0
Research Square (Research Square), Journal year: 2024, Issue: unknown
Published: Aug. 7, 2024
Language: English
Cited by: 0
Lecture Notes in Computer Science, Journal year: 2024, Issue: unknown, pp. 251-262
Published: Jan. 1, 2024
Language: English
Cited by: 0
Lecture Notes in Computer Science, Journal year: 2024, Issue: unknown, pp. 201-211
Published: Oct. 22, 2024
Language: English
Cited by: 0
Discover Artificial Intelligence, Journal year: 2024, Issue: 4(1)
Published: Nov. 19, 2024
Abstract: Generalizability of AI colonoscopy algorithms is important for wider adoption in clinical practice. However, current techniques for evaluating performance on unseen data require expensive and time-intensive labels. We show that a "Masked Siamese Network" (MSN), trained to predict masked out regions of polyp images without labels, can predict the performance of Computer Aided Detection (CADe) of polyps on colonoscopies, without labels. This holds on Japanese colonoscopies even when the MSN is trained only on Israeli colonoscopies, which differ in scoping hardware, endoscope software, screening guidelines, bowel preparation, patient demographics, and the use of techniques such as narrow-band imaging (NBI) and chromoendoscopy (CE). Since our technique uses neither colonoscopy-specific information nor labels, it has the potential to apply to more medical domains.
Language: English
Cited by: 0
IEEE Access, Journal year: 2024, Issue: 12, pp. 46181-46201
Published: Jan. 1, 2024
Solutions to vision tasks in gastrointestinal endoscopy (GIE) conventionally use image encoders pretrained in a supervised manner with ImageNet-1k as backbones. However, the use of modern self-supervised pretraining algorithms and a recent dataset of 100k unlabelled GIE images (Hyperkvasir-unlabelled) may allow for improvements. In this work, we study the fine-tuned performance of models with ResNet50 and ViT-B backbones pretrained in self-supervised and supervised manners with ImageNet-1k and Hyperkvasir-unlabelled (self-supervised only) in a range of GIE vision tasks. In addition to identifying the most suitable pretraining pipeline and backbone architecture for each task, out of those considered, our results suggest: that self-supervised pretraining generally produces more suitable backbones than supervised pretraining; that pretraining with ImageNet-1k is typically more suitable than pretraining with Hyperkvasir-unlabelled, with the notable exception of monocular depth estimation in colonoscopy; and that ViT-Bs are more suitable in polyp segmentation in colonoscopy, ResNet50s are more suitable in polyp detection, and both architectures perform similarly in anatomical landmark recognition and pathological finding characterisation. We hope this work draws attention to the complexity of pretraining for GIE vision tasks, informs the development of more suitable approaches than the convention, and inspires further research on this topic to help advance this development. Code available: github.com/ESandML/SSL4GIE.
Language: English
Cited by: 0