
Sensors, Journal Year: 2025, Volume and Issue: 25(10), P. 2990 - 2990
Published: May 9, 2025
Current weakly supervised salient object detection (SOD) methods for RGB-D images mostly rely on image-level labels and sparse annotations, which makes it difficult to completely contour object boundaries in complex scenes, especially when detecting objects with filamentary structures. To address these issues, we propose a novel cross-modal SOD framework. The framework adequately exploits the advantages of weak supervision to generate high-quality pseudo-labels and fully couples multi-scale features from RGB and depth images for precise saliency prediction. It mainly consists of a cross-modal pseudo-label generation network (CPGN) and an asymmetric salient-region prediction network (ASPN). The CPGN is proposed to sufficiently leverage the pixel-level guidance provided by point annotations and the enhanced semantic supervision provided by text, which are used to supervise the subsequent training of the ASPN. To better capture contextual information and geometric features from RGB and depth images, the ASPN, an asymmetrically progressive network, gradually extracts features using Swin-Transformer and CNN encoders, respectively. This significantly enhances the model's ability to perceive detailed structures. Additionally, an edge constraint module (ECM) is designed to sharpen the edges of the predicted salient regions. The experimental results demonstrate that the proposed method depicts salient objects, especially those with filamentary structures, better than other methods.
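The abstract does not specify how the edge constraint module (ECM) is implemented; the sketch below is only a minimal illustration of the general edge-constraint idea, assuming a simple gradient-based edge extractor and an L1 penalty between the edge maps of the predicted saliency and the pseudo-label. The function names `edge_map` and `edge_loss` are hypothetical, not from the paper.

```python
import numpy as np

def edge_map(saliency: np.ndarray, thresh: float = 0.1) -> np.ndarray:
    """Binary edge map from a saliency map via discrete gradient magnitude.

    This stands in for whatever edge extractor the ECM actually uses.
    """
    gy, gx = np.gradient(saliency.astype(float))
    return (np.hypot(gx, gy) > thresh).astype(float)

def edge_loss(pred: np.ndarray, pseudo_label: np.ndarray) -> float:
    """L1 distance between the edge maps of prediction and pseudo-label.

    A term like this, added to the saliency loss, pushes the predicted
    region's boundary toward the pseudo-label's boundary.
    """
    return float(np.abs(edge_map(pred) - edge_map(pseudo_label)).mean())

# Tiny usage example: an 8x8 mask with a square salient region.
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0
edges = edge_map(mask)          # nonzero only along the square's boundary
loss_same = edge_loss(mask, mask)  # identical maps -> zero edge loss
```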
Language: English