Pattern Analysis and Applications, Journal Year: 2025, Volume and Issue: 28(2)
Published: May 28, 2025
Language: English
Pattern Analysis and Applications, Journal Year: 2025, Volume and Issue: 28(2)
Published: May 28, 2025
Language: English
Sensors, Journal Year: 2024, Volume and Issue: 24(24), P. 8013
Published: Dec. 15, 2024
Generating accurate and contextually rich captions for images and videos is essential for various applications, from assistive technology to content recommendation. However, challenges such as maintaining temporal coherence in videos, reducing noise in large-scale datasets, and enabling real-time captioning remain significant. We introduce MIRA-CAP (Memory-Integrated Retrieval-Augmented Captioning), a novel framework designed to address these issues through three core innovations: a cross-modal memory bank, adaptive dataset pruning, and a streaming decoder. The memory bank retrieves relevant context from prior frames, enhancing consistency and narrative flow. The pruning mechanism filters noisy data, which improves alignment and generalization. The streaming decoder allows real-time captioning by generating captions incrementally, without requiring access to the full video sequence. Evaluated across standard datasets such as MS COCO, YouCook2, ActivityNet, and Flickr30k, MIRA-CAP achieves state-of-the-art results, with high scores on the CIDEr, SPICE, and Polos metrics, underscoring its alignment with human judgment and its effectiveness in handling complex visual structures. This work demonstrates that MIRA-CAP offers a robust, scalable solution for both static and dynamic captioning tasks, advancing the capabilities of vision–language models in real-world applications.
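The MIRA-CAP abstract describes a memory bank that retrieves relevant context from prior frames. The abstract does not specify how retrieval works. The following minimal sketch rests on several assumptions:

- Frame embeddings are plain vectors.
- Retrieval is top-k cosine similarity.
- The oldest entries are evicted first-in, first-out.

The class and method names are hypothetical and are not taken from MIRA-CAP.

```python
import math
from collections import deque


class CrossModalMemoryBank:
    """Hypothetical fixed-size memory of prior-frame embeddings (illustrative, not MIRA-CAP's code)."""

    def __init__(self, capacity: int):
        # deque(maxlen=...) evicts the oldest entry automatically once capacity is reached.
        self.entries = deque(maxlen=capacity)

    def add(self, frame_id: int, embedding: list[float]) -> None:
        self.entries.append((frame_id, embedding))

    @staticmethod
    def _cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        # A zero vector has no direction; treat its similarity as 0.
        return dot / norm if norm else 0.0

    def retrieve(self, query: list[float], k: int) -> list[int]:
        """Return the ids of the k stored frames most similar to the query."""
        scored = sorted(self.entries, key=lambda e: self._cosine(query, e[1]), reverse=True)
        return [frame_id for frame_id, _ in scored[:k]]
```

Usage example: with a capacity of 3, adding frames 0 to 3 evicts frame 0. A query of `[1.0, 0.0]` then returns the ids of the two stored frames whose embeddings point most nearly in that direction.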
Language: English
Citations: 5
Multimedia Systems, Journal Year: 2025, Volume and Issue: 31(2)
Published: March 24, 2025
Language: English
Citations: 0
Scientific Reports, Journal Year: 2025, Volume and Issue: 15(1)
Published: March 27, 2025
Early screening of lung nodules is mainly done manually by reading the patient's CT scans. This approach is time-consuming, laborious, and prone to missed diagnoses and misdiagnoses. Current methods for nodule detection face limitations such as the high cost of obtaining large-scale, high-quality annotated datasets and poor robustness when dealing with data of varying quality. The challenges include accurately detecting small and irregular nodules, as well as ensuring model generalization across different data sources. Therefore, this paper proposes a model based on semi-supervised learning and knowledge distillation (SSLKD-UNet). In this paper, a feature encoder with a hybrid CNN and Transformer architecture is designed to fully extract features from the images. At the same time, a training strategy is designed in which a teacher model instructs a student model to learn the more relevant regions of CT images. Finally, rough annotations are applied to the LUNA16 and LC183 datasets with the help of the semi-supervised learning idea, achieving accurate detection of nodules. Combined, these steps complete the process. Further experiments show that the proposed model can utilize inexpensive, easy-to-obtain coarse-grained annotations of pulmonary nodules under the guidance of semi-supervised learning and knowledge distillation strategies. This means using inaccurate or incomplete information in annotations, e.g., coordinates instead of pixel-level segmentation masks, to realize early recognition of lung nodules. The results further corroborate the model's efficacy, with SSLKD-UNet demonstrating superior delineation even in cases of complex anatomical structures and varying nodule sizes.
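The SSLKD-UNet abstract names a teacher–student distillation strategy but does not give its loss. A common generic choice is the temperature-softened soft-target loss: KL divergence from the teacher's distribution to the student's, scaled by T². It is sketched below under two assumptions. First, this is the objective being used, which is not stated in the source. Second, the inputs are per-pixel or per-region class logits; in a segmentation network, the value would be averaged over pixels.

```python
import math


def softmax(logits: list[float], temperature: float) -> list[float]:
    # Divide by the temperature to soften the distribution,
    # then subtract the max for numerical stability.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: list[float], teacher_logits: list[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2 (generic sketch)."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    return kl * temperature ** 2
```

The loss is zero when the student reproduces the teacher's logits and grows as their softened distributions diverge. The T² factor keeps gradient magnitudes comparable across temperatures.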
Language: English
Citations: 0
Scientific Reports, Journal Year: 2025, Volume and Issue: 15(1)
Published: April 13, 2025
Efficient image augmentation for hyperspectral satellite images requires the design of multiband processing models that can assist in improving classification performance across different application scenarios. Existing models either work on dynamic band fusions or use deep learning techniques to identify application-specific operations. Moreover, these models use static augmentations and do not take image-specific parameters into consideration, which limits their efficiency. To overcome these limitations, this text proposes a novel multimodal object-detection-based augmentation model for hyperspectral image sets. The proposed model initially applies a customized YOLO (You Only Look Once) object detection technique to each of the bands. This is followed by a context-specific layer that assists in identifying the detected object types. The identified objects are analyzed via a cascaded dual Generative Adversarial Network (cdGAN), which estimates an object-level importance metric used to evaluate each object's augmentation probability. Based on these importance levels, Elephant Herding Optimization (EHO) is used for band selection, and high-priority bands are selected for augmentation. Augmentations are controlled by a Firefly Optimizer (FFO), which selects efficient augmentations for different images. The augmented sets are updated via Incremental Learning (IL) for continuous improvement in accuracy. Due to these optimizations, the model improves accuracy by 8.5%, precision by 4.3%, and recall by 6.5%, while reducing delay by 2.9% compared with existing augmentation-based techniques.
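The hyperspectral abstract selects high-priority bands from object-level importance scores. The following sketch illustrates only the bookkeeping of that step, under two assumptions: band importance is the sum of its detected objects' importance scores, and a plain greedy top-k stands in for EHO. This is a swapped-in simplification, not Elephant Herding Optimization. The function names and the `(band, importance)` detection format are hypothetical.

```python
from collections import defaultdict


def band_importance(detections: list[tuple[int, float]]) -> dict[int, float]:
    """Aggregate hypothetical (band, object_importance) pairs into one score per band by summing."""
    scores: dict[int, float] = defaultdict(float)
    for band, importance in detections:
        scores[band] += importance
    return dict(scores)


def select_priority_bands(importance: dict[int, float], budget: int) -> list[int]:
    """Greedy top-k stand-in for EHO band selection; ties are broken by lower band index."""
    ranked = sorted(importance.items(), key=lambda item: (-item[1], item[0]))
    # Return the chosen bands in ascending order for readability.
    return sorted(band for band, _ in ranked[:budget])
```

A metaheuristic such as EHO would instead search over band subsets against an objective. The greedy version shows what the inputs and outputs of that step look like.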
Language: English
Citations: 0
Pattern Analysis and Applications, Journal Year: 2025, Volume and Issue: 28(2)
Published: April 29, 2025
Language: English
Citations: 0
Deleted Journal, Journal Year: 2025, Volume and Issue: 7(6)
Published: May 21, 2025
Language: English
Citations: 0
Sensors, Journal Year: 2025, Volume and Issue: 25(4), P. 1224
Published: Feb. 17, 2025
Gaze estimation is increasingly pivotal in applications spanning virtual reality, augmented reality, and driver monitoring systems, necessitating efficient yet accurate models for mobile deployment. Current methodologies often fall short, particularly in mobile settings, due to their extensive computational requirements or reliance on intricate pre-processing. Addressing these limitations, we present Mobile-GazeCapsNet, an innovative gaze estimation framework that harnesses the strengths of capsule networks and integrates them with lightweight architectures such as MobileNet v2, MobileOne, and ResNet-18. This framework not only eliminates the need for facial landmark detection but also significantly enhances real-time operability on mobile devices. Through the use of Self-Attention Routing, GazeCapsNet dynamically allocates resources, thereby improving both accuracy and efficiency. Our results demonstrate that GazeCapsNet achieves competitive performance by optimizing capsule networks through Self-Attention Routing (SAR), which replaces iterative routing with an attention-based mechanism. The results also show state-of-the-art (SOTA) performance on several benchmark datasets, including ETH-XGaze and Gaze360, with a mean angular error (MAE) reduction of up to 15% compared to existing models. Furthermore, the model processes a frame in 20 milliseconds while requiring 11.7 million parameters, making it exceptionally suitable for resource-constrained environments. These findings underscore the efficacy and practicality of GazeCapsNet and establish a new standard for gaze estimation technologies.
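The Mobile-GazeCapsNet abstract reports results as mean angular error (MAE). Conventionally, this is the angle between the predicted and ground-truth 3D gaze direction vectors, averaged over samples. The minimal sketch below assumes gaze is already given as 3D vectors; datasets that store pitch/yaw angles would need a conversion first, which is not shown. The function names are illustrative.

```python
import math


def angular_error_deg(pred: tuple[float, float, float], gt: tuple[float, float, float]) -> float:
    """Angle in degrees between a predicted and a ground-truth 3D gaze vector."""
    dot = sum(p * g for p, g in zip(pred, gt))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(g * g for g in gt))
    # Clamp so that floating-point rounding cannot push acos outside its [-1, 1] domain.
    cos_sim = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_sim))


def mean_angular_error(preds: list[tuple[float, float, float]],
                       gts: list[tuple[float, float, float]]) -> float:
    """Average the per-sample angular errors."""
    errors = [angular_error_deg(p, g) for p, g in zip(preds, gts)]
    return sum(errors) / len(errors)
```

The metric is scale-invariant: only the direction of each vector matters, so unnormalized predictions are handled correctly.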
Language: English
Citations: 0
Bioengineering, Journal Year: 2025, Volume and Issue: 12(4), P. 342
Published: March 26, 2025
Accurate acne grading is critical for optimizing therapeutic decisions, yet it remains challenging due to lesion ambiguity and subjective clinical assessments. This study proposes the Feature Feedback-Based Pseudo-Label Learning (FF-PLL) framework to address these limitations through three innovations:

(1) an adaptive feature feedback (AFF) architecture with iterative pseudo-label refinement to improve training robustness, enhance pseudo-label quality, and increase data diversity;
(2) all-facial skin segmentation (AFSS) to reduce background noise, enabling precise feature extraction; and
(3) an AcneAugment (AA) strategy to foster model generalization by introducing diverse lesion representations.

Experiments on the ACNE04 and ACNE-ECKH benchmark datasets demonstrate the superiority of the proposed framework, which achieves an accuracy of 87.33% on ACNE04 and 67.50% on ACNE-ECKH. Additionally, it attains a sensitivity of 87.31%, a specificity of 90.14%, and a Youden index (YI) of 77.45% on ACNE04. These advancements establish FF-PLL as a clinically viable solution for standardized acne assessment, bridging gaps between computational dermatology and practical healthcare needs.
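The reported ACNE04 figures are consistent with the standard definition of Youden's index, J = sensitivity + specificity − 1. Substituting the reported values gives 87.31% + 90.14% − 100% = 77.45%. The source does not say how per-class rates are aggregated across acne grades. The sketch below assumes binary (or one-vs-rest) confusion counts, and the helper names are illustrative.

```python
def rates_from_counts(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden's J statistic: sensitivity + specificity - 1 (ranges from -1 to 1)."""
    return sensitivity + specificity - 1.0
```

For example, `youden_index(0.8731, 0.9014)` reproduces the reported 0.7745, up to floating-point rounding.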
Language: English
Citations: 0
Scientific Reports, Journal Year: 2025, Volume and Issue: 15(1)
Published: April 4, 2025
Medical images are obtained from different optical scanners and devices to support in-body diagnosis and detection. Such scanned or acquired images are distorted by unnecessary noise present at the pixel level. A Trans-Pixelate Denoising Scheme (TPDS) is implemented to denoise these images and enhance diagnostic precision. The scheme is specific to CT images with high noise between pixelated and non-pixelated boundaries. Therefore, the boundary is detected in an input image, and a trans-pixel substitution is suggested using a two-layer neural network. The first layer is responsible for verifying substitution-based accuracy, and the second for identifying trans-pixels that improve accuracy. The outcome of the network is used to train on noisy inputs under either condition. The proposed TPDS improves precision, detection, … by 7.3%, 8.14%, and 13.05% across rates/boundaries. Under the same variant, the scheme reduces error and time by 11.15% and 9.03%, respectively.
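The TPDS abstract describes two stages: detect problematic pixels, then substitute them. The two-layer network itself is not specified. The sketch below illustrates only the detect-then-substitute pattern, using a classical stand-in in place of the TPDS network: flag a pixel when it deviates from the median of its 3×3 neighborhood by more than a threshold, and replace it with that median. The function name and the `threshold` parameter are hypothetical.

```python
from statistics import median


def substitute_outlier_pixels(image: list[list[float]], threshold: float) -> list[list[float]]:
    """Replace pixels deviating from their 3x3 neighborhood median by more than `threshold`.

    Classical median-substitution stand-in; this is NOT the two-layer TPDS network.
    """
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]  # copy so the input image is left untouched
    for r in range(rows):
        for c in range(cols):
            # Collect in-bounds neighbors, excluding the center pixel itself.
            neighbors = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            local = median(neighbors)
            # Detection step: flag the pixel; substitution step: replace it with the local median.
            if abs(image[r][c] - local) > threshold:
                out[r][c] = local
    return out
```

Detection compares each pixel with the original neighborhood, not the partially updated output. This keeps the result independent of scan order.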
Language: English
Citations: 0
Multimedia Systems, Journal Year: 2025, Volume and Issue: 31(3)
Published: April 17, 2025
Language: English
Citations: 0