Cited by FORT-RAJ: a fisheye-optimized deep learning model for real-time trajectory prediction

MIRA-CAP: Memory-Integrated Retrieval-Augmented Captioning for State-of-the-Art Image and Video Captioning

Sabina Umirzakova, Shakhnoza Muksimova, Sevara Mardieva, et al.

Sensors, Journal Year: 2024, Volume and Issue: 24(24), P. 8013 - 8013

Published: Dec. 15, 2024

Generating accurate and contextually rich captions for images and videos is essential for various applications, from assistive technology to content recommendation. However, challenges such as maintaining temporal coherence in videos, reducing noise in large-scale datasets, and enabling real-time captioning remain significant. We introduce MIRA-CAP (Memory-Integrated Retrieval-Augmented Captioning), a novel framework designed to address these issues through three core innovations: a cross-modal memory bank, adaptive dataset pruning, and a streaming decoder. The cross-modal memory bank retrieves relevant context from prior frames, enhancing temporal consistency and narrative flow. The adaptive dataset pruning mechanism filters noisy data, which improves alignment and generalization. The streaming decoder allows real-time captioning by generating captions incrementally, without requiring access to the full video sequence. Evaluated across standard datasets like MS COCO, YouCook2, ActivityNet, and Flickr30k, MIRA-CAP achieves state-of-the-art results, with high scores on CIDEr, SPICE, and Polos metrics, underscoring its alignment with human judgment and its effectiveness in handling complex visual and temporal structures. This work demonstrates that MIRA-CAP offers a robust, scalable solution for both static and dynamic captioning tasks, advancing the capabilities of vision–language models in real-world applications.

Language: English

Do-DETR: enhancing DETR training convergence with integrated denoising and RoI mechanism

Hong Liang, Yu Li, Qian Zhang, et al.

Multimedia Systems, Journal Year: 2025, Volume and Issue: 31(2)

Published: March 24, 2025

Language: English

A semisupervised knowledge distillation model for lung nodule segmentation

Wenjuan Liu, Limin Zhang, Xiangrui Li, et al.

Scientific Reports, Journal Year: 2025, Volume and Issue: 15(1)

Published: March 27, 2025

Early screening of lung nodules is mainly done manually by reading the patient's CT. This approach is time-consuming, laborious, and prone to missed detection and misdiagnosis. Current methods for lung nodule detection face limitations such as the high cost of obtaining large-scale, high-quality annotated datasets and poor robustness when dealing with data of varying quality. The challenges include accurately detecting small and irregular nodules, as well as ensuring model generalization across different data sources. Therefore, this paper proposes a lung nodule detection model based on semi-supervised learning and knowledge distillation (SSLKD-UNet). In this paper, a feature encoder with a hybrid architecture of CNN and Transformer is designed to fully extract the features of lung nodule images. At the same time, a training strategy is designed in which a teacher model instructs a student model to learn the more relevant regions in lung nodule CT images. Finally, rough annotations of the LUNA16 and LC183 datasets are applied, with the help of the semi-supervised learning idea, to complete accurate detection of lung nodules. Combined [...] complete [...] process. Further experiments show that the proposed model can utilize a small amount of inexpensive and easy-to-obtain coarse-grained annotations of pulmonary nodules under the guidance of the semi-supervised learning and knowledge distillation training strategies. This means that inaccurate or incomplete annotations, e.g., nodule coordinates instead of pixel-level segmentation masks, can be used to realize early recognition of lung nodules. The segmentation results further corroborate the model's efficacy, with SSLKD-UNet demonstrating superior delineation of lung nodules, even in cases with complex anatomical structures and varying nodule sizes.

Language: English

MODAMS: design of a multimodal object-detection based augmentation model for satellite image sets

Rahul Malik, Rachit Garg, Korhan Cengiz, et al.

Scientific Reports, Journal Year: 2025, Volume and Issue: 15(1)

Published: April 13, 2025

Efficient image augmentation for hyperspectral satellite images requires the design of multiband processing models that can assist in improving classification performance for different application scenarios. Existing models either work on dynamic band fusions, or use deep learning techniques for identification of application-specific augmentation operations. Moreover, these models use static augmentations, and do not take into consideration image-specific parameters, which limits their efficiency levels. To overcome these limitations, this text proposes a novel multimodal object-detection based augmentation model for satellite image sets. The proposed model initially applies a customized YOLO (You Only Look Once) object detection technique for each of the image bands. This is followed by a context-specific layer that assists in classification of the detected object types. The identified objects are analyzed via a cascaded dual Generative Adversarial Network (cdGAN), which estimates an object-level importance metric that is used to evaluate its augmentation probability. Based on these probability levels, an Elephant Herding Optimization (EHO) based band-selection model is used to identify high priority bands for augmentation purposes. Augmentations are controlled via a Firefly Optimizer (FFO) for efficient augmentation of the images. The augmented image sets are updated via Incremental Learning (IL) for continuous improvement of accuracy levels. Due to these optimizations, the proposed model is able to improve classification accuracy by 8.5%, precision by 4.3%, and recall by 6.5%, while reducing delay by 2.9% when compared with existing augmentation-based classification techniques.

Language: English

Enhancing out-of-distribution learning in computer vision through dominant feature masking

Artem Pilzak, Jean‐Philippe Thivierge

Pattern Analysis and Applications, Journal Year: 2025, Volume and Issue: 28(2)

Published: April 29, 2025

Language: English

Echo-DND: a dual noise diffusion model for robust and precise left ventricle segmentation in echocardiography

Md. Abdur Rahman, Keerthiveena Balraj, Manojkumar Ramteke, et al.

Deleted Journal, Journal Year: 2025, Volume and Issue: 7(6)

Published: May 21, 2025

Language: English

GazeCapsNet: A Lightweight Gaze Estimation Framework

Shakhnoza Muksimova, Yakhyokhuja Valikhujaev, Sabina Umirzakova, et al.

Sensors, Journal Year: 2025, Volume and Issue: 25(4), P. 1224 - 1224

Published: Feb. 17, 2025

Gaze estimation is increasingly pivotal in applications spanning virtual reality, augmented reality, and driver monitoring systems, necessitating efficient yet accurate models for mobile deployment. Current methodologies often fall short, particularly in mobile settings, due to their extensive computational requirements or reliance on intricate pre-processing. Addressing these limitations, we present Mobile-GazeCapsNet, an innovative gaze estimation framework that harnesses the strengths of capsule networks and integrates them with lightweight architectures such as MobileNet v2, MobileOne, and ResNet-18. This framework not only eliminates the need for facial landmark detection but also significantly enhances real-time operability on mobile devices. Through the use of Self-Attention Routing, GazeCapsNet dynamically allocates computational resources, thereby improving both accuracy and efficiency. Our results demonstrate that GazeCapsNet achieves competitive performance by optimizing capsule networks through Self-Attention Routing (SAR), which replaces iterative routing with a lightweight attention-based mechanism. The results show state-of-the-art (SOTA) performance on several benchmark datasets, including ETH-XGaze and Gaze360, achieving a mean angular error (MAE) reduction of up to 15% compared to existing models. Furthermore, the model maintains a real-time processing capability of 20 milliseconds per frame while requiring only 11.7 million parameters, making it exceptionally suitable for resource-constrained environments. These findings underscore the efficacy and practicality of GazeCapsNet and establish a new standard for mobile gaze estimation technologies.
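The mean angular error quoted in this abstract is not defined in the listing. As a minimal sketch, assuming the common definition (the angle between predicted and ground-truth 3D gaze vectors, averaged over samples), it can be computed as below. The function names and the example vectors are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence

Vec3 = Sequence[float]


def angular_error_deg(pred: Vec3, target: Vec3) -> float:
    """Angle in degrees between a predicted and a ground-truth gaze vector."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(t * t for t in target))
    if norm == 0.0:
        raise ValueError("gaze vectors must be non-zero")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))


def mean_angular_error_deg(preds: Sequence[Vec3], targets: Sequence[Vec3]) -> float:
    """Average angular error over paired predictions and targets."""
    if len(preds) != len(targets) or len(preds) == 0:
        raise ValueError("preds and targets must be non-empty and the same length")
    total = sum(angular_error_deg(p, t) for p, t in zip(preds, targets))
    return total / len(preds)


# Hypothetical example: one exact prediction and one that is off by a right angle.
preds = [(0.0, 0.0, -1.0), (1.0, 0.0, 0.0)]
targets = [(0.0, 0.0, -1.0), (0.0, 1.0, 0.0)]
print(mean_angular_error_deg(preds, targets))  # about 45 degrees
```

A relative reduction such as the "up to 15%" above would then compare this value between a baseline model and the proposed one on the same test set.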

Language: English

Feature Feedback-Based Pseudo-Label Learning for Multi-Standards in Clinical Acne Grading

Yung-Yao Chen, Hung-Tse Chan, Hsiao-Chi Wang, et al.

Bioengineering, Journal Year: 2025, Volume and Issue: 12(4), P. 342 - 342

Published: March 26, 2025

Accurate acne grading is critical in optimizing therapeutic decisions yet remains challenging due to lesion ambiguity and subjective clinical assessments. This study proposes the Feature Feedback-Based Pseudo-Label Learning (FF-PLL) framework to address these limitations through three innovations: (1) an acne feature feedback (AFF) architecture with iterative pseudo-label refinement to improve training robustness, enhance pseudo-label quality, and increase feature diversity; (2) all-facial skin segmentation (AFSS) to reduce background noise, enabling precise feature extraction; and (3) an AcneAugment (AA) strategy to foster model generalization by introducing diverse lesion representations. Experiments on the ACNE04 and ACNE-ECKH benchmark datasets demonstrate the superiority of the proposed framework, achieving an accuracy of 87.33% on ACNE04 and 67.50% on ACNE-ECKH. Additionally, the framework attains a sensitivity of 87.31%, a specificity of 90.14%, and a Youden index (YI) of 77.45% on ACNE04. These advancements establish FF-PLL as a clinically viable solution for standardized acne assessment, bridging the gaps between computational dermatology and practical healthcare needs.
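The Youden index reported in this abstract follows from the sensitivity and specificity figures, assuming the standard definition J = sensitivity + specificity - 1. The short check below is only an arithmetic illustration of that definition using the numbers quoted above; the function name is hypothetical.

```python
def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden's J statistic for inputs given as fractions in [0, 1]."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity <= 1.0):
        raise ValueError("sensitivity and specificity must be fractions in [0, 1]")
    return sensitivity + specificity - 1.0


# ACNE04 values from the abstract, converted from percentages to fractions.
j = youden_index(0.8731, 0.9014)
print(f"Youden index: {j * 100:.2f}%")  # prints "Youden index: 77.45%"
```

The result matches the 77.45% stated in the abstract, so the three reported figures are mutually consistent.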

Language: English

Trans pixelate substitution scheme for denoising computed tomography images towards high diagnosis accuracy

Fengjun Hu, Hanjie Gu, Fan Wu, et al.

Scientific Reports, Journal Year: 2025, Volume and Issue: 15(1)

Published: April 4, 2025

Medical images are obtained from different optical scanners and devices to provide in-body diagnosis and detection. Such scanned/acquired images are tampered with or distorted by the unnecessary noise present in the pixel levels. A Trans-Pixelate Denoising Scheme (TPDS) is implemented to denoise these pictures and enhance the diagnosis's precision. This scheme is specific for CT images with high noise between pixelated and non-pixelated boundaries. Therefore, the boundary detected from an input image is suggested for a trans-pixel substitution using a two-layer neural network. The first layer is responsible for verifying the substitution-based accuracy, and the second for identifying the trans-pixels that improve the accuracy. The outcome of the neural network is used to train the noisy inputs under either of the conditions. The proposed TPDS improves precision, [...] detection by 7.3%, 8.14%, and 13.05% [...] rates/boundaries. Under the same variant, this scheme reduces the error and time by 11.15% and 9.03%, respectively.

Language: English

Mixed multi-scale residual attention networks for single image super-resolution reconstruction

Liyun Zhang, Ming Zhang, Fei Fan, et al.

Multimedia Systems, Journal Year: 2025, Volume and Issue: 31(3)

Published: April 17, 2025

Language: English
