Cited by A related convolutional neural network for cancer diagnosis using microRNA data classification

ViTs as backbones: Leveraging vision transformers for feature extraction

Omar Elharrouss, Yassine Himeur, Yasir Mahmood, et al.

Information Fusion, Year: 2025, Volume: unknown, pp. 102951-102951

Published: Jan. 1, 2025

Language: English

QAGA-Net: enhanced vision transformer-based object detection for remote sensing images

Huaxiang Song, Haidong Xia, Wenhui Wang, et al.

International Journal of Intelligent Computing and Cybernetics, Year: 2024, Volume: 18(1), pp. 133-152

Published: Nov. 13, 2024

Purpose: Vision transformer (ViT) detectors excel in processing natural images. However, when processing remote sensing images (RSIs), ViT methods generally exhibit inferior accuracy compared to approaches based on convolutional neural networks (CNNs). Recently, researchers have proposed various structural optimization strategies to enhance the performance of ViT detectors, but progress has been insignificant. We contend that the frequent scarcity of RSI samples is the primary cause of this problem, and model modifications alone cannot solve it. Design/methodology/approach: To address this, we introduce a Faster R-CNN-based approach, termed QAGA-Net, which significantly enhances the performance of ViT detectors in RSI recognition. Initially, we propose a novel quantitative augmentation learning (QAL) strategy to address the sparse data distribution in RSIs. This strategy is integrated as the QAL module, a plug-and-play component active exclusively during the model's training phase. Subsequently, we enhanced the feature pyramid network (FPN) by introducing two efficient modules: a global attention (GA) module to model long-range feature dependencies and multi-scale information fusion, and an efficient pooling (EP) module to optimize the model's capability to understand both high- and low-frequency information. Importantly, QAGA-Net has a compact model size and achieves a balance between computational efficiency and accuracy. Findings: We verified the performance of QAGA-Net using different ViT models as the detector's backbone. Extensive experiments on the NWPU-10 and DIOR20 datasets demonstrate that QAGA-Net achieves superior accuracy compared to 23 other ViT or CNN methods in the literature. Specifically, QAGA-Net shows an increase in mAP of 2.1% or 2.6% on the challenging DIOR20 dataset compared to the top-ranked CNN or ViT detectors, respectively. Originality/value: This paper highlights the impact of sparse data distribution on ViT detection performance. To address this, we introduce a fundamentally data-driven approach: the QAL module. Additionally, we introduced two efficient modules to enhance the performance of the FPN. More importantly, our strategy has the potential to collaborate with other ViT detectors, as the proposed method does not require any

Language: English

Applications of knowledge distillation in remote sensing: A survey

Yassine Himeur, Nour Aburaed, Omar Elharrouss, et al.

Information Fusion, Year: 2024, Volume: unknown, pp. 102742-102742

Published: Oct. 1, 2024

Language: English

Variance Consistency Learning: Enhancing Cross-Modal Knowledge Distillation for Remote Sensing Image Classification

Huaxiang Song, Yong Zhou, Wanbo Liu, et al.

Annals of Emerging Technologies in Computing, Year: 2024, Volume: 8(4), pp. 56-76

Published: Oct. 1, 2024

Vision Transformers (ViTs) have demonstrated exceptional accuracy in classifying remote sensing images (RSIs). However, existing knowledge distillation (KD) methods for transferring representations from a large ViT to a more compact Convolutional Neural Network (CNN) have proven ineffective. This limitation significantly hampers the remarkable generalization capability of ViTs during deployment due to their substantial size. Contrary to common beliefs, we argue that domain discrepancies, along with the inherent nature of RSIs, constrain the effectiveness and efficiency of cross-modal knowledge transfer. Consequently, we propose a novel Variance Consistency Learning (VCL) strategy to enhance the efficiency of the cross-modal KD process, implemented through a plug-and-play module within a ViT-teaching-CNN pipeline. We evaluated our student model, termed VCL-Net, on three RSI datasets. The results reveal that VCL-Net exhibits superior accuracy and a more compact size compared to 33 other state-of-the-art methods published in the past three years. Specifically, VCL-Net surpasses other KD-based methods with a maximum accuracy improvement of 22% across different datasets. Furthermore, the visualization analysis of model activations reveals that VCL-Net has learned long-range dependencies of features from the ViT teacher. Moreover, the ablation experiments suggest that our method has reduced the time costs of the KD process by at least 75%. Therefore, our study offers a more effective and efficient approach for cross-modal knowledge transfer when addressing domain discrepancies.

Language: English

Optimized Data Distribution Learning for Enhancing Vision Transformer‐Based Object Detection in Remote Sensing Images

Huaxiang Song, Junping Xie, Yunyang Wang, et al.

The Photogrammetric Record, Year: 2025, Volume: 40(189)

Published: Jan. 1, 2025

ABSTRACT: Existing Vision Transformer (ViT)-based object detection methods for remote sensing images (RSIs) face significant challenges due to the scarcity of RSI samples and over-reliance on enhancement strategies originally developed for natural images. This often leads to inconsistent data distributions between the training and testing subsets, resulting in degraded model performance. In this study, we introduce an optimized data distribution learning (ODDL) strategy and develop an object detection framework based on the Faster R-CNN architecture, named ODDL-Net. The ODDL strategy begins with an optimized augmentation (OA) technique, overcoming the limitations of conventional augmentation methods. Next, we propose an optimized mosaic algorithm (OMA), improving upon the shortcomings of traditional Mosaic techniques. Additionally, we introduce a feature fusion regularization (FFR) method, addressing the inherent limitations of classic feature pyramid networks. These innovations are integrated into three modular, plug-and-play components (namely, the OA, OMA, and FFR modules), ensuring that the ODDL strategy can be seamlessly incorporated into existing detection frameworks without requiring modifications. To evaluate the effectiveness of the proposed ODDL-Net, we develop two variants based on different ViT architectures: the Next ViT (NViT) small model and the Swin Transformer (SwinT) tiny model, both used as backbones. Experimental results on the NWPU10, DIOR20, MAR20, and GLH-Bridge datasets demonstrate that the ODDL-Net variants achieve impressive accuracy, surpassing 23 state-of-the-art methods introduced since 2023. Specifically, ODDL-Net-NViT attained accuracies of 78.3% on the challenging DIOR20 dataset and 61.4% on a second dataset. Notably, this represents a substantial improvement of approximately 23% over the Faster R-CNN-ResNet50 baseline. In conclusion, this study demonstrates that ViTs are well suited for high-accuracy object detection in RSIs. Furthermore, it provides a straightforward solution for building ViT-based detectors, offering a practical approach that requires little model modification.

Language: English

A multi-scale small object detection algorithm SMA-YOLO for UAV remote sensing images

Shilong Zhou, Haijin Zhou, Lei Qian, et al.

Scientific Reports, Year: 2025, Volume: 15(1)

Published: March 18, 2025

Detecting small objects in complex remote sensing environments presents significant challenges, including insufficient extraction of local spatial information, rigid feature fusion, and limited global feature representation. In addition, improving model performance requires a delicate balance between improving accuracy and managing computational complexity. To address these challenges, we propose the SMA-YOLO algorithm. First, we introduce a Non-Semantic Sparse Attention (NSSA) mechanism in the backbone network, which efficiently extracts non-semantic features related to the task, thus improving the model's sensitivity to small objects. In the model's neck (the "throat"), we design a Bidirectional Multi-Branch Auxiliary Feature Pyramid Network (BIMA-FPN), which integrates high-level semantic information with low-level spatial details, improving small object detection while expanding multi-scale receptive fields. Finally, we incorporate a Channel-Space Feature Fusion Adaptive Head (CSFA-Head), which adaptively handles feature-consistency problems across different scales, further improving the model's robustness in complex scenarios. Experimental results on the VisDrone2019 dataset show that SMA-YOLO achieves a 13% improvement in mAP compared to the baseline model, demonstrating exceptional adaptability in small object detection tasks for remote sensing imagery. These results provide valuable insights and new approaches to advance research in this area.

Language: English

Hybrid Attention Spike Transformer

Xiongfei Fan, Yujiao Zhang, Yu Zhang, et al.

IET Cyber-Systems and Robotics, Year: 2025, Volume: 7(1)

Published: Jan. 1, 2025

ABSTRACT: Spike transformers cannot be pretrained due to objective factors such as a lack of datasets and memory constraints, which results in a significant performance gap compared to pretrained artificial neural networks (ANNs), thereby hindering their practical applicability. To address this issue, we propose a hybrid attention spike transformer that utilises self-attention with compound tokens and channel attention-based token processing to better capture the inductive biases of the data. We also add convolution in patch splitting and feedforward networks, which not only provides local information but also leverages the translation invariance and locality of convolutions to help the model converge. Experiments on static and neuromorphic datasets demonstrate that our method achieves state-of-the-art performance in the spiking neural networks (SNNs) field. Notably, we achieve a top-1 accuracy of 80.59% on CIFAR-100 with only 4 time steps. As far as we know, it is the first exploration of a spike transformer with multiattention fusion, achieving outstanding effectiveness.

Language: English

Class-adaptive attention transfer and multilevel entropy decoupled knowledge distillation

X. L. Lu, Zhanquan Sun, Chuntao Zou, et al.

Multimedia Systems, Year: 2025, Volume: 31(3)

Published: April 15, 2025

Language: English

Ensemble learning and EigenCAM-based feature analysis for improving the performance and explainability of object detection in drone imagery

Gargi Joshi, Amey Joshi, Manish Shetty, et al.

Deleted Journal, Year: 2025, Volume: 7(5)

Published: April 20, 2025

Language: English

Adaptive adam-based optimizers using second-order weight decoupling and gradient-aware weight decay for vision transformer

B Sai, Snehasis Mukherjee, Shiv Ram Dubey, et al.

Machine Vision and Applications, Year: 2025, Volume: 36(3)

Published: April 22, 2025

Language: English
