Healthcare Technology Letters,
Journal year: 2024
Issue: 11(6), pp. 485-495
Published: Nov. 22, 2024
Abstract
This paper develops a method for cancer classification from microRNA data using a convolutional neural network (CNN)-based model optimized by a genetic algorithm. The CNN has performed well in various recognition and perception tasks. The method's main contribution lies in the union of two CNNs: its performance is boosted by the relationship between the two CNNs, which exchange knowledge between them. Besides, communication with small sizes reduces the need for a large size and consequently lowers computational time and memory usage while preserving high accuracy.
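The abstract does not describe the implementation; as a generic, hypothetical sketch of the overall idea of a genetic algorithm tuning a small 1-D CNN over gene-expression vectors (the dataset sizes below are taken from the abstract; everything else is an assumption, including the placeholder fitness function), one might write:

```python
# Illustrative sketch only: a genetic algorithm searching CNN hyperparameters
# for a 1-D gene-expression classifier. Data, search space and fitness are
# placeholders, not the authors' setup.
import random
import torch
import torch.nn as nn

N_GENES, N_CLASSES = 1046, 29          # sizes quoted in the abstract

def build_cnn(n_filters, kernel_size, hidden):
    return nn.Sequential(
        nn.Conv1d(1, n_filters, kernel_size, padding=kernel_size // 2),
        nn.ReLU(),
        nn.AdaptiveAvgPool1d(1),
        nn.Flatten(),
        nn.Linear(n_filters, hidden),
        nn.ReLU(),
        nn.Linear(hidden, N_CLASSES),
    )

def fitness(genome, x_val, y_val):
    """Placeholder fitness: validation accuracy of an (untrained) candidate."""
    model = build_cnn(*genome).eval()
    with torch.no_grad():
        pred = model(x_val).argmax(dim=1)
    return (pred == y_val).float().mean().item()

def evolve(x_val, y_val, pop_size=8, generations=5):
    space = {"n_filters": [8, 16, 32], "kernel_size": [3, 5, 7], "hidden": [32, 64]}
    pop = [tuple(random.choice(v) for v in space.values()) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda g: fitness(g, x_val, y_val), reverse=True)
        parents = scored[: pop_size // 2]                        # selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            child = tuple(random.choice(p) for p in zip(a, b))   # crossover
            if random.random() < 0.2:                            # mutation
                i = random.randrange(len(child))
                child = child[:i] + (random.choice(list(space.values())[i]),) + child[i + 1:]
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda g: fitness(g, x_val, y_val))

# Toy usage with synthetic data standing in for the microRNA profiles.
x = torch.randn(64, 1, N_GENES)
y = torch.randint(0, N_CLASSES, (64,))
print(evolve(x, y))
```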
The proposed method is tested on a dataset containing genomic information of 8129 patients across 29 different cancer types with 1046 gene expressions. The accuracy and the selected genes obtained by the approach are compared with those of 22 well-known classifiers on the real-world dataset. For each cancer type, the results are also ranked against 77 results reported in previous works. The method shows 100% accuracy in 24 out of the 29 classes and, in seven of the 29 cases, achieves accuracy that no classifier in other studies has reached. Performance is analysed with respect to several metrics.
International Journal of Intelligent Computing and Cybernetics,
Journal year: 2024
Issue: 18(1), pp. 133-152
Published: Nov. 13, 2024
Purpose
Vision transformer (ViT) detectors excel in processing natural images. However, when applied to remote sensing images (RSIs), ViT methods generally exhibit inferior accuracy compared to approaches based on convolutional neural networks (CNNs). Recently, researchers have proposed various structural optimization strategies to enhance the performance of ViT detectors, but progress has been insignificant. We contend that the frequent scarcity of RSI samples is the primary cause of this problem, and model modifications alone cannot solve it.
Design/methodology/approach
To address this, we introduce a Faster R-CNN-based approach, termed QAGA-Net, which significantly enhances recognition performance. Initially, we propose a novel quantitative augmentation learning (QAL) strategy for the sparse data distribution of RSIs. This is integrated as the QAL module, a plug-and-play component active exclusively during the model's training phase.
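The QAL strategy itself is not specified in this abstract; the following is only a minimal PyTorch sketch of the described "plug-and-play, training-only" pattern, with a placeholder noise-based augmentation standing in for whatever QAL actually computes:

```python
# Minimal sketch of a plug-and-play module that is active only during training
# and becomes an identity mapping at inference time. The augmentation here is
# a placeholder, not the paper's QAL computation.
import torch
import torch.nn as nn

class TrainingOnlyAugment(nn.Module):
    def __init__(self, noise_std: float = 0.1):
        super().__init__()
        self.noise_std = noise_std

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        if self.training:                       # active only in the training phase
            return feats + self.noise_std * torch.randn_like(feats)
        return feats                            # identity at inference

# Usage: wrap an existing backbone without touching its weights or structure.
backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), TrainingOnlyAugment(), nn.ReLU())
x = torch.randn(2, 3, 64, 64)
backbone.train();  _ = backbone(x)              # augmentation applied
backbone.eval();   _ = backbone(x)              # pass-through
```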
Subsequently, we enhanced the feature pyramid network (FPN) by introducing two efficient modules: a global attention (GA) module for long-range dependencies and multi-scale information fusion, and an efficient pooling (EP) module to optimize the capability to understand both high- and low-frequency information. Importantly, QAGA-Net has a compact size and achieves a balance between computational efficiency and accuracy.
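As a rough illustration of what a global attention block over one FPN level could look like (a generic single-head self-attention layer; this is an assumption, not the paper's GA or EP module):

```python
# Rough sketch: single-head self-attention applied to a flattened FPN level,
# giving each location a global receptive field.
import torch
import torch.nn as nn

class GlobalAttention2d(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads=1, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)          # (B, H*W, C)
        out, _ = self.attn(tokens, tokens, tokens)     # long-range dependencies
        tokens = self.norm(tokens + out)               # residual + norm
        return tokens.transpose(1, 2).reshape(b, c, h, w)

p3 = torch.randn(1, 256, 32, 32)                       # one FPN level
print(GlobalAttention2d(256)(p3).shape)                # torch.Size([1, 256, 32, 32])
```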
Findings
The approach was verified using different models as the detector's backbone. Extensive experiments on the NWPU-10 and DIOR20 datasets demonstrate superior performance over 23 other ViT-based or CNN-based methods in the literature. Specifically, QAGA-Net shows an increase in mAP of 2.1% and 2.6% on the challenging dataset over the top-ranked methods, respectively.
Originality/value
This paper highlights the impact of data scarcity on detection performance. It adopts a fundamentally data-driven approach: the QAL module. Additionally, two efficient modules are introduced into the FPN. More importantly, our approach has the potential to collaborate with other methods and does not require any modification.
The Photogrammetric Record,
Journal year: 2025
Issue: 40(189)
Published: Jan. 1, 2025
ABSTRACT
Existing Vision Transformer (ViT)-based object detection methods for remote sensing images (RSIs) face significant challenges due to the scarcity of RSI samples and over-reliance on enhancement strategies originally developed for natural images. This often leads to inconsistent data distributions between training and testing subsets, resulting in degraded model performance. In this study, we introduce an optimized data distribution learning (ODDL) strategy and develop a framework based on the Faster R-CNN architecture, named ODDL-Net. The ODDL strategy begins with an optimized augmentation (OA) technique, overcoming the limitations of conventional augmentation methods. Next, we propose an optimized mosaic algorithm (OMA), improving upon the shortcomings of traditional Mosaic techniques.
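For context, a bare-bones version of the classical Mosaic augmentation that OMA reportedly improves on might look like the sketch below (bounding-box handling and resizing omitted; this is not the paper's optimized variant):

```python
# Sketch of plain Mosaic augmentation (the classical baseline, not OMA).
import random
import numpy as np

def mosaic(images, out_size=640):
    """Stitch four images into one canvas around a random centre point."""
    canvas = np.zeros((out_size, out_size, 3), dtype=images[0].dtype)
    cx = random.randint(out_size // 4, 3 * out_size // 4)
    cy = random.randint(out_size // 4, 3 * out_size // 4)
    regions = [(0, 0, cx, cy), (cx, 0, out_size, cy),
               (0, cy, cx, out_size), (cx, cy, out_size, out_size)]
    for img, (x1, y1, x2, y2) in zip(images, regions):
        h, w = y2 - y1, x2 - x1
        crop = img[:h, :w]                      # naive crop; real code resizes
        canvas[y1:y1 + crop.shape[0], x1:x1 + crop.shape[1]] = crop
    return canvas

tiles = [np.full((640, 640, 3), v, dtype=np.uint8) for v in (50, 100, 150, 200)]
print(mosaic(tiles).shape)                      # (640, 640, 3)
```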
Additionally, a feature fusion regularization (FFR) method is proposed, addressing the inherent limitations of classic feature pyramid networks. These innovations are integrated into three modular, plug-and-play components, namely the OA, OMA, and FFR modules, ensuring that they can be seamlessly incorporated into existing frameworks without requiring modifications.
To evaluate the effectiveness of the proposed ODDL-Net, two variants with different ViT architectures, the Next ViT (NViT) small model and the Swin Transformer (SwinT) tiny model, are used as backbones. Experimental results on the NWPU10, DIOR20, MAR20, and GLH-Bridge datasets demonstrate that ODDL-Net can achieve impressive accuracy, surpassing 23 state-of-the-art detectors introduced since 2023. Specifically, ODDL-Net-NViT attained accuracies of 78.3% on the challenging DIOR20 dataset and 61.4% on a further benchmark dataset. Notably, this represents a substantial improvement of approximately 23% over the Faster R-CNN-ResNet50 baseline. In conclusion, this study demonstrates that ViTs are well suited for high-accuracy detection in RSIs. Furthermore, it provides a straightforward solution for building ViT-based detectors, offering a practical approach that requires little modification.
Scientific Reports,
Journal year: 2025
Issue: 15(1)
Published: March 18, 2025
Detecting small objects in complex remote sensing environments presents significant challenges, including insufficient extraction of local spatial information, rigid feature fusion, and limited global representation. In addition, improving model performance requires a delicate balance between improving accuracy and managing computational complexity. To address these challenges, we propose the SMA-YOLO algorithm. First, we introduce a Non-Semantic Sparse Attention (NSSA) mechanism in the backbone network, which efficiently extracts non-semantic features related to the task, thus improving the model's sensitivity to small objects. At the neck, we design a Bidirectional Multi-Branch Auxiliary Feature Pyramid Network (BIMA-FPN), which integrates high-level semantic information with low-level details, improving object detection while expanding multi-scale receptive fields.
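A generic PAN-style sketch of bidirectional (top-down then bottom-up) fusion across three pyramid levels, shown only to illustrate the pattern and not the actual BIMA-FPN design:

```python
# Illustrative bidirectional pyramid fusion (generic PAN-style, not BIMA-FPN).
import torch
import torch.nn as nn
import torch.nn.functional as F

class BidirectionalFusion(nn.Module):
    def __init__(self, channels: int = 128):
        super().__init__()
        self.smooth = nn.ModuleList(nn.Conv2d(channels, channels, 3, padding=1) for _ in range(3))

    def forward(self, c3, c4, c5):
        # top-down: inject high-level semantics into finer levels
        p5 = c5
        p4 = c4 + F.interpolate(p5, size=c4.shape[-2:], mode="nearest")
        p3 = c3 + F.interpolate(p4, size=c3.shape[-2:], mode="nearest")
        # bottom-up: push low-level detail back up the pyramid
        n3 = self.smooth[0](p3)
        n4 = self.smooth[1](p4 + F.max_pool2d(n3, 2))
        n5 = self.smooth[2](p5 + F.max_pool2d(n4, 2))
        return n3, n4, n5

feats = [torch.randn(1, 128, s, s) for s in (64, 32, 16)]
print([t.shape[-1] for t in BidirectionalFusion()(*feats)])   # [64, 32, 16]
```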
Finally, we incorporate a Channel-Space Fusion Adaptive Head (CSFA-Head), which adaptively handles consistency problems across different scales, further improving robustness in complex scenarios. Experimental results on the VisDrone2019 dataset show that SMA-YOLO achieves a 13% improvement in mAP compared to the baseline model, demonstrating exceptional adaptability in detection tasks for remote sensing imagery. These results provide valuable insights and new approaches to advance research in this area.
Annals of Emerging Technologies in Computing,
Journal year: 2024
Issue: 8(4), pp. 56-76
Published: Oct. 1, 2024
Vision Transformers (ViTs) have demonstrated exceptional accuracy in classifying remote sensing images (RSIs). However, existing knowledge distillation (KD) methods for transferring representations from a large ViT to a more compact Convolutional Neural Network (CNN) have proven ineffective. This limitation significantly hampers the remarkable generalization capability of ViTs during deployment due to their substantial size. Contrary to common beliefs, we argue that domain discrepancies, along with the inherent nature of RSIs, constrain the effectiveness and efficiency of cross-modal transfer. Consequently, we propose a novel Variance Consistency Learning (VCL) strategy to enhance the KD process, implemented through a plug-and-play module within the ViT-teaching-CNN pipeline.
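The abstract does not give the VCL formulation; one plausible reading, sketched below purely as an assumption, is a standard soft-label distillation loss plus a term matching per-channel feature variances between teacher and student:

```python
# Hedged sketch: distillation loss with an extra "variance consistency" term
# matching per-channel variance statistics of teacher and student features.
# This is an assumed formulation, not the paper's actual VCL loss.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, student_feat, teacher_feat,
            temperature: float = 4.0, alpha: float = 0.5):
    # classic soft-label distillation term
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    # illustrative variance-consistency term over channel statistics
    var_s = student_feat.flatten(2).var(dim=2)     # (B, C)
    var_t = teacher_feat.flatten(2).var(dim=2)
    var_term = F.mse_loss(var_s, var_t)
    return soft + alpha * var_term

s_logits, t_logits = torch.randn(4, 10), torch.randn(4, 10)
s_feat, t_feat = torch.randn(4, 64, 14, 14), torch.randn(4, 64, 14, 14)
print(kd_loss(s_logits, t_logits, s_feat, t_feat))
```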
We evaluated our student model, termed VCL-Net, on three datasets. The results reveal that VCL-Net exhibits superior accuracy and size compared to 33 other state-of-the-art methods published in the past few years. Specifically, it surpasses KD-based methods with a maximum improvement of 22% across the different datasets. Furthermore, visualization analysis of model activations reveals that VCL-Net has learned the long-range dependencies of features from its teacher. Moreover, ablation experiments suggest that our method reduced the time costs of the KD process by at least 75%. Therefore, this study offers an effective and efficient approach for cross-modal knowledge transfer when addressing domain discrepancies.
In the context of intelligent agriculture, tomato cultivation involves complex environments, where leaf occlusion and small disease areas significantly impede the performance of detection models. To address these challenges, this study proposes an efficient Tomato Disease Detection Network (E-TomatoDet), which enhances detection effectiveness by integrating and amplifying global and local feature perception capabilities. First, the CSWinTransformer (CSWinT) is integrated into the backbone network, substantially improving the capacity to capture disease features. Second, a Comprehensive Multi-Kernel Module (CMKM) is designed to effectively incorporate large, medium, and small kernel branches to learn multi-scale features of diseases.
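As a hedged illustration of the multi-kernel idea (parallel small/medium/large kernel branches fused by a 1x1 convolution; the kernel sizes are assumptions and this is not the actual CMKM):

```python
# Illustrative multi-kernel block: parallel branches with different kernel
# sizes, concatenated and fused by a 1x1 convolution.
import torch
import torch.nn as nn

class MultiKernelBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2) for k in (3, 7, 11)
        )
        self.fuse = nn.Conv2d(3 * channels, channels, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

x = torch.randn(1, 32, 56, 56)
print(MultiKernelBlock(32)(x).shape)    # torch.Size([1, 32, 56, 56])
```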
Moreover, a Local Feature Enhance Pyramid (LFEP) neck network is developed based on the CMKM module, which integrates features across different layers to acquire more comprehensive representations of diseases, thereby improving the detection of targets at various scales under complex backgrounds. Finally, the proposed model's performance was validated on two datasets. Notably, on one dataset, E-TomatoDet improved the mean Average Precision (mAP50) by 4.7% compared to the baseline model, reaching 97.2% and surpassing the advanced real-time detector YOLOv10s. This research provides an effective solution for efficiently detecting vegetable pest and disease issues.
Scientific Reports,
Journal year: 2025
Issue: 15(1)
Published: April 2, 2025
As an emerging State Space Model (SSM), the Mamba model draws inspiration from the architecture of Recurrent Neural Networks (RNNs), significantly enhancing the global receptive field and feature extraction capabilities of object detection models. Compared to traditional Convolutional Neural Networks (CNNs) and Transformers, Mamba demonstrates superior performance in handling complex scale variations and multi-view interference, making it particularly suitable for detection tasks in dynamic environments such as fire scenarios. To enhance visual detection technologies and provide a novel approach, this paper proposes an efficient detection algorithm based on YOLOv9 and introduces multiple key techniques to design a high-performance detector leveraging the attention mechanism. First, it presents a novel efficient attention mechanism, the EMA module. Unlike existing self-attention mechanisms, EMA integrates adaptive average pooling with the SSM module, eliminating the need for full-scale association computations across all positions. Instead, it performs dimensionality reduction on the input features through pooling and utilizes the state update mechanism of the SSM module to optimize the feature representation and information flow.
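The following toy module only illustrates the described combination of adaptive average pooling with a recurrent state update; it uses a GRU cell as a stand-in for the SSM scan and is not the paper's EMA module or a real Mamba implementation:

```python
# Toy stand-in: pooling for dimensionality reduction plus a sequential state
# update that produces channel re-weighting, instead of full pairwise attention.
import torch
import torch.nn as nn

class PooledStateAttention(nn.Module):
    def __init__(self, channels: int, pooled: int = 8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(pooled)       # dimensionality reduction
        self.cell = nn.GRUCell(channels, channels)     # simple state-update stand-in
        self.proj = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        tokens = self.pool(x).flatten(2).transpose(1, 2)     # (B, P*P, C)
        state = tokens.new_zeros(b, c)
        for t in range(tokens.size(1)):                      # sequential state updates
            state = self.cell(tokens[:, t], state)
        gate = torch.sigmoid(state).view(b, c, 1, 1)         # channel re-weighting
        return self.proj(x * gate)

x = torch.randn(2, 64, 40, 40)
print(PooledStateAttention(64)(x).shape)    # torch.Size([2, 64, 40, 40])
```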
Second, to address the limitations of such models in local modeling, this study incorporates ConvNeXtV2 into the backbone network, improving the model's ability to capture fine-grained details and thereby strengthening its overall feature extraction capability. Additionally, a non-monotonic focusing mechanism and a distance penalty strategy are employed to refine the loss function, leading to a substantial improvement in bounding box regression accuracy.
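As a reference point for what a distance penalty on box regression means, here is a standard DIoU-style loss; the paper's exact non-monotonic focusing and penalty terms are not specified in the abstract:

```python
# Hedged sketch of a distance-penalised IoU loss (DIoU-style), shown only to
# illustrate the general idea of a distance penalty on bounding-box regression.
import torch

def diou_loss(pred, target, eps: float = 1e-7):
    """Boxes as (x1, y1, x2, y2) tensors of shape (N, 4)."""
    ix1, iy1 = torch.max(pred[:, 0], target[:, 0]), torch.max(pred[:, 1], target[:, 1])
    ix2, iy2 = torch.min(pred[:, 2], target[:, 2]), torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    # distance penalty: squared centre distance over squared enclosing diagonal
    cpx, cpy = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    ctx, cty = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    ex1, ey1 = torch.min(pred[:, 0], target[:, 0]), torch.min(pred[:, 1], target[:, 1])
    ex2, ey2 = torch.max(pred[:, 2], target[:, 2]), torch.max(pred[:, 3], target[:, 3])
    center_dist = (cpx - ctx) ** 2 + (cpy - cty) ** 2
    diag = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + eps
    return (1 - iou + center_dist / diag).mean()

p = torch.tensor([[10., 10., 50., 50.]])
t = torch.tensor([[12., 8., 48., 52.]])
print(diou_loss(p, t))
```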
Experimental results demonstrate the effectiveness of the proposed method in fire detection tasks. The model achieves an FPS of 71, with [Formula: see text] of 91.0% on the large-scale dataset and 87.2% on the small-scale dataset. Compared with existing methods, the approach maintains high accuracy while exhibiting significant computational efficiency advantages.
IET Cyber-Systems and Robotics,
Journal year: 2025
Issue: 7(1)
Published: Jan. 1, 2025
ABSTRACT
Spike transformers cannot be pretrained due to objective factors such as the lack of datasets and memory constraints, which results in a significant performance gap compared to artificial neural networks (ANNs), thereby hindering their practical applicability. To address this issue, we propose a hybrid attention spike transformer that utilises self-attention with compound tokens and channel attention-based token processing to better capture the inductive biases of the data. We also add convolution to the patch splitting and feedforward networks, which not only provides local information but also leverages the translation invariance and locality of convolutions to help the model converge.
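A non-spiking sketch of a convolutional feedforward block, illustrating the general idea of inserting a depthwise convolution into a transformer FFN for locality (the actual spiking implementation and layer choices in the paper would differ):

```python
# Generic convolutional FFN pattern: linear expansion, depthwise 3x3 mixing on
# the reshaped token grid, then linear projection back. Not the paper's design.
import torch
import torch.nn as nn

class ConvFFN(nn.Module):
    def __init__(self, dim: int, hidden: int, hw: int):
        super().__init__()
        self.hw = hw                                   # spatial side of the token grid
        self.fc1 = nn.Linear(dim, hidden)
        self.dwconv = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)
        self.fc2 = nn.Linear(hidden, dim)
        self.act = nn.GELU()

    def forward(self, tokens):                         # tokens: (B, N, dim), N = hw*hw
        b, n, _ = tokens.shape
        x = self.act(self.fc1(tokens))
        x = x.transpose(1, 2).reshape(b, -1, self.hw, self.hw)
        x = self.act(self.dwconv(x))                   # local, translation-equivariant mixing
        x = x.flatten(2).transpose(1, 2)
        return self.fc2(x)

tokens = torch.randn(2, 49, 96)                        # 7x7 patch grid, dim 96
print(ConvFFN(96, 192, hw=7)(tokens).shape)            # torch.Size([2, 49, 96])
```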
Experiments on static and neuromorphic datasets demonstrate that our method achieves state-of-the-art performance in the spiking neural network (SNN) field. Notably, we achieve a top-1 accuracy of 80.59% on CIFAR-100 with 4 time steps. As far as we know, this is the first exploration of multiattention fusion, achieving outstanding effectiveness.