M3SOT: Multi-Frame, Multi-Field, Multi-Space 3D Single Object Tracking
Jiaming Liu, Yue Wu, Maoguo Gong et al.

Proceedings of the AAAI Conference on Artificial Intelligence, Journal Year: 2024, Volume and Issue: 38(4), P. 3630 - 3638

Published: March 24, 2024

3D Single Object Tracking (SOT) stands at the forefront of computer vision tasks, proving essential for applications like autonomous driving. Sparse and occluded data in scene point clouds introduce variations in the appearance of tracked objects, adding complexity to the task. In this research, we unveil M3SOT, a novel SOT framework, which synergizes multiple input frames (template sets), multiple receptive fields (continuous contexts), and multiple solution spaces (distinct tasks) in ONE model. Remarkably, M3SOT pioneers in modeling temporality, contexts, and tasks directly from point clouds, revisiting the perspective on the key factors influencing SOT. To this end, we design a transformer-based network centered on the point cloud targets in the search area, aggregating diverse contextual representations and propagating target cues by employing historical frames. As M3SOT spans varied processing perspectives, we have streamlined the network, trimming its depth and optimizing its structure, to ensure lightweight and efficient deployment in applications. We posit that, backed by this practical construction, M3SOT sidesteps the need for complex frameworks and auxiliary components to deliver sterling results. Extensive experiments on benchmarks such as KITTI, nuScenes, and Waymo Open Dataset demonstrate that M3SOT achieves state-of-the-art performance at 38 FPS. Our code and models are available at https://github.com/ywu0912/TeamCode.git.

Language: English

Self-Supervised Intra-Modal and Cross-Modal Contrastive Learning for Point Cloud Understanding
Yue Wu, Jiaming Liu, Maoguo Gong et al.

IEEE Transactions on Multimedia, Journal Year: 2023, Volume and Issue: 26, P. 1626 - 1638

Published: June 9, 2023

Learning effective representations from unlabeled data is a challenging task for point cloud understanding. As the human visual system can map concepts learned from 2D images to the 3D world, and inspired by recent multimodal research, we introduce the image modality to joint learning. Based on the properties of point clouds and images, we propose CrossNet, a comprehensive intra-modal and cross-modal contrastive learning method that learns effective representations. The proposed method achieves its 3D-3D and 3D-2D correspondence objectives by maximizing the consistency of point clouds with their augmented versions, and with the corresponding rendered images, in an invariant space. We further distinguish the rendered images into RGB and grayscale images to extract color and geometric features, respectively. These training objectives combine feature correspondences between modalities to form rich training signals from point clouds and images. Our CrossNet is simple: we add a feature extraction module and a projection head to the two branches, respectively, and train the backbone network in a self-supervised manner. After being pretrained, only fine-tuning is required to directly predict results for downstream tasks. Experiments on multiple benchmarks demonstrate improved classification and segmentation results, and show that the learned representations can be generalized across domains.
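The intra-modal (3D-3D) and cross-modal (3D-2D) objectives described above are contrastive consistency losses over paired feature sets. Below is a minimal NumPy sketch of an InfoNCE-style loss applied to both pairings; the function names, feature dimensions, and random features are illustrative assumptions, not CrossNet's actual implementation:

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.07):
    """InfoNCE contrastive loss between L2-normalized feature rows.

    anchors[i] and positives[i] form a positive pair; every other row of
    `positives` serves as a negative for anchors[i].
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))           # -log p(positive | anchor)

rng = np.random.default_rng(0)
point_feat = rng.normal(size=(8, 32))  # features of 8 point clouds (toy dims)
aug_feat = point_feat + 0.01 * rng.normal(size=(8, 32))  # augmented views
img_feat = rng.normal(size=(8, 32))    # rendered-image features (unaligned here)

intra = info_nce(point_feat, aug_feat)  # 3D-3D objective
cross = info_nce(point_feat, img_feat)  # 3D-2D objective
loss = intra + cross
```

With well-aligned pairs (the augmented views) the loss is near zero; with unaligned random features it approaches log N, which is what training drives down.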

Language: English

Citations: 28

Learning Discriminative Features via Multi-Hierarchical Mutual Information for Unsupervised Point Cloud Registration
Yongzhe Yuan, Yue Wu, Mingyu Yue et al.

IEEE Transactions on Circuits and Systems for Video Technology, Journal Year: 2024, Volume and Issue: 34(9), P. 8343 - 8354

Published: March 19, 2024

Extracting discriminative representations is the key step for correspondence-free point cloud registration. The extracted representations are required to be sensitive to the transformation, which demands reducing the influence of redundant information irrelevant to the transformation. However, recently proposed methods ignore this crucial property, resulting in a limited ability to represent the point cloud. In addition, research on correspondence-free registration has stagnated in recent years. In this paper, we try to relieve the feature-redundancy issue from a new perspective. Specifically, our method comprises two stages: a feature extraction stage and a rigid body transformation estimation stage. In the feature extraction stage, we aim to maximize the multi-hierarchical mutual information between different hierarchical features, which can provide less redundant features to regress the transformation parameters in the next stage. We utilize a dual quaternion to estimate the parameters, which combines rotation and translation simultaneously within a unified framework and obtains a compact model. The model is trained in an unsupervised manner on the ModelNet40 dataset. The experimental results illustrate that our method achieves higher accuracy and robustness compared with existing methods.
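The dual-quaternion parameterization mentioned above packs rotation and translation into a single 8-number object: a unit real quaternion q_r for rotation and a dual part q_d with translation t = 2 q_d q_r*. Here is a minimal sketch of recovering R and t from the two parts; the function names are illustrative, not the paper's code:

```python
import numpy as np

def quat_mul(p, q):
    """Hamilton product of quaternions given as (w, x, y, z)."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([
        pw*qw - px*qx - py*qy - pz*qz,
        pw*qx + px*qw + py*qz - pz*qy,
        pw*qy - px*qz + py*qw + pz*qx,
        pw*qz + px*qy - py*qx + pz*qw,
    ])

def dual_quat_to_rt(q_real, q_dual):
    """Recover rotation matrix R and translation t from a dual quaternion."""
    q_real = q_real / np.linalg.norm(q_real)  # enforce a unit real part
    w, x, y, z = q_real
    R = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])
    conj = q_real * np.array([1.0, -1.0, -1.0, -1.0])
    t = 2.0 * quat_mul(q_dual, conj)[1:]      # t = 2 * q_d * conj(q_r)
    return R, t

# Example: 90-degree rotation about z, then translation by (1, 2, 3).
theta = np.pi / 2
q_r = np.array([np.cos(theta/2), 0.0, 0.0, np.sin(theta/2)])
t_true = np.array([1.0, 2.0, 3.0])
q_d = 0.5 * quat_mul(np.concatenate([[0.0], t_true]), q_r)  # q_d = t*q_r/2
R, t = dual_quat_to_rt(q_r, q_d)
```

A network regressing the 8 dual-quaternion numbers therefore predicts rotation and translation jointly, which is the compactness the abstract refers to.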

Language: English

Citations: 12

Joint Semantic Segmentation using representations of LiDAR point clouds and camera images

Yue Wu, Jiaming Liu, Maoguo Gong et al.

Information Fusion, Journal Year: 2024, Volume and Issue: 108, P. 102370 - 102370

Published: March 20, 2024

Language: English

Citations: 9

Transformer-based multimodal change detection with multitask consistency constraints
Biyuan Liu, Huaixin Chen, Kun Li et al.

Information Fusion, Journal Year: 2024, Volume and Issue: 108, P. 102358 - 102358

Published: March 24, 2024

Change detection plays a fundamental role in Earth observation for analyzing temporal changes. However, recent studies have largely neglected the utilization of multimodal data, which presents significant practical and technical advantages compared to single-modal approaches. This research focuses on leveraging a pre-event digital surface model (DSM) and post-event aerial images captured at different times for detecting change beyond 2D. We observe that current methods struggle with multitask conflicts between the semantic and height change detection tasks. To address this challenge, we propose an efficient Transformer-based network that learns a shared representation of the cross-dimensional inputs through cross-attention, and adopts a multitask consistency constraint to establish the relationship between the tasks. Initially, pseudo-changes are derived by thresholding the height change. Subsequently, the L2 distance between the two predictions within their overlapping regions is minimized. This explicitly endows the height change detection (regression task) and the semantic change detection (classification task) with consistency. A DSM-to-image multimodal dataset encompassing three cities in the Netherlands was constructed. It lays a new foundation for beyond-2D change detection from cross-dimensional inputs. Compared to five state-of-the-art change detection methods, our model demonstrates consistent multitask superiority in terms of semantic and height change detection. Furthermore, the consistency strategy can be seamlessly adapted to the other methods, yielding promising improvements.
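The consistency constraint described above (threshold the height regression into a pseudo-change mask, then minimize the L2 distance to the classifier's output where the two overlap) can be sketched as follows. The threshold value, array shapes, and function name are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def consistency_loss(height_change, semantic_prob, height_thresh=2.0):
    """Multitask consistency between height-change regression and
    semantic-change classification (minimal sketch).

    height_change : (H, W) predicted elevation difference in meters
    semantic_prob : (H, W) predicted probability of semantic change
    """
    # Pseudo-change mask derived by thresholding the height regression.
    pseudo_change = (np.abs(height_change) > height_thresh).astype(float)
    # Overlapping region: pixels where both tasks indicate a change.
    overlap = (pseudo_change > 0) & (semantic_prob > 0.5)
    if not overlap.any():
        return 0.0
    # L2 distance between the two predictions inside the overlap.
    diff = pseudo_change[overlap] - semantic_prob[overlap]
    return float(np.mean(diff ** 2))

height = np.zeros((4, 4))
height[0, 0] = 5.0   # one pixel with a 5 m elevation change
sem = np.zeros((4, 4))
sem[0, 0] = 0.9      # classifier fairly confident the same pixel changed
loss = consistency_loss(height, sem)
```

Minimizing this term pushes the classification probability toward 1 wherever the regression already asserts a large height change, which is the coupling between the two tasks.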

Language: English

Citations: 9

Point cloud registration via sampling-based evolutionary multitasking
Hangqi Ding, Yue Wu, Maoguo Gong et al.

Swarm and Evolutionary Computation, Journal Year: 2024, Volume and Issue: 89, P. 101535 - 101535

Published: June 24, 2024

Language: English

Citations: 9

Multi-view rotating machinery fault diagnosis with adaptive co-attention fusion network
Xiaorong Liu, Jie Wang, Sa Meng et al.

Engineering Applications of Artificial Intelligence, Journal Year: 2023, Volume and Issue: 122, P. 106138 - 106138

Published: March 20, 2023

Language: English

Citations: 19

Correspondence-Free Point Cloud Registration Via Feature Interaction and Dual Branch [Application Notes]
Yue Wu, Jiaming Liu, Yongzhe Yuan et al.

IEEE Computational Intelligence Magazine, Journal Year: 2023, Volume and Issue: 18(4), P. 66 - 79

Published: Oct. 17, 2023

Point cloud registration, which effectively aligns the source and target point clouds, is generally implemented by geometric metrics or feature metrics. In terms of resistance to noise and outliers, feature-metric registration has less error than the traditional point-to-point corresponding metric, and feature reconstruction can reveal more potential information during the recovery process to further optimize the registration process. In this paper, CFNet, a correspondence-free registration framework based on feature metrics, is proposed to learn adaptive representations, with an emphasis on optimizing the network. Considering the correlations among paired point clouds, a feature interaction module that can perceive and strengthen the association between the point clouds at multiple stages is proposed. To reflect the fact that rotation and translation are essentially uncorrelated, they are considered in different solution spaces, and the interactive features are divided into two parts to produce a dual branch regression. In addition, CFNet with its comprehensive objectives estimates the transformation matrix of the input and minimizes the feature-metric loss. The extensive experiments conducted on both synthetic and real-world datasets show that our method outperforms existing methods.
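The dual-branch idea above, regressing rotation and translation in separate solution spaces from a split of the interactive features, can be sketched as follows. The feature split, linear stand-in heads, and all names are illustrative assumptions; CFNet's actual heads are learned networks:

```python
import numpy as np

def dual_branch_heads(features, rng):
    """Split an interactive feature vector into two branches that regress
    rotation (as a unit quaternion) and translation separately."""
    d = features.shape[0] // 2
    rot_feat, trans_feat = features[:d], features[d:]
    # Stand-in linear heads; in practice these would be learned MLPs.
    W_rot = rng.normal(size=(4, d))
    W_trans = rng.normal(size=(3, d))
    quat = W_rot @ rot_feat
    quat = quat / np.linalg.norm(quat)   # project onto the unit-quaternion space
    trans = W_trans @ trans_feat         # translation lives in unconstrained R^3
    return quat, trans

rng = np.random.default_rng(42)
feat = rng.normal(size=64)               # fused interactive feature (toy size)
quat, trans = dual_branch_heads(feat, rng)
```

Keeping the two outputs in their natural spaces (the unit-quaternion manifold for rotation, unconstrained R^3 for translation) is what decouples the two estimation problems.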

Language: English

Citations: 18

MPCT: Multiscale Point Cloud Transformer With a Residual Network
Yue Wu, Jiaming Liu, Maoguo Gong et al.

IEEE Transactions on Multimedia, Journal Year: 2023, Volume and Issue: 26, P. 3505 - 3516

Published: Sept. 12, 2023

The self-attention (SA) network revisits the essence of data and has achieved remarkable results in text processing and image analysis. SA is conceptualized as a set operator that is insensitive to the order and number of data elements, making it suitable for point sets embedded in 3D space. However, working with point clouds still poses challenges. To tackle the issues of the exponential growth in complexity and the singularity induced by the original SA without position encoding, we modify the attention mechanism by incorporating position encoding to make it linear, thus reducing its computational cost and memory usage to be more feasible for point clouds. This article presents a new framework called the multiscale point cloud transformer (MPCT), which improves upon prior methods in cross-domain applications. The utilization of multiple embeddings enables the complete capture of remote and local contextual connections within point clouds, as determined by our proposed attention mechanism. Additionally, we use a residual network to facilitate the fusion of multiscale features, allowing MPCT to better comprehend point cloud representations at each stage of attention. Experiments conducted on several datasets demonstrate that MPCT outperforms existing methods, achieving accuracies of 94.2% and 84.9% in classification tasks implemented on ModelNet40 and ScanObjectNN, respectively.
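The linearization claim above rests on a standard trick: replacing the softmax with a positive kernel feature map lets attention be computed as phi(Q) (phi(K)^T V), which is linear in the number of points instead of quadratic. A minimal sketch, with a simple ReLU-based kernel and random weights standing in for learned projections (all assumptions of this illustration, not MPCT's exact design):

```python
import numpy as np

def linear_attention(x, pos, rng):
    """Linearized self-attention: compute phi(Q) @ (phi(K)^T V) in O(N d^2)
    instead of materializing the N x N attention matrix. Position encoding
    is injected additively before the projections."""
    n, d = x.shape
    h = x + pos                              # incorporate position encoding
    Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    phi = lambda t: np.maximum(t, 0) + 1e-6  # positive kernel feature map
    kv = phi(k).T @ v                        # (d, d) summary, linear in n
    normalizer = phi(q) @ phi(k).sum(axis=0, keepdims=True).T  # (n, 1)
    return (phi(q) @ kv) / normalizer

rng = np.random.default_rng(0)
pts_feat = rng.normal(size=(128, 16))        # 128 points, 16-dim features
pos_enc = 0.1 * rng.normal(size=(128, 16))   # toy position encoding
out = linear_attention(pts_feat, pos_enc, rng)
```

Because `kv` is only d x d, memory no longer grows with the square of the point count, which is what makes the operator feasible for dense point clouds.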

Language: English

Citations: 16

S3L: Spectrum Transformer for Self-Supervised Learning in Hyperspectral Image Classification
Hufeng Guo, Wenyi Liu

Remote Sensing, Journal Year: 2024, Volume and Issue: 16(6), P. 970 - 970

Published: March 10, 2024

In the realm of Earth observation and remote sensing data analysis, the advancement of hyperspectral imaging (HSI) classification technology is of paramount importance. Nevertheless, the intricate nature of hyperspectral data, coupled with the scarcity of labeled data, presents significant challenges in this domain. To mitigate these issues, we introduce S3L, a self-supervised learning algorithm predicated on a spectral transformer for HSI classification under conditions of limited labeled data, with the objective of enhancing the efficacy of classification. S3L operates in two distinct phases: pretraining and fine-tuning. During the pretraining phase, the model learns a spatial-spectral representation from unlabeled data, utilizing a masking mechanism and the spectral transformer, thereby augmenting the sequence dependence of spectral features. Subsequently, fine-tuning is employed to refine the pretrained weights, improving the precision of classification. Within a comprehensive encoder-decoder framework, we propose a novel spectral transformer module specifically engineered to synergize spatial feature extraction and spectral domain analysis. This innovative module adeptly navigates the complex interplay among the various spectral bands, capturing both global and sequential spectral dependencies. Uniquely, it incorporates a gated recurrent unit (GRU) layer within the encoder to enhance its ability to process spectral sequences. Our experimental evaluations across several public datasets reveal that our proposed method achieves superior performance, particularly in scenarios with limited labeled samples, outperforming existing state-of-the-art approaches.
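The masking mechanism used in the pretraining phase amounts to hiding a fraction of a pixel's spectral bands and asking the model to reconstruct them. A minimal sketch of building such a pretext sample; the mask ratio, band count, and function name are illustrative assumptions, not S3L's published configuration:

```python
import numpy as np

def masked_spectrum_pretext(spectrum, mask_ratio=0.3, rng=None):
    """Build a masked-reconstruction pretext sample from one pixel's spectrum:
    zero out a random fraction of the bands and return (input, mask, target)."""
    rng = rng if rng is not None else np.random.default_rng()
    n_bands = spectrum.shape[0]
    n_mask = int(round(mask_ratio * n_bands))
    masked_idx = rng.choice(n_bands, size=n_mask, replace=False)
    mask = np.zeros(n_bands, dtype=bool)
    mask[masked_idx] = True
    corrupted = spectrum.copy()
    corrupted[mask] = 0.0        # the model must predict these hidden bands
    return corrupted, mask, spectrum

rng = np.random.default_rng(1)
pixel = rng.uniform(size=200)    # one pixel with 200 spectral bands (toy)
corrupted, mask, target = masked_spectrum_pretext(pixel, 0.3, rng)
```

During pretraining the reconstruction loss is computed only on the masked positions, which forces the encoder to model the sequential dependence between spectral bands.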

Language: English

Citations: 6

PSCLI-TF: Position-Sensitive Cross-Layer Interactive Transformer Model for Remote Sensing Image Scene Classification
Daxiang Li, Runyuan Liu, Yao Tang et al.

IEEE Geoscience and Remote Sensing Letters, Journal Year: 2024, Volume and Issue: 21, P. 1 - 5

Published: Jan. 1, 2024

In the scene classification task for remote sensing images (RSIs), in order to fully perceive multi-scale local objects, explore their interdependencies, and mine the semantics of an RSI, this letter designs a novel Position-Sensitive Cross-Layer Interactive Transformer (PSCLI-TF) model to improve the accuracy of RSI scene classification. Firstly, ResNet50 is utilized as the backbone to extract multi-layer feature maps of the RSI. Then, to enhance the model's position sensitivity, a new Position-Sensitive Cross-Layer Interactive Attention (PSCLIA) mechanism is designed, and based on it a PSCLI-TF encoder is constructed to perform layer-by-layer interactive fusion and obtain multi-granularity Cross-Layer Fusion (CLF) features. Finally, a prototype-based self-supervised loss function is designed to alleviate the semantic gap problem of "large intra-class variance and small inter-class variance" in RSIs. Comparative experimental results on three datasets (i.e., AID, NWPU, and UCM) indicate that the performance of the designed model is highly competitive compared with other state-of-the-art methods.

Language: English

Citations: 5