Application of Multi-modal Large Models in Electronic Circuit Image Automatic Annotation and Caption Generation Models
Tengfei Wan, H. K. Liu, Lijie Geng et al.
Published: Nov. 22, 2024
Language: English
Citations: 0

Towards a self-cognitive complex product design system: A fine-grained multi-modal feature recognition and semantic understanding approach using large language models in mechanical engineering
Xinxin Liang, Zuoxu Wang, Jihong Liu et al.
Advanced Engineering Informatics, Journal Year: 2025, Volume and Issue: 65, P. 103265
Published: March 23, 2025
Language: English
Citations: 2

Flexible wearable electronics for enhanced human-computer interaction and virtual reality applications
Jian Li, Yuliang Zhao, Yibo Fan et al.
Nano Energy, Journal Year: 2025, Volume and Issue: 138, P. 110821
Published: March 5, 2025
Language: English
Citations: 0

SDA-Net: A Spatially Optimized Dual-Stream Network with Adaptive Global Attention for Building Extraction in Multi-Modal Remote Sensing Images
Xuran Pan, Kexing Xu, Shuhao Yang et al.
Sensors, Journal Year: 2025, Volume and Issue: 25(7), P. 2112
Published: March 27, 2025

Building extraction plays a pivotal role in enabling rapid and accurate construction of urban maps, thereby supporting urban planning, smart city development, and management. Buildings in remote sensing imagery exhibit diverse morphological attributes and spectral signatures, yet their reliable interpretation through single-modal data remains constrained by heterogeneous terrain conditions, occlusions, and spatially variable illumination effects inherent to complex geographical landscapes. The integration of multi-modal data for building extraction offers significant advantages by leveraging complementary features from different sources. However, the heterogeneity of multi-modal data complicates effective feature extraction, while multi-scale cross-modal fusion encounters the semantic gap issue. To address these challenges, a novel spatially optimized dual-stream network called SDA-Net was proposed, in which adaptive global attention fusion modules (AGAFMs) were designed in the decoding stage to fuse features at various scales, dynamically adjusting the importance of features from a global perspective to better balance the fused information. The superior performance of the proposed method is demonstrated by comprehensive evaluations on the ISPRS Potsdam dataset with a 97.66% F1 score and 95.42% IoU, on the Vaihingen dataset with 96.56% F1 and 93.35% IoU, and on the DFC23 Track2 dataset with 91.35% F1 and 84.08% IoU.

Language: English
Citations: 0
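The abstract describes AGAFMs only at a high level: per-scale features are re-weighted from a global perspective before fusion. A minimal, hypothetical sketch of that idea, scoring each modality's feature map by global average pooling and mixing the maps with softmax weights (the function names and the pooling-based scoring are illustrative assumptions, not the paper's implementation):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def global_attention_fuse(modality_maps):
    """Fuse per-modality feature maps (flat lists of floats) by weighting
    each map with a softmax over its global average, a stand-in for a
    learned global-attention score."""
    scores = [sum(m) / len(m) for m in modality_maps]  # global average pooling
    weights = softmax(scores)
    fused = [sum(w * m[i] for w, m in zip(weights, modality_maps))
             for i in range(len(modality_maps[0]))]
    return fused, weights

# Toy example: optical features vs. elevation (DSM) features
optical = [0.9, 0.8, 0.7, 0.6]
dsm = [0.1, 0.2, 0.3, 0.4]
fused, weights = global_attention_fuse([optical, dsm])
```

In a real decoder the scores would come from learned layers rather than a plain average, but the weighting-then-summing structure is the same.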

Predicting dissolved oxygen in water areas using transfer learning and visual information from real-time surveillance videos
Jihong Wang, Yituo Zhang, Chaolin Li et al.
Journal of Cleaner Production, Journal Year: 2025, Volume and Issue: unknown, P. 145547
Published: April 1, 2025
Language: English
Citations: 0

CLDM-MMNNs: Cross-layer defense mechanisms through multi-modal neural networks fusion for end-to-end cybersecurity—Issues, challenges, and future directions
Sijjad Ali, Jia Wang, Victor Chung Ming Leung et al.
Information Fusion, Journal Year: 2025, Volume and Issue: unknown, P. 103222
Published: April 1, 2025
Language: English
Citations: 0

Research on Park Perception and Understanding Methods Based on Multimodal Text–Image Data and Bidirectional Attention Mechanism
Kai Chen, Xiuhong Lin, Tao Xia et al.
Buildings, Journal Year: 2025, Volume and Issue: 15(9), P. 1552
Published: May 4, 2025

Parks are an important component of urban ecosystems, yet traditional research often relies on single-modal data, such as text or images alone, making it difficult to comprehensively and accurately capture visitors' complex emotional experiences and their relationships with the environment. This study proposes a park perception and understanding model based on multimodal text–image data and a bidirectional attention mechanism. By integrating text and image data, the model incorporates a bidirectional encoder representations from transformers (BERT)-based text feature extraction module, a Swin Transformer-based image feature extraction module, and a cross-attention fusion module, enabling a more precise assessment of visitors' emotional experiences in parks. Experimental results show that, compared with methods such as residual networks (ResNet), recurrent neural networks (RNN), and long short-term memory (LSTM), the proposed model achieves significant advantages across multiple evaluation metrics, including mean squared error (MSE), mean absolute error (MAE), root mean squared error (RMSE), and the coefficient of determination (R2). Furthermore, using the SHapley Additive exPlanations (SHAP) method, this study identified key factors influencing visitor experiences, such as "water", "green", and "sky", providing a scientific basis for park management and optimization.

Language: English
Citations: 0
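The fusion step described in this abstract is cross-attention: text-derived queries attend over image-derived keys and values. A self-contained single-head sketch on toy 2-dimensional embeddings (the vectors and function are hypothetical stand-ins for the BERT and Swin Transformer features; a real model would use learned projection matrices and many heads):

```python
import math

def cross_attention(queries, keys, values):
    """Single-head cross-attention: each text query attends over image
    keys, and the output is the attention-weighted sum of image values."""
    d = len(queries[0])
    out = []
    for q in queries:
        # Scaled dot-product scores of this query against every image key
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        attn = [e / z for e in exps]
        out.append([sum(a * v[i] for a, v in zip(attn, values))
                    for i in range(d)])
    return out

# One text token attending over two image patches
text_q = [[1.0, 0.0]]
img_k = [[1.0, 0.0], [0.0, 1.0]]
img_v = [[0.5, 0.5], [2.0, 2.0]]
fused = cross_attention(text_q, img_k, img_v)
```

The query aligned with the first key pulls the fused output toward that patch's value, which is the mechanism that lets text tokens pick out relevant image regions.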

Demystifying Sensor Fusion and Multi-Modal Perception in Robotics
Prashansha Srivastava
European Journal of Computer Science and Information Technology, Journal Year: 2025, Volume and Issue: 13(26), P. 76 - 90
Published: April 15, 2025

Sensor fusion and multi-modal perception have evolved beyond simple data combination into dynamic, context-aware systems that fundamentally transform how robots understand their environment. Modern autonomous systems now actively adapt their sensing strategies based on environmental conditions, sensor health, and task requirements. By integrating data from cameras, LiDAR, radar, and inertial measurement units, these systems achieve robust performance even when individual sensors encounter worst-case scenarios. The evolution of deep learning-based fusion architectures addresses critical challenges in temporal synchronization, drift compensation, and adaptation through dynamic weighting and real-time calibration adjustment. Through edge computing and distributed processing, these innovations enable reliable operation across industrial automation, navigation, and object tracking applications. The shift from static to adaptive fusion represents a crucial advance in making multi-modal perception practical for real-world deployment.

Language: English
Citations: 0
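The "dynamic weighting" the abstract mentions is classically realized as inverse-variance fusion: a sensor reporting higher uncertainty is automatically down-weighted. A minimal sketch under that assumption (the function and the example readings are illustrative, not from the paper):

```python
def fuse_measurements(measurements):
    """Inverse-variance weighted fusion of redundant sensor readings.
    measurements: list of (value, variance) pairs. A sensor reporting a
    large variance (e.g. a degraded camera) is automatically down-weighted,
    and the fused variance is never worse than the best single sensor."""
    weights = [1.0 / var for _, var in measurements]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, measurements)) / total
    variance = 1.0 / total
    return value, variance

# Camera (noisy) and LiDAR (precise) estimates of the same range, in meters
camera = (10.4, 0.9)
lidar = (10.0, 0.1)
value, variance = fuse_measurements([camera, lidar])
```

Deep-learning fusion stacks replace these fixed variances with learned, context-dependent confidence scores, but the down-weighting principle is the same.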

A comprehensive survey on intrusion detection algorithms
Yang Li, Zhengming Li, Mengyao Li et al.
Computers & Electrical Engineering, Journal Year: 2024, Volume and Issue: 121, P. 109863
Published: Nov. 23, 2024
Language: English
Citations: 2

Advances in computer AI-assisted multimodal data fusion techniques
Fan Pan, Qiang Wu
Applied Mathematics and Nonlinear Sciences, Journal Year: 2024, Volume and Issue: 9(1)
Published: Jan. 1, 2024

Abstract: Through the integration of multimodal data fusion technology and computer AI technology, people's needs for intelligent living can be better met. This paper introduces an alignment perception algorithm for multimodal fusion based on a combined model. Taking air pollutant concentration prediction as an example, time series predictions are obtained through an LSTM model, and an attention mechanism is introduced to establish a numerical prediction model for air pollution. Different monitoring stations are also selected to acquire weather image data, and a TS-Conv-LSTM spatio-temporal air quality model for images is constructed, using a Conv-LSTM cell as the encoder and a TransConv-LSTM cell, which integrates a deconvolutional long short-term memory network, as the decoder. Gaussian process regression is used to combine the two models, thus achieving synergistic prediction of pollutant concentrations. The RMSE of the ATT-LSTM model on the dataset is reduced by 8.03 compared with the comparison model, and the predictive fit is above 0.75 for all R² values. The lowest MAE value of the collaborative model is only 3.815, and the highest R² reaches 0.985. Introducing deep learning techniques into multimodal data fusion helps explore massive data more deeply and obtain more comprehensive and reliable information from it.

Language: English
Citations: 0
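The ATT-LSTM part of this pipeline applies attention on top of the LSTM's per-timestep hidden states, so that informative timesteps dominate the final prediction. A rough sketch of just that pooling step (the toy hidden states and context vector are hypothetical; the LSTM itself is omitted):

```python
import math

def attention_pool(hidden_states, context):
    """Dot-product attention over a sequence of hidden-state vectors:
    score each timestep against a context vector, softmax the scores,
    and return the attention-weighted sum of the states."""
    scores = [sum(h * c for h, c in zip(state, context))
              for state in hidden_states]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    attn = [e / z for e in exps]
    d = len(hidden_states[0])
    pooled = [sum(a * s[i] for a, s in zip(attn, hidden_states))
              for i in range(d)]
    return pooled, attn

# Three toy timesteps; the last aligns best with the context vector
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = [1.0, 1.0]
pooled, attn = attention_pool(states, context)
```

In the paper's setting the pooled vector would feed a regression head that outputs the pollutant concentration, and the image branch's prediction would be combined with it via Gaussian process regression.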
