Cited by Application of Multi-modal Large Models in Electronic Circuit Image Automatic Annotation and Caption Generation Models

Towards a self-cognitive complex product design system: A fine-grained multi-modal feature recognition and semantic understanding approach using large language models in mechanical engineering

Xinxin Liang, Zuoxu Wang, Jihong Liu, et al.

Advanced Engineering Informatics, Year: 2025, Vol. 65, p. 103265

Published: March 23, 2025

Language: English

Flexible wearable electronics for enhanced human-computer interaction and virtual reality applications

Jian Li, Yuliang Zhao, Yibo Fan, et al.

Nano Energy, Year: 2025, Vol. 138, p. 110821

Published: March 5, 2025

Language: English

SDA-Net: A Spatially Optimized Dual-Stream Network with Adaptive Global Attention for Building Extraction in Multi-Modal Remote Sensing Images

Xuran Pan, Kexing Xu, Shuhao Yang, et al.

Sensors, Year: 2025, Vol. 25(7), p. 2112

Published: March 27, 2025

Building extraction plays a pivotal role in enabling the rapid and accurate construction of urban maps, thereby supporting urban planning, smart city development, and management. Buildings in remote sensing imagery exhibit diverse morphological attributes and spectral signatures, yet their reliable interpretation through single-modal data remains constrained by heterogeneous terrain conditions, occlusions, and spatially variable illumination effects inherent to complex geographical landscapes. Integrating multi-modal data for building extraction offers significant advantages by leveraging complementary features from multiple sources. However, the heterogeneity of multi-modal data complicates effective feature extraction, while multi-scale cross-modal fusion encounters the semantic gap issue. To address these challenges, a novel dual-stream network called SDA-Net was proposed, in which adaptive global attention fusion modules (AGAFMs) were designed in the decoding stage to fuse features at various scales, dynamically adjusting their importance from a global perspective to better balance information. The superior performance of the proposed method is demonstrated by comprehensive evaluations: it achieves a 97.66% F1 score and 95.42% IoU on the ISPRS Potsdam dataset, 96.56% F1 and 93.35% IoU on the Vaihingen dataset, and 91.35% F1 and 84.08% IoU on DFC23 Track2.

Language: English
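
The F1 and IoU figures in the SDA-Net abstract are linked by a simple identity when both are computed from one pooled confusion matrix (an assumption; the abstract does not say how the scores were aggregated): F1 = 2TP / (2TP + FP + FN) and IoU = TP / (TP + FP + FN), so IoU = F1 / (2 - F1). The minimal Python sketch below computes both scores from binary masks; the function names and toy masks are hypothetical.

```python
def f1_and_iou(pred, truth):
    """Pixel-wise F1 and IoU for binary masks given as flat sequences of 0/1."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if t and not p)
    if tp + fp + fn == 0:
        raise ValueError("both masks are empty; F1 and IoU are undefined")
    f1 = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return f1, iou


def iou_from_f1(f1):
    """Convert F1 (Dice) to IoU; valid when both come from the same confusion matrix."""
    return f1 / (2 - f1)


if __name__ == "__main__":
    pred = [1, 1, 0, 1, 0, 0, 1, 1]   # hypothetical predicted building mask
    truth = [1, 0, 0, 1, 0, 1, 1, 1]  # hypothetical ground-truth mask
    f1, iou = f1_and_iou(pred, truth)
    print(f"F1={f1:.4f} IoU={iou:.4f} IoU from F1={iou_from_f1(f1):.4f}")
    # Reported F1 scores (%) from the abstract, converted to IoU (%)
    for name, f1_pct in [("Potsdam", 97.66), ("Vaihingen", 96.56), ("DFC23 Track2", 91.35)]:
        print(name, round(100 * iou_from_f1(f1_pct / 100), 2))
```

Converting the reported F1 values this way gives 95.43%, 93.35% and 84.08%, which matches the reported IoU values to within 0.01 percentage points.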

Predicting dissolved oxygen in water areas using transfer learning and visual information from real-time surveillance videos

Jihong Wang, Yituo Zhang, Chaolin Li, et al.

Journal of Cleaner Production, Year: 2025, Vol. unknown, p. 145547

Published: April 1, 2025

Language: English

CLDM-MMNNs: Cross-layer defense mechanisms through multi-modal neural networks fusion for end-to-end cybersecurity—Issues, challenges, and future directions

Sijjad Ali, Jia Wang, Victor Chung Ming Leung, et al.

Information Fusion, Year: 2025, Vol. unknown, p. 103222

Published: April 1, 2025

Language: English

Research on Park Perception and Understanding Methods Based on Multimodal Text–Image Data and Bidirectional Attention Mechanism

Kai Chen, Xiuhong Lin, Tao Xia, et al.

Buildings, Year: 2025, Vol. 15(9), p. 1552

Published: May 4, 2025

Parks are an important component of urban ecosystems, yet traditional research often relies on single-modal data, such as text or images alone, making it difficult to comprehensively and accurately capture the complex emotional experiences of visitors and their relationships with the environment. This study proposes a park perception and understanding model based on multimodal text–image data and a bidirectional attention mechanism. By integrating text and image data, the model incorporates a bidirectional encoder representations from transformers (BERT)-based text feature extraction module, a Swin Transformer-based image feature extraction module, and a bidirectional cross-attention fusion module, enabling a more precise assessment of visitors' emotional experiences in parks. Experimental results show that, compared with methods such as the residual network (ResNet), recurrent neural network (RNN), and long short-term memory (LSTM), the proposed model achieves significant advantages across multiple evaluation metrics, including mean squared error (MSE), mean absolute error (MAE), root mean squared error (RMSE), and the coefficient of determination (R2). Furthermore, using the SHapley Additive exPlanations (SHAP) method, this study identified key factors influencing visitors' emotional experiences, such as "water", "green", and "sky", providing a scientific basis for park management and optimization.

Language: English
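
The bidirectional cross-attention fusion described in the park-perception abstract can be illustrated with a minimal sketch. It assumes the BERT text features and Swin Transformer image features are already available as lists of equal-length vectors, and it omits the learned query/key/value projections, multiple heads and prediction head a real model would have; all function names and inputs are hypothetical. Each modality attends to the other with scaled dot-product attention, and the two attended sequences are mean-pooled and concatenated.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attend(queries, keys_values):
    """Scaled dot-product attention of each query over keys_values.

    Keys and values are the same vectors here (no learned projections).
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys_values]
        w = softmax(scores)
        out.append([sum(wj * v[i] for wj, v in zip(w, keys_values)) for i in range(d)])
    return out


def mean_pool(vectors):
    """Average a list of equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def bidirectional_fusion(text_feats, image_feats):
    """Text attends to image, image attends to text; pool and concatenate both."""
    text_to_image = cross_attend(text_feats, image_feats)
    image_to_text = cross_attend(image_feats, text_feats)
    return mean_pool(text_to_image) + mean_pool(image_to_text)


if __name__ == "__main__":
    text_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3 hypothetical token vectors
    image_feats = [[0.2, 0.8], [0.9, 0.1]]             # 2 hypothetical patch vectors
    print(bidirectional_fusion(text_feats, image_feats))  # 4-dim fused vector
```

Running attention in both directions lets each modality weight the other's features by relevance before pooling; in a trained model the fused vector would feed a regression head evaluated with MSE, MAE, RMSE and R2.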

Demystifying Sensor Fusion and Multi-Modal Perception in Robotics

Prashansha Srivastava

European Journal of Computer Science and Information Technology, Year: 2025, Vol. 13(26), pp. 76-90

Published: April 15, 2025

Sensor fusion and multi-modal perception have evolved beyond simple data combination into dynamic, context-aware systems that fundamentally transform how robots understand their environment. Modern autonomous systems now actively adapt their sensing strategies based on environmental conditions, sensor health, and task requirements. By integrating data from cameras, LiDAR, radar, and inertial measurement units, these systems achieve robust performance even when individual sensors encounter worst-case scenarios. The evolution of deep learning-based architectures addresses critical challenges in temporal synchronization, drift compensation, and adaptation through dynamic sensor weighting and real-time calibration adjustment. Through edge computing and distributed processing, these innovations enable reliable operation across industrial automation, navigation, and object tracking applications. The shift from static to dynamic fusion represents a crucial advance in making multi-modal perception practical for real-world deployment.

Language: English
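
One simple way to realize the dynamic sensor weighting mentioned in the robotics abstract is sketched below. This is not the article's method: it is a swapped-in illustration that uses inverse-variance weighting, scaled by a hypothetical per-sensor health score in [0, 1], to fuse independent estimates of the same quantity. With every health score equal to 1 it reduces to standard inverse-variance weighting; a degraded sensor's influence shrinks as its health score drops, and a sensor with health 0 is ignored. The readings are made-up values.

```python
def fuse_estimates(readings):
    """Fuse (value, variance, health) readings of one quantity.

    Each sensor's weight is health / variance (hypothetical scheme), normalized
    to sum to 1. Readings with non-positive variance or health are skipped.
    """
    usable = [(v, var, h) for v, var, h in readings if var > 0 and h > 0]
    if not usable:
        raise ValueError("no healthy sensor readings to fuse")
    raw = [h / var for _, var, h in usable]
    total = sum(raw)
    weights = [r / total for r in raw]
    fused = sum(w * v for w, (v, _, _) in zip(weights, usable))
    return fused, weights


if __name__ == "__main__":
    readings = [
        (10.0, 0.5, 1.0),  # camera range estimate (m), variance, health (hypothetical)
        (10.4, 0.1, 1.0),  # LiDAR
        (9.0, 1.0, 0.2),   # radar, flagged as degraded
    ]
    fused, weights = fuse_estimates(readings)
    print(f"fused = {fused:.3f} m, weights = {[round(w, 3) for w in weights]}")
```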

A comprehensive survey on intrusion detection algorithms

Yang Li, Zhengming Li, Mengyao Li, et al.

Computers & Electrical Engineering, Year: 2024, Vol. 121, p. 109863

Published: Nov. 23, 2024

Language: English

Advances in computer AI-assisted multimodal data fusion techniques

Fan Pan, Qiang Wu

Applied Mathematics and Nonlinear Sciences, Year: 2024, Vol. 9(1)

Published: Jan. 1, 2024

Through the integration of multimodal data fusion technology and computer AI technology, people's needs for an intelligent life can be better met. This paper introduces an alignment perception algorithm for multimodal data fusion, which is based on combining models. Taking air pollutant concentration prediction as an example, time series data are obtained through LSTM model prediction, and an attention mechanism is introduced to establish a numerical prediction model of pollution. Different stations are also selected to acquire weather image data, and a TS-Conv-LSTM spatio-temporal prediction model for air quality images is constructed, using the Conv-LSTM cell as the encoder and the TransConv-LSTM cell, which integrates deconvolution and a long short-term memory network, as the decoder. Gaussian regression was used to combine the models, thus achieving synergistic prediction of pollutant concentrations. On the dataset, the RMSE of ATT-LSTM is reduced by 8.03 compared with the comparison model, with R² values above 0.75 in all cases. The lowest MAE of the collaborative model is only 3.815, and its highest R² reaches 0.985. Introducing deep learning techniques into multimodal data fusion helps explore massive data more deeply and obtain more comprehensive and reliable information about it.

Language: English
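
The error metrics cited in the air-pollution abstract (RMSE, MAE and R²) can be computed as in the following minimal sketch, with R² taken as 1 - SS_res / SS_tot. The observed and predicted concentrations are made-up illustrative values, not data from the paper, and the function name is hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R^2 for paired observations and predictions."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mse = ss_res / n
    rmse = math.sqrt(mse)
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observations are equal")
    r2 = 1 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


if __name__ == "__main__":
    observed = [30, 45, 60, 52]   # hypothetical pollutant concentrations
    predicted = [28, 47, 57, 55]  # hypothetical model output
    for name, value in regression_metrics(observed, predicted).items():
        print(f"{name}: {value:.4f}")
```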

Application of Multi-modal Large Models in Electronic Circuit Image Automatic Annotation and Caption Generation Models

Tengfei Wan, H. K. Liu, Lijie Geng, et al.

Published: Nov. 22, 2024

Language: English