Journal of Chemical Information and Modeling, Year: 2025, Issue: unknown
Published: March 4, 2025
To reduce the cost of experimentally characterizing potential substrates for enzymes, machine learning prediction models offer an alternative solution. Pretrained language models, as powerful approaches for protein and molecule representation, have been employed in the development of enzyme-substrate prediction models, achieving promising performance. In addition to continuing improvements in language models, effectively fusing encoders to handle multimodal prediction tasks is critical for further enhancing model performance with the available representation methods. Here, we present FusionESP, a multimodal architecture that integrates protein and chemistry language models with two independent projection heads and a contrastive learning strategy for predicting enzyme-substrate pairs. Our best model achieved state-of-the-art performance, with an accuracy of 94.77% on test data, and exhibited better generalization capacity while requiring fewer computational resources and less training data, compared with previous studies that fine-tuned the encoder or employed more encoders. It also confirmed our hypothesis that the embeddings of positive pairs are closer to each other in a high-dimensional space, while negative pairs exhibit the opposite trend. Our ablation studies showed that the projection heads played a crucial role in performance enhancement, and that the contrastive learning strategy improved the projection heads' capacity in classification tasks. The proposed architecture is expected to be applied to enhance performance in additional multimodal prediction tasks in biology. A user-friendly web server for FusionESP has been established and is freely accessible at https://rqkjkgpsyu.us-east-1.awsapprunner.com/.
Language: English
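The abstract describes two independent projection heads that map enzyme and substrate embeddings into a shared space, where positive pairs should lie close together and negative pairs far apart. The following is a minimal illustrative sketch of that idea, not the FusionESP implementation. Everything in it is an assumption made for illustration: the random vectors standing in for encoder outputs, the embedding sizes, the single linear layer per head, and the cosine-embedding-style loss. The abstract does not specify any of these details.

```python
import math
import random


def make_head(in_dim, out_dim, rng):
    """Hypothetical linear projection head: an out_dim x in_dim weight matrix."""
    scale = 1.0 / math.sqrt(in_dim)
    return [[rng.uniform(-scale, scale) for _ in range(in_dim)]
            for _ in range(out_dim)]


def project(head, x):
    """Apply the linear head, then L2-normalize the projected vector."""
    z = [sum(w * xi for w, xi in zip(row, x)) for row in head]
    norm = math.sqrt(sum(v * v for v in z)) or 1.0
    return [v / norm for v in z]


def cosine(a, b):
    """For unit-length vectors, cosine similarity is the dot product."""
    return sum(ai * bi for ai, bi in zip(a, b))


def cosine_embedding_loss(sim, label, margin=0.0):
    """Assumed contrastive objective: pull positive pairs (label=1) together,
    push negative pairs (label=0) below the margin."""
    return 1.0 - sim if label == 1 else max(0.0, sim - margin)


rng = random.Random(0)

# Stand-ins for frozen encoder outputs. Real embeddings would come from
# pretrained protein and chemistry language models; sizes here are arbitrary.
enzyme_emb = [rng.gauss(0, 1) for _ in range(16)]
substrate_emb = [rng.gauss(0, 1) for _ in range(8)]

# Two independent heads, one per modality, projecting into a shared 4-d space.
enzyme_head = make_head(16, 4, rng)
substrate_head = make_head(8, 4, rng)

z_enzyme = project(enzyme_head, enzyme_emb)
z_substrate = project(substrate_head, substrate_emb)

sim = cosine(z_enzyme, z_substrate)
loss_if_positive = cosine_embedding_loss(sim, label=1)
loss_if_negative = cosine_embedding_loss(sim, label=0)
```

In a trained model of this kind, the head weights would be optimized so that `sim` is high for true enzyme-substrate pairs and low otherwise, and a threshold on `sim` (or a classifier on the projected embeddings) would yield the final prediction.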