A generative artificial intelligence framework based on a molecular diffusion model for the design of metal-organic frameworks for carbon capture
Hyun Park, Xiaoli Yan, Ruijie Zhu, et al.

Communications Chemistry, Journal Year: 2024, Volume and Issue: 7(1)

Published: Feb. 14, 2024

Abstract: Metal-organic frameworks (MOFs) exhibit great promise for CO2 capture. However, finding the best-performing materials poses grand computational and experimental challenges in view of the vast chemical space of potential building blocks. Here, we introduce GHP-MOFassemble, a generative artificial intelligence (AI), high-performance framework for the rational and accelerated design of MOFs with high CO2 adsorption capacity and synthesizable linkers. GHP-MOFassemble generates novel linkers, which are assembled with one of three pre-selected metal nodes (including the Cu paddlewheel and the Zn tetramer) into MOFs with a primitive cubic topology. It screens and validates the AI-generated MOFs for uniqueness, synthesizability, and structural validity. It uses molecular dynamics simulations to study their stability and chemical consistency, and crystal graph neural networks and Grand Canonical Monte Carlo simulations to quantify their CO2 adsorption capacities. We present the top six AI-generated MOFs, with CO2 capacities greater than 2 mmol g−1, i.e., higher than 96.9% of the structures in the hypothetical MOF dataset.
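The screening stage summarized above can be pictured as a chain of filters followed by a ranking. The sketch below is hypothetical. The `Candidate` record, its boolean validity flags, and the single capacity number per MOF are illustrative assumptions and not GHP-MOFassemble's actual code. Only the 2 mmol g−1 cut-off comes from the abstract.

```python
from dataclasses import dataclass

# Hypothetical record for one AI-generated MOF candidate (all fields are assumptions).
@dataclass(frozen=True)
class Candidate:
    linker_id: str
    metal_node: str
    synthesizable: bool        # stand-in for a synthesizability check
    structurally_valid: bool   # stand-in for a structural-validity check
    co2_uptake_mmol_g: float   # predicted or simulated CO2 capacity

CAPACITY_THRESHOLD = 2.0  # mmol/g, the cut-off quoted in the abstract

def screen(candidates):
    """Keep unique, synthesizable, valid candidates above the capacity cut-off."""
    seen = set()
    kept = []
    for c in candidates:
        key = (c.linker_id, c.metal_node)
        if key in seen:  # uniqueness filter: drop repeated linker/node combinations
            continue
        seen.add(key)
        if not (c.synthesizable and c.structurally_valid):
            continue
        if c.co2_uptake_mmol_g > CAPACITY_THRESHOLD:
            kept.append(c)
    # Rank the surviving candidates by capacity, best first
    return sorted(kept, key=lambda c: c.co2_uptake_mmol_g, reverse=True)

# Toy, made-up inputs purely to exercise the filters
pool = [
    Candidate("linker_A", "Cu paddlewheel", True, True, 2.4),
    Candidate("linker_A", "Cu paddlewheel", True, True, 2.4),  # duplicate
    Candidate("linker_B", "Zn tetramer", True, False, 3.1),    # fails validity
    Candidate("linker_C", "Zn tetramer", True, True, 1.5),     # below cut-off
]
print([c.linker_id for c in screen(pool)])
```

The filters are ordered from cheapest to most expensive. This mirrors the general idea of running costly simulations only on candidates that survive the fast checks.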

Language: English

Equivariant 3D-conditional diffusion model for molecular linker design
Ilia Igashov, H. Stärk, Clément Vignac, et al.

Nature Machine Intelligence, Journal Year: 2024, Volume and Issue: 6(4), P. 417 - 427

Published: April 11, 2024

Abstract: Fragment-based drug discovery has been an effective paradigm in early-stage drug development. An open challenge in this area is designing linkers between disconnected molecular fragments of interest to obtain chemically relevant candidate drug molecules. In this work, we propose DiffLinker, an E(3)-equivariant three-dimensional conditional diffusion model for molecular linker design. Given a set of disconnected fragments, our model places the missing atoms in between and designs a molecule incorporating all the initial fragments. Unlike previous approaches that are only able to connect pairs of molecular fragments, our method can link an arbitrary number of fragments. Additionally, the model automatically determines the linker's attachment points to the input fragments. We demonstrate that DiffLinker outperforms other methods on standard datasets, generating more diverse and synthetically accessible molecules. We experimentally test our method in real-world applications, showing that it can successfully generate valid linkers conditioned on target protein pockets.
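In a conditional diffusion model of this kind, only the atoms being generated are noised. The fragment atoms stay fixed and act as context. The sketch below shows a standard DDPM-style forward step, q(x_t | x_0) = N(sqrt(ᾱ_t) x_0, (1 − ᾱ_t) I), applied to linker coordinates only. Before noising, the system is shifted so the fragment centroid sits at the origin. The linear noise schedule, the centring on the fragment centroid, and the function names are illustrative assumptions, not DiffLinker's exact implementation.

```python
import math
import random

def centroid(points):
    """Arithmetic mean of a list of (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def shift(points, c):
    """Translate every point by -c."""
    return [tuple(p[i] - c[i] for i in range(3)) for p in points]

def alpha_bar(t, T, beta_min=1e-4, beta_max=0.02):
    """Cumulative signal fraction prod_{s<=t} (1 - beta_s) for an assumed linear schedule."""
    denom = max(T - 1, 1)
    prod = 1.0
    for s in range(t + 1):
        beta = beta_min + (beta_max - beta_min) * s / denom
        prod *= 1.0 - beta
    return prod

def noise_linker(fragments, linker, t, T=1000, rng=random):
    """Sample x_t for the linker atoms only; fragment atoms are returned unnoised."""
    c = centroid(fragments)                # centre the system on the context fragments
    frag0, link0 = shift(fragments, c), shift(linker, c)
    ab = alpha_bar(t, T)
    noisy = [
        tuple(math.sqrt(ab) * x + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for x in p)
        for p in link0
    ]
    return frag0, noisy
```

A denoising network would then be trained to recover the linker from `noisy` while seeing `frag0` unchanged. That training loop is outside the scope of this sketch.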

Language: English

Citations: 42

Diff-AMP: tailored designed antimicrobial peptide framework with all-in-one generation, identification, prediction and optimization

Rui Wang, Tao Wang, Linlin Zhuo, et al.

Briefings in Bioinformatics, Journal Year: 2024, Volume and Issue: 25(2)

Published: Jan. 22, 2024

Abstract: Antimicrobial peptides (AMPs), short peptides with diverse functions, effectively target and combat various organisms. The widespread misuse of chemical antibiotics has led to increasing microbial resistance. Due to their low drug resistance and low toxicity, AMPs are considered promising substitutes for traditional antibiotics. While existing deep learning technology enhances AMP generation, it also presents certain challenges. Firstly, AMP generation overlooks the complex interdependencies among amino acids. Secondly, current models fail to integrate crucial tasks like screening, attribute prediction and iterative optimization. Consequently, we develop an integrated deep learning framework, Diff-AMP, that automates AMP generation, identification, attribute prediction and iterative optimization. We innovatively integrate kinetic diffusion and attention mechanisms into a reinforcement learning framework for efficient AMP generation. Additionally, our prediction module incorporates pre-training and transfer learning strategies for precise AMP identification and screening. We employ a convolutional neural network for multi-attribute prediction and a reinforcement learning-based iterative optimization strategy to produce diverse AMPs. This framework automates molecule generation, screening, attribute prediction and optimization, thereby advancing AMP research. We have deployed Diff-AMP on a web server, and the code, data and server details are available in the Data Availability section.

Language: English

Citations: 25

Scientific Large Language Models: A Survey on Biological & Chemical Domains
Qiang Zhang, Keyan Ding, Tingting Lv, et al.

ACM Computing Surveys, Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 26, 2025

Large Language Models (LLMs) have emerged as a transformative power in enhancing natural language comprehension, representing a significant stride toward artificial general intelligence. The application of LLMs extends beyond conventional linguistic boundaries, encompassing specialized linguistic systems developed within various scientific disciplines. This growing interest has led to the advent of scientific LLMs, a novel subclass specifically engineered for facilitating scientific discovery. As a burgeoning area in the AI for Science community, scientific LLMs warrant comprehensive exploration. However, a systematic and up-to-date survey introducing them is currently lacking. In this paper, we endeavor to methodically delineate the concept of "scientific language", whilst providing a thorough review of the latest advancements in scientific LLMs. Given the expansive realm of scientific disciplines, our analysis adopts a focused lens, concentrating on the biological and chemical domains. This includes an in-depth examination of LLMs for textual knowledge, small molecules, macromolecular proteins, genomic sequences, and their combinations, analyzing them in terms of model architectures, capabilities, datasets, and evaluation. Finally, we critically examine the prevailing challenges and point out promising research directions along with the advances of LLMs. By offering a comprehensive overview of technical developments in this field, the survey aspires to be an invaluable resource for researchers navigating this intricate landscape.

Language: English

Citations: 4

In silico modeling of targeted protein degradation
Wenxing Lv, Xiaojuan Jia, Bowen Tang, et al.

European Journal of Medicinal Chemistry, Journal Year: 2025, Volume and Issue: 289, P. 117432

Published: Feb. 20, 2025

Language: English

Citations: 2

Pre-training Molecular Graph Representation with 3D Geometry
Shengchao Liu, Hanchen Wang, Weiyang Liu, et al.

arXiv (Cornell University), Journal Year: 2021, Volume and Issue: unknown

Published: Jan. 1, 2021

Molecular graph representation learning is a fundamental problem in modern drug and material discovery. Molecular graphs are typically modeled by their 2D topological structures, but it has recently been discovered that 3D geometric information plays a more vital role in predicting molecular functionalities. However, the lack of 3D information in real-world scenarios has significantly impeded the learning of geometric graph representations. To cope with this challenge, we propose the Graph Multi-View Pre-training (GraphMVP) framework, where self-supervised learning (SSL) is performed by leveraging the correspondence and consistency between 2D topological structures and 3D geometric views. GraphMVP effectively learns a 2D molecular graph encoder that is enhanced by richer and more discriminative 3D geometry. We further provide theoretical insights to justify the effectiveness of GraphMVP. Finally, comprehensive experiments show that GraphMVP can consistently outperform existing graph SSL methods.
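A contrastive term is one way to make a 2D-graph embedding and a 3D-geometry embedding of the same molecule agree while pushing apart embeddings of different molecules. The sketch below is a generic symmetric InfoNCE loss on plain Python lists. It assumes the 2D and 3D encoders already produced the vectors, and the temperature value is an illustrative choice. GraphMVP's published objective combines contrastive and generative SSL terms, and this sketch reproduces neither exactly.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(z2d, z3d, temperature=0.1):
    """Symmetric InfoNCE: row i of z2d should match row i of z3d (its own 3D view)."""
    n = len(z2d)
    sim = [[cosine(z2d[i], z3d[j]) / temperature for j in range(n)] for i in range(n)]

    def row_loss(scores, positive):
        m = max(scores)  # subtract the max for a numerically stable log-sum-exp
        lse = m + math.log(sum(math.exp(s - m) for s in scores))
        return lse - scores[positive]  # cross-entropy with the matching view as target

    loss_2d_to_3d = sum(row_loss(sim[i], i) for i in range(n)) / n
    loss_3d_to_2d = sum(row_loss([sim[j][i] for j in range(n)], i) for i in range(n)) / n
    return 0.5 * (loss_2d_to_3d + loss_3d_to_2d)
```

When matched 2D/3D pairs are the most similar pairs in the batch, the loss is small. Swapping the 3D views between molecules raises it.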

Language: English

Citations: 80

Uni-Mol: A Universal 3D Molecular Representation Learning Framework

Gengmo Zhou, Zhifeng Gao, Qiankun Ding, et al.

Published: May 26, 2022

Molecular representation learning (MRL) has gained tremendous attention due to its critical role in learning from limited supervised data for applications like drug design. In most MRL methods, molecules are treated as 1D sequential tokens or 2D topology graphs. This limits their ability to incorporate 3D information for downstream tasks and, in particular, makes 3D geometry prediction or generation almost impossible. Herein, we propose Uni-Mol, a universal 3D MRL framework that significantly enlarges the representation ability and application scope of MRL schemes. Uni-Mol is composed of two models with the same SE(3)-equivariant transformer architecture: a molecular pretraining model trained on 209M molecular conformations, and a pocket pretraining model trained on 3M candidate protein pocket data. The two models are used independently for separate tasks and are combined for protein-ligand binding tasks. By properly incorporating 3D information, Uni-Mol outperforms SOTA methods in 14 of 15 molecular property prediction tasks. Moreover, Uni-Mol achieves superior performance in 3D spatial tasks, including protein-ligand binding pose prediction, molecular conformation generation, etc. Finally, we show that Uni-Mol can be successfully applied to few-shot tasks such as druggability prediction. Uni-Mol will be made publicly available at https://github.com/dptech-corp/Uni-Mol.
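A transformer can see 3D structure while staying unaffected by rotations and translations if it works with pairwise distances instead of raw coordinates. A common trick is to expand each distance over a bank of Gaussian basis functions and use the result as pair features, for example as attention biases. The sketch below shows only this distance-encoding step. The number of kernels, their spacing and width, and the function names are illustrative assumptions. Uni-Mol's actual encoder, which also conditions on atom-pair types, is not reproduced here.

```python
import math

def pairwise_distances(coords):
    """Full n x n Euclidean distance matrix for a list of (x, y, z) tuples."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]

def gaussian_features(d, num_kernels=8, d_max=8.0, width=1.0):
    """Expand one distance d (angstrom) onto evenly spaced Gaussian kernels in [0, d_max]."""
    step = d_max / (num_kernels - 1)
    centres = [k * step for k in range(num_kernels)]
    return [math.exp(-0.5 * ((d - mu) / width) ** 2) for mu in centres]

def pair_features(coords, **kwargs):
    """Pair features that depend only on distances, so rigid motions leave them unchanged."""
    return [[gaussian_features(d, **kwargs) for d in row] for row in pairwise_distances(coords)]
```

Because the features depend only on interatomic distances, translating or rotating the whole molecule leaves them unchanged up to floating-point rounding.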

Language: English

Citations: 44

An Ecosystem for Digital Reticular Chemistry
Kevin Maik Jablonka, Andrew Rosen, Aditi S. Krishnapriyan, et al.

ACS Central Science, Journal Year: 2023, Volume and Issue: 9(4), P. 563 - 581

Published: March 10, 2023

The vastness of the materials design space makes it impractical to explore using traditional brute-force methods, particularly in reticular chemistry. However, machine learning has shown promise in expediting and guiding materials design. Despite numerous successful applications of machine learning to reticular materials, progress in the field has stagnated. Possible reasons are that digital chemistry is more an art than a science and that it has limited accessibility to inexperienced researchers. To address this issue, we present mofdscribe, a software ecosystem tailored to novice and seasoned digital reticular chemists that streamlines the ideation, modeling, and publication process. Though optimized for reticular chemistry, our tools are versatile and can be used in nonreticular materials research. We believe that mofdscribe will make digital reticular chemistry more reliable, efficient, and comparable.

Language: English

Citations: 36

Deep Generative Models in De Novo Drug Molecule Generation
Chao Pang, Jianbo Qiao, Xiangxiang Zeng, et al.

Journal of Chemical Information and Modeling, Journal Year: 2023, Volume and Issue: 64(7), P. 2174 - 2194

Published: Nov. 7, 2023

The discovery of new drugs has important implications for human health. Traditional methods of drug discovery rely on experiments to optimize the structure of lead molecules, which is time-consuming and costly. Recently, artificial intelligence has exhibited promising and efficient performance in drug-like molecule generation. In particular, deep generative models have achieved great success in the de novo generation of molecules with desired properties, showing massive potential for novel drug discovery. In this study, we review recent progress in molecule generation using deep generative models. We mainly focus on molecular representations, public databases, data processing tools, and advanced frameworks based on generative models. We present a comprehensive comparison of state-of-the-art models and a summary of commonly used molecular design strategies. We identify research gaps and challenges, such as the need for better molecular representations, missing 3D information in representations, and the lack of high-precision evaluation metrics, and we suggest future directions.

Language: English

Citations: 36

MDM: Molecular Diffusion Model for 3D Molecule Generation
Lei Huang, Hengtong Zhang, Tingyang Xu, et al.

Proceedings of the AAAI Conference on Artificial Intelligence, Journal Year: 2023, Volume and Issue: 37(4), P. 5105 - 5112

Published: June 26, 2023

Molecule generation, especially generating 3D molecular geometries from scratch (i.e., 3D de novo generation), has become a fundamental task in drug design. Existing diffusion-based 3D molecule generation methods can suffer from unsatisfactory performance when generating large molecules. At the same time, the generated molecules lack enough diversity. This paper proposes a novel diffusion model to address those two challenges. First, interatomic relations are not included in molecules' 3D point cloud representations. Thus, it is difficult for existing generative models to capture the potential interatomic forces and abundant local constraints. To tackle this challenge, we propose to augment the potential interatomic forces and further involve dual equivariant encoders to encode interatomic forces of different strengths. Second, existing diffusion-based models essentially shift elements in the geometry along the gradient of the data density. Such a process lacks enough exploration in the intermediate steps of the Langevin dynamics. To address this issue, we introduce a distributional controlling variable in each diffusion/reverse step to enforce thorough exploration and further improve generation diversity. Extensive experiments on multiple benchmarks demonstrate that the proposed model significantly outperforms existing methods on both unconditional and conditional generation tasks. We also conduct case studies to help understand the physicochemical properties of the generated molecules. The code is available at https://github.com/tencent-ailab/MDM.
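One simple way to augment a bare point cloud with interatomic relations of different strengths is to split atom pairs by distance. Close pairs are treated as bond-like "strong" edges, and the rest as weaker long-range edges, each of which could then be handled by its own encoder. The sketch below shows only that partitioning step. The 1.6 Å cutoff and the function name are illustrative assumptions and are not taken from the MDM code.

```python
import math
from itertools import combinations

def split_edges(coords, cutoff=1.6):
    """Partition all atom pairs (i < j) into short-range (strong) and long-range (weak) edges.

    coords: list of (x, y, z) tuples in angstrom; cutoff is an assumed threshold.
    """
    strong, weak = [], []
    for i, j in combinations(range(len(coords)), 2):
        target = strong if math.dist(coords[i], coords[j]) <= cutoff else weak
        target.append((i, j))
    return strong, weak
```

Keeping the two edge sets separate lets each encoder specialize: the strong set carries local geometric constraints, and the weak set carries sparser long-range context.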

Language: English

Citations: 35

A Systematic Survey of Chemical Pre-trained Models
Jun Xia, Yanqiao Zhu, Yuanqi Du, et al.

Published: Aug. 1, 2023

Deep learning has achieved remarkable success in learning representations for molecules, which is crucial for various biochemical applications ranging from property prediction to drug design. However, training Deep Neural Networks (DNNs) from scratch often requires abundant labeled molecules, which are expensive to acquire in the real world. To alleviate this issue, tremendous effort has been devoted to Chemical Pre-trained Models (CPMs), where DNNs are pre-trained on large-scale unlabeled molecular databases and then fine-tuned on specific downstream tasks. Despite this prosperity, a systematic review of this fast-growing field is lacking. In this paper, we present the first survey that summarizes the current progress of CPMs. We first highlight the limitations of training molecular representation models from scratch to motivate CPM studies. Next, we systematically review recent advances on this topic from several key perspectives, including molecular descriptors, encoder architectures, pre-training strategies, and applications. We also highlight challenges and promising avenues for future research, providing a useful resource for both the machine learning and scientific communities.

Language: English

Citations: 33