Machine learning-guided strategies for reaction conditions design and optimization (Creative Commons)
Lung-Yi Chen, Yi‐Pei Li

Beilstein Journal of Organic Chemistry, Year: 2024, Vol. 20, pp. 2476-2492

Published: Oct. 4, 2024

This review surveys the recent advances and challenges in predicting and optimizing reaction conditions using machine learning techniques. The paper emphasizes the importance of acquiring and processing large and diverse datasets of chemical reactions, and the use of both global and local models to guide the design of synthetic processes. Global models exploit the information from comprehensive databases to suggest general reaction conditions for new reactions, while local models fine-tune the specific parameters for a given reaction family to improve yield and selectivity. The paper also identifies the current limitations and opportunities in this field, such as data quality and availability, and the integration of high-throughput experimentation. The paper demonstrates how the combination of chemical engineering, data science, and ML algorithms can enhance the efficiency and effectiveness of reaction conditions design, and enable novel discoveries in chemistry.

Language: English

Molecular contrastive learning of representations via graph neural networks
Yuyang Wang, Jianren Wang, Zhonglin Cao, et al.

Nature Machine Intelligence, Year: 2022, Vol. 4(3), pp. 279-287

Published: March 3, 2022

Language: English

Cited by: 478

Machine Learning in Chemical Engineering: Strengths, Weaknesses, Opportunities, and Threats (Creative Commons)
Maarten R. Dobbelaere, Pieter Plehiers, Ruben Van de Vijver, et al.

Engineering, Year: 2021, Vol. 7(9), pp. 1201-1211

Published: July 29, 2021

Chemical engineers rely on models for design, research, and daily decision-making, often with potentially large financial and safety implications. Previous efforts a few decades ago to combine artificial intelligence and chemical engineering for modeling were unable to fulfill the expectations. In the last five years, the increasing availability of data and computational resources has led to a resurgence in machine learning-based research. Many recent efforts have facilitated the roll-out of machine learning techniques in the research field by developing large databases, benchmarks, and representations for chemical applications and new machine learning frameworks. Machine learning has significant advantages over traditional modeling techniques, including flexibility, accuracy, and execution speed. These strengths also come with weaknesses, such as the lack of interpretability of these black-box models. The greatest opportunities involve using machine learning in time-limited applications such as real-time optimization and planning that require high accuracy and that can build on models with a self-learning ability to recognize patterns, learn from data, and become more intelligent over time. The greatest threat in artificial intelligence research today is inappropriate use, because most chemical engineers have had limited training in computer science and data analysis. Nevertheless, machine learning will definitely become a trustworthy element in the modeling toolbox of chemical engineers.

Language: English

Cited by: 216

Machine Learning Methods for Small Data Challenges in Molecular Science
Bozheng Dou, Zailiang Zhu, Ekaterina Merkurjev, et al.

Chemical Reviews, Year: 2023, Vol. 123(13), pp. 8736-8780

Published: June 29, 2023

Small data are often used in scientific and engineering research due to the presence of various constraints, such as time, cost, ethics, privacy, security, and technical limitations in data acquisition. However, big data have been the focus for the past decade, while small data and their challenges have received little attention, even though they are technically more severe in machine learning (ML) and deep learning (DL) studies. Overall, the small data challenge is often compounded by issues such as data diversity, imputation, noise, imbalance, and high-dimensionality. Fortunately, the current big data era is characterized by technological breakthroughs in ML, DL, and artificial intelligence (AI), which enable data-driven scientific discovery, and many advanced ML and DL technologies developed for big data have inadvertently provided solutions for small data problems. As a result, significant progress has been made in ML and DL for small data challenges in the past decade. In this review, we summarize and analyze several emerging potential solutions to small data challenges in molecular science, including chemical and biological sciences. We review both basic machine learning algorithms, such as linear regression, logistic regression (LR),

Language: English

Cited by: 199

Artificial intelligence for natural product drug discovery
Michael W. Mullowney, Katherine Duncan, Somayah S. Elsayed, et al.

Nature Reviews Drug Discovery, Year: 2023, Vol. 22(11), pp. 895-916

Published: Sept. 11, 2023

Language: English

Cited by: 167

Rhea, the reaction knowledgebase in 2022 (Creative Commons)
Parit Bansal, Anne Morgat, Kristian B. Axelsen, et al.

Nucleic Acids Research, Year: 2021, Vol. 50(D1), pp. D693-D700

Published: Nov. 9, 2021

Rhea (https://www.rhea-db.org) is an expert-curated knowledgebase of biochemical reactions based on the chemical ontology ChEBI (Chemical Entities of Biological Interest) (https://www.ebi.ac.uk/chebi). In this paper, we describe a number of key developments in Rhea since our last report in the database issue of Nucleic Acids Research in 2019. These include improved reaction coverage in Rhea, the adoption of Rhea as the reference vocabulary for enzyme annotation in the UniProt knowledgebase UniProtKB (https://www.uniprot.org), the development of a new Rhea website, and the designation of Rhea as an ELIXIR Core Data Resource. We hope that these and other developments will enhance the utility of Rhea as a reference resource to study and engineer enzymes and the metabolic systems in which they function.

Language: English

Cited by: 165

Chemformer: a pre-trained transformer for computational chemistry (Creative Commons)
Ross Irwin, Spyridon Dimitriadis, Jiazhen He, et al.

Machine Learning Science and Technology, Year: 2021, Vol. 3(1), p. 015022

Published: Dec. 7, 2021

Transformer models coupled with a simplified molecular line entry system (SMILES) have recently proven to be a powerful combination for solving challenges in cheminformatics. These models, however, are often developed specifically for a single application and can be very resource-intensive to train. In this work we present the Chemformer model, a Transformer-based model which can be quickly applied to both sequence-to-sequence and discriminative cheminformatics tasks. Additionally, we show that self-supervised pre-training can improve performance and significantly speed up convergence on downstream tasks. On direct synthesis and retrosynthesis prediction benchmark datasets we publish state-of-the-art results for top-1 accuracy. We also improve on existing approaches for a molecular optimisation task and show that Chemformer can optimise on multiple discriminative tasks simultaneously. Models, datasets and code will be made available after publication.

Language: English

Cited by: 150

Attention is all you need: utilizing attention in AI-enabled drug discovery (Creative Commons)
Yang Zhang, Caiqi Liu, Mujiexin Liu, et al.

Briefings in Bioinformatics, Year: 2023, Vol. 25(1)

Published: Nov. 22, 2023

Recently, attention mechanism and derived models have gained significant traction in drug development due to their outstanding performance and interpretability in handling complex data structures. This review offers an in-depth exploration of the principles underlying attention-based models and their advantages in drug discovery. We further elaborate on their applications in various aspects of drug development, from molecular screening and target binding to property prediction and molecule generation. Finally, we discuss the current challenges faced in the application of attention mechanisms and Artificial Intelligence technologies, including data quality, model interpretability and computational resource constraints, along with future directions for research. Given the accelerating pace of technological advancement, we believe that attention-based models will have an increasingly prominent role in future drug discovery. We anticipate that these models will usher in revolutionary breakthroughs in the pharmaceutical domain, significantly accelerating the pace of drug development.

Language: English

Cited by: 133

Artificial intelligence in drug discovery: applications and techniques
Jianyuan Deng, Zhibo Yang, Iwao Ojima, et al.

Briefings in Bioinformatics, Year: 2021, Vol. 23(1)

Published: Sept. 21, 2021

Artificial intelligence (AI) has been transforming the practice of drug discovery in the past decade. Various AI techniques have been used in many drug discovery applications, such as virtual screening and drug design. In this survey, we first give an overview on drug discovery and discuss related applications, which can be reduced to two major tasks, i.e. molecular property prediction and molecule generation. We then present common data resources, molecule representations and benchmark platforms. As a major part of the survey, AI techniques are dissected into model architectures and learning paradigms. To reflect the technical development of AI in drug discovery over the years, the surveyed works are organized chronologically. We expect that this survey provides a comprehensive review on AI in drug discovery. We also provide a GitHub repository with a collection of papers (and codes, if applicable) as a learning resource, which is regularly updated.

Language: English

Cited by: 120

polyBERT: a chemical language model to enable fully machine-driven ultrafast polymer informatics (Creative Commons)
Christopher Kuenneth, Rampi Ramprasad

Nature Communications, Year: 2023, Vol. 14(1)

Published: July 11, 2023

Polymers are a vital part of everyday life. Their chemical universe is so large that it presents unprecedented opportunities as well as significant challenges to identify suitable application-specific candidates. We present a complete end-to-end machine-driven polymer informatics pipeline that can search this space for suitable candidates at unprecedented speed and accuracy. This pipeline includes a polymer chemical fingerprinting capability called polyBERT (inspired by Natural Language Processing concepts), and a multitask learning approach that maps the polyBERT fingerprints to a host of properties. polyBERT is a chemical linguist that treats the chemical structure of polymers as a chemical language. The present approach outstrips the best presently available concepts for polymer property prediction based on handcrafted fingerprint schemes in speed by two orders of magnitude while preserving accuracy, thus making it a strong candidate for deployment in scalable architectures including cloud infrastructures.

Language: English

Cited by: 108

Mole-BERT: Rethinking Pre-training Graph Neural Networks for Molecules (Creative Commons)
Jun Xia, Chengshuai Zhao, Bozhen Hu, et al.

Published: April 13, 2023

Recent years have witnessed the prosperity of pre-training graph neural networks (GNNs) for molecules. Typically, atom types as node attributes are randomly masked and GNNs are then trained to predict the masked types, as in AttrMask (Hu et al., 2020), following the Masked Language Modeling (MLM) task of BERT (Devlin et al., 2019). However, unlike MLM, where the vocabulary is large, AttrMask pre-training does not learn informative molecular representations due to the small and unbalanced atom "vocabulary". To amend this problem, we propose a variant of VQ-VAE (van den Oord et al., 2017) as a context-aware tokenizer to encode atom attributes into chemically meaningful discrete codes. This can enlarge the atom vocabulary size and mitigate the quantitative divergence between dominant (e.g., carbons) and rare atoms (e.g., phosphorus). With the enlarged atom "vocabulary", we propose a novel node-level pre-training task, dubbed Masked Atoms Modeling (MAM), to mask some discrete codes randomly and then pre-train GNNs to predict them. MAM also mitigates another issue of AttrMask, namely negative transfer. It can be easily combined with various pre-training tasks to improve their performance. Furthermore, we propose triplet masked contrastive learning (TMCL) for graph-level pre-training to model the heterogeneous semantic similarity between molecules for effective molecule retrieval. MAM and TMCL constitute a novel pre-training framework, Mole-BERT, which can match or outperform state-of-the-art methods in a fully data-driven manner. We release the code at https://github.com/junxia97/Mole-BERT.

Language: English

Cited by: 67