Large language models in bioinformatics: applications and perspectives
Jiajia Liu, Mengyuan Yang, Yankai Yu et al.

arXiv (Cornell University), Year: 2024, Vol. unknown

Published: Jan. 1, 2024

License: Creative Commons

Large language models (LLMs) are a class of artificial intelligence models based on deep learning, which have great performance in various tasks, especially in natural language processing (NLP). LLMs typically consist of artificial neural networks with numerous parameters, trained on large amounts of unlabeled input using self-supervised or semi-supervised learning. However, their potential for solving bioinformatics problems may even exceed their proficiency in modeling human language. In this review, we will present a summary of the prominent large language models used in natural language processing, such as BERT and GPT, and focus on exploring the applications of large language models at different omics levels in bioinformatics, mainly including genomics, transcriptomics, proteomics, drug discovery and single cell analysis. Finally, this review summarizes the potential and prospects of large language models in solving bioinformatic problems.

Language: English

How will generative AI disrupt data science in drug discovery?
Jean‐Philippe Vert

Nature Biotechnology, Year: 2023, Vol. 41(6), pp. 750-751

Published: May 8, 2023

Language: English

Cited by: 48

How successful are AI-discovered drugs in clinical trials? A first analysis and emerging lessons
Madura K P Jayatunga, Margaret Ayers, Lotte Bruens et al.

Drug Discovery Today, Year: 2024, Vol. 29(6), p. 104009

Published: April 30, 2024

License: Creative Commons

AI techniques are making inroads into the field of drug discovery. As a result, a growing number of drugs and vaccines have been discovered using AI. However, questions remain about the success of these molecules in clinical trials. To address these questions, we conducted a first analysis of the clinical pipelines of AI-native Biotech companies. In Phase I we find AI-discovered molecules have an 80–90% success rate, substantially higher than historic industry averages. This suggests, we argue, that AI is highly capable of designing or identifying molecules with drug-like properties. In Phase II the success rate is ∼40%, albeit on a limited sample size, comparable to historic industry averages. Our findings highlight early signs of the clinical potential for AI-discovered molecules.

Language: English

Cited by: 35

GIT-Mol: A multi-modal large language model for molecular science with graph, image, and text
Pengfei Liu, Yiming Ren, Jun Tao et al.

Computers in Biology and Medicine, Year: 2024, Vol. 171, p. 108073

Published: Jan. 30, 2024

Language: English

Cited by: 24

Current and future directions in network biology
Marinka Žitnik, Michelle M. Li, A. V. Wells et al.

Bioinformatics Advances, Year: 2024, Vol. 4(1)

Published: Jan. 1, 2024

License: Creative Commons

Abstract. Summary: Network biology is an interdisciplinary field bridging computational and biological sciences that has proved pivotal in advancing the understanding of cellular functions and diseases across biological systems and scales. Although the field has been around for two decades, it remains nascent. It has witnessed rapid evolution, accompanied by emerging challenges. These stem from various factors, notably the growing complexity and volume of data together with the increased diversity of data types describing different tiers of biological organization. We discuss prevailing research directions in network biology, focusing on molecular/cellular networks but also on other biological network types such as biomedical knowledge graphs, patient similarity networks, brain networks, and social/contact networks relevant to disease spread. In more detail, we highlight areas of inference and comparison of biological networks, multimodal data integration and heterogeneous networks, higher-order network analysis, machine learning on networks, and network-based personalized medicine. Following the overview of recent breakthroughs in these five areas, we offer a perspective on future directions of network biology. Additionally, we discuss scientific communities, educational initiatives, and the importance of fostering diversity within the field. This article establishes a roadmap for an immediate and long-term vision for network biology. Availability and implementation: Not applicable.

Language: English

Cited by: 20

Invalid SMILES are beneficial rather than detrimental to chemical language models
Michael A. Skinnider

Nature Machine Intelligence, Year: 2024, Vol. 6(4), pp. 437-448

Published: March 29, 2024

License: Creative Commons

Abstract. Generative machine learning models have attracted intense interest for their ability to sample novel molecules with desired chemical or biological properties. Among these, language models trained on SMILES (Simplified Molecular-Input Line-Entry System) representations have been subject to the most extensive experimental validation and have been widely adopted. However, these models have what is perceived to be a major limitation: some fraction of the SMILES strings that they generate are invalid, meaning that they cannot be decoded to a chemical structure. This perceived shortcoming has motivated a remarkably broad spectrum of work designed to mitigate the generation of invalid SMILES or correct them post hoc. Here I provide causal evidence that the ability to produce invalid outputs is not harmful but is instead beneficial to chemical language models. I show that the generation of invalid outputs provides a self-corrective mechanism that filters low-likelihood samples from the language model output. Conversely, enforcing valid outputs produces structural biases in the generated molecules, impairing distribution learning and limiting generalization to unseen chemical space. Together, these results refute the prevailing assumption that invalid SMILES are a shortcoming of chemical language models and reframe them as a feature, not a bug.

Language: English

Cited by: 17

Molecular Machine Learning for Chemical Catalysis: Prospects and Challenges
Sukriti Singh, Raghavan B. Sunoj

Accounts of Chemical Research, Year: 2023, Vol. 56(3), pp. 402-412

Published: Jan. 30, 2023

Conspectus: In the domain of reaction development, one aims to obtain higher efficacies as measured in terms of yield and/or selectivities. During the empirical cycles, an admixture of outcomes from low to high yields/selectivities is expected. While it is not easy to identify all the factors that might impact the reaction efficiency, a complex and nonlinear dependence on the nature of reactants, catalysts, solvents, etc. is quite likely. Developmental stages of newer reactions would typically offer a few hundreds of samples with variations in participating molecules and/or reaction conditions. These "observations" and their "output" can be harnessed as valuable labeled data for developing molecular machine learning (ML) models. Once a robust ML model is built for a specific reaction under development, it can predict the reaction outcome for any new choice of substrates/catalyst in a few seconds/minutes and thus can expedite the identification of promising candidates for experimental validation. Recent years have witnessed impressive applications of ML in the molecular world, most of them aimed at predicting important chemical or biological properties. We believe that an integration of effective ML workflows can be made richly beneficial to reaction discovery. As with any new technology, direct adaptation of ML as used in well-developed domains, such as natural language processing (NLP) and image recognition, is unlikely to succeed in reaction discovery. Some of the challenges stem from ineffective featurization of the molecular space, unavailability of quality data and its distribution, and in making the right choice of ML models and their technically robust deployment. It shall be noted that there is no universal ML model suitable for an inherently high-dimensional problem such as chemical reactions. Given these backgrounds, rendering ML tools conducive for reactions is an exciting as well as challenging endeavor at the same time. With the increased availability of efficient ML algorithms, we focused on tapping their potential for small-data reaction discovery (a few hundreds to thousands of samples). In this Account, we describe both feature engineering and feature learning approaches in molecular ML as applied to diverse reactions of high contemporary interest. Among these, catalytic asymmetric hydrogenation of imines/alkenes, β-C(sp3)–H bond functionalization, and relay Heck reaction employed a feature engineering approach using quantum-chemically derived physical organic descriptors as the molecular features, all designed to predict enantioselectivity. The selection of molecular features to customize it for a reaction of interest is described, along with emphasizing the chemical insights that could be gathered through the use of such features. Feature learning methods for Buchwald–Hartwig cross-coupling, deoxyfluorination of alcohols, and enantioselectivity of N,S-acetal formation are found to offer excellent predictions. We propose a transfer learning protocol, wherein an ML model trained on a large number of molecules (10^5–10^6) is fine-tuned on a focused library of target task reactions, as an effective alternative for small-data reaction discovery (10^2–10^3 reactions). The exploitation of deep neural network latent space as a method for generative tasks to identify useful substrates for a reaction is demonstrated as a promising strategy.

Language: English

Cited by: 35

cMolGPT: A Conditional Generative Pre-Trained Transformer for Target-Specific De Novo Molecular Generation
Ye Wang, Honggang Zhao, Simone Sciabola et al.

Molecules, Year: 2023, Vol. 28(11), p. 4430

Published: May 30, 2023

License: Creative Commons

Deep generative models applied to the generation of novel compounds in small-molecule drug design have attracted a lot of attention in recent years. To design novel compounds that interact with specific target proteins, we propose a Generative Pre-Trained Transformer (GPT)-inspired model for de novo target-specific molecular design. By implementing different keys and values for the multi-head attention conditional on a specified target, the proposed method can generate drug-like compounds both with and without a specific target. The results show that our approach (cMolGPT) is capable of generating SMILES strings that correspond to both drug-like and active compounds. Moreover, the compounds generated from the conditional model closely match the chemical space of real target-specific molecules and cover a significant portion of novel compounds. Thus, the proposed Conditional Generative Pre-Trained Transformer (cMolGPT) is a valuable tool for de novo molecule design and has the potential to accelerate the molecular optimization cycle time.

Language: English

Cited by: 35

Guided diffusion for inverse molecular design
Tomer Weiss, Eduardo Mayo Yanes, Sabyasachi Chakraborty et al.

Nature Computational Science, Year: 2023, Vol. 3(10), pp. 873-882

Published: Oct. 5, 2023

Language: English

Cited by: 31

Data-Driven Elucidation of Flavor Chemistry
Xingran Kou, Peiqin Shi, Chukun Gao et al.

Journal of Agricultural and Food Chemistry, Year: 2023, Vol. 71(18), pp. 6789-6802

Published: April 27, 2023

License: Creative Commons

Flavor molecules are commonly used in the food industry to enhance product quality and consumer experiences but are associated with potential human health risks, highlighting the need for safer flavor alternatives. To address these health-associated challenges and promote reasonable application, several databases for flavor molecules have been constructed. However, no existing studies have comprehensively summarized these data resources according to quality, focused fields, and potential gaps. Here, we systematically summarized 25 flavor molecule databases published within the last 20 years and revealed that data inaccessibility, untimely updates, and nonstandard flavor descriptions are the main limitations of current studies. We examined the development of computational approaches (e.g., machine learning and molecular simulation) for the identification of novel flavor molecules and discussed their major challenges regarding throughput, model interpretability, and the lack of gold-standard data sets for equitable model evaluation. Additionally, we discussed future strategies for the mining and designing of novel flavor molecules based on multi-omics and artificial intelligence to provide a new foundation for flavor science research.

Language: English

Cited by: 28

Open-Source Machine Learning in Computational Chemistry
Alexander Hagg, Karl N. Kirschner

Journal of Chemical Information and Modeling, Year: 2023, Vol. 63(15), pp. 4505-4532

Published: July 19, 2023

License: Creative Commons

The field of computational chemistry has seen a significant increase in the integration of machine learning concepts and algorithms. In this Perspective, we surveyed 179 open-source software projects, with corresponding peer-reviewed papers published within the last 5 years, to better understand the topics within the field being investigated by machine learning approaches. For each project, we provide a short description, the link to the code, the accompanying license type, and whether the training data and resulting models are made publicly available. Based on those deposited in GitHub repositories, the most popular employed Python libraries are identified. We hope that this survey will serve as a resource to learn about machine learning or specific architectures thereof by identifying accessible codes with accompanying papers on a topic basis. To this end, we also include computational chemistry open-source software for generating training data and fundamental Python libraries for machine learning. Based on our observations and considering the three pillars of collaborative machine learning work (open data, open source (code), and open models), we provide some suggestions to the community.

Language: English

Cited by: 28