Machine Learning Empowering Drug Discovery: Applications, Opportunities and Challenges
Xin Qi, Yuanchun Zhao, Zhuang Qi et al.

Molecules, Journal Year: 2024, Volume and Issue: 29(4), P. 903 - 903

Published: Feb. 18, 2024

Drug discovery plays a critical role in advancing human health by developing new medications and treatments to combat diseases. How to accelerate the pace and reduce the costs of drug discovery has long been a key concern for the pharmaceutical industry. Fortunately, by leveraging advanced algorithms, computational power and biological big data, artificial intelligence (AI) technology, especially machine learning (ML), holds the promise of making the hunt for new drugs more efficient. Recently, the Transformer-based models that have achieved revolutionary breakthroughs in natural language processing have sparked a new era of their applications in drug discovery. Herein, we introduce the latest applications of ML in drug discovery, highlight the potential of advanced Transformer-based models, and discuss the future prospects and challenges in the field.

Language: English

ProtGPT2 is a deep unsupervised language model for protein design
Noelia Ferruz, Steffen Schmidt, Birte Höcker et al.

Nature Communications, Journal Year: 2022, Volume and Issue: 13(1)

Published: July 27, 2022

Protein design aims to build novel proteins customized for specific purposes, thereby holding the potential to tackle many environmental and biomedical problems. Recent progress in Transformer-based architectures has enabled the implementation of language models capable of generating text with human-like capabilities. Here, motivated by this success, we describe ProtGPT2, a language model trained on the protein space that generates de novo protein sequences following the principles of natural ones. The generated proteins display natural amino acid propensities, while disorder predictions indicate that 88% of ProtGPT2-generated proteins are globular, in line with natural sequences. Sensitive sequence searches in protein databases show that ProtGPT2 sequences are distantly related to natural ones, and similarity networks further demonstrate that ProtGPT2 is sampling unexplored regions of protein space. AlphaFold prediction of ProtGPT2-sequences yields well-folded, non-idealized structures with embodiments of large loops and reveals topologies not captured in current structure databases. ProtGPT2 generates sequences in a matter of seconds and is freely available.

Language: English

Citations: 433

DeepLoc 2.0: multi-label subcellular localization prediction using protein language models
Vineet Thumuluri, José Juan Almagro Armenteros, Alexander Rosenberg Johansen et al.

Nucleic Acids Research, Journal Year: 2022, Volume and Issue: 50(W1), P. W228 - W234

Published: April 19, 2022

The prediction of protein subcellular localization is of great relevance for proteomics research. Here, we propose an update to the popular tool DeepLoc with multi-localization prediction and improvements in both performance and interpretability. For training and validation, we curate eukaryotic and human multi-location datasets with stringent homology partitioning, enriched with sorting signal information compiled from the literature. We achieve state-of-the-art performance in DeepLoc 2.0 by using a pre-trained protein language model. It has the further advantage that it uses sequence input rather than relying on slower protein profiles. We provide two means of better interpretability: an attention output along the sequence and highly accurate prediction of nine different types of sorting signals. We find that the attention output correlates well with the position of sorting signals. The webserver is available at services.healthtech.dtu.dk/service.php?DeepLoc-2.0.

Language: English

Citations: 425

Genome-wide prediction of disease variant effects with a deep protein language model
Nadav Brandes, Grant Goldman, Charlotte H. Wang et al.

Nature Genetics, Journal Year: 2023, Volume and Issue: 55(9), P. 1512 - 1522

Published: Aug. 10, 2023

Predicting the effects of coding variants is a major challenge. While recent deep-learning models have improved variant effect prediction accuracy, they cannot analyze all coding variants due to dependency on close homologs or software limitations. Here we developed a workflow using ESM1b, a 650-million-parameter protein language model, to predict all ~450 million possible missense variant effects in the human genome, and made all predictions available on a web portal. ESM1b outperformed existing methods in classifying ~150,000 ClinVar/HGMD missense variants as pathogenic or benign and in predicting measurements across 28 deep mutational scan datasets. We further annotated ~2 million variants as damaging only in specific protein isoforms, demonstrating the importance of considering all isoforms when predicting variant effects. Our approach also generalizes to more complex coding variants such as in-frame indels and stop-gains. Together, these results establish ESM1b as an effective, accurate and general approach to evaluating variant effects.

Language: English

Citations: 221

BepiPred-3.0: Improved B-cell epitope prediction using protein language models
Joakim Nøddeskov Clifford, Magnus Haraldson Høie, Sebastian Deleuran et al.

Protein Science, Journal Year: 2022, Volume and Issue: 31(12)

Published: Nov. 11, 2022

B-cell epitope prediction tools are of great medical and commercial interest due to their practical applications in vaccine development and disease diagnostics. The introduction of protein language models (LMs), trained on unprecedentedly large datasets of protein sequences and structures, taps into a powerful numerical representation that can be exploited to accurately predict local and global protein structural features from amino acid sequences only. In this paper, we present BepiPred-3.0, a sequence-based epitope prediction tool that, by exploiting LM embeddings, greatly improves prediction accuracy for both linear and conformational epitopes on several independent test sets. Furthermore, by carefully selecting additional input variables and the epitope residue annotation strategy, performance was further improved, thus achieving greater predictive power. Our tool can predict epitopes across hundreds of sequences in minutes. It is freely available as a web server and standalone package at https://services.healthtech.dtu.dk/service.php?BepiPred-3.0 with a user-friendly interface to navigate the results.

Language: English

Citations: 124

ProteInfer, deep neural networks for protein functional inference
Theo Sanderson, Maxwell L. Bileschi, David Belanger et al.

eLife, Journal Year: 2023, Volume and Issue: 12

Published: Feb. 27, 2023

Predicting the function of a protein from its amino acid sequence is a long-standing challenge in bioinformatics. Traditional approaches use sequence alignment to compare a query sequence either to thousands of models of protein families or to large databases of individual protein sequences. Here we introduce ProteInfer, which instead employs deep convolutional neural networks to directly predict a variety of protein functions, Enzyme Commission (EC) numbers and Gene Ontology (GO) terms, from an unaligned amino acid sequence. This approach provides precise predictions that complement alignment-based methods, and the computational efficiency of a single neural network permits novel and lightweight software interfaces, which we demonstrate with an in-browser graphical interface for protein function prediction in which all computation is performed on the user’s personal computer with no data uploaded to remote servers. Moreover, these models place full-length amino acid sequences into a generalised functional space, facilitating downstream analysis and interpretation. To read an interactive version of this paper, please visit https://google-research.github.io/proteinfer/ .

Language: English

Citations: 111

Transformer-based deep learning for predicting protein properties in the life sciences
Abel Chandra, Laura Tünnermann, Tommy Löfstedt et al.

eLife, Journal Year: 2023, Volume and Issue: 12

Published: Jan. 18, 2023

Recent developments in deep learning, coupled with an increasing number of sequenced proteins, have led to a breakthrough in life science applications, in particular in protein property prediction. There is hope that deep learning can close the gap between the number of sequenced proteins and the number of proteins with properties known from lab experiments. Language models from the field of natural language processing have gained popularity for protein property predictions and have led to a new computational revolution in biology, where old prediction results are being improved regularly. Such models can learn useful multipurpose representations of proteins from large open repositories of protein sequences and can be used, for instance, to predict protein properties. The field is growing quickly because of a particular class of model, the Transformer model. We review recent developments and the use of large-scale language models in protein prediction applications, including how such models can be used to predict, for example, post-translational modifications. We discuss the shortcomings of other deep learning models and explain how the Transformer-based models have proven to be a very promising way to unravel the information hidden in the sequences of amino acids.

Language: English

Citations: 98

Applications of transformer-based language models in bioinformatics: a survey
Shuang Zhang, Rui Fan, Yuti Liu et al.

Bioinformatics Advances, Journal Year: 2023, Volume and Issue: 3(1)

Published: Jan. 1, 2023

The transformer-based language models, including the vanilla transformer, BERT and GPT-3, have achieved revolutionary breakthroughs in the field of natural language processing (NLP). Since there are inherent similarities between various biological sequences and natural languages, the remarkable interpretability and adaptability of these models have prompted a new wave of their application in bioinformatics research. To provide a timely and comprehensive review, we introduce key developments of transformer-based language models by describing the detailed structure of transformers, and summarize their contribution to a wide range of bioinformatics research, from basic sequence analysis to drug discovery. While the applications are diverse and multifaceted, we identify and discuss common challenges, including the heterogeneity of training data, computational expense and model interpretability, as well as opportunities in the context of bioinformatics research. We hope that the broader community of NLP researchers, bioinformaticians and biologists will be brought together to foster future development in this field and to inspire novel applications unattainable by traditional methods. Supplementary data are available at Bioinformatics Advances online.

Language: English

Citations: 95

Pre-trained Language Models in Biomedical Domain: A Systematic Survey
Benyou Wang, Qianqian Xie, Jiahuan Pei et al.

ACM Computing Surveys, Journal Year: 2023, Volume and Issue: 56(3), P. 1 - 52

Published: Aug. 1, 2023

Pre-trained language models (PLMs) have been the de facto paradigm for most natural language processing tasks. This also benefits the biomedical domain: researchers from the informatics, medicine, and computer science communities propose various PLMs trained on biomedical datasets, e.g., biomedical text, electronic health records, protein, and DNA sequences. However, the cross-discipline characteristics of biomedical PLMs hinder their spreading among communities; some existing works are isolated from each other without comprehensive comparison and discussions. It is nontrivial to make a survey that not only systematically reviews recent advances in biomedical PLMs and their applications but also standardizes terminology and benchmarks. This article summarizes the recent progress of pre-trained language models in the biomedical domain and their applications in downstream biomedical tasks. Particularly, we discuss the motivations of PLMs in the biomedical domain and introduce the key concepts of pre-trained language models. We then propose a taxonomy of existing biomedical PLMs that categorizes them from various perspectives systematically. Plus, their applications in downstream biomedical tasks are exhaustively discussed. Last, we illustrate various limitations and future trends, which aims to provide inspiration for future research.

Language: English

Citations: 94

Machine learning for functional protein design
Pascal Notin, Nathan Rollins, Yarin Gal et al.

Nature Biotechnology, Journal Year: 2024, Volume and Issue: 42(2), P. 216 - 228

Published: Feb. 1, 2024

Language: English

Citations: 94

Accelerating the integration of ChatGPT and other large-scale AI models into biomedical research and healthcare
Ding-Qiao Wang, Long-Yu Feng, Jinguo Ye et al.

MedComm – Future Medicine, Journal Year: 2023, Volume and Issue: 2(2)

Published: May 17, 2023

Large-scale artificial intelligence (AI) models such as ChatGPT have the potential to improve performance on many benchmarks and real-world tasks. However, it is difficult to develop and maintain these models because of their complexity and resource requirements. As a result, they are still inaccessible to healthcare industries and clinicians. This situation might soon be changed because of advancements in graphics processing unit (GPU) programming and parallel computing. More importantly, leveraging existing large-scale AI models such as GPT-4 and Med-PaLM and integrating them into multiagent models (e.g., Visual-ChatGPT) will facilitate real-world implementations. This review aims to raise awareness of the applications of large-scale AI models in healthcare. We provide a general overview of several advanced AI models, including language models, vision-language models, graph learning models, language-conditioned multiagent models and multimodal embodied models. We discuss their medical applications, in addition to the challenges and future directions. Importantly, we stress the need to align these models with human values and goals, for example by using reinforcement learning from human feedback, to ensure that they provide accurate and personalized insights that support human decision-making and improve outcomes.

Language: English

Citations: 84