Cited by Unveiling novel antimicrobial peptides from the ruminant gastrointestinal microbiomes: A deep learning-driven approach yields an anti-MRSA candidate

Machine Learning-Guided Protein Engineering

Petr Kouba, Pavel Kohout, Faraneh Haddadi

et al.

ACS Catalysis, Year: 2023, Vol. 13(21), pp. 13863-13895

Published: Oct. 13, 2023

Recent progress in engineering highly promising biocatalysts has increasingly involved machine learning methods. These methods leverage existing experimental and simulation data to aid in the discovery and annotation of enzymes, as well as in suggesting beneficial mutations for improving known targets. The field of machine learning for protein engineering is gathering steam, driven by recent success stories and notable progress in other areas. It already encompasses ambitious tasks such as understanding and predicting protein structure and function, catalytic efficiency, enantioselectivity, protein dynamics, stability, solubility, aggregation, and more. Nonetheless, the field is still evolving, with many challenges to overcome and questions to address. In this Perspective, we provide an overview of ongoing trends in this domain, highlight recent case studies, and examine the current limitations of machine learning-based methods. We emphasize the crucial importance of thorough experimental validation of emerging models before their use for rational protein design. We present our opinions on the fundamental problems and outline the potential directions for future research.

Language: English


pLM4ACE: A protein language model based predictor for antihypertensive peptide screening

Zhenjiao Du, Xingjian Ding, William Hsu

et al.

Food Chemistry, Year: 2023, Vol. 431, p. 137162

Published: Aug. 14, 2023

Language: English


Protein codes promote selective subcellular compartmentalization

Henry R. Kilgore, Itamar Chinn, Peter G. Mikhael

et al.

Science, Year: 2025, Vol. unknown

Published: Feb. 6, 2025

Cells have evolved mechanisms to distribute ~10 billion protein molecules to subcellular compartments where diverse proteins involved in shared functions must assemble. Here, we demonstrate that proteins with shared functions share amino acid sequence codes that guide them to compartment destinations. A protein language model, ProtGPS, was developed that predicts with high performance the compartment localization of human proteins excluded from the training set. ProtGPS successfully guided the generation of novel protein sequences that selectively assemble in the nucleolus. ProtGPS identified pathological mutations that change this protein code and lead to altered subcellular localization of proteins. Our results indicate that protein sequences contain not only a folding code, but also a previously unrecognized code governing their distribution to subcellular compartments.

Language: English


Leveraging transformers‐based language models in proteome bioinformatics

Nguyen Quoc Khanh Le

PROTEOMICS, Year: 2023, Vol. 23(23-24)

Published: June 29, 2023

In recent years, the rapid growth of biological data has increased interest in using bioinformatics to analyze and interpret this data. Proteomics, which studies the structure, function, and interactions of proteins, is a crucial area of bioinformatics. Using natural language processing (NLP) techniques in proteomics is an emerging field that combines machine learning and text mining to analyze biological data. Recently, transformer‐based NLP models have gained significant attention for their ability to process variable‐length input sequences in parallel, using self‐attention mechanisms to capture long‐range dependencies. In this review paper, we discuss recent advancements in transformer‐based NLP models in proteome bioinformatics and examine their advantages, limitations, and potential applications to improve the accuracy and efficiency of various tasks. Additionally, we highlight the challenges and future directions of using these models in proteome bioinformatics research. Overall, this review provides valuable insights into the potential of transformer‐based NLP models to revolutionize proteome bioinformatics.

Language: English


From intuition to AI: evolution of small molecule representations in drug discovery

Miles McGibbon, Steven Shave, Jie Dong

et al.

Briefings in Bioinformatics, Year: 2023, Vol. 25(1)

Published: Nov. 22, 2023

Within drug discovery, the goal of AI scientists and cheminformaticians is to help identify molecular starting points that will develop into safe and efficacious drugs while reducing costs, time and failure rates. To achieve this goal, it is crucial to represent molecules in a digital format that makes them machine-readable and facilitates the accurate prediction of properties that drive decision-making. Over the years, molecular representations have evolved from intuitive and human-readable formats to bespoke numerical descriptors and fingerprints, and now to learned representations that capture patterns and salient features across vast chemical spaces. Among these, sequence-based and graph-based representations of small molecules have become highly popular. However, each approach has strengths and weaknesses across dimensions such as generality, computational cost, inversibility for generative applications and interpretability, which can be critical in informing practitioners’ decisions. As the drug discovery landscape evolves, opportunities for innovation continue to emerge. These include the creation of molecular representations for high-value, low-data regimes, the distillation of broader biological and chemical knowledge into novel learned representations and the modeling of up-and-coming therapeutic modalities.

Language: English


PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Jun Song Chen, Chongjian Ge, Enze Xie

et al.

Lecture Notes in Computer Science, Year: 2024, Vol. unknown, pp. 74-91

Published: Nov. 22, 2024

Language: English


Biophysics-based protein language models for protein engineering

Sam Gelman, Bryce Johnson, Chase R. Freschlin

et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2024, Vol. unknown

Published: Mar. 17, 2024

Protein language models trained on evolutionary data have emerged as powerful tools for predictive problems involving protein sequence, structure, and function. However, these models overlook decades of research into biophysical factors governing protein function. We propose Mutational Effect Transfer Learning (METL), a protein language model framework that unites advanced machine learning and biophysical modeling. Using the METL framework, we pretrain transformer-based neural networks on biophysical simulation data to capture fundamental relationships between protein sequence, structure, and energetics. We finetune METL on experimental sequence-function data to harness these biophysical signals and apply them when predicting protein properties like thermostability, catalytic activity, and fluorescence. METL excels in challenging protein engineering tasks like generalizing from small training sets and position extrapolation, although existing methods that train on evolutionary signals remain powerful for many types of experimental assays. We demonstrate METL's ability to design functional green fluorescent protein variants when trained on only 64 examples, showcasing the potential of biophysics-based protein language models for protein engineering.

Language: English


Recent advances in data mining and machine learning for enhanced building energy management

Xinlei Zhou, Han Du, Shan Xue

et al.

Energy, Year: 2024, Vol. 307, p. 132636

Published: July 29, 2024

Due to the recent advancements in the Internet of Things and data science techniques, a wide range of studies have investigated the use of data mining (DM) and machine learning (ML) algorithms to enhance building energy management (BEM). However, different classes of DM and ML algorithms feature different mechanisms and capabilities, resulting in their distinct roles and performance in BEM. Appropriate integration of the different categories of DM and ML techniques in BEM is essential to promote their application and to provide guidance for new topic areas. This study presents a literature review of DM and ML techniques in key areas of BEM, including evaluation, usage prediction, and demand flexibility optimization. The study categorizes these techniques into three main categories: supervised DM, unsupervised DM, and reinforcement learning (RL). Unsupervised DM techniques are primarily used for assessment, while supervised DM techniques are mainly employed for benchmarking and prediction. RL has been utilized for optimal control to improve energy efficiency, flexibility, and indoor thermal comfort. The strengths and shortcomings of these methods in terms of their BEM applications are discussed, along with some suggestions for future research in this field.

Language: English


Learnt representations of proteins can be used for accurate prediction of small molecule binding sites on experimentally determined and predicted protein structures

Anna Carbery, Martin Buttenschoen, R. Skyner

et al.

Journal of Cheminformatics, Year: 2024, Vol. 16(1)

Published: Mar. 14, 2024

Protein-ligand binding site prediction is a useful tool for understanding the functional behaviour and potential drug-target interactions of a novel protein of interest. However, most binding site prediction methods are tested by providing crystallised ligand-bound (holo) structures as input. This testing regime is insufficient to understand performance on novel protein targets where experimental structures are not available. An alternative option is to provide computationally predicted protein structures, but this is not commonly tested. However, due to the training data used, computationally-predicted protein structures tend to be extremely accurate, and are often biased toward a holo conformation. In this study we describe and benchmark IF-SitePred, a protein-ligand binding site prediction method which is based on the labelling of ESM-IF1 protein language model embeddings combined with point cloud annotation and clustering. We show that not only is IF-SitePred competitive with state-of-the-art methods when predicting binding sites on experimental structures, but it performs better on proxies for novel proteins where low accuracy has been simulated by molecular dynamics. Finally, IF-SitePred outperforms other methods if ensembles of predicted protein structures are generated.

Language: English


Molecular Dynamics (MD)-Derived Features for Canonical and Noncanonical Amino Acids

Tiffani Hui, Maxim Secor, Minh Ngoc Ho

et al.

Journal of Chemical Information and Modeling, Year: 2025, Vol. unknown

Published: Feb. 2, 2025

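Machine learning (ML) models have become increasingly popular for predicting and designing structures and properties of peptides and proteins. These ML models typically use peptides and proteins containing only canonical amino acids as the training data. Consequently, these models struggle to make accurate predictions for new peptides and proteins containing amino acids that are absent in the training data set (e.g., noncanonical amino acids). One approach to improve the accuracy of these models is to collect more training data with the desired amino acids. However, this strategy is suboptimal, as new training data may not be easily attainable, and additional time is required to retrain the models. Alternatively, the extendibility of ML models can be improved if the amino acid features used are representative and generalizable to unseen amino acids. Herein, we develop amino acid features using molecular dynamics (MD) simulation results. Specifically, for a given amino acid, we perform MD simulations of its dipeptide and create features based on its backbone (ϕ, ψ) distributions and electrostatic potentials. We demonstrate that these MD-derived features enable our ML models to accurately predict structural ensembles of cyclic peptides containing amino acids that were not present in the original training data set. For example, we build ML models to predict cyclic pentapeptide structures, using a training library of 15 amino acids, and test them on cyclic pentapeptides from the same 15-amino-acid library or an extended 50-amino-acid library. When features such as Morgan fingerprints and MACCS keys are used to represent amino acids, the models achieve R² = 0.963 for pentapeptides from the 15-amino-acid library, but the models' performances decrease significantly, to R² = 0.430 and 0.508, respectively, when tasked with the 50-amino-acid library. On the other hand, the model using MD-derived features outperforms those using Morgan fingerprints and MACCS keys, with R² = 0.700. Overall, instead of having to collect new training data, predictions can be extended to peptide sequences with amino acids not originally in the training data at the mere cost of performing MD simulations.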

Language: English
