Training data composition determines machine learning generalization and biological rule discovery
Eugen Ursu, Aygul R. Minnegalieva, Puneet Rawat et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: June 19, 2024

Abstract: Supervised machine learning models rely on training datasets with positive (target class) and negative examples. Therefore, the composition of the training dataset has a direct influence on model performance. Specifically, negative sample selection bias, concerning samples not representing the target class, presents challenges across a range of domains such as text classification and protein-protein interaction prediction. Machine-learning-based immunotherapeutics design is an increasingly important area of research, focusing on designing antibodies or T-cell receptors (TCRs) that can bind to their target molecules with high specificity and affinity. Given the biomedical importance of immunotherapeutics, there is a need to address the unresolved question of how negative training set composition impacts model generalization and biological rule discovery to enable rational and safe drug design. We set out to study this question in the context of the antibody-antigen binding prediction problem by varying the negative class, encompassing a binding affinity gradient. We based our investigation on large synthetic datasets that provide ground truth structure-based antibody-antigen binding data, allowing access to residue-wise binding energy on the binding interface. We found that both out-of-distribution generalization and binding rule discovery depended on the type of negative dataset used. Importantly, we discovered that a model's capacity to learn the binding rules of the positive dataset is not a trivial correlate of its classification accuracy. We confirmed our findings with real-world relevant experimental data. Our work highlights the importance of considering training dataset composition for achieving optimal out-of-distribution performance in machine-learning-based research.

Significance Statement: The effectiveness of supervised machine learning models hinges on training datasets, particularly the inclusion of negative examples. This bias can greatly impact model performance. As the development of immunotherapeutic agents using machine learning is becoming crucial in biomedicine, understanding this impact is imperative. Our study, focused on the antibody-antigen binding prediction problem, reveals that the choice of negative dataset significantly affects both out-of-distribution generalization and binding rule discovery. These findings underscore the necessity of carefully considering training dataset composition in machine-learning-driven research for optimal performance, robustness and meaningful rule acquisition.

Language: English

BioPhi: A platform for antibody design, humanization, and humanness evaluation based on natural antibody repertoires and deep learning
David Příhoda, Jad Maamary, Andrew B. Waight et al.

mAbs, Journal Year: 2022, Volume and Issue: 14(1)

Published: Feb. 8, 2022

Despite recent advances in transgenic animal models and display technologies, humanization of mouse sequences remains one of the main routes for therapeutic antibody development. Traditionally, humanization is manual, laborious, and requires expert knowledge. Although automation efforts are advancing, existing methods are either demonstrated on a small scale or are entirely proprietary. To predict the immunogenicity risk, the human-likeness of sequences can be evaluated using humanness scores, but these lack diversity, granularity or interpretability. Meanwhile, immune repertoire sequencing has generated rich antibody libraries such as the Observed Antibody Space (OAS) that offer augmented diversity not yet exploited for antibody engineering. Here we present BioPhi, an open-source platform featuring novel methods for humanization (Sapiens) and humanness evaluation (OASis). Sapiens is a deep learning humanization method trained on the OAS using language modeling. Based ...

Language: English

Citations: 113

Computational and artificial intelligence-based methods for antibody development
Ji‐Sun Kim, Matthew McFee, Qiao Fang et al.

Trends in Pharmacological Sciences, Journal Year: 2023, Volume and Issue: 44(3), P. 175 - 189

Published: Jan. 18, 2023

Due to their high target specificity and binding affinity, therapeutic antibodies are currently the largest class of biotherapeutics. The traditional, largely empirical antibody development process is, while mature and robust, cumbersome and has significant limitations. Substantial recent advances in computational and artificial intelligence (AI) technologies are now starting to overcome many of these limitations and are increasingly integrated into development pipelines. Here, we provide an overview of AI methods relevant for antibody development, including databases, computational predictors of antibody properties and structure, and computational antibody design methods with an emphasis on machine learning (ML) models, and the design of complementarity-determining region (CDR) loops, structural components critical for binding.

Language: English

Citations: 101

Deep mutational learning predicts ACE2 binding and antibody escape to combinatorial mutations in the SARS-CoV-2 receptor-binding domain
Joseph M. Taft, Cédric R. Weber, Beichen Gao et al.

Cell, Journal Year: 2022, Volume and Issue: 185(21), P. 4008 - 4022.e14

Published: Aug. 31, 2022

The continual evolution of SARS-CoV-2 and the emergence of variants that show resistance to vaccines and neutralizing antibodies threaten to prolong the COVID-19 pandemic. Selection and emergence of SARS-CoV-2 variants are driven in part by mutations within the viral spike protein and in particular the ACE2 receptor-binding domain (RBD), a primary target site for neutralizing antibodies. Here, we develop deep mutational learning (DML), a machine-learning-guided protein engineering technology, which is used to investigate a massive sequence space of combinatorial mutations, representing billions of RBD variants, by accurately predicting their impact on ACE2 binding and antibody escape. A highly diverse landscape of possible SARS-CoV-2 variants is identified that could emerge from a multitude of evolutionary trajectories. DML may be used for predictive profiling on current and prospective variants, including highly mutated variants such as Omicron, thus guiding the development of therapeutic antibody treatments and vaccines for COVID-19.

Language: English

Citations: 85

Progress and challenges for the machine learning-based design of fit-for-purpose monoclonal antibodies
Rahmad Akbar, Habib Bashour, Puneet Rawat et al.

mAbs, Journal Year: 2022, Volume and Issue: 14(1)

Published: March 16, 2022

Although the therapeutic efficacy and commercial success of monoclonal antibodies (mAbs) are tremendous, the design and discovery of new candidates remain a time and cost-intensive endeavor. In this regard, progress in the generation of data describing antigen binding and developability, computational methodology, and artificial intelligence may pave the way for a new era of in silico on-demand immunotherapeutics design and discovery. Here, we argue that the main necessary machine learning (ML) components for an in silico mAb sequence generator are: understanding of the rules of mAb-antigen binding, capacity to modularly combine mAb design parameters, and algorithms for unconstrained parameter-driven in silico mAb sequence synthesis. We review the current progress toward the realization of these necessary components and discuss the challenges that must be overcome to allow the on-demand ML-based discovery and design of fit-for-purpose mAb therapeutic candidates.

Language: English

Citations: 83

IgLM: Infilling language modeling for antibody sequence design
Richard W. Shuai, Jeffrey A. Ruffolo, Jeffrey J. Gray et al.

Cell Systems, Journal Year: 2023, Volume and Issue: 14(11), P. 979 - 989.e4

Published: Oct. 30, 2023

Language: English

Citations: 63

Assessing developability early in the discovery process for novel biologics
Monica L. Fernández‐Quintero, Anne Ljungars, Franz Waibl et al.

mAbs, Journal Year: 2023, Volume and Issue: 15(1)

Published: Feb. 23, 2023

Beyond potency, a good developability profile is a key attribute of a biological drug. Selecting and screening for such attributes early in the drug development process can save resources and avoid costly late-stage failures. Here, we review some of the most important developability properties that can be assessed early on for biologics. These include the influence of the source of the biologic, its biophysical and pharmacokinetic properties, and how well it can be expressed recombinantly. We furthermore present in silico, in vitro, and in vivo methods and techniques that can be exploited at different stages of the discovery process to identify molecules with liabilities and thereby facilitate the selection of the most optimal drug leads. Finally, we reflect on the most relevant developability parameters for injectable versus orally delivered biologics and provide an outlook toward what general trends are expected to rise in the development of biologics.

Language: English

Citations: 57

Development and use of machine learning algorithms in vaccine target selection
Barbara Bravi

npj Vaccines, Journal Year: 2024, Volume and Issue: 9(1)

Published: Jan. 20, 2024

Computer-aided discovery of vaccine targets has become a cornerstone of rational vaccine design. In this article, I discuss how Machine Learning (ML) can inform and guide key computational steps in rational vaccine design concerned with the identification of B and T cell epitopes and correlates of protection. I provide examples of ML models, as well as types of data and predictions for which they are built. I argue that interpretable ML has the potential to improve the identification of immunogens also as a tool for scientific discovery, by helping elucidate the molecular processes underlying vaccine-induced immune responses. I outline the limitations and challenges in terms of availability of data and method development that need to be addressed to bridge the gap between advances in ML and their translational application to vaccine design.

Language: English

Citations: 35

Applications of machine learning in antibody discovery, process development, manufacturing and formulation: Current trends, challenges, and opportunities
Thanh Tung Khuat, Robert Bassett, Ellen Otte et al.

Computers & Chemical Engineering, Journal Year: 2024, Volume and Issue: 182, P. 108585

Published: Jan. 11, 2024

While machine learning (ML) has made significant contributions to the biopharmaceutical field, its applications are still in the early stages in terms of providing direct support for quality-by-design based development and manufacturing of biologics, hindering the enormous potential for bioprocesses automation from their development to manufacturing. However, the adoption of ML-based models instead of conventional multivariate data analysis methods is significantly increasing due to the accumulation of large-scale production data. This trend is primarily driven by the real-time monitoring of process variables and quality attributes of biopharmaceutical products through the implementation of advanced process analytical technologies. Given the complexity and multidimensionality of bioproduct design, bioprocess development, and product manufacturing data, ML-based approaches are increasingly being employed to achieve accurate, flexible, and high-performing predictive models to address the problems of analytics, monitoring, and control within the biopharma field. This paper aims to provide a comprehensive review of the current applications of ML solutions in the design, monitoring, control, and optimisation of upstream, downstream, and product formulation processes of monoclonal antibodies. Finally, this paper thoroughly discusses the main challenges related to the bioprocesses themselves, process data, and the use of machine learning models in monoclonal antibody process development and manufacturing. Moreover, it offers further insights into the adoption of innovative machine learning methods and novel trends in the development of new digital biopharma solutions.

Language: English

Citations: 32

Machine-designed biotherapeutics: opportunities, feasibility and advantages of deep learning in computational antibody discovery
Wiktoria Wilman, Sonia Wróbel, Weronika Bielska et al.

Briefings in Bioinformatics, Journal Year: 2022, Volume and Issue: 23(4)

Published: July 13, 2022

Antibodies are versatile molecular binders with an established and growing role as therapeutics. Computational approaches to developing and designing these molecules are being increasingly used to complement traditional lab-based processes. Nowadays, in silico methods fill multiple elements of the discovery stage, such as characterizing antibody-antigen interactions and identifying developability liabilities. Recently, computational methods tackling such problems have begun to follow machine learning paradigms, in many cases deep learning specifically. This paradigm shift offers improvements in established areas such as structure or binding prediction and opens up new possibilities such as language-based modeling of antibody repertoires or machine-learning-based generation of novel sequences. In this review, we critically examine the recent developments in (deep) machine learning approaches to therapeutic antibody design with implications for fully computational antibody design.

Language: English

Citations: 61

Toward real-world automated antibody design with combinatorial Bayesian optimization
Asif Khan, Alexander I. Cowen-Rivers, Antoine Grosnit et al.

Cell Reports Methods, Journal Year: 2023, Volume and Issue: 3(1), P. 100374

Published: Jan. 1, 2023

Antibodies are multimeric proteins capable of highly specific molecular recognition. The complementarity determining region 3 of the antibody variable heavy chain (CDRH3) often dominates antigen-binding specificity. Hence, it is a priority to design optimal antigen-specific CDRH3 to develop therapeutic antibodies. The combinatorial structure of CDRH3 sequences makes it impossible to query binding-affinity oracles exhaustively. Moreover, antibodies are expected to have high target specificity and developability. Here, we present AntBO, a combinatorial Bayesian optimization framework utilizing a CDRH3 trust region for an in silico design of antibodies with favorable developability scores. The in silico experiments on 159 antigens demonstrate that AntBO is a step toward practically viable in vitro antibody design. In under 200 calls to the oracle, AntBO suggests antibodies outperforming the best binding sequence from 6.9 million experimentally obtained CDRH3s. Additionally, AntBO finds very-high-affinity CDRH3 in only 38 protein designs while requiring no domain knowledge.

Language: English

Citations: 27