Epistasis facilitates functional evolution in an ancient transcription factor
Brian P. H. Metzger, Yeonwoo Park, Tyler N. Starr

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: April 20, 2023

A protein’s genetic architecture, the set of causal rules by which its sequence produces its functions, also determines its possible evolutionary trajectories. Prior research has proposed that the genetic architecture of proteins is very complex, with pervasive epistatic interactions that constrain evolution and make function difficult to predict from sequence. Most of this work analyzed only the direct paths between two proteins of interest, excluding the vast majority of possible genotypes and evolutionary trajectories, and considered only a single protein function, leaving unaddressed the genetic architecture of functional specificity and its impact on the evolution of new functions. Here we develop a new method based on ordinal logistic regression to directly characterize the global genetic determinants of multiple protein functions from 20-state combinatorial deep mutational scanning (DMS) experiments. We use it to dissect the genetic architecture and evolution of a transcription factor’s specificity for DNA, using data from a combinatorial DMS of an ancient steroid hormone receptor’s capacity to activate transcription from two biologically relevant DNA elements. We show that the genetic architecture of DNA recognition consists of a dense set of main and pairwise effects that involve virtually every possible amino acid state in the protein-DNA interface, but higher-order epistasis plays only a tiny role. Pairwise interactions enlarge the set of functional sequences and are the primary determinants of specificity for different DNA elements. They also massively expand the number of opportunities for single-residue mutations to switch specificity from one DNA target to another. By bringing variants with different functions close together in sequence space, pairwise epistasis therefore facilitates rather than constrains the evolution of new functions.

Language: English
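The abstract above names an ordinal logistic regression model. Below is a minimal sketch of that model class, assuming a standard cumulative-logit (proportional-odds) form. Each genotype gets a linear score built from main effects of individual amino acid states plus pairwise terms. Ordered thresholds then turn that score into probabilities of ordinal activity classes.

The parameter names, the sparse dictionary encoding, the three-class scheme and all numbers are hypothetical. The authors' actual parameterization, regularization and fitting procedure may differ.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def genetic_score(genotype, main, pairwise):
    """Linear predictor: main effects of each (site, state) plus pairwise terms.

    main: {(site, aa): effect}; pairwise: {((i, aa_i), (j, aa_j)): effect} with i < j.
    Missing keys contribute 0 (a hypothetical sparse parameterization).
    """
    eta = sum(main.get((i, aa), 0.0) for i, aa in enumerate(genotype))
    for i in range(len(genotype)):
        for j in range(i + 1, len(genotype)):
            eta += pairwise.get(((i, genotype[i]), (j, genotype[j])), 0.0)
    return eta


def class_probabilities(eta, thresholds):
    """Cumulative-logit model: P(Y <= k) = sigmoid(theta_k - eta); thresholds must increase."""
    cumulative = [sigmoid(t - eta) for t in thresholds] + [1.0]
    return [cumulative[0]] + [cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))]


def negative_log_likelihood(data, main, pairwise, thresholds):
    """data: list of (genotype, observed ordinal class index)."""
    return -sum(
        math.log(class_probabilities(genetic_score(g, main, pairwise), thresholds)[y])
        for g, y in data
    )


# Hypothetical parameters for 4-site genotypes and three ordered classes
# (0 = null, 1 = weak, 2 = strong activation).
main = {(0, "E"): 1.0, (3, "A"): 0.5}
pairwise = {((0, "E"), (1, "G")): 0.8}
thresholds = [0.0, 2.0]

eta = genetic_score("EGKA", main, pairwise)
probs = class_probabilities(eta, thresholds)
print(round(eta, 3), [round(p, 3) for p in probs])
```

Fitting would minimize `negative_log_likelihood` over the effects and thresholds, for example by penalized gradient descent. A higher-order term would be one more entry in an interaction dictionary.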

Opportunities and Challenges for Machine Learning-Assisted Enzyme Engineering
Jason Yang, Francesca-Zhoufan Li, Frances H. Arnold

et al.

ACS Central Science, Journal Year: 2024, Volume and Issue: 10(2), P. 226 - 241

Published: Feb. 5, 2024

Enzymes can be engineered at the level of their amino acid sequences to optimize key properties such as expression, stability, substrate range, and catalytic efficiency, or even to unlock new catalytic activities not found in nature. Because the search space of possible proteins is vast, enzyme engineering usually involves discovering an enzyme starting point that has some level of the desired activity, followed by directed evolution to improve its "fitness" for a desired application. Recently, machine learning (ML) has emerged as a powerful tool to complement this empirical process. ML models can contribute to (1) starting point discovery, by functional annotation of known protein sequences or by generating novel protein sequences with desired functions, and (2) navigating protein fitness landscapes for fitness optimization, by learning mappings between protein sequences and their associated fitness values. In this Outlook, we explain how ML complements enzyme engineering and discuss its future potential to unlock improved engineering outcomes.

Language: English

Citations

72
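The second role of ML in the abstract above is learning a sequence-to-fitness mapping and using it to choose the next variants. One simple way to picture this is an additive surrogate model:

- one-hot encode each sequence,
- fit a linear model,
- rank all single mutants of the current best parent by predicted fitness.

This is a hypothetical plain-Python sketch, not a method from the Outlook. The function names, hyperparameters and toy data are assumptions. Richer models, often with uncertainty estimates, are common in practice.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq):
    """Flattened one-hot encoding: one feature per (position, amino acid)."""
    x = [0.0] * (len(seq) * len(AMINO_ACIDS))
    for i, aa in enumerate(seq):
        x[i * len(AMINO_ACIDS) + AMINO_ACIDS.index(aa)] = 1.0
    return x


def fit_linear(seqs, fitness, epochs=500, lr=0.05, l2=1e-3, seed=0):
    """Additive (one-hot linear) surrogate fitted by SGD with a small L2 penalty on weights."""
    rng = random.Random(seed)
    data = [(one_hot(s), y) for s, y in zip(seqs, fitness)]
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            err = b + sum(wi * xi for wi, xi in zip(w, x)) - y
            b -= lr * err
            w = [wi - lr * (err * xi + l2 * wi) for wi, xi in zip(w, x)]
    return w, b


def predict(model, seq):
    w, b = model
    return b + sum(wi * xi for wi, xi in zip(w, one_hot(seq)))


def propose_single_mutants(model, parent, top_k=3):
    """Rank every single-residue mutant of parent by predicted fitness; return the top_k."""
    candidates = []
    for i in range(len(parent)):
        for aa in AMINO_ACIDS:
            if aa != parent[i]:
                child = parent[:i] + aa + parent[i + 1:]
                candidates.append((predict(model, child), child))
    candidates.sort(reverse=True)
    return [child for _, child in candidates[:top_k]]


# Hypothetical training round: four measured variants and their fitness values.
seqs = ["AAAA", "WAAA", "AAKA", "WAKA"]
fitness = [0.0, 1.0, 1.0, 2.0]
model = fit_linear(seqs, fitness)
print(propose_single_mutants(model, "WAAA"))
```

With these toy data, the surrogate learns roughly additive contributions from W at the first position and K at the third. The top-ranked single mutant of `WAAA` is therefore `WAKA`.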

Epistasis facilitates functional evolution in an ancient transcription factor
Brian P. H. Metzger, Yeonwoo Park, Tyler N. Starr

et al.

eLife, Journal Year: 2024, Volume and Issue: 12

Published: May 20, 2024

A protein’s genetic architecture, the set of causal rules by which its sequence produces its functions, also determines its possible evolutionary trajectories. Prior research has proposed that the genetic architecture of proteins is very complex, with pervasive epistatic interactions that constrain evolution and make function difficult to predict from sequence. Most of this work analyzed only the direct paths between two proteins of interest, excluding the vast majority of possible genotypes and evolutionary trajectories, and considered only a single protein function, leaving unaddressed the genetic architecture of functional specificity and its impact on the evolution of new functions. Here, we develop a new method based on ordinal logistic regression to directly characterize the global genetic determinants of multiple protein functions from 20-state combinatorial deep mutational scanning (DMS) experiments. We use it to dissect the genetic architecture and evolution of a transcription factor’s specificity for DNA, using data from a combinatorial DMS of an ancient steroid hormone receptor’s capacity to activate transcription from two biologically relevant DNA elements. We show that the genetic architecture of DNA recognition consists of a dense set of main and pairwise effects that involve virtually every possible amino acid state in the protein-DNA interface, but higher-order epistasis plays only a tiny role. Pairwise interactions enlarge the set of functional sequences and are the primary determinants of specificity for different DNA elements. They also massively expand the number of opportunities for single-residue mutations to switch specificity from one DNA target to another. By bringing variants with different functions close together in sequence space, pairwise epistasis therefore facilitates rather than constrains the evolution of new functions.

Language: English

Citations

9

Site saturation mutagenesis of 500 human protein domains reveals the contribution of protein destabilization to genetic disease

Antoni Beltran, Xuege Jiang, Yue Shen

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: April 29, 2024

Missense variants that change the amino acid sequences of proteins cause one third of human genetic diseases¹. Tens of millions of missense variants exist in the current human population, with the vast majority having unknown functional consequences. Here we present the first large-scale experimental analysis of human missense variants across many different proteins. Using DNA synthesis and cellular selection experiments, we quantify the impact of >500,000 variants on the abundance of >500 human protein domains. This dataset, Human Domainome 1, reveals that >60% of pathogenic missense variants reduce protein stability. The contribution of stability to protein fitness varies across proteins and diseases, and is particularly important in recessive disorders. Combining stability measurements with protein language models annotates functional sites across proteins. Mutational effects on stability are largely conserved in homologous domains, allowing accurate stability prediction across entire protein families using energy models. Domainome 1 demonstrates the feasibility of assaying human protein variants at scale and provides a large, consistent reference dataset for clinical variant interpretation and for training and benchmarking computational methods.

Language: English

Citations

8

MoCHI: neural networks to fit interpretable models and quantify energies, energetic couplings, epistasis and allostery from deep mutational scanning data
André J. Faure, Ben Lehner

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Jan. 24, 2024

The massively parallel nature of deep mutational scanning (DMS) allows the quantification of the phenotypic effects of thousands of perturbations in a single experiment. We have developed MoCHI, a software tool that allows the parameterisation of arbitrarily complex models using DMS data. MoCHI simplifies the task of building custom models from measurements of mutant effects on any number of phenotypes. It allows the inference of free energy changes, as well as pairwise and higher-order interaction terms (energetic couplings), for specified biophysical models. When a suitable user-specified mechanistic model is not available, global nonlinearities (epistasis) can be estimated directly from the data. MoCHI also builds upon and leverages the theory of ensemble (or background-averaged) epistasis to learn sparse predictive models that can incorporate higher-order epistatic terms and are informative of the genetic architecture of the underlying biological system. The combination of DMS and MoCHI allows biophysical measurements to be performed at scale, including the construction of complete allosteric maps of proteins. MoCHI is freely available (https://github.com/lehner-lab/MoCHI) and is implemented as an easy-to-use python package relying on the PyTorch machine learning framework.

Language: English

Citations

7
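The MoCHI abstract cites background-averaged (ensemble) epistasis. The idea can be illustrated on a complete biallelic landscape. The coefficient for a set of sites is the average, over all genetic backgrounds at the remaining sites, of the corresponding main-effect or interaction difference.

The sketch below computes these coefficients by brute force for small landscapes. It is not MoCHI's API or implementation, which the abstract describes as a PyTorch package. Scaling conventions for these coefficients also differ between papers.

```python
from itertools import combinations, product


def background_averaged_epistasis(fitness, n_sites):
    """Background-averaged coefficients for a complete biallelic landscape.

    fitness: dict mapping every genotype tuple of 0/1 (length n_sites) to a phenotype.
    Returns {tuple_of_sites: coefficient}; () is the landscape mean.
    """
    coeffs = {}
    for order in range(n_sites + 1):
        for sites in combinations(range(n_sites), order):
            total = 0.0
            for g in product((0, 1), repeat=n_sites):
                n_mutant = sum(g[k] for k in sites)
                # Sign of the order-th finite difference over the chosen sites.
                sign = -1.0 if (order - n_mutant) % 2 else 1.0
                total += sign * fitness[g]
            # Average the difference over the 2**(n_sites - order) backgrounds.
            coeffs[sites] = total / 2 ** (n_sites - order)
    return coeffs


# Hypothetical two-site landscape: wild type 0, singles 1, double mutant 3.
landscape = {(0, 0): 0.0, (1, 0): 1.0, (0, 1): 1.0, (1, 1): 3.0}
coeffs = background_averaged_epistasis(landscape, 2)
print(coeffs)  # {(): 1.25, (0,): 1.5, (1,): 1.5, (0, 1): 1.0}
```

In this hypothetical landscape, each single mutation has an average effect of 1.5 across backgrounds. The pairwise coefficient of 1.0 records that the double mutant exceeds the additive expectation by one unit.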

Epistasis facilitates functional evolution in an ancient transcription factor
Brian P. H. Metzger, Yeonwoo Park, Tyler N. Starr

et al.

eLife, Journal Year: 2023, Volume and Issue: 12

Published: July 12, 2023

A protein's genetic architecture, the set of causal rules by which its sequence produces its functions, also determines its possible evolutionary trajectories. Prior research has proposed that the genetic architecture of proteins is very complex, with pervasive epistatic interactions that constrain evolution and make function difficult to predict from sequence. Most of this work analyzed only the direct paths between two proteins of interest, excluding the vast majority of possible genotypes and evolutionary trajectories, and considered only a single protein function, leaving unaddressed the genetic architecture of functional specificity and its impact on the evolution of new functions. Here, we develop a new method based on ordinal logistic regression to directly characterize the global genetic determinants of multiple protein functions from 20-state combinatorial deep mutational scanning (DMS) experiments. We use it to dissect the genetic architecture and evolution of a transcription factor's specificity for DNA, using data from a combinatorial DMS of an ancient steroid hormone receptor's capacity to activate transcription from two biologically relevant DNA elements. We show that the genetic architecture of DNA recognition consists of a dense set of main and pairwise effects that involve virtually every possible amino acid state in the protein-DNA interface, but higher-order epistasis plays only a tiny role. Pairwise interactions enlarge the set of functional sequences and are the primary determinants of specificity for different DNA elements. They also massively expand the number of opportunities for single-residue mutations to switch specificity from one DNA target to another. By bringing variants with different functions close together in sequence space, pairwise epistasis therefore facilitates rather than constrains the evolution of new functions.

Language: English

Citations

12

Protein codes promote selective subcellular compartmentalization
Henry R. Kilgore, Itamar Chinn, Peter G. Mikhael

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: April 17, 2024

Cells have evolved mechanisms to distribute ∼10 billion protein molecules to subcellular compartments where diverse proteins involved in shared functions must efficiently assemble. Here, we demonstrate that proteins with shared functions share amino acid sequence codes that guide them to compartment destinations. A protein language model, ProtGPS, was developed that predicts with high performance the compartment localization of human proteins excluded from the training set. ProtGPS successfully guided the generation of novel protein sequences that selectively assemble in targeted subcellular compartments. ProtGPS also identified pathological mutations that change this code and lead to altered subcellular localization of proteins. Our results indicate that proteins contain not only a folding code, but also a previously unrecognized code governing their distribution to specific cellular compartments.

Language: English

Citations

4

Genetics, energetics and allostery during a billion years of hydrophobic protein core evolution
Albert Escobedo, Gesa Voigt, André J. Faure

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: May 12, 2024

Protein folding is driven by the burial of hydrophobic amino acids in a tightly-packed core that excludes water. The genetics, biophysics and evolution of hydrophobic cores are not well understood, in part because of a lack of systematic experimental data on sequence combinations that do, and do not, constitute stable and functional cores. Here we randomize protein hydrophobic cores and evaluate their stability and function at scale. The data show that vast numbers of amino acid combinations can constitute stable cores, but that these alternative cores frequently disrupt function through allosteric effects. These strong effects are not due to complicated, highly epistatic fitness landscapes but rather to the pervasive nature of allostery, with many individually small energy changes combining to disrupt function. Indeed, both stability and ligand binding can be accurately predicted over very large evolutionary distances using additive energy models with a small contribution from pairwise energetic couplings. As a result, energy models trained on one protein can predict stability across hundreds of millions of years of evolution, with only rare energetic couplings, which we experimentally identify, limiting core transplantation between diverged proteins. Our results reveal a simple architecture of hydrophobic cores and suggest that allostery is a major constraint on protein evolution.

Language: English

Citations

3
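The additive energy models with sparse pairwise couplings described above can be sketched with a two-state folding model. Mutation effects (ΔΔG) and any couplings add in free-energy space. A Boltzmann relation then maps the total ΔG to the fraction of folded protein.

The sign convention, temperature, mutation names and every energy value here are hypothetical. The models in the paper also cover ligand binding, and they are fitted to data rather than specified by hand.

```python
import math
from itertools import combinations

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def folding_energy(mutations, dg_wt, ddg, coupling):
    """Additive free energy of folding plus optional pairwise energetic couplings.

    Assumed convention: dG_fold = G_folded - G_unfolded, so more negative is more stable.
    coupling maps frozenset({mut_a, mut_b}) to an extra energy term; missing pairs add 0.
    """
    dg = dg_wt + sum(ddg[m] for m in mutations)
    for a, b in combinations(sorted(mutations), 2):
        dg += coupling.get(frozenset((a, b)), 0.0)
    return dg


def fraction_folded(dg_fold, temperature_k=298.15):
    """Two-state Boltzmann model: fraction of molecules in the folded state."""
    return 1.0 / (1.0 + math.exp(dg_fold / (R_KCAL * temperature_k)))


# Hypothetical values in kcal/mol, for illustration only.
dg_wt = -2.0
ddg = {"L10A": 1.5, "V20I": 0.3}
coupling = {frozenset(("L10A", "V20I")): -0.5}

for genotype in [set(), {"L10A"}, {"V20I"}, {"L10A", "V20I"}]:
    dg = folding_energy(genotype, dg_wt, ddg, coupling)
    print(sorted(genotype), round(dg, 2), round(fraction_folded(dg), 3))
```

Even with the coupling set to zero, the same ΔΔG changes the fraction folded by different amounts in different backgrounds, because the Boltzmann mapping is nonlinear. This is how additive energies can still produce non-additive phenotypes.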

For antibody sequence generative modeling, mixture models may be all you need
Jonathan Parkinson, Wei Wang

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Jan. 30, 2024

Antibody therapeutic candidates must exhibit not only tight binding to their target but also good developability properties, especially a low risk of immunogenicity. In this work, we fit a simple generative model, SAM, to sixty million human heavy and seventy million human light chains. We show that the probability of a sequence calculated by the model distinguishes human sequences from those of other species with the same or better accuracy on a variety of benchmark datasets containing >400 million sequences than any other model in the literature, outperforming large language models (LLMs) by large margins. SAM can humanize sequences, generate new sequences, and score sequences for humanness. It is both fast and fully interpretable. Our results highlight the importance of using simple models as baselines for protein engineering tasks. We additionally introduce a new tool for numbering antibody sequences which is orders of magnitude faster than existing tools in the literature. Both these tools are available at https://github.com/Wang-lab-UCSD/AntPack .

Language: English

Citations

1
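The abstract above describes scoring sequences with a simple generative mixture model. To make that concrete, here is a hypothetical mixture of position-independent categorical distributions over aligned, fixed-length sequences. A sequence's log-likelihood is a log-sum-exp over components.

This is an assumption for illustration only. SAM's actual components, numbering scheme and training procedure are not described in the abstract, and this is not the AntPack API.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mixture_log_likelihood(seq, weights, position_probs):
    """log p(seq) under a mixture of position-independent categorical components.

    weights[k]: mixture weight of component k (weights sum to 1).
    position_probs[k][i][aa]: probability of residue aa at aligned position i in component k.
    """
    terms = [
        math.log(w) + sum(math.log(comp[i][aa]) for i, aa in enumerate(seq))
        for w, comp in zip(weights, position_probs)
    ]
    return logsumexp(terms)


# Hypothetical two-component model over a two-letter alphabet and length-2 sequences.
weights = [0.5, 0.5]
position_probs = [
    [{"A": 0.9, "B": 0.1}, {"A": 0.9, "B": 0.1}],
    [{"A": 0.1, "B": 0.9}, {"A": 0.1, "B": 0.9}],
]
for seq in ("AA", "AB", "BB"):
    print(seq, round(math.exp(mixture_log_likelihood(seq, weights, position_probs)), 3))
```

Each component is a product of per-position probabilities. As a result, the contribution of every position to a sequence's score can be read off directly, which is one sense in which such models are interpretable.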

Epistasis facilitates functional evolution in an ancient transcription factor
Brian P. H. Metzger, Yeonwoo Park, Tyler N. Starr

et al.

Published: March 22, 2024

A protein’s genetic architecture, the set of causal rules by which its sequence produces its functions, also determines its possible evolutionary trajectories. Prior research has proposed that the genetic architecture of proteins is very complex, with pervasive epistatic interactions that constrain evolution and make function difficult to predict from sequence. Most of this work analyzed only the direct paths between two proteins of interest, excluding the vast majority of possible genotypes and evolutionary trajectories, and considered only a single protein function, leaving unaddressed the genetic architecture of functional specificity and its impact on the evolution of new functions. Here we develop a new method based on ordinal logistic regression to directly characterize the global genetic determinants of multiple protein functions from 20-state combinatorial deep mutational scanning (DMS) experiments. We use it to dissect the genetic architecture and evolution of a transcription factor’s specificity for DNA, using data from a combinatorial DMS of an ancient steroid hormone receptor’s capacity to activate transcription from two biologically relevant DNA elements. We show that the genetic architecture of DNA recognition consists of a dense set of main and pairwise effects that involve virtually every possible amino acid state in the protein-DNA interface, but higher-order epistasis plays only a tiny role. Pairwise interactions enlarge the set of functional sequences and are the primary determinants of specificity for different DNA elements. They also massively expand the number of opportunities for single-residue mutations to switch specificity from one DNA target to another. By bringing variants with different functions close together in sequence space, pairwise epistasis therefore facilitates rather than constrains the evolution of new functions.

Language: English

Citations

1

GMMA Can Stabilize Proteins Across Different Functional Constraints
Nicolas Daffern, Kristoffer E. Johansson, Zachary T. Baumer

et al.

Journal of Molecular Biology, Journal Year: 2024, Volume and Issue: 436(11), P. 168586 - 168586

Published: April 23, 2024

Language: English

Citations

1