Protein Electrostatic Properties are Fine-Tuned Through Evolution
Mingzhe Shen, Guy W. Dayhoff, Jana Shen

et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2025, Issue: unknown

Published: April 21, 2025

Abstract Protein ionization states provide electrostatic forces to modulate protein structure, stability, solubility, and function. Until now, predicting and understanding protein electrostatics have relied on structural information. Here we demonstrate that primary sequence alone enables remarkably accurate pKa predictions through KaML-ESM, a model that leverages evolutionary representations from ultra-large protein language models (ESMs) and pretraining with a synthetic dataset. KaML-ESM achieves RMSEs approaching the experimental precision limit of ∼0.5 pH units for Asp, Glu, His, and Lys residues, while reducing Cys prediction errors to 1.1, with further improvement expected as the training dataset expands. The state-of-the-art performance was validated through external evaluations, including a proteome-wide analysis of pKa values. Our results support the notion that protein sequence encodes not only structure and function but also electrostatic properties, which may have been co-optimized through evolution. Lastly, we introduce KaML, a sequence-based end-to-end ML platform that enables researchers to map protein electrostatic landscapes, facilitating applications ranging from drug design and protein engineering to molecular simulations.

Language: English

Cited by: 0