Unbiased learning of protein conformational representation via unsupervised random forest

Mohammad Sahil,

Navjeet Ahalawat, Jagannath Mondal

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Dec. 1, 2024

Accurate data representation is paramount in biophysics to capture the functionally relevant motions of biomolecules. Traditional feature selection methods, while effective, often rely on labeled data based on prior knowledge and user supervision, limiting their applicability to novel systems. Here, we present unsupervised random forest (URF), a self-supervised adaptation of traditional random forests that identifies critical features of biomolecules without requiring labels. By devising a memory-efficient implementation, we first demonstrate URF's capability to learn important sets of inter-residue features of a protein and subsequently resolve its complex conformational landscape, performing at par with or surpassing its supervised counterpart and 15 other leading baseline methods. Crucially, URF is supplemented by an internal metric, the learning coefficient, which automates the process of hyper-parameter optimization, making the method robust and user-friendly. URF's remarkable ability to learn features in an unbiased fashion was validated against 10 independent systems, including both folded and intrinsically disordered states. In particular, benchmarking investigations showed that the representations identified by URF are meaningful in comparison to current state-of-the-art deep learning methods. As an application, we show that URF can be seamlessly integrated with downstream analysis pipelines, such as Markov state models, to attain better-resolved outputs. The investigation presented here establishes URF as a tool for biophysics.

Language: English
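The URF abstract describes the method only at a high level. A common way to make a random forest unsupervised (the classic construction of Breiman and of Shi and Horvath) is to contrast the real data against a synthetic copy whose feature columns are permuted independently, so that any feature useful for telling the two apart carries correlated, state-dependent structure. The Python sketch below assumes NumPy and scikit-learn; the function names, the toy two-state data, and the tree settings (max_depth, min_samples_leaf) are hypothetical choices for illustration, and it does not reproduce the authors' memory-efficient implementation or their learning-coefficient metric.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def synthetic_from_marginals(X, rng):
    """Permute each feature column independently.

    This keeps every per-feature marginal distribution but destroys the
    correlations between features, producing the "synthetic" class.
    """
    X_syn = np.empty_like(X)
    for j in range(X.shape[1]):
        X_syn[:, j] = rng.permutation(X[:, j])
    return X_syn


def urf_feature_importance(X, n_estimators=200, max_depth=4, seed=0):
    """Rank features by how well they separate real from synthetic samples."""
    rng = np.random.default_rng(seed)
    X_syn = synthetic_from_marginals(X, rng)
    X_all = np.vstack([X, X_syn])
    # Label 1 = real frame, 0 = synthetic frame; no conformational labels needed.
    y_all = np.concatenate([np.ones(len(X)), np.zeros(len(X_syn))])
    rf = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,      # shallow trees limit overfitting to noise
        min_samples_leaf=10,
        random_state=seed,
    )
    rf.fit(X_all, y_all)
    return rf.feature_importances_  # impurity-based, normalized to sum to 1


# Hypothetical toy "trajectory": features 0 and 1 switch together between
# two states, while features 2 and 3 are uncorrelated noise.
rng = np.random.default_rng(1)
n_per_state = 300
state = np.repeat([0.0, 3.0], n_per_state)
X = np.column_stack([
    state + 0.3 * rng.standard_normal(2 * n_per_state),
    state + 0.3 * rng.standard_normal(2 * n_per_state),
    rng.standard_normal(2 * n_per_state),
    rng.standard_normal(2 * n_per_state),
])
importances = urf_feature_importance(X)
print(np.round(importances, 3))
```

In this toy data, features 0 and 1 should receive most of the importance, because only their joint behavior distinguishes real frames from synthetic ones; the noise features are statistically identical in both classes.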

Acceleration with Interpretability: A Surrogate Model-Based Collective Variable for Enhanced Sampling

Sompriya Chatterjee,

Dhiman Ray

Journal of Chemical Theory and Computation, Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 4, 2025

Most enhanced sampling methods facilitate the exploration of molecular free energy landscapes by applying a bias potential along a reduced-dimensional collective variable (CV) space. The success of these methods depends on the ability of the CVs to follow the relevant slow modes of the system. Intuitive CVs, such as distances or contacts, often prove inadequate, particularly in biological systems involving many coupled degrees of freedom. Machine learning algorithms, especially neural networks (NN), can automate the process of CV discovery by combining a large number of descriptors and can outperform intuitive CVs in efficiency. However, their lack of interpretability and the high cost of their evaluation during trajectory propagation make NN-CVs difficult to apply to biomolecular processes. Here, we introduce a surrogate model approach using lasso regression to express the output of a neural network as a linear combination of an automatically chosen subset of input descriptors. We demonstrate successful applications of our surrogate CVs in simulating the conformational landscapes of alanine dipeptide and the chignolin mini-protein. In addition to providing mechanistic insights due to their explainable nature, the surrogate CVs showed negligible loss in efficiency and accuracy, compared to NN-CVs, in reconstructing the underlying free energy surface. Moreover, owing to their simplified functional forms, surrogate CVs are better at extrapolating to unseen regions of CV space, e.g., saddle points. Surrogate CVs are also less expensive to evaluate than their NN counterparts, making them suitable for complex

Language: English

Citations

1
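The surrogate-model idea in this abstract can be illustrated with a small Python sketch: fit a lasso regression from input descriptors to the outputs of a neural-network CV, and keep only the descriptors with nonzero weights. Here the trained network is replaced by a hypothetical stand-in function (a tanh of two descriptors), the descriptors are random standardized data, and the regularization strength alpha=0.05 is an arbitrary assumption; scikit-learn's Lasso is used for the fit. This is not the authors' code or workflow.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_frames, n_desc = 2000, 10
# Hypothetical standardized descriptors (e.g. distances) for each frame.
D = rng.standard_normal((n_frames, n_desc))

# Hypothetical stand-in for a trained NN-CV: a nonlinear function that
# depends only on descriptors 0 and 3.
s_nn = np.tanh(0.8 * D[:, 0] - 0.5 * D[:, 3])

# The L1 penalty drives weights of irrelevant descriptors to exactly zero.
surrogate = Lasso(alpha=0.05)
surrogate.fit(D, s_nn)

selected = np.flatnonzero(np.abs(surrogate.coef_) > 1e-6)
print("selected descriptors:", selected.tolist())
print("weights:", np.round(surrogate.coef_[selected], 3))


def surrogate_cv(d):
    """Cheap, interpretable CV: intercept plus a sparse weighted sum of descriptors."""
    return surrogate.intercept_ + d @ surrogate.coef_
```

Because the surrogate is a linear combination of a few descriptors, its gradient with respect to those descriptors is simply the constant weight vector, which is one reason such a CV is cheap to evaluate during biased simulations.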

Thermodynamics of Self-Assembly and Supramolecular Transitions Using Enhanced Sampling

Zhitong Jiang,

Zachariah Vicars, Suruchi Fialoke

et al.

Langmuir, Journal Year: 2025, Volume and Issue: unknown

Published: June 2, 2025

Computational studies of self-assembly have the potential to provide rich insights into its underlying thermodynamics and identify optimal system conditions for applications such as nanomaterial synthesis or drug delivery. However, both self-assembly and supramolecular transitions can be hindered by free energy barriers, rendering them rare events on molecular time scales and making it challenging to sample them. Here, we show that the use of enhanced sampling techniques, when combined with a judiciously chosen set of order parameters, offers an efficient and robust route to characterizing supramolecular transitions. Specifically, transitions between states with different periodicities and symmetries can be reversibly sampled by biasing a relatively small number of Fourier components of the particle density. We illustrate our approach by computing the free energy required to cleave a liquid slab and estimating the corresponding liquid-vapor surface tension. We also characterize the energetics of the transition between spherical and rod-shaped droplets. These results serve as a first step toward the development of a systematic computational framework for exploring supramolecular transitions in diverse systems, such as surfactants and block copolymers, that undergo self-assembly.

Language: English

Citations

0
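The order parameters in this abstract are Fourier components of the particle density. A minimal NumPy sketch, assuming point particles in an orthorhombic periodic box, computes rho_k = sum_j exp(-i k·r_j) with k = 2π n / L and uses the normalized amplitude |rho_k| / N as an order parameter. The function names and the toy configurations are hypothetical, and the sketch omits any smoothing as well as the biasing machinery used in the paper.

```python
import numpy as np


def density_fourier_component(positions, box, n_vec):
    """rho_k = sum_j exp(-i k . r_j), with k = 2*pi*n / L for an orthorhombic box."""
    k = 2.0 * np.pi * np.asarray(n_vec, dtype=float) / np.asarray(box, dtype=float)
    return np.exp(-1j * (positions @ k)).sum()


def order_parameter(positions, box, n_vec):
    """Normalized amplitude |rho_k| / N, which lies between 0 and 1."""
    return abs(density_fourier_component(positions, box, n_vec)) / len(positions)


rng = np.random.default_rng(0)
L = 10.0
box = (L, L, L)
N = 2000

# Hypothetical homogeneous configuration: particles uniform over the box.
uniform = rng.uniform(0.0, L, size=(N, 3))

# Hypothetical liquid slab: particles confined to 0.4 L < z < 0.6 L,
# uniform in x and y.
slab = rng.uniform(0.0, L, size=(N, 3))
slab[:, 2] = rng.uniform(0.4 * L, 0.6 * L, size=N)

for name, pos in [("uniform", uniform), ("slab", slab)]:
    print(name, round(order_parameter(pos, box, (0, 0, 1)), 3))
```

For a slab occupying a fraction 0.2 of the box along z, the component with n = (0, 0, 1) should be close to sin(0.2π)/(0.2π) ≈ 0.94, whereas a uniform configuration gives a value of order 1/sqrt(N), so a small set of such components can distinguish states with different periodicities.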