Accurate prediction of nucleic acid binding proteins using protein language model
Siwen Wu, Jinbo Xu, Jun‐tao Guo, et al.

Bioinformatics Advances, Year: 2024, Issue: 5(1)

Published: Dec. 26, 2024

Abstract

Motivation: Nucleic acid binding proteins (NABPs) play critical roles in various essential biological processes. Many machine learning-based methods have been developed to predict different types of NABPs. However, most of these studies have limited applications in predicting NABPs among proteins with unknown functions. Several factors contribute to this, including dataset construction, prediction scope, and the features used for training and testing. In addition, single-stranded DNA binding proteins (SSBs), a subclass of DNA binding proteins (DBPs), have not been extensively investigated. This is especially true for identifying novel SSBs among proteins with unknown functions.

Results: To improve the accuracy of NABP prediction for any given protein, we developed hierarchical and multi-class models with a feature extracted from the protein language model ESM2. Our results show that combining ESM2 with machine learning methods achieves high prediction accuracy. Accuracy reaches up to 95% for each stage of the hierarchical approach and 85% for the overall approach. More importantly, besides much improved prediction of other NABPs, SSBs can be accurately identified among DBPs, a task that remains underexplored.

Availability and implementation: The datasets and code can be found at https://figshare.com/projects/Prediction_of_nucleic_acid_binding_proteins_using_protein_language_model/211555.

Language: English

Cited by: 0
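The abstract above describes hierarchical models built on a feature extracted from ESM2. The sketch below shows how a hierarchical (cascade) classifier routes a protein: an embedding vector moves down a tree of per-stage classifiers until it reaches a leaf label. Everything in it is an assumption for illustration, not the authors' pipeline. The stage layout is hypothetical: NABP vs. non-NABP, then DBP vs. RBP, then SSB vs. "dsDBP", a made-up label for other DNA binding proteins. Toy two-dimensional vectors stand in for ESM2 embeddings, and nearest-centroid classifiers stand in for the paper's trained models.

```python
from dataclasses import dataclass, field
from math import dist


@dataclass
class NearestCentroid:
    """Toy stand-in for one trained stage classifier (hypothetical)."""
    centroids: dict[str, list[float]] = field(default_factory=dict)

    def fit(self, X: list[list[float]], y: list[str]) -> "NearestCentroid":
        # Average the training vectors of each label into one centroid.
        sums: dict[str, list[float]] = {}
        counts: dict[str, int] = {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}
        return self

    def predict(self, x: list[float]) -> str:
        # Pick the label whose centroid is closest in Euclidean distance.
        return min(self.centroids, key=lambda lab: dist(x, self.centroids[lab]))


def predict_hierarchical(x: list[float], stages: dict[str, NearestCentroid]) -> str:
    """Start at "root"; a predicted label that names another stage sends x there."""
    node = "root"
    while node in stages:
        node = stages[node].predict(x)
    return node


# Hypothetical three-stage layout with toy 2-D vectors in place of ESM2 embeddings.
stages = {
    "root": NearestCentroid().fit([[0, 0], [1, 0], [5, 5], [6, 6]],
                                  ["non-NABP", "non-NABP", "NABP", "NABP"]),
    "NABP": NearestCentroid().fit([[5, 5], [5, 9]], ["DBP", "RBP"]),
    "DBP": NearestCentroid().fit([[4, 5], [7, 5]], ["SSB", "dsDBP"]),
}

print(predict_hierarchical([4, 5], stages))  # SSB
```

A cascade has one design consequence worth keeping in mind: errors compound along a path. If stage errors were independent, a leaf three stages deep would be reached correctly with probability of only about 0.95³ ≈ 0.857, even when every stage is 95% accurate.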

PDNAPred: Interpretable prediction of protein-DNA binding sites based on pre-trained protein language models

Lingrong Zhang, Taigang Liu

International Journal of Biological Macromolecules, Year: 2024, Issue: unknown, pp. 136147-136147

Published: Oct. 1, 2024

Language: English

Cited by: 4

Contribution of DNA breathing to physical interactions with transcription factors

W. R. BUTT, Ben Lai, Tsu-Pei Chiu, et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2025, Issue: unknown

Published: Jan. 22, 2025

Interaction between transcription factors (TFs) and DNA plays a key role in regulating gene expression. These interactions are generally believed to be controlled through the recognition of core motifs by TFs. Nevertheless, several studies have pointed out the limitations of this view. In particular, sequence variants that influence TF binding are often located outside core motifs. One possible explanation is that the physical properties of DNA play a role in TF-DNA interactions. Recent studies have supported the importance of DNA shape features, especially in flanking regions. Another important physical property is DNA breathing, the spontaneous opening of double-stranded DNA due to thermal motions. However, there have been few genomic studies of DNA breathing in TF-DNA interactions. In this work, we analyzed in vitro binding data for three TFs. We found that breathing features inside or near the core motifs are correlated with binding affinity. This suggests that these TFs prefer locally and temporally melted DNA formed by breathing. We extended the analysis to 44 TFs with in vivo ChIP-seq data. For a large proportion of TFs, breathing features are associated with binding, but the sign and magnitude of the associations vary substantially across TF families. Altogether, our study supports the hypothesis that DNA breathing contributes to TF-DNA interactions.

Proper regulation of when and where genes are expressed is crucial for biological development and function. This process is largely controlled by the interaction of TFs with specific DNA sequences, which ensure that only the correct genes are activated. Extensive work has shown that TFs bind certain DNA patterns of 6-20 bp, known as core motifs. However, the physical structure of DNA molecules also plays a role. In this work, we explored DNA breathing: the spontaneous opening of the DNA double strand due to thermal motions, which creates transient, single-strand "bubbles" in DNA. Through examining >60 TFs, we found that the propensity for forming bubbles is associated with binding affinity [...] sequence. Interestingly, [...] seem [...]. Our results highlighted the potential [...].

Language: English

Cited by: 0
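The in vitro analysis relates a per-sequence breathing feature to binding affinity. The abstract does not say which statistic was used or how breathing propensity was computed. The sketch below therefore assumes precomputed breathing scores and affinities; the numbers are made up and are not data from the study. It computes a Spearman rank correlation using only the standard library. The sign of the result matters, because the abstract reports that the direction of association varies across TF families.

```python
from math import sqrt


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy, made-up values: breathing propensity near the motif vs. relative affinity.
breathing = [0.12, 0.30, 0.25, 0.05, 0.40]
affinity = [1.1, 2.0, 1.8, 0.9, 2.5]
print(spearman(breathing, affinity))  # 1.0 (identical rank order)
```

A rank statistic is used here only because it is robust to monotone rescaling of either score. A study could equally use Pearson correlation or a regression model.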

Highly Optimized Simulation of Atomic Resolution Cell-Like Protein Environment

Andrii M. Tytarenko, Amar Singh, Vineeth Kumar Ambati, et al.

The Journal of Physical Chemistry B, Year: 2025, Issue: unknown

Published: Mar. 12, 2025

Computational approaches can provide details of molecular mechanisms in the crowded environment inside cells. Protein docking predicts stable configurations of protein complexes, which correspond to deep energy minima. Systematic approaches, such as those based on the fast Fourier transform (FFT), also map the entire intermolecular energy landscape by determining the position and depth of the full spectrum of energy minima. Such mapping allows simulations to be sped up by precalculating energy values. Our earlier study combined FFT docking with a Monte Carlo protocol. This enabled simulation of cell-size protein systems for seconds, yielding atomic-resolution trajectories several orders of magnitude longer than those achievable with alternative approaches. In this study, we present a further drastic extension of these modeling capabilities through a parallelized implementation of the protocol. The procedure was applied to a panel of Death Fold Domains that form nucleated polymers in human innate immune signaling. The simulations recapitulated the domains' homooligomerization tendencies and provided insights into polymer nucleation. The protocol [...] beyond the previously reported implementation, reaching uncharted territory in atomic-resolution simulation of cell-sized systems.

Language: English

Cited by: 0
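FFT-based docking scores every relative translation of two rigid molecules at once. It does this by cross-correlating gridded receptor and ligand functions in O(N log N) time instead of O(N²). The sketch below is a one-dimensional illustration of that correlation trick only, checked against a brute-force loop. The grids are hypothetical toy values, and "higher score = better fit" is a toy convention. Real protocols use 3D grids, many ligand rotations, and physics-based energy terms. Score tables of this kind can be computed once and looked up later, which is the sort of precalculation the abstract credits with speeding up the Monte Carlo simulations.

```python
import cmath


def fft(a: list[complex], invert: bool = False) -> list[complex]:
    """Recursive radix-2 Cooley-Tukey FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def circular_correlation_fft(rec: list[float], lig: list[float]) -> list[float]:
    """c[t] = sum_x rec[x] * lig[(x - t) % n] for every shift t, via the FFT."""
    n = len(rec)
    spectrum = [rv * lv.conjugate() for rv, lv in zip(fft(rec), fft(lig))]
    # Inverse transform, then divide by n to normalize.
    return [v.real / n for v in fft(spectrum, invert=True)]


def circular_correlation_direct(rec: list[float], lig: list[float]) -> list[float]:
    """Brute-force O(n^2) reference for the same scores."""
    n = len(rec)
    return [sum(rec[x] * lig[(x - t) % n] for x in range(n)) for t in range(n)]


# Toy 1-D grids (hypothetical): receptor "surface" values and a two-cell ligand.
receptor = [0, 1, 2, 0, 0, 1, 0, 0]
ligand = [1, 1, 0, 0, 0, 0, 0, 0]
fast = circular_correlation_fft(receptor, ligand)
direct = circular_correlation_direct(receptor, ligand)
best_shift = max(range(len(fast)), key=lambda t: fast[t])
print(direct, best_shift)  # [1, 3, 2, 0, 1, 1, 0, 0] 1
```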

A Comprehensive Review of Computational Methods for Protein-DNA Binding Site Prediction
Zi Liu, Wang-Ren Qiu, Yan Liu, et al.

Analytical Biochemistry, Year: 2025, Issue: unknown, pp. 115862-115862

Published: Apr. 1, 2025

Language: English

Cited by: 0

Molecular surfaces modeling: Advancements in deep learning for molecular interactions and predictions

Renjie Xia, Wei Li, Yi Cheng, et al.

Biochemical and Biophysical Research Communications, Year: 2025, Issue: unknown, pp. 151799-151799

Published: Apr. 1, 2025

Language: English

Cited by: 0

Special issue: Multiscale simulations of DNA from electrons to nucleosomes
John H. Maddocks, Pablo D. Dans, Thomas Cheatham, et al.

Biophysical Reviews, Year: 2024, Issue: 16(3), pp. 259-262

Published: Jun. 1, 2024

Language: English

Cited by: 3

DNAproDB: an updated database for the automated and interactive analysis of protein–DNA complexes
Raktim Mitra, Ari S Cohen, Jared M. Sagendorf, et al.

Nucleic Acids Research, Year: 2024, Issue: 53(D1), pp. D396-D402

Published: Nov. 4, 2024

DNAproDB (https://dnaprodb.usc.edu/) is a database, visualization tool, and processing pipeline for analyzing structural features of protein-DNA interactions. Here, we present a substantially updated version of the database with additional annotations, search, and user interface functionalities. The update expands the number of pre-analyzed structures, which are now updated automatically every week. The analysis identifies water-mediated hydrogen bonds, which are incorporated into visualizations of protein-DNA complexes. Tertiary structure-aware nucleotide layouts are now available. New file formats and external annotations are supported. The website has been redesigned, making interaction with graphs and data more intuitive. We also present statistical analyses of the collection of structures, revealing salient patterns in [...]

Language: English

Cited by: 2
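The updated DNAproDB pipeline identifies water-mediated hydrogen bonds, but this abstract does not state its criteria. The sketch below uses a rough geometric rule instead. It flags a water oxygen as a bridge when it lies within a distance cutoff of both a protein polar atom and a DNA polar atom. The 3.5 Å cutoff, the `Atom` record, the atom labels, and the function name are all hypothetical. A careful analysis would also check hydrogen-bond angles and hydrogen positions.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Atom:
    label: str                       # hypothetical "residue:atom" label
    owner: str                       # "protein", "dna", or "water"
    xyz: tuple[float, float, float]  # coordinates in angstroms


def water_bridges(atoms: list[Atom], cutoff: float = 3.5) -> list[tuple[str, str, str]]:
    """Find (protein atom, water, DNA atom) triples in which one water oxygen lies
    within `cutoff` of both a protein polar atom and a DNA polar atom.

    Assumes `atoms` holds only polar (N/O) atoms and water oxygens. Uses distance
    only, with no angle or hydrogen-position checks (a simplifying assumption).
    """
    protein = [a for a in atoms if a.owner == "protein"]
    dna = [a for a in atoms if a.owner == "dna"]
    bridges = []
    for w in (a for a in atoms if a.owner == "water"):
        near_p = [p for p in protein if dist(p.xyz, w.xyz) <= cutoff]
        near_d = [d for d in dna if dist(d.xyz, w.xyz) <= cutoff]
        bridges.extend((p.label, w.label, d.label) for p in near_p for d in near_d)
    return bridges


atoms = [
    Atom("ARG12:NH1", "protein", (0.0, 0.0, 0.0)),
    Atom("HOH301:O", "water", (2.8, 0.0, 0.0)),
    Atom("DG5:O6", "dna", (5.6, 0.0, 0.0)),
    Atom("HOH302:O", "water", (20.0, 0.0, 0.0)),
]
print(water_bridges(atoms))  # [('ARG12:NH1', 'HOH301:O', 'DG5:O6')]
```

The loop checks every water against every polar atom, which is fine for a single complex. For whole-database processing, a spatial index (for example a cell list) would avoid the quadratic scan.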

Twenty years of advances in prediction of nucleic acid-binding residues in protein sequences
Sushmita Basu, Jing Yu, Daisuke Kihara, et al.

Briefings in Bioinformatics, Year: 2024, Issue: 26(1)

Published: Nov. 22, 2024

Abstract

Computational prediction of nucleic acid-binding residues in protein sequences is an active field of research, with over 80 methods released in the past 2 decades. We identify and discuss 87 sequence-based predictors, including dozens of recently published methods that are surveyed here for the first time. We give an overview of historical progress and examine multiple practical issues. These include the availability and impact of predictors, the key features of their predictive models, and important aspects of their training and assessment. We observe that the last decade has brought increased use of deep neural networks and protein language models, which contributed substantial gains in predictive performance. We also highlight advancements on two vital and challenging issues. The first is cross-prediction between deoxyribonucleic acid (DNA)-binding and ribonucleic acid (RNA)-binding residues. The second is targeting two distinct sources of binding annotations: structure-based versus intrinsic disorder-based. Predictors trained on structure-annotated interactions tend to perform poorly on disorder-annotated interactions and vice versa, and only a few predictors target and perform well across both annotation types. Cross-prediction is a significant problem, with some DNA-binding or RNA-binding predictors indiscriminately predicting both types of residues. Moreover, we show that predictors with web servers are cited substantially more than tools without an implementation or with implementations that no longer work. This motivates the development and long-term maintenance of web servers. We close by discussing future research directions that aim to drive further progress in this area.

Language: English

Cited by: 0
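The survey flags cross-prediction between DNA- and RNA-binding residues as a significant problem. One simple way to quantify it is sketched below, using our own metric definitions, which are not necessarily the survey's protocol. The sketch compares how often a DNA-binding residue predictor fires on three groups: its intended targets, RNA-binding residues, and non-binding residues. A predictor that indiscriminately predicts both types of binding residues shows a cross rate close to its recall. The metric names and residue labels are hypothetical toy data.

```python
def positive_rate(predicted: list[int], mask: list[bool]) -> float:
    """Fraction of residues selected by `mask` that the predictor calls binding (1)."""
    selected = [p for p, m in zip(predicted, mask) if m]
    return sum(selected) / len(selected) if selected else 0.0


def cross_prediction_report(predicted: list[int], dna_binding: list[bool],
                            rna_binding: list[bool]) -> dict[str, float]:
    """Summary for a DNA-binding residue predictor (hypothetical metric names)."""
    non_binding = [not d and not r for d, r in zip(dna_binding, rna_binding)]
    return {
        "dna_recall": positive_rate(predicted, dna_binding),       # intended targets
        "rna_cross_rate": positive_rate(predicted, rna_binding),   # RNA-binding residues called DNA-binding
        "non_binding_fpr": positive_rate(predicted, non_binding),  # ordinary false positives
    }


# Toy per-residue labels for a 10-residue protein (made up for illustration).
predicted = [1, 1, 1, 0, 1, 1, 0, 0, 1, 0]
dna = [True] * 4 + [False] * 6
rna = [False] * 4 + [True] * 3 + [False] * 3
report = cross_prediction_report(predicted, dna, rna)
print(report)
```

In this toy case the predictor recovers 3 of 4 DNA-binding residues but also flags 2 of 3 RNA-binding residues. A cross rate that high relative to recall is the pattern the survey describes as indiscriminate prediction.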
