Enhancing Intrinsically Disordered Region Identification in Proteins: A BERT-Based Deep Learning Approach

Prasanna Kumar B G,

I. R. Oviya,

Fabia U. Battistuzzi

et al.

Published: Dec. 29, 2023

Intrinsically Disordered Regions (IDRs) are pivotal to understanding protein functionality in cellular processes, with significant implications for drug discovery and structural biology. These regions are recognized for their roles in amino acid relations, post-translational modifications (PTMs), and phase separation. However, traditional experimental methods for identifying IDRs are time-consuming and resource-intensive, while current machine-learning approaches often need to improve scalability and precision across diverse and extensive datasets. In response to this challenge, a novel deep learning framework is introduced, leveraging pre-trained BERT to predict the location of IDRs within sequences accurately. Leveraging advanced language models tailored to amino acid sequence complexity, the proposed model enhances prediction accuracy and efficiency. The approach was benchmarked against existing methodologies and shown to achieve an MCC of 0.2965 and an AUC of 0.7291 in a comprehensive evaluation. The results highlight the model's superiority and high reliability, establishing a new standard for computational analysis. The research propels IDR identification toward the potential development of therapeutic interventions.
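For readers unfamiliar with the metric reported above, the following is a minimal sketch of how the Matthews correlation coefficient (MCC) can be computed from per-residue disorder labels. The function names and the 0/1 label encoding (1 = disordered, 0 = ordered) are assumptions made for illustration; the abstract does not describe the authors' evaluation code.

```python
import math


def confusion_counts(true_labels, predicted_labels):
    """Count TP, TN, FP, FN for binary labels (assumed: 1 = disordered, 0 = ordered)."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    tp = tn = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect agreement), which is why it is often preferred over plain accuracy when disordered residues are a minority class.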

Language: English

Genome-Wide Characterization of Wholly Disordered Proteins in Arabidopsis
William J. Long, Liang Zhao, Huimin Yang

et al.

International Journal of Molecular Sciences, Journal Year: 2025, Volume and Issue: 26(3), P. 1117 - 1117

Published: Jan. 28, 2025

Intrinsically disordered proteins (IDPs) include two types of proteins: proteins with partial intrinsically disordered regions (IDRs) and wholly disordered proteins (WDPs). Extensive studies have focused on the proteins with IDRs, but less is known about WDPs because of their difficult-to-form folded tertiary structure. In this study, we developed a bioinformatics method for screening WDPs of more than 50 amino acids at the genome level and found a total of 27 categories, including 56 WDPs, in Arabidopsis. After comparing them with randomly selected structural proteins, we found that WDPs possessed a wide range of theoretical isoelectric point (PI), a negative Grand Average of Hydropathicity (GRAVY), a higher value of Instability Index (II), and lower values of Aliphatic Index (AI). In addition, by calculating the FCR (fraction of charged residue) and NCPR (net charge per residue) values of each WDP, we found 20 WDPs in the R1 (FCR < 0.25 and NCPR < 0.25) group, 15 in R2 (0.25 ≤ FCR ≤ 0.35 and NCPR ≤ 0.35), 19 in R3 (FCR > 0.35 and NCPR ≤ 0.35), and the remainder in R4 (FCR > 0.35 and NCPR > 0.35). Moreover, gene expression and protein-protein interaction (PPI) network analysis showed that WDPs perform different biological functions. We also showed that SIS (Salt Induced Serine rich) and RAB18 (a dehydrin family protein) undergo in vitro liquid-liquid phase separation (LLPS). Therefore, our results provide insight into understanding the biochemical characters and biological functions of WDPs in plants.
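The FCR/NCPR grouping described above can be illustrated with a minimal sketch. The following assumes that only Lys/Arg count as positively charged and Asp/Glu as negatively charged (histidine is ignored), that the NCPR thresholds apply to the absolute value of NCPR, and that the R3/R4 boundaries follow the commonly used diagram-of-states convention (FCR > 0.35, split at |NCPR| = 0.35). The function names are hypothetical and this is not the authors' pipeline.

```python
POSITIVE_RESIDUES = set("KR")  # assumption: Lys and Arg carry +1
NEGATIVE_RESIDUES = set("DE")  # assumption: Asp and Glu carry -1


def fcr_ncpr(sequence):
    """Return (FCR, NCPR) for a one-letter amino acid sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must be non-empty")
    positives = sum(1 for residue in seq if residue in POSITIVE_RESIDUES)
    negatives = sum(1 for residue in seq if residue in NEGATIVE_RESIDUES)
    fcr = (positives + negatives) / len(seq)   # fraction of charged residues
    ncpr = (positives - negatives) / len(seq)  # net charge per residue
    return fcr, ncpr


def charge_region(sequence):
    """Assign R1-R4 using the thresholds assumed in the lead-in."""
    fcr, ncpr = fcr_ncpr(sequence)
    if fcr < 0.25:
        return "R1"  # |NCPR| <= FCR, so |NCPR| < 0.25 holds automatically
    if fcr <= 0.35:
        return "R2"  # likewise |NCPR| <= 0.35 holds automatically
    return "R3" if abs(ncpr) <= 0.35 else "R4"
```

For example, `charge_region("KKDDAAAA")` gives a balanced, highly charged sequence (FCR 0.5, NCPR 0.0), which falls in R3 under these assumptions.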

Language: English

Citations

0

Navigating the unstructured by evaluating alphafold’s efficacy in predicting missing residues and structural disorder in proteins

Sen Zheng

PLoS ONE, Journal Year: 2025, Volume and Issue: 20(3), P. e0313812 - e0313812

Published: March 25, 2025

The study investigated regions with undefined structures, known as “missing” segments in X-ray crystallography and cryo-electron microscopy (Cryo-EM) data, by assessing their predicted structural confidence and disorder scores. Utilizing a comprehensive dataset from the Protein Data Bank (PDB), residues were categorized as “modeled”, “hard missing” and “soft missing” based on their visibility in structural datasets. Key features were determined, including a confidence score, the predicted local distance difference test (pLDDT) from AlphaFold2, an advanced structural prediction tool, and a disorder score from IUPred, a traditional disorder prediction method. To enhance prediction performance for unstructured residues, we employed a Long Short-Term Memory (LSTM) model, integrating both scores with amino acid sequences. Notable patterns, such as composition and region lengths, were observed in unstructured regions identified through structural experiments over our studied period. Our findings also indicate that “hard missing” residues often align with low confidence scores, whereas “soft missing” residues exhibit dynamic behavior that can complicate predictions. The incorporation of pLDDT, IUPred scores and sequence data into the LSTM model has improved the differentiation between structured and unstructured residues, particularly for shorter unstructured regions. This research elucidates the relationship between established computational predictions and experimental structural data, enhancing our ability to target structurally significant areas and guiding experimental designs toward functionally relevant regions.

Language: English

Citations

0

Are Most Human-Specific Proteins Encoded by Long Noncoding RNAs?
Yves‐Henri Sanejouand

Journal of Molecular Evolution, Journal Year: 2024, Volume and Issue: 92(4), P. 363 - 370

Published: June 25, 2024

Language: English

Citations

1

Navigating the Unstructured by Evaluating AlphaFold's Efficacy in Predicting Missing Residues and Structural Disorder in Proteins

Sen Zheng

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Nov. 3, 2024

This study explored the difference between predicted structure confidence and disorder detection in proteins, focusing on regions with undefined structures detected as missing segments in X-ray crystallography and Cryo-EM data. Recognizing the importance of these ‘unstructured’ regions for protein functionality, we examined the alignment of numerous protein sequences with their resolved or unresolved structures. The research utilized a comprehensive PDB dataset, classifying residues into ‘modeled’, ‘hard missing’ and ‘soft missing’ based on their visibility in structural data. Through this analysis, key features were first determined, including a confidence score, pLDDT, from AlphaFold2, an advanced AI-based tool, and a disorder score from IUPred, a conventional disorder prediction method. Our analysis reveals that ‘hard missing’ residues often reside in low-confidence regions, but are not exclusively associated with disorder predictions. It was assessed how effectively individual scores can distinguish structured from unstructured data, as well as the potential benefits of combining these scores for machine learning applications. This approach aims to uncover varying correlations across different experimental methodologies in the latest structural data. By analyzing the relationships between predictions and experimental structures, we can more effectively identify targets within proteins, guiding experimental designs toward areas of functional significance, whether they exhibit high stability or crucial unstructured regions.

Language: English

Citations

0

Are most human specific proteins encoded by long non-coding RNA?
Yves‐Henri Sanejouand

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: Nov. 13, 2023

By looking for a lack of homologues in a reference database of 27 well-annotated proteomes of primates and 52 of other mammals, 170 putative human-specific proteins were identified. Among them, only 2 are known at the protein level and 23 at the transcript level, according to Uniprot. Though 21 of these 25 proteins are found to be encoded by an open reading frame of a long non-coding RNA, 60% of them are predicted to be at least 90% globular, with a single structural domain. However, there is a near complete lack of knowledge about these proteins, with no tridimensional structure presently available in the Protein Databank and a fair prediction available for only some of them in the AlphaFold Structure Database. Moreover, knowledge about the function of these possibly key proteins remains scarce.

Language: English

Citations

0
