Twenty years of advances in prediction of nucleic acid-binding residues in protein sequences
Sushmita Basu, Jing Yu, Daisuke Kihara

et al.

Briefings in Bioinformatics, Year: 2024, Volume 26(1)

Published: Nov. 22, 2024

Abstract: Computational prediction of nucleic acid-binding residues in protein sequences is an active field of research, with over 80 methods that were released in the past 2 decades. We identify and discuss 87 sequence-based predictors that include dozens of recently published methods that are surveyed for the first time. We overview historical progress and examine multiple practical issues that include availability and impact of predictors, key features of their predictive models, and important aspects related to their training and assessment. We observe that the past decade has brought increased use of deep neural networks and protein language models, which contributed to substantial gains in predictive performance. We also highlight advancements in vital and challenging issues that include cross-predictions between deoxyribonucleic acid (DNA)-binding and ribonucleic acid (RNA)-binding residues and targeting the two distinct sources of binding annotations, structure-based versus intrinsic disorder-based. The methods trained on structure-annotated interactions tend to perform poorly on disorder-annotated binding and vice versa, with only a few methods that target and perform well across both annotation types. The cross-predictions are a significant problem, with some predictors of DNA-binding or RNA-binding residues indiscriminately predicting interactions with both nucleic acid types. Moreover, we show that methods with web servers are cited substantially more than tools without implementation or with no longer working implementations, motivating the development and long-term maintenance of web servers. We close by discussing future research directions that aim to drive further progress in this area.

Language: English

A specific GAGTT insertion/deletion variation in the IL-10 gene promoter alters the disease resistance of grass carp
Hong Yang, Jiaojiao Fu, Mengyuan Zhang

et al.

Aquaculture, Year: 2024, Volume 595, p. 741505

Published: Aug. 20, 2024

Language: English

Cited by: 2

Cell wall-resident PIR proteins show an inverted architecture in Neurospora crassa, but keep their role as wall stabilizers
Paul Montaño‐Silva, Olga A. Callejas‐Negrete, Alejandro Pereira‐Santana

et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2024, Volume unknown

Published: July 19, 2024

Abstract: Proteins with internal repeats (PIRs) are the second most abundant class of fungal cell wall resident proteins. In yeasts, PIRs preserve cell wall stability under stressful conditions. They are characterized by conserved N-terminal amino acid sequences repeated in tandem (PIR domains), and a Cys-rich C-terminal domain. Although PIRs have been inferred in several filamentous fungi genomes, they have not been studied beyond yeasts. In this work, PIR diversity, evolution and biological role, focused on a new PIR class, were addressed. Bioinformatic inference indicated that PIRs are an innovation in Ascomycota. Predicted PIRs clustered in two main groups: classical yeast PIRs (N-terminal PIR domains; C-terminal Cys-rich domain), and PIRs from filamentous fungi with an inverted architecture (N-terminal Cys-rich domain; C-terminal PIR domains), which could harbor additional GPI signals. As representatives of the second group, Neurospora crassa (Nc) PIR-1 (NCU04033) and PIR-2 (NCU07569) were studied. Confocal microscopy of eGFP-labeled PIR-1 and PIR-2 revealed that they accumulate in apical plugs; additionally, correct maturation requires the Kex2 processing site, and the predicted GPI modification signal proved functional. Moreover, the Nc Δpir-1 and Δpir-2 single mutants showed a growth rate similar to that of the WT, but the Δpir-1/Δpir-2 double mutant grew significantly slower. Similarly, the single mutants were mildly sensitive to calcofluor white, although the double mutant was severely impaired. Despite their inverted architecture, PIR-1 and PIR-2 act as cell wall stabilizers, as yeast PIRs do.

Language: English

Cited by: 1

Systematic discovery of DNA-binding tandem repeat proteins
Xiaoxuan Hu, X.J. Zhang, Wen Sun

et al.

Nucleic Acids Research, Year: 2024, Volume 52(17), pp. 10464-10489

Published: Aug. 27, 2024

Abstract: Tandem repeat proteins (TRPs) are widely distributed and bind to a wide variety of ligands. DNA-binding TRPs such as zinc finger (ZNF) and transcription activator-like effector (TALE) play important roles in biology and biotechnology. In this study, we first conducted an extensive analysis of TRPs in public databases, and found that the enormous diversity of TRPs is largely unexplored. We then focused our efforts on identifying novel TRPs possessing DNA-binding capabilities. We established a protein language model for DNA-binding protein prediction (PLM-DBPPred), and predicted a large number of DNA-binding TRPs. A subset was selected for experimental screening, leading to the identification of 11 novel DNA-binding TRPs, with six showing sequence specificity. Notably, members of the STAR (Short TALE-like Repeat proteins) family can be programmed to target specific 9 bp DNA sequences with high affinity. Leveraging this property, we generated artificial transcription factors using reprogrammed STAR proteins and achieved targeted activation of endogenous gene sets. Furthermore, members of novel families such as MOON (Marine Organism-Originated DNA binding protein) and pTERF (prokaryotic mTERF-like protein) exhibit unique features and distinct DNA-binding characteristics, revealing interesting biological clues. Our study expands the diversity of DNA-binding TRPs and demonstrates that a systematic approach greatly enhances the discovery of new biological insights and tools.

Language: English

Cited by: 1

Exploring protein natural diversity in environmental microbiomes with DeepMetagenome
Xiaofang Li, Jun Zhang, Dan Ma

et al.

Cell Reports Methods, Year: 2024, Volume 4(11), p. 100896

Published: Nov. 1, 2024

Motivation: Exploring protein diversity is key to understanding protein function and advancing protein engineering. Environmental DNA contains a vast protein sequence space, going beyond current databases. Harnessing these sequences requires approaches for the targeted annotation of specific functions. Here, we present DeepMetagenome, a deep learning-based procedure, which not only facilitates the identification of a typical protein family but also enables the discovery of diversity within under-annotated protein families in existing databases.

Highlights:
• DeepMetagenome is a deep learning method for annotating proteins from (meta)genomes
• DeepMetagenome outperformed alignment-based and machine learning methods
• Predicted metallothionein genes were experimentally verified for their function
• DeepMetagenome can be easily repurposed for mining other proteins

Summary: Protein natural diversity offers a vast sequence space for protein engineering, and deep learning enables its detection from metagenomes/proteomes without prior assumptions. DeepMetagenome, a Python-based method, explores protein diversity through modules for training and analyzing sequence datasets. The deep learning model includes Embedding, Conv1D, LSTM, and Dense layers, with feature analysis for data cleaning. Applied to metallothioneins in a database of over 146 million coding features, DeepMetagenome identified 500 high-confidence metallothionein sequences, outperforming DIAMOND and CNN-based models. It showed stable performance compared to a Transformer-based model over 25 epochs. Among 23 synthesized sequences, 20 exhibited metal resistance. The tool also successfully explored three additional protein families and is freely available on GitHub with detailed instructions.

Language: English

Cited by: 0

Short tandem repeats delineate gene bodies across eukaryotes
William B. Reinar, Anders K. Krabberød, Vilde Olsson Lalun

et al.

Nature Communications, Year: 2024, Volume 15(1)

Published: Dec. 30, 2024

Short tandem repeats (STRs) have emerged as important and hypermutable sites where genetic variation correlates with gene expression in plant and animal systems. Recently, it has been shown that a broad range of transcription factors (TFs) are affected by STRs near or in the DNA target binding site. Despite this, the distribution of STR motif repetitiveness in eukaryote genomes is still largely unknown. Here, we identify monomer and dimer STR motif repetitiveness in 5.1 billion 10-bp windows upstream of translation starts and downstream of translation stops in 25 million genes spanning 1270 species across the eukaryotic Tree of Life. We report that all surveyed genomes have gene-proximal shifts in motif repetitiveness. Within genomes, variation in gene-proximal repetitiveness landscapes correlated to the function of genes; genes with housekeeping functions were depleted in repetitiveness. Furthermore, the repetitiveness landscapes correlated with TF binding sites, indicating that gene function has evolved in conjunction with cis-regulatory STRs and TFs that recognize repetitive sites. These results suggest that the hypermutability inherent to STRs is canalized along the genome sequence and contributes to regulatory and eco-evolutionary dynamics in eukaryotes.

Language: English

Cited by: 0
