GOBoost: Leveraging Long-Tail Gene Ontology Terms for Accurate Protein Function Prediction
Lei Zhang, Yang Wang, Xiaohong Chen, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Nov. 18, 2024

Abstract

Motivation: With the advancement of deep learning, researchers have increasingly proposed computational methods based on deep learning techniques to predict protein function. However, many of these methods treat protein function prediction as a multi-label classification problem, often overlooking the long-tail distribution of functional labels (i.e., Gene Ontology terms) in datasets. To address this issue, we propose the GOBoost method, which incorporates a long-tail optimization ensemble strategy. In addition, GOBoost introduces a global-local label graph module and a multi-granularity focal loss to enhance information, mitigate the long-tail phenomenon, and improve overall accuracy.

Results: We evaluate GOBoost and other state-of-the-art (SOTA) methods on the PDB and AF2 datasets. GOBoost outperformed the SOTA methods across all evaluation metrics on both datasets. Notably, in AUPR on the test set, GOBoost improved by 10.71%, 35.91%, and 22.71% compared with the HEAL method on MF, BP, and CC functions, respectively. The experimental results demonstrate the necessity and superiority of designing protein function prediction models from this perspective.

Availability: https://github.com/Cao-Labs/GOBoost

Contact: [email protected]

Language: English

Characterization of LBD Genes in Cymbidium ensifolium with Roles in Floral Development and Fragrance
Yukun Peng, Suying Zhan, Feng Tang, et al.

Horticulturae, Journal Year: 2025, Volume and Issue: 11(2), P. 117 - 117

Published: Jan. 22, 2025

LBD transcription factors are critical regulators of plant growth and development. Recent studies have highlighted their significant role in the transcriptional regulation of metabolism. Thus, identifying the CeLBD genes in Cymbidium ensifolium, a species abundant in floral scent metabolites, could provide deeper insights into their functional significance. A total of 34 LBD genes were identified in C. ensifolium. These CeLBDs fell into two major groups: Class I and Class II. The Class I group contained 30 genes, while Class II included only 4 genes. Among them, several genes in the Ie branch exhibited structural variations or partial deletions (CeLBD20 and CeLBD21) in the coiled-coil motif (LX6LX3LX6L). These changes may contribute to the difficulty of root hair formation and prevent normal transcription, leading to low or absent expression, which may explain the fleshy, corona-like root system of C. ensifolium without prominent lateral roots. The expansion of CeLBDs was largely due to special WGD events in orchids during evolution, and by segmental duplication and tandem duplication. CeLBDs in different branches exhibit similar functions and expression characteristics. Promoter analysis showed enrichment of environmental response elements, such as AP2/ERF, potentially mediating specific expression under stresses. CeLBDs were predicted to interact with multiple ribosomal proteins, forming complex regulatory networks. CeLBD20 was localized in the cytoplasm, and it may act as a signaling factor to activate other factors. CeLBD6 was significantly up-regulated under cold, drought, and ABA treatments, suggesting a role in these responses. Furthermore, metabolic correlation analysis revealed an association with the release of aromatic compounds, such as MeJA. These findings offer valuable insights for further …

Language: English

Citations

0

Evaluating Sequence and Structural Similarity Metrics for Predicting Shared Paralog Functions
Olivier Dennler, Colm J. Ryan

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Oct. 13, 2024

Abstract

Gene duplication is the primary source of new genes, resulting in most genes having identifiable paralogs. Over evolutionary time scales, paralog pairs may diverge in some respects, but many retain the ability to perform the same functional role. Protein sequence identity is often used as a proxy for functional similarity and can predict shared functions between paralogs, as revealed by synthetic lethal experiments. However, the advent of alternative protein representations, including embeddings from protein language models (PLMs) and predicted structures from AlphaFold, raises the possibility that alternative similarity metrics could better capture functional similarity. Here, using two species (budding yeast and human) and two different definitions of shared functionality (shared protein-protein interactions and synthetic lethality), we evaluated a variety of similarity metrics. For some tasks, structural similarity or PLM embedding similarity outperforms sequence identity; more importantly, these metrics are not redundant with sequence identity, i.e. combining them leads to improved predictions of shared functionality. By adding contextual features representing similarity to homologous proteins within and across species, we can significantly enhance our predictions. Overall, our results suggest that these metrics capture complementary aspects of functional similarity beyond sequence identity alone.

Language: English

Citations

0
