GOPhage: protein function annotation for bacteriophages by integrating the genomic context DOI Creative Commons
Jiaojiao Guan, Jiayu Shang, Cheng Peng

et al.

Briefings in Bioinformatics, Journal Year: 2024, Volume and Issue: 26(1)

Published: Nov. 22, 2024

Bacteriophages are viruses that target bacteria, playing a crucial role in microbial ecology. Phage proteins are important for understanding phage biology, such as virus infection, replication, and evolution. Although a large number of new phages have been identified via metagenomic sequencing, many of them have limited protein function annotation. Accurate function annotation of phage proteins presents several challenges, including their inherent diversity and the scarcity of annotated ones. Existing tools have yet to fully leverage the unique properties of phages in annotating protein functions. In this work, we propose a new protein function annotation tool for phages by leveraging the modular genomic structure of phage genomes. By employing embeddings from the latest protein foundation models and a Transformer to capture contextual information between proteins in phage genomes, GOPhage surpasses state-of-the-art methods in annotating diverged proteins and proteins with uncommon functions by 6.78% and 13.05% improvement, respectively. GOPhage can annotate proteins lacking homology search results, which is critical for characterizing the rapidly accumulating phage genomes. We demonstrate the utility of GOPhage by identifying 688 potential holins in phages, which exhibit high structural conservation with known holins. The results show the potential of GOPhage to extend our understanding of newly discovered phages.

Language: English

CataLM: empowering catalyst design through large language models DOI Creative Commons
Ludi Wang, Xueqing Chen, Yi Du

et al.

International Journal of Machine Learning and Cybernetics, Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 15, 2025

The field of catalysis holds paramount importance in shaping the trajectory of sustainable development, prompting intensive research efforts to leverage artificial intelligence (AI) in catalyst design. Presently, the fine-tuning of open-source large language models (LLMs) has yielded significant breakthroughs across various domains such as biology and healthcare. Drawing inspiration from these advancements, we introduce CataLM (Catalytic Language Model), a large language model tailored to the domain of electrocatalytic materials. Our findings demonstrate that CataLM exhibits remarkable potential for facilitating human-AI collaboration in catalyst knowledge exploration and design. To the best of our knowledge, CataLM stands as the pioneering LLM dedicated to the catalyst domain, offering novel avenues for catalyst discovery and development.

Language: English

Citations

1

Evaluating the advancements in protein language models for encoding strategies in protein function prediction: a comprehensive review DOI Creative Commons
Jiaying Chen, Jingfu Wang, Yue Hu

et al.

Frontiers in Bioengineering and Biotechnology, Journal Year: 2025, Volume and Issue: 13

Published: Jan. 21, 2025

Protein function prediction is crucial in several key areas such as bioinformatics and drug design. With the rapid progress of deep learning technology, applying protein language models has become a research focus. These models utilize the increasing amount of large-scale protein sequence data to deeply mine its intrinsic semantic information, which can effectively improve the accuracy of protein function prediction. This review comprehensively summarizes the current status of applying the latest protein language models to protein function prediction. It provides an exhaustive performance comparison with traditional prediction methods. Through in-depth analysis of experimental results, the significant advantages of protein language models in enhancing the accuracy and depth of protein function prediction tasks are fully demonstrated.

Language: English

Citations

1

Accelerating drug discovery, development, and clinical trials by artificial intelligence DOI
Yilun Zhang, Mohamed Mastouri, Yang Zhang

et al.

Med, Journal Year: 2024, Volume and Issue: 5(9), P. 1050 - 1070

Published: Aug. 23, 2024

Language: English

Citations

4

TarIKGC: A Target Identification Tool Using Semantics-Enhanced Knowledge Graph Completion with Application to CDK2 Inhibitor Discovery DOI

Shen Xiao-juan, Shijia Yan, Tao Zeng

et al.

Journal of Medicinal Chemistry, Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 2, 2025

Target identification is a critical stage in the drug discovery pipeline. Various computational methodologies have been dedicated to enhancing the classification performance of compound-target interactions, yet significant room remains for improving the recommendation performance. To address this challenge, we developed TarIKGC, a tool for target prioritization that leverages semantics-enhanced knowledge graph (KG) completion. This method harnesses knowledge representation learning within a heterogeneous compound-target-disease network. Specifically, TarIKGC combines an attention-based aggregation graph neural network with a multimodal feature extractor network to simultaneously learn internal semantic features from biomedical entities and topological features from the KG. Furthermore, a KG embedding model is employed to identify missing relationships among compounds and targets. In silico evaluations highlighted the superior performance of TarIKGC in drug repositioning tasks. In addition, TarIKGC successfully identified two potential cyclin-dependent kinase 2 (CDK2) inhibitors with novel scaffolds through reverse target fishing. Both compounds exhibited antiproliferative activities across multiple therapeutic indications targeting CDK2.

Language: English

Citations

0

Enzyme functional classification using artificial intelligence DOI

Hyung Kyu Kim, H. Ji, Gi Bae Kim

et al.

Trends in Biotechnology, Journal Year: 2025, Volume and Issue: unknown

Published: March 1, 2025

Language: English

Citations

0

How did we get there? AI applications to biological networks and sequences DOI Creative Commons
Marco Anteghini, Francesco Gualdi, Baldo Oliva

et al.

Computers in Biology and Medicine, Journal Year: 2025, Volume and Issue: 190, P. 110064 - 110064

Published: April 5, 2025

The rapidly advancing field of artificial intelligence (AI) has transformed numerous scientific domains, including biology, where a vast and complex volume of data is available for analysis. This paper provides a comprehensive overview of the current state of AI-driven methodologies in genomics, proteomics, and systems biology. We discuss how machine learning algorithms, particularly deep learning models, have enhanced the accuracy and efficiency of embedding sequences, motif discovery, and the prediction of gene expression and protein structure. Additionally, we explore the integration of AI in the analysis of biological networks, including protein-protein interaction networks and multi-layered networks. By leveraging large-scale biological data, AI techniques have enabled unprecedented insights into biological processes and disease mechanisms. This work underlines the potential of applying AI to biology, highlighting current applications and suggesting directions for future research to further this evolving field.

Language: English

Citations

0

Aligning large language models and geometric deep models for protein representation DOI Creative Commons
Dong Wook Shu, Bingbing Duan, Kai Guo

et al.

Patterns, Journal Year: 2025, Volume and Issue: unknown, P. 101227 - 101227

Published: April 1, 2025

Citations

0

A large-scale assessment of sequence database search tools for homology-based protein function prediction DOI Creative Commons
Chengxin Zhang, Lydia Freddolino

Briefings in Bioinformatics, Journal Year: 2024, Volume and Issue: 25(4)

Published: May 23, 2024

Sequence database searches followed by homology-based function transfer form one of the oldest and most popular approaches for predicting protein functions, such as Gene Ontology (GO) terms. These searches are also a critical component in most state-of-the-art machine learning and deep learning-based protein function predictors. Although sequence search tools are the basis of homology-based protein function prediction, previous studies have scarcely explored how to select the optimal sequence search tools and configure their parameters to achieve the best function prediction. In this paper, we evaluate the effect of using different options from among popular search tools, as well as the impacts of search parameters, on protein function prediction. When predicting GO terms on a large benchmark dataset, we found that BLASTp and MMseqs2 consistently exceed the performance of other tools, including DIAMOND (one of the most popular tools for function prediction), under default search parameters. However, with the correct parameter settings, DIAMOND can perform comparably to BLASTp and MMseqs2 in function prediction. Additionally, we developed a new scoring function to derive GO prediction from homologous hits that consistently outperforms previously proposed scoring functions. These findings enable the improvement of almost all protein function prediction algorithms with a few easily implementable changes in their sequence homolog-based component. This study emphasizes the critical role of search parameter settings in homology-based function transfer and should make an important contribution to the development of future protein function prediction algorithms.

Language: English

Citations

3

PANDA-3D: protein function prediction based on AlphaFold models DOI Creative Commons
Chuanling Zhao, Tong Liu, Zheng Wang

et al.

NAR Genomics and Bioinformatics, Journal Year: 2024, Volume and Issue: 6(3)

Published: July 2, 2024

Previous protein function predictors primarily make predictions from amino acid sequences instead of tertiary structures because of the limited number of experimentally determined structures and the unsatisfying qualities of predicted structures. AlphaFold recently achieved promising performances when predicting protein tertiary structures, and the AlphaFold protein structure database (AlphaFold DB) is fast-expanding. Therefore, we aimed to develop a deep-learning tool that is specifically trained with AlphaFold models to predict GO terms from AlphaFold models. We developed an advanced learning architecture by combining geometric vector perceptron graph neural networks and variant transformer decoder layers for multi-label classification. PANDA-3D predicts gene ontology (GO) terms from the predicted structures of AlphaFold and the embeddings of amino acid sequences based on a large language model. Our method significantly outperformed a state-of-the-art method and either outperformed or was comparable with several other language-model-based methods that take amino acid sequences as input. PANDA-3D is tailored to AlphaFold models, and the AlphaFold DB currently contains over 200 million predicted protein structures (as of May 1st, 2023), making PANDA-3D a useful tool that can accurately annotate the functions of a large number of proteins. PANDA-3D can be freely accessed as a web server from http://dna.cs.miami.edu/PANDA-3D/ and as a repository from https://github.com/zwang-bioinformatics/PANDA-3D.

Language: English

Citations

3

Clinical decision making: Evolving from the hypothetico‐deductive model to knowledge‐enhanced machine learning DOI Creative Commons
Han Yuan

Medicine Advances, Journal Year: 2024, Volume and Issue: unknown

Published: Dec. 16, 2024

Knowledge-enhanced machine learning can be conceptualized as a fusion of clinical knowledge and domain expertise extracted from traditional decision making methods alongside powerful machine learning architectures. Knowledge-enhanced machine learning significantly improves current machine learning in terms of interpretability, generalizability, accuracy, and equity. Clinical decision making (CDM) is the process that healthcare professionals undertake when making assessments about patients' conditions and decisions about the care to provide [1, 2]. Traditional CDM is founded on either unconscious intuition or conscious inference frameworks with well-defined logic [3]. Intuition, defined as understanding without rationale, integrates tacit knowledge and pertinent experience developed over years of practice to automate cognitive processing devoid of formalized rules [4, 5]. However, its unconscious nature obscures the precise identification of initiating cues and logic, limiting its application [6]. Distinct from intuition, conscious inference frameworks possess well-defined execution steps, predominantly encompassing the hypothetico-deductive model (HDM) [7] and the pattern recognition model (PRM) [8]. The HDM involves four indispensable steps: cue acquisition, hypothesis generation, cue interpretation, and hypothesis evaluation [7]. Initially, cue acquisition systematically collects patient medical information as required by clinicians. Subsequently, multiple preliminary hypotheses are derived from the retrieved information, enabling clinicians to use an established theory based on the gathered information to propose hypotheses according to their knowledge [9]. This is followed by cue interpretation, in which clinicians discern the initial cues and accordingly refine these hypotheses [10]. The process culminates in hypothesis evaluation, whereby hypotheses are corroborated or refuted by the amassed evidence. Should all hypotheses be rejected, another round of the HDM will commence. Ng et al. present a real-world task [11]. When diagnosing patients with acute chest pain, clinicians gather information related to cardiovascular risk factors, smoking history, recent viral infections, and other relevant information. Based on this information, they generate diagnostic hypotheses, such as acute coronary syndrome, myocarditis, pericarditis, and pneumonia.
They interpret and evaluate these hypotheses, refining and ruling out possibilities through further investigations, such as electrocardiograms, complete blood counts, and chest radiographs. In contrast to the analytical HDM, the PRM employs nonanalytical matching of new cases to similar patterns stored in memory, specifically for cases encountered previously or documented within guidelines [8, 10]. In routine encounters, the PRM outpaces the HDM at an exceptional rate. Notably, in instances of ambiguity, the HDM approach retains its superiority as the more effective solution. Although traditional CDM methods have been widely implemented and some of them, including the HDM, have been recognized as gold standards [12], machine learning (ML), as demonstrated in Figure 1, has been increasingly adopted to handle the unprecedented volume of data generated by advanced instruments and electronically recorded systems [13]. The abundance of data poses challenges for methods relying on manual effort, but ML techniques hold promise because they enable computers to automatically learn projection functions between raw data and targets of interest without explicit instructions from human experts [14]. For example, support vector machine, random forest, and k-nearest neighbor have been used to diagnose Alzheimer's disease [15], breast cancer [16], and Parkinson's disease [17], respectively. In addition to conventional ML techniques, deep learning, a specialized subset of ML focusing on the design and training strategies of artificial neural networks, has emerged as the state of the art in various tasks owing to its extensive parameterization and intricate learning capability.
Figure 1: Comparison of the traditional method (hypothetico-deductive model) versus the contemporary method (random forest) in scholarly publications over the last 25 years. Specific numbers were obtained from a systematic inquiry on Google Scholar employing the search term "clinical decision making" in conjunction with "hypothetico-deductive model" or "random forest" on August 21, 2024.
Though purely data-driven ML achieves superior accuracy [18], it exhibits drawbacks in interpretability because of its complex architectures [19]. Interpretability stands as a pivotal characteristic for rectifying potentially erroneous decisions that endanger lives. To address this challenge and augment the capabilities of ML, researchers collaborate with clinicians to incorporate clinical knowledge into ML methodologies [20].
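As depicted in Figure 2, this integration, termed knowledge-enhanced machine learning (KEML) [21], fuses clinical knowledge and domain expertise with powerful ML architectures, spanning both models and approaches [22].
Figure 2: Schematic plot depicting classic methods leading toward the future of clinical decision making.
The foremost advantage of KEML is its ability to improve interpretability [23]. Explainable artificial intelligence has been proposed to supplement ML with explanations, but these explanations frequently suffer from logical inconsistencies stemming from noise in datasets and from limited applicability to particular cohorts [24]. For instance, a previous study showed that classifiers for pneumothorax often rely on irrelevant regions beyond the lesion area for diagnosis, resulting in overfitting to specific data sources. Integrating clinical knowledge of pneumothorax occurrence can enhance the generalization of these classifiers. Another illustrative example is a knowledge-guided interpretable prediction method, which showcases personalized knowledge graph modeling and improves interpretability by extracting crucial graph paths as prompts for ChatGPT to produce clinician-comprehensible natural language explanations. In addition, KEML leverages external knowledge to improve accuracy further. A dynamic gated recurrent network exemplifies the enrichment of event representation with additional knowledge from adjacent events [25]. KEML also alleviates biases through the involvement of clinical expertise. Chen et al. summarized approaches to mitigate disparity and inequity at each stage of the ML life cycle [26]. Hence, the integration of clinical knowledge not only amplifies the capabilities of ML but also mitigates biases, ultimately advancing the deployment of ML-driven solutions [27]. In this commentary, we reviewed the evolution of CDM and elucidated the underlying rationale driving this paradigm shift, emphasizing the imperative of adapting to the era of big data. While embracing ML represents an advancement, the clinical knowledge embodied in traditional CDM should not be disregarded. Therefore, we advocate KEML, a novel approach capitalizing on the strengths of both methodologies, to propel CDM to a higher level of interpretability, generalizability, accuracy, and fairness [28].
Han Yuan: conceptualization (lead); data curation (lead); formal analysis (lead); investigation (lead); methodology (lead); visualization (lead); writing - original draft (lead); writing - review & editing (lead). None. The author declares no conflicts of interest. Not applicable. Data sharing is not applicable to this article as no data was analyzed during the study.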

Language: English

Citations

3