A review of multimodal deep learning methods for genomic-enabled prediction in plant breeding
Osval A. Montesinos‐López, Moisés Chavira-Flores, Kiasmiantini et al.

Genetics, Journal Year: 2024, Volume and Issue: unknown

Published: Nov. 5, 2024

Abstract: Deep learning methods have been applied to enhance the prediction accuracy of traditional statistical methods in the field of plant breeding. Although deep learning seems to be a promising approach for genomic prediction, it has proven to have some limitations, since its conventional methods fail to leverage all available information. Multimodal deep learning methods aim to improve the predictive power of their unimodal counterparts by introducing several modalities (sources) of input information. In this review, we introduce the theoretical basic concepts of multimodal deep learning and provide a list of the most widely used neural network architectures in deep learning, as well as the strategies to fuse data from different modalities. We mention the computational resources for practical implementation problems. We finally performed a review of the applications of multimodal deep learning to genomic selection in plant breeding and other related fields. We present a meta-picture of the performance of multimodal deep learning methods to highlight how these tools can help address complex problems in the field of plant breeding. We discussed relevant considerations that researchers should keep in mind when applying multimodal deep learning methods. Multimodal deep learning holds significant potential in various fields, including genomic selection. While multimodal deep learning displays enhanced prediction capabilities over unimodal deep learning and other machine learning methods, it demands more computational resources. Multimodal deep learning effectively captures intermodal interactions, especially when integrating data from different sources. To apply multimodal deep learning in genomic selection, suitable architectures and fusion strategies must be chosen. It is like a powerful tool but should be carefully applied. Given the predictive edge of multimodal deep learning, it is valuable for addressing the challenges of food security amid a growing global population.

Language: English
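
The review abstract above refers to strategies for fusing data from different modalities without naming them. As a rough illustration only, the sketch below contrasts two commonly described options: early fusion (concatenating features before modelling) and late fusion (combining per-modality predictions). The function names, the toy marker and environment values, and the equal-weight default are all assumptions made for this example and are not taken from the review.

```python
def early_fusion(modalities):
    """Early fusion: concatenate each modality's feature vector into one input vector."""
    fused = []
    for features in modalities:
        fused.extend(features)
    return fused


def late_fusion(predictions, weights=None):
    """Late fusion: combine per-modality predictions with a weighted average."""
    if weights is None:
        # Assumption for this sketch: equal weight for every modality.
        weights = [1.0 / len(predictions)] * len(predictions)
    return sum(w * p for w, p in zip(weights, predictions))


# Hypothetical inputs for one plant line (values invented for illustration).
genomic = [0.0, 1.0, 2.0]      # marker dosages
environment = [23.5, 0.8]      # environmental covariates

fused_input = early_fusion([genomic, environment])   # one vector for a single model
combined_prediction = late_fusion([4.0, 6.0])        # average of two model outputs
```

Early fusion lets a single model learn interactions between modalities directly, whereas late fusion keeps one model per modality and only merges their outputs; which is preferable depends on the data and is not settled by this sketch.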

TMO-Net: an explainable pretrained multi-omics model for multi-task learning in oncology (Creative Commons)
Feng-ao Wang, Zhenfeng Zhuang, Feng Gao et al.

Genome biology, Journal Year: 2024, Volume and Issue: 25(1)

Published: June 6, 2024

Abstract: Cancer is a complex disease comprising systemic alterations in multiple scales. In this study, we develop the Tumor Multi-Omics pre-trained Network (TMO-Net) that integrates multi-omics pan-cancer datasets for model pre-training, facilitating cross-omics interactions and enabling joint representation learning and incomplete omics inference. This model enhances multi-omics sample representation and empowers various downstream oncology tasks with incomplete multi-omics datasets. By employing interpretable learning, we characterize the contributions of distinct omics features to clinical outcomes. The TMO-Net model serves as a versatile framework for cross-modal multi-omics learning in oncology, paving the way for tumor omics-specific foundation models.

Language: English

Citations

9

DeepGAMI: deep biologically guided auxiliary learning for multimodal integration and imputation to improve genotype–phenotype prediction (Creative Commons)
Pramod Chandrashekar, Sayali Alatkar, Jiebiao Wang et al.

Genome Medicine, Journal Year: 2023, Volume and Issue: 15(1)

Published: Oct. 31, 2023

Abstract: Background: Genotypes are strongly associated with disease phenotypes, particularly in brain disorders. However, the molecular and cellular mechanisms behind this association remain elusive. With emerging multimodal data for these mechanisms, machine learning methods can be applied for phenotype prediction at different scales, but due to the black-box nature of machine learning, integrating these modalities and interpreting biological mechanisms can be challenging. Additionally, the partial availability of these multimodal data presents a challenge in developing these predictive models. Method: To address these challenges, we developed DeepGAMI, an interpretable neural network model to improve genotype–phenotype prediction from multimodal data. DeepGAMI leverages functional genomic information, such as eQTLs and gene regulation, to guide neural network connections. Additionally, it includes an auxiliary learning layer for cross-modal imputation allowing the imputation of latent features of missing modalities and thus predicting phenotypes from a single modality. Finally, DeepGAMI uses integrated gradient to prioritize multimodal features for various phenotypes. Results: We applied DeepGAMI to several multimodal datasets including genotype and bulk and cell-type gene expression data in brain diseases, and gene expression and electrophysiology data of mouse neuronal cells. Using cross-validation and independent validation, DeepGAMI outperformed existing methods for classifying disease types, cellular and clinical phenotypes, even using single modalities (e.g., AUC score of 0.79 for Schizophrenia and 0.73 for cognitive impairment in Alzheimer’s disease). Conclusion: We demonstrated that DeepGAMI improves phenotype prediction and prioritizes phenotypic features and networks in multiple multimodal datasets in complex brains and brain diseases. Also, it prioritized disease-associated variants, genes, and regulatory networks linked to different phenotypes, providing novel insights into the interpretation of gene regulatory mechanisms. DeepGAMI is open-source and available for general use.

Language: English

Citations

12

Integrating GWAS and Transcriptome Data through PrediXcan and Multimodal Deep Learning to Reveal Genetic Basis and Novel Drug Repositioning Opportunities for Alzheimer's Disease (Open Access)
Xuecong Tian, Ying Su, Sizhe Zhang et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 2, 2025

Abstract: Alzheimer’s Disease (AD) is the leading cause of dementia, imposing significant economic and social burdens. Although genome-wide association studies (GWAS) have identified approximately 70 risk loci, the functional mechanisms underlying AD remain unclear. In this study, we integrated GWAS summary statistics from Jiang et al. with gene expression data from the GTEx project using the S-PrediXcan method, encompassing 61 brain-related traits across 49 tissues. Comprehensive analysis of five traits, including family history of AD, highlighted key genes such as APOE, APOC1, and TOMM40, which play crucial roles in cholesterol metabolism, immune response, and neuroinflammation. Validation using the ROSMAP dataset confirmed the association of these genes with AD phenotypes. Furthermore, we developed AD-MIF, a novel deep multi-layer information fusion model that integrates multi-omics data, achieving a 10-20% improvement in AUC performance for predicting AD-related phenotypes compared to traditional models. Gene enrichment analysis emphasized the importance of pathways such as cholesterol metabolism and immune response in the pathogenesis of AD. Additionally, drug repositioning analysis identified candidate drugs, such as Dasatinib and Sirolimus, which may alleviate AD progression by reducing neuroinflammation and clearing senescent cells. Our findings advance the understanding of the genetic architecture of AD, improve predictive models, and propose potential therapeutic drugs.

Language: English

Citations

0

COSIME: Cooperative multi-view integration and Scalable and Interpretable Model Explainer (Open Access)
Jerome J. Choi, Noah Cohen Kalafut, Tim Gruenloh et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 14, 2025

Single-omics approaches often provide a limited view of complex biological systems, whereas multiomics integration offers a more comprehensive understanding by combining diverse data views. However, integrating heterogeneous data types and interpreting the intricate relationships between biological features, both within and across different data views, remains a bottleneck. To address these challenges, we introduce COSIME (Cooperative Multi-view Integration and Scalable and Interpretable Model Explainer). COSIME uses backpropagation of Learnable Optimal Transport (LOT) to deep neural networks, enabling the learning of latent features from multiple views to predict disease phenotypes. In addition, COSIME incorporates Monte Carlo sampling to efficiently estimate Shapley values and Shapley-Taylor indices, enabling the assessment of both feature importance and their pairwise interactions, synergistically or antagonistically, in predicting phenotypes. We applied COSIME to both simulated and real-world datasets, including single-cell transcriptomics, single-cell spatial transcriptomics, epigenomics, and metabolomics, specifically for Alzheimer's disease-related phenotypes. Our results demonstrate that COSIME significantly improves prediction performance while offering enhanced interpretability of feature relationships. For example, we identified that synergistic interactions between microglia and astrocyte genes associated with AD are more likely to be active at the edges of the middle temporal gyrus as indicated by spatial locations. Finally, COSIME is open-source and available for general use.

Language: English

Citations

0
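
The COSIME abstract above mentions Monte Carlo sampling to estimate Shapley values. As a rough illustration of that general idea only, the sketch below uses the standard permutation-sampling estimator: it averages each feature's marginal contribution over random feature orderings. It is not COSIME's implementation and does not cover Shapley-Taylor interaction indices; `monte_carlo_shapley`, `toy_value`, and the toy effect sizes are hypothetical names and numbers invented for this example.

```python
import random


def monte_carlo_shapley(value_fn, n_features, n_permutations=2000, seed=0):
    """Estimate Shapley values by averaging marginal contributions
    over randomly sampled feature orderings."""
    rng = random.Random(seed)
    totals = [0.0] * n_features
    for _ in range(n_permutations):
        order = list(range(n_features))
        rng.shuffle(order)
        coalition = set()
        prev = value_fn(frozenset(coalition))
        for j in order:
            coalition.add(j)
            cur = value_fn(frozenset(coalition))
            totals[j] += cur - prev   # marginal contribution of feature j
            prev = cur
    return [t / n_permutations for t in totals]


def toy_value(coalition):
    """Hypothetical value function: additive effects plus a synergy
    between features 0 and 1; feature 2 has no effect."""
    effects = [1.0, 2.0, 0.0]
    value = sum(effects[j] for j in coalition)
    if 0 in coalition and 1 in coalition:
        value += 1.0
    return value


shapley = monte_carlo_shapley(toy_value, n_features=3)
```

For this toy value function the exact Shapley values are 1.5, 2.5, and 0, because the synergy term is split equally between features 0 and 1, so the sampled estimates can be checked against them. In practice the value function would wrap a trained model's prediction, which is where sampling becomes necessary because enumerating all coalitions is infeasible.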

Machine Learning Methods for Classifying Multiple Sclerosis and Alzheimer’s Disease Using Genomic Data (Open Access)
Magdalena Arnal, Giorgio Bini, Anastasia Krithara et al.

International Journal of Molecular Sciences, Journal Year: 2025, Volume and Issue: 26(5), P. 2085 - 2085

Published: Feb. 27, 2025

Complex diseases pose challenges in prediction due to their multifactorial and polygenic nature. This study employed machine learning (ML) to analyze genomic data from the UK Biobank, aiming to predict the predisposition to complex diseases like multiple sclerosis (MS) and Alzheimer's disease (AD). We tested logistic regression (LR), ensemble tree methods, and deep learning models for this purpose. LR displayed remarkable stability across various subsets of data, outshining deep learning approaches, which showed greater variability in performance. Additionally, ML methods demonstrated an ability to maintain optimal performance despite correlated genomic features due to linkage disequilibrium. When comparing the performance of polygenic risk score (PRS) with ML methods, PRS consistently performed at an average level. By employing explainability tools in the ML models of MS, we found that the results confirmed the polygenicity of this disease. The highest-prioritized genomic variants in MS were identified as expression or splicing quantitative trait loci located in non-coding regions within or near genes associated with the immune response, with a prevalence of human leukocyte antigen (HLA) gene annotations. Our findings shed light on both the potential and the challenges of ML models to capture complex genomic patterns, paving the way for improved predictive models.

Language: English

Citations

0

Lessons from national biobank projects utilizing whole-genome sequencing for population-scale genomics (Creative Commons)
Hyeji Lee, Wan Kim, Nahyeon Kwon et al.

Genomics & Informatics, Journal Year: 2025, Volume and Issue: 23(1)

Published: March 6, 2025

Large-scale national biobank projects utilizing whole-genome sequencing have emerged as transformative resources for understanding human genetic variation and its relationship to health and disease. These initiatives, which include the UK Biobank, the All of Us Research Program, Singapore's PRECISE, Biobank Japan, and the National Project of Bio-Big Data of Korea, are generating unprecedented volumes of high-resolution genomic data integrated with comprehensive phenotypic, environmental, and clinical information. This review examines the methodologies, contributions, and challenges of major WGS-based national genome projects worldwide. We first discuss the landscape of national biobank initiatives, highlighting their distinct approaches to data collection, participant recruitment, and phenotype characterization. We then introduce recent technological advances that enable efficient processing and analysis of large-scale WGS data, including improvements in variant calling algorithms, innovative methods for creating multi-sample VCFs, optimized data storage formats, and cloud-based computing solutions. The review synthesizes key discoveries from these projects, particularly in identifying expression quantitative trait loci and rare variants associated with complex diseases. Our review introduces the latest findings from the National Project of Bio-Big Data of Korea, which has advanced our understanding of population-specific genetic variation and rare diseases in Korean and East Asian populations. Finally, we discuss future directions for maximizing the impact of national biobank projects on precision medicine and global health equity. This examination demonstrates how large-scale national biobank projects are revolutionizing genetic research and healthcare delivery while highlighting the importance of continued investment in diverse, population-specific genomic resources.

Language: English

Citations

0

PheSeq, a Bayesian deep learning model to enhance and interpret the gene-disease association studies (Creative Commons)
Xinzhi Yao, Sizhuo Ouyang, Yulong Lian et al.

Genome Medicine, Journal Year: 2024, Volume and Issue: 16(1)

Published: April 16, 2024

Abstract: Despite the abundance of genotype-phenotype association studies, the resulting association outcomes often lack robustness and interpretations. To address these challenges, we introduce PheSeq, a Bayesian deep learning model that enhances and interprets association studies through the integration and perception of phenotype descriptions. By implementing the PheSeq model in three case studies on Alzheimer’s disease, breast cancer, and lung cancer, we identify 1024 priority genes for Alzheimer’s disease and 818 and 566 genes for breast cancer and lung cancer, respectively. Benefiting from data fusion, these findings represent moderate positive rates, high recall rates, and interpretation in gene-disease association studies.

Language: English

Citations

2

Phenotype Scoring of Population Scale Single-Cell Data Dissects Alzheimer's Disease Complexity (Open Access)
Chenfeng He, Athan Z. Li, Kalpana Hanthanan Arachchilage et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Nov. 2, 2024

Abstract: The complexity of Alzheimer’s disease (AD) manifests in diverse clinical phenotypes, including cognitive impairment and neuropsychiatric symptoms (NPSs). However, the etiology of these phenotypes remains elusive. To address this, the PsychAD project generated a population-level single-nucleus RNA-seq dataset comprising over 6 million nuclei from the prefrontal cortex of 1,494 individual brains, covering a variety of AD-related phenotypes that capture cognitive impairment, severity of pathological lesions, and the presence of NPSs. Leveraging this dataset, we developed a deep learning framework, called the Phenotype Associated Single Cell encoder (PASCode), to score single-cell phenotype associations, and identified ∼1.5 million phenotype associate cells (PACs). We compared PACs within 27 distinct brain cell subclasses and prioritized cell subpopulations and their expressed genes across various AD phenotypes, including the upregulation of a reactive astrocyte subtype with neuroprotective function in AD resilient donors. Additionally, we link multiple phenotypes to a single cell subpopulation of protoplasmic astrocytes that alter their gene expression and regulation in AD donors with depression. Uncovering the cellular and molecular mechanisms underlying diverse AD phenotypes has the potential to provide valuable insights towards the identification of novel diagnostic markers and therapeutic targets. All identified PACs, along with cell type information, are summarized into an AD-phenotypic atlas for the research community.

Language: English

Citations

1

Machine learning methods applied to classify complex diseases using genomic data (Open Access)
Magdalena Arnal, Giorgio Bini, Anastasia Krithara et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: March 20, 2024

Abstract: Complex diseases pose challenges in disease prediction due to their multifactorial and polygenic nature. In this work, we explored the prediction of two complex diseases, multiple sclerosis (MS) and Alzheimer’s disease (AD), using machine learning (ML) methods and genomic data from the UK Biobank. Different ML methods were applied, including logistic regressions (LR), gradient boosting decision trees (GB), extremely randomized trees (ET), random forest (RF), feedforward networks (FFN), and convolutional neural networks (CNN). The primary goal of this research was to investigate the variability of ML models in classifying complex diseases based on genomic risk. LR was the most robust method across folds, whereas deep learning methods (FFN and CNN) exhibited high variability. When comparing the performance of polygenic risk scores (PRS) with ML methods, PRS consistently performed at an average level. However, PRS still offers several practical advantages over ML methods. Despite implementing feature selection techniques to exclude non-informative and correlated predictors, the performance of ML models did not improve significantly, underscoring the ability of ML methods to achieve optimal performance even in the presence of correlated features due to linkage disequilibrium. Upon applying explainability tools to extract information about the genomic features contributing to the classification task, the results confirmed the polygenicity of MS. The prevalence of HLA gene annotations among the top features on chromosome 6 aligns with their significance in the context of MS. Overall, the highest-prioritized genomic variants were identified as expression or splicing quantitative trait loci (eQTL or sQTL) located in non-coding regions within or near genes associated with the immune response. In summary, this research offers deeper insights into how ML models discern genomic patterns related to complex diseases.

Language: English

Citations

0

Enhancing schizophrenia phenotype prediction from genotype data through knowledge-driven deep neural network models (Creative Commons)
Daniel Martins, Maryam Abbasi, Conceição Egas et al.

Genomics, Journal Year: 2024, Volume and Issue: 116(5), P. 110910 - 110910

Published: Aug. 5, 2024

This article explores deep learning model design, drawing inspiration from the omnigenic and genetic heterogeneity concepts, to improve schizophrenia prediction using genotype data. It introduces an innovative three-step approach leveraging neural networks' capabilities to efficiently handle genetic interactions. A locally connected network initially routes input data from variants to their corresponding genes. The second step employs an Encoder-Decoder to capture relationships among identified genes. The final model integrates knowledge from the first two and incorporates a parallel component to consider the effects of additional genes. This expansion enhances prediction scores by considering a larger number of genes. Trained models achieved an average AUC of 0.83, surpassing other genotype-trained models and matching gene expression dataset-based approaches. Additionally, tests on held-out sets reported an average sensitivity of 0.72 and an accuracy of 0.76, aligning with schizophrenia heritability predictions. Moreover, the study addresses genetic heterogeneity challenges by considering diverse population subsets.

Language: English

Citations

0