scCompass: An integrated cross-species scRNA-seq database for AI-ready DOI Open Access
Pengfei Wang, Wenhao Liu, Jiajia Wang

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Nov. 15, 2024

Abstract: Emerging single-cell sequencing technology has generated large amounts of data, allowing analysis of cellular dynamics and gene regulation at single-cell resolution. Advances in artificial intelligence enhance life sciences research by delivering critical insights and optimizing data processes. However, inconsistent data processing quality and standards remain a major challenge. Here we propose scCompass, which provides a solution to build a large-scale, cross-species, model-friendly data collection. By applying standardized data pre-processing, scCompass integrates and curates transcriptomic data from 13 species and nearly 105 million single cells. Using this extensive dataset, we are able to obtain stable expression genes (SEGs) and organ-specific expression genes (OSGs) in human and mouse. We provide different scalable datasets that can be easily adapted for AI model training, and pretrained checkpoints with state-of-the-art (SOTA) foundation models. In summary, the AI-readiness of scCompass, combined with user-friendly data sharing, visualization, and online analysis, greatly simplifies data access and exploitation for researchers in single-cell biology (http://www.bdbe.cn/kun).

Language: English

COSIME: Cooperative multi-view integration and Scalable and Interpretable Model Explainer DOI Open Access

Jerome J. Choi, Noah Cohen Kalafut, Tim Gruenloh

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 14, 2025

Single-omics approaches often provide a limited view of complex biological systems, whereas multiomics integration offers a more comprehensive understanding by combining diverse data views. However, integrating heterogeneous data types and interpreting the intricate relationships between features, both within and across different views, remains a bottleneck. To address these challenges, we introduce COSIME (Cooperative Multi-view Integration and Scalable and Interpretable Model Explainer). COSIME uses backpropagation of Learnable Optimal Transport (LOT) to deep neural networks, enabling the learning of latent features from multiple views to predict disease phenotypes. In addition, COSIME incorporates Monte Carlo sampling to efficiently estimate Shapley values and Shapley-Taylor indices, enabling the assessment of both feature importance and their pairwise interactions, synergistically or antagonistically, in predicting disease phenotypes. We applied COSIME to both simulated and real-world datasets, including single-cell transcriptomics, spatial transcriptomics, epigenomics, and metabolomics, specifically for Alzheimer's disease-related phenotypes. Our results demonstrate that COSIME significantly improves prediction performance while offering enhanced interpretability of feature relationships. For example, we identified that synergistic interactions between microglia and astrocyte genes associated with AD are likely to be active at the edges of the middle temporal gyrus, as indicated by spatial locations. Finally, COSIME is open-source and available for general use.
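
The Monte Carlo estimation of Shapley values mentioned in the abstract can be illustrated with a generic permutation-sampling sketch. This is not COSIME's implementation: the function `monte_carlo_shapley`, the toy value function `toy_value`, and its weights are hypothetical, and Shapley-Taylor interaction indices are not covered here.

```python
import random
from typing import Callable, List


def monte_carlo_shapley(
    value_fn: Callable[[frozenset], float],
    n_features: int,
    n_permutations: int = 2000,
    seed: int = 0,
) -> List[float]:
    """Estimate Shapley values by averaging each feature's marginal
    contribution over randomly sampled feature orderings."""
    rng = random.Random(seed)
    totals = [0.0] * n_features
    order = list(range(n_features))
    for _ in range(n_permutations):
        rng.shuffle(order)
        coalition = set()
        prev = value_fn(frozenset(coalition))
        for i in order:
            # Marginal contribution of feature i given the features before it.
            coalition.add(i)
            cur = value_fn(frozenset(coalition))
            totals[i] += cur - prev
            prev = cur
    return [t / n_permutations for t in totals]


def toy_value(coalition: frozenset) -> float:
    """Hypothetical stand-in for a model's prediction on a feature subset:
    additive effects plus a synergistic bonus when features 0 and 1 co-occur."""
    weights = [1.0, 2.0, 3.0]  # assumed per-feature effects, for illustration only
    score = sum(weights[i] for i in coalition)
    if 0 in coalition and 1 in coalition:
        score += 1.0
    return score


estimates = monte_carlo_shapley(toy_value, n_features=3)
print(estimates)
```

In this toy setting the exact Shapley values are 1.5, 2.5, and 3.0, because the synergy bonus is split equally between features 0 and 1; the sampled estimates approach these values as `n_permutations` grows, and their sum always equals the value of the full feature set minus that of the empty set.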

Language: English

Citations

0
