High order expression dependencies finely resolve cryptic states and subtypes in single cell data
Abel Jansma, Yuelin Yao, Jareth C. Wolfe

et al.

Molecular Systems Biology, Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 2, 2025

Language: English

Best practices for single-cell analysis across modalities
Lukas Heumos, Anna C. Schaar, Christopher Lance

et al.

Nature Reviews Genetics, Journal Year: 2023, Volume and Issue: 24(8), P. 550 - 572

Published: March 31, 2023

Language: English

Citations

513

Dissection of artifactual and confounding glial signatures by single-cell sequencing of mouse and human brain
Samuel E. Marsh, Alec J. Walker, Tushar Kamath

et al.

Nature Neuroscience, Journal Year: 2022, Volume and Issue: 25(3), P. 306 - 316

Published: March 1, 2022

Language: English

Citations

268

The specious art of single-cell genomics
Tara Chari, Lior Pachter

PLoS Computational Biology, Journal Year: 2023, Volume and Issue: 19(8), P. e1011288 - e1011288

Published: Aug. 17, 2023

Dimensionality reduction is standard practice for filtering noise and identifying relevant features in large-scale data analyses. In biology, single-cell genomics studies typically begin with reduction to 2 or 3 dimensions to produce "all-in-one" visuals of the data that are amenable to the human eye, and these are subsequently used for qualitative and quantitative exploratory analysis. However, there is little theoretical support for this practice, and we show that extreme dimension reduction, from hundreds or thousands of dimensions to 2, inevitably induces significant distortion of high-dimensional datasets. We therefore examine the practical implications of low-dimensional embedding of single-cell data and find that extensive distortions and inconsistent practices make such embeddings counter-productive for exploratory, biological analyses. In lieu of this, we discuss alternative approaches for conducting targeted embedding and feature exploration to enable hypothesis-driven biological discovery.

Language: English

Citations

180
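The distortion argument can be illustrated with a minimal sketch. It assumes synthetic isotropic Gaussian data, not real single-cell counts, and uses a linear PCA projection rather than the nonlinear embeddings (t-SNE, UMAP) that are common in practice. The helper names `pairwise_distances`, `pca_project` and `distance_distortion` are hypothetical. The sketch measures the mean relative error of all pairwise Euclidean distances after projecting to `k` dimensions.

```python
import numpy as np


def pairwise_distances(X: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix between the rows of X."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.sqrt(np.clip(d2, 0.0, None))  # clip tiny negatives from round-off


def pca_project(X: np.ndarray, k: int) -> np.ndarray:
    """Coordinates of the centred data on its top-k principal axes (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def distance_distortion(X: np.ndarray, k: int = 2) -> float:
    """Mean relative error of all pairwise distances after projecting to k dims."""
    d_high = pairwise_distances(X)
    d_low = pairwise_distances(pca_project(X, k))
    iu = np.triu_indices(X.shape[0], k=1)  # each unordered pair once
    return float(np.mean(np.abs(d_high[iu] - d_low[iu]) / d_high[iu]))


# Hypothetical data: 200 "cells" x 500 "genes" of pure Gaussian noise.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 500))
for k in (2, 10, 50):
    print(k, round(distance_distortion(X, k), 3))
```

An orthogonal projection can only shrink distances. The error therefore falls as `k` grows and vanishes once `k` reaches the rank of the centred data. At `k = 2`, most of the distance information in this toy example is lost.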

Principal Component Analyses (PCA)-based findings in population genetic studies are highly biased and must be reevaluated
Eran Elhaik

Scientific Reports, Journal Year: 2022, Volume and Issue: 12(1)

Published: Aug. 29, 2022

Principal Component Analysis (PCA) is a multivariate analysis that reduces the complexity of datasets while preserving data covariance. The outcome can be visualized on colorful scatterplots, ideally with only a minimal loss of information. PCA applications, implemented in well-cited packages like EIGENSOFT and PLINK, are extensively used as the foremost analyses in population genetics and related fields (e.g., animal and plant or medical genetics). PCA outcomes are used to shape study design, identify and characterize individuals and populations, and draw historical and ethnobiological conclusions on origins, evolution, dispersion, and relatedness. The replicability crisis in science has prompted us to evaluate whether PCA results are reliable, robust, and replicable. We analyzed twelve common test cases using an intuitive color-based model alongside human population data. We demonstrate that PCA results can be artifacts of the data and can be easily manipulated to generate desired outcomes. PCA adjustment also yielded unfavorable outcomes in association studies. PCA results may not be reliable, robust, or replicable as the field assumes. Our findings raise concerns about the validity of results reported in the population genetics literature and related fields that place a disproportionate reliance upon PCA outcomes and the insights derived from them. We conclude that PCA may have a biasing role in genetic investigations and that 32,000-216,000 genetic studies should be reevaluated. An alternative mixed-admixture population genetic model is discussed.

Language: English

Citations

139
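A minimal sketch can make the sampling-sensitivity concern concrete. It uses a toy "colour" model loosely inspired by the one the abstract mentions, in which each hypothetical individual is a noisy RGB vector near one of three primaries, and it computes PCA via SVD. The population labels, sample sizes and noise level are illustrative assumptions, not values from the paper, and this is not the author's own analysis.

```python
import numpy as np


def pca(X: np.ndarray, k: int = 2):
    """Top-k PCA via SVD: returns (scores, components, explained_variance_ratio)."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = S**2 / (X.shape[0] - 1)  # variance captured by each principal axis
    return Xc @ Vt[:k].T, Vt[:k], var[:k] / var.sum()


# Hypothetical toy model: each individual is a noisy RGB vector near a primary colour.
PRIMARIES = {"red": [1.0, 0.0, 0.0], "green": [0.0, 1.0, 0.0], "blue": [0.0, 0.0, 1.0]}


def sample_populations(sizes: dict, noise: float = 0.05, seed: int = 0) -> np.ndarray:
    """Stack sizes[name] noisy copies of each primary colour into one data matrix."""
    rng = np.random.default_rng(seed)
    blocks = [np.asarray(PRIMARIES[name]) + noise * rng.standard_normal((n, 3))
              for name, n in sizes.items()]
    return np.vstack(blocks)


# Same three populations, two different (hypothetical) sampling schemes.
_, comp_a, ratio_a = pca(sample_populations({"red": 100, "green": 100, "blue": 10}))
_, comp_b, ratio_b = pca(sample_populations({"red": 10, "green": 100, "blue": 100}))
print("PC1 (scheme A):", comp_a[0].round(2), "explained:", ratio_a[0].round(2))
print("PC1 (scheme B):", comp_b[0].round(2), "explained:", ratio_b[0].round(2))
print("|cos(PC1_A, PC1_B)|:", abs(comp_a[0] @ comp_b[0]).round(2))
```

The covariance matrix that PCA diagonalises is dominated by whichever populations are sampled most. As a result, changing only the sample sizes rotates PC1 from a red–green axis to a green–blue axis, even though the underlying populations are identical.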

RNA velocity unraveled
Gennady Gorin, Meichen Fang, Tara Chari

et al.

PLoS Computational Biology, Journal Year: 2022, Volume and Issue: 18(9), P. e1010492 - e1010492

Published: Sept. 12, 2022

We perform a thorough analysis of RNA velocity methods, with a view towards understanding the suitability of the various assumptions underlying popular implementations. In addition to providing a self-contained exposition of the underlying mathematics, we undertake simulations and perform controlled experiments on biological datasets to assess workflow sensitivity to parameter choices and underlying biology. Finally, we argue for a more rigorous approach to RNA velocity, and present a framework for Markovian analysis that points to directions for improvement and mitigation of current problems.

Language: English

Citations

117
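For orientation, the deterministic splicing model that popular RNA velocity implementations typically build on is written out below. This is background we are supplying, not the paper's Markovian framework, which the authors present as a more rigorous alternative. Here $u$ and $s$ are the unspliced and spliced counts for one gene, and $\alpha$, $\beta$ and $\gamma$ are the transcription, splicing and degradation rates.

```latex
\begin{aligned}
\frac{du}{dt} &= \alpha - \beta u \\
\frac{ds}{dt} &= \beta u - \gamma s \qquad \text{(the RNA velocity } v\text{)} \\
\frac{ds}{dt} = 0 \;&\Rightarrow\; u = \frac{\gamma}{\beta}\, s
\end{aligned}
```

The last line is the steady-state assumption commonly used to estimate $\gamma/\beta$ from cells at the extremes of the $u$–$s$ phase portrait. It is an assumption of the kind whose suitability the abstract says the authors examine.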

Initial recommendations for performing, benchmarking and reporting single-cell proteomics experiments
Laurent Gatto, Ruedi Aebersold, Jürgen Cox

et al.

Nature Methods, Journal Year: 2023, Volume and Issue: 20(3), P. 375 - 386

Published: March 1, 2023

Language: English

Citations

104

A single-cell time-lapse of mouse prenatal development from gastrula to birth
Chengxiang Qiu, Beth Martin, Ian Welsh

et al.

Nature, Journal Year: 2024, Volume and Issue: 626(8001), P. 1084 - 1093

Published: Feb. 14, 2024

The house mouse (Mus musculus) is an exceptional model system, combining genetic tractability with close evolutionary affinity to humans [1,2]. Mouse gestation lasts only 3 weeks, during which the genome orchestrates the astonishing transformation of a single-cell zygote into a free-living pup composed of more than 500 million cells. Here, to establish a global framework for exploring mammalian development, we applied optimized single-cell combinatorial indexing to profile the transcriptional states of 12.4 million nuclei from 83 embryos, precisely staged at 2- to 6-hour intervals spanning late gastrulation (embryonic day 8) to birth (postnatal day 0). From these data, we annotate hundreds of cell types and explore the ontogenesis of the posterior embryo during somitogenesis and of kidney, mesenchyme, retina and early neurons. We leverage the temporal resolution and sampling depth of these whole-embryo snapshots, together with published data [4–8] from earlier timepoints, to construct a rooted tree of cell-type relationships that spans the entirety of prenatal development, from zygote to birth. Throughout this tree, we systematically nominate genes encoding transcription factors and other proteins as candidate drivers of the in vivo differentiation of hundreds of cell types. Remarkably, the most marked temporal shifts in cell states are observed within one hour of birth and presumably underlie the massive physiological adaptations that must accompany the successful transition of a mammalian fetus to life outside the womb.

Language: English

Citations

58

The benefits and pitfalls of machine learning for biomarker discovery
Sandra Ng, Sara Masarone, David Watson

et al.

Cell and Tissue Research, Journal Year: 2023, Volume and Issue: 394(1), P. 17 - 31

Published: July 27, 2023

Prospects for the discovery of robust and reproducible biomarkers have improved considerably with the development of sensitive omics platforms that can enable measurement of biological molecules at an unprecedented scale. With technical barriers to success lowering, the challenge is now moving into the analytical domain. Genome-wide discovery presents a problem of scale and multiple testing, as standard statistical methods struggle to distinguish signal from noise in increasingly complex biological systems. Machine learning and AI methods are good at finding answers in large datasets, but they have a tendency to overfit solutions. It may be possible to find a local answer or mechanism specific to a patient sample or a small group of samples, but this may not generalise to wider patient populations due to the high likelihood of false discovery. The rise of explainable AI offers to improve the opportunity for true discovery by providing explanations for predictions that can be explored mechanistically before proceeding to costly and time-consuming validation studies. This review aims to introduce some of the basic concepts of machine learning and explainable AI for biomarker discovery, with a focus on post hoc explanation of predictions. To illustrate this, we consider how explainable AI has already been used successfully, and we explore a case study that applies AI to biomarker discovery in rheumatoid arthritis, demonstrating the accessibility of tools for AI and machine learning. We use this case study to discuss potential challenges and solutions that may enable AI to critically interrogate disease and response mechanisms.

Language: English

Citations

49
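The overfitting risk can be shown with a minimal sketch. It assumes purely random "biomarker" features and a random outcome, with more features than samples, as is common in omics. Ordinary least squares stands in for a generic ML model. It is fitted with NumPy's `np.linalg.lstsq`, which returns the minimum-norm solution for underdetermined systems. The function names and sizes are hypothetical and illustrative.

```python
import numpy as np


def r_squared(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def overfit_demo(n_train: int = 30, n_test: int = 200, n_features: int = 100, seed: int = 0):
    """Fit random 'biomarkers' to an unrelated outcome with more features than samples."""
    rng = np.random.default_rng(seed)
    X_train = rng.standard_normal((n_train, n_features))
    y_train = rng.standard_normal(n_train)  # outcome carries no signal about X
    X_test = rng.standard_normal((n_test, n_features))
    y_test = rng.standard_normal(n_test)
    # For an underdetermined system, lstsq returns the minimum-norm exact fit.
    coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    return r_squared(y_train, X_train @ coef), r_squared(y_test, X_test @ coef)


train_r2, test_r2 = overfit_demo()
print(f"training R^2 = {train_r2:.3f}, held-out R^2 = {test_r2:.3f}")
```

With 100 noise features and 30 training samples, the model reproduces the training outcome exactly yet explains nothing on new samples. This is the false-discovery trap the review describes, and it is why held-out validation matters before any mechanistic follow-up.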

AnnoPRO: a strategy for protein function annotation based on multi-scale protein representation and a hybrid deep learning of dual-path encoding
Lingyan Zheng, Shuiyang Shi, Mingkun Lu

et al.

Genome Biology, Journal Year: 2024, Volume and Issue: 25(1)

Published: Feb. 1, 2024

Protein function annotation has been one of the longstanding issues in biological sciences, and various computational methods have been developed. However, the existing methods suffer from a serious long-tail problem, with a large number of GO families containing few annotated proteins. Herein, an innovative strategy named AnnoPRO was therefore constructed by enabling sequence-based multi-scale protein representation, dual-path protein encoding using pre-training, and function annotation by long short-term memory-based decoding. A variety of case studies based on different benchmarks were conducted, which confirmed the superior performance of AnnoPRO among available methods. Source code and models have been made freely available at: https://github.com/idrblab/AnnoPRO and https://zenodo.org/records/10012272

Language: English

Citations

36

Best practices for the execution, analysis, and data storage of plant single-cell/nucleus transcriptomics
Carolin Grones, Thomas Eekhout, Dongbo Shi

et al.

The Plant Cell, Journal Year: 2024, Volume and Issue: 36(4), P. 812 - 828

Published: Jan. 17, 2024

Single-cell and single-nucleus RNA-sequencing technologies capture the expression of plant genes at an unprecedented resolution. Therefore, these technologies are gaining traction in plant molecular and developmental biology for elucidating the transcriptional changes across cell types in a specific tissue or organ, upon treatments, in response to biotic and abiotic stresses, or between genotypes. Despite the rapidly accelerating use of these technologies, collective and standardized experimental and analytical procedures to support the acquisition of high-quality data sets are still missing. In this commentary, we discuss common challenges associated with the use of single-cell transcriptomics in plants and propose general guidelines to improve reproducibility, quality, comparability, and interpretation and to make the data readily available to the community in this fast-developing field of research.

Language: English

Citations

25