Cited by: Cell type and dynamic state govern genetic regulation of gene expression in heterogeneous differentiating cultures

Best practices for single-cell analysis across modalities

Lukas Heumos, Anna C. Schaar, Christopher Lance

et al.

Nature Reviews Genetics, Journal Year: 2023, Volume and Issue: 24(8), P. 550 - 572

Published: March 31, 2023

Language: English

Citations: 513

Comparison of transformations for single-cell RNA-seq data

Constantin Ahlmann-Eltze, Wolfgang Huber

Nature Methods, Journal Year: 2023, Volume and Issue: 20(5), P. 665 - 672

Published: April 10, 2023

Abstract: The count table, a numeric matrix of genes × cells, is the basic input data structure in the analysis of single-cell RNA-sequencing data. A common preprocessing step is to adjust the counts for variable sampling efficiency and to transform them so that the variance is similar across the dynamic range. These steps are intended to make the subsequent application of generic statistical methods more palatable. Here, we describe four transformation approaches based on the delta method, model residuals, inferred latent expression state and factor analysis. We compare their strengths and weaknesses and find that the latter three have appealing theoretical properties; however, in benchmarks using simulated and real-world data, it turns out that a rather simple approach, namely, the logarithm with a pseudo-count followed by principal-component analysis, performs as well as or better than the more sophisticated alternatives. This result highlights limitations of current theoretical analysis as assessed by bottom-line performance benchmarks.
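The simple baseline that performs well in this comparison, a logarithm with a pseudo-count followed by PCA, fits in a few lines. The NumPy sketch below is illustrative, not the authors' code. It assumes a genes × cells count matrix with no empty cells and size factors defined as each cell's total count divided by the mean total. The pseudo-count (default 1) and the number of components are free, hypothetical parameters.

```python
import numpy as np


def shifted_log_pca(counts: np.ndarray, pseudo_count: float = 1.0, n_pcs: int = 10) -> np.ndarray:
    """Shifted-log transform followed by PCA (illustrative sketch).

    counts: genes x cells matrix of raw counts (assumed: every cell has a nonzero total).
    Returns a cells x n_pcs matrix of principal-component scores.
    """
    # Size factors adjust for variable sampling efficiency between cells.
    totals = counts.sum(axis=0)
    size_factors = totals / totals.mean()
    # Shifted logarithm: log(y / s + y0), broadcast over the cell axis.
    transformed = np.log(counts / size_factors + pseudo_count)
    # PCA over cells: center each gene across cells, then take a thin SVD.
    cells_by_genes = transformed.T
    centered = cells_by_genes - cells_by_genes.mean(axis=0)
    u, s, _vt = np.linalg.svd(centered, full_matrices=False)
    # Scores = U * singular values (equivalently, centered data projected on the loadings).
    return u[:, :n_pcs] * s[:n_pcs]
```

Because each gene is centered before the SVD, the returned scores have zero mean per component. A full pipeline would typically select highly variable genes before this step.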

Language: English


Aitchison’s Compositional Data Analysis 40 Years on: A Reappraisal

Michael Greenacre, Eric Grunsky, John Bacon‐Shone

et al.

Statistical Science, Journal Year: 2023, Volume and Issue: 38(3)

Published: March 22, 2023

The development of John Aitchison's approach to compositional data analysis is followed since his paper read to the Royal Statistical Society in 1982. Aitchison's logratio approach, which was proposed to solve the problematic aspects of working with data subject to a fixed-sum constraint, is summarized and reappraised. It is maintained that the properties on which this approach was originally built, the main one being subcompositional coherence, are not required to be satisfied exactly; quasi-coherence is sufficient, that is, near enough to being coherent for all practical purposes. This opens up the field to using simpler data transformations, such as power transformations, that permit zero values in the data. The additional property of exact isometry, which was subsequently introduced and was not in Aitchison's original conception, imposed the use of isometric logratio transformations, but these are complicated to interpret, involving ratios of geometric means. If this property is regarded as important in certain analytical contexts, for example, unsupervised learning, it can be relaxed by showing that regular pairwise logratios, as well as the alternative quasi-coherent transformations, can also be quasi-isometric, meaning they are close enough to exact isometry for all practical purposes. It is concluded that isometric and related logratio transformations, such as pivot logratios, are not a prerequisite for good practice, although many authors insist on their obligatory use. This conclusion is fully supported here by case studies in geochemistry and in genomics, where good performance is demonstrated for pairwise logratios, as originally proposed by Aitchison, or for Box–Cox power transforms of the original compositions, where no zero replacements are necessary.
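The two transformation families this reappraisal favors can be written down directly. In the sketch below, which is not code from the paper, each row of `x` is one composition. Pairwise logratios require strictly positive parts. The Box–Cox power transform of the closed composition, (x^λ - 1)/λ, accepts zeros as long as λ > 0. The helper names and the choice of λ are illustrative assumptions.

```python
import itertools

import numpy as np


def close(x: np.ndarray) -> np.ndarray:
    """Rescale each row (composition) so its parts sum to 1."""
    return x / x.sum(axis=1, keepdims=True)


def pairwise_logratios(x: np.ndarray):
    """All pairwise logratios log(x_j / x_k) for j < k; parts must be strictly positive."""
    if np.any(x <= 0):
        raise ValueError("pairwise logratios need strictly positive parts")
    pairs = list(itertools.combinations(range(x.shape[1]), 2))
    lr = np.column_stack([np.log(x[:, j] / x[:, k]) for j, k in pairs])
    return lr, pairs


def box_cox_composition(x: np.ndarray, lam: float) -> np.ndarray:
    """Box-Cox power transform of closed compositions; zeros are allowed when lam > 0."""
    if lam <= 0:
        raise ValueError("lam must be positive for this zero-tolerant form")
    return (close(x) ** lam - 1.0) / lam
```

As λ approaches 0, (x^λ - 1)/λ approaches log(x), which is the sense in which a small power interpolates toward the log scale while still tolerating zero parts. Note that pairwise logratios are unchanged by rescaling a row, so they do not depend on whether the data were closed first.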

Language: English


Single-cell RNA-seq differential expression tests within a sample should use pseudo-bulk data of pseudo-replicates

Christoph Hafemeister, Florian Halbritter

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: March 29, 2023

Abstract: Single-cell RNA sequencing (scRNA-seq) has become a standard approach to investigate molecular differences between cell states. Comparisons of bioinformatics methods for the count matrix transformation (normalization) and differential expression (DE) analysis of these data have already highlighted recommendations for effective between-sample comparisons and visualization. Here, we examine two remaining open questions: (i) what are the best combinations of data transformations and statistical test methods, and (ii) how do pseudo-bulk approaches perform in single-sample designs? We evaluated the performance of 343 DE pipelines (combinations of eight types of count matrix transformations and ten statistical tests) on simulated and real-world data, in terms of precision, sensitivity, and false discovery rate. We confirm the superior performance of pseudo-bulk approaches without prior transformation. For within-sample comparisons, we advise the use of three pseudo-replicates, and we provide a simple R package, DElegate, to facilitate the application of this approach.
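The pseudo-replicate idea for within-sample comparisons can be sketched as follows. This is not DElegate's implementation. It assumes that the cells of each group are randomly partitioned into three pseudo-replicates whose raw counts are summed. The result is a small genes × pseudo-samples matrix that a bulk DE method can then test without prior transformation. Function and parameter names are hypothetical.

```python
import numpy as np


def pseudo_bulk_pseudo_replicates(counts, groups, n_reps=3, seed=0):
    """Sum raw counts into pseudo-replicates per group (illustrative sketch).

    counts: genes x cells raw count matrix.
    groups: length-n_cells sequence of group labels (e.g. cluster IDs).
    Returns (genes x (n_groups * n_reps) matrix, list of (group, replicate) column labels).
    """
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    columns, labels = [], []
    for group in np.unique(groups):
        idx = np.flatnonzero(groups == group)
        # Assumption: cells are assigned to pseudo-replicates at random.
        rng.shuffle(idx)
        for rep, chunk in enumerate(np.array_split(idx, n_reps)):
            # Pseudo-bulk: sum raw counts over the cells in this pseudo-replicate.
            columns.append(counts[:, chunk].sum(axis=1))
            labels.append((group, rep))
    return np.column_stack(columns), labels
```

If a group has fewer cells than replicates, `np.array_split` yields empty chunks and therefore all-zero columns, so a real pipeline would need a minimum group size.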

Language: English


Studying stochastic systems biology of the cell with single-cell genomics data

Gennady Gorin, John J. Vastola, Lior Pachter

et al.

Cell Systems, Journal Year: 2023, Volume and Issue: 14(10), P. 822 - 843.e22

Published: Sept. 25, 2023

Language: English


Deep Learning in Single-cell Analysis

Dylan Molho, Jiayuan Ding, Wenzhuo Tang

et al.

ACM Transactions on Intelligent Systems and Technology, Journal Year: 2024, Volume and Issue: 15(3), P. 1 - 62

Published: Jan. 26, 2024

Single-cell technologies are revolutionizing the entire field of biology. The large volumes of data generated by single-cell technologies are high-dimensional, sparse, and heterogeneous and have complicated dependency structures, making analyses using conventional machine learning approaches challenging and impractical. In tackling these challenges, deep learning often demonstrates superior performance compared with traditional machine learning methods. In this work, we give a comprehensive survey of deep learning in single-cell analysis. We first introduce background on single-cell technologies and their development, as well as fundamental concepts of deep learning, including the most popular deep architectures. We present an overview of the single-cell analytic pipeline pursued in research applications while noting divergences due to data sources or specific applications. We then review seven popular tasks spanning different stages of the single-cell analysis pipeline, including multimodal integration, imputation, clustering, spatial domain identification, cell-type deconvolution, cell segmentation, and cell-type annotation. Under each task, we describe the most recent developments in classical and deep learning methods and discuss their advantages and disadvantages. Deep learning tools and benchmark datasets are also summarized for each task. Finally, we discuss future directions and the most recent challenges. This survey will serve as a reference for biologists and computer scientists, encouraging collaborations.

Language: English


scLENS: data-driven signal detection for unbiased scRNA-seq data analysis

Hyun Kim, Won Chang, Seok Joo Chae

et al.

Nature Communications, Journal Year: 2024, Volume and Issue: 15(1)

Published: April 27, 2024

Abstract: High dimensionality and noise have limited the new biological insights that can be discovered in scRNA-seq data. While dimensionality reduction tools have been developed to extract biological signals from the data, they often require manual determination of the signal dimension, introducing user bias. Furthermore, a common data preprocessing method, log normalization, can unintentionally distort signals in the data. Here, we develop scLENS, a dimensionality reduction tool that circumvents the long-standing issues of signal distortion and manual input. Specifically, we identify the primary cause of signal distortion during log normalization and effectively address it by uniformizing cell vector lengths with L2 normalization. Furthermore, we utilize random matrix theory-based noise filtering and a signal robustness test to enable data-driven determination of the threshold for signal dimensions. Our method outperforms 11 widely used dimensionality reduction tools and performs particularly well for challenging scRNA-seq datasets with high sparsity and variability. To facilitate the use of scLENS, we provide a user-friendly package that automates accurate signal detection in scRNA-seq data without time-consuming manual tuning.
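Two ingredients named in this abstract, L2 normalization of cell vectors and a random-matrix-theory noise threshold, can be illustrated in isolation. The sketch below is not scLENS. It assumes a cells × genes matrix (for example, already log-normalized) with no zero-variance genes. The first function rescales each cell to unit Euclidean length. The second counts eigenvalues of the gene correlation matrix that exceed the Marchenko–Pastur upper edge (1 + √(p/n))², the largest eigenvalue expected from pure unit-variance noise in the large-matrix limit. scLENS's signal robustness test is omitted, and the function names are hypothetical.

```python
import numpy as np


def l2_normalize_cells(x: np.ndarray) -> np.ndarray:
    """Scale each cell (row) of a cells x genes matrix to unit Euclidean length."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    # Leave all-zero cells untouched instead of dividing by zero.
    return x / np.where(norms == 0, 1.0, norms)


def marchenko_pastur_signal_count(x: np.ndarray) -> int:
    """Count eigenvalues above the Marchenko-Pastur upper edge (illustrative sketch).

    x: cells x genes matrix; genes with zero variance must be removed beforehand.
    """
    n_cells, n_genes = x.shape
    # Z-score each gene so that pure noise has unit variance.
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    # Eigenvalues of the genes x genes correlation matrix.
    eigvals = np.linalg.eigvalsh(z.T @ z / n_cells)
    upper_edge = (1.0 + np.sqrt(n_genes / n_cells)) ** 2
    return int(np.sum(eigvals > upper_edge))
```

Eigenvalues above the edge are candidate signal dimensions. Because of finite-size fluctuations, a single noise eigenvalue can occasionally cross the edge, which is one motivation for an additional robustness check.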

Language: English


ClusterDE: a post-clustering differential expression (DE) method robust to false-positive inflation caused by double dipping

Dongyuan Song, Siqi Chen, Christy Lee

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: July 25, 2023

Double dipping is a well-known pitfall in single-cell and spatial transcriptomics data analysis: after a clustering algorithm finds clusters as putative cell types or spatial domains, statistical tests are applied to the same data to identify differentially expressed (DE) genes as potential cell-type or spatial-domain markers. Because the genes that contribute to clustering are inherently likely to be identified as DE genes, double dipping can result in false-positive cell-type or spatial-domain markers, especially when clusters are spurious, leading to ambiguously defined cell types or spatial domains. To address this challenge, we propose ClusterDE, a statistical method designed to identify post-clustering DE genes as reliable markers of cell types and spatial domains while controlling the false discovery rate (FDR) regardless of clustering quality. The core of ClusterDE involves generating synthetic null data as an in silico negative control that contains only one cell type or spatial domain, allowing for the detection and removal of spurious discoveries caused by double dipping. We demonstrate that ClusterDE controls the FDR and identifies canonical cell-type and spatial-domain markers as top DE genes, distinguishing them from housekeeping genes. ClusterDE's ability to discover real cell-type and spatial-domain markers, or the absence of such markers, can be used to determine whether two ambiguous clusters should be merged. Additionally, ClusterDE is compatible with state-of-the-art analysis pipelines such as Seurat and Scanpy.
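The last step of a synthetic-null strategy can be sketched with a knockoff-style contrast. The sketch makes one assumption: for each gene we already have a DE score from the real clusters and one from clusters found on the synthetic null data (for example, -log10 p-values). Generating that null, which is the core of ClusterDE, is not shown. Genes whose contrast exceeds a data-driven threshold are kept. The threshold is chosen so that an estimated false discovery proportion stays at or below q, in the style of knockoff+ and Clipper. The function name is hypothetical, and this is not ClusterDE's code.

```python
import numpy as np


def contrast_fdr_select(target_scores, null_scores, q=0.05):
    """Select genes whose real-vs-null contrast passes a knockoff+-style threshold.

    target_scores: per-gene DE scores on the real data (larger = stronger evidence).
    null_scores: per-gene DE scores on the synthetic null data.
    Returns indices of selected genes.
    """
    contrast = np.asarray(target_scores, dtype=float) - np.asarray(null_scores, dtype=float)
    # Candidate thresholds: distinct nonzero absolute contrasts, smallest first.
    for t in np.unique(np.abs(contrast[contrast != 0])):
        # Negative contrasts at least as extreme as t estimate the number of false positives.
        fdp_hat = (1 + np.sum(contrast <= -t)) / max(1, np.sum(contrast >= t))
        if fdp_hat <= q:
            return np.flatnonzero(contrast >= t)
    return np.array([], dtype=int)
```

Genes that look differentially expressed only because the clusters were derived from the same data should score similarly on the null, giving contrasts near zero or negative, so they do not pass the threshold.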

Language: English


A systematic evaluation of highly variable gene selection methods for single-cell RNA-sequencing

Ruzhang Zhao, Jiuyao Lu, Weiqiang Zhou

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Aug. 26, 2024

Abstract: Background. Selecting highly variable features is a crucial step in most analysis pipelines for single-cell RNA-sequencing (scRNA-seq) data. Despite the numerous methods proposed in recent years, a systematic understanding of the best solution is still lacking. Results. Here, we systematically evaluate 47 highly variable gene (HVG) selection methods, consisting of 21 baseline methods developed based on different data transformations and mean-variance adjustment techniques, and 26 hybrid methods that are mixtures of baseline methods. Across 19 diverse benchmark datasets, 18 objective evaluation criteria per method, and 5,358 settings, we observe that no single baseline method consistently outperforms the others across all datasets and criteria. However, hybrid methods as a group robustly outperform individual baseline methods. Based on these findings, we propose a new HVG selection approach, a mixture of HVG selection methods (mixHVG), which incorporates top-ranked features from multiple baseline methods to better select highly variable genes. An open-source R package, mixhvg, enables convenient use of mixHVG and its integration into users’ pipelines. Conclusion. Our study not only provides a systematic comparison of existing HVG selection methods and a leading solution, but also creates a pipeline and a resource for evaluating future HVG selection methods.
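A mixture of HVG selectors can be sketched as rank aggregation. The sketch assumes that each baseline method yields a per-gene score where larger means more variable. The mixture then ranks every gene by its best (smallest) rank across methods and breaks ties by mean rank. This mirrors the idea of incorporating top-ranked features from several baselines, but it is not the mixhvg package's algorithm. The two baseline scores shown are illustrative choices: variance after log1p of library-size-normalized counts, and the variance-to-mean ratio of raw counts.

```python
import numpy as np


def log_variance_scores(counts: np.ndarray) -> np.ndarray:
    """Per-gene variance of log1p(normalized counts); counts is genes x cells.

    The scaling target of 1e4 counts per cell is an illustrative assumption.
    """
    totals = counts.sum(axis=0)
    return np.log1p(counts / totals * 1e4).var(axis=1)


def dispersion_scores(counts: np.ndarray) -> np.ndarray:
    """Per-gene variance-to-mean ratio of raw counts (0 for all-zero genes)."""
    mean = counts.mean(axis=1)
    var = counts.var(axis=1)
    return np.where(mean > 0, var / np.where(mean > 0, mean, 1.0), 0.0)


def rank_desc(scores: np.ndarray) -> np.ndarray:
    """Rank genes so that rank 0 is the highest score (stable for ties)."""
    order = np.argsort(-scores, kind="stable")
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(scores))
    return ranks


def mixture_hvg(score_lists, n_top: int) -> np.ndarray:
    """Pick n_top genes by best rank across methods, ties broken by mean rank."""
    ranks = np.vstack([rank_desc(np.asarray(s, dtype=float)) for s in score_lists])
    best = ranks.min(axis=0)
    # np.lexsort uses the last key as the primary sort key.
    order = np.lexsort((ranks.mean(axis=0), best))
    return order[:n_top]
```

Taking the best rank lets a gene enter the selection when any one baseline ranks it highly, which is one simple way to combine methods whose strengths differ across datasets.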

Language: English


Biases in machine-learning models of human single-cell data

Theresa Willem, Vladimir A. Shitov, Malte D. Luecken

et al.

Nature Cell Biology, Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 19, 2025

Language: English
